MPI-Ipopt-3.14.1 tests failed with MUMPS-5.4.0 #500

Closed

sagitter opened this issue Jul 18, 2021 · 11 comments

@sagitter

Hi all.

Ipopt-3.14.1 (OpenMPI-4.1.1 build) is compiled on Fedora 35 (devel branch) against MUMPS-5.4.0 with GCC-11.1.1; the tests are failing with the following output:

./run_unitTests
 
Running unitTests...
 
Testing AMPL Solver Executable...
    Test passed!
Testing C++ Example...
    Test passed!
Testing C Example...
    Test passed!
Testing Fortran Example...
    Test passed!
Skip testing Java Example (Java interface not build)
Testing sIpopt Example parametric_cpp...
    Test passed!
Testing sIpopt Example redhess_cpp...
    Test passed!
Testing EmptyNLP Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----
*** Solve for 0 variables, feasible constraint, feasible bounds
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
This is Ipopt version 3.14.1, running with linear solver MUMPS 5.4.0.
Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0
Total number of variables............................:        0
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        1
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  0.0000000e+00 0.00e+00 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
Number of Iterations....: 0
                                   (scaled)                 (unscaled)
Objective...............:   0.0000000000000000e+00    0.0000000000000000e+00
Dual infeasibility......:   0.0000000000000000e+00    0.0000000000000000e+00
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   0.0000000000000000e+00    0.0000000000000000e+00
Number of objective function evaluations             = 1
Number of objective gradient evaluations             = 1
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 1
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 1
Number of Lagrangian Hessian evaluations             = 0
Total seconds in IPOPT                               = 0.001
EXIT: Optimal Solution Found.
Finalize called
x =
z_L =
z_U =
lambda = 0
The problem solved in 0 iterations!
The final value of the objective function is 0.
*** Solve for 5 variables, feasible constraint, feasible bounds
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[buildvm-x86-23.iad2.fedoraproject.org:2191282] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing GetCurr Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  1            0            0            1            0            11           0            -0.0555556  
  1            0            0            1.44444      0            0            0            0           
  -1           0            0            0            0            0            0            0.0555556   
  g(x)         lambda       constr_viol  compl_g     
  3            0.222222     1            -0.222222   
  0            -0.25        0.5          -0.125      
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  1            0            0            1            0            11           0            -0.0555556  
  1            0            0            1.44444      0            0            0            0           
  -1           0            0            0            0            0            0            0.0555556   
  g(x)         lambda       constr_viol  compl_g     
  3            0.222222     1            -0.222222   
  0            -0.25        0.5          -0.125      
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.848244     0            0            0.0228869    0            0.248283     0            0.141131    
  1            0            0            1.19695      0            0            0            0           
  -0.598244    0            0            0            0            0            0            0.174761    
  g(x)         lambda       constr_viol  compl_g     
  2.07741      0.0984729    0.077413     -0.00762308 
  0.361622     -0.591245    0.138378     -0.0818154  
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.848244     0            0            0.0228869    0            0.248283     0            0.141131    
  1            0            0            1.19695      0            0            0            0           
  -0.598244    0            0            0            0            0            0            0.174761    
  g(x)         lambda       constr_viol  compl_g     
  2.07741      0.0984729    0.077413     -0.00762308 
  0.361622     -0.591245    0.138378     -0.0818154  
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.837754     0            0            0.00186575   0            0.0202205    0            0.00213284  
  1            0            0            1.39182      0            0            0            0           
  -0.467717    0            0            0            0            0            0            0.0774144   
  g(x)         lambda       constr_viol  compl_g     
  1.92059      0.195909     0            0.0155568   
  0.483073     -0.790356    0.0169272    -0.0133785  
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.837754     0            0            0.00186575   0            0.0202205    0            0.00213284  
  1            0            0            1.39182      0            0            0            0           
  -0.467717    0            0            0            0            0            0            0.0774144   
  g(x)         lambda       constr_viol  compl_g     
  1.92059      0.195909     0            0.0155568   
  0.483073     -0.790356    0.0169272    -0.0133785  
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.862864     0            0            0.000256656  0            0.00278802   0            0.000782603 
  1            0            0            1.43348      0            0            0            0           
  -0.494598    0            0            0            0            0            0            -0.00140184 
  g(x)         lambda       constr_viol  compl_g     
  1.98916      0.216738     0            0.00234916  
  0.499908     -0.795602    9.20443e-05  -7.32306e-05
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.862864     0            0            0.000256656  0            0.00278802   0            0.000782603 
  1            0            0            1.43348      0            0            0            0           
  -0.494598    0            0            0            0            0            0            -0.00140184 
  g(x)         lambda       constr_viol  compl_g     
  1.98916      0.216738     0            0.00234916  
  0.499908     -0.795602    9.20443e-05  -7.32306e-05
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.865731     0            0            1.37798e-05  0            0.000149728  0            7.61855e-06 
  1            0            0            1.42334      0            0            0            0           
  -0.499506    0            0            0            0            0            0            0.000112534 
  g(x)         lambda       constr_viol  compl_g     
  1.999        0.211671     0            0.000212455 
  0.499984     -0.789205    1.58731e-05  -1.25272e-05
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.865731     0            0            1.37798e-05  0            0.000149728  0            7.61855e-06 
  1            0            0            1.42334      0            0            0            0           
  -0.499506    0            0            0            0            0            0            0.000112534 
  g(x)         lambda       constr_viol  compl_g     
  1.999        0.211671     0            0.000212455 
  0.499984     -0.789205    1.58731e-05  -1.25272e-05
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866022     0            0            1.69423e-07  0            1.84095e-06  0            1.06005e-07 
  1            0            0            1.42266      0            0            0            0           
  -0.499995    0            0            0            0            0            0            8.47627e-07 
  g(x)         lambda       constr_viol  compl_g     
  1.99999      0.211329     0            2.26724e-06 
  0.5          -0.788681    1.54178e-07  -1.21597e-07
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866022     0            0            1.69423e-07  0            1.84095e-06  0            1.06005e-07 
  1            0            0            1.42266      0            0            0            0           
  -0.499995    0            0            0            0            0            0            8.47627e-07 
  g(x)         lambda       constr_viol  compl_g     
  1.99999      0.211329     0            2.26724e-06 
  0.5          -0.788681    1.54178e-07  -1.21597e-07
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            2.30569e-10  0            2.50537e-09  0            1.2108e-11  
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            1.02611e-10 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            2.55968e-09 
  0.5          -0.788675    1.8043e-11   -1.42301e-11
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            2.30569e-10  0            2.50537e-09  0            1.2108e-11  
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            1.02611e-10 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            2.55968e-09 
  0.5          -0.788675    1.8043e-11   -1.42301e-11
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Finalizing:
  x = 0.866025 1 -0.5
  z_L = 8.36636e-11 1.42265 0
  z_U = 0 0 0
  g = 2 0.5
  lambda = 0.211325 -0.788675
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[buildvm-x86-23.iad2.fedoraproject.org:2191303] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
@svigerske
Member

svigerske commented Jul 19, 2021

These are tests that create and destroy the IpoptApplication, including the Mumps interface, several times. It looks like this comes up the second time the Mumps interface is used.

Since you probably didn't use the Mumps build system from ThirdParty-Mumps, some MPI initialization and finalization happens in the constructor and destructor of the Mumps interface. So the calling sequence is

   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      MPI_Init(&argc, &argv);
   }
.... [Ipopt solve, calling Mumps, calling MPI]
   MPI_Finalized(&mpi_finalized);
   assert(!mpi_finalized);
   MPI_Finalize();
...
   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      MPI_Init(&argc, &argv);
   }
.... [Ipopt solve, calling Mumps, calling MPI]

The error then comes up in this second solve.
Isn't it sufficient to call MPI_Init() again? (I'm not very familiar with MPI, nor have I used an MPI-enabled version of Mumps.)
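
For context, a minimal sketch (not taken from Ipopt) of the rule the abort message refers to: the MPI standard allows a process to initialize MPI at most once, so after MPI_Finalize() there is no way to "restart" MPI with another MPI_Init().

// mpi_lifecycle_sketch.cpp -- illustrative only, not part of Ipopt
#include <mpi.h>

int main(int argc, char** argv)
{
   int initialized, finalized;

   MPI_Initialized(&initialized);
   if( !initialized )
      MPI_Init(&argc, &argv);   // allowed exactly once per process

   // ... first solve: MPI calls (e.g. MPI_Comm_rank) are fine here ...

   MPI_Finalized(&finalized);
   if( !finalized )
      MPI_Finalize();           // after this, (almost) no MPI calls are allowed anymore

   // Calling MPI_Init() again here would violate the standard; a second
   // solve that reaches MUMPS/MPI after this point triggers exactly the
   // "called after MPI_FINALIZE" abort seen in the test output.
   return 0;
}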

Relevant code:

int MumpsSolverInterface::instancecount_mpi = 0;

MumpsSolverInterface::MumpsSolverInterface()
{
   DBG_START_METH("MumpsSolverInterface::MumpsSolverInterface()",
                  dbg_verbosity);

#ifndef MUMPS_MPI_H
#if defined(HAVE_MPI_INITIALIZED)
   int mpi_initialized;
   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      int argc = 1;
      char** argv = NULL;
      MPI_Init(&argc, &argv);
      assert(instancecount_mpi == 0);
      instancecount_mpi = 1;
   }
   else if( instancecount_mpi > 0 )
   {
      ++instancecount_mpi;
   }
#endif
   int myid;
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
#endif

   //initialize mumps
   MUMPS_STRUC_C* mumps_ = static_cast<MUMPS_STRUC_C*>(calloc(1, sizeof(MUMPS_STRUC_C)));
   mumps_->job = -1;  //initialize mumps
   mumps_->par = 1;   //working host for sequential version
   mumps_->sym = 2;   //general symmetric matrix
   mumps_->comm_fortran = USE_COMM_WORLD;

#ifndef IPOPT_MUMPS_NOMUTEX
   const std::lock_guard<std::mutex> lock(mumps_call_mutex);
#endif

   mumps_c(mumps_);
   mumps_->icntl[1] = 0;
   mumps_->icntl[2] = 0;  //QUIETLY!
   mumps_->icntl[3] = 0;
   mumps_ptr_ = (void*) mumps_;
}

MumpsSolverInterface::~MumpsSolverInterface()
{
   DBG_START_METH("MumpsSolverInterface::~MumpsSolverInterface()",
                  dbg_verbosity);

#ifndef IPOPT_MUMPS_NOMUTEX
   const std::lock_guard<std::mutex> lock(mumps_call_mutex);
#endif

   MUMPS_STRUC_C* mumps_ = static_cast<MUMPS_STRUC_C*>(mumps_ptr_);
   mumps_->job = -2;  //terminate mumps
   mumps_c(mumps_);

#ifndef MUMPS_MPI_H
#ifdef HAVE_MPI_INITIALIZED
   if( instancecount_mpi == 1 )
   {
      int mpi_finalized;
      MPI_Finalized(&mpi_finalized);
      assert(!mpi_finalized);
      MPI_Finalize();
   }
   --instancecount_mpi;
#endif
#endif

   delete[] mumps_->a;
   free(mumps_);
}

@sagitter
Author

In Fedora, we're using this patch on Ipopt/src/Algorithm/LinearSolvers/IpMumpsSolverInterface.cpp; I don't know whether it could interfere with the test execution.

@svigerske
Member

The answer at https://stackoverflow.com/questions/15126814/boost-test-unit-can-not-call-mpi-function says that "MPI can be only initialised once during the lifetime of the program and can only be finalised once", so the place where this is done at the moment (the Mumps interface constructor/destructor) is wrong. It would need to be moved into global constructors/destructors (.init/.fini) or be left to the user (the one who implements main()) to take care of this.
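
As an illustration of the "leave it to the user" option, here is a sketch of a hypothetical driver (not one of the Ipopt examples) in which main() owns the MPI lifetime, so that repeatedly creating and destroying IpoptApplication never triggers MPI_Init()/MPI_Finalize() inside the Mumps interface:

// user_managed_mpi_sketch.cpp -- hypothetical driver, not an Ipopt example
#include <mpi.h>
#include "IpIpoptApplication.hpp"

int main(int argc, char** argv)
{
   MPI_Init(&argc, &argv);   // once, before any IpoptApplication exists

   for( int run = 0; run < 2; ++run )
   {
      Ipopt::SmartPtr<Ipopt::IpoptApplication> app = IpoptApplicationFactory();
      app->Initialize();
      // ... set up a TNLP and call app->OptimizeTNLP(...) as usual ...
      // app and the Mumps interface are destroyed at the end of this scope;
      // since MPI was initialized here in main(), the interface code quoted
      // above neither initializes nor finalizes MPI, so the next round is safe.
   }

   MPI_Finalize();           // once, after the last solve
   return 0;
}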

That could and should probably be done, but is there a specific reason to use the mpich version of MUMPS? Could you switch to MUMPS-devel?

@sagitter
Author

> That could and should probably be done, but is there a specific reason to use the mpich version of MUMPS? Could you switch to MUMPS-devel?

I'm compiling both the MPI versions (OpenMPI and MPICH) and the serial version; the tests against MUMPS without MPI run correctly.

svigerske added a commit that referenced this issue Jul 19, 2021


- call MPI_Init() and MPI_Finalize() if not using the dummy mpi.h from
  Mumps
- should allow a 2nd round of Ipopt within the same program
- the function attributes are GCC specific
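
Roughly, such library-level init/fini functions could look like the following sketch using GCC-specific function attributes; the names and details are illustrative and not necessarily the actual code on the branch:

// libinit_sketch.cpp -- illustrative only
#include <mpi.h>

static int mpi_initialized_by_library = 0;

__attribute__((constructor))
static void ipopt_mpi_init(void)   // runs when the shared library is loaded
{
   int initialized;
   MPI_Initialized(&initialized);
   if( !initialized )
   {
      MPI_Init(NULL, NULL);
      mpi_initialized_by_library = 1;
   }
}

__attribute__((destructor))
static void ipopt_mpi_fini(void)   // runs when the shared library is unloaded
{
   int finalized;
   MPI_Finalized(&finalized);
   if( mpi_initialized_by_library && !finalized )
      MPI_Finalize();
}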
@svigerske
Member

In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
Is there a chance that you could try this out?

I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

The issue with cassert not being included should also be gone.

@sagitter
Author

> I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

Header files are installed separately when you install the MUMPS-openmpi, MUMPS-mpich, or MUMPS RPMs, under a private directory called MUMPS.

> In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
> Is there a chance that you could try this out?

Of course!

@svigerske
Member

svigerske commented Jul 20, 2021

> I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

> Header files are installed separately when you install the MUMPS-openmpi, MUMPS-mpich, or MUMPS RPMs, under a private directory called MUMPS.

But you want mpi.h from OpenMPI or MPICH if using MUMPS-openmpi or MUMPS-mpich.
And you want MUMPS/mpi.h if using serial MUMPS (which should be found, because you have --with-mumps-cflags=-I%{_includedir}/MUMPS).
So maybe I looked at the wrong place.
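
As a quick check of which mpi.h a given set of flags actually picks up, one could compile a tiny hypothetical probe like the following; it relies on the MUMPS_MPI_H guard that the interface code quoted earlier also checks, and uses GCC-style #warning just for illustration:

// which_mpi_h_sketch.cpp -- hypothetical probe, not part of Ipopt
#include <mpi.h>

#ifdef MUMPS_MPI_H
#warning "mpi.h resolved to the dummy header shipped with sequential MUMPS"
#else
#warning "mpi.h resolved to a real MPI implementation (OpenMPI/MPICH)"
#endif

int main()
{
   return 0;
}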

@sagitter
Author

MUMPS-openmpi-devel owns /usr/include/openmpi-$arch/MUMPS/mpi.h
MUMPS-mpich-devel owns /usr/include/mpich-$arch/MUMPS/mpi.h
MUMPS-devel owns /usr/include/MUMPS/mpi.h

The main sub-directories under /usr/include are set by configure during compilation. However, I will remove this change in the next Ipopt RPM releases.

@sagitter
Author

> In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
> Is there a chance that you could try this out?

It compiled and tested correctly; build log for the x86_64 architecture: https://kojipkgs.fedoraproject.org//work/tasks/6491/72246491/build.log

@svigerske
Member

Thank you! I can make a release with this soon.

So you got rid of most patches now?

@sagitter
Author

> Thank you! I can make a release with this soon.
>
> So you got rid of most patches now?

Yes. Thanks!
