MPI-Ipopt-3.14.1 tests failed with MUMPS-5.4.0 #500

Closed

sagitter opened this issue Jul 18, 2021 · 11 comments

@sagitter

Hi all.

Ipopt-3.14.1 (OpenMPI-4.1.1 build) is compiled on Fedora 35 (devel branch) against MUMPS-5.4.0 with GCC-11.1.1; the tests are failing with the following output:

./run_unitTests
 
Running unitTests...
 
Testing AMPL Solver Executable...
    Test passed!
Testing C++ Example...
    Test passed!
Testing C Example...
    Test passed!
Testing Fortran Example...
    Test passed!
Skip testing Java Example (Java interface not build)
Testing sIpopt Example parametric_cpp...
    Test passed!
Testing sIpopt Example redhess_cpp...
    Test passed!
Testing EmptyNLP Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----
*** Solve for 0 variables, feasible constraint, feasible bounds
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
This is Ipopt version 3.14.1, running with linear solver MUMPS 5.4.0.
Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0
Total number of variables............................:        0
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        1
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  0.0000000e+00 0.00e+00 1.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
Number of Iterations....: 0
                                   (scaled)                 (unscaled)
Objective...............:   0.0000000000000000e+00    0.0000000000000000e+00
Dual infeasibility......:   0.0000000000000000e+00    0.0000000000000000e+00
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   0.0000000000000000e+00    0.0000000000000000e+00
Number of objective function evaluations             = 1
Number of objective gradient evaluations             = 1
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 1
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 1
Number of Lagrangian Hessian evaluations             = 0
Total seconds in IPOPT                               = 0.001
EXIT: Optimal Solution Found.
Finalize called
x =
z_L =
z_U =
lambda = 0
The problem solved in 0 iterations!
The final value of the objective function is 0.
*** Solve for 5 variables, feasible constraint, feasible bounds
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[buildvm-x86-23.iad2.fedoraproject.org:2191282] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing GetCurr Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  1            0            0            1            0            11           0            -0.0555556  
  1            0            0            1.44444      0            0            0            0           
  -1           0            0            0            0            0            0            0.0555556   
  g(x)         lambda       constr_viol  compl_g     
  3            0.222222     1            -0.222222   
  0            -0.25        0.5          -0.125      
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  1            0            0            1            0            11           0            -0.0555556  
  1            0            0            1.44444      0            0            0            0           
  -1           0            0            0            0            0            0            0.0555556   
  g(x)         lambda       constr_viol  compl_g     
  3            0.222222     1            -0.222222   
  0            -0.25        0.5          -0.125      
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.848244     0            0            0.0228869    0            0.248283     0            0.141131    
  1            0            0            1.19695      0            0            0            0           
  -0.598244    0            0            0            0            0            0            0.174761    
  g(x)         lambda       constr_viol  compl_g     
  2.07741      0.0984729    0.077413     -0.00762308 
  0.361622     -0.591245    0.138378     -0.0818154  
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.848244     0            0            0.0228869    0            0.248283     0            0.141131    
  1            0            0            1.19695      0            0            0            0           
  -0.598244    0            0            0            0            0            0            0.174761    
  g(x)         lambda       constr_viol  compl_g     
  2.07741      0.0984729    0.077413     -0.00762308 
  0.361622     -0.591245    0.138378     -0.0818154  
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.837754     0            0            0.00186575   0            0.0202205    0            0.00213284  
  1            0            0            1.39182      0            0            0            0           
  -0.467717    0            0            0            0            0            0            0.0774144   
  g(x)         lambda       constr_viol  compl_g     
  1.92059      0.195909     0            0.0155568   
  0.483073     -0.790356    0.0169272    -0.0133785  
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.837754     0            0            0.00186575   0            0.0202205    0            0.00213284  
  1            0            0            1.39182      0            0            0            0           
  -0.467717    0            0            0            0            0            0            0.0774144   
  g(x)         lambda       constr_viol  compl_g     
  1.92059      0.195909     0            0.0155568   
  0.483073     -0.790356    0.0169272    -0.0133785  
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.862864     0            0            0.000256656  0            0.00278802   0            0.000782603 
  1            0            0            1.43348      0            0            0            0           
  -0.494598    0            0            0            0            0            0            -0.00140184 
  g(x)         lambda       constr_viol  compl_g     
  1.98916      0.216738     0            0.00234916  
  0.499908     -0.795602    9.20443e-05  -7.32306e-05
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.862864     0            0            0.000256656  0            0.00278802   0            0.000782603 
  1            0            0            1.43348      0            0            0            0           
  -0.494598    0            0            0            0            0            0            -0.00140184 
  g(x)         lambda       constr_viol  compl_g     
  1.98916      0.216738     0            0.00234916  
  0.499908     -0.795602    9.20443e-05  -7.32306e-05
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.865731     0            0            1.37798e-05  0            0.000149728  0            7.61855e-06 
  1            0            0            1.42334      0            0            0            0           
  -0.499506    0            0            0            0            0            0            0.000112534 
  g(x)         lambda       constr_viol  compl_g     
  1.999        0.211671     0            0.000212455 
  0.499984     -0.789205    1.58731e-05  -1.25272e-05
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.865731     0            0            1.37798e-05  0            0.000149728  0            7.61855e-06 
  1            0            0            1.42334      0            0            0            0           
  -0.499506    0            0            0            0            0            0            0.000112534 
  g(x)         lambda       constr_viol  compl_g     
  1.999        0.211671     0            0.000212455 
  0.499984     -0.789205    1.58731e-05  -1.25272e-05
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866022     0            0            1.69423e-07  0            1.84095e-06  0            1.06005e-07 
  1            0            0            1.42266      0            0            0            0           
  -0.499995    0            0            0            0            0            0            8.47627e-07 
  g(x)         lambda       constr_viol  compl_g     
  1.99999      0.211329     0            2.26724e-06 
  0.5          -0.788681    1.54178e-07  -1.21597e-07
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866022     0            0            1.69423e-07  0            1.84095e-06  0            1.06005e-07 
  1            0            0            1.42266      0            0            0            0           
  -0.499995    0            0            0            0            0            0            8.47627e-07 
  g(x)         lambda       constr_viol  compl_g     
  1.99999      0.211329     0            2.26724e-06 
  0.5          -0.788681    1.54178e-07  -1.21597e-07
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            2.30569e-10  0            2.50537e-09  0            1.2108e-11  
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            1.02611e-10 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            2.55968e-09 
  0.5          -0.788675    1.8043e-11   -1.42301e-11
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            2.30569e-10  0            2.50537e-09  0            1.2108e-11  
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            1.02611e-10 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            2.55968e-09 
  0.5          -0.788675    1.8043e-11   -1.42301e-11
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Finalizing:
  x = 0.866025 1 -0.5
  z_L = 8.36636e-11 1.42265 0
  z_U = 0 0 0
  g = 2 0.5
  lambda = 0.211325 -0.788675
Current iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
Scaled iterate (regular mode):
  x            x_L_viol     x_U_viol     z_L          z_U          compl_x_L    compl_x_U    grad_lag_x  
  0.866025     0            0            8.36636e-11  0            9.09091e-10  0            -9.19302e-15
  1            0            0            1.42265      0            0            0            0           
  -0.5         0            0            0            0            0            0            2.22045e-16 
  g(x)         lambda       constr_viol  compl_g     
  2            0.211325     0            9.09091e-10 
  0.5          -0.788675    0            0           
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[buildvm-x86-23.iad2.fedoraproject.org:2191303] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
@svigerske
Member

svigerske commented Jul 19, 2021

These are tests that create and destroy the IpoptApplication, including the Mumps interface, several times. It looks like this comes up the second time the Mumps interface is used.

Since you probably didn't use the Mumps build system from ThirdParty-Mumps, some MPI initialization and finalization happens in the constructor and destructor of the Mumps interface. So the calling sequence is

   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      MPI_Init(&argc, &argv);
   }
.... [Ipopt solve, calling Mumps, calling MPI]
   MPI_Finalized(&mpi_finalized);
   assert(!mpi_finalized);
   MPI_Finalize();
...
   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      MPI_Init(&argc, &argv);
   }
.... [Ipopt solve, calling Mumps, calling MPI]

The error then comes up in this second solve.
Isn't it sufficient to call MPI_Init() again? (I'm not very familiar with MPI, nor have I used an MPI-enabled version of Mumps.)
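
For context, a minimal sketch (not taken from Ipopt) of the rule the abort message refers to: the MPI standard allows a process to initialize MPI at most once, so after MPI_Finalize() there is no way to "restart" MPI with another MPI_Init().

// mpi_lifecycle_sketch.cpp -- illustrative only, not part of Ipopt
#include <mpi.h>

int main(int argc, char** argv)
{
   int initialized, finalized;

   MPI_Initialized(&initialized);
   if( !initialized )
      MPI_Init(&argc, &argv);   // allowed exactly once per process

   // ... first solve: MPI calls (e.g. MPI_Comm_rank) are fine here ...

   MPI_Finalized(&finalized);
   if( !finalized )
      MPI_Finalize();           // after this, (almost) no MPI calls are allowed anymore

   // Calling MPI_Init() again here would violate the standard; a second
   // solve that reaches MUMPS/MPI after this point triggers exactly the
   // "called after MPI_FINALIZE" abort seen in the test output.
   return 0;
}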

Relevant code:

int MumpsSolverInterface::instancecount_mpi = 0;

MumpsSolverInterface::MumpsSolverInterface()
{
   DBG_START_METH("MumpsSolverInterface::MumpsSolverInterface()",
                  dbg_verbosity);

#ifndef MUMPS_MPI_H
#if defined(HAVE_MPI_INITIALIZED)
   int mpi_initialized;
   MPI_Initialized(&mpi_initialized);
   if( !mpi_initialized )
   {
      int argc = 1;
      char** argv = NULL;
      MPI_Init(&argc, &argv);
      assert(instancecount_mpi == 0);
      instancecount_mpi = 1;
   }
   else if( instancecount_mpi > 0 )
   {
      ++instancecount_mpi;
   }
#endif
   int myid;
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
#endif

   //initialize mumps
   MUMPS_STRUC_C* mumps_ = static_cast<MUMPS_STRUC_C*>(calloc(1, sizeof(MUMPS_STRUC_C)));
   mumps_->job = -1;  //initialize mumps
   mumps_->par = 1;   //working host for sequential version
   mumps_->sym = 2;   //general symmetric matrix
   mumps_->comm_fortran = USE_COMM_WORLD;

#ifndef IPOPT_MUMPS_NOMUTEX
   const std::lock_guard<std::mutex> lock(mumps_call_mutex);
#endif

   mumps_c(mumps_);
   mumps_->icntl[1] = 0;
   mumps_->icntl[2] = 0;  //QUIETLY!
   mumps_->icntl[3] = 0;
   mumps_ptr_ = (void*) mumps_;
}

MumpsSolverInterface::~MumpsSolverInterface()
{
   DBG_START_METH("MumpsSolverInterface::~MumpsSolverInterface()",
                  dbg_verbosity);

#ifndef IPOPT_MUMPS_NOMUTEX
   const std::lock_guard<std::mutex> lock(mumps_call_mutex);
#endif

   MUMPS_STRUC_C* mumps_ = static_cast<MUMPS_STRUC_C*>(mumps_ptr_);
   mumps_->job = -2;  //terminate mumps
   mumps_c(mumps_);

#ifndef MUMPS_MPI_H
#ifdef HAVE_MPI_INITIALIZED
   if( instancecount_mpi == 1 )
   {
      int mpi_finalized;
      MPI_Finalized(&mpi_finalized);
      assert(!mpi_finalized);
      MPI_Finalize();
   }
   --instancecount_mpi;
#endif
#endif

   delete[] mumps_->a;
   free(mumps_);
}

@sagitter
Author

In Fedora, we're using this patch on Ipopt/src/Algorithm/LinearSolvers/IpMumpsSolverInterface.cpp; I don't know whether it could interfere with the test execution.

@svigerske
Member

The answer at https://stackoverflow.com/questions/15126814/boost-test-unit-can-not-call-mpi-function says that "MPI can be only initialised once during the lifetime of the program and can only be finalised once", so the place where this is done at the moment (the Mumps interface constructor/destructor) is wrong. It would need to be moved into global constructors/destructors (.init/.fini) or be left to the user (the one who implements main()) to take care of this.
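
As an illustration of the "leave it to the user" option, here is a sketch of a hypothetical driver (not one of the Ipopt examples) in which main() owns the MPI lifetime, so that repeatedly creating and destroying IpoptApplication never triggers MPI_Init()/MPI_Finalize() inside the Mumps interface:

// user_managed_mpi_sketch.cpp -- hypothetical driver, not an Ipopt example
#include <mpi.h>
#include "IpIpoptApplication.hpp"

int main(int argc, char** argv)
{
   MPI_Init(&argc, &argv);   // once, before any IpoptApplication exists

   for( int run = 0; run < 2; ++run )
   {
      Ipopt::SmartPtr<Ipopt::IpoptApplication> app = IpoptApplicationFactory();
      app->Initialize();
      // ... set up a TNLP and call app->OptimizeTNLP(...) as usual ...
      // app and the Mumps interface are destroyed at the end of this scope;
      // since MPI was initialized here in main(), the interface code quoted
      // above neither initializes nor finalizes MPI, so the next round is safe.
   }

   MPI_Finalize();           // once, after the last solve
   return 0;
}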

That could and should probably be done, but is there a specific reason to use the mpich version of MUMPS? Could you switch to MUMPS-devel?

@sagitter
Author

> That could and should probably be done, but is there a specific reason to use the mpich version of MUMPS? Could you switch to MUMPS-devel?

I'm compiling both the MPI versions (OpenMPI and MPICH) and the serial version; the tests against MUMPS without MPI run correctly.

svigerske added a commit that referenced this issue Jul 19, 2021


- call MPI_Init() and MPI_Finalize() if not using the dummy mpi.h from
  Mumps
- should allow a 2nd round of Ipopt within the same program
- the function attributes are GCC specific
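
Roughly, such library-level init/fini functions could look like the following sketch using GCC-specific function attributes; the names and details are illustrative and not necessarily the actual code on the branch:

// libinit_sketch.cpp -- illustrative only
#include <mpi.h>

static int mpi_initialized_by_library = 0;

__attribute__((constructor))
static void ipopt_mpi_init(void)   // runs when the shared library is loaded
{
   int initialized;
   MPI_Initialized(&initialized);
   if( !initialized )
   {
      MPI_Init(NULL, NULL);
      mpi_initialized_by_library = 1;
   }
}

__attribute__((destructor))
static void ipopt_mpi_fini(void)   // runs when the shared library is unloaded
{
   int finalized;
   MPI_Finalized(&finalized);
   if( mpi_initialized_by_library && !finalized )
      MPI_Finalize();
}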
@svigerske
Member

In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
Is there a chance that you could try this out?

I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

The issue with cassert not being included should also be gone.

@sagitter
Author

> I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

Header files are installed separately when you install the MUMPS-openmpi, MUMPS-mpich, or MUMPS RPMs, under a private directory called MUMPS.

> In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
> Is there a chance that you could try this out?

Of course!

@svigerske
Member

svigerske commented Jul 20, 2021

> I hope you won't need a patch anymore. In your patch, you change the include of mpi.h to MUMPS/mpi.h (https://src.fedoraproject.org/rpms/coin-or-Ipopt/blob/rawhide/f/coin-or-Ipopt-mumps.patch#_18), which I don't really understand. I thought that MUMPS's mpi.h is for the sequential version of MUMPS and provides only a dummy MPI interface. But if you have MPI everywhere, then you would want the system's mpi.h.

> Header files are installed separately when you install the MUMPS-openmpi, MUMPS-mpich, or MUMPS RPMs, under a private directory called MUMPS.

But you want mpi.h from OpenMPI or MPICH if using MUMPS-openmpi or MUMPS-mpich.
And you want MUMPS/mpi.h if using serial MUMPS (which should be found, because you have --with-mumps-cflags=-I%{_includedir}/MUMPS).
So maybe I looked at the wrong place.
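
As a quick check of which mpi.h a given set of flags actually picks up, one could compile a tiny hypothetical probe like the following; it relies on the MUMPS_MPI_H guard that the interface code quoted earlier also checks, and uses GCC-style #warning just for illustration:

// which_mpi_h_sketch.cpp -- hypothetical probe, not part of Ipopt
#include <mpi.h>

#ifdef MUMPS_MPI_H
#warning "mpi.h resolved to the dummy header shipped with sequential MUMPS"
#else
#warning "mpi.h resolved to a real MPI implementation (OpenMPI/MPICH)"
#endif

int main()
{
   return 0;
}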

@sagitter
Author

MUMPS-openmpi-devel owns /usr/include/openmpi-$arch/MUMPS/mpi.h
MUMPS-mpich-devel owns /usr/include/mpich-$arch/MUMPS/mpi.h
MUMPS-devel owns /usr/include/MUMPS/mpi.h

The main sub-directories under /usr/include are set by configure during compilation. However, I will remove this change in the next Ipopt RPM releases.

@sagitter
Author

> In branch 500-mpi-inifini, the MPI_Init() and MPI_Finalize() calls are moved into the ctor and dtor of the library. This makes the test pass again for me.
> Is there a chance that you could try this out?

It compiled and tested correctly; build log for the x86_64 architecture: https://kojipkgs.fedoraproject.org//work/tasks/6491/72246491/build.log

@svigerske
Member

Thank you! I can make a release with this soon.

So you got rid of most patches now?

@sagitter
Author

> Thank you! I can make a release with this soon.
>
> So you got rid of most patches now?

Yes. Thanks!
