Convex.jl tests very slow with SDPA (also some failures) #17

Closed
ericphanson opened this issue Jun 12, 2019 · 9 comments · Fixed by #55

Comments

@ericphanson
Contributor

A summer student and I are interested in getting SDPA-GMP working with Convex.jl, and I thought the place to start would be plain SDPA. I tried running Convex.jl's tests with the solver set to SDPA under each of its three parameter modes. Each test run had access to four processors on the same SLURM cluster (you can see the submit scripts and everything here: https://github.com/ericphanson/HP_SDPs_fawcett/tree/master/testing-SDPA). These are the results:

| Mode | Time to finish tests | Tests failed (of 375) | Log |
| --- | --- | --- | --- |
| PARAMETER_DEFAULT | 8h 39m | 13 | test1 |
| PARAMETER_UNSTABLE_BUT_FAST | 6h 25m | 2 | test2 |
| PARAMETER_STABLE_BUT_SLOW | 5h 22m | 73 | test3 |
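
To make the setup concrete, here is a minimal sketch of solving a single problem under one of these modes. It assumes an MOI-based Convex.jl and that the parameter mode can be passed as a keyword argument to `SDPA.Optimizer`; the exact calls may differ across versions, so treat this as illustrative only.

```julia
using Convex, SDPA
using LinearAlgebra

# A small example SDP: minimize tr(C*X) over PSD matrices X with tr(X) == 1.
C = [1.0 0.0; 0.0 2.0]
X = Semidefinite(2)
problem = minimize(tr(C * X), tr(X) == 1)

# Assumed API: the parameter mode is chosen when constructing the optimizer.
solve!(problem, () -> SDPA.Optimizer(Mode = SDPA.PARAMETER_UNSTABLE_BUT_FAST))
problem.optval   # expected ≈ 1.0 (all weight on the smaller diagonal entry of C)
```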

I haven't had time to check each failure, but all the ones I scanned through were tolerance failures (the optimal values or values for the variables were not close enough to the true ones or to those obtained by solving the problem another way).

These results seemed pretty strange to me. PARAMETER_UNSTABLE_BUT_FAST had the fewest failures, but from the name I assumed it would have the most. And PARAMETER_STABLE_BUT_SLOW had the most failures and ran the fastest (and some of the failures looked quite bad: Evaluated: -265.0031579792031 ≈ 0 (atol=0.001)).

I'm worried about both the test failures and the run time (for comparison, ECOS and SCS pass the tests, and the complete test run finishes in ~4 minutes on Travis: https://travis-ci.org/JuliaOpt/Convex.jl/builds/543652452). One thing I noticed is that SDPA seemed to be running single-threaded, no matter how I set the OMP_NUM_THREADS environment variable that the SDPA manual suggested using. The manual also mentioned that linking against a different BLAS could speed it up by a lot.
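
(One way to set the variable from within Julia is sketched below; this assumes the variable has to be in the environment before the solver library is loaded, since OpenMP typically reads it at library initialization.)

```julia
# Sketch: set the OpenMP thread count before the SDPA library is loaded,
# since OMP_NUM_THREADS is typically read when the library initializes.
ENV["OMP_NUM_THREADS"] = "4"
using SDPA
```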

Do you know why I would be getting these failures and slow run times? Are there things I can do to try to improve it?

@blegat
Member

blegat commented Jun 13, 2019

Regarding PARAMETER_UNSTABLE_BUT_FAST and PARAMETER_STABLE_BUT_SLOW: I have noticed the same thing. SDPA does not even pass the MOI tests with PARAMETER_STABLE_BUT_SLOW.
I have never checked whether SDPA works with multiple threads, so I wouldn't be surprised if it does not. PR welcome :)
As you can see here:
https://github.com/JuliaOpt/SDPABuilder/blob/master/build_tarballs.jl
it is linked to COINBLAS: https://github.com/JuliaOpt/COINBLASBuilder.
You can try compiling it from source against a different BLAS and copying the library into your SDPA/.../deps/usr/lib folder to see if that speeds things up.
Another point to note is that SDPA has no support for SOC constraints, so they are transformed into SDP constraints. This means that ECOS and SCS are likely to outperform SDPA on problems with large SOC constraints.
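
For reference, the usual way such a transformation works is the Schur-complement embedding of the second-order cone into the PSD cone (a standard identity; not necessarily the exact bridge implementation):

$$\|x\|_2 \le t \quad\Longleftrightarrow\quad \begin{bmatrix} t & x^\top \\ x & t\,I \end{bmatrix} \succeq 0,$$

so an SOC constraint on an $n$-vector turns into an $(n+1) \times (n+1)$ PSD block, which is much more expensive for an interior-point SDP solver than handling the cone natively.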

@ericphanson
Contributor Author

Thanks for the information!

I was hoping the SOC tests might be what's causing those huge runtimes, but I tried again with only LP and SDP (and affine) tests, and got almost the same results:

| Mode | Time to finish tests | Tests failed (of 298) | Log |
| --- | --- | --- | --- |
| PARAMETER_DEFAULT | 8h 08m | 14 | test1-lp_sdp |
| PARAMETER_UNSTABLE_BUT_FAST | 6h 20m | 1 | test2-lp_sdp |
| PARAMETER_STABLE_BUT_SLOW | 5h 23m | 46 | test3-lp_sdp |

It seems a little odd that PARAMETER_STABLE_BUT_SLOW increased in time with strictly fewer tests; maybe the environment is too noisy.

In terms of the failures, PARAMETER_UNSTABLE_BUT_FAST seems pretty good; only one failure, and it's just barely out of tolerance (Evaluated: 18.99891502846731 ≈ 19 (atol=0.001)). Maybe it's worth adding to the readme that it's a more reliable choice? (Since otherwise one would probably assume PARAMETER_STABLE_BUT_SLOW is the most reliable, when here it seems to be the least). I can make a pull request if you are ok with adding something like that.

I think my next steps (probably in a few weeks) will be to add timing commands to figure out how long each test takes, to see whether they are all slow or only a few problems are slow, and to try swapping out the BLAS library. I hope to get to multithreading at some point, but it's a factor of ~50 too slow even accounting for being single-threaded, so fixing the single-threaded performance seems more important.
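
As a sketch of the per-test timing idea (hypothetical code: `test_problems` is a placeholder for however the problem set ends up being iterated, and it assumes the MOI-based Convex.jl `solve!`):

```julia
using Convex, SDPA

# Hypothetical sketch: time each problem separately to see whether the slowness
# is spread across all problems or concentrated in a few.
# `test_problems` is a placeholder: a collection of (name, function building a Problem).
times = Dict{String,Float64}()
for (name, build) in test_problems
    problem = build()
    times[name] = @elapsed solve!(problem, () -> SDPA.Optimizer())
end

# Print the slowest problems first.
for (name, t) in sort(collect(times); by = last, rev = true)
    println(rpad(name, 40), round(t; digits = 1), " s")
end
```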

ericphanson changed the title from "Convex.jl tests fail with SDPA" to "Convex.jl tests very slow with SDPA (also some failures)" on Jun 14, 2019
@blegat
Member

blegat commented Jun 14, 2019

> I can make a pull request if you are ok with adding something like that.

That would be welcome! I always felt that SDPA was overfitted to the DIMACS problems (http://plato.asu.edu/ftp/sparse_sdp.html).
It would be interesting to verify this and to try the different parameter modes on those problems.

@ericphanson
Contributor Author

Just as an update on this, we (well, mostly @JiazhengZhu) more or less got SDPA-GMP (and -QD and -DD) up and running over at https://github.com/ericphanson/SDPA_GMP.jl, and passing Convex.jl's non-SOC tests. From what we learned there, I think this issue is a combination of:

  • SOC constraints being converted into huge PSD problems (which maybe is improved on MOI 9.2?), which makes for very slow run times. (There are some SOC constraints even in the "SDP" tests, because some test problems involve both.)
  • Linearly dependent constraints, which violate an assumption of the solver, causing "cholesky miss condition" warnings and incorrect results, which leads to the test errors.

@JiazhengZhu wrote a naive Gaussian-elimination presolve step to delete linearly dependent constraints, which fixes the test errors (at least via SDPA-GMP) for us. It isn't numerically stable, though, since detecting linear dependence really requires exact comparisons. We thought about a more numerically stable procedure: instead of deleting constraints until the remaining ones are linearly independent, choose a new (smaller) set of linearly independent constraints via a matrix factorization of the constraint matrix. We didn't fully work out how to do that, but along the way it became clear that the drawback of that approach is that you lose the sparsity of the constraints.
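
For concreteness, here is a rough sketch (not the code in SDPA_GMP.jl) of how a pivoted QR factorization could pick out a numerically independent subset of the constraint rows; the function name and tolerance are made up, and forming the dense factorization is exactly where the sparsity gets lost:

```julia
using LinearAlgebra

# Rough sketch (not the SDPA_GMP.jl presolve): given constraint data A*x = b with
# possibly redundant rows, use a pivoted QR of Aᵀ to select a subset of rows that
# is linearly independent up to the tolerance `tol`. Assumes the dropped rows are
# consistent with the kept ones (otherwise the original problem was infeasible).
function independent_rows(A::AbstractMatrix, b::AbstractVector; tol = 1e-10)
    F = qr(Matrix(transpose(A)), ColumnNorm())   # densifies A; older Julia: qr(..., Val(true))
    d = abs.(diag(F.R))
    cutoff = tol * maximum(d)
    r = count(x -> x > cutoff, d)                # numerical rank estimate from R's diagonal
    keep = sort(F.p[1:r])                        # indices of the selected (pivot) rows of A
    return A[keep, :], b[keep], keep
end
```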

Anyway, if you have any ideas for a better presolve, we'd be happy to hear them. Maybe it would be helpful also to use the presolve for SDPA.jl, or otherwise reuse code between SDPA_GMP.jl and SDPA.jl.

@ericphanson
Contributor Author

See also https://ericphanson.github.io/ConvexTests.jl/dev/SDPA.html, which runs Convex.jl's tests for each preset choice of parameters, with and without Dualization.jl.

@blegat
Member

blegat commented Mar 6, 2020

Wow, I love the idea of outputting this information on a website, and how clean it looks. We should do the same with the solver tests in SumOfSquares.

@ericphanson
Contributor Author

We could even put them all in the same repo, since some solvers can be used in lots of ways. E.g. the COSMO page could have its results on the Convex tests and the SumOfSquares tests (and the MOI tests?). I think the downside of that is longer CI run times, and the upside is having just one central place (MOISolverTests.jl or something).

@odow
Member

odow commented Apr 18, 2023

Is there anything actionable here? Users might want to choose a different parameter mode, but SDPA.jl should probably stick with DEFAULT.

@ericphanson
Contributor Author

ericphanson commented Apr 18, 2023

I think it could make sense to link to ConvexTests to provide info about the reliability of the modes, and maybe mention the need for presolving (though I'm not sure there's a good solution for that).
