New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cvxcore optimization #1255
cvxcore optimization #1255
Conversation
And revert to conservative sparse*sparse products in cvxcore ... pruned products intermittently crash.
You can manually supply CLFAGS and LDFLAGS to enable openmp TODO: wheels should be built with openmp
Looks reasonable to me! I'd appreciate some comments explaining how you're doing things in a more copy-free/efficient way. Also looks like there are compile issues on travis. |
Thanks for the review! Previously, the copy constructor of I'll fix the Travis CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me! I did ask one question in cvxcore.cpp
about gracefully handling potential OOM errors. Other than that I'd say we need to make sure to update the web documentation.
Edit: @SteveDiamond and @akshayka what do you think about adding in some tests (maybe with a flag on a certain TravisCI or AppVeyor build) to ensure multi-threaded cvxcore behaves as expected?
To add tests, the build would need to be configured to compile cvxcore with openmp. Do you have thoughts on how to do that cleanly? |
The Travis config file can define an environment variable tailored to each build configuration. We'd need to edit the Travis installation script to check the value of that environment variable, and then proceed with or without OpenMP (as appropriate). We can make sure OpenMP on these systems just by installing from conda-forge. The one weird thing we'd need to do is explicitly set the number of OpenMP threads to 2 or 4 (https://docs.travis-ci.com/user/languages/c/#openmp-projects). Edit: we use this kind of environment-variable build-flag logic when selecting the python version https://github.com/cvxgrp/cvxpy/blob/master/continuous_integration/Travis/install_dependencies.sh#L29. |
Configurable via OMP_NUM_THREADS or cvxpy.set_num_threads()
Some timings: master, this branch 1 thread, this branch 8 threads (test_benchmarks.py). This branch, single-threaded, is somewhat faster than master. Most of these problems don't have many expression trees that need to be canonicalized, so you don't see a big difference in timings. However EDIT 1: Note that most of these benchmarks time the EDIT 2: I found and eliminated additional copies, yielding significant savings. The updated numbers are below. master (total time 15.7 seconds)
1 thread (total time 11.7 seconds)
8 threads (total time 9.1 seconds)
|
I made a few more small optimizations (eliminating copies), upgraded Eigen to 3.3.9, and switched to pruned sparse-sparse multiplication (which works in 3.3.9, but was broken in the version we were using). There are still some copies that I wasn't able to get rid of. Most notably The improvements are fairly significant. See below (test_benchmarks.py, all caveats from the above comment apply). master (total time 15.6s)
this PR, 1 thread (total time 10.5s)
this PR, 8 threads (total time 8.1s)
|
I believe this PR is almost done. There are two outstanding things.
I've added an entry to the build matrix that builds and tests with OpenMP enabled. This works on linux, but is currently failing on macOS. @rileyjmurray or @SteveDiamond , can you help me get the macOS build passing? I'm not exactly sure what's wrong, because Travis is truncating the log. My guess is that I'm passing the wrong compiler/linker flags to gcc. I don't own a mac, so I can't test this locally. EDIT: Per offline discussion, I've disabled testing with OpenMP on macOS.
My vote is to add the web docs in a future PR. Right now we don't really say anything about performance on cvxpy.org. I am of the opinion that we should add a new section on performance. This section will emphasize the fact that CVXPY is a compiler, and will have tips on how to get better compilation performance, including compiling with OpenMP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of lines changed jumped massively, but I trust this is just because of updating Eigen. Approved!
Adds several optimizations to cvxcore, and adds parallelism.
This change makes the processing of
LinOp
s in cvxcore'sbuild_matrix
function happen in parallel, provided that cvxcore was compiled with OpenMP. The user can choose the number of threads to be used viacvxpy.set_num_threads()
.By default, cvxcore is not built with OpenMP, since some systems might not have it. To build with OpenMP, specify the compiler/linker flags at the command line. eg,
That said, we should make sure that our wheels are built with OpenMP by default. I'd appreciate it if someone could help with that, in a future PR.
Additionally, this change gets rid of some unnecessary copies in cvxcore.