Optimize LinExpr multiplication by one or zero #98
Comments
Hi Federico,
Yes, we are aware that LinExpr is probably the part of Python-MIP where optimizations are most needed right now.
Let me explain why it is slow, maybe you can think of a way to speed it up (I've tried some approaches, not very successfully).
If you enter an expression like 2 x1 + 5 x2 + 1 x1, several increasingly larger LinExpr objects are created as Python parses it. So part of the cost is related to the creation of these objects. Also, coefficients of the same variable need to be grouped (in our example, to 3 x1 + 5 x2) before the expression is sent to CBC. Thus, to update variables' coefficients "quickly" (in big-O terms) we keep a dictionary mapping variables to coefficients.
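A rough sketch of the idea described above (a hypothetical, heavily simplified class, not the actual LinExpr implementation): every `+` creates a fresh expression object, and a dict keyed by variable keeps coefficient grouping O(1) on average.

```python
# Hypothetical simplified sketch, not the real python-mip LinExpr.
class Expr:
    def __init__(self, coeffs=None, const=0.0):
        self.coeffs = dict(coeffs or {})  # variable name -> coefficient
        self.const = const

    def __add__(self, other):
        # A new, larger object is created at every '+' while Python
        # parses the expression -- this is part of the cost.
        result = Expr(self.coeffs, self.const)
        for var, coef in other.coeffs.items():
            # Dict lookup groups repeated variables in O(1) on average.
            result.coeffs[var] = result.coeffs.get(var, 0.0) + coef
        result.const += other.const
        return result

def term(var, coef):
    return Expr({var: coef})

# 2 x1 + 5 x2 + 1 x1 is grouped into 3 x1 + 5 x2
e = term("x1", 2.0) + term("x2", 5.0) + term("x1", 1.0)
print(e.coeffs)  # {'x1': 3.0, 'x2': 5.0}
```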
Do you use CPython or PyPy?
Could you provide us with a profile from your code?
Cheers
--
=============================================================
Haroldo Gambini Santos
Computing Department
Universidade Federal de Ouro Preto - UFOP
email: haroldo@ufop.edu.br
Haroldo.GambiniSantos@cs.kuleuven.be
home/research page: www.decom.ufop.br/haroldo
It has long been an axiom of mine that the little things are infinitely
the most important.
-- Sir Arthur Conan Doyle, "A Case of Identity"
On Mon, 11 May 2020, Federico Bonelli wrote:
I'm working on a problem that is mathematically defined with a couple of big sparse binary matrices. I'm using quite a lot of python-mip binary variables, and I collected them into numpy tensors to have some syntactic sugar and matrix operations over them.
It's very nice but very, very slow, and it seems to me that with some basic optimization it could perform much better, notably in the LinExpr.__mul__() function.
Would it be reasonable to have special cases for multiplication by 1 or 0 to speed up this kind of use?
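The pattern described above can be illustrated with a tiny stand-in for a solver variable (the class below is hypothetical, just to show the dispatch): a matrix product over a numpy object array triggers one `__mul__` per matrix entry, and with a binary matrix every one of those coefficients is 0 or 1, so every call could in principle be special-cased.

```python
import numpy as np

# Hypothetical stand-in for a solver variable, only to count calls.
class Var:
    mul_calls = []          # records every coefficient we are multiplied by

    def __init__(self, name):
        self.name = name

    def __rmul__(self, coef):
        Var.mul_calls.append(coef)
        return [(self.name, coef)]   # a tiny "term list" placeholder
    __mul__ = __rmul__

A = np.array([[1, 0], [0, 1]], dtype=object)         # binary matrix
x = np.array([Var("x1"), Var("x2")], dtype=object)   # variable vector

y = A.dot(x)   # builds one "expression" (term list) per row
# Four multiplications happened, all by 0 or 1.
print(len(Var.mul_calls))
```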
Hi @h-g-s, thanks for your answer. I understand the situation. I'm using CPython, and I started profiling to better understand whether I can be of any help. Here is an extract of the profile results for the heaviest matrix operation, which is a numpy.matmul between a numpy array containing python-mip variables and another with dtype=int containing 0s and 1s.
I cut the results at 1 second or more of tottime. Attached is a plot of the same profiling run.
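For reference, a minimal sketch of how such a profile can be collected and sorted by tottime with the standard library (`build_model` here is a placeholder, not the actual model-building code):

```python
import cProfile
import io
import pstats

# Placeholder workload standing in for the real model construction.
def build_model():
    total = 0
    for i in range(1000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
build_model()
profiler.disable()

# Sort by tottime (time spent in the function itself) and keep the top 5.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(5)
print(out.getvalue())
```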
If you already have all your input data as numpy matrices, you probably just want to use CyLP. Someone with basically the same issue just reported reducing the time to get an instance into Cbc from 10 minutes with python-mip down to 10 seconds with CyLP (see coin-or/CyLP#95).
Thanks, I'll look into CyLP, but I like Python-MIP so far, so I'll give it a good try before switching. I tried a basic optimization against the master branch:
it significantly improves my case of numpy.matmul with a binary integer matrix. Here are some measurements, not optimized: optimized: For that same operation, in this particular case, I get a gain of about 5.5x. Do you think this is worth a PR?
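The kind of special-casing under discussion can be sketched like this (a hypothetical simplified expression class, not the actual python-mip code): multiplication by 0 skips all per-term arithmetic and returns an empty expression, and multiplication by 1 returns a plain copy without touching the coefficients.

```python
# Hypothetical simplified sketch of special-casing *0 and *1.
class Expr:
    def __init__(self, coeffs, const=0.0):
        self.coeffs = dict(coeffs)
        self.const = const

    def __mul__(self, k):
        if k == 0:
            return Expr({})                       # zero: drop every term
        if k == 1:
            return Expr(self.coeffs, self.const)  # one: copy, no arithmetic
        return Expr({v: c * k for v, c in self.coeffs.items()},
                    self.const * k)               # general case
    __rmul__ = __mul__

e = Expr({"x1": 2.0, "x2": 5.0})
print((e * 0).coeffs)   # {}
print((e * 1).coeffs)   # {'x1': 2.0, 'x2': 5.0}
print((e * 3).coeffs)   # {'x1': 6.0, 'x2': 15.0}
```

Returning a copy (rather than the expression itself) in the `k == 1` branch keeps the result independent of the original object.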
Hi @bonelli, your optimization seems good, please make a PR. I'm thinking of some optimizations here that I could test, but I would need a code example where this bottleneck appears. Do you think you could write a simpler piece of code that exposes the same performance bottleneck? Then I could test my ideas and see the effect in an environment similar to yours. @tkralphs, I didn't know that the matrix interface could be so much faster; I'll look into providing a similar input in Python-MIP.
I was thinking of making the same optimisation, but wouldn't the following code result in issues?
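The original snippet is not preserved here, but the concern can be reconstructed as a sketch (a hypothetical simplified class, not the real LinExpr): if multiplication by 1 returned `self` instead of a copy, the result would alias the original expression, and a later in-place update would silently mutate it.

```python
# Hypothetical sketch of the aliasing concern, not the real LinExpr.
class Expr:
    def __init__(self, coeffs):
        self.coeffs = dict(coeffs)

    def __mul__(self, k):
        if k == 1:
            return self                  # the risky shortcut: no copy
        return Expr({v: c * k for v, c in self.coeffs.items()})

    def __imul__(self, k):               # in-place *=
        for v in self.coeffs:
            self.coeffs[v] *= k
        return self

a = Expr({"x1": 2.0})
b = a * 1      # under the shortcut, b is the very same object as a
b *= 3         # mutates a as well
print(a.coeffs["x1"])   # 6.0 -- 'a' changed, which is surprising
```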
If the *= operator performs in place (I think so), it is correct: a would be multiplied by 3. Frankly, with this kind of mathematical application, I believe that having in-place operations is so misleading that they shouldn't be allowed. Mathematical variables/constants are defined once and immutable after that; encouraging this kind of imperative programming in a math/optimization library is something I wouldn't do.
@h-g-s Sure, I'll make a PR and post some examples of slow numpy operations with python-mip.
Line 185 in 0eb7f7c
Yes, that's right, it's an in-place operation. Removing in-place ops would of course be a severe backwards-compatibility issue (perhaps for version 2). I'd support it, though.
With all due respect, I don't think it's useful to add a matrix notation if it's going to be just syntactic sugar: I simply arranged normal variables inside a numpy array and the matrix notation is already there, and as a plus I get all the matrix operations I need, already implemented in numpy. If, on the other hand, you're thinking of implementing faster versions of the LinExpr operations that take advantage of the matrix structure (as you did with xsum for iterables), then I'm all for it. Sorry I couldn't make the PR today; I didn't have time, even though it's high on my priority list.
A faster LinExpr is the priority, but my idea is also to make available a low-level interface to communicate with the solver. With CFFI it is possible to allocate C arrays which can be passed directly to CBC, for example; this would be a "last resort" for those who want maximum performance.
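The idea of handing the solver raw C arrays can be sketched with the standard library's `ctypes` (CFFI offers the same kind of allocation with a different API; the index/coefficient layout below is only an illustration, not CBC's actual signature):

```python
import ctypes

# Build contiguous C arrays of column indices and coefficients, the
# general shape a solver C API typically consumes, bypassing per-term
# Python objects entirely.
n = 3
col_idx = (ctypes.c_int * n)(0, 1, 2)
coeffs = (ctypes.c_double * n)(2.0, 5.0, 1.0)

# These buffers have a stable address and C memory layout, so a C
# function could read them directly.
print(list(col_idx))   # [0, 1, 2]
print(list(coeffs))    # [2.0, 5.0, 1.0]
```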
Here is the PR for reference: #100
Release 1.9.0 resolves this issue for me; I'd close it.