Performance speed-up options? #45
Comments
Hi @yxie20, Thanks for trying out PySR! Your suggestions are very good - I think having some batched call so that processes don't need to start for each dimension would be really nice.
Hopefully this helps!
FYI, I just added multi-output capabilities to the backend! Cheers,
Thank you Miles! I'm excited to give it a try! Now a basic question: how can I update the Julia backend so that PySR can use the new multi-output capability? Thanks again!
It will be in v0.6.0 of PySR. Not ready yet; I'll write when it is. Cheers,
Release candidate is up: `pip install --upgrade pysr==0.6.0rc1`. It will allow for a matrix of `y`. Let me know how this works!
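For context, a minimal sketch of what a multi-output call might look like with this release candidate, assuming the 0.6-era functional `pysr(X, y, ...)` interface and that `y` is passed as an `(n_samples, n_outputs)` array (the parameter values here are illustrative, not taken from the thread):

```python
import numpy as np
from pysr import pysr

# Toy data with two output dimensions stacked into one matrix.
X = np.random.randn(100, 5)
y = np.stack([2.5 * np.cos(X[:, 3]), X[:, 0] ** 2 - 0.5], axis=1)

# One call searches both outputs at once; the result should contain
# one table of candidate equations per output column.
equations = pysr(
    X, y,
    niterations=5,
    binary_operators=["+", "*", "-"],
    unary_operators=["cos"],
)
print(equations)
```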
Looks like we got an error:
Sorry, this looks like a bug. I missed the behavior where both
Fixed in 0.6.3+.
Thank you Miles! It seems like 0.6.3 is even slower than before.
By the way, if you want smaller startup time, you could set
Thank you for 1) and 2)! Looking forward to 3)! For 4), exactly as you said. I get a few of those empty printouts (zero equations) for several iterations. It always says:
then no equations are listed. After about 2 minutes of hanging, the normal printout appears. The hanging is much longer when `populations` is set to a large number. The results are fine! Maybe it's nothing to worry about!
It doesn't say "Progress: 1 / ...", right? It's stuck at "Progress: 0"? This is expected behaviour, although maybe I should wait for some equations before starting the printing. By the way - on PySR 0.6.5, which will be up later today - I added a patch which boosts performance by nearly 2x. It turns out the optimization library I was using (main bottleneck) did not require a differentiable function, so I implemented a faster non-differentiable version.
One other idea. The backend of PySR is in Julia, and Julia has a bit of a slow startup time, hence the slow startup of PySR. There's a way to avoid the startup time, by using this package - https://github.com/dmolina/DaemonMode.jl. It would probably let you execute PySR runs in quick succession. The idea would be to start up a Julia daemon when first running PySR, pre-compile SymbolicRegression in that daemon, then execute each new script within that daemon. Thus, you wouldn't need to restart Julia every time you call PySR. Edit: just tried it; it doesn't really help.
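For illustration only, a rough sketch of how that daemon idea could be wired from Python; the `serve()`/`runargs()` calls follow DaemonMode.jl's documented usage, while the script path and the backgrounding are placeholders (and, per the edit above, this approach didn't end up helping much in practice):

```python
import os

# Start a long-lived Julia daemon once (requires DaemonMode.jl to be installed).
# SymbolicRegression.jl would stay loaded/compiled inside this process.
os.system("julia --startup-file=no -e 'using DaemonMode; serve()' &")

# For each subsequent PySR run, send the generated script to the daemon
# instead of launching (and recompiling in) a fresh Julia process.
os.system(
    "julia --startup-file=no -e 'using DaemonMode; runargs()' "
    "/tmp/pysr_generated_script.jl"  # hypothetical path of the generated script
)
```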
More ideas, which would probably help quite a bit:
Following up: If early-stop (based on MSE) can be implemented, that would be super helpful in speeding up PySR on my end, where I have PySR running inside a large for loop. Do you think this is possible? Thank you!
Sure; what sort of things would you want to trigger the stop? An absolute error reached, or relative error, or something like no error improvement for N iterations?
I think both 2) relative error and 3) convergence make sense! I can work with 1) absolute error as well. For my purposes, having both 2) and 3), or 1) and 3), will be sufficient. Thank you, Miles!
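Since no built-in early stop existed at this point, a rough user-side approximation might look like the hypothetical sketch below: run several short searches instead of one long one and stop once the best equation's reported error drops below a threshold. The `best_row` helper and the `MSE` column name reflect the 0.6-era interface as I understand it, and without warm starting each round restarts the search from scratch, so this only approximates a true built-in early stop:

```python
import numpy as np
from pysr import pysr, best_row

# Toy problem standing in for one iteration of the user's outer loop.
X = np.random.randn(100, 5)
y = 2.0 * np.cos(X[:, 3]) + X[:, 0]

mse_threshold = 1e-6   # hypothetical absolute-error stopping criterion
max_rounds = 10        # cap on total search effort

for _ in range(max_rounds):
    # Short search per round; check the best candidate's MSE after each one.
    equations = pysr(X, y, niterations=2,
                     binary_operators=["+", "*"], unary_operators=["cos"])
    if best_row(equations)["MSE"] < mse_threshold:
        break  # good enough; move on to the next problem in the outer loop
```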
Just a note on multiple outputs (output `y` being multi-dimensional) with early exit: Thanks again!
FYI: actually, each output is computed at the same time, asynchronously. One particular batch of computation may finish earlier than another, which might make it seem that it is done sequentially.
Doing early stopping on each output separately is a really good idea, though. That would free up more cores for the remaining outputs.
Hello Miles! Thank you for open-sourcing this powerful tool! I am working on including PySR in my own research, and running into some performance bottlenecks.
I found regressing a simple equation (e.g. the quick-start example) takes roughly 2 minutes. Ideally, I am aiming to reduce that time to ~30 seconds. Would you give me some pointers on this? Meanwhile, I will try to break down the challenge into several pieces:

- If the above wouldn't work, then allowing `y` to be vector-valued (as mentioned in "Allow `y` to be vector-valued?" #35) would be a second-best option! Even better, if we could create a "batched" version of the `pysr(X, y)` API, `pysr_batched(X, y)`, such that `X` and `y` are Python lists, and we return the results in a list as well, so that we only generate one Julia script and call `os.system()` once to keep the Julia environment up. (See the sketch after this list.)
- Multi-threading: I noticed that increasing `procs` from 4 to 8 resulted in slightly longer running time. I am running on an 8-core 16-thread CPU. Did I do something dumb?
- I went into `pysr/sr.py` and added a `runtests=false` flag on lines 438 and 440. That saved ~20 seconds.