Performance speed-up options? #45
Comments
Hi @yxie20, Thanks for trying out PySR! Your suggestions are very good - I think having some batched call so that processes don't need to start for each dimension would be really nice.
Hopefully this helps!
FYI, I just added multi-output capabilities to the backend! Cheers,
Thank you Miles! I'm excited to give it a try! Now a basic question: how can I update the Julia backend so that PySR can use the new multi-output capability? Thanks again!
It will be in v0.6.0 of PySR. Not ready yet; I'll write when it is. Cheers,
Release candidate is up: `pip install --upgrade pysr==0.6.0rc1`. It will allow for a matrix of `y`. Let me know how this works!
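For context, a minimal sketch of what a multi-output call might look like with this release candidate, assuming the 0.6-era functional `pysr(X, y, ...)` interface and that `y` is passed as an `(n_samples, n_outputs)` array (the parameter values here are illustrative, not taken from the thread):

```python
import numpy as np
from pysr import pysr

# Toy data with two output dimensions stacked into one matrix.
X = np.random.randn(100, 5)
y = np.stack([2.5 * np.cos(X[:, 3]), X[:, 0] ** 2 - 0.5], axis=1)

# One call searches both outputs at once; the result should contain
# one table of candidate equations per output column.
equations = pysr(
    X, y,
    niterations=5,
    binary_operators=["+", "*", "-"],
    unary_operators=["cos"],
)
print(equations)
```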
Looks like we got an error:
Sorry, this looks like a bug. I missed the behavior where both
Fixed in 0.6.3+.
Thank you Miles! It seems like 0.6.3 is even slower than before.
By the way, if you want smaller startup time, you could set
Thank you for 1) and 2)! Looking forward to 3)! For 4), exactly as you said. I get a few of those empty printouts (zero equations) for several iterations. It always says:
then no equations are listed. After about 2 minutes of hanging, the normal printout appears. The hanging is much longer when `populations` is set to a large number. The results are fine! Maybe it's nothing to worry about!
It doesn't say "Progress: 1 / ...", right? It's stuck at "Progress: 0"? This is expected behaviour, although maybe I should wait for some equations before starting the printing. By the way - on PySR 0.6.5, which will be up later today - I added a patch which boosts performance by nearly 2x. It turns out the optimization library I was using (main bottleneck) did not require a differentiable function, so I implemented a faster non-differentiable version.
One other idea. The backend of PySR is in Julia, and Julia has a bit of a slow startup time, hence the slow startup of PySR. There's a way to avoid the startup time, by using this package - https://github.com/dmolina/DaemonMode.jl. It would probably let you execute PySR runs in quick succession. The idea would be to start up a Julia daemon when first running PySR, pre-compile SymbolicRegression in that daemon, then execute each new script within that daemon. Thus, you wouldn't need to restart Julia every time you call PySR. Edit: just tried it; it doesn't really help.
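For illustration only, a rough sketch of how that daemon idea could be wired from Python; the `serve()`/`runargs()` calls follow DaemonMode.jl's documented usage, while the script path and the backgrounding are placeholders (and, per the edit above, this approach didn't end up helping much in practice):

```python
import os

# Start a long-lived Julia daemon once (requires DaemonMode.jl to be installed).
# SymbolicRegression.jl would stay loaded/compiled inside this process.
os.system("julia --startup-file=no -e 'using DaemonMode; serve()' &")

# For each subsequent PySR run, send the generated script to the daemon
# instead of launching (and recompiling in) a fresh Julia process.
os.system(
    "julia --startup-file=no -e 'using DaemonMode; runargs()' "
    "/tmp/pysr_generated_script.jl"  # hypothetical path of the generated script
)
```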
More ideas, which would probably help quite a bit:
Following up: If early-stop (based on MSE) can be implemented, that would be super helpful in speeding up PySR on my end, where I have PySR running inside a large for loop. Do you think this is possible? Thank you!
Sure; what sort of things would you want to trigger the stop? An absolute error reached, or relative error, or something like no error improvement for N iterations?
I think both 2) relative error and 3) convergence make sense! I can work with 1) absolute error as well. For my purposes, having both 2) and 3), or 1) and 3), will be sufficient. Thank you, Miles!
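Since no built-in early stop existed at this point, a rough user-side approximation might look like the hypothetical sketch below: run several short searches instead of one long one and stop once the best equation's reported error drops below a threshold. The `best_row` helper and the `MSE` column name reflect the 0.6-era interface as I understand it, and without warm starting each round restarts the search from scratch, so this only approximates a true built-in early stop:

```python
import numpy as np
from pysr import pysr, best_row

# Toy problem standing in for one iteration of the user's outer loop.
X = np.random.randn(100, 5)
y = 2.0 * np.cos(X[:, 3]) + X[:, 0]

mse_threshold = 1e-6   # hypothetical absolute-error stopping criterion
max_rounds = 10        # cap on total search effort

for _ in range(max_rounds):
    # Short search per round; check the best candidate's MSE after each one.
    equations = pysr(X, y, niterations=2,
                     binary_operators=["+", "*"], unary_operators=["cos"])
    if best_row(equations)["MSE"] < mse_threshold:
        break  # good enough; move on to the next problem in the outer loop
```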
Just a note on multiple outputs (output `y` being multi-dimensional) with early exit: Thanks again!
FYI: actually, each output is computed at the same time, asynchronously. One particular batch of computation may finish earlier than another, which might make it seem that it is done sequentially.
Doing early stopping on each output separately is a really good idea, though. That would free up more cores for the remaining outputs.
Hello Miles! Thank you for open-sourcing this powerful tool! I am working on including PySR in my own research, and running into some performance bottlenecks.
I found regressing a simple equation (e.g. the quick-start example) takes roughly 2 minutes. Ideally, I am aiming to reduce that time to ~30 seconds. Would you give me some pointers on this? Meanwhile, I will try to break down the challenge into several pieces:

- If the above wouldn't work, then allowing `y` to be vector-valued (as mentioned in "Allow `y` to be vector-valued?" #35) would be a second-best option! Even better, if we could create a "batched" version of the `pysr(X, y)` API, `pysr_batched(X, y)`, such that `X` and `y` are Python lists, and we return the results in a list as well, so that we only generate one Julia script and call `os.system()` once to keep the Julia environment up. (See the sketch after this list.)
- Multi-threading: I noticed that increasing `procs` from 4 to 8 resulted in slightly longer running time. I am running on an 8-core 16-thread CPU. Did I do something dumb?
- I went into `pysr/sr.py` and added a `runtests=false` flag on lines 438 and 440. That saved ~20 seconds.