Switch Julia interface to PyJulia #87

Merged
merged 41 commits into from Jan 20, 2022

@MilesCranmer (Owner) commented Jan 14, 2022

This switches the entirety of the Julia/Python interface to use PyJulia rather than a subprocess call to a generated script. Multithreading and distributed computing both work!

This brings several advantages:

  • Much faster startup on the second and later pysr() calls within the same Python session
  • Simplified error debugging on the Julia side
  • Greatly simplified codebase with no huge chunks of code generation
  • Possibility to checkpoint the symbolic regression search, and restart later

The disadvantage is that this puts PySR at the mercy of PyJulia and any bugs it has. However, I think in the long term this is still much better than generating a script and running it.

Installation with this new interface is now:

pip install pysr
python -c 'import pysr; pysr.install()'

This will install PyCall.jl (required by PyJulia) as well as the packages defined in SymbolicRegression.jl.

This will initially be released as v0.7.0a1 and stay there until things stabilize.

TODO:

  • Update documentation to describe PyJulia install.
  • Update documentation to mention that repeat calls will have a quicker startup time.
  • Check that the error message is correct when PyJulia is not set up because pysr.install() has not been run.
  • Check if there are any performance hits using PyJulia vs Julia.
  • Make it so we can KeyboardInterrupt the PyJulia execution of EquationSearch. See KeyboardInterrupt support / SIGINT handler JuliaPy/pyjulia#211.
  • This does not work when Python is statically linked. In this situation, should we use compiled_modules=False? Or enforce that the user has to get this working?
  • Is there anything I can do to get conda working with PyCall.jl?
  • Test conda with the CI.
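
For the statically linked item above, here is a minimal sketch of what falling back to compiled_modules=False could look like. The helper names python_is_statically_linked and start_julia are hypothetical, and the Py_ENABLE_SHARED check is only a heuristic that I am assuming is good enough for illustration; it is not how PyJulia itself detects the situation. Julia(compiled_modules=False) is the workaround named in the item above.

```python
import sys
import sysconfig


def python_is_statically_linked():
    """Heuristic (assumption): True if this interpreter lacks a shared libpython."""
    if sys.platform == "win32":
        # Windows builds load the interpreter from a python DLL.
        return False
    # Py_ENABLE_SHARED is 1 when CPython was configured with --enable-shared.
    return not bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


def start_julia():
    """Hypothetical helper: start PyJulia, disabling compiled modules if needed."""
    # Imported lazily so the check above works even without PyJulia installed.
    from julia.api import Julia

    if python_is_statically_linked():
        # Slower startup, but avoids the precompilation cache mismatch.
        return Julia(compiled_modules=False)
    return Julia()


if __name__ == "__main__":
    print("statically linked:", python_is_statically_linked())
```

Whether PySR should apply this fallback automatically or require the user to fix their Python build is still the open question in the item above.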

@MilesCranmer (Owner, Author) commented Jan 17, 2022

To get KeyboardInterrupt to work, I'm going to launch EquationSearch in a multiprocessing.Process. Upon KeyboardInterrupt, a signal will be written to a temp file. The actual SymbolicRegression.jl EquationSearch will check this file during the search loop and exit manually if the signal is set to 1.
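
A rough sketch of the temp-file signalling idea, in pure Python. The function names (make_flag_file, request_stop, search_loop) are made up for illustration, the loop body is a placeholder for real search work, and the process management (multiprocessing.Process, catching KeyboardInterrupt in the parent) is left out; the real check would live in SymbolicRegression.jl.

```python
import os
import tempfile
import time


def make_flag_file():
    """Create a temp file holding "0", meaning "keep running"."""
    fd, path = tempfile.mkstemp(suffix=".flag")
    with os.fdopen(fd, "w") as f:
        f.write("0")
    return path


def request_stop(flag_path):
    """What the parent would do upon KeyboardInterrupt: set the signal to 1."""
    with open(flag_path, "w") as f:
        f.write("1")


def search_loop(flag_path, max_iterations=1000):
    """Stand-in for the search loop; polls the flag once per iteration."""
    completed = 0
    for _ in range(max_iterations):
        with open(flag_path) as f:
            if f.read().strip() == "1":
                break  # stop requested, exit the loop manually
        completed += 1
        time.sleep(0.001)  # placeholder for one iteration of real work
    return completed


if __name__ == "__main__":
    path = make_flag_file()
    print(search_loop(path, max_iterations=5))  # runs every iteration
    request_stop(path)
    print(search_loop(path, max_iterations=5))  # exits before doing any work
    os.remove(path)
```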

Update: instead, I will have SymbolicRegression.jl itself read stdin continuously; if "q" is entered, it will quit and return the current set of populations and hall of fame. This works!
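
The stdin approach can be sketched in Python like this. It is only an illustration of the pattern (a background reader that sets a stop flag, and a loop that returns its partial results when the flag is set), not the Julia implementation; the names watch_for_quit and search, and the list standing in for the hall of fame, are assumptions.

```python
import io
import sys
import threading
import time


def watch_for_quit(stream, stop):
    """Read lines from `stream` and set `stop` once a line containing "q" arrives."""
    for line in stream:
        if line.strip() == "q":
            stop.set()
            return


def search(stream=None, max_iterations=1000):
    """Stand-in search loop that returns partial results when asked to quit."""
    stream = sys.stdin if stream is None else stream
    stop = threading.Event()
    # Daemon thread so a blocked read never keeps the interpreter alive.
    threading.Thread(
        target=watch_for_quit, args=(stream, stop), daemon=True
    ).start()

    hall_of_fame = []  # placeholder for populations / hall of fame
    for i in range(max_iterations):
        if stop.is_set():
            break  # quit early and hand back what we have so far
        hall_of_fame.append(i)
        time.sleep(0.01)  # placeholder for one iteration of real work
    return hall_of_fame


# Simulated input standing in for a user typing "q" on stdin.
partial = search(io.StringIO("q\n"), max_iterations=200)
print(f"stopped after {len(partial)} of 200 iterations")
```

The key property is that quitting still returns the current state instead of discarding it, which is what makes this usable as an interrupt mechanism.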

@MilesCranmer (Owner, Author) commented:

So it seems like PySR actually speeds up with PyJulia versus the old method of launching a script.

Measurements before the change:

[64700., 67500., 68800., 68900., 69300., 69800., 68400., 73000.,
       74600., 75100., 75500., 75400., 57700., 59700., 60600., 61300.,
       61900., 62500., 62900.]

(average 67242)

Measurements after the change:

[ 87800.,  87900.,  88000.,  87700.,  87200.,  90900.,  90800.,
        89500.,  89000.,  88700., 121000., 121000., 121000.]

(average 96192).
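
To double-check the two averages, here is a short script over the numbers listed above. The units are not stated in this thread, so this only compares the two sets as given; the medians are added because the three 121000 readings pull the second mean upward.

```python
from statistics import mean, median

# Measurements copied from the two lists above.
before = [
    64700., 67500., 68800., 68900., 69300., 69800., 68400., 73000.,
    74600., 75100., 75500., 75400., 57700., 59700., 60600., 61300.,
    61900., 62500., 62900.,
]
after = [
    87800., 87900., 88000., 87700., 87200., 90900., 90800.,
    89500., 89000., 88700., 121000., 121000., 121000.,
]

mean_before, mean_after = mean(before), mean(after)
ratio = mean_after / mean_before

print(f"mean before:   {mean_before:.0f}")      # 67242
print(f"mean after:    {mean_after:.0f}")       # 96192
print(f"median before: {median(before):.0f}")   # 68400
print(f"median after:  {median(after):.0f}")    # 89000
print(f"mean ratio:    {ratio:.2f}")            # 1.43
```

Both the mean and the median are higher after the change, so the comparison does not hinge on the three outlying readings.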

These measurements are from very small jobs, but they at least show that the change doesn't drastically hurt performance, so I think we are good on that front.

@MilesCranmer MilesCranmer merged commit 57de954 into master Jan 20, 2022
@MilesCranmer MilesCranmer deleted the pyjulia branch January 20, 2022 23:18