[Feature] Easy loading from equation file #167

MilesCranmer · 2022-07-25T04:06:19Z

This makes it easier to load models directly from a saved equation file. There are two ways to use this:

1. Use pysr.load(...) to load from a standalone CSV file of equations. In this case you must pass several attributes to initialize the model: binary_operators, unary_operators, and n_features_in. If you have custom variable names, you also need to pass those, as well as nout if the number of outputs is not 1.
2. PySRRegressor now automatically creates a pickle file of the model parameters when running fit(...). It has the same name as the equation file, but with a .pkl extension. This lets you run, e.g., pysr.load("equations.csv") and have all the other parameters loaded alongside the equations. (You still need to pass extra_sympy_mappings and extra_torch_mappings, as they cannot be pickled.)
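
To make the two loading paths concrete, here is a minimal stdlib-only sketch. It is not PySR's code: the helpers checkpoint_path and load_params are hypothetical, and the rule that explicitly passed arguments override the checkpoint is an assumption. The only detail taken from this PR is that the pickle shares the equation file's name with a .pkl extension.

```python
import pickle
import tempfile
from pathlib import Path


def checkpoint_path(equation_file):
    """Map e.g. 'equations.csv' to 'equations.pkl' (hypothetical helper)."""
    return Path(equation_file).with_suffix(".pkl")


def load_params(equation_file, **explicit_params):
    """Return model parameters for a saved equation file.

    If a pickle checkpoint sits next to the CSV, read the parameters from it.
    Otherwise the caller must supply them, because a bare CSV of equations
    does not record the operators or the number of input features.
    """
    pkl = checkpoint_path(equation_file)
    if pkl.exists():
        with open(pkl, "rb") as f:
            params = pickle.load(f)
        # Assumption: explicitly passed arguments take precedence.
        params.update(explicit_params)
        return params

    required = ("binary_operators", "unary_operators", "n_features_in")
    missing = [name for name in required if name not in explicit_params]
    if missing:
        raise ValueError(f"No checkpoint found; please pass: {missing}")
    return dict(explicit_params)


with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "equations.csv"
    csv_file.write_text("placeholder\n")

    # Way 1: standalone CSV, so the attributes are passed by hand.
    from_csv = load_params(
        csv_file,
        binary_operators=["+", "*"],
        unary_operators=["cos"],
        n_features_in=2,
    )

    # Way 2: a checkpoint exists next to the CSV, so nothing else is needed.
    saved = {"binary_operators": ["+"], "unary_operators": [], "n_features_in": 5}
    with open(checkpoint_path(csv_file), "wb") as f:
        pickle.dump(saved, f)
    from_pkl = load_params(csv_file)
```

With a checkpoint present, the call site shrinks to a single file name, which is the convenience this PR is after.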

@tttc3 what do you think of this?

I am also wondering if it might be more intuitive to have a PySRRegressor.from_file(...) constructor.

MilesCranmer · 2022-07-25T04:20:48Z

@kazewong this makes it easier to load from past runs

tttc3 · 2022-07-27T11:11:26Z

Making it simpler to load past runs is a nice addition!

I might have missed something, but I think the automatically created pickle file is of the unfitted model as it is called before pysr._run(...). Is this the desired behaviour?

For me, the from_file constructor would be more intuitive. You can already use pickle to save and load a PySR instance without any additional code, ignoring the issue of the non-serializable attributes. The main feature that's being added is the ability to create a new instance of a PySR model from a file. This file could still be the .pkl, in which case this constructor is mostly a convenience function for handling the reinitialization of the non-serializable attributes.
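
A rough sketch of what such a constructor could look like, using only the standard library. ToyRegressor, its save method, and the extra_mappings attribute are hypothetical stand-ins rather than PySR's classes, and pickling a plain dict of parameters instead of the instance itself is a simplification chosen for this example.

```python
import pickle
import tempfile
from pathlib import Path


class ToyRegressor:
    """Hypothetical stand-in for a model with one unpicklable attribute."""

    def __init__(self, binary_operators, extra_mappings=None):
        self.binary_operators = binary_operators
        # Often a dict of lambdas, which the stdlib pickle cannot serialize.
        self.extra_mappings = extra_mappings

    def save(self, path):
        # Pickle only plain data; the mappings are deliberately left out.
        params = {"binary_operators": self.binary_operators}
        with open(path, "wb") as f:
            pickle.dump(params, f)

    @classmethod
    def from_file(cls, path, extra_mappings=None):
        """Rebuild a model from disk and re-attach what was not pickled."""
        with open(path, "rb") as f:
            params = pickle.load(f)
        return cls(**params, extra_mappings=extra_mappings)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    mappings = {"square": lambda x: x * x}

    ToyRegressor(["+", "*"], extra_mappings=mappings).save(path)
    # The caller passes the mappings again, since they never reached the file.
    restored = ToyRegressor.from_file(path, extra_mappings=mappings)
```

A classmethod keeps the pattern discoverable next to the regular constructor, and the caller sees in one signature which attributes have to be supplied again.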

MilesCranmer · 2022-07-27T15:29:18Z

> I might have missed something, but I think the automatically created pickle file is of the unfitted model as it is called before pysr._run(...). Is this the desired behaviour?

It was the desired behavior, but your comment is making me reconsider. I guess I can see two scenarios where people would want to load their model:

1. The PySR run is ongoing, or it crashed, and you want to load and visualize the output equations (which are saved automatically during the run).
2. The PySR run completed successfully, and now you want to analyze the equations in a new Python process.

For 1., it is necessary to checkpoint the parameters before the actual equation search finishes. With the loaded model, you can do model.refresh(), and it will load the checkpointed equations.

For 2., I think you could have the equations stored in the pickle file itself, and perhaps not need the additional equation file.

Maybe a nice solution for both of these is to dump the pickle file twice: once before the fit runs, to checkpoint the model parameters in case the search quits early, and once afterwards with the fitted equations (to the same file). What do you think?
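
The double-dump idea can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: save_checkpoint, fit_with_checkpoints, and the search callable are hypothetical names, and the state is reduced to a plain dict.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(path, state):
    """Overwrite the checkpoint file with the current state."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def fit_with_checkpoints(path, params, search):
    """Checkpoint before the search, then again once it has succeeded.

    `search` is any callable returning the equations found; it stands in
    for the real equation search.
    """
    state = {"params": params, "equations": None}
    save_checkpoint(path, state)   # first dump: survives an early exit
    state["equations"] = search()  # may raise or be interrupted
    save_checkpoint(path, state)   # second dump: includes the results
    return state


def crashing_search():
    raise RuntimeError("search interrupted")


def successful_search():
    return ["x0 + x1", "cos(x0) * x1"]


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "equations.pkl"
    params = {"binary_operators": ["+", "*"], "unary_operators": ["cos"]}

    # Scenario 1: the run dies, but the parameters are already on disk.
    try:
        fit_with_checkpoints(ckpt, params, crashing_search)
    except RuntimeError:
        pass
    with open(ckpt, "rb") as f:
        after_crash = pickle.load(f)

    # Scenario 2: the run completes, and the same file now holds the equations.
    fit_with_checkpoints(ckpt, params, successful_search)
    with open(ckpt, "rb") as f:
        after_success = pickle.load(f)
```

Writing both dumps to the same path means a loader never has to decide which file is current; it only has to check whether equations are present.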

tttc3 · 2022-08-01T14:03:24Z

I think dumping the pickle file twice is a good, simple solution.

Pablo-Lemos · 2022-08-09T12:22:09Z

Loading is really helpful, works perfectly as far as I can tell!

MilesCranmer · 2022-08-10T08:04:01Z

Thanks for the feedback! The merged version saves a pickle file twice - so now you can send someone that pickle file and they can evaluate your equations.

MilesCranmer added 15 commits July 20, 2022 14:28

load function to init model from saved equations

ccf71e9

Call refresh in load function

e5b4869

Correctly set path names

179fef6

Allow pickling without equations_ stored

85371bb

Remove extra_sympy_mappings from pickle file

dde0ef7

Automatically pickle file at initialization

b16d9ef

Allow loading from pickle file

5c0ad55

Add pickle files to gitignore

dc1d663

Add missing pickle import

4ae8a5c

Add test for loading from pickle file

78cdb0e

Fix filename concat in test

214744b

Test both with and without bkup file

58e25a9

Don't check for equation_file_ until after checkpoint_file set

1f01976

Allow both bkup and csv file

f1ac704

Additional logging messages during load

c6902b7

MilesCranmer added 8 commits August 1, 2022 14:24

Checkpoint model before and after fit

6501ca0

Add additional test for loading from pickle file

b53e7fa

Use .pkl instead of .csv.pkl

b8a97f1

Fix bug with inplace editing of equation_file_contents_

a6bed2c

Reduce precision of tests

f5577ea

Change model load to classmethod

34f4e3f

Add assertion for csv filename

07217e1

Fix assertion on csv filenames

f5a5c8e

Add README example for from_file

9433a83

MilesCranmer merged commit 1099283 into master Aug 10, 2022

MilesCranmer deleted the loading branch August 10, 2022 07:34

MilesCranmer added this to the v0.10.0 milestone Aug 10, 2022
