Refactor of PySRRegressor #146

tttc3 · 2022-05-27T13:54:35Z

Re Issue #143

Compatibility with scikit-learn should be improved.

Notable breaking change for users: PySRRegressor.equations is now called PySRRegressor.equations_.
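For readers unfamiliar with the convention: scikit-learn estimators give attributes learned during fit a trailing underscore, which is the reason for the rename. A later commit in this PR ("Preserve PySRRegressor.equations but with deprecation") keeps the old name available. The sketch below shows one common way such an alias can be written. `ToyRegressor` and its placeholder result are hypothetical and are not PySR's actual code.

```python
import warnings


class ToyRegressor:
    """Hypothetical stand-in for an estimator; not PySR's implementation."""

    def fit(self, X, y):
        # scikit-learn convention: attributes learned in fit end with "_".
        self.equations_ = ["x0 + x1"]  # placeholder result for illustration
        return self

    @property
    def equations(self):
        # Old attribute name kept as a read-only alias that warns on access.
        warnings.warn(
            "`equations` is deprecated; use `equations_` instead.",
            FutureWarning,
        )
        return self.equations_


model = ToyRegressor().fit([[1.0, 2.0]], [3.0])

# Accessing the old name still works, but emits a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = model.equations
```

User code that reads `model.equations` keeps working during the deprecation window, while new code can switch to `model.equations_`.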

Tests have been updated to allow compatibility with the refactored code but still assess the same functionality. All tests should pass.

Please let me know if there are any concerns or if you would like me to document/explain any of the changes in detail.

Code now allows for better interoperability with scikit-learn.

MilesCranmer · 2022-05-27T20:59:02Z

Awesome contribution, thank you so much @tttc3! This is great. I am reviewing now.

MilesCranmer · 2022-05-28T02:32:49Z

General comment: is there a way I can continue a search? In the current PySR, if you run model.fit(X, y) and then run model.fit(X, y) a second time, it will continue where it left off. Is there a way to do that with this refactoring?

Basically, it stores the Julia output in self.raw_julia_state_. However, here it looks like it is set to None on every call to fit. Maybe by default, it should only set it to None if it is undefined (to set it to None, you would call model.reset()), or, alternatively, there could be a parameter passed to fit to "continue" the search? I'm not sure if there is a standard scikit-learn style for such things.

Edit: I thought about this some more. Maybe the best strategy is to have a parameter passed: model.fit(X, y, continue=True), but otherwise, reset the search state. At the end of any model.fit, if continue=True is not passed, a message could be printed like "To continue the search from where you left off, please run model.fit(X, y, continue=True)", or something of that nature. What do you think?

MilesCranmer · 2022-05-28T02:57:19Z

I see this warning when I run model.predict(X) when X is a numpy array:

/Users/mcranmer/venvs/main/lib/python3.8/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but PySRRegressor was fitted with feature names
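For context, scikit-learn emits this warning when an estimator was fitted on input that carries column names (for example a DataFrame) and is later given input without names. The sketch below imitates that check in plain Python to show the mechanism. `ToyEstimator`, `feature_names_of`, and `Frame` are hypothetical names used only for illustration, and this is not scikit-learn's own code.

```python
import warnings


def feature_names_of(X):
    """Return column names if X looks DataFrame-like (has .columns), else None."""
    columns = getattr(X, "columns", None)
    return None if columns is None else [str(c) for c in columns]


class ToyEstimator:
    """Hypothetical estimator that records feature names seen during fit."""

    def fit(self, X, y=None):
        names = feature_names_of(X)
        if names is not None:
            self.feature_names_in_ = names
        elif hasattr(self, "feature_names_in_"):
            # Refit on unnamed data: forget names from an earlier fit.
            del self.feature_names_in_
        return self

    def predict(self, X):
        fitted_names = getattr(self, "feature_names_in_", None)
        if fitted_names is not None and feature_names_of(X) is None:
            warnings.warn(
                "X does not have valid feature names, but ToyEstimator "
                "was fitted with feature names",
                UserWarning,
            )
        return X


class Frame:
    """Minimal DataFrame-like stub exposing only column names."""

    columns = ["k0", "k1"]


est = ToyEstimator().fit(Frame())
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est.predict([[0.0, 1.0]])  # plain list: no feature names, so this warns
```

Under this reading, the warning goes away if the same kind of input is used for both fit and predict, or if names are only recorded when the fit input actually had them.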

MilesCranmer · 2022-05-28T03:34:36Z

The following code produces an error with these changes when it doesn't on master. Something to do with feature selection?

import numpy as np
import pandas as pd
from pysr import PySRRegressor

X = pd.DataFrame({f"k{i}": np.random.randn(1000) for i in range(30)})
y = X["k15"] ** 2 + np.cos(X["k20"])

model = PySRRegressor(unary_operators=["cos"], select_k_features=3)
model.fit(X, y)
ypred = model.predict(X)  # Errors

Edit: added this as a unit-test

tttc3 · 2022-05-28T15:26:32Z

The latest commit should resolve the following:
#146 (comment) - Revert default values and perform validation
#146 (comment) - Handle warning message properly
#146 (comment) - Pass new unit-test

With respect to continuing a fit, the scikit-learn standard way is to inform the fit method via the warm_start parameter that is passed in init. If warm_start is True and fit is called a second time, fitting continues from where it left off. I'll have a look at implementing this now.
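A minimal sketch of the warm_start convention described above, using a toy estimator rather than PySR itself. `ToySearch` and its `state_` counter are hypothetical stand-ins for the real search state (raw_julia_state_ in this PR). Note also that `continue` is a reserved word in Python, so it could not be used as a keyword argument name, which is another point in favour of `warm_start`.

```python
class ToySearch:
    """Hypothetical estimator illustrating scikit-learn's warm_start pattern."""

    def __init__(self, niterations=5, warm_start=False):
        # scikit-learn convention: __init__ only stores hyperparameters.
        self.niterations = niterations
        self.warm_start = warm_start

    def fit(self, X, y):
        # Start from scratch unless warm_start is set AND a prior state exists.
        if not self.warm_start or not hasattr(self, "state_"):
            self.state_ = 0
        # Stand-in for running the search: count the iterations performed.
        self.state_ += self.niterations
        return self


cold = ToySearch(warm_start=False)
cold.fit(None, None).fit(None, None)  # second fit restarts the search

warm = ToySearch(warm_start=True)
warm.fit(None, None).fit(None, None)  # second fit continues the search
```

Here `cold.state_` ends at 5 because the second fit discards the first, while `warm.state_` ends at 10 because the second fit builds on the first.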

MilesCranmer · 2022-05-28T18:42:57Z

Thanks!

I'm still seeing some issues with the selection for some reason. For example, if I take the above example, and pass a numpy array instead of a pandas dataframe:

import numpy as np
import pandas as pd
from pysr import PySRRegressor

X = pd.DataFrame({f"k{i}": np.random.randn(1000) for i in range(30)})
y = X["k15"] ** 2 + np.cos(X["k20"])

model = PySRRegressor(unary_operators=["cos"], select_k_features=3)
model.fit(X.values, y.values)
ypred = model.predict(X.values)  # Errors

Added as a unittest.
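The thread does not state the root cause of these errors. As a general illustration only, the sketch below shows the pattern that feature selection needs in order to work at predict time: the column mask chosen during fit is stored and reapplied to the full-width input in predict. `ToySelector`, `selection_mask_`, and the covariance-based scoring are hypothetical and deliberately simplistic, and are not PySR's selection method.

```python
def _abs_cov(col, y):
    """Absolute covariance between one feature column and the target."""
    n = len(y)
    mean_c, mean_y = sum(col) / n, sum(y) / n
    return abs(sum((c - mean_c) * (t - mean_y) for c, t in zip(col, y))) / n


class ToySelector:
    """Hypothetical estimator: selects k columns in fit, reuses them in predict."""

    def __init__(self, select_k_features=None):
        self.select_k_features = select_k_features

    def fit(self, X, y):
        columns = list(zip(*X))
        self.n_features_in_ = len(columns)
        k = self.select_k_features or len(columns)
        scores = [_abs_cov(col, y) for col in columns]
        ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
        # Store the chosen column indices so predict can apply the same mask.
        self.selection_mask_ = sorted(ranked[:k])
        return self

    def predict(self, X):
        if len(X[0]) != self.n_features_in_:
            raise ValueError("X has a different number of features than in fit")
        # Apply the stored mask to the full-width input before "predicting".
        selected = [[row[j] for j in self.selection_mask_] for row in X]
        return [sum(row) for row in selected]  # toy model: sum of kept columns


X = [[0.0, float(i), 1.0] for i in range(10)]  # only column 1 varies
y = [2.0 * i for i in range(10)]
model = ToySelector(select_k_features=1).fit(X, y)
ypred = model.predict(X)  # accepts the original three-column input
```

A unit test along the lines of the ones added in this thread would then check that predict succeeds on the same array or DataFrame that was passed to fit.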

MilesCranmer · 2022-06-03T20:22:12Z

Okay I got it working with the following "band-aid fix" to test_torch.py. This issue is deeper but as it is not really part of this PR we can fix it later.

import platform

if platform.system() == "Darwin":  # (macOS)
    # Julia, then Torch
    from pysr.julia_helpers import init_julia

    Main = init_julia()
    import torch
else:
    # Torch, then Julia
    import torch

MilesCranmer · 2022-06-03T22:37:36Z

Okay looks like everything is working now. Let me know if you'd like to look over anything, otherwise I can merge.

Also added early_stop_condition to all the tests, which makes them finish earlier if an equation is found early in searches. This speeds up all the testing by quite a bit...
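To illustrate why an early stop condition shortens the tests, here is a toy loop that accepts either a numeric loss threshold or a function, in the spirit of the "Allow functional versions of early stop condition" commit. `toy_search`, its halving loss, and the fixed complexity are hypothetical; the real parameter's accepted forms should be checked in the PySR documentation.

```python
def toy_search(niterations, early_stop_condition=None):
    """Toy search loop whose loss halves each iteration; returns (iters, loss)."""
    # Normalise the condition into a function of (loss, complexity).
    if early_stop_condition is None:
        def should_stop(loss, complexity):
            return False
    elif callable(early_stop_condition):
        should_stop = early_stop_condition
    else:
        threshold = float(early_stop_condition)

        def should_stop(loss, complexity):
            return loss < threshold

    loss, complexity = 1.0, 5  # placeholder values for illustration
    for iteration in range(1, niterations + 1):
        loss /= 2  # stand-in for the search improving its best equation
        if should_stop(loss, complexity):
            return iteration, loss  # stop as soon as the condition holds
    return niterations, loss


full_run = toy_search(40)
threshold_run = toy_search(40, early_stop_condition=1e-3)
functional_run = toy_search(
    40, early_stop_condition=lambda loss, complexity: loss < 0.1 and complexity < 10
)
```

In this toy setting the run without a condition uses all 40 iterations, while the two conditioned runs return after a handful, which is the same effect the tests rely on.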

MilesCranmer

Looks fantastic! Thanks so much for this incredible contribution. I have reviewed all changes and it's looking great.

tttc3 · 2022-06-04T10:08:38Z

Thanks, glad to help! I just addressed some DeepSource issues, but other than that, hopefully everything is good to merge. I'm investigating the PyTorch issue and will follow up with anything useful in your issue.

tttc3 and others added 7 commits May 27, 2022 14:35

Moved julia helper functions out of sr.py

6baa534

Refactor PySRRegressor for scikit-learn

32a2de6

Code now allows for better interoperability with scikit-learn.

Fixed jax export compatibility with refactor

73c6ffd

Updated tests for compatibility with refactor

c7187a6

fixed issues from deepsource

9490776

additional fixes

4173a8b

Add docstring to julia helpers file

ce60798

MilesCranmer self-requested a review May 27, 2022 20:59

tttc3 and others added 8 commits May 27, 2022 22:34

Fixed typos and ensured tests pass

19ef535

Tweak docstring

0387e10

Tidy up

3712b0e

Tidy up code

a828c23

Preserve PySRRegressor.equations but with deprecation

be87321

Expand docstring for PySRRegressor

2fbf19c

Cleanup

780b3a0

Move numpy export code to separate file

5a01e6f

MilesCranmer reviewed May 28, 2022

Clean up example in docstring

cfa9a72

Cleanup

db9848f

MilesCranmer and others added 5 commits May 27, 2022 23:57

Merge branch 'master' into pr/tttc3/146

56cb021

Correct use of Xresampled

e56007b

Merge branch 'master' into refactor-PySRRegressor

35cce07

Merge branch 'master' into pr/tttc3/146

2d0032e

Fixed issues outlined in pull request review

83d8e67

MilesCranmer added 6 commits June 3, 2022 15:48

Switch order of torch depending on operating system

5ffac80

Fix formatting

9b2a102

Add missing default in docstring

893fdd2

Do not store custom loss in Julia main

fb2f513

Add early stop conditions to force speed testing

4c9fe98

Allow functional versions of early stop condition

44dcbea

MilesCranmer added 12 commits June 3, 2022 16:26

More comments for torch test

215a7a1

Add warning that the Julia state is being reset

9049df4

Convert stop condition to string if not set

518e1cc

Fix typo in comment

f07f6e6

Clean up mutation_weights setting

7a5a9a0

Remove unused import

9a4f73f

Update backend version with early_stop_condition fix

f9efd1b

Fix early_stop_condition setting if None

b2f8a6f

Clean up testing code

27fac96

Add error for deterministic setting

15105ad

Add warning about setting random state without deterministic

af8ab17

Add tests for determinism warnings

045bdb1

MilesCranmer added 5 commits June 3, 2022 18:46

Correct ordering of init parameters

494a3ba

Only give warning if raw_julia_state_ already set

21a0846

Improve documentation for validation function

623e6f0

Documentation cleanup

32d0b3a

Set to version 0.9.0 as breaking changes needed

3182a3b

MilesCranmer approved these changes Jun 3, 2022

Addressed some DeepSource issues

ad1c492

MilesCranmer merged commit c3dc203 into MilesCranmer:master Jun 4, 2022

MilesCranmer added this to the v0.9.0 milestone Jun 4, 2022
