Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of PySRRegressor #146

Merged
merged 91 commits into from
Jun 4, 2022
Merged

Refactor of PySRRegressor #146

merged 91 commits into from
Jun 4, 2022

Conversation

tttc3
Copy link
Contributor

@tttc3 tttc3 commented May 27, 2022

Re Issue #143

Compatibility with scikit-learn should be improved.

Noteable breaking changes for users: PySRRegressor.equations is now called PySRRegressor.equations_

Tests have been updated to allow compatibility with the refactored code but still assess the same functionality. All tests should pass.

Please let me know if there are any concerns or if you would like me to document/explain any of the changes in detail.

@MilesCranmer
Copy link
Owner

Awesome contribution, thank you so much @tttc3! This is great. I am reviewing now.

@MilesCranmer MilesCranmer self-requested a review May 27, 2022 20:59
pysr/sr.py Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
test/test.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Show resolved Hide resolved
@MilesCranmer
Copy link
Owner

MilesCranmer commented May 28, 2022

General comment: is there a way I can continue a search? In the current PySR, if you run model.fit(X, y), and then model.fit(X, y), a second time, it will continue where it left off. Is there a way to do that with this refactoring?

Basically, it stores the Julia output into self.raw_julia_state_. However, here it looks like it is set to None every call to fit. Maybe by default, it should only set it to None if it is undefined (to set it to None, you would call model.reset()), or, alternatively, there could be a parameter passed to fit to "continue" the search? I'm not sure if there is a standard scikit-learn style for such things.

Edit: I thought about this some more. Maybe the best strategy is to have a parameter passed: model.fit(X, y, continue=True), but otherwise, reset the search state. At the end of any model.fit, if continue=True is not passed, a message could be printed like "To continue the search from where you left off, please run model.fit(X, y, continue=True)", or something of that nature. What do you think?

@MilesCranmer
Copy link
Owner

I see this warning when I run model.predict(X) when X is a numpy array:

/Users/mcranmer/venvs/main/lib/python3.8/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but PySRRegressor was fitted with feature names

@MilesCranmer
Copy link
Owner

MilesCranmer commented May 28, 2022

The following code produces an error with these changes when it doesn't on master. Something to do with feature selection?

import numpy as np
import pandas as pd
from pysr import PySRRegressor

X = pd.DataFrame({f"k{i}": np.random.randn(1000) for i in range(30)})
y = X["k15"] ** 2 + np.cos(X["k20"])

model = PySRRegressor(unary_operators=["cos"], select_k_features=3)
model.fit(X, y)
ypred = model.predict(X)  # Errors

Edit: added this as a unit-test

@tttc3
Copy link
Contributor Author

tttc3 commented May 28, 2022

The latest commit should resolve the following:
#146 (comment) - Revert default values and perform validation
#146 (comment) - Handle warning message properly
#146 (comment) - Pass new unit-test

With respect to continuing a fit, the scikit-learn standard way is to inform the fit method via the warm_start parameter that is passed in init. If warm_start is True and fit is called a second time, fitting continues from where it left off. I'll have a look at implementing this now.

@MilesCranmer
Copy link
Owner

MilesCranmer commented May 28, 2022

Thanks!

I'm still seeing some issues with the selection for some reason. For example, if I take the above example, and pass a numpy array instead of a pandas dataframe:

import numpy as np
import pandas as pd
from pysr import PySRRegressor

X = pd.DataFrame({f"k{i}": np.random.randn(1000) for i in range(30)})
y = X["k15"] ** 2 + np.cos(X["k20"])

model = PySRRegressor(unary_operators=["cos"], select_k_features=3)
model.fit(X.values, y.values)
ypred = model.predict(X.values)  # Errors

Added as a unittest.

@MilesCranmer
Copy link
Owner

MilesCranmer commented Jun 3, 2022

Okay I got it working with the following "band-aid fix" to test_torch.py. This issue is deeper but as it is not really part of this PR we can fix it later.

import platform

if platform.system() == "Darwin":  # (macOS)
    # Julia, then Torch
    from pysr.julia_helpers import init_julia

    Main = init_julia()
    import torch
else:
    # Torch, then Julia
    import torch

@MilesCranmer
Copy link
Owner

Okay looks like everything is working now. Let me know if you'd like to look over anything, otherwise I can merge.

Also added early_stop_condition to all the tests, which makes them finish earlier if an equation is found early in searches. This speeds up all the testing by quite a bit...

Copy link
Owner

@MilesCranmer MilesCranmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! Thanks so much for this incredible contribution. I have reviewed all changes and it's looking great.

@tttc3
Copy link
Contributor Author

tttc3 commented Jun 4, 2022

Thanks, glad to help! Just addressed some DeepSource issues but other than that, hopefully everything is good for merge. I'm investigating the pytorch issue and will follow up anything useful in your issue.

@MilesCranmer MilesCranmer merged commit c3dc203 into MilesCranmer:master Jun 4, 2022
@MilesCranmer MilesCranmer added this to the v0.9.0 milestone Jun 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants