
Modifications to PCovR #98

Merged

Conversation

bhelfrecht
Contributor

This PR makes a few tweaks to PCovR to address issues #83 and #81 and to clean up the fitting.

  • The PCovR instantiation has been changed to do away with the alpha parameter, since it never appeared to be passed to the regressor. Instead, one simply instantiates PCovR with a scikit-learn LinearRegression or Ridge object.
  • The Yhat and W parameters have been removed from fit, since the same effect can be achieved by instantiating PCovR with a fitted regressor. This also guarantees that Yhat is consistent with W, which was not the case before.
  • The handling of the singular values in _fit_sample_space has been made consistent with that in _fit_feature_space: the inverse singular values are set to zero wherever the corresponding singular values are less than tol.
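
The tol handling in the last bullet amounts to a truncated inverse of the singular spectrum. A minimal NumPy sketch of the idea (illustrative only; tol and the variable names are placeholders, not the actual skcosmo code):

```python
import numpy as np

# Toy matrix whose two columns are nearly identical, so one singular
# value is effectively zero.
X = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])

U, S, Vt = np.linalg.svd(X, full_matrices=False)

tol = 1e-8
# Invert only the singular values above tol; set the rest to zero
# instead of producing huge, noise-dominated values.
S_inv = np.zeros_like(S)
S_inv[S > tol] = 1.0 / S[S > tol]

# S_inv then acts like a truncated pseudo-inverse of the spectrum.
X_pinv = Vt.T @ np.diag(S_inv) @ U.T
```

This is the same cutoff behavior that numpy.linalg.pinv implements through its rcond argument.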

KPCovR should also be modified to use this "bring your own regressor" initialization, and all fit calls for PCovR-related objects should probably use a lowercase y to be consistent with sklearn. These future improvements could be rolled into this PR or given their own PRs.
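The consistency argument behind removing Yhat and W can be illustrated with scikit-learn alone (a minimal sketch; PCovR itself is not imported, and the variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(20, 5)
Y = rng.rand(20, 2)

# "Bring your own regressor": both W and Yhat are derived from one fitted
# regressor, so they cannot drift apart the way independently supplied
# Yhat and W arguments could.
regressor = Ridge(alpha=1e-3, fit_intercept=False).fit(X, Y)
W = regressor.coef_.T        # weights, shape (n_features, n_targets)
Yhat = regressor.predict(X)  # predictions implied by those same weights
```

For a linear model without an intercept, Yhat == X @ W holds by construction, which is exactly the consistency the old two-argument interface could not enforce.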

@bhelfrecht bhelfrecht marked this pull request as ready for review May 3, 2021 10:17
@bhelfrecht bhelfrecht requested a review from rosecers May 3, 2021 10:18
Collaborator

@rosecers rosecers left a comment


Generally looks good, but I have a design question: why use the name regressor as opposed to the sklearn-style estimator?

[Review threads on skcosmo/utils/_pcovr_utils.py (resolved)]
@bhelfrecht
Contributor Author

> Generally looks good, but I have a design question: why use the name regressor as opposed to the sklearn-style estimator?

estimator is rather general, and can also refer to classifiers -- in PCovR, we specifically require a regressor, hence the name choice. The use of regressor is also consistent with sklearn's TransformedTargetRegressor.

@rosecers
Collaborator

rosecers commented May 3, 2021

> Generally looks good, but I have a design question: why use the name regressor as opposed to the sklearn-style estimator?

> estimator is rather general, and can also refer to classifiers -- in PCovR, we specifically require a regressor, hence the name choice. The use of regressor is also consistent with sklearn's TransformedTargetRegressor.

Okay! I just wanted to make sure there was a concrete reason for doing so

@rosecers rosecers self-requested a review May 3, 2021 15:16
Collaborator

@rosecers rosecers left a comment


Letting it get one more pass from @Luthaf before we merge, but looks ready to me!

Collaborator

@Luthaf Luthaf left a comment


The code looks good overall! I have one small request =)

Comment on lines +135 to +136

```python
UC = UC.T[:, (vC ** 2) > rcond]
vC = vC[(vC ** 2) > rcond]
```
Collaborator


Could you add a test hitting this code path? Even something simple with toy X/Y matrices.
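
Such a test might look roughly like the following (a hypothetical sketch using plain NumPy to mirror the filtering above; a real test would go through the skcosmo API):

```python
import numpy as np

rcond = 1e-12

# Two nearly collinear features, so the covariance has one near-zero
# eigenvalue whose square falls below rcond.
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0 + 1e-10]])
C = X.T @ X

vC, UC = np.linalg.eigh(C)
# Mirror the snippet under review: keep only well-conditioned components.
UC = UC[:, (vC ** 2) > rcond]
vC = vC[(vC ** 2) > rcond]
```

After filtering, only the single well-conditioned component should remain.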

Collaborator


@bhelfrecht, codecov is still not seeing these lines in the coverage report, which is really weird since you explicitly seeded the RNG ... Maybe different versions of numpy have different RNGs? CI is using numpy == 1.20.2.

Contributor Author


Just fixed it -- there was a typo in the test that made it bypass the targeted code.

@Luthaf
Collaborator

Luthaf commented May 4, 2021

Could you try to rebase & cleanup the git history? I can help to do this if you want!

@bhelfrecht
Contributor Author

I think I've got it; it depends how messy it gets. How "clean" should it be? For starters, I assume all the silly "Formatting" commits should be fixup'ed into the previous commit?

@rosecers
Collaborator

rosecers commented May 4, 2021

Why not just fixup the formatting commits and then do squash and merge?

@bhelfrecht
Contributor Author

Perfect, will do.

@Luthaf
Collaborator

Luthaf commented May 4, 2021

Fixup'ing the formatting commits and removing the merge commits is enough for me; then we can keep the rest of the history around and use a normal merge or rebase-and-merge.

If we go for a squash merge on the GitHub side, there is no need to spend time on fixups, since everything will become a single commit anyway.

@rosecers
Collaborator

rosecers commented May 4, 2021

> Fixup'ing the formatting commits and removing the merge commits is enough for me; then we can keep the rest of the history around and use a normal merge or rebase-and-merge.

> If we go for a squash merge on the GitHub side, there is no need to spend time on fixups, since everything will become a single commit anyway.

fixup removes the unnecessary commit messages from the history. I'm still for fixup + squash-and-merge.

@Luthaf
Collaborator

Luthaf commented May 4, 2021

When we squash and merge on GitHub we can edit the commit message as we want, removing unnecessary commit messages if needed.
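
For reference, the fixup workflow discussed above can be sketched as a hypothetical shell session (the repository, file, commit messages, and demo identity are all illustrative; assumes git is available):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m "Initial commit"

# A real change...
echo "fit things" > pcovr.py
git add pcovr.py
git commit -q -m "Clean up PCovR fitting"

# ...followed by a formatting-only change, recorded as a fixup of it.
echo "fit things, formatted" > pcovr.py
git add pcovr.py
git commit -q --fixup HEAD

# Fold the fixup into its target non-interactively; GIT_SEQUENCE_EDITOR=:
# accepts the autosquash-reordered todo list as-is.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash HEAD~2
```

After the rebase, the fixup commit and its message are gone, leaving only "Initial commit" and "Clean up PCovR fitting"; a squash-and-merge on GitHub would instead collapse everything into one commit with an editable message.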
