
Fixes for CHOP #14 (Draft)

@GeoffNN wants to merge 7 commits into main.
Conversation

@GeoffNN (Contributor) commented Mar 26, 2021:

  • Added data normalization with StandardScaler
  • Modified the loss function to exactly match the loss function in objective.py (see the sketch after this comment)

This doesn't fix the issue mentioned here.

This still needs work for the stochastic case.
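
For context on the loss-function point (an editorial sketch, not the PR's actual diff): in benchopt, a solver's convergence curve is only meaningful if the solver minimizes exactly the objective defined in objective.py. Assuming a plain logistic-regression objective with labels in {-1, +1} (the benchmark may use a different label encoding or an extra regularization term), matching it in PyTorch could look like:

    import numpy as np
    import torch

    # objective.py style (NumPy): mean logistic loss, labels in {-1, +1}
    def objective_value(X, y, beta):
        return np.log1p(np.exp(-y * (X @ beta))).mean()

    # solver-side loss (PyTorch), written term for term to match the above
    def solver_loss(X, y, beta):
        return torch.log1p(torch.exp(-y * (X @ beta))).mean()

Any discrepancy between the two (a missing 1/n factor, a different label encoding, a dropped regularizer) shifts the reported objective values for this solver relative to the others.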

@GeoffNN marked this pull request as draft on March 26, 2021 at 10:09.
solvers/chop.py (outdated):

    self.X = torch.tensor(X).to(device)
    self.y = torch.tensor(y > 0, dtype=torch.float64).to(device)
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
Contributor:

This should not be done in the solver but in the dataset, as it changes the problem.

Contributor (author):

Can it be an option in the dataset?

Contributor:

The libsvmdata package does some preprocessing like this. I would just offer the right dataset (normalized or not).

Contributor (author), @GeoffNN, Mar 28, 2021:

Right, it seems this was the cause of the infinity values (since the scaler wasn't applied at eval time). Modifying.
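
To illustrate the failure mode (my reading of the bug, as a minimal sketch rather than code from the benchmark): if the solver trains on standardized features but the objective evaluates on the raw ones, the learned weights produce huge margins on the raw data, and a numerically naive logistic loss overflows to infinity:

    import numpy as np

    def naive_logloss(z):
        # numerically naive logistic loss: log(1 + exp(-z))
        return np.log(1 + np.exp(-z))

    print(naive_logloss(-1500.0))  # exp(1500) overflows -> inf (RuntimeWarning)
    print(naive_logloss(-1.5))     # moderate margin, finite loss: ~1.70

Standardizing in the dataset instead keeps the solver and objective.py looking at the same X, which removes the mismatch.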

Contributor:

@GeoffNN don't change the solver but the dataset. Otherwise the solvers will not be comparable.

Contributor (author), @GeoffNN, Mar 28, 2021:

Done; I added standardization as a parameter in Covtype, so it runs on both versions of the dataset. We can also just keep the standardized version if you prefer.

@agramfort (Contributor) left a comment:

Cool! Do you have any new figure you can share?

Thanks a lot @GeoffNN.

Comment on lines +16 to +17:

    requirements = ['pip:https://github.com/openopt/chop/archive/master.zip',
                    'pip:scikit-learn']

Contributor:

Suggested change (drop the scikit-learn requirement):

    requirements = ['pip:https://github.com/openopt/chop/archive/master.zip']


    if self.standardized:
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
Contributor:

I would simplify and always standardize.

Contributor (author):

Ok -- only for Covtype? It would make sense to do it for all non-sparse datasets, right?
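
For reference, a minimal sketch of what the parametrized dataset could look like (benchopt's parameters mechanism is real, but the dataset id, the fetch_libsvm call, and the get_data return format are assumptions, and the return format has changed across benchopt versions):

    from benchopt import BaseDataset
    from libsvmdata import fetch_libsvm
    from sklearn.preprocessing import StandardScaler

    class Dataset(BaseDataset):
        name = "covtype"

        # benchopt instantiates one dataset per listed value, so the
        # benchmark runs on both the raw and the standardized variant
        parameters = {'standardized': [True, False]}

        def get_data(self):
            X, y = fetch_libsvm('covtype.binary')  # assumed dataset id
            if self.standardized:
                # Fit once on the full design matrix so every solver and
                # objective.py see the same data. This assumes X is dense;
                # sparse data would need StandardScaler(with_mean=False).
                scaler = StandardScaler()
                X = scaler.fit_transform(X)
            return dict(X=X, y=y)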

@GeoffNN (Contributor, author) commented Mar 28, 2021:

I'm still working on the stochastic variant -- I'll share the figures for both ASAP

@agramfort (Contributor) commented:

I don't know about Madelon, but the others should already be scaled.
