Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve train_test_split #12

Closed
cancan101 opened this issue Dec 29, 2014 · 3 comments
Closed

Improve train_test_split #12

cancan101 opened this issue Dec 29, 2014 · 3 comments

Comments

@cancan101
Copy link
Contributor

For NeuralNet:

  1. Why use your own implementation of train_test_split rather than using sklearn.cross_validation::train_test_split?
    -- This would fix train_test_split does not work for Pandas Series with non Dense Index #7 because it uses safe_indexing.
  2. Allow the user to pass in, through composition (like the batch iterator), a class responsible for performing the train vs valid split. Right now the only option is to subclass NeuralNet which is not ideal.
@cancan101
Copy link
Contributor Author

I am currently using this:

class NeuralNetFix(NeuralNet):
    def train_test_split(self, X, y, eval_size):
        assert eval_size is None
        X_train = X
        y_train = y
        X_valid = self.X_valid
        y_valid = self.y_valid
        if not self.regression and self.use_label_encoder:
            y_valid = self.enc_.transform(y_valid).astype(np.int32)
        return X_train, X_valid, y_train, y_valid

I then handle the train_test_split myself.

UPDATE: I no longer read X_valid and y_valid from self.

@dnouri
Copy link
Owner

dnouri commented Jan 1, 2015

@cancan101 To answer your first question: You'll notice that NeuralNet.train_test_split uses StratifiedKFold for classification tasks. That's different to what sklearn's train_test_split would give you.

But maybe the stratified split isn't all that important, and we can just use an overridable train_test_split (component) by default.

@dnouri
Copy link
Owner

dnouri commented Feb 19, 2015

I think we can merge this discussion with #42, and close this one. Also consider #45 when thinking about a better way to override the train_test_split method.

@dnouri dnouri closed this as completed Feb 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants