Is it possible to disable CV? #919

Closed
jhmenke opened this issue Sep 23, 2019 · 3 comments

@jhmenke
Contributor

jhmenke commented Sep 23, 2019

Context of the issue

My dataset is very small: at most 380 samples and about 800 features. Naturally, this means the results are highly noisy and depend on the CV splits and on the random_state of the model. Through another issue I learned that fitted_pipeline_ is different from the best pipeline during the search process, so it cannot be used either.

My idea therefore was to write a custom scorer which does not use the training data to score but directly uses the fixed and separate validation set to generate the score. It also keeps track of the best pipeline and saves it independently in the trained state.
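A minimal sketch of such a scorer, assuming the callable signature `scorer(estimator, X, y)` that scikit-learn uses for custom scoring. The class name `FixedValidationScorer` is hypothetical, and plain accuracy is an assumed metric since the issue does not name one:

```python
import copy

import numpy as np


class FixedValidationScorer:
    """Score every estimator on one fixed validation set (hypothetical helper)."""

    def __init__(self, X_val, y_val):
        self.X_val = X_val
        self.y_val = np.asarray(y_val)
        self.best_score = -np.inf
        self.best_estimator = None

    def __call__(self, estimator, X, y):
        # X and y (the fold handed over by the search) are deliberately ignored;
        # the already-fitted estimator is scored on the held-out set instead.
        y_pred = estimator.predict(self.X_val)
        score = float(np.mean(y_pred == self.y_val))  # accuracy (assumed metric)
        if score > self.best_score:
            # Keep an independent copy of the best estimator in its trained state.
            self.best_score = score
            self.best_estimator = copy.deepcopy(estimator)
        return score
```

An instance would be passed wherever a scoring callable is accepted. Because the best estimator is tracked as state on the object, it is only reliable when the scorer is not copied between worker processes.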

There are some issues, e.g., the scorer object being copied (I guess) when using n_jobs>1, but since my case is an edge case that's okay; I can use n_jobs=1.
However, I would really like to deactivate cross-validation in my case, since I want the estimator to be trained on the full training data and then scored on the full validation data. Using CV with training/validation set indices would mean my validation set is no longer independent of the training, which is what I want to avoid.

Is there a workaround to make this happen?

@weixuanfu
Contributor

I think you could use custom validation set(s) via the cv parameter (similar to issue #767).

Please check the cv parameter in the TPOT API. You can merge the validation set with the training set, pass the merged data to tpot_obj.fit(X, y), and specify the train/test splits via an iterable (see the example for GridSearchCV).
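A minimal sketch of that suggestion, not code from the thread: the helper name `merge_with_fixed_split` and the toy data are made up for illustration. It stacks the training and validation sets and builds a one-element list of `(train_indices, test_indices)` pairs, so every candidate is fitted on the training rows only and scored on the validation rows only:

```python
import numpy as np


def merge_with_fixed_split(X_train, y_train, X_val, y_val):
    """Stack train and validation data and return a single fixed split."""
    X_all = np.vstack([X_train, X_val])
    y_all = np.concatenate([y_train, y_val])
    n_train = len(X_train)
    train_idx = np.arange(n_train)             # rows that came from the training set
    val_idx = np.arange(n_train, len(X_all))   # rows that came from the validation set
    # A list (not a generator) so the split can be iterated more than once.
    return X_all, y_all, [(train_idx, val_idx)]


# Toy data standing in for the real training and validation sets.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(8, 3)), rng.integers(0, 2, size=8)
X_val, y_val = rng.normal(size=(4, 3)), rng.integers(0, 2, size=4)

X_all, y_all, cv_splits = merge_with_fixed_split(X_train, y_train, X_val, y_val)
```

The list can then be passed as `cv=cv_splits` when constructing the TPOT estimator, with `fit(X_all, y_all)` called on the merged arrays. scikit-learn's `PredefinedSplit` is another way to express the same single split.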

@jhmenke
Contributor Author

jhmenke commented Sep 23, 2019

Thank you @weixuanfu, it works well!

@phihongvanvcb

@weixuanfu could you please give a detailed code example of using an iterable?
