[WIP] Use dask.delayed within fit #730
```diff
@@ -23,6 +23,7 @@
 
 """
 
+import dask
 import numpy as np
 from deap import tools, gp
 from inspect import isclass
```
```diff
@@ -395,7 +396,8 @@ def mutNodeReplacement(individual, pset):
 
 @threading_timeoutable(default="Timeout")
 def _wrapped_cross_val_score(sklearn_pipeline, features, target,
-                             cv, scoring_function, sample_weight=None, groups=None):
+                             cv, scoring_function, sample_weight=None,
+                             groups=None, delayed=lambda x: x):
     """Fit estimator and compute scores for a given dataset split.
     Parameters
     ----------
```
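The new `delayed=lambda x: x` parameter lets one function body run either eagerly or lazily. With the identity default, `delayed(f)(args)` is simply `f(args)`; passing `dask.delayed` instead turns each wrapped call into a node of a task graph. A minimal sketch of that pattern on a toy computation (the function `toy_score` and its steps are hypothetical illustrations, not TPOT code):

```python
def toy_score(values, delayed=lambda f: f):
    """Double each value, then sum: written once, run eagerly or lazily.

    With the identity default every wrapped call executes immediately and a
    plain number is returned. Passing dask.delayed instead makes each wrapped
    call return a lazy Delayed object; nothing runs until .compute().
    """
    doubled = delayed(lambda xs: [2 * x for x in xs])(values)
    return delayed(sum)(doubled)


# Eager path (identity wrapper): the result is returned directly.
print(toy_score([1, 2, 3]))  # 12
```

With dask installed, `toy_score([1, 2, 3], delayed=dask.delayed).compute()` should also give 12; the difference is that the work is deferred until `.compute()` and handed to a dask scheduler.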
```diff
@@ -425,24 +427,28 @@ def _wrapped_cross_val_score(sklearn_pipeline, features, target,
 
     cv = check_cv(cv, target, classifier=is_classifier(sklearn_pipeline))
     cv_iter = list(cv.split(features, target, groups))
-    scorer = check_scoring(sklearn_pipeline, scoring=scoring_function)
-
-    try:
-        with warnings.catch_warnings():
-            warnings.simplefilter('ignore')
-            scores = [_fit_and_score(estimator=clone(sklearn_pipeline),
-                                     X=features,
-                                     y=target,
-                                     scorer=scorer,
-                                     train=train,
-                                     test=test,
-                                     verbose=0,
-                                     parameters=None,
-                                     fit_params=sample_weight_dict)
-                      for train, test in cv_iter]
-            CV_score = np.array(scores)[:, 0]
-            return np.nanmean(CV_score)
-    except TimeoutException:
-        return "Timeout"
-    except Exception as e:
-        return -float('inf')
+    scorer = delayed(check_scoring)(sklearn_pipeline, scoring=scoring_function)
+
+    def safe_fit_and_score(*args, **kwargs):
+        try:
+            return _fit_and_score(*args, **kwargs)
+        except Exception:
+            return -float('inf')
+
+    with warnings.catch_warnings():
+        warnings.simplefilter('ignore')
+        # TODO: dive into and delay fit/transform calls on sklearn_pipeline.steps appropriately
+        # This will help with shared intermediate results, profiling, etc..
+        # It looks like the dask_ml.model_selection._search.do_fit_and_score might have good logic here
```
@TomAugspurger, is this task easy for you by any chance?

Sure, I think I can take a look today.

TPOT is a fun problem to play with :)

Alternatively, @jcrist, if you're around and have time, you've probably done this before :)
```diff
+        scores = [delayed(safe_fit_and_score)(estimator=delayed(clone)(sklearn_pipeline),
+                                              X=features,
+                                              y=target,
+                                              scorer=scorer,
+                                              train=train,
+                                              test=test,
+                                              verbose=0,
+                                              parameters=None,
+                                              fit_params=sample_weight_dict)
+                  for train, test in cv_iter]
+        CV_score = delayed(np.array)(scores)[:, 0]
+        return delayed(np.nanmean)(CV_score)
```
Please add the dask installation to ci/.travis_install.sh and .appveyor.yml for the unit tests.
Done. Though, just to reiterate, I'm not trying to get the tests passing here at all; this is only up for discussion.
OK, thanks. But it seems to have passed almost all the unit tests. Great!
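The TODO in the diff proposes delaying the fit/transform call of each pipeline step separately, so that candidate pipelines sharing the same leading steps on the same CV fold can reuse those intermediate results. In dask, that sharing can come from deterministic task keys (for example `dask.delayed(..., pure=True)`), because tasks with identical keys are merged when their graphs are computed together. The sketch below imitates the idea with an explicit dictionary keyed by fold and step prefix. `run_pipeline` and the toy steps are hypothetical; it assumes each step name uniquely identifies the step together with its hyperparameters, and the toy steps only transform data rather than fitting on a training split.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step: (name, fit_transform), where fit_transform maps input
# data to transformed data. The name must encode any hyperparameters.
Step = Tuple[str, Callable[[List[float]], List[float]]]


def run_pipeline(steps: List[Step], data: List[float], fold_id: int,
                 cache: Dict[tuple, List[float]], calls: List[str]) -> List[float]:
    """Apply steps in order, reusing any prefix already computed on this fold.

    The cache key is (fold_id, name_1, ..., name_k): two pipelines whose first
    k steps match share that entry, much as identical dask keys would share
    one task. Assumes `data` is always the same for a given fold_id.
    `calls` records which steps actually ran.
    """
    key: tuple = (fold_id,)
    out = data
    for name, fit_transform in steps:
        key = key + (name,)
        if key not in cache:
            calls.append(name)
            cache[key] = fit_transform(out)
        out = cache[key]
    return out


scale = ("scale", lambda xs: [x / 2 for x in xs])
shift = ("shift", lambda xs: [x + 1 for x in xs])
square = ("square", lambda xs: [x * x for x in xs])

cache: Dict[tuple, List[float]] = {}
calls: List[str] = []
a = run_pipeline([scale, shift], [2, 4], fold_id=0, cache=cache, calls=calls)
b = run_pipeline([scale, square], [2, 4], fold_id=0, cache=cache, calls=calls)
print(a, b, calls)  # [2.0, 3.0] [1.0, 4.0] ['scale', 'shift', 'square']
```

The shared `scale` step runs once even though both pipelines use it; a graph-based version would get the same effect without a hand-managed cache.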