-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support numpy.random.RandomState objects (take 2) #556
Conversation
Codecov Report
@@ Coverage Diff @@
## master #556 +/- ##
===========================================
+ Coverage 85.30% 98.83% +13.53%
===========================================
Files 115 115
Lines 4205 4297 +92
===========================================
+ Hits 3587 4247 +660
+ Misses 618 50 -568
Continue to review full report at Codecov.
|
This was the error coming out of all the windows unit tests: |
@@ -26,6 +26,7 @@ class CatBoostClassifier(Estimator): | |||
supported_problem_types = [ProblemTypes.BINARY, ProblemTypes.MULTICLASS] | |||
|
|||
def __init__(self, n_estimators=1000, eta=0.03, max_depth=6, bootstrap_type=None, random_state=0): | |||
random_seed = get_random_seed(random_state, 0, SEED_BOUNDS.max_bound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The catboost estimators need random_seed
to be between 0 and max int, otherwise they throw an error.
@@ -19,13 +19,14 @@ class XGBoostClassifier(Estimator): | |||
supported_problem_types = [ProblemTypes.BINARY, ProblemTypes.MULTICLASS] | |||
|
|||
def __init__(self, eta=0.1, max_depth=3, min_child_weight=1, n_estimators=100, random_state=0): | |||
random_seed = get_random_seed(random_state, SEED_BOUNDS.min_bound, SEED_BOUNDS.max_bound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like catboost, the xgboost classifier needs random_seed
to be an int, not np.random.RandomState
. The weird thing is, passing random_state
worked fine on linux... so it appears its only on windows that xgboost requires this!
return random_state.randint(min_bound, max_bound) | ||
if random_state < min_bound or random_state >= max_bound: | ||
return random_state % min(abs(min_bound), abs(max_bound)) | ||
return random_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to avoid defining a function like this because it introduces complexity. But having this provides a pattern which we can follow for when we add new pipelines which require integer seeds instead of np.random.RandomState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good solution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I still wish it were simpler and therefore easier to understand, but I'm not sure there's a better alternative right now. Let's continue to think about it :)
d60a2e7
to
fbaa89f
Compare
a295283
to
33ef6ba
Compare
* Squash random_state work from 347_random_state * Lint * Lint * Changelog * Lint * Test update * Always pass random_state to components * Lint * Fix bug: set random state first. Remove usages of random_state as dict param item in test_pipelines.py * update test for clarity * Fix catboost * Update logreg test * Lint catboost * Update tuner impl to handle random_state * Test changes * Lint * Docs changes * Add unit test for get_random_state * Update test * Remove uncalled code after my changes * Fix tests after rebase * Add unit test coverage for RandomSearchTuner.is_search_space_exhausted * Add unit test coverage for max_time * Add test coverage of get_pipeline when invalid * Lint * Add unit test coverage of when fit/score throws in autobase * Remove duplicate * Lets try that again... got mysterious docs failure
… for different classes
33ef6ba
to
85fae99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Closes #347 .
A version of #530 which won't break the windows tests.
Will retarget on
master
once #554 and #555 are merged.Changes
random_state
as a keyword argumentrandom_state
as either anint
seed or anumpy.random.RandomState
objectget_random_state
helper to standardize tonp.random.RandomState
objectsnp.random.RandomState
objects to get random seeds, via aget_random_seed
methodSEED_BOUNDS
range constantBuilding off of #441 , opening to test my own changes on top of Angela's work (thank you @angela97lin!)
Note @kmax12 we had discussed sticking with seeds internally, but using
np.random.RandomState
turned out to be the simpler option.