Skip to content

Rename random state to random seed #1798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 11, 2021

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #1737 by adding random_seed as an argument and keeping random_state but setting it to None. A warning will show if random_state is used.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@freddyaboulton freddyaboulton added the refactor Work being done to refactor code. label Feb 8, 2021
@freddyaboulton freddyaboulton self-assigned this Feb 8, 2021
@freddyaboulton freddyaboulton changed the title 1737 rename random state to random seed Rename random state to random seed Feb 8, 2021
@codecov
Copy link

codecov bot commented Feb 8, 2021

Codecov Report

Merging #1798 (5778ed2) into main (cb09781) will increase coverage by 0.2%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff            @@
##            main    #1798     +/-   ##
========================================
+ Coverage   99.9%   100.0%   +0.2%     
========================================
  Files        252      252             
  Lines      20117    20250    +133     
========================================
+ Hits       20086    20242    +156     
+ Misses        31        8     -23     
Impacted Files Coverage Δ
...components/ensemble/stacked_ensemble_classifier.py 100.0% <ø> (ø)
.../components/ensemble/stacked_ensemble_regressor.py 100.0% <ø> (ø)
...components/transformers/encoders/target_encoder.py 100.0% <ø> (ø)
...elines/components/transformers/imputers/imputer.py 100.0% <ø> (ø)
...mponent_tests/test_delayed_features_transformer.py 100.0% <ø> (ø)
evalml/utils/__init__.py 100.0% <ø> (ø)
evalml/automl/automl_algorithm/automl_algorithm.py 100.0% <100.0%> (ø)
...lml/automl/automl_algorithm/iterative_algorithm.py 100.0% <100.0%> (ø)
evalml/automl/automl_search.py 99.7% <100.0%> (ø)
evalml/automl/utils.py 100.0% <100.0%> (ø)
... and 90 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb09781...5778ed2. Read the comment docs.

@freddyaboulton freddyaboulton force-pushed the 1737-rename-random-state-to-random-seed branch from 0fad373 to 69756a3 Compare February 9, 2021 16:30
@@ -19,7 +19,7 @@ def __init__(self,
max_iterations=None,
tuner_class=None,
text_columns=None,
random_state=0,
random_seed=0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm replacing random_state with random_seed here, rather than adding it as an argument, because this is not a user-facing class.

if n_components:
pca = SkPCA(n_components=n_components, **kwargs)
pca = SkPCA(n_components=n_components, random_state=random_seed, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we were not passing random_state to PCA before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo, good catch 👀

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job fixing everything and raising the warnings! LGTM

@@ -1800,6 +1800,7 @@ def total_pipelines(automl, num_batches, batch_size):
automl.search()
assert automl._pipelines_per_batch == 2
assert automl._automl_algorithm.pipelines_per_batch == 2
assert automl._automl_algorithm.pipelines_per_batch == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeated line?

@@ -511,3 +511,23 @@ def save_plot(fig, filepath=None, format='png', interactive=False, return_filepa

if return_filepath:
return filepath


def deprecate_arg(old_arg, new_arg, old_value, new_value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for this helper since it's public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -8,6 +8,7 @@ Release Notes
* Sped up permutation importance for some pipelines :pr:`1762`
* Fixes
* Changes
* Added ``random_seed`` as an argument to our automl/pipeline/component api. Using ``random_state`` will raise a warning :pr:`1798`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can't add this to breaking_changes, but is there another category similar that we can put it in to warn users? Just wondering since it'll soon be a breaking change once we officially deprecate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now? Here we're telling users we'll raise a warning and the warning itself will tell them about the impending deprecation.

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments but nice and thorough. Thank you, LGTM!! 🚢

@@ -50,7 +50,7 @@ def __init__(self,
self._best_pipeline_info = {}
self.ensembling = ensembling and len(self.allowed_pipelines) > 1
self._pipeline_params = pipeline_params or {}
self._random_state = random_state
self._random_state = random_seed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to update this to self._random_seed too?

@@ -65,7 +65,7 @@ def next_batch(self):

next_batch = []
if self._batch_number == 0:
next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_state=self.random_state)
next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_seed=self.random_seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused, what is self.random_seed set to here? 👀 I ask because I made the previous comment and noticed that we have self._random_state right now... Is this supposed to be self.random_seed, and then we need to update self._random_seed --> self.random_seed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.random_seed == self._random_state. We get self.random_seed from the parent class. self._random_state is used to sample params in the first batch if the user provides pipelines with frozen hyperparams. I'll get rid of self._random_state since we don't need it anymore!

@@ -36,8 +36,8 @@ class CatBoostClassifier(Estimator):
SEED_MAX = SEED_BOUNDS.max_bound
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More out of curiosity but does this mean we dont need SEED_BOUNDS anymore? Or maybe we need to check that the seed provided is within those bounds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of them actually!

if n_components:
pca = SkPCA(n_components=n_components, **kwargs)
pca = SkPCA(n_components=n_components, random_state=random_seed, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo, good catch 👀

@@ -156,7 +158,9 @@ def __init__(self,
additional_objectives (list): Custom set of objectives to score on.
Will override default objectives for problem type if not empty.

random_state (int): Seed for the random number generator. Defaults to 0.
random_state (None, int): Deprecated - use random_seed instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nit-picking (for no reason loool) but since the other places that were touched are now random_state (int), let's match that hehe

@@ -511,3 +511,23 @@ def save_plot(fig, filepath=None, format='png', interactive=False, return_filepa

if return_filepath:
return filepath


def deprecate_arg(old_arg, new_arg, old_value, new_value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this will be useful for any future dep patterns we follow 🥳

@freddyaboulton freddyaboulton force-pushed the 1737-rename-random-state-to-random-seed branch from a881926 to 7c37f3e Compare February 11, 2021 20:11
@freddyaboulton freddyaboulton merged commit 5cba8a9 into main Feb 11, 2021
@freddyaboulton freddyaboulton deleted the 1737-rename-random-state-to-random-seed branch February 11, 2021 21:26
@chukarsten chukarsten mentioned this pull request Feb 23, 2021
@dsherry dsherry mentioned this pull request Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Work being done to refactor code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename random_state to random_seed in EvalML
3 participants