
Add more doctests #3002

Merged: 26 commits into main on Nov 5, 2021

Conversation

ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented Nov 2, 2021

Completes #2936

@codecov

codecov bot commented Nov 2, 2021

Codecov Report

Merging #3002 (142b92d) into main (8605ae4) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3002     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        312     312             
  Lines      29853   29856      +3     
=======================================
+ Hits       29762   29765      +3     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/data_checks/class_imbalance_data_check.py 100.0% <ø> (ø)
evalml/data_checks/datetime_format_data_check.py 100.0% <ø> (ø)
evalml/data_checks/datetime_nan_data_check.py 100.0% <ø> (ø)
evalml/data_checks/id_columns_data_check.py 100.0% <ø> (ø)
evalml/data_checks/invalid_targets_data_check.py 100.0% <ø> (ø)
evalml/data_checks/multicollinearity_data_check.py 100.0% <ø> (ø)
...lml/data_checks/natural_language_nan_data_check.py 100.0% <ø> (ø)
evalml/data_checks/no_variance_data_check.py 100.0% <ø> (ø)
evalml/data_checks/outliers_data_check.py 100.0% <ø> (ø)
...alml/data_checks/target_distribution_data_check.py 100.0% <ø> (ø)
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8605ae4...142b92d.

@@ -25,12 +25,12 @@ def validate(self, X, y):
Returns:
dict (DataCheckError): List with DataCheckErrors if unequal intervals are found in the datetime column.

Example:
Examples:
Contributor
Could just be personal opinion, but I think if we add this many examples (which is great!), it gets a little harder to read. Maybe what we need here is an extra line above each example to better understand what each example is trying to convey? From a quick glance, it's a bit hard to gauge the purpose of each example 😬

Contributor Author

Sure thing, so maybe a comment above a doctest briefly explaining what it does?

Contributor

Exactly!
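For illustration (a hypothetical function, not evalml code), a doctest with a one-line comment above each example might look like:

```python
def is_monotonic(values):
    """Check whether a sequence never decreases.

    Examples:
        A non-decreasing sequence passes the check:

        >>> is_monotonic([1, 2, 3])
        True

        A sequence with a dip fails:

        >>> is_monotonic([1, 3, 2])
        False
    """
    return all(a <= b for a, b in zip(values, values[1:]))
```

The short prose line before each `>>>` block is what the reviewers are asking for: it tells the reader what the example is meant to demonstrate before they parse the code.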

Contributor

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh Thank you for this! This looks good to me. I agree with @angela97lin that since some docstrings test different things, it might be helpful to add some in-line comments above each one to document what's happening.

Other than that, I think it would be helpful if some of the assert float1 == float2 checks used rounding so that these don't start to flake if we're off by a negligible amount in a future release.
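A minimal sketch of the rounding suggestion (illustrative values, not from the PR): exact float equality in a doctest can start to flake if a metric shifts by a negligible amount, while rounding or an explicit tolerance keeps the assertion stable.

```python
import math

# Hypothetical metric value for illustration.
score = 0.9998877665544332

# Fragile form (commented out): breaks on any tiny numeric change.
# assert score == 0.9998877665544332

# Rounding to a few decimal places keeps the assertion stable:
assert round(score, 3) == 1.0

# math.isclose makes the tolerance explicit:
assert math.isclose(score, 1.0, abs_tol=1e-3)
```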

metadata={"columns": too_sparse_cols},
).to_dict()
)
if too_sparse_cols:
Contributor

How come we need to add this?

Contributor Author

If no columns are returned from sparsity_score, results still appends the warning and actions. The warning message ends up being "Input columns ({}) for multiclass problem type are too sparse."
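A sketch of the point being made (hypothetical helper names, not the actual evalml code): without the `if too_sparse_cols:` guard, an empty column list still produces a warning whose message contains an empty `()` placeholder.

```python
def check_sparsity(too_sparse_cols):
    """Toy stand-in for the data check's warning logic."""
    results = {"warnings": []}
    if too_sparse_cols:  # the added guard: only warn when columns were found
        results["warnings"].append(
            "Input columns ({}) for multiclass problem type are too sparse.".format(
                ", ".join(repr(c) for c in too_sparse_cols)
            )
        )
    return results

# No sparse columns found: with the guard, no spurious empty-parens warning.
assert check_sparsity([])["warnings"] == []
assert check_sparsity(["a"])["warnings"] == [
    "Input columns ('a') for multiclass problem type are too sparse."
]
```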

Resolved review threads on:
- evalml/data_checks/outliers_data_check.py
- evalml/objectives/standard_metrics.py
- evalml/objectives/utils.py
Examples:
>>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
>>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'mse', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
...
Contributor

Docs don't build without the r prefix? Is it because of these three dots here? The example renders fine in the docs, so I'm just wondering what happened, but no need to change anything.

Contributor Author

The lint check doesn't pass, I think due to the \n characters in the output
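A sketch of the `\n` issue behind the r prefix (a hypothetical function, not the evalml code): when a doctest's expected output contains literal `\n` escapes, the docstring must be raw, or Python turns each `\n` into a real newline before doctest ever compares the output, and the example no longer matches.

```python
import doctest

def show_code():
    r"""Return generated code as a single string.

    >>> show_code()
    'x = 1\ny = 2'
    """
    return "x = 1\ny = 2"

# With the raw docstring, the expected output keeps its literal \n,
# which matches the repr of the returned string.
results = doctest.testmod()
assert results.failed == 0
```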

Resolved review thread on evalml/preprocessing/utils.py
@@ -84,7 +110,7 @@ def validate(self, X, y=None):
percent_null_rows >= self.pct_null_row_threshold
]
if len(highly_null_rows) > 0:
warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are more than {self.pct_null_row_threshold*100}% null"
warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are {self.pct_null_row_threshold*100}% or more null"
Contributor

Same comment as in the sparsity data check: just wondering why we were modifying this.

Contributor Author

Because the threshold check is percent_null_rows >= self.pct_null_row_threshold, the phrasing "or more" seemed more appropriate than "more than".
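A small sketch of that `>=` comparison (illustrative per-row null fractions, not real data): a row sitting exactly at the threshold is flagged, so "X% or more null" describes the behavior accurately where "more than X% null" would not.

```python
pct_null_row_threshold = 0.5
percent_null_rows = [0.2, 0.5, 0.9]  # hypothetical per-row null fractions

# >= means the row at exactly 0.5 is included in the flagged set.
highly_null_rows = [p for p in percent_null_rows if p >= pct_null_row_threshold]
warning_msg = (
    f"{len(highly_null_rows)} out of {len(percent_null_rows)} rows "
    f"are {pct_null_row_threshold * 100}% or more null"
)
assert warning_msg == "2 out of 3 rows are 50.0% or more null"
```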

Contributor

@eccabay eccabay left a comment

What a beast of a change! Thanks for doing this, just left a few tiny tiny semantic nitpicks

Resolved review threads on:
- docs/source/release_notes.rst
- evalml/data_checks/multicollinearity_data_check.py
- evalml/data_checks/natural_language_nan_data_check.py
- evalml/data_checks/target_distribution_data_check.py
@ParthivNaresh ParthivNaresh merged commit ca65539 into main Nov 5, 2021
@chukarsten chukarsten mentioned this pull request Nov 9, 2021
@freddyaboulton freddyaboulton deleted the Add-More-Doctests branch May 13, 2022 15:23
Labels: none. Projects: none. Linked issues: none.

4 participants