Add more doctests #3002
Conversation
# Conflicts: # evalml/data_checks/invalid_targets_data_check.py # evalml/data_checks/target_distribution_data_check.py
Codecov Report
@@ Coverage Diff @@
## main #3002 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 312 312
Lines 29853 29856 +3
=======================================
+ Hits 29762 29765 +3
Misses 91 91
Continue to review full report at Codecov.
@@ -25,12 +25,12 @@ def validate(self, X, y):
    Returns:
        dict (DataCheckError): List with DataCheckErrors if unequal intervals are found in the datetime column.

-        Example:
+        Examples:
Could just be personal opinion, but I think if we add this many examples (which is great!), it gets a little harder to read. Maybe what we need here is an extra line above each example to better understand what each example is trying to convey? From a quick glance, it's a bit hard to gauge the purpose of each example 😬
Sure thing, so maybe a comment above a doctest briefly explaining what it does?
Exactly!
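For instance (a hypothetical docstring, not taken from this PR), the suggestion of a brief comment line above each doctest might look like:

```python
def add(a, b):
    """Add two numbers.

    Examples:
        Adding two integers returns their sum:

        >>> add(2, 3)
        5

        Floats are supported as well:

        >>> add(0.5, 0.25)
        0.75
    """
    return a + b
```

Each short lead-in line makes it clear at a glance what the example below it is meant to demonstrate.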
@ParthivNaresh Thank you for this! This looks good to me. I agree with @angela97lin that since some docstrings test different things, it might be helpful to add some in-line comments above each one to document what's happening.
Other than that, I think it would be helpful if some of the `assert float1 == float2` checks used rounding, so that they don't start to flake if we're off by a negligible amount in a future release.
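As a sketch of the rounding suggestion (the value and tolerances here are made up for illustration), a doctest assertion could compare rounded values or use an explicit tolerance instead of exact float equality:

```python
import math

# Hypothetical score from a doctest; tiny numerical drift between
# releases would break an exact equality check.
score = 0.7999999999

# More robust: round to a fixed precision before comparing...
assert round(score, 6) == 0.8

# ...or use math.isclose with an explicit absolute tolerance.
assert math.isclose(score, 0.8, abs_tol=1e-6)
```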
        metadata={"columns": too_sparse_cols},
    ).to_dict()
)
if too_sparse_cols:
How come we need to add this?
If none of the columns are returned from `sparsity_score`, `results` still appends a `warning` and `actions`. The warning message ends up being "Input columns ({}) for multiclass problem type are too sparse."
Examples:
    >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
    >>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'mse', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
    ...
Docs don't build without the `r` prefix? Is it because of these three dots here? The example renders fine in the docs, so I'm just wondering what happened, but no need to change anything.
The lint check doesn't pass, I think due to the `\n` characters in the output.
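As a simplified sketch of the `\n` issue (the function here is hypothetical, standing in for `generate_component_code`): when a doctest's expected output contains a literal `\n`, the docstring needs the `r` prefix so the backslash survives instead of becoming a real newline, which would make the doctest (and lint) fail.

```python
def generate_code():
    r"""Return generated code as a single string.

    Examples:
        >>> generate_code()
        'import foo\nfoo = Foo()'
    """
    return "import foo\nfoo = Foo()"
```

Without the `r` prefix, `\n` in the expected output would be interpreted as an actual line break, and the doctest would no longer match the repr of the returned string.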
@@ -84,7 +110,7 @@ def validate(self, X, y=None):
        percent_null_rows >= self.pct_null_row_threshold
    ]
    if len(highly_null_rows) > 0:
-        warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are more than {self.pct_null_row_threshold*100}% null"
+        warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are {self.pct_null_row_threshold*100}% or more null"
Same comment as in the sparsity data check: just wondering why we were modifying this.
Because the threshold check is `percent_null_rows >= self.pct_null_row_threshold`, the phrasing "or more" seemed more appropriate than "more than".
What a beast of a change! Thanks for doing this; just left a few tiny semantic nitpicks.
Completes #2936