
Add more doctests #3002

Merged: 26 commits into main on Nov 5, 2021

Conversation

ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented Nov 2, 2021

Completes #2936

@codecov

codecov bot commented Nov 2, 2021

Codecov Report

Merging #3002 (142b92d) into main (8605ae4) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #3002     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        312     312             
  Lines      29853   29856      +3     
=======================================
+ Hits       29762   29765      +3     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/data_checks/class_imbalance_data_check.py 100.0% <ø> (ø)
evalml/data_checks/datetime_format_data_check.py 100.0% <ø> (ø)
evalml/data_checks/datetime_nan_data_check.py 100.0% <ø> (ø)
evalml/data_checks/id_columns_data_check.py 100.0% <ø> (ø)
evalml/data_checks/invalid_targets_data_check.py 100.0% <ø> (ø)
evalml/data_checks/multicollinearity_data_check.py 100.0% <ø> (ø)
...lml/data_checks/natural_language_nan_data_check.py 100.0% <ø> (ø)
evalml/data_checks/no_variance_data_check.py 100.0% <ø> (ø)
evalml/data_checks/outliers_data_check.py 100.0% <ø> (ø)
...alml/data_checks/target_distribution_data_check.py 100.0% <ø> (ø)
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8605ae4...142b92d.

@@ -25,12 +25,12 @@ def validate(self, X, y):
Returns:
dict (DataCheckError): List with DataCheckErrors if unequal intervals are found in the datetime column.

Example:
Examples:
Contributor
Could just be personal opinion, but I think if we add this many examples (which is great!), it gets a little harder to read. Maybe what we need here is an extra line above each example to better understand what each example is trying to convey? From a quick glance, it's a bit hard to gauge the purpose of each example 😬

Contributor Author

Sure thing, so maybe a comment above a doctest briefly explaining what it does?

Contributor

Exactly!
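For illustration (a hypothetical function, not evalml code), a doctest with a one-line comment above each example might look like:

```python
def is_monotonic(values):
    """Check whether a sequence never decreases.

    Examples:
        A non-decreasing sequence passes the check:

        >>> is_monotonic([1, 2, 3])
        True

        A sequence with a dip fails:

        >>> is_monotonic([1, 3, 2])
        False
    """
    return all(a <= b for a, b in zip(values, values[1:]))
```

The short prose line before each `>>>` block is what the reviewers are asking for: it tells the reader what the example is meant to demonstrate before they parse the code.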

Contributor

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh Thank you for this! This looks good to me. I agree with @angela97lin that since some docstrings test different things, it might be helpful to add some in-line comments above each one to document what's happening.

Other than that, I think it would be helpful if some of the assert float1 == float2 checks used rounding so that these don't start to flake if we're off by a negligible amount in a future release.
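A minimal sketch of the rounding suggestion (illustrative values, not from the PR): exact float equality in a doctest can start to flake if a metric shifts by a negligible amount, while rounding or an explicit tolerance keeps the assertion stable.

```python
import math

# Hypothetical metric value for illustration.
score = 0.9998877665544332

# Fragile form (commented out): breaks on any tiny numeric change.
# assert score == 0.9998877665544332

# Rounding to a few decimal places keeps the assertion stable:
assert round(score, 3) == 1.0

# math.isclose makes the tolerance explicit:
assert math.isclose(score, 1.0, abs_tol=1e-3)
```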

metadata={"columns": too_sparse_cols},
).to_dict()
)
if too_sparse_cols:
Contributor

How come we need to add this?

Contributor Author

If no columns are returned from sparsity_score, results still appends the warning and actions. The warning message ends up being "Input columns ({}) for multiclass problem type are too sparse."
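A sketch of the point being made (hypothetical helper names, not the actual evalml code): without the `if too_sparse_cols:` guard, an empty column list still produces a warning whose message contains an empty `()` placeholder.

```python
def check_sparsity(too_sparse_cols):
    """Toy stand-in for the data check's warning logic."""
    results = {"warnings": []}
    if too_sparse_cols:  # the added guard: only warn when columns were found
        results["warnings"].append(
            "Input columns ({}) for multiclass problem type are too sparse.".format(
                ", ".join(repr(c) for c in too_sparse_cols)
            )
        )
    return results

# No sparse columns found: with the guard, no spurious empty-parens warning.
assert check_sparsity([])["warnings"] == []
assert check_sparsity(["a"])["warnings"] == [
    "Input columns ('a') for multiclass problem type are too sparse."
]
```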

Resolved review threads on:
- evalml/data_checks/outliers_data_check.py
- evalml/objectives/standard_metrics.py
- evalml/objectives/utils.py
Examples:
>>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
>>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'mse', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
...
Contributor

Docs don't build without the r prefix? Is it because of these three dots here? The example renders fine in the docs, so I'm just wondering what happened, but no need to change anything.

Contributor Author

The lint check doesn't pass, I think due to the \n characters in the output
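A sketch of the `\n` issue behind the r prefix (a hypothetical function, not the evalml code): when a doctest's expected output contains literal `\n` escapes, the docstring must be raw, or Python turns each `\n` into a real newline before doctest ever compares the output, and the example no longer matches.

```python
import doctest

def show_code():
    r"""Return generated code as a single string.

    >>> show_code()
    'x = 1\ny = 2'
    """
    return "x = 1\ny = 2"

# With the raw docstring, the expected output keeps its literal \n,
# which matches the repr of the returned string.
results = doctest.testmod()
assert results.failed == 0
```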

Resolved review thread on evalml/preprocessing/utils.py
@@ -84,7 +110,7 @@ def validate(self, X, y=None):
percent_null_rows >= self.pct_null_row_threshold
]
if len(highly_null_rows) > 0:
warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are more than {self.pct_null_row_threshold*100}% null"
warning_msg = f"{len(highly_null_rows)} out of {len(X)} rows are {self.pct_null_row_threshold*100}% or more null"
Contributor

Same comment as in the sparsity data check: just wondering why we were modifying this.

Contributor Author

Because the threshold check is percent_null_rows >= self.pct_null_row_threshold, the phrasing "or more" seemed more appropriate than "more than".
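A small sketch of that `>=` comparison (illustrative per-row null fractions, not real data): a row sitting exactly at the threshold is flagged, so "X% or more null" describes the behavior accurately where "more than X% null" would not.

```python
pct_null_row_threshold = 0.5
percent_null_rows = [0.2, 0.5, 0.9]  # hypothetical per-row null fractions

# >= means the row at exactly 0.5 is included in the flagged set.
highly_null_rows = [p for p in percent_null_rows if p >= pct_null_row_threshold]
warning_msg = (
    f"{len(highly_null_rows)} out of {len(percent_null_rows)} rows "
    f"are {pct_null_row_threshold * 100}% or more null"
)
assert warning_msg == "2 out of 3 rows are 50.0% or more null"
```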

Contributor

@eccabay eccabay left a comment

What a beast of a change! Thanks for doing this, just left a few tiny tiny semantic nitpicks

Resolved review threads on:
- docs/source/release_notes.rst
- evalml/data_checks/multicollinearity_data_check.py
- evalml/data_checks/natural_language_nan_data_check.py
- evalml/data_checks/target_distribution_data_check.py
@ParthivNaresh ParthivNaresh merged commit ca65539 into main Nov 5, 2021
@chukarsten chukarsten mentioned this pull request Nov 9, 2021
@freddyaboulton freddyaboulton deleted the Add-More-Doctests branch May 13, 2022 15:23
Labels: none. Projects: none. Linked issues: none.

4 participants