Upgrade to WW 0.16.2 #3482

Merged
freddyaboulton merged 8 commits into main from js_upgrade_ww_0.16.1
Apr 27, 2022

Conversation

@freddyaboulton
Contributor

Pull Request Description

Fix all the test failures caused by the latest woodwork version.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov

codecov Bot commented Apr 26, 2022

Codecov Report

Merging #3482 (8838328) into main (881c1e0) will decrease coverage by 0.8%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3482     +/-   ##
=======================================
- Coverage   99.7%   98.9%   -0.7%     
=======================================
  Files        336     336             
  Lines      33368   33364      -4     
=======================================
- Hits       33242   32985    -257     
- Misses       126     379    +253     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/multicollinearity_data_check.py | 100.0% <ø> (ø) |
| evalml/data_checks/target_leakage_data_check.py | 100.0% <ø> (ø) |
| evalml/tests/data_checks_tests/test_data_checks.py | 100.0% <100.0%> (ø) |
| ..._checks_tests/test_multicollinearity_data_check.py | 100.0% <100.0%> (ø) |
| ...ata_checks_tests/test_target_leakage_data_check.py | 100.0% <100.0%> (ø) |
| evalml/automl/pipeline_search_plots.py | 17.9% <0.0%> (-82.1%) ⬇️ |
| ...l/tests/automl_tests/test_pipeline_search_plots.py | 23.3% <0.0%> (-76.7%) ⬇️ |
| ...ests/automl_tests/test_automl_search_regression.py | 74.7% <0.0%> (-20.7%) ⬇️ |
| .../automl_tests/test_automl_search_classification.py | 84.5% <0.0%> (-12.1%) ⬇️ |
| evalml/tests/automl_tests/test_automl_utils.py | 91.1% <0.0%> (-8.9%) ⬇️ |
| ... and 4 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

freddyaboulton changed the title from "Upgrade to WW 0.16.1" to "Upgrade to WW 0.16.2" on Apr 26, 2022
Comment thread .github/meta.yaml
- requirements-parser >=0.2.0
- shap >=0.40.0
- texttable >=1.6.2
- woodwork >=0.15.0
Contributor Author

0.16.2 is not available for download yet, which is why the build conda pkg check is failing. It should be fixed once the release can be downloaded.


- collinear_cols = [c for c in X.columns if "not" not in c]
+ collinear_cols = [
+     c for c in X.columns if c not in {"not_collinear", "col_nullable"}
Contributor Author

Nullable types are now handled differently

Contributor

wonderful
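
The change in this thread swaps a substring test for an explicit exclusion set. A minimal sketch of the difference follows. Only `not_collinear` and `col_nullable` come from the diff; the other column names are hypothetical, since the test fixture itself is not shown here.

```python
# Hypothetical fixture columns; only "not_collinear" and "col_nullable"
# appear in the diff under discussion.
columns = ["col_1", "col_2", "not_collinear", "col_nullable"]

# Old filter: drops any column whose name contains the substring "not".
by_substring = [c for c in columns if "not" not in c]

# New filter: drops exactly the columns that are named in the set.
excluded = {"not_collinear", "col_nullable"}
by_name = [c for c in columns if c not in excluded]

print(by_substring)  # ['col_1', 'col_2', 'col_nullable']
print(by_name)       # ['col_1', 'col_2']
```

The substring filter cannot exclude `col_nullable`, because that name does not contain "not", so an explicit set is the simpler way to list every column that should no longer count as collinear.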

y = y.astype(bool)
X["d"] = y % 2
X["e"] = [0] * 30
X.ww.init(logical_types={"d": "Boolean"})
Contributor Author

This seems to be related to this bug: alteryx/woodwork#1404

Collaborator

good catch - was having a lot of trouble with this one..

if input_type == "ww":
y = ww.init_series(y)
y_no_variance = ww.init_series(y_no_variance)
null_leakage = [
Contributor Author

The problem here is that the mutual information calculation itself changed in this release, so the scores are now 0.6 as opposed to >0.95, and there's no way to change that from the default data checks.

Collaborator

should we put up an issue to track adding this coverage back? I'm pretty meh on this since we already have coverage on the unit tests for each check. It's also very painful to find or manipulate the existing test data to fail all these checks.

Contributor Author

I think we're ok since we already have coverage for all the checks!
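
For readers unfamiliar with the score discussed in this thread, here is a rough sketch of a normalized mutual information score between two discrete columns. It is not Woodwork's implementation, whose details are not shown in this conversation. The normalization by `sqrt(H(x) * H(y))`, the function names, and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log(k / n) for k in Counter(values).values())


def normalized_mutual_info(x, y):
    """Mutual information of x and y, scaled to the range [0, 1].

    Dividing by sqrt(H(x) * H(y)) is an assumption made for this sketch.
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x == 0 or h_y == 0:
        return 0.0  # a constant column shares no information with anything
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mutual_info = h_x + h_y - entropy(list(zip(x, y)))
    return mutual_info / math.sqrt(h_x * h_y)


feature = [0, 1, 0, 1, 0, 1, 0, 1]
target = [0, 1, 0, 1, 0, 1, 0, 1]     # identical to the feature
unrelated = [0, 0, 1, 1, 0, 0, 1, 1]  # independent of the feature

print(normalized_mutual_info(feature, target))     # close to 1.0
print(normalized_mutual_info(feature, unrelated))  # close to 0.0
```

A check that flags columns only when such a score exceeds a high cutoff would stop firing if the same data started scoring around 0.6, which matches the kind of test failure described above.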

- X["d"] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- X["e"] = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o"]
- X.ww.init(logical_types={"e": "categorical"})
+ X["d"] = np.zeros(len(y))
Contributor Author

Deleted column e because of this ticket: alteryx/woodwork#1405

Collaborator

jeremyliweishih left a comment

LGTM - great catches on the problematic test failures 👏

Columns in X that are highly correlated with each other will be identified using mutual information.

- >>> col = pd.Series([1, 0, 2, 3, 4])
+ >>> col = pd.Series([1, 0, 2, 3, 4] * 15)
Collaborator

this is way smarter than what I was doing 🥲


- y = pd.Series([1, 0, 1, 1])
+ y = pd.Series(
+     [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
Collaborator

nit: you could replace this with the [1, 0, 1] * X format if you wanted!

Contributor Author

Done!
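
The exact form that was committed after this suggestion is not shown in the thread. As one possible sketch, the 25-element literal from the diff can be written with list repetition like this:

```python
# The 25-element literal as it appears in the diff under review.
literal = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# Repeat the 4-element pattern six times, then append the trailing 1.
repeated = [1, 0, 1, 1] * 6 + [1]

print(repeated == literal)  # True
print(len(repeated))        # 25
```

The repeated form keeps the pattern and its length visible at a glance, which makes later changes to the test data easier to review.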

freddyaboulton enabled auto-merge (squash) on April 27, 2022 14:19
freddyaboulton merged commit b0e0946 into main on Apr 27, 2022
freddyaboulton deleted the js_upgrade_ww_0.16.1 branch on April 27, 2022 15:17
chukarsten mentioned this pull request on Apr 29, 2022
3 participants