Upgrade to WW 0.16.2 #3482
Codecov Report
@@           Coverage Diff           @@
##            main   #3482     +/-   ##
=======================================
- Coverage   99.7%   98.9%   -0.7%
=======================================
  Files        336     336
  Lines      33368   33364      -4
=======================================
- Hits       33242   32985    -257
- Misses       126     379    +253
- requirements-parser >=0.2.0
- shap >=0.40.0
- texttable >=1.6.2
- woodwork >=0.15.0
0.16.2 is not available for download yet, which is why the conda package build is failing. It should be fixed later, though.
collinear_cols = [c for c in X.columns if "not" not in c]
collinear_cols = [
    c for c in X.columns if c not in {"not_collinear", "col_nullable"}
Nullable types are now handled differently
y = y.astype(bool)
X["d"] = y % 2
X["e"] = [0] * 30
X.ww.init(logical_types={"d": "Boolean"})
This seems to be related to this bug: alteryx/woodwork#1404
good catch - was having a lot of trouble with this one..
if input_type == "ww":
    y = ww.init_series(y)
    y_no_variance = ww.init_series(y_no_variance)
null_leakage = [
The problem here is that the mutual information calculation changed in this release, so the scores are now 0.6 as opposed to >0.95, and there's no way to change that from the default data checks.
should we put up an issue to track adding this coverage back? I'm pretty meh on this since we already have coverage on the unit tests for each check. It's also very painful to find or manipulate the existing test data to fail all these checks.
I think we're ok since we already have coverage for all the checks!
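For anyone following this thread, here is a rough sketch of what a normalized mutual information score between two discrete columns measures, and why a change in the calculation can move a score from above 0.95 to around 0.6. This is not Woodwork's implementation: the function names and the choice to normalize by the mean of the two entropies are assumptions made purely for illustration.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over the observed pairs.
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def normalized_mi(xs, ys):
    """Scale mutual information into [0, 1] using the mean of the two entropies.

    The normalization is an illustrative assumption, not Woodwork's method.
    """
    mean_entropy = (entropy(xs) + entropy(ys)) / 2
    if mean_entropy == 0:
        # Both columns are constant, so there is no information to share.
        return 0.0
    return mutual_information(xs, ys) / mean_entropy


identical = [0, 1, 0, 1]
independent_a, independent_b = [0, 0, 1, 1], [0, 1, 0, 1]
print(normalized_mi(identical, identical))          # close to 1
print(normalized_mi(independent_a, independent_b))  # close to 0
```

Because the score depends on how values are counted and normalized, any change to binning or normalization upstream can shift results for the same test data, which is consistent with what is described above.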
X["d"] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
X["e"] = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o"]
X.ww.init(logical_types={"e": "categorical"})
X["d"] = np.zeros(len(y))
Deleted column e because of this ticket: alteryx/woodwork#1405
jeremyliweishih left a comment
LGTM - great catches on the problematic test failures 👏
Columns in X that are highly correlated with each other will be identified using mutual information.
>>> col = pd.Series([1, 0, 2, 3, 4])
>>> col = pd.Series([1, 0, 2, 3, 4] * 15)
this is way smarter than what I was doing 🥲
y = pd.Series([1, 0, 1, 1])
y = pd.Series(
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
nit: you could replace this with the [1, 0, 1] * X format if you wanted!
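A small sketch of the list-repetition idiom suggested here, assuming the 25 values visible in the excerpt above are the whole list (the diff line may be truncated). It uses plain Python lists so it runs without pandas; wrapping the result in pd.Series(...) would be the final step. The variable names are illustrative only.

```python
# The 25 values visible in the diff excerpt, written out in full.
explicit = [
    1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
    0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
]

# The same values as a repeated four-element pattern plus one trailing element.
repeated = [1, 0, 1, 1] * 6 + [1]

# Both spellings produce identical lists.
print(repeated == explicit)
print(len(repeated))
```

The same idiom appears elsewhere in this PR as [1, 0, 2, 3, 4] * 15, which yields 75 values.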
Pull Request Description
Fix all the test failures caused by the latest woodwork version.
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst to include this pull request by adding :pr:123.