Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade WW to 0.4.1 #2379

Merged
merged 48 commits into from
Jun 22, 2021
Merged

Upgrade WW to 0.4.1 #2379

merged 48 commits into from
Jun 22, 2021

Conversation

jeremyliweishih
Copy link
Contributor

@jeremyliweishih jeremyliweishih commented Jun 14, 2021

Fixes #2327.

This PR mostly consists of fixing unit tests and functionality where we deal with WW logical types since they are now instantiated instead of returning the class.

Perf tests look good with no changes in model performance and somelittle change in init time and fit time but all within a couple percentage points. Ran with 50 iterations and 3 trials.

ww_upgrade_perf_results.zip

@codecov
Copy link

codecov bot commented Jun 15, 2021

Codecov Report

Merging #2379 (ea5eb8e) into main (461076b) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2379     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25240   25244      +4     
=======================================
+ Hits       25140   25144      +4     
  Misses       100     100             
Impacted Files Coverage Δ
...components/transformers/imputers/simple_imputer.py 100.0% <ø> (ø)
...ta_checks_tests/test_invalid_targets_data_check.py 100.0% <ø> (ø)
evalml/data_checks/invalid_targets_data_check.py 100.0% <100.0%> (ø)
evalml/data_checks/target_leakage_data_check.py 100.0% <100.0%> (ø)
evalml/model_understanding/graphs.py 100.0% <100.0%> (ø)
...rmers/preprocessing/delayed_feature_transformer.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 99.1% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.8% <100.0%> (ø)
.../tests/component_tests/test_datetime_featurizer.py 100.0% <100.0%> (ø)
...mponent_tests/test_delayed_features_transformer.py 100.0% <100.0%> (ø)
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 461076b...ea5eb8e. Read the comment docs.

core-requirements.txt Outdated Show resolved Hide resolved
@@ -299,7 +299,7 @@ def test_datetime_featurizer_woodwork_custom_overrides_returned_by_components(
try:
X = X_df.copy()
X.ww.init(logical_types={0: logical_type})
except (ww.exceptions.TypeConversionError, TypeError):
except (ww.exceptions.TypeConversionError, TypeError, ValueError):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be fixed by alteryx/woodwork#991

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for filing!

@@ -80,14 +80,14 @@ def validate(self, X, y):
return results

y = infer_feature_types(y)
is_supported_type = y.ww.logical_type in numeric_and_boolean_ww + [
is_supported_type = type(y.ww.logical_type) in numeric_and_boolean_ww + [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be isinstance(y.ww.logical_type, tuple(numeric_and_boolean_ww)) if you'd like here and in the other places where a similar check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamargrey I think the other option is to use y.ww.logical_type.type_string and then update our lists, equality/member checks to use the type_strings. What are your thoughts? I think both do the same thing but using type_string may be backwards compatible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton I like that idea! It does seem like it would be backwards compatible, which is nice.

It's definitely different from how we've tended to compare logical types inside of Woodwork, so my one worry is that it's using the type_string for something that it wasn't intended for. I'm not sure that that's actually a problem, though.

else:
is_type = X.ww.logical_types[feature] == ltype
is_type = type(X.ww.logical_types[feature]) == ltype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also places where we can use isinstance, which I think is slightly better from a timing perspective than == checks.

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih Thanks for this! I think it looks great. I have a question on the diff for _retain_woodwork_types... and I think it may be better to follow @tamargrey 's suggestion to use is_instance or my suggestion to use type_string rather than wrapping everything in type. I'm ok with either of those options but @tamargrey 's opinion should take precedence over mine hehe.

@@ -11,7 +11,7 @@ psutil>=5.6.3
requirements-parser>=0.2.0
shap>=0.36.0
texttable>=1.6.2
woodwork==0.3.1
woodwork>=0.4.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can close PR #2373 right?

@@ -299,7 +299,7 @@ def test_datetime_featurizer_woodwork_custom_overrides_returned_by_components(
try:
X = X_df.copy()
X.ww.init(logical_types={0: logical_type})
except (ww.exceptions.TypeConversionError, TypeError):
except (ww.exceptions.TypeConversionError, TypeError, ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for filing!

@@ -101,6 +103,9 @@ def _retain_custom_types_and_initalize_woodwork(
return ww.init_series(new_dataframe, old_logical_types)
if ltypes_to_ignore is None:
ltypes_to_ignore = []
old_logical_types = {
k: v if inspect.isclass(v) else type(v) for k, v in old_logical_types.items()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do need isclass here? The types should always be instances from now on right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC some tests were failing without this change but I'll double check this!

@@ -80,14 +80,14 @@ def validate(self, X, y):
return results

y = infer_feature_types(y)
is_supported_type = y.ww.logical_type in numeric_and_boolean_ww + [
is_supported_type = type(y.ww.logical_type) in numeric_and_boolean_ww + [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamargrey I think the other option is to use y.ww.logical_type.type_string and then update our lists, equality/member checks to use the type_strings. What are your thoughts? I think both do the same thing but using type_string may be backwards compatible?

@freddyaboulton
Copy link
Contributor

FYI I'll file a follow-up issue to use the new functionality added in this release:

Add option to return TableSchema instead of DataFrame from table accessor select method (#916)
Add dropping and renaming columns inplace (#920)

@jeremyliweishih
Copy link
Contributor Author

@freddyaboulton can you also add using concat_columns to that issue?

@dsherry
Copy link
Contributor

dsherry commented Jun 22, 2021

Taking a slight calculated risk and merging this before the docs build completes. We have a lot queued right now. I hope #2404 will help us here!

The build_conda_pkg failure is expected, due to the version mismatch with the latest branch in the conda feedstock repo.

@dsherry dsherry merged commit 7487189 into main Jun 22, 2021
@chukarsten chukarsten mentioned this pull request Jun 22, 2021
@freddyaboulton freddyaboulton deleted the js_2327_ww branch May 13, 2022 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to Woodwork v0.4.1
5 participants