Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TextFeaturizer dependencies optional (tensorflow) #976

Merged
merged 9 commits into from Jul 24, 2020

Conversation

dsherry
Copy link
Collaborator

@dsherry dsherry commented Jul 24, 2020

Fix #957

Adding nlp_primitives as a core requirement has meant that we now have tensorflow as a core requirement. This is bad because that package is >500MB. Since the text featurizer is still under development, let's kick featuretools and nlp_primitives out of the requirements for now. Will keep in test reqs so that our unit test coverage is maintained.

After this PR is merged, users who run pip install evalml won't be installing tensorflow. Users who want to use the text featurizer can then run pip install nlp_primitives.

Moving forward we do have a long-term plan to have nlp_primitives be an optional requirement if not a core requirement. I'll be filing that separately.

@rwedge FYI since we were just discussing this.

Changes

  • Move featuretools and nlp_primitives out of the requirements which are packaged with evalml releases. This is fine since this is a feature under development, and isn't currently used in automl, the pipelines or elsewhere in our codebase.
  • Add import_or_raise to TextFeaturizer.__init__ to warn any prospective users of missing deps
  • Add importorskip to the text featurizer tests. Should be unnecessary since we have featuretools and nlp_primitives in test-requirements.txt now, meaning we'll continue to run the unit test coverage in the CI jobs.
  • Rename text_featurization.py to text_featurizer.py to match class name

@@ -3,7 +3,6 @@ pandas>=0.25.0
scipy>=1.2.1,<1.5.0 # remove upper limit once tensorflow supports 1.5.x
scikit-learn>=0.21.3,!=0.22,<0.23.0
scikit-optimize
featuretools[nlp_primitives]
Copy link
Collaborator Author

@dsherry dsherry Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to test-requirements.txt

@@ -1,6 +1,5 @@
catboost==0.23.2
cloudpickle==1.5.0
distributed==2.21.0
Copy link
Collaborator Author

@dsherry dsherry Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added by tensorflow from nlp_primitives. This job doesn't install test-requirements.txt so this is no longer needed.

@@ -5,3 +5,5 @@ nbval==0.9.3
graphviz>=0.13
codecov==2.1.0
category_encoders>=2.0.0
featuretools[nlp_primitives]
nlp_primitives
Copy link
Collaborator Author

@dsherry dsherry Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, I needed to have nlp_primitives listed here in addition to the featuretools line above it. Otherwise, one unit test test_featurization_only_text fails on the python 3.8 core-deps job... mysterious 🔮

@@ -4,6 +4,9 @@

from evalml.pipelines.components import TextFeaturizer

pytest.importorskip('featuretools', reason='Skipping test because featuretools not installed')
pytest.importorskip('nlp_primitives', reason='Skipping test because nlp_primitives not installed')
Copy link
Collaborator Author

@dsherry dsherry Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant since we always install test-dependencies.txt when we run the unit tests, but including since this is our treatment for xgboost, catboost, etc

@codecov
Copy link

codecov bot commented Jul 24, 2020

Codecov Report

Merging #976 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #976   +/-   ##
=======================================
  Coverage   99.87%   99.87%           
=======================================
  Files         174      174           
  Lines        8987     8990    +3     
=======================================
+ Hits         8976     8979    +3     
  Misses         11       11           
Impacted Files Coverage Δ
.../components/transformers/preprocessing/__init__.py 100.00% <100.00%> (ø)
...ents/transformers/preprocessing/text_featurizer.py 100.00% <100.00%> (ø)
...alml/tests/component_tests/test_text_featurizer.py 100.00% <100.00%> (ø)
evalml/tests/utils_tests/test_cli_utils.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e591395...f3d6f24. Read the comment docs.

@dsherry dsherry changed the title Make TextFeaturizer additional dependencies optional Make TextFeaturizer dependencies optional (tensorflow) Jul 24, 2020
@dsherry
Copy link
Collaborator Author

dsherry commented Jul 24, 2020

Ok this is ready for review! Sorry, pushed before a meeting and forgot to set to draft while I ironed out the tests. Will re-request review

Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

LGTM :jeremy-approve:

@dsherry
Copy link
Collaborator Author

dsherry commented Jul 24, 2020

Thanks @jeremyliweishih . We need to get you a custom emoji for that 😂

@dsherry dsherry merged commit eedd427 into main Jul 24, 2020
2 checks passed
@dsherry dsherry deleted the ds_957_make_text_featurizer_deps_optional branch Jul 24, 2020
@angela97lin angela97lin mentioned this pull request Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants