Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename TextFeaturizer to NaturalLanguageFeaturizer #3030

Merged
merged 14 commits into from Nov 12, 2021

Conversation

jeremyliweishih
Copy link
Contributor

@jeremyliweishih jeremyliweishih commented Nov 9, 2021

Fixes #2914.

Perf tests look good! Almost identical results with a slight blip in init time for one dataset which should just be some noise.
nlp_primatives.html.zip

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #3030 (1c28da8) into main (7b2cb57) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3030     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        312     312             
  Lines      30183   30184      +1     
=======================================
+ Hits       30094   30095      +1     
  Misses        89      89             
Impacted Files Coverage Δ
...derstanding/prediction_explanations/_algorithms.py 99.3% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
...alml/pipelines/components/transformers/__init__.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_utils.py 95.7% <ø> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.0% <ø> (ø)
.../prediction_explanations_tests/test_force_plots.py 100.0% <ø> (ø)
...del_understanding_tests/test_partial_dependence.py 99.4% <ø> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 99.9% <ø> (ø)
.../components/transformers/preprocessing/__init__.py 100.0% <100.0%> (ø)
...rmers/preprocessing/natural_language_featurizer.py 100.0% <100.0%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b2cb57...1c28da8. Read the comment docs.

@jeremyliweishih jeremyliweishih marked this pull request as ready for review November 11, 2021 21:02
Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! It's encouraging to me how simple of a change this is.

I have two main things moving forward:

  • There's a few other places in the documentation that should be updated to add the new primitives: the explanatory part of the text tutorial here as well as the docstring where it lists the primitives (marked in another comment)
  • I'm very interested to see how the performance tests change with this update. I did a bunch of performance testing way back when originally adding this component to determine the best primitives to use, and I know there were some I explicitly removed to improve performance. I hope these will reflect real improvement!

Other than that, just left a couple small nits. Looks great though!

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih Thanks for this! I think this looks good. For completeness we may want to rename TextTransformer to NaturalLanguageTransformer but I don't think that's a hard requirement to close out this issue.

@@ -22,6 +22,7 @@ Release Notes
* Added an algorithm to ``DelayedFeatureTransformer`` to select better lags :pr:`3005`
* Added test to ensure pickling pipelines preserves thresholds :pr:`3027`
* Added AutoML function to access ensemble pipeline's input pipelines IDs :pr:`3011`
* Added ``NumWords`` and ``NumCharacters`` primitives to ``TextFeaturizer`` and renamed ``TextFeaturizer` to ``NaturalLanguageFeaturizer`` :pr:`3030`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: Move to next release section.

@jeremyliweishih
Copy link
Contributor Author

Will update with perf tests soon!

Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Looking forward to seeing the perf test results.

@jeremyliweishih
Copy link
Contributor Author

@eccabay uploaded the perf test results in the description. Lmk what you think!

@jeremyliweishih jeremyliweishih merged commit 598688e into main Nov 12, 2021
@chukarsten chukarsten mentioned this pull request Nov 29, 2021
@freddyaboulton freddyaboulton deleted the js_2914_add_primitives branch May 13, 2022 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add NumCharacters + NumWords primitives to TextFeaturizer and rename to NaturalLanguageFeaturizer
3 participants