Skip to content

Conversation

@jeremyliweishih
Copy link
Collaborator

@jeremyliweishih jeremyliweishih commented Nov 9, 2021

Fixes #2914.

Perf tests look good! Almost identical results with a slight blip in init time for one dataset which should just be some noise.
nlp_primatives.html.zip

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #3030 (1c28da8) into main (7b2cb57) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3030     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        312     312             
  Lines      30183   30184      +1     
=======================================
+ Hits       30094   30095      +1     
  Misses        89      89             
Impacted Files Coverage Δ
...derstanding/prediction_explanations/_algorithms.py 99.3% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
...alml/pipelines/components/transformers/__init__.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_utils.py 95.7% <ø> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.0% <ø> (ø)
.../prediction_explanations_tests/test_force_plots.py 100.0% <ø> (ø)
...del_understanding_tests/test_partial_dependence.py 99.4% <ø> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 99.9% <ø> (ø)
.../components/transformers/preprocessing/__init__.py 100.0% <100.0%> (ø)
...rmers/preprocessing/natural_language_featurizer.py 100.0% <100.0%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b2cb57...1c28da8. Read the comment docs.

@jeremyliweishih jeremyliweishih marked this pull request as ready for review November 11, 2021 21:02
Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! It's encouraging to me how simple of a change this is.

I have two main things moving forward:

  • There's a few other places in the documentation that should be updated to add the new primitives: the explanatory part of the text tutorial here as well as the docstring where it lists the primitives (marked in another comment)
  • I'm very interested to see how the performance tests change with this update. I did a bunch of performance testing way back when originally adding this component to determine the best primitives to use, and I know there were some I explicitly removed to improve performance. I hope these will reflect real improvement!

Other than that, just left a couple small nits. Looks great though!

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih Thanks for this! I think this looks good. For completeness we may want to rename TextTransformer to NaturalLanguageTransformer but I don't think that's a hard requirement to close out this issue.

* Added an algorithm to ``DelayedFeatureTransformer`` to select better lags :pr:`3005`
* Added test to ensure pickling pipelines preserves thresholds :pr:`3027`
* Added AutoML function to access ensemble pipeline's input pipelines IDs :pr:`3011`
* Added ``NumWords`` and ``NumCharacters`` primitives to ``TextFeaturizer`` and renamed ``TextFeaturizer` to ``NaturalLanguageFeaturizer`` :pr:`3030`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: Move to next release section.

@jeremyliweishih
Copy link
Collaborator Author

Will update with perf tests soon!

Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Looking forward to seeing the perf test results.

@jeremyliweishih
Copy link
Collaborator Author

@eccabay uploaded the perf test results in the description. Lmk what you think!

@jeremyliweishih jeremyliweishih merged commit 598688e into main Nov 12, 2021
@chukarsten chukarsten mentioned this pull request Nov 29, 2021
@freddyaboulton freddyaboulton deleted the js_2914_add_primitives branch May 13, 2022 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add NumCharacters + NumWords primitives to TextFeaturizer and rename to NaturalLanguageFeaturizer

4 participants