New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing coverage in TextFeaturizer #1842
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1842 +/- ##
========================================
+ Coverage 99.9% 100.0% +0.1%
========================================
Files 255 255
Lines 20655 20658 +3
========================================
+ Hits 20633 20650 +17
+ Misses 22 8 -14
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, took me a second to understand what was going on, but cool.
if pipeline_class == LinearPipelineWithTextFeatures: | ||
X = X.set_types(logical_types={'provider': 'NaturalLanguage'}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my edification, is the reason this test covers those lines because the change in logical types forces a derivation of the provenance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea the provenance is only computed if the text featurizer creates features. If you run it on a dataset without text features, the provenance will always be an empty dict. Since the text featurizer will (now) only compute features on columns of logical type NaturalLanguage
, we have to create one for the component to be able to do its thing.
Pull Request Description
With the removal of
text_columns
in #1652, the_get_feature_provenance
method ofTextFeaturizer
andLSA
was no longer being covered because the test dataset doesn't have any text features and the text features were specified withtext_columns
.After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.