Fixing coverage in TextFeaturizer #1842

Merged
merged 2 commits into from Feb 12, 2021

Contributor

@freddyaboulton freddyaboulton commented Feb 12, 2021

Pull Request Description

Since text_columns was removed in #1652, the _get_feature_provenance methods of TextFeaturizer and LSA were no longer covered by the tests: the test dataset has no text features of its own, and the test had previously relied on text_columns to specify them.


codecov bot commented Feb 12, 2021

Codecov Report

Merging #1842 (4916f3a) into main (0fe8bdd) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff            @@
##            main    #1842     +/-   ##
========================================
+ Coverage   99.9%   100.0%   +0.1%     
========================================
  Files        255      255             
  Lines      20655    20658      +3     
========================================
+ Hits       20633    20650     +17     
+ Misses        22        8     -14     

Impacted Files                                           Coverage Δ
...understanding_tests/test_permutation_importance.py    100.0% <100.0%> (ø)
...lines/components/transformers/preprocessing/lsa.py    100.0% <0.0%> (+2.5%) ⬆️
...ents/transformers/preprocessing/text_featurizer.py    100.0% <0.0%> (+17.9%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@chukarsten chukarsten left a comment

Cool, took me a second to understand what was going on, but cool.

Comment on lines +155 to +157
if pipeline_class == LinearPipelineWithTextFeatures:
    X = X.set_types(logical_types={'provider': 'NaturalLanguage'})

Contributor

Just for my edification, is the reason this test covers those lines because the change in logical types forces a derivation of the provenance?

Contributor Author

Yeah, the provenance is only computed if the text featurizer actually creates features. If you run it on a dataset without text features, the provenance will always be an empty dict. Since the text featurizer now only computes features on columns of logical type NaturalLanguage, we have to create one for the component to be able to do its thing.
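
To make the reply above concrete, here is a rough, self-contained sketch of the behavior it describes. It is not evalml's implementation: `ToyTextFeaturizer`, its method names, the `LENGTH(...)` feature name, and the plain-dict "logical types" are all hypothetical stand-ins, assumed only for illustration. The point is that the provenance branch is only exercised when at least one column is typed as NaturalLanguage.

```python
from typing import Dict, List


class ToyTextFeaturizer:
    """Hypothetical stand-in for a text featurizer (NOT evalml's class)."""

    def __init__(self) -> None:
        self._provenance: Dict[str, List[str]] = {}

    def fit_transform(self, rows: List[dict], logical_types: Dict[str, str]) -> List[dict]:
        # Only columns typed as NaturalLanguage are featurized.
        text_columns = [c for c, t in logical_types.items() if t == "NaturalLanguage"]
        out = [dict(row) for row in rows]
        self._provenance = {}
        for col in text_columns:
            # Assumed toy feature: replace the text column with its length.
            new_name = f"LENGTH({col})"
            for row in out:
                row[new_name] = len(str(row.pop(col)))
            # This line is the "provenance" code that went uncovered
            # when no column had the NaturalLanguage type.
            self._provenance[col] = [new_name]
        return out

    def get_feature_provenance(self) -> Dict[str, List[str]]:
        return dict(self._provenance)


rows = [{"provider": "amex card", "amount": 10.0},
        {"provider": "visa", "amount": 3.5}]

# No NaturalLanguage column: nothing is created, provenance stays empty.
no_text = ToyTextFeaturizer()
no_text.fit_transform(rows, {"provider": "Categorical", "amount": "Double"})
print(no_text.get_feature_provenance())  # {}

# Retyping 'provider' as NaturalLanguage makes the component do its thing.
with_text = ToyTextFeaturizer()
transformed = with_text.fit_transform(rows, {"provider": "NaturalLanguage", "amount": "Double"})
print(with_text.get_feature_provenance())  # {'provider': ['LENGTH(provider)']}
```

In this toy version, a test that never retypes a column only ever hits the empty-dict path, which mirrors why the PR sets the 'provider' column to NaturalLanguage for the text-features pipeline.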

@freddyaboulton freddyaboulton merged commit cc822c7 into main Feb 12, 2021
@freddyaboulton freddyaboulton deleted the fix-coverage-in-text-featurizer branch February 12, 2021 20:45
@chukarsten chukarsten mentioned this pull request Feb 23, 2021
@dsherry dsherry mentioned this pull request Mar 10, 2021