Fix index bug for TextFeaturizer and LSA #1644

Merged
angela97lin merged 8 commits into main from 1643_text_featurizer_indices on Jan 6, 2021

Contributor

@angela97lin angela97lin commented Jan 5, 2021

Closes #1643

Fixes an index bug: if the original input DataFrame has a custom index, the output ends up filled with NaNs. The original repro doesn't set an index explicitly, but AutoML passes in data whose index reflects the rows selected by our data split.
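
For illustration, this kind of index misalignment can be reproduced with plain pandas. This is only a minimal sketch, not the actual component code: the `features` frame and the `feature_0` column are hypothetical stand-ins for the computed text features, and joining them back with `pd.concat` is an assumption about how the columns get combined.

```python
import numpy as np
import pandas as pd

# Input with a custom index, like the rows a data split might pass to a component.
X = pd.DataFrame({"text": ["red fish", "blue fish", "one fish"]}, index=[10, 20, 30])

# Hypothetical stand-in for computed text features: built from a raw array,
# so the frame gets the default RangeIndex (0, 1, 2) instead of X's index.
raw = np.array([[0.1], [0.2], [0.3]])
features = pd.DataFrame(raw, columns=["feature_0"])

# Assumption: the features are joined back onto X by index label.
# No labels match, so every row ends up with a NaN on one side.
misaligned = pd.concat([X, features], axis=1)

# Building the features with the input's index keeps the rows aligned.
features_fixed = pd.DataFrame(raw, columns=["feature_0"], index=X.index)
aligned = pd.concat([X, features_fixed], axis=1)

print(misaligned)
print(aligned)
```

With a default index on `X` the two frames would line up by accident, which may be why this only shows up once the rows have been split.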

@angela97lin angela97lin self-assigned this Jan 5, 2021

codecov Bot commented Jan 5, 2021

Codecov Report

Merging #1644 (419bf79) into main (2cd11f6) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1644     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      240             
  Lines       18390    18401     +11     
=========================================
+ Hits        18382    18393     +11     
  Misses          8        8             
| Impacted Files | Coverage Δ |
|---|---|
| ...lines/components/transformers/preprocessing/lsa.py | 100.0% <100.0%> (ø) |
| ...ents/transformers/preprocessing/text_featurizer.py | 100.0% <100.0%> (ø) |
| evalml/tests/component_tests/test_lsa.py | 100.0% <100.0%> (ø) |
| ...alml/tests/component_tests/test_text_featurizer.py | 100.0% <100.0%> (ø) |
| evalml/tests/conftest.py | 100.0% <100.0%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2cd11f6...419bf79.

Comment thread evalml/tests/conftest.py
return X, y


@pytest.fixture()
Contributor Author

Moved this here, since LSA and TextFeaturizer both use it :)

Contributor

Good call!

* Added multiclass check to ``InvalidTargetDataCheck`` for two examples per class :pr:`1596`
* Fixes
- * Fix thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines :pr:`1622` :pr:`1626`
+ * Fixed thresholding for pipelines in ``AutoMLSearch`` to only threshold binary classification pipelines :pr:`1622` :pr:`1626`
Contributor Author

Just standardizing :d

@angela97lin angela97lin changed the title Fix indices for TextFeaturizer and LSA Fix index bug for TextFeaturizer and LSA Jan 5, 2021
@angela97lin angela97lin marked this pull request as ready for review January 5, 2021 19:16
@angela97lin angela97lin requested review from a team, ParthivNaresh, chukarsten, dsherry and jeremyliweishih and removed request for a team January 5, 2021 19:17
Contributor

@ParthivNaresh ParthivNaresh left a comment

Looks good!

Contributor

@bchen1116 bchen1116 left a comment

nice!

Contributor

@freddyaboulton freddyaboulton left a comment

Thanks for the fix, @angela97lin!

@angela97lin and I tried to figure out why we're only running into this problem now (the repro on the ticket is just a vanilla use case of automl with text). It must have been introduced after the 0.17.0 release, since the text demo is "failing" on latest but "passing" on stable.

Nothing in our release notes is related to the text featurizer so maybe this has something to do with the latest featuretools release (released two days after 0.17.0). That being said, this fixes the docs locally so let's get it merged!

@angela97lin angela97lin merged commit fb2aa49 into main Jan 6, 2021
@angela97lin angela97lin deleted the 1643_text_featurizer_indices branch January 6, 2021 18:59
@bchen1116 bchen1116 mentioned this pull request Jan 26, 2021

Development

Successfully merging this pull request may close these issues.

Issue running AutoMLSearch with text columns and LSA

5 participants