-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove text_columns
parameter from LSA and TextFeaturizer components
#1652
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1652 +/- ##
========================================
- Coverage 100.0% 99.9% -0.0%
========================================
Files 255 255
Lines 20621 20571 -50
========================================
- Hits 20613 20549 -64
- Misses 8 22 +14
Continue to review full report at Codecov.
|
@@ -273,7 +273,7 @@ def test_make_pipeline_no_column_names(input_type, problem_type): | |||
def test_make_pipeline_text_columns(input_type, problem_type): | |||
X = pd.DataFrame({"numerical": [1, 2, 3, 1, 2], | |||
"categorical": ["a", "b", "a", "c", "c"], | |||
"text": ["string one", "another", "text for a column", "text string", "hello world"]}) | |||
"text": ["string one", "another", "text for a column, this should be a text column!!", "text string", "hello world"]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this longer so that WW automatically detects its a text column. Could also just use infer_feature_types
heh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin I think this looks good!
evalml/pipelines/components/transformers/preprocessing/text_featurizer.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left a few questions!
evalml/pipelines/components/transformers/preprocessing/text_transformer.py
Show resolved
Hide resolved
lsa.fit(X) | ||
expected_col_names = set(['LSA(4.75)[0]', | ||
'LSA(4.75)[1]', | ||
'LSA(-1)[0]', | ||
'LSA(-1)[1]']) | ||
'LSA(-1.0)[0]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this column label become a float versus the original int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Blocking on #1662 so putting this back as In Progress 😭 |
@angela97lin BTW since we decided yesterday that this is not ready to merge, I'm converting this back to a draft. |
…into 1614_remove_text_columns
Closes #1614.