Add LSA Component #1022
Not blocking: what do you think of doing this for the naming: `LSA(my_feature, 0)` and `LSA(my_feature, 1)`?
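A minimal sketch of how the proposed names could be generated, assuming two LSA components per text column. `lsa_column_names` is a hypothetical helper for illustration, not code from this PR.

```python
def lsa_column_names(feature_name, n_components=2):
    """Hypothetical helper: name each LSA output column LSA(<feature>, <index>)."""
    return [f"LSA({feature_name}, {i})" for i in range(n_components)]


print(lsa_column_names("my_feature"))
# ['LSA(my_feature, 0)', 'LSA(my_feature, 1)']
```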
I like it! I only kept this formatting to mirror what the primitives' generated column names look like, but I can change this if you'd prefer.
Moved this warning from `__init__` to `fit` to temporarily resolve #1017.
@eccabay why is the `feature_matrix.reindex(X.index)` necessary? I know that's not part of this PR; I'm just poking around for ways to simplify our row/column indexing across the board.
If I recall correctly, it was because the output `feature_matrix` sets its own index, so reindexing helped restore the index that was originally given.
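A toy pandas sketch of what `reindex(X.index)` does, assuming the feature matrix carries the same row labels as `X` but possibly in a different order. Both frames here are made up for illustration; the real feature matrix comes from featuretools.

```python
import pandas as pd

# Original input with a non-default index (illustrative data).
X = pd.DataFrame({"text": ["a", "b", "c"]}, index=[10, 20, 30])

# Stand-in for a generated feature matrix: same row labels, different order.
feature_matrix = pd.DataFrame({"feat": [3.0, 1.0, 2.0]}, index=[30, 10, 20])

# reindex aligns rows to X's index, restoring the input's labels and order.
aligned = feature_matrix.reindex(X.index)
print(aligned)
```

Note that any label in `X.index` missing from the feature matrix would come back as a row of NaN rather than raising, so this only restores alignment when the labels actually match.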
Nit-pick: I feel like this line is covered by `set(X_t.columns) == expected_col_names`, so maybe it's not necessary? (Same with the other tests!)
I thought so as well at first, but this line actually helped me catch a bug yesterday! Since we take the set of `X_t.columns`, duplicate column names won't cause that line to fail. Checking the number of columns explicitly keeps that from slipping through the cracks.
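A small sketch of the failure mode described above, using hypothetical column names and a made-up buggy output that emits one component twice: the set comparison passes, while the length check catches the extra column.

```python
import pandas as pd

# Hypothetical expected names for a two-component LSA output.
expected_col_names = {"LSA(text, 0)", "LSA(text, 1)"}

# Made-up buggy output that accidentally emits component 0 twice.
X_t = pd.DataFrame(
    [[0.1, 0.2, 0.1]],
    columns=["LSA(text, 0)", "LSA(text, 1)", "LSA(text, 0)"],
)

# The duplicate collapses inside the set, so this comparison still holds...
set_check = set(X_t.columns) == expected_col_names
# ...but counting the columns exposes it (3 columns vs. 2 expected names).
count_check = len(X_t.columns) == len(expected_col_names)
print(set_check, count_check)  # True False
```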
Ooo huh, I didn't even know duplicate names were allowed but makes sense! 😊
Yeah, @angela97lin, you can do fancy stuff in pandas. You can end up with a df that has two columns which happen to share a name, although they occupy different positions in the column index.
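One way to get there (an illustration, not necessarily the snippet from the original comment) is concatenating two frames side by side when each has a column with the same name:

```python
import pandas as pd

# Two frames that each have a column named "a".
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"a": [3, 4]})

# Concatenating along columns keeps both, so the name "a" appears twice.
df = pd.concat([left, right], axis=1)

print(list(df.columns))        # ['a', 'a']
print(df.columns.is_unique)    # False
# Label-based access returns both columns; positional access tells them apart.
print(df["a"].shape)           # (2, 2)
print(df.iloc[:, 1].tolist())  # [3, 4]
```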
For these tests, let's do a direct comparison of the column names. This has the added benefit of covering the column name order.
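A sketch of what a direct comparison might look like, with hypothetical column names: comparing `list(X_t.columns)` against an ordered expected list checks the names, the count, and the order in a single assertion.

```python
import pandas as pd

# Hypothetical expected output columns, in order.
expected_col_names = ["LSA(text, 0)", "LSA(text, 1)"]

X_t = pd.DataFrame([[0.1, 0.2]], columns=["LSA(text, 0)", "LSA(text, 1)"])
swapped = X_t[["LSA(text, 1)", "LSA(text, 0)"]]              # same names, other order
duplicated = pd.concat([X_t, X_t[["LSA(text, 0)"]]], axis=1)  # one name repeated

# A list comparison is sensitive to both order and duplicates.
matches = list(X_t.columns) == expected_col_names
swapped_matches = list(swapped.columns) == expected_col_names
duplicated_matches = list(duplicated.columns) == expected_col_names
print(matches, swapped_matches, duplicated_matches)  # True False False
```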
Unfortunately, the column order as outputted by featuretools changes, and as far as I can tell there's no option to fix it. @dsherry would you rather I enforce a column order by sorting, say, alphabetically, or leave this test as is?
Oh, that's good to know. Your call.
I'm going to leave this as is, since enforcing an order makes the test bulkier.
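For reference, the two sorting options raised in this thread could look roughly like this. It is a sketch with hypothetical names and a made-up unstable column order, not what the PR does. `DataFrame.sort_index(axis=1)` orders the columns by label. Alternatively, sorting only inside the test ignores order but, unlike a set, still catches duplicate names.

```python
import pandas as pd

expected_col_names = ["LSA(text, 0)", "LSA(text, 1)"]  # hypothetical names

# Stand-in for an output whose column order is not stable.
X_t = pd.DataFrame([[0.2, 0.1]], columns=["LSA(text, 1)", "LSA(text, 0)"])

# Option 1: enforce an order in the component by sorting columns by label.
X_sorted = X_t.sort_index(axis=1)
print(list(X_sorted.columns))  # ['LSA(text, 0)', 'LSA(text, 1)']

# Option 2: leave the output alone and compare sorted lists in the test.
print(sorted(X_t.columns) == sorted(expected_col_names))  # True
```

One caveat with alphabetical order: once there are ten or more components, `LSA(text, 10)` sorts before `LSA(text, 2)`, so a numeric sort key would be needed to keep components in index order.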