Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename target_dataframe_name parameter to target_dataframe_index #353

Merged
merged 26 commits into from
Dec 12, 2022

Conversation

gsheni
Copy link
Contributor

@gsheni gsheni commented Nov 24, 2022

@gsheni gsheni self-assigned this Nov 24, 2022
@gsheni gsheni marked this pull request as ready for review November 24, 2022 19:36
@gsheni gsheni requested a review from a team November 24, 2022 19:55
@codecov
Copy link

codecov bot commented Nov 24, 2022

Codecov Report

Merging #353 (ae874cb) into main (0b476f1) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #353   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           29        29           
  Lines         1339      1339           
=========================================
  Hits          1339      1339           
Impacted Files Coverage Δ
composeml/conftest.py 100.00% <ø> (ø)
composeml/tests/test_data_slice/test_extension.py 100.00% <ø> (ø)
composeml/tests/test_featuretools.py 100.00% <ø> (ø)
composeml/tests/test_label_maker.py 100.00% <ø> (ø)
composeml/tests/test_label_serialization.py 100.00% <ø> (ø)
composeml/tests/test_label_times.py 100.00% <ø> (ø)
composeml/label_maker.py 100.00% <100.00%> (ø)
composeml/label_times/description.py 100.00% <100.00%> (ø)
composeml/label_times/object.py 100.00% <100.00%> (ø)
...mposeml/tests/test_label_transforms/test_sample.py 100.00% <100.00%> (ø)

@@ -20,11 +20,9 @@ def describe_label_times(label_times):
metadata = label_times.settings
target_column = metadata["label_times"]["target_columns"][0]
target_type = metadata["label_times"]["target_types"][target_column]
target_dataframe_name = metadata["label_times"]["target_dataframe_name"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about what this does and if we should remove this or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look at the LabelTimes class definition, it seems like we have some pretty confusing terminology. I think in one sense target_column_name refers to the column we want to use as the "target" for creating labels (as in, I want to create labels for my customers using customer_id).

Then target_column here seems to refer to the column that is the machine learning target column (the label column, as in, total_spent for each customer).

I'm not sure how best to resolve this off the top of my head, but we should probably only use target in one case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff-hernandez Any idea what to do here?

Copy link
Collaborator

@jeff-hernandez jeff-hernandez Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent is to use the same terminology as featuretools. For example, in the past, we used target_entity to specify which entity to create features for. Similarly, the idea was the use target_entity to specify which entity to create labels for. The reason we put it in the metadata is so that we have the settings for reconstructing the label times if we want to re-create the labels.

Copy link
Collaborator

@jeff-hernandez jeff-hernandez Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using target_column_index instead of target_column_name can help clarify (or target_dataframe_index)

Gaurav Sheni added 2 commits December 12, 2022 14:02
@gsheni gsheni changed the title Rename target_dataframe_name parameter to target_column_name Rename target_dataframe_name parameter to target_dataframe_index Dec 12, 2022
jeff-hernandez
jeff-hernandez previously approved these changes Dec 12, 2022
Copy link
Collaborator

@jeff-hernandez jeff-hernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just had a couple of comments.

composeml/label_times/description.py Show resolved Hide resolved
@gsheni gsheni enabled auto-merge (squash) December 12, 2022 20:26
@gsheni gsheni disabled auto-merge December 12, 2022 20:45
@jeff-hernandez jeff-hernandez self-requested a review December 12, 2022 20:49
@gsheni gsheni merged commit c559039 into main Dec 12, 2022
@gsheni gsheni mentioned this pull request Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename target_dataframe_name parameter to target_dataframe_index
3 participants