Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors Data Slice Generator #150

Merged
merged 248 commits into from Jul 21, 2020
Merged

Conversation

jeff-hernandez
Copy link
Collaborator

@jeff-hernandez jeff-hernandez commented Jul 7, 2020

The data slice generator was refactored to improve design and structure. A summary of the changes is listed below:

  • A data slice offset class was created for better handling of integer-location and time based offsets.
  • A data slice generator class was created to loosely couple the generator with the label maker.
  • Index based data slicing was structured as a pandas extension for general utility.
  • Attributes of the data slice context were renamed for clearer reference.
  • Test cases were added and updated for the generator, extension, and offsets.

@jeff-hernandez jeff-hernandez marked this pull request as ready for review July 14, 2020 21:50
Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other comments:

  • Is the label_type parameter needed in LabelMaker now? It doesn't seem to be used anywhere that I saw
  • You might consider having more specific tests for the individual methods in DateSliceOffset and DataSliceExtension. Based on the test coverage report it looks like all the code is being tested, but it might be helpful to have more specific tests for these methods in test_extension.py and test_offset.py rather than testing the methods through higher level tests in other files (which I think is what is happening).

composeml/data_slice/extension.py Outdated Show resolved Hide resolved
composeml/label_maker.py Outdated Show resolved Hide resolved
composeml/data_slice/generator.py Outdated Show resolved Hide resolved
composeml/data_slice/offset.py Outdated Show resolved Hide resolved
composeml/tests/test_data_slice/test_extension.py Outdated Show resolved Hide resolved
@jeff-hernandez
Copy link
Collaborator Author

Is the label_type parameter needed in LabelMaker now?

No, the parameter has been removed.

@jeff-hernandez
Copy link
Collaborator Author

Thanks for the test suggestion. I think it can be helpful unless it becomes too redundant. I try to test cases that focus on different ways a user can interact with the public methods and test for the expected outcomes.

@jeff-hernandez
Copy link
Collaborator Author

@thehomebrewnerd thanks for the initial review and great feedback! The PR has been updated and is ready for another review.

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure I fully understand all the logic that is happening in this code, but I don't see any other issues at this point. Might want to get a second opinion before you finalize though.

Copy link
Contributor

@frances-h frances-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@jeff-hernandez jeff-hernandez merged commit 26cf588 into master Jul 21, 2020
@jeff-hernandez jeff-hernandez mentioned this pull request Aug 28, 2020
@jeff-hernandez jeff-hernandez deleted the refactors_data_slice_generator branch September 1, 2020 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants