Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverted the stratify input while maintaining numpy functionality #2

Closed
wants to merge 2 commits into from

Conversation

KensingtonSka
Copy link

Outline of changes:

I have kept my inital alterations to the function estimate_nbins(). Additionally, I have reverted and adjusted the funtion scsplit() to be able to take both pandas dataframes and numpy arrays as inputs while still maintaining the stratify input option. However, I have changed stratify to be an optional input rather than a required one. This is because if you pass both X and y as dataframes, I feel that there is no need to specify y again as the stratifyer.

There are two changes that I have made.
First, I have altered the functions `estimate_nbins()` and `scsplit()` to be able to take both pandas dataframes and numpy arrays as inputs.
The second change I have made is that I have removed the input "stratify" from `scsplit()` and made the inputs "X" and "y" required. The reason I have do this is to make the function behave more like scikit-learn's `train_test_split(X, y)`. This should provide a more seamless integration of the function into other people's projects.

Additionally, I have updated the readme to account for my changes. However, I have not changed the version info.
I have kept my alterations to the functions `estimate_nbins()` while reverting and adjusting `scsplit()` to be able to take both pandas dataframes and numpy arrays as inputs while still maintaining the `stratify` input option. However, I have changed `stratify` to be an optional input rather than a required one.
Comment on lines 133 to 134
X = args[0].drop(stratify.name, axis = 1)
y = args[0][stratify.name]
Copy link
Owner

@DanilZherebtsov DanilZherebtsov Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is going to work?
If the default argument stratify = None, then if the user has not defined this argument - this first X & y definition is going to throw a TypeError.
I think a good practice here would be to catch that exception and print a message that stratify argument is required for single data objects inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants