-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change StratifiedShuffleSplit to ttrain_test_split #112
Conversation
Hmmm... it's saying that no changes were made. Did you update from |
In hurry, forgot to commit! Should be good now. |
Thanks! Did you verify that it returns the same splits? If not, we'll have to verify that before merging. |
Just checked and, by default, X_train, X_test, y_train, y_test = train_test_split(input_data.drop('class', axis=1).values,
input_data['class'].values,
train_size=0.75, test_size=0.25,
random_state=RANDOM_STATE,
stratify=input_data['class'].values) Please make that change and let's see if that passes on Travis-CI. If it does, we'll merge away! |
Ok..sure. |
I have checked the splits, and the split ratio is same for both |
If you set the |
@@ -214,10 +214,10 @@ def fit(self, features, classes): | |||
np.random.shuffle(data_columns) | |||
training_testing_data = training_testing_data[data_columns] | |||
|
|||
training_indices, testing_indices = next(iter(StratifiedShuffleSplit(training_testing_data['class'].values, | |||
n_iter=1, | |||
training_indices, testing_indices = train_test_split(training_testing_data.index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right. The call needs to look something like:
(training_features, testing_features,
training_labels, testing_labels) = train_test_split(input_data.drop('class', axis=1).values,
input_data['class'].values,
train_size=0.75, test_size=0.25,
random_state=RANDOM_STATE,
stratify=input_data['class'].values)
Have you tested this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have tested it on IRIS and MNIST dataset. Works the same. We can also do it the way you have pointed out, but using training_testing_data.index
to get training_indices, testing_indices
is in line with rest of our code format.
Ok, got that. Thanks. |
Understood. Alright, looks good to merge. Thanks again! :-) |
Change StratifiedShuffleSplit to ttrain_test_split
Addresses #99