-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: correct readme example #50
Merged
Merged
Changes from 2 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
3e19351
add language type in readme code
KameniAlexNea 6d9520a
target variable in first code isn't an attribute
KameniAlexNea 92fcd15
feat: remove shuffle and set state to None
KameniAlexNea 12cccf7
feat: add comments for best understanding
KameniAlexNea File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change kfold params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't set shuffle to
False
and random_state set to0
as in this capture of my notebook.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it changed with a new sklearn version. Anyway, you can set the random_state to None but shuffle should stay as False since it is doing a lazy time split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it seems like the API has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good to add a comment and make the example a bit more general:
sample_df.sort_values("AvSigVersion", inplace=True)
->
sample_df = sample_df.sample(frac=1) # Shuffling rows before CV
and
cv = KFold(n_splits=4, shuffle=False) # No shuffling to keep the same folds for each feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought anyone could search for the competition name and get the data but we can also share the link to the competition on readme? We can also comment that AvSigVersion is a proxy for time because the data had no time column.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the most common use case.
Ok, now that I read more the readme, you are right that section is called "Example on Kaggle's Microsoft Malware Prediction Competition"
So
Maybe good to add a generic example based on sklearn data like Iris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments should make it easier to understand 👍
LOFO is usually more useful for non-random split problems but such example could also be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KameniAlexNea can you please add these 2 comments above that @stephanecollot shared since you already update the readme? Then I can merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just make a push now