docs: correct readme example #50

KameniAlexNea · 2022-12-04T04:15:09Z

The target attribute does not exist in the first code example.

aerdem4 · 2022-12-05T19:24:01Z

README.md

@@ -38,10 +43,10 @@ sample_df = train_df.sample(frac=0.01, random_state=0)
 sample_df.sort_values("AvSigVersion", inplace=True)

 # define the validation scheme
-cv = KFold(n_splits=4, shuffle=False, random_state=0)
+cv = KFold(n_splits=4, shuffle=True, random_state=None)


Why did you change the KFold params?

We can't set shuffle to False while random_state is set to 0, as shown in this capture from my notebook.

Maybe it changed with a new sklearn version. Anyway, you can set the random_state to None but shuffle should stay as False since it is doing a lazy time split.
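For anyone reproducing this, here is a minimal sketch of the behaviour discussed above. It assumes a recent scikit-learn release (to my knowledge 0.24 or later), where combining `shuffle=False` with a non-None `random_state` raises a `ValueError` at construction time; older releases may only warn or accept it silently. The helper `describe_kfold` is a made-up name for illustration.

```python
from sklearn.model_selection import KFold


def describe_kfold(**params):
    """Try to build a KFold with the given params and report what happened."""
    try:
        cv = KFold(**params)
    except ValueError as err:
        return f"rejected: {err}"
    return f"accepted: {cv.get_n_splits()} splits"


# Parameters from the old README line (expected to be rejected on recent versions)
print(describe_kfold(n_splits=4, shuffle=False, random_state=0))

# Parameters suggested in this review: no shuffling, no seed
print(describe_kfold(n_splits=4, shuffle=False, random_state=None))

# A seed is only meaningful when shuffling is enabled
print(describe_kfold(n_splits=4, shuffle=True, random_state=0))
```

The second call is the one that keeps the time-ordered folds, which is why `shuffle` stays `False` and only `random_state` changes.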

Yep, it seems like the API has changed.

Maybe good to add a comment and make the example a bit more general:

sample_df.sort_values("AvSigVersion", inplace=True)
->
sample_df = sample_df.sample(frac=1) # Shuffling rows before CV

and
cv = KFold(n_splits=4, shuffle=False) # No shuffling to keep the same folds for each feature
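A small sketch of what this suggestion does, on a toy frame rather than the competition data. The column names, the seed, and the `reset_index` call are my own additions for repeatability; the suggestion above uses a plain `sample(frac=1)`.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame standing in for sample_df; the column names are made up
sample_df = pd.DataFrame({"feature": range(12), "target": [0, 1] * 6})

# Shuffle the rows once, before cross-validation (seeded only to make the sketch repeatable)
sample_df = sample_df.sample(frac=1, random_state=0).reset_index(drop=True)

# No shuffling inside KFold, so every call to split() yields the same folds
cv = KFold(n_splits=4, shuffle=False)

first_pass = [test_idx.tolist() for _, test_idx in cv.split(sample_df)]
second_pass = [test_idx.tolist() for _, test_idx in cv.split(sample_df)]
print(first_pass == second_pass)  # True
```

Because the folds are fixed, scores computed with and without a given feature are compared on identical splits.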

I thought anyone could search for the competition name and get the data, but we can also share the link to the competition in the readme. We can also add a comment that AvSigVersion is a proxy for time, because the data had no time column.

And a time split validation is maybe not the best example to show in the readme.

Why?

It's not the most common use case.
Ok, now that I have read more of the readme, you are right: that section is called "Example on Kaggle's Microsoft Malware Prediction Competition".
So:

sample_df.sort_values("AvSigVersion", inplace=True) # Sort by time for time split validation
cv = KFold(n_splits=4, shuffle=False) # Don't shuffle to keep the time split validation
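To illustrate why sorting plus an unshuffled KFold acts as a time split, here is a sketch on toy data. The column `time_proxy` is a hypothetical stand-in for AvSigVersion, and the values are invented.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy rows in arbitrary order; "time_proxy" is a hypothetical stand-in for AvSigVersion
toy_df = pd.DataFrame({"time_proxy": [5, 1, 7, 3, 0, 6, 2, 4]})

# Sort by the time proxy so that row position follows time
toy_df = toy_df.sort_values("time_proxy").reset_index(drop=True)

# Without shuffling, each test fold is a contiguous block of the sorted rows
cv = KFold(n_splits=4, shuffle=False)

fold_ranges = []
for _, test_idx in cv.split(toy_df):
    fold = toy_df.loc[test_idx, "time_proxy"]
    fold_ranges.append((int(fold.min()), int(fold.max())))

print(fold_ranges)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```

Note that KFold still trains on the blocks that come after each test block, so this is a loose time split rather than a strict forward-chaining one.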

Maybe good to add a generic example based on sklearn data like Iris.

These comments should make it easier to understand 👍

LOFO is usually more useful for non-random split problems, but such an example could also be nice.

@KameniAlexNea can you please add the 2 comments above that @stephanecollot shared, since you are already updating the readme? Then I can merge it.

I just pushed the changes now.
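A rough idea of what a generic Iris example could look like. This is a hand-rolled leave-one-feature-out loop written with plain scikit-learn, not the lofo package's own API or implementation; the model, the accuracy metric, and the seed are arbitrary choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names

# Iris rows are ordered by class, so shuffle once up front and keep the folds fixed
rng = np.random.default_rng(0)
order = rng.permutation(len(y))
X, y = X[order], y[order]

cv = KFold(n_splits=4, shuffle=False)
model = LogisticRegression(max_iter=1000)


def mean_cv_score(columns):
    """Mean cross-validated accuracy using only the given feature columns."""
    return cross_val_score(model, X[:, columns], y, cv=cv, scoring="accuracy").mean()


all_columns = list(range(X.shape[1]))
baseline = mean_cv_score(all_columns)

# Importance of a feature = score lost when that feature is left out
importance = {}
for i, name in enumerate(feature_names):
    remaining = [c for c in all_columns if c != i]
    importance[name] = baseline - mean_cv_score(remaining)

for name, drop in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {drop:+.4f}")
```

A positive value means the score dropped when the feature was removed; values near zero or below suggest the feature adds little on this split.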

KameniAlexNea added 2 commits December 4, 2022 05:08

add language type in readme code

3e19351

target variable in first code isn't an attribute

6d9520a

KameniAlexNea mentioned this pull request Dec 5, 2022

Running the example in the readme throws errors #51

Closed

aerdem4 reviewed Dec 5, 2022


KameniAlexNea added 2 commits December 7, 2022 08:37

feat: remove shuffle and set state to None

92fcd15

feat: add comments for best understanding

12cccf7

aerdem4 approved these changes Dec 8, 2022


aerdem4 merged commit c735439 into aerdem4:master Dec 8, 2022

KameniAlexNea deleted the alex branch December 8, 2022 18:50


KameniAlexNea commented Dec 4, 2022

aerdem4 Dec 5, 2022

KameniAlexNea Dec 5, 2022

aerdem4 Dec 6, 2022

KameniAlexNea Dec 6, 2022

stephanecollot Dec 6, 2022

aerdem4 Dec 6, 2022 (edited)

stephanecollot Dec 6, 2022

aerdem4 Dec 6, 2022

aerdem4 Dec 8, 2022

KameniAlexNea Dec 8, 2022
