
TrainTestSplit random seed #1635

Closed
petterton opened this issue Nov 15, 2018 · 7 comments
Labels: bug (Something isn't working)
petterton commented Nov 15, 2018

I am repeatedly calling TrainTestSplit for a data set (for cross validation) and see that the resulting split is the same every call. In sklearn, the train_test_split function has the possibility of taking a seed for a random number generator as an input. Could this be added also in ML.NET?

@najeeb-kazmi (Member)
Hi @petterton - in ML.NET, the seed is set at the environment level. You can set and change the seed when you create the MLContext object, as in this sample.
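To illustrate, a minimal sketch of setting the environment-level seed (the `seed` parameter is part of the MLContext constructor; the value here is a placeholder):

```csharp
using Microsoft.ML;

// Passing a seed makes the environment's random number generator
// deterministic; omitting it leaves the environment non-deterministic.
var mlContext = new MLContext(seed: 42);
```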

Does that answer your question?


petterton commented Nov 15, 2018

Hi @najeeb-kazmi , I wondered whether that was the case, but changing the seed when creating the MLContext did not change the split for me. This also raises another question: if I call TrainTestSplit multiple times, is every call expected to start from the same seed and produce the same split (which is what I see now), or do I have to recreate the MLContext for every split?

@najeeb-kazmi (Member)
Actually, I was wrong about the seed in MLContext. It does not affect the behavior of TrainTestSplit, whose deterministic behavior is implemented in TrainContextBase here.

@Zruty0 any thoughts on how we can get this in? Currently we use RangeFilter to produce the splits both for TrainTestSplit and for creating the CV folds in the CrossValidateTrain methods. Random splits of the data are a pretty common scenario and are supported by most other toolkits; we should find a way to support them in ML.NET as well.

@petterton if you are trying to get different splits for doing cross validation specifically, you can use the CrossValidate methods in all the training contexts, e.g. MLContext.BinaryClassification.CrossValidate().
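As a sketch of that cross-validation route (the `data` IDataView and `pipeline` estimator are placeholders built elsewhere; the result shape here assumes a recent ML.NET release):

```csharp
// Cross-validate a binary classification pipeline over 5 folds.
// `data` and `pipeline` are assumed to have been created earlier.
var cvResults = mlContext.BinaryClassification.CrossValidate(
    data, pipeline, numberOfFolds: 5);

// Inspect per-fold metrics.
foreach (var fold in cvResults)
    Console.WriteLine(fold.Metrics.Accuracy);
```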

@najeeb-kazmi najeeb-kazmi added the bug Something isn't working label Nov 16, 2018
Zruty0 (Contributor) commented Nov 18, 2018

Yep, it's a bug. We need to make TrainTestSplit take a random seed.

@petterton (Author)

@Zruty0 : Will this be fixed in 0.8? (Not assigned to a milestone yet...)

@petterton (Author)

@Ivanidzo4ka , @Zruty0 : I tested this in v0.9, and it still does not work as I expected. If I don't set a stratificationColumn, the seed works as expected (changing the seed changes the split), but if stratificationColumn is set, changing the seed seems to have no effect.
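For context, later ML.NET releases (1.x) renamed stratificationColumn to samplingKeyColumnName and exposed a per-call seed on the split itself; a sketch assuming those versions (the column name is a placeholder):

```csharp
// Split with both a sampling-key column and an explicit seed.
var split = mlContext.Data.TrainTestSplit(
    dataView,
    testFraction: 0.2,
    samplingKeyColumnName: "GroupId", // placeholder: rows sharing a key land on the same side
    seed: 123);                       // changing the seed changes the split

IDataView trainSet = split.TrainSet;
IDataView testSet = split.TestSet;
```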

dckorben commented Jan 29, 2019

The usage of seeds still seems unclear in v0.9. If the Context seed is provided, you are probably aiming for deterministic outcomes (e.g., for testing); if it isn't, you are probably doing real training. The Context seed doesn't seem to have any effect on the split. In some cases you might need to call split multiple times, in which case you would probably provide a seed. But if you load, split, and train with a null Context seed, shouldn't we expect a different split outcome each time?

@shauheen shauheen added this to the 0119 milestone Feb 6, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Milestone: v0.9 (Done)
6 participants