
Different result despite same input #9

Closed
iamhuy opened this issue Mar 23, 2017 · 7 comments

Comments

@iamhuy

iamhuy commented Mar 23, 2017

I tried to create some CRF instances trained with the same training set and the same max_iterations param.

import sklearn_crfsuite

crf = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
crf.fit(X_train, Y_train)

t = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
t.fit(X_train, Y_train)

However, their results are different (I tested both on the same development set with F-measure).
Hope to see your response soon.
Thank you!

@kmike
Contributor

kmike commented Mar 23, 2017

I think this is expected: crfsuite shuffles the dataset for Averaged Perceptron training and uses a global random seed (see here), so the shuffle returns a different result each time.
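As an illustration of why a global seed leads to this behavior, here is an analogy in plain Python. This is not crfsuite's actual shuffling code, just a sketch of the idea:

```python
import random

# Analogy: a fixed global seed makes the whole run reproducible,
# but the RNG state still advances between the two shuffles,
# so the second shuffle differs from the first within one run.
random.seed(0)

data = list(range(10))
first, second = data[:], data[:]
random.shuffle(first)
random.shuffle(second)

print(first == second)  # False: the two "models" see different orderings
# Re-running this script reproduces the same two orderings (global seed).
```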

@iamhuy
Author

iamhuy commented Mar 23, 2017

Thank you!
Does it mean that, for a specific set of hyperparameters, it is necessary to train more than once to find the best model (because the result depends on the run as well)?

@kmike
Contributor

kmike commented Mar 23, 2017

Well, it depends on the goal. If you want to compare hyperparameters, then yes, it could make sense to train with several seeds and take e.g. an average, the best model, or just compute the variance. But are the results really that different across runs?
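A minimal sketch of that repeated-training comparison, assuming a held-out dev set `X_dev`/`Y_dev` and using `flat_f1_score` from `sklearn_crfsuite.metrics`. Since the 'ap' trainer's seed isn't exposed, the sketch simply repeats the fit and aggregates the scores:

```python
import statistics

import sklearn_crfsuite
from sklearn_crfsuite import metrics

# Train the same configuration several times; each fit reshuffles the data
# via crfsuite's global RNG, so the resulting models can differ slightly.
scores = []
for _ in range(5):
    crf = sklearn_crfsuite.CRF(algorithm='ap', max_iterations=5)
    crf.fit(X_train, Y_train)
    Y_pred = crf.predict(X_dev)
    scores.append(metrics.flat_f1_score(Y_dev, Y_pred, average='weighted'))

print('mean F1:', statistics.mean(scores))
print('stdev F1:', statistics.stdev(scores))
```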

@iamhuy
Author

iamhuy commented Mar 23, 2017

No, they're not different across runs.
I mean, if I run the above code in two separate executions, giving crf1, t1 and crf2, t2,
then crf1 == crf2 and t1 == t2, but crf1 != t1.

@severinsimmler

How do I set the random seed?

@huang-xx

huang-xx commented Feb 1, 2021

@iamhuy @severinsimmler Hi, I encountered the same problem, but after I set random_state in the train_test_split function of sklearn.model_selection, the results became consistent.
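For reference, a minimal sketch of fixing the split (the variable names `X`/`y` and the 0.2 test size are assumptions, not from the original comment):

```python
from sklearn.model_selection import train_test_split

# Fixing random_state makes the train/dev split itself reproducible;
# it does not control crfsuite's internal shuffle during 'ap' training.
X_train, X_dev, Y_train, Y_dev = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```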

@UAmsterdam

UAmsterdam commented Feb 12, 2024

I am also getting different results when running in different environments, e.g. the command-line version of CRFsuite vs. the Python version of CRFsuite.

Does anyone here have an idea what's going on?
