SVHN - final accuracy #4

Closed
boussad83 opened this issue Jan 31, 2018 · 4 comments

@boussad83

Hi, I ran your TensorFlow code (file train_svhn.py) and the final accuracy was only around 90%. I did not change anything in the code; I ran it as is! Do you have any suggestions for why I do not get the expected 96%? By the way, I ran it on one GPU.

@tarvaina
Contributor

tarvaina commented Feb 2, 2018

Hi, and thanks for letting me know.

You should see around 94-95% accuracy against the validation set using that runner.

I will try to reproduce your result. I don't know what's causing the discrepancy, but I may have made a mistake when I added additional experiments to the code (the consistency_trust, num_logits, and logit_distance_cost hyperparams were not there when I ran the primary SVHN experiments that the results are based on), or when I cleaned up the code before publishing it.

In the meantime, if you'd like to solve this yourself, you can try the version at commit 1a2d40ac0dcff99ad96aa95e721bfa46e05d4cb3. This was before the additional experiments were added, so the code was simpler and may be free of whatever bug is affecting the accuracy.

In addition, experiments/svhn_final_eval.py should have the exact hyperparams that the experiments in the paper used. To make a single run, you can replace the parameters function with something like this:

def parameters():
    # Yield a single hyperparameter combination instead of the full sweep.
    yield {
        'test_phase': True,
        'model_type': 'mean_teacher',
        'n_labeled': 500,
        'n_extra_unlabeled': 0,
        'data_seed': 0
    }

But it is likely that the suspected bug also affects that runner.

@tarvaina
Contributor

tarvaina commented Feb 7, 2018

Hi.

I looked into it and believe it's a hyperparameter issue. The runner uses max_consistency_cost = 100.0 whereas it should be max_consistency_cost = 1.0.
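To illustrate why that cap matters (this is only a sketch, not code from this repository; the helper function and the ramp-up factor are assumptions), the consistency term is scaled by a weight capped at max_consistency_cost, so a cap of 100.0 lets it dwarf the classification term:

# Illustrative sketch only, not this repository's code. The helper name and
# the ramp-up factor are assumptions used to show the effect of the cap.
def total_cost(class_cost, consistency_cost, max_consistency_cost, rampup=1.0):
    # The consistency term is weighted by a coefficient that ramps up during
    # training and is capped at max_consistency_cost.
    consistency_weight = max_consistency_cost * rampup
    return class_cost + consistency_weight * consistency_cost

print(total_cost(0.5, 0.3, max_consistency_cost=100.0))  # 30.5: consistency dominates
print(total_cost(0.5, 0.3, max_consistency_cost=1.0))    # 0.8: terms stay comparable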

However, I wasn't able to replicate the 90% result even with the wrong hyperparameter; I consistently got results of about 93-94%. Maybe you hit a very unfortunate initialization there, or maybe there is some other issue.

Somewhat tangentially, I am going to change the train_svhn.py hyperparameters so that it converges quickly but may not reach quite optimal results. svhn_final_eval.py will continue to contain the hyperparameters used in the paper, which are close to optimal.

@tarvaina
Contributor

tarvaina commented Feb 7, 2018

Two other things to keep in mind:

  1. The numbers in the paper are results against the test set, whereas train_svhn.py by default evaluates against a validation set drawn from the training set. This worsens the results of train_svhn.py for two reasons:
  • the remaining training set is smaller, and
  • the SVHN training set is not as well curated as the test set, so a validation set gives a lower accuracy than the test set would.
  2. The metric we report in the paper is eval/error/ema (for non-mean-teacher models as well), which should give slightly better results than eval/error/1; see the sketch below.
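
As a rough illustration of why eval/error/ema is usually a bit better (a minimal sketch, not this repository's implementation; the function and values below are made up), the teacher weights are an exponential moving average of the student weights, which smooths out noise in the most recent updates:

# Minimal sketch, not this repository's implementation. The EMA teacher
# parameters average over recent student parameters, smoothing update noise.
def ema_update(teacher, student, decay=0.999):
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

teacher = [0.0, 0.0]
for step in range(1, 4):
    student = [0.1 * step, -0.2 * step]  # stand-in for the student's weights
    teacher = ema_update(teacher, student)
print(teacher)  # slowly tracks the student weights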

@tarvaina
Copy link
Contributor

tarvaina commented Feb 7, 2018

I changed the hyperparams of train_svhn.py so that it now converges four times faster: 99cd11f

I am closing this issue as I wasn't able to reproduce the 90% results, and because the runner is now clearly different from the one used in the paper. Try experiments/svhn_final_eval.py to reproduce the paper results.

Please reopen if your results remain bad.

@tarvaina closed this as completed on Feb 7, 2018