Inferior results on all datasets compared to those reported in the paper #8
Hi, attached is the CIFAR10 log at epoch 50 (the whole log for CIFAR10/100 is attached below): [2019-03-19 16:40:36,527][main_cifar10_bs_1k_32_alpha005_thres95.py][line:385][INFO][rank:0] Epoch: [51][0/60] Time: 3.951 (3.951) Data: 3.921 (3.921) It doesn't require that many epochs to get the reported results. I suggest you check the following configs: the PyTorch version, the CUDA version, the NumPy seed, the PyTorch seed, etc. It's true that the results seem to be really sensitive; we don't know why yet, but they do change a lot with tiny modifications to the configuration, which could be left as future work. A lot of clustering-based methods suffer from the same instability. Please let me know if you are still struggling to reproduce the results; maybe I could help you check. |
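The seed-related configs listed above can be pinned in one place. A minimal sketch (the seed value is arbitrary; the cuDNN flags are an extra source of nondeterminism that is often worth fixing too):

```python
import random

import numpy as np
import torch


def set_seed(seed=0):
    """Fix the Python, NumPy, and PyTorch seeds mentioned above."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN autotuning can pick different kernels between runs,
    # which is another source of run-to-run variation.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(0)
```

Even with identical seeds, results can still drift across PyTorch/CUDA versions, so pinning the library versions matters as much as the seeds.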
All the libraries have the correct versions in my installation, and I have used the exact same seed you mentioned. Such a large variation should not come from randomness alone. Are you yourself able to reproduce your results again now? Are you sure you have put the correct hyperparameters in the config file? What is the parameter alpha in your log file name? |
Yes, we can; we tested it many times before releasing... |
After going through your log files, there are some differences I could observe. I assume "alpha" and "coeff[label]" refer to the same thing, per your yaml files. However, does "coeff[local]" correspond to the hyperparameter beta in Equation 15 of the paper? If so, could you please clarify its intended value: in the paper you set beta to 0.1, while in the yaml files the "coeff[local]" value is set to 0.05? Next, the log file you provided reports a weight_decay 'wd' of 1e-05, while in 'main.py', torch.optim.RMSprop uses the default weight_decay of 0. So it seems something is wrong with either the code or the log files you provided.
|
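The weight-decay mismatch described above is easy to verify: `torch.optim.RMSprop` defaults to `weight_decay=0`, so a wd of 1e-05 only takes effect if it is passed explicitly. A sketch (the model and learning rate are placeholders):

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder model

# Without the keyword argument, weight_decay silently stays at 0.
default_opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)
print(default_opt.defaults["weight_decay"])  # 0

# To match the wd=1e-05 reported in the log, it must be passed explicitly:
opt = torch.optim.RMSprop(model.parameters(), lr=1e-3, weight_decay=1e-5)
print(opt.defaults["weight_decay"])  # 1e-05
```

Printing `optimizer.defaults` (or `optimizer.param_groups[0]`) at startup is a quick way to confirm the hyperparameters actually in use match the log.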
I also got a different result. I am using newer versions of PyTorch (1.3) and Keras (2.3.1), but my numbers seem similar to Kartik's. Also, training is slow: each epoch took me around 1 hour on a single GeForce RTX 2080 Ti. Besides, I changed the Keras backend to "channels_first" but still got the warning; could that be the reason the numbers don't match? Thank you. Some log for CIFAR10: [2020-04-16 07:52:12,916][main.py][line:276][INFO][rank:0] Epoch: [10/200] ARI against ground truth label: 0.057 |
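For reference, the backend image format can be pinned via the standard Keras config file rather than in code, which rules out it being overwritten at import time. A sketch using only the stock `~/.keras/keras.json` mechanism (adjust the path if you keep configs elsewhere):

```python
import json
import os

# Keras reads ~/.keras/keras.json at import time; setting
# "image_data_format" there applies to every run.
config_path = os.path.expanduser("~/.keras/keras.json")

config = {}
if os.path.exists(config_path):
    with open(config_path) as f:
        config = json.load(f)

# Note the valid value is "channels_first", not "channel_first".
config["image_data_format"] = "channels_first"

os.makedirs(os.path.dirname(config_path), exist_ok=True)
with open(config_path, "w") as f:
    json.dump(config, f, indent=4)

print(config["image_data_format"])
```

You can confirm the setting took effect by checking `keras.backend.image_data_format()` after importing Keras.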
hey guys @kartikgupta-at-anu @sumo8291 @TsungWeiTsai |
@kartikgupta-at-anu Most clustering methods suffer from this problem of instability and large variation in performance, which is one of their defects. They are commonly extremely sensitive to initialization and hyperparameters, especially on smaller datasets like CIFAR: the accuracy can vary a lot if part of the samples is misclassified because of a bad initialization. |
@TsungWeiTsai Hi, I checked my log and found that we didn't encounter this warning before. I am not sure, but it could be the reason. I think you could try setting a breakpoint and checking whether the images are augmented correctly (for the Keras dataloader). |
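The breakpoint check suggested above can be wrapped in a small assertion; a sketch (the helper name and the assumption that batches come out as NumPy arrays in (N, C, H, W) order are mine, not from the repo):

```python
import numpy as np


def check_batch_format(batch, expected_channels=3):
    """Sanity-check that an image batch is channels-first (N, C, H, W).

    `batch` is assumed to be a NumPy array yielded by the (hypothetical)
    Keras dataloader discussed above.
    """
    assert batch.ndim == 4, "expected a 4D batch, got shape %s" % (batch.shape,)
    n, c, h, w = batch.shape
    assert c == expected_channels, (
        "channel axis is %d, not %d; the backend may still be channels_last"
        % (c, expected_channels))
    return batch.shape


# Example with a fake CIFAR-10-shaped batch:
fake = np.zeros((128, 3, 32, 32), dtype=np.float32)
print(check_batch_format(fake))
```

Running this on one real batch right after the augmentation step would quickly reveal whether the `channels_first` warning corresponds to incorrectly laid-out data.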
@Cory-M Since your latest log files seem to have the correct results, this seems related to some library version mismatch. I have tried using the exact same versions of the libraries. It would be best if you could provide a copy of "pip list" so that I can match the exact versions of all the other libraries you have not mentioned. Also, let me know which CUDA and cuDNN versions you are using. Currently I am using Python 3.6.5, torch 0.4.1, and keras 2.0.2. For example, look at my pip list: Another thing you could do is upload the data directory, at least for CIFAR10/CIFAR100, to Google Drive/Dropbox so that we can copy the exact same setup to reproduce. |
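Capturing the full environment in one file makes this kind of comparison easier to share than a screenshot; a sketch (the file name "environment.txt" is just a suggestion, and the guarded lines simply skip tools that are not installed):

```shell
# Record the exact environment for others to reproduce.
python3 --version > environment.txt 2>&1
python3 -m pip list >> environment.txt
# CUDA toolchain version, if nvcc is on the PATH:
nvcc --version >> environment.txt 2>&1 || true
# PyTorch and its bundled CUDA version, if torch is installed:
python3 -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)" >> environment.txt 2>&1 || true
cat environment.txt
```

Attaching the resulting file to the issue removes any ambiguity about which versions were actually used.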
@TsungWeiTsai This is the code of
Note that the default value of |
Hey @kartikgupta-at-anu , we uploaded the data to:
Here is the pip list: Hope that helps. |
I tried with the exact same set of libraries and your dataset, but I still could not reproduce the results. If anyone other than @Cory-M manages to reproduce them, please let me know. My pip list: |
I have been trying to reproduce the results reported in the paper using the same libraries you mentioned and the experiment configs you provided, but the results I get on a GTX 1080 Ti are completely different. Also, you mentioned that training takes only 19 hours, but it has taken me around 3 days to train on CIFAR100 for just 100 epochs, and it is still training.
Are these config files correct? Has anybody else been able to reproduce the results?
Below are the results I get:
STL10--->
[2020-04-12 17:56:34,340][main.py][line:272][INFO][rank:0] Epoch: [199/200] ARI against ground truth label: 0.182
[2020-04-12 17:56:34,353][main.py][line:273][INFO][rank:0] Epoch: [199/200] NMI against ground truth label: 0.296
[2020-04-12 17:56:34,372][main.py][line:274][INFO][rank:0] Epoch: [199/200] ACC against ground truth label: 0.368
CIFAR10 (training still not finished in 3 days)--->
[2020-04-13 03:57:40,966][main.py][line:272][INFO][rank:0] Epoch: [106/200] ARI against ground truth label: 0.305
[2020-04-13 03:57:40,968][main.py][line:273][INFO][rank:0] Epoch: [106/200] NMI against ground truth label: 0.407
[2020-04-13 03:57:40,968][main.py][line:274][INFO][rank:0] Epoch: [106/200] ACC against ground truth label: 0.463
CIFAR100 (training still not finished in 3 days)--->
[2020-04-13 03:48:53,305][main.py][line:272][INFO][rank:0] Epoch: [105/200] ARI against ground truth label: 0.169
[2020-04-13 03:48:53,307][main.py][line:273][INFO][rank:0] Epoch: [105/200] NMI against ground truth label: 0.282
[2020-04-13 03:48:53,308][main.py][line:274][INFO][rank:0] Epoch: [105/200] ACC against ground truth label: 0.308
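For anyone comparing numbers against the logs above, ARI and NMI come straight from scikit-learn, while clustering ACC needs Hungarian matching between cluster ids and labels; a sketch (the helper name is mine, and this is the standard best-match accuracy, not necessarily the repo's exact implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy: align cluster ids to labels via Hungarian matching."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    d = max(y_pred.max(), y_true.max()) + 1
    # cost[p, t] counts samples with predicted cluster p and true label t.
    cost = np.zeros((d, d), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    # Maximize matched counts by minimizing the negated cost matrix.
    row, col = linear_sum_assignment(-cost)
    return cost[row, col].sum() / y_true.size


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]  # same partition, permuted cluster ids
print(adjusted_rand_score(y_true, y_pred))           # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
print(clustering_accuracy(y_true, y_pred))           # 1.0
```

All three metrics are invariant to permuting cluster ids, so a "wrong-looking" cluster assignment with the right partition still scores perfectly; low values in the logs indicate a genuinely different partition, not a labeling artifact.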