
Resume training with my own dataset: accuracy problem #105

Closed
johnSmith1990 opened this issue Oct 27, 2019 · 8 comments

@johnSmith1990

Thanks. My goal is to resume training with my own dataset. It contains only one special character plus 0-9 and a-z (38 characters in total). I have 316,000 training images and 144,000 validation images. During training, this line of main.py: print("correct / total: %d / %d, " % (n_correct, n_total)) prints n_total = 64,000. Why 64,000? Neither my training set nor my validation set contains 64,000 images. Another issue: accuracy during training reaches 99%, but when I load the trained model and test some images with demo.py, accuracy is 0 and all predictions are wrong. I resumed training from your demo.pth, but the results of demo.pth itself are actually much better than those of the newly trained models.

Can you help me figure out what went wrong? Thanks.

@johnSmith1990
Author

johnSmith1990 commented Oct 27, 2019

Update:

Based on closed issues, I edited these lines (a consolidated sketch follows the list):

  1. change parser.add_argument('--alphabet', type=str, default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:$')
    to parser.add_argument('--alphabet', type=str, default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:-:$')

  2. change parser.add_argument('--MORAN', default='', help="path to model (to continue training)")
    to parser.add_argument('--MORAN', default='demo.pth', help="path to model (to continue training)")

  3. pass the new alphabet to lmdbDataset() in dataset.py.

  4. change train_nips_dataset = dataset.lmdbDataset(root=opt.train_nips, transform=dataset.resizeNormalize((opt.imgW, opt.imgH)), reverse=opt.BidirDecoder)
    to train_nips_dataset = dataset.lmdbDataset(root=opt.train_nips, alphabet=opt.alphabet.split(opt.sep), transform=dataset.resizeNormalize((opt.imgW, opt.imgH)), reverse=opt.BidirDecoder)

  5. change MORAN.load_state_dict(MORAN_state_dict_rename, strict=True)
    to MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)
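
Consolidated, the argparse side of these edits (items 1-3) amounts to something like the self-contained sketch below. The --sep flag with default ':' is my assumption, inferred from the opt.sep reference in item 4; I have not verified it against the repo.

```python
import argparse

parser = argparse.ArgumentParser()
# item 1: alphabet extended with '-' (38 entries in total)
parser.add_argument('--alphabet', type=str,
                    default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:-:$')
# item 2: resume from the released checkpoint
parser.add_argument('--MORAN', default='demo.pth',
                    help="path to model (to continue training)")
# assumed flag: the separator used to split the alphabet string
parser.add_argument('--sep', type=str, default=':')
opt = parser.parse_args([])

# items 3-4: this list is what gets handed to lmdbDataset(alphabet=...)
print(opt.alphabet.split(opt.sep))  # 38 entries, '-' now included
```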

However, these edits produce a size-mismatch ("don't match") error in these lines:

    for k, v in state_dict.items():
        name = k.replace("module.", "")  # remove 'module.' prefix
        MORAN_state_dict_rename[name] = v
    MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)

How can I correct this?
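
One way to see exactly which parameters fail to match is to diff the checkpoint against the freshly constructed model. This is a generic PyTorch sketch, not code from this repo; report_mismatches is a hypothetical helper name.

```python
import torch

def report_mismatches(model, checkpoint_path):
    """Print checkpoint entries that a plain load_state_dict would reject."""
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    model_state = model.state_dict()
    for k, v in checkpoint.items():
        name = k.replace("module.", "")  # strip DataParallel's 'module.' prefix
        if name not in model_state:
            print("not in model:", name)
        elif model_state[name].shape != v.shape:
            # Layers whose shape changed (e.g. the final classifier after the
            # alphabet grew from 37 to 38 entries) show up here.
            print("shape mismatch:", name, tuple(v.shape), "->",
                  tuple(model_state[name].shape))

# usage in main.py, before calling load_state_dict:
# report_mismatches(MORAN, opt.MORAN)
```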

@johnSmith1990
Author

johnSmith1990 commented Oct 28, 2019

Update:
Because of your comment ("Don't load the parameters of the attention module"), I commented out these lines:

    for k, v in state_dict.items():
        name = k.replace("module.", "")  # remove 'module.' prefix
        MORAN_state_dict_rename[name] = v

  1. Is this correct?
  2. Does it resume training from demo.pth? If yes, why is the accuracy only 0.003 after one epoch?
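
A likely explanation for the 0.003 accuracy (my reading, not confirmed in this thread): with the whole rename loop commented out, MORAN_state_dict_rename stays empty, so load_state_dict(..., strict=False) silently loads nothing and training restarts from random initialization. A minimal demonstration that an empty state dict with strict=False is a no-op:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
before = layer.weight.clone()

# strict=False suppresses the missing-keys error, so this "succeeds"
# without copying any parameters at all.
layer.load_state_dict({}, strict=False)
assert torch.equal(layer.weight, before)  # weights unchanged: nothing was loaded
```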

@johnSmith1990
Author

After training finished, the accuracy of demo.pth was still better than that of the newly saved models.
Now I have two strategies:

  1. combine my data with your data and train from scratch.
  2. remove the special characters and resume training from demo.pth.

What is your opinion?

@Canjie-Luo
Owner

Thanks for your attention.
As you added only one character "-" to the alphabet, you can load all the parameters in demo.pth except the last fully connected layer of the attentional decoder. To drop certain layers, you can simply use if 'XXX' in k: continue together with MORAN.load_state_dict(MORAN_state_dict_rename, strict=False). Please refer to the official PyTorch documentation.
Make sure that the output size of the network matches your prepared data. Load the pre-trained parameters, decrease the learning rate (maybe to 0.01 or 0.001), and fine-tune the MORAN.
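
A sketch of the loading strategy described above. The substring 'fc' is a placeholder for the actual name of the decoder's last fully connected layer (inspect MORAN.state_dict().keys() for the real name); MORAN and opt are the objects built in main.py, and the optimizer choice is illustrative, not the repo's.

```python
import torch

state_dict = torch.load(opt.MORAN, map_location='cpu')  # opt.MORAN: checkpoint path

MORAN_state_dict_rename = {}
for k, v in state_dict.items():
    name = k.replace("module.", "")  # remove 'module.' prefix
    if 'fc' in name:   # placeholder: the decoder's last FC layer,
        continue       # whose output size changed with the new alphabet
    MORAN_state_dict_rename[name] = v

# strict=False lets the skipped layer keep its fresh random initialization
MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)

# fine-tune with a reduced learning rate, as suggested above
optimizer = torch.optim.Adam(MORAN.parameters(), lr=0.001)
```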

@johnSmith1990
Author

OK, thanks.
During training, the model is saved at two different points. Which one should we use? Trial and error?

@Canjie-Luo
Owner

Please modify the strategy to evaluate and save your model:

MORAN_v2/main.py, lines 241 to 245 in 2cd40c4:

    if acc_tmp > acc:
        acc = acc_tmp
        torch.save(MORAN.state_dict(), '{0}/{1}_{2}.pth'.format(
            opt.experiment, i, str(acc)[:6]))
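
One way to remove the ambiguity about which saved file to use is to additionally write the best checkpoint under a fixed name. This is a sketch extending the snippet above; acc, acc_tmp, i and opt.experiment come from main.py as quoted.

```python
if acc_tmp > acc:
    acc = acc_tmp
    # the per-iteration snapshot main.py already writes...
    torch.save(MORAN.state_dict(), '{0}/{1}_{2}.pth'.format(
        opt.experiment, i, str(acc)[:6]))
    # ...plus a stable "best" file, so demo.py can always point at the
    # highest-accuracy model without guessing among the saved snapshots
    torch.save(MORAN.state_dict(), '{0}/best_model.pth'.format(opt.experiment))
```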

@johnSmith1990
Author

Thank you very much!

@johnSmith1990
Author

johnSmith1990 commented Nov 1, 2019

My problem is solved and I got better results. Thanks for your support!
