
Resume training with my own dataset: accuracy problem #105

Closed
johnSmith1990 opened this issue Oct 27, 2019 · 8 comments

@johnSmith1990

Thanks. My goal is to resume training with my own dataset. It contains only one special character plus 0-9 and a-z (38 characters in total). I have 316,000 training images and 144,000 validation images. During training, this line of main.py: print("correct / total: %d / %d, " % (n_correct, n_total)) prints n_total = 64,000. Why 64,000? Neither my training set nor my validation set contains 64,000 images. Another issue: accuracy during training reaches 99%, but when I load the trained model and test some images with demo.py, accuracy is 0 and all predictions are wrong. I resumed training from your demo.pth, but the results of demo.pth itself are actually much better than those of the newly trained models.

Can you help me figure out what went wrong? Thanks.

@johnSmith1990
Author

johnSmith1990 commented Oct 27, 2019

Update:

Based on closed issues, I edited these lines (a consolidated sketch follows the list):

  1. change parser.add_argument('--alphabet', type=str, default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:$')
    to parser.add_argument('--alphabet', type=str, default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:-:$')

  2. change parser.add_argument('--MORAN', default='', help="path to model (to continue training)")
    to parser.add_argument('--MORAN', default='demo.pth', help="path to model (to continue training)")

  3. pass the new alphabet to lmdbDataset() in dataset.py.

  4. change train_nips_dataset = dataset.lmdbDataset(root=opt.train_nips, transform=dataset.resizeNormalize((opt.imgW, opt.imgH)), reverse=opt.BidirDecoder)
    to train_nips_dataset = dataset.lmdbDataset(root=opt.train_nips, alphabet=opt.alphabet.split(opt.sep), transform=dataset.resizeNormalize((opt.imgW, opt.imgH)), reverse=opt.BidirDecoder)

  5. change MORAN.load_state_dict(MORAN_state_dict_rename, strict=True)
    to MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)
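
Consolidated, the argparse side of these edits (items 1-3) amounts to something like the self-contained sketch below. The --sep flag with default ':' is my assumption, inferred from the opt.sep reference in item 4; I have not verified it against the repo.

```python
import argparse

parser = argparse.ArgumentParser()
# item 1: alphabet extended with '-' (38 entries in total)
parser.add_argument('--alphabet', type=str,
                    default='0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:g:h:i:j:k:l:m:n:o:p:q:r:s:t:u:v:w:x:y:z:-:$')
# item 2: resume from the released checkpoint
parser.add_argument('--MORAN', default='demo.pth',
                    help="path to model (to continue training)")
# assumed flag: the separator used to split the alphabet string
parser.add_argument('--sep', type=str, default=':')
opt = parser.parse_args([])

# items 3-4: this list is what gets handed to lmdbDataset(alphabet=...)
print(opt.alphabet.split(opt.sep))  # 38 entries, '-' now included
```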

However, these edits produce a size-mismatch ("don't match") error in these lines:

    for k, v in state_dict.items():
        name = k.replace("module.", "")  # remove 'module.' prefix
        MORAN_state_dict_rename[name] = v
    MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)

How can I correct this?
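
One way to see exactly which parameters fail to match is to diff the checkpoint against the freshly constructed model. This is a generic PyTorch sketch, not code from this repo; report_mismatches is a hypothetical helper name.

```python
import torch

def report_mismatches(model, checkpoint_path):
    """Print checkpoint entries that a plain load_state_dict would reject."""
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    model_state = model.state_dict()
    for k, v in checkpoint.items():
        name = k.replace("module.", "")  # strip DataParallel's 'module.' prefix
        if name not in model_state:
            print("not in model:", name)
        elif model_state[name].shape != v.shape:
            # Layers whose shape changed (e.g. the final classifier after the
            # alphabet grew from 37 to 38 entries) show up here.
            print("shape mismatch:", name, tuple(v.shape), "->",
                  tuple(model_state[name].shape))

# usage in main.py, before calling load_state_dict:
# report_mismatches(MORAN, opt.MORAN)
```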

@johnSmith1990
Author

johnSmith1990 commented Oct 28, 2019

Update:
Because of your comment ("Don't load the parameters of the attention module"), I commented out these lines:

    for k, v in state_dict.items():
        name = k.replace("module.", "")  # remove 'module.' prefix
        MORAN_state_dict_rename[name] = v

  1. Is this correct?
  2. Does it resume training from demo.pth? If yes, why is the accuracy only 0.003 after one epoch?
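
A likely explanation for the 0.003 accuracy (my reading, not confirmed in this thread): with the whole rename loop commented out, MORAN_state_dict_rename stays empty, so load_state_dict(..., strict=False) silently loads nothing and training restarts from random initialization. A minimal demonstration that an empty state dict with strict=False is a no-op:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
before = layer.weight.clone()

# strict=False suppresses the missing-keys error, so this "succeeds"
# without copying any parameters at all.
layer.load_state_dict({}, strict=False)
assert torch.equal(layer.weight, before)  # weights unchanged: nothing was loaded
```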

@johnSmith1990
Author

After training finished, the accuracy of demo.pth was still better than that of the newly saved models.
Now I have two strategies:

  1. combine my data with your data and train from scratch.
  2. remove the special characters and resume training from demo.pth.

What is your opinion?

@Canjie-Luo
Owner

Thanks for your attention.
As you added only one character "-" to the alphabet, you can load all the parameters in demo.pth except the last fully connected layer of the attentional decoder. To drop certain layers, you can simply use if 'XXX' in k: continue together with MORAN.load_state_dict(MORAN_state_dict_rename, strict=False). Please refer to the official PyTorch documentation.
Make sure that the output size of the network matches your prepared data. Load the pre-trained parameters, decrease the learning rate (maybe to 0.01 or 0.001), and fine-tune the MORAN.
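
A sketch of the loading strategy described above. The substring 'fc' is a placeholder for the actual name of the decoder's last fully connected layer (inspect MORAN.state_dict().keys() for the real name); MORAN and opt are the objects built in main.py, and the optimizer choice is illustrative, not the repo's.

```python
import torch

state_dict = torch.load(opt.MORAN, map_location='cpu')  # opt.MORAN: checkpoint path

MORAN_state_dict_rename = {}
for k, v in state_dict.items():
    name = k.replace("module.", "")  # remove 'module.' prefix
    if 'fc' in name:   # placeholder: the decoder's last FC layer,
        continue       # whose output size changed with the new alphabet
    MORAN_state_dict_rename[name] = v

# strict=False lets the skipped layer keep its fresh random initialization
MORAN.load_state_dict(MORAN_state_dict_rename, strict=False)

# fine-tune with a reduced learning rate, as suggested above
optimizer = torch.optim.Adam(MORAN.parameters(), lr=0.001)
```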

@johnSmith1990
Author

OK, thanks.
During training, the model is saved at two different points. Which one should we use? Trial and error?

@Canjie-Luo
Owner

Please modify the strategy to evaluate and save your model:

MORAN_v2/main.py, lines 241 to 245 in 2cd40c4:

    if acc_tmp > acc:
        acc = acc_tmp
        torch.save(MORAN.state_dict(), '{0}/{1}_{2}.pth'.format(
            opt.experiment, i, str(acc)[:6]))
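
One way to remove the ambiguity about which saved file to use is to additionally write the best checkpoint under a fixed name. This is a sketch extending the snippet above; acc, acc_tmp, i and opt.experiment come from main.py as quoted.

```python
if acc_tmp > acc:
    acc = acc_tmp
    # the per-iteration snapshot main.py already writes...
    torch.save(MORAN.state_dict(), '{0}/{1}_{2}.pth'.format(
        opt.experiment, i, str(acc)[:6]))
    # ...plus a stable "best" file, so demo.py can always point at the
    # highest-accuracy model without guessing among the saved snapshots
    torch.save(MORAN.state_dict(), '{0}/best_model.pth'.format(opt.experiment))
```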

@johnSmith1990
Author

Thank you very much!

@johnSmith1990
Author

johnSmith1990 commented Nov 1, 2019

My problem is solved and I got better results. Thanks for your support!
