
how to set a proper -train_steps during training a seq2seq model #866

Closed
alphadl opened this issue Jul 31, 2018 · 9 comments


alphadl (Contributor) commented Jul 31, 2018

background: in the latest OpenNMT-py version the parameters have changed, and "-epochs" has been replaced by "-train_steps".

question: If I want to train a seq2seq model rather than a Transformer with the current OpenNMT, how do I determine the number of training steps? 13 or 100,000? (These two numbers differ by orders of magnitude, so I am quite confused.)

vince62s (Contributor) commented Jul 31, 2018

It all depends on your training dataset. You can do the math in sentence-batch mode; in token mode it's a bit trickier.
If you have 1,000,000 segments and use batch_size 64, you need 15,625 steps to finish one epoch.
However, if you use gradient accumulation (accum X) or more than one GPU, the numbers change accordingly.
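The back-of-the-envelope math above can be sketched as a small helper (a hypothetical function of my own, not part of OpenNMT-py):

```python
def steps_per_epoch(num_segments, batch_size, accum_count=1, num_gpus=1):
    """Estimate how many -train_steps correspond to one epoch in
    sentence-batch mode.

    One optimizer step consumes batch_size * accum_count * num_gpus
    segments, which is why gradient accumulation and multi-GPU change
    the numbers.
    """
    segments_per_step = batch_size * accum_count * num_gpus
    # Ceiling division: a partial final batch still counts as a step.
    return -(-num_segments // segments_per_step)

# The example from the comment: 1,000,000 segments at batch_size 64.
print(steps_per_epoch(1_000_000, 64))  # 15625
```

With accum 2 or two GPUs the same epoch takes half as many optimizer steps, matching the caveat above.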

alphadl (author) commented Jul 31, 2018

I see, thanks a lot!

alphadl (author) commented Jul 31, 2018

I assume that "segments" in your answer means the number of sentence pairs. So if I have 5,000,000 sentence pairs and the batch_size is 128, each epoch needs 5,000,000 / 128 ≈ 40,000 steps?
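For reference, the arithmetic checks out (in the original comment "500w" and "4w" use the Chinese unit 万 = 10,000):

```python
# 5,000,000 sentence pairs at batch_size 128, sentence-batch mode,
# no gradient accumulation, single GPU.
pairs, batch_size = 5_000_000, 128
steps = pairs / batch_size
print(steps)  # 39062.5, i.e. roughly 40,000 steps per epoch
```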

alphadl (author) commented Jul 31, 2018

By the way, can you explain the meaning of "-max_generator_batches"? Is it useful during decoding?

vince62s (Contributor) commented Jul 31, 2018

Yes to your first question.
max_generator_batches is just a mechanism at training time to parallelize the softmax calculation (it is not used at inference).
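To illustrate why sharding the generator is safe, here is a pure-Python toy sketch (my own code, not OpenNMT-py's implementation): computing the cross-entropy shard by shard and summing gives exactly the same loss as one big pass over all decoder positions, so the option only bounds peak memory.

```python
import math

def softmax_xent(logits, target_idx):
    """Cross-entropy of one position: log-sum-exp minus the target logit."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_idx]

def sharded_loss(all_logits, all_targets, shard_size):
    """Process the output positions shard_size at a time and sum the losses.

    Each shard is independent, so the total is identical for any shard size;
    smaller shards just keep fewer softmax activations alive at once.
    """
    total = 0.0
    for i in range(0, len(all_logits), shard_size):
        for logits, tgt in zip(all_logits[i:i + shard_size],
                               all_targets[i:i + shard_size]):
            total += softmax_xent(logits, tgt)
    return total

logits = [[0.1, 2.0, -1.0], [1.5, 0.0, 0.3]]
targets = [1, 0]
# Same loss whether we shard one position at a time or take both at once.
assert abs(sharded_loss(logits, targets, 1)
           - sharded_loss(logits, targets, 2)) < 1e-9
```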

alphadl (author) commented Jul 31, 2018

Okay, thanks!

Another question:

when I use the multi-GPU option, an error is reported:

Traceback (most recent call last):
  File "/home/dingliang/tools/OpenNMT-py-master/train.py", line 37, in <module>
    main(opt)
  File "/home/dingliang/tools/OpenNMT-py-master/train.py", line 22, in main
    multi_main(opt)
  File "/home/dingliang/tools/OpenNMT-py-master/onmt/train_multi.py", line 20, in main
    mp = torch.multiprocessing.get_context('spawn')
AttributeError: 'module' object has no attribute 'get_context'

Actually, I looked up the official documentation of PyTorch 0.4.1 (https://pytorch.org/docs/stable/multiprocessing.html) and could not find the attribute "get_context". How can I tackle this problem if I want to use multi-GPU?

vince62s (Contributor) commented Jul 31, 2018

Please provide the command line.
However, the forum is better suited for questions like this.

merc85garcia commented Sep 11, 2018

Hello,
I am trying the Transformer model in OpenNMT-py (latest version). When I use the -train_steps option to train a model longer, the training process does not start: after loading the training dataset everything stops without an error. Do you have any idea why this is happening? Thank you!

kingkf commented Nov 20, 2018

"AttributeError: 'module' object has no attribute 'get_context'"===> I met the same question, I solved it by using python3. I hope can help you
