
where is " [LSEP]" located in? #3

Open
jiaxiansen123 opened this issue Oct 14, 2021 · 3 comments

Comments

@jiaxiansen123

When running the "Model Evaluation" command, my generated result only contains "". Could you help me find the code relevant to "[LSEP]"? Thank you very much.

@ybai-nlp
Owner

Hi, I used the '[unused5]' token to represent '[LSEP]' in the BERT dictionary, hence the id of '[LSEP]' should be 5 in the model.

This modification doesn't require any change to the model architecture; I directly modified the data files to construct the training data.
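As an illustration (my own sketch, not the repository's code), the mapping costs nothing at the model level because '[unused5]' already exists in the BERT vocabulary; the training targets just spell the separator as '[unused5]':

```python
# Hedged sketch: assumes the HF BertTokenizer and the multilingual vocabulary;
# the repo may load its vocab differently, and ids differ across BERT vocabs.
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
lsep_id = tok.convert_tokens_to_ids('[unused5]')  # direct vocab lookup
print(lsep_id)  # 5 in vocabularies where '[unused1]' sits at index 1

# Building a target sequence for the data file: sentences joined by '[unused5]'.
sent_a = tok.tokenize('first summary sentence .')
sent_b = tok.tokenize('second summary sentence .')
target_tokens = sent_a + ['[unused5]'] + sent_b
target_ids = tok.convert_tokens_to_ids(target_tokens)
```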

However, the position of the '[LSEP]' token needs to be identified during the inference stage; this part of the code is in the file "src/models/predictor.py".
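For intuition, the inference-side handling can be as simple as splitting the generated ids at every occurrence of id 5; here is a hypothetical sketch (the real logic lives in src/models/predictor.py):

```python
LSEP_ID = 5  # '[unused5]', standing in for '[LSEP]'

def split_at_lsep(pred_ids):
    """Split a flat list of generated token ids into per-sentence segments."""
    segments, current = [], []
    for tid in pred_ids:
        if tid == LSEP_ID:
            segments.append(current)
            current = []
        else:
            current.append(tid)
    segments.append(current)
    return segments

print(split_at_lsep([12, 34, 5, 56, 78]))  # [[12, 34], [56, 78]]
```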

@jiaxiansen123
Author

Thank you sincerely for your response!
When running the NCLS+MS model, the total loss is "loss_ncls + loss_ms". However, I can't find the code that computes loss_ncls or loss_ms separately, and especially the "+" that produces the total loss. The same situation applies to loss_MCLAS. Maybe my understanding is insufficient.
I'd appreciate it if you could give more explanation.

@ybai-nlp
Owner

Hi, thanks for asking!

My implementation is rather ad hoc and perhaps not elegant enough.

For the NCLS+MS scenario, since Presumm uses the sharded_compute_loss trick and it is hard to modify, I directly concatenate the two outputs of the monolingual decoder and the cross-lingual decoder into one tensor. This part of the code is at L291 in model_builder.py (producing the monolingual output) and at L271 in trainer.py (concatenating the outputs).
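The effect of the concatenation trick is that a single NLL call over the joined tensor equals loss_ms + loss_ncls, so sharded_compute_loss never needs to know there are two decoders. A toy sketch with made-up shapes (not the repository's exact code):

```python
import torch
import torch.nn.functional as F

batch, len_ms, len_ncls, vocab = 2, 7, 9, 320

# Stand-ins for the two decoders' log-probabilities and their labels.
log_probs_ms = torch.randn(batch, len_ms, vocab).log_softmax(-1)
log_probs_ncls = torch.randn(batch, len_ncls, vocab).log_softmax(-1)
labels_ms = torch.randint(vocab, (batch, len_ms))
labels_ncls = torch.randint(vocab, (batch, len_ncls))

# Concatenate outputs and labels along the time dimension.
log_probs = torch.cat([log_probs_ms, log_probs_ncls], dim=1)
labels = torch.cat([labels_ms, labels_ncls], dim=1)

# One NLL over the concatenated tensor ...
joint = F.nll_loss(log_probs.transpose(1, 2), labels, reduction='sum')

# ... equals the sum of the two separate losses.
loss_ms = F.nll_loss(log_probs_ms.transpose(1, 2), labels_ms, reduction='sum')
loss_ncls = F.nll_loss(log_probs_ncls.transpose(1, 2), labels_ncls, reduction='sum')
assert torch.allclose(joint, loss_ms + loss_ncls)
```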

For the MCLAS loss, the output is S^A + S^B and the ground-truth label is also S^A + S^B, so calculating the normal NLL loss between them is just fine.
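Restated as an equation (my paraphrase, with X the source document and y the target formed by S^A followed by S^B), the single NLL over the concatenated sequence decomposes into the two summaries' losses:

$$
\mathcal{L}_{\text{MCLAS}} = -\sum_{t=1}^{|S^A|+|S^B|} \log p\left(y_t \mid y_{<t}, X\right) = \mathcal{L}_{S^A} + \mathcal{L}_{S^B}
$$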

Hope this can help you!
