Clarification - Inference #12

Closed
zolekode opened this issue Oct 19, 2020 · 1 comment

@zolekode

Hi @aladdinpersson

Thanks for the work you share. Could you please provide a clear explanation of how inference works?
I have watched your videos and still don't understand 100% how:
1- The sequence is produced at training time
2- The sequence is produced at test time

I saw your inference script but, honestly, the whole thing is still quite blurry to me.

def translate_sentence(model, sentence, german, english, device, max_length=50):

zolekode changed the title from Clarification to Clarification - Inference on Oct 19, 2020
@aladdinpersson
Owner

Hey @zolekode

I will try my best!

  1. How the sequence is produced at training time
    At training time we have the entire input sentence and the entire target sentence, so all we have to do is: Tokenize --> Numericalize --> Pad (so all sentences in the batch are of equal length). I have separate videos that go into more detail on the data loading part, and you could check out the torchtext videos for that. After that, both sequences are fed to the transformer, and we use masking so that the network doesn't cheat by looking ahead in the target sentence (I've also gone into more depth on this in the transformer-from-scratch video). There's a small mask sketch after this list.

  2. How the sequence is produced at test time
    At test time we obviously don't have the target sentence, only the input sentence, so we output a single word at a time (that's why there is a `for i in range(max_length)` loop in the `translate_sentence` function). In the beginning the target contains only a start token, and in each iteration of the loop we gain one additional predicted token from the model (we take the highest-probability prediction and append it to our outputs). We keep doing this until we either a) reach an EOS token, or b) hit max_length. See the greedy-decoding sketch after this list.
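
Here is a rough sketch of the look-ahead masking idea from point 1 (not the exact code from this repo; it assumes PyTorch's `nn.Transformer` convention where masked positions are filled with `-inf`):

```python
import torch

def make_trg_mask(trg_len):
    # Upper-triangular -inf above the diagonal: position i may only attend to
    # positions <= i, so the decoder cannot "look ahead" in the target.
    return torch.triu(torch.full((trg_len, trg_len), float("-inf")), diagonal=1)

print(make_trg_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```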
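
And a minimal sketch of the greedy decoding loop described in point 2. The `stoi`/`itos` vocab attributes and the `(seq_len, batch)` tensor layout are assumptions in the style of torchtext, not necessarily identical to the repo's `translate_sentence`:

```python
import torch

def greedy_translate(model, src_indices, trg_vocab, device, max_length=50):
    model.eval()
    # src: (src_len, 1) -- one sentence as a column of token ids
    src = torch.tensor(src_indices, dtype=torch.long, device=device).unsqueeze(1)

    # Start the target with only the <sos> token
    outputs = [trg_vocab.stoi["<sos>"]]

    for _ in range(max_length):
        trg = torch.tensor(outputs, dtype=torch.long, device=device).unsqueeze(1)

        with torch.no_grad():
            # logits: (trg_len, 1, vocab_size)
            logits = model(src, trg)

        # Take the highest-probability token at the last time step
        next_token = logits[-1, 0, :].argmax().item()
        outputs.append(next_token)

        # Stop as soon as the model predicts <eos>
        if next_token == trg_vocab.stoi["<eos>"]:
            break

    return [trg_vocab.itos[idx] for idx in outputs]
```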

Hopefully that clarifies a little bit :)

/Aladdin
