
Why does it output ". . . . . . . . . . . . ." when I run decode mode? #7

fishermanff opened this issue May 9, 2017 · 18 comments

@fishermanff
Thanks for your impressive work.
I have trained the model and run this command:
python run_summarization.py --mode=decode --data_path=/path/to/val.bin --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment

but the screen prints like this:
INFO:tensorflow:ARTICLE: -lrb- cnn -rrb- a grand jury in clark county , nevada , has indicted a 19-year-old man accused of fatally shooting his neighbor in front of her house last month . erich nowsch jr. faces charges of murder with a deadly weapon , attempted murder and firing a gun from within a car . police say nowsch shot tammy meyers , 44 , in front of her home after the car he was riding in followed her home february 12 . nowsch 's attorney , conrad claus , has said his client will argue self-defense . the meyers family told police that tammy meyers was giving her daughter a driving lesson when there was a confrontation with the driver of another car . tammy meyers drove home and sent her inside to get her brother , brandon , who allegedly brought a 9mm handgun . tammy meyers and her son then went back out , police said . they encountered the other car again , and there was gunfire , police said . investigators found casings from six .45 - caliber rounds at that scene . nowsch 's lawyer said after his client 's first court appearance that brandon meyers pointed a gun before anyone started shooting . he said the family 's story about a road-rage incident and what reportedly followed do n't add up . after zipping away from the first shooting , tammy meyers drove home and the other car , a silver audi , went there also . police said nowsch shot at both tammy and brandon meyers . tammy meyers was hit in the head and died two days later at a hospital . brandon meyers , who police said returned fire at the home , was not injured . the driver of the silver audi has yet to be found by authorities . that suspect was n't named in thursday 's indictment . nowsch was arrested five days after the killing in his family 's house , just one block away from the meyers ' home . he is due in court tuesday for a preliminary hearing .
INFO:tensorflow:REFERENCE SUMMARY: erich nowsch will face three charges , including first-degree murder . he is accused of killing tammy meyers in front of her home . the two lived !!withing!! walking distance of each other .
INFO:tensorflow:GENERATED SUMMARY: . arrest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The GENERATED SUMMARY is ". arrest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .", and it is the same for the other test articles.
How can I fix it?
I have fixed the bug in beam_search.py following your latest commit, but it doesn't solve this problem.

@abisee
Owner

abisee commented May 9, 2017

I tend to see this kind of output in the earlier phases of training (i.e. when the model is still under-trained). Look at the loss curve on tensorboard -- has the loss decreased much? It may be that the model needs further training.

@fishermanff
Author

fishermanff commented May 9, 2017

Thanks @abisee, it seems you are right. I found the loss is still high; the model needs more training steps.

@makcbe

makcbe commented May 19, 2017

@abisee: thank you, and this is a great piece of work.
@fishermanff, are you able to tell what a high loss looks like? Following the instructions, I am running train and eval concurrently. Is that correct? Also, any suggestions on when to stop training? Is it OK to stop when the loss stops decreasing any further? Thank you.

@abisee
Owner

abisee commented May 20, 2017

@makcbe Yes, the eval mode is designed to be run concurrently with train mode. The idea is you can see the loss on the validation set plotted alongside the loss on the training set in Tensorboard, helping you to spot overfitting etc.

About when to stop training: there's no easy answer for this. You might keep training until you find that the loss on the validation set is not reducing any more. You might find that after some time your validation set loss starts to rise while your training set loss reduces further (overfitting). In that case you want to stop training. If your loss function has gone flat you can try lowering your learning rate.

In any case you should run decode mode and look at some generated summaries. The visualization tool will make this much more informative.
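
The stopping rule described above — keep training while validation loss improves, stop once it plateaus or starts rising — can be sketched in plain Python. This is an illustrative patience-based check, not code from the repo; the function name and thresholds are made up for the example:

```python
def should_stop(val_losses, patience=3, min_delta=0.01):
    """Patience-based early stopping: return True when the validation
    loss has not improved by at least min_delta over the last
    `patience` evaluations, compared to the best loss before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Loss still falling -> keep training
print(should_stop([5.0, 4.2, 3.8, 3.5, 3.3]))    # False
# Loss flat for the last 3 evals -> stop
print(should_stop([5.0, 3.0, 3.01, 3.02, 3.0]))  # True
```

In practice you would feed this the validation-set losses that eval mode writes out, rather than stopping on the training loss alone.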

@fishermanff
Author

@makcbe hi, the author has answered this correctly. I have run training mode for over 16.77k steps (seen in TensorBoard), and the loss value is about 4.0. The generated summaries have some correct output, but overall performance is still far from the ACL results, so I think further training is needed.

@makcbe

makcbe commented May 20, 2017

Thank you both for the support; that's definitely useful.

@ghost

ghost commented Jun 3, 2017

Hi @fishermanff @abisee ,

When I trained for 3k steps, I saw the generated summary begin to repeat the first sentence of the whole text. Did that happen to you?

Thanks.

@ghost

ghost commented Jun 4, 2017

Hi @fishermanff @abisee,

When I trained to 40k steps, the results turned into INFO:tensorflow:GENERATED SUMMARY: [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] ....
I just want to make sure further training will make it better.

@fishermanff
Author

@lilyzl what's the loss? Does it converge? Check in TensorBoard.

@ghost

ghost commented Jun 4, 2017

@fishermanff Thanks for replying.
The [UNK] results are due to NaN in the loss. I fixed it based on solutions from previous issues.
Another question: is the generated summary length variable? I set the minimum length to 30, and then all results become 30 tokens. How should I deal with that?
Thanks a lot!
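
For context on the NaN-loss fixes mentioned above: a common remedy is clipping gradients by their global norm (TensorFlow provides this as tf.clip_by_global_norm). A minimal pure-Python sketch of the idea, with illustrative values:

```python
import math

def clip_by_global_norm(grads, max_norm=2.0):
    """Scale all gradients so their combined L2 norm is at most
    max_norm -- the usual remedy for exploding gradients that
    otherwise turn the loss into NaN."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [g * scale for g in grads]

# Global norm of [3, 4] is 5, so everything is scaled by 2/5
print(clip_by_global_norm([3.0, 4.0], max_norm=2.0))  # approximately [1.2, 1.6]
```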

@fishermanff
Author

fishermanff commented Jun 4, 2017

@lilyzl maybe you can stop decoding in your code when the decoder reaches the STOP token.
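
The suggestion above amounts to truncating the decoded token sequence at the first STOP marker. A minimal sketch — the "[STOP]" literal here is illustrative; use whatever stop symbol your vocab defines:

```python
def truncate_at_stop(tokens, stop_token="[STOP]"):
    """Return the tokens up to (and excluding) the first stop token;
    if no stop token appears, return the sequence unchanged."""
    if stop_token in tokens:
        return tokens[:tokens.index(stop_token)]
    return tokens

print(truncate_at_stop(["the", "cat", "sat", "[STOP]", "pad", "pad"]))
# ['the', 'cat', 'sat']
```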

@abisee
Owner

abisee commented Jun 6, 2017

Hi @lilyzl

  1. Yes, repetition is very common (it is one of the two big problems we aim to fix, as noted in the ACL paper). That's what the coverage setting is for: to reduce repetition.
  2. Yes, the generated summary length is variable. It's generated using beam search; essentially it keeps producing tokens until it produces the STOP token. I'm not sure why your decoded summaries are all length 30 if your minimum length is 30. Have a look at the code in beam_search.py.
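
One common way a minimum decode length interacts with beam search — and a place where all-length-30 outputs could originate — is that hypotheses ending in STOP are simply discarded until the step counter passes the minimum. A toy sketch of that filtering step, not the repo's actual code:

```python
def filter_hyps(hyps, step, min_dec_steps, stop_token="[STOP]"):
    """Drop hypotheses that emitted STOP before the minimum decode
    length; once `step` reaches min_dec_steps, finished hypotheses
    are allowed through."""
    kept = []
    for tokens in hyps:
        finished = bool(tokens) and tokens[-1] == stop_token
        if finished and step < min_dec_steps:
            continue  # finished too early -- not allowed yet
        kept.append(tokens)
    return kept

hyps = [["a", "b", "[STOP]"], ["a", "b", "c"]]
print(filter_hyps(hyps, step=3, min_dec_steps=30))   # only the unfinished hypothesis
print(filter_hyps(hyps, step=30, min_dec_steps=30))  # both hypotheses kept
```

If this filter (or its equivalent) were buggy, the beam could end up dominated by hypotheses that finish at exactly the minimum length.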

@fishermanff
Author

Hi @abisee
I have trained the model for 80k steps and then pressed Ctrl+C to terminate the training process. Will the variables saved in logs/train/ be restored automatically when I rerun run_summarization.py in 'train' mode, or do I need to add code like tf.train.Saver.restore() myself to restore the pre-trained variables?

@abisee
Owner

abisee commented Jun 12, 2017

Hi @fishermanff

Yes, running run_summarization.py in train mode should restore your last training checkpoint. I believe it's handled by the Supervisor.

@fishermanff
Author

Thanks @abisee, copy that.

@adowu

adowu commented Sep 16, 2018

Hello, I also get [UNK] in my SUMMARY results. Could you tell me how to solve this problem? I found nothing in previous issues. Thanks a lot.

@JenuTandel

I have the same kind of problem. If you have any solution, please suggest it.

@GaneshDoosa

Did anyone solve this [UNK] problem?
