Empty prediction on CNN/DM with beam > 1 #457

pltrdy · 2017-12-20T12:54:42Z

I trained a summarization model using @srush's setup (described here). At translation time, any beam size greater than one results in an empty prediction, while beam size 1 produces redundant output.

Preprocessing

The problem may come from the dataset, which I re-processed myself. Here is what I run:

1. A. See's preprocessing
2. @mataney's script to get text files from the TF bins (gist here)
3. Replacing the </s> tokens in the target files with </t>
4. OpenNMT preprocessing, as follows:

  python preprocess.py \
      -train_src $data/train.src.txt \
      -train_tgt $data/train.tgt.repeos.txt \
      -valid_src $data/valid.src.txt \
      -valid_tgt $data/valid.tgt.repeos.txt \
      -save_data $root/data \
      -src_seq_length 10000 \
      -tgt_seq_length 10000 \
      -src_seq_length_trunc 400 \
      -tgt_seq_length_trunc 100 \
      -dynamic_dict \
      -share_vocab

(*.repeos.txt files are those with </t> replacing </s>)
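The </s> → </t> replacement mentioned above is a plain string substitution on each target line. A minimal sketch, assuming UTF-8 files with one summary per line; `to_repeos` and `convert_file` are hypothetical helper names, not part of OpenNMT or of the scripts referenced above. Presumably the point of the swap is that OpenNMT-py uses `</s>` as its own end-of-sequence token, so sentence boundaries inside a multi-sentence summary must not look like EOS.

```python
import sys
from pathlib import Path


def to_repeos(line: str) -> str:
    """Replace every sentence-end tag </s> with </t>; <s> is left untouched."""
    return line.replace("</s>", "</t>")


def convert_file(src: Path, dst: Path) -> None:
    """Write a *.repeos.txt copy of a target file, line by line."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_repeos(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python repeos.py train.tgt.txt train.tgt.repeos.txt
    convert_file(Path(sys.argv[1]), Path(sys.argv[2]))
```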

Training

  python train.py -data $root/data \
        -save_model $root/model \
        -copy_attn \
        -global_attention mlp \
        -word_vec_size 128 \
        -rnn_size 256 \
        -layers 1 \
        -encoder_type "brnn" \
        -epochs 16 \
        -seed 777 \
        -batch_size 32 \
        -max_grad_norm 2 \
        -share_decoder_embeddings \
        -gpuid 0

Translation

  python translate.py -model "$best_model" \
                      -src $data/test.src.txt \
                      -gpu "$gpu" \
                      -batch_size 1 \
                      -verbose \
                      -beam_size 5

Results

with beam_size = 1: redundant, but not empty, e.g.

PRED 1: <s> new : the crash of germanwings flight 9525 flight 9525 into the french alps . </t> <s> the crash of germanwings flight 9525 flight 9525 into the french alps . </t> <s> the crash of a cell phone video was recovered from a phone at the wreckage site . </t> <s> the crash of the flight 9525 flight 9525 's possible motive . </t>

PRED 2: <s> the icc 's founding rome statute is based at the hague , in the netherlands . </t> <s> the icc opened a preliminary examination into the situation in january . </t> <s> the icc opened a preliminary examination into the situation in the netherlands . </t> <s> the icc opened a preliminary examination into the situation in january . </t>

with beam_size > 1: empty; each beam produces an EOS (token_id = 3) right away

with n_best > 1: interestingly, I find good sentences among the n-best predictions that are not the top-ranked one, e.g.

<s> french prosecutor : `` so far no videos were used in the crash investigation '' </t> <s> robin 's comments were `` completely wrong '' and `` unwarranted '' cell phones . </t> <s> `` it is a very disturbing scene , '' he says . </t>

but also some that contain redundancy:

<s> the formal accession was marked with a ceremony at the hague in the netherlands . </t> <s> the formal accession was marked with a ceremony at the hague in the netherlands . </t> <s> the formal accession was marked with a ceremony at the hague . </t>

This is not a trivial problem; I really don't know how it happens.
Any clues are welcome!
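One common explanation for this symptom (an assumption; nothing in this thread confirms it) is the length bias of beam search. Hypotheses are compared by their summed log-probabilities, and every extra token adds another negative term, so a hypothesis that stops at EOS immediately can outscore every full summary once it enters the beam. The arithmetic below uses made-up probabilities and a simple length**alpha normalization chosen for illustration, not necessarily the penalty OpenNMT-py implements.

```python
import math
from typing import Sequence


def score(token_logprobs: Sequence[float], alpha: float = 0.0) -> float:
    """Summed log-prob of a hypothesis divided by length**alpha.

    alpha = 0.0 -> raw sum (no length normalization).
    alpha = 1.0 -> average log-prob per token.
    """
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


# Made-up numbers: emitting EOS at step 0 is unlikely (p = 0.05) ...
empty = [math.log(0.05)]
# ... while every token of a 40-token summary is fairly likely (p = 0.6).
summary = [math.log(0.6)] * 40

print("raw sum:   empty %.2f, summary %.2f" % (score(empty), score(summary)))
print("per token: empty %.2f, summary %.2f" % (score(empty, 1.0), score(summary, 1.0)))
```

Under the raw sum the empty hypothesis wins (about -3.0 vs. -20.4); averaging per token reverses the ranking (about -3.0 vs. -0.51).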


srush · 2017-12-20T22:41:53Z

Can you print your logs with -verbose as well?

sebastianGehrmann · 2017-12-21T00:26:20Z

Hm, I have run into a related problem in the past where one in every ~20 predictions was empty, even with beam size 1. Looking at the other top predictions, everything seemed normal. I have not been able to reproduce this error consistently yet, but a simple fix is to set the (log-)probability of EOS to -1e7 or so for the very first step. Let me know if you make progress figuring out this bug!
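The quick fix described above can be sketched as a mask applied to one decoding step's log-probabilities before candidates are selected. This is a minimal, framework-free sketch over a plain Python list; `suppress_eos` and its `min_length` parameter are hypothetical names, and with `min_length=1` it blocks EOS only at the very first step. In a PyTorch decoder the same idea would be an in-place assignment to the EOS column of the log-prob tensor.

```python
import math
from typing import List

NEG_INF = -1e7  # "-1e7 or so"; any value far below real log-probs works


def suppress_eos(log_probs: List[float], step: int, eos_id: int,
                 min_length: int = 1) -> List[float]:
    """Copy of one step's log-probs with EOS effectively forbidden
    while step < min_length (min_length=1 blocks only step 0)."""
    out = list(log_probs)
    if step < min_length:
        out[eos_id] = NEG_INF
    return out


# Hypothetical 5-token vocabulary where EOS (id 3) is the most likely first token.
step0 = [math.log(p) for p in (0.05, 0.10, 0.15, 0.50, 0.20)]
masked = suppress_eos(step0, step=0, eos_id=3)
best = max(range(len(masked)), key=masked.__getitem__)
print(best)  # 4: the most likely non-EOS token
```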

srush · 2017-12-21T03:52:13Z

Oh okay. Let me try adding a min_length option.

pltrdy · 2017-12-21T09:45:06Z

You can find the trained model here.

pltrdy · 2017-12-21T11:44:13Z

I just checked Abisee's work: she does use a min_length option and discards beams that are too short.

Adding such an option would bring this implementation in line with her work.
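Discarding finished hypotheses that are too short (rather than masking EOS) can be sketched with a toy beam search. Everything here is hypothetical: `beam_search`, `toy_step` and the 4-token vocabulary (EOS = 3, matching the token id reported above) are made up, and hypotheses are ranked by raw summed log-probability to mirror the failure mode in this issue. It is loosely modelled on the idea described above, not a copy of See's code or of any OpenNMT-py option.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hyp = Tuple[List[int], float]  # (token ids, summed log-prob)


def beam_search(step_fn: Callable[[Sequence[int]], List[float]],
                eos_id: int, beam_size: int, max_length: int,
                min_length: int) -> List[Hyp]:
    """Toy beam search that drops finished hypotheses shorter than min_length.

    step_fn(prefix) returns log-probs over the vocabulary for the next token.
    """
    beams: List[Hyp] = [([], 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_length):
        candidates: List[Hyp] = []
        for tokens, score in beams:
            for tok, lp in enumerate(step_fn(tokens)):
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == eos_id:
                # Length excludes the EOS token; too-short hypotheses are dropped.
                if len(tokens) - 1 >= min_length:
                    finished.append((tokens, score))
            else:
                beams.append((tokens, score))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    # Fall back to unfinished beams if nothing long enough ever finished.
    return sorted(finished or beams, key=lambda h: h[1], reverse=True)


def toy_step(prefix: Sequence[int]) -> List[float]:
    """Hypothetical model (EOS = 3) that strongly favours EOS at step 0."""
    if len(prefix) == 0:
        probs = (0.2, 0.1, 0.1, 0.6)
    elif len(prefix) < 3:
        probs = (0.7, 0.1, 0.1, 0.1)
    else:
        probs = (0.1, 0.1, 0.1, 0.7)
    return [math.log(p) for p in probs]


print(beam_search(toy_step, eos_id=3, beam_size=2, max_length=5, min_length=0)[0][0])  # [3]
print(beam_search(toy_step, eos_id=3, beam_size=2, max_length=5, min_length=3)[0][0])  # [0, 0, 0, 3]
```

With `min_length=0` the one-token EOS hypothesis wins, reproducing the empty prediction; with `min_length=3` it is discarded at the first step and the search returns a three-token summary instead.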

mataney · 2017-12-21T12:49:16Z

@pltrdy Brilliant!
Have you managed to check ROUGE on this!?

pltrdy · 2018-01-02T10:39:20Z

@mataney Sorry, I was on vacation.
Since the top prediction is mostly empty, I haven't run the ROUGE scoring; it would be really bad. This has to be fixed before scoring makes sense.

pltrdy mentioned this issue Jan 2, 2018

Minimum prediction length at translation time #496

Merged

srush closed this as completed Jan 10, 2018
