
failed to replicate the result of top1, top5 acc for CVAE model #2

Open
xp1992slz opened this issue Apr 16, 2019 · 10 comments

@xp1992slz

xp1992slz commented Apr 16, 2019

Hi,

I am trying to replicate your chatbot model. I followed the README exactly and got roughly the same KL/reconstruction loss when training the CVAE. But for emoji classification, the top-1 and top-5 accuracy of the CVAE model is only 30.4% and 54.3%, which is much worse than the results you reported in the paper. Can you give some suggestions about this?

Thanks
Peng

@claude-zhou
Owner

What is the performance of your Base model?
I suggest choosing a checkpoint of a not-yet-converged Base model as the pretrained model, and training your CVAE model starting from there.
Also, you may need to stop the training of the emoji classifier before it overfits.
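Concretely, "stop before it overfits" amounts to early stopping on validation accuracy. A minimal sketch, assuming placeholder `train_step`/`evaluate` callables and illustrative step/patience numbers (not the repo's actual API):

```python
def train_with_early_stopping(train_step, evaluate, max_steps=20000,
                              eval_every=1000, patience=3):
    """Stop training once validation accuracy fails to improve
    for `patience` consecutive evaluations; return the best checkpoint."""
    best_acc, best_step, bad_evals = 0.0, 0, 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            acc = evaluate(step)
            if acc > best_acc:
                best_acc, best_step, bad_evals = acc, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break  # validation accuracy stopped improving
    return best_step, best_acc
```

In this setup one would restore the checkpoint saved at `best_step` rather than the final one.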

@xp1992slz
Author

Thanks for your reply.
The Base model achieves a test-set perplexity of 134.244/132.922 at step 18000/27500, and I chose the checkpoint at step 18000 as the starting point for CVAE training. After training the CVAE model, I got a recon/KL loss of 42.426/26.412. For the emoji classifier, the best model is {'step': 9000, 'epoch': 2, 'accuracy': 0.3211703300476074, 'loss': 2.8405072689056396, 'top_5_accuracy': 0.5782781839370728} on the test set, which matches what you reported.

I got the top-1/top-5 accuracy of 0.304/0.543 for the CVAE model from rl_run.py, which reports test-set performance before training starts. Any ideas about the problem?

Best
Peng

@claude-zhou
Owner

This looks like the result of the Base model. Have you tried printing the acc results of your Base model?

@xp1992slz
Author

Thanks for your comment. I will try the base model and let you know the result.

@xp1992slz
Author

Sorry for the late reply.

I ran the base model and the top1/top5 accuracy is 0.349/0.575, which is similar to what you reported in the paper. However, the CVAE model is even worse than the baseline seq2seq.

Please let me know your opinion.

Thanks!
Peng

@Wardwarf-Li

Hi! I ran into the same accuracy problem discussed here when replicating the model, and I am still confused about why. If you could give me some advice, I would really appreciate it.

@KingS770234358

@claude-zhou
Dear Zhou,
I am trying to implement your paper in PyTorch. While reading your code, I noticed that the output used to calculate the test loss (perplexity) is the output of the training decoder, not the inference decoder. The code comment says "use inference decoder's logits to compute recon_loss", but the logits are actually produced by the training decoder, not the inference decoder. Shouldn't we use the output of the inference decoder to calculate the test loss (perplexity)?

```python
with tf.variable_scope("loss"):
    max_time = tf.shape(self.rep_output)[0]
    with tf.variable_scope("reconstruction"):
        # TODO: use inference decoder's logits to compute recon_loss
        # ce = [len, batch_size]
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=self.rep_output,  # rep: [len, batch_size]
            logits=self.logits)      # logits: [len, batch_size, vocab_size]
        target_mask = tf.sequence_mask(
            self.rep_len + 1, max_time, dtype=self.logits.dtype)  # time_major
        target_mask_t = tf.transpose(target_mask)  # [max_len, batch_size]
        self.recon_losses = tf.reduce_sum(cross_entropy * target_mask_t, axis=0)
        self.recon_loss = tf.reduce_sum(cross_entropy * target_mask_t) / batch_size
```
```python
# Dynamic decoding
infer_outputs, _, infer_lengths = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=True,
    swap_memory=True,
    scope=decoder_scope)
if beam_width > 0:
    self.result = infer_outputs.predicted_ids
else:
    self.result = infer_outputs.sample_id
self.result_lengths = infer_lengths
```


@KingS770234358

I think we should use "infer_outputs" to calculate the cross-entropy, and then the perplexity from that.
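For reference, perplexity is just the exponential of the masked mean token cross-entropy. A minimal NumPy sketch (the function name and shapes are my own, mirroring the time-major [len, batch_size, vocab_size] layout of the snippets above):

```python
import numpy as np

def masked_perplexity(logits, targets, lengths):
    """Perplexity from per-step logits and gold token ids.

    logits:  [max_time, batch, vocab] unnormalized scores
    targets: [max_time, batch] gold token ids
    lengths: [batch] valid lengths; positions beyond are masked out
    """
    max_time, batch, _ = logits.shape
    # numerically stable log-softmax over the vocabulary
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # gather the log-prob assigned to each gold token
    t, b = np.meshgrid(np.arange(max_time), np.arange(batch), indexing="ij")
    token_nll = -log_probs[t, b, targets]                      # [max_time, batch]
    mask = (np.arange(max_time)[:, None] < lengths[None, :]).astype(float)
    return np.exp((token_nll * mask).sum() / mask.sum())
```

One practical caveat: the inference decoder's step count is driven by its own predictions (or the `maximum_iterations` cap), so its output length generally differs from the reference's, and aligning free-running outputs with the gold tokens for this formula is not straightforward. That may be why the snippet scores the teacher-forced logits instead.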

@KingS770234358
Copy link

I would appreciate it very much if you could give some advice. @claude-zhou
