
failed to replicate the result of top1, top5 acc for CVAE model #2

Open
xp1992slz opened this issue Apr 16, 2019 · 10 comments

@xp1992slz

xp1992slz commented Apr 16, 2019

Hi,

I am trying to replicate your chatbot model. I followed the README exactly and got roughly the same KL/reconstruction loss when training the CVAE. But for emoji classification, the top-1 and top-5 accuracy of the CVAE model is only 30.4% and 54.3%, which is much worse than the results you reported in the paper. Can you give some suggestions about this?

Thanks
Peng

@claude-zhou
Owner

What is the performance of your Base model?
I suggest choosing a checkpoint of a not-yet-converged Base model as the pretrained model, and training your CVAE model starting from there.
Also, you may need to stop the training of the emoji classifier before it overfits.
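Concretely, "stop before it overfits" amounts to early stopping on validation accuracy. A minimal sketch, assuming placeholder `train_step`/`evaluate` callables and illustrative step/patience numbers (not the repo's actual API):

```python
def train_with_early_stopping(train_step, evaluate, max_steps=20000,
                              eval_every=1000, patience=3):
    """Stop training once validation accuracy fails to improve
    for `patience` consecutive evaluations; return the best checkpoint."""
    best_acc, best_step, bad_evals = 0.0, 0, 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            acc = evaluate(step)
            if acc > best_acc:
                best_acc, best_step, bad_evals = acc, step, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break  # validation accuracy stopped improving
    return best_step, best_acc
```

In this setup one would restore the checkpoint saved at `best_step` rather than the final one.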

@xp1992slz
Author

Thanks for your reply.
The Base model achieves a test-set perplexity of 134.244/132.922 at step 18000/27500, and I chose the checkpoint at step 18000 as the starting point for CVAE training. After training the CVAE model, I got a recon/KL loss of 42.426/26.412. For the emoji classifier, the best model is {'step': 9000, 'epoch': 2, 'accuracy': 0.3211703300476074, 'loss': 2.8405072689056396, 'top_5_accuracy': 0.5782781839370728} on the test set, which matches what you reported.

I got the top-1/top-5 accuracy of 0.304/0.543 for the CVAE model from rl_run.py, which reports test-set performance before training starts. Any ideas about the problem?

Best
Peng

@claude-zhou
Owner

This looks like the result of the Base model. Have you tried printing the acc results of your Base model?

@xp1992slz
Author

Thanks for your comment. I will try the base model and let you know the result.

@xp1992slz
Author

Sorry for the late reply.

I ran the base model and the top1/top5 accuracy is 0.349/0.575, which is similar to what you reported in the paper. However, the CVAE model is even worse than the baseline seq2seq.

Please let me know your opinion.

Thanks!
Peng

@Wardwarf-Li

Hi! I ran into the same accuracy problem discussed here when replicating the model, and I am still confused about why. If you could give me some advice, I would really appreciate it.

@KingS770234358

@claude-zhou
Dear Zhou,
I am trying to implement your paper in PyTorch. While reading your code, I noticed that the output used to calculate the test loss (perplexity) is the output of the training decoder, not the inference decoder. The code comment says "use inference decoder's logits to compute recon_loss", but the logits are actually produced by the training decoder, not the inference decoder. Shouldn't we use the output of the inference decoder to calculate the test loss (perplexity)?

```python
with tf.variable_scope("loss"):
    max_time = tf.shape(self.rep_output)[0]
    with tf.variable_scope("reconstruction"):
        # TODO: use inference decoder's logits to compute recon_loss
        # ce = [len, batch_size]
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=self.rep_output,  # rep: [len, batch_size]
            logits=self.logits)      # logits: [len, batch_size, vocab_size]
        target_mask = tf.sequence_mask(
            self.rep_len + 1, max_time, dtype=self.logits.dtype)  # time_major
        target_mask_t = tf.transpose(target_mask)  # [max_len, batch_size]
        self.recon_losses = tf.reduce_sum(cross_entropy * target_mask_t, axis=0)
        self.recon_loss = tf.reduce_sum(cross_entropy * target_mask_t) / batch_size
```
```python
# Dynamic decoding
infer_outputs, _, infer_lengths = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=maximum_iterations,
    output_time_major=True,
    swap_memory=True,
    scope=decoder_scope)
if beam_width > 0:
    self.result = infer_outputs.predicted_ids
else:
    self.result = infer_outputs.sample_id
self.result_lengths = infer_lengths
```


@KingS770234358

I think we should use "infer_outputs" to calculate the cross-entropy, and then the perplexity from that.
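For reference, perplexity is just the exponential of the masked mean token cross-entropy. A minimal NumPy sketch (the function name and shapes are my own, mirroring the time-major [len, batch_size, vocab_size] layout of the snippets above):

```python
import numpy as np

def masked_perplexity(logits, targets, lengths):
    """Perplexity from per-step logits and gold token ids.

    logits:  [max_time, batch, vocab] unnormalized scores
    targets: [max_time, batch] gold token ids
    lengths: [batch] valid lengths; positions beyond are masked out
    """
    max_time, batch, _ = logits.shape
    # numerically stable log-softmax over the vocabulary
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # gather the log-prob assigned to each gold token
    t, b = np.meshgrid(np.arange(max_time), np.arange(batch), indexing="ij")
    token_nll = -log_probs[t, b, targets]                      # [max_time, batch]
    mask = (np.arange(max_time)[:, None] < lengths[None, :]).astype(float)
    return np.exp((token_nll * mask).sum() / mask.sum())
```

One practical caveat: the inference decoder's step count is driven by its own predictions (or the `maximum_iterations` cap), so its output length generally differs from the reference's, and aligning free-running outputs with the gold tokens for this formula is not straightforward. That may be why the snippet scores the teacher-forced logits instead.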

@KingS770234358
Copy link

I would appreciate it very much if you could give some advice. @claude-zhou
