Hi!
I have a question regarding the full DSC dataset. The CTR paper states that each domain has 2,500 positive and 2,500 negative reviews for training, but the dataset itself (at least the 10 domains chosen in the paper) contains at most 2,000 of each, and fewer in some domains. Is this expected?
Also, the two scenarios (dil_classification, til_classification) are not completely clear to me. If I understood the code correctly, DIL doesn't use task ids while TIL does. Which scenario should be used with the DSC dataset?
Finally, we have been able to reproduce some results on the DSC dataset, mostly within a couple of points of the table (in this repo and in the CTR paper), but frozen-BERT NCL consistently scores 10-15% higher than reported. We currently get an average accuracy of 0.8772 over 5 runs with different sequence seeds. Any idea why this naive approach would outperform the reported numbers?