question about DSC dataset and bert_frozen ncl #28

Open
drndr opened this issue Oct 23, 2023 · 0 comments

Hi!
I have a question regarding the full DSC dataset. The CTR paper states that each domain has 2,500 positive and 2,500 negative reviews for training, but the dataset itself (at least the 10 domains chosen in the paper) contains at most 2,000 samples per class, and some domains have even fewer. How is this possible?
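For reference, this is roughly how we counted the samples per domain. The directory layout, file name, and `label` field below are assumptions on my part; the actual format in the repo may differ:

```python
import json
import os
from collections import Counter

def count_labels(path):
    # Count label occurrences in a JSON-lines file, one example per line.
    counts = Counter()
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            counts[example["label"]] += 1
    return counts

data_root = "./dat/dsc"  # hypothetical location of the DSC data
for domain in sorted(os.listdir(data_root)):
    train_file = os.path.join(data_root, domain, "train.json")
    if os.path.exists(train_file):
        print(domain, dict(count_labels(train_file)))
```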

Also, the two scenarios (dil_classification, til_classification) are not completely clear to me. If I understood the code correctly, DIL does not use task IDs at test time while TIL does. Which scenario should be used with the DSC dataset?
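To make my reading of the code concrete, here is a minimal sketch of the difference as I understand it; the model signature is hypothetical, not the repo's actual API:

```python
def predict_til(model, x, task_id):
    """TIL: the task ID is known at test time, so the model can select
    the task-specific classification head."""
    logits = model(x, task_id=task_id)
    return logits.argmax(dim=-1)

def predict_dil(model, x):
    """DIL: no task ID at test time; a single shared head must handle
    inputs from every domain (here, positive/negative sentiment)."""
    logits = model(x)
    return logits.argmax(dim=-1)
```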

Finally, we have been able to reproduce some results on the DSC dataset, mostly within a couple of points of the tables (in this repo and in the CTR paper), but BERT frozen NCL consistently produces 10-15% higher accuracy. We currently get an average accuracy of 0.8772 over 5 runs with different task-sequence seeds. Any idea why this naive baseline would outperform the reported numbers?
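For completeness, we aggregate the runs like this; the listed accuracies are placeholders, not our actual per-seed numbers:

```python
import statistics

# Placeholder per-run final accuracies, one value per sequence seed.
run_accuracies = [0.87, 0.88, 0.88, 0.87, 0.89]

mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)
print(f"average over {len(run_accuracies)} runs: {mean_acc:.4f} +/- {std_acc:.4f}")
```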
