
Why use albert-xxlarge instead of bert-base when training on some datasets? #7

Closed
Kobayashi-Wang opened this issue Nov 10, 2021 · 2 comments

Comments


Kobayashi-Wang commented Nov 10, 2021

I ran the code with bert-base on the CoNLL04 dataset and got an F1 score of approximately 66, which is much lower than with albert-xxlarge. I wonder whether the comparison between this model using albert-xxlarge and previous work using bert-base is really fair?

Coopercoppers (Owner) commented Nov 11, 2021

Table-sequence uses ALBERT-xxlarge, and we wanted to keep our experimental setting the same as the previous SOTA.
Also, as I mentioned, the model is delicate on this dataset; you need to tune the hyperparameters carefully even if the only change you make is the embedding.
A 3-4 point difference between BERT and ALBERT is reasonable. I suggest you tune the learning rate, batch size, and gradient clipping.
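
A minimal sketch of what a sweep over those three knobs might look like. The grid values, the `DummyModel` stand-in, and the training loop here are illustrative assumptions, not this repo's actual code; in practice you would plug in the real model, data, and dev-set F1 evaluation:

```python
# Hypothetical hyperparameter sweep over lr, batch size, and gradient clip norm.
import itertools
import torch
import torch.nn as nn

class DummyModel(nn.Module):
    """Placeholder for the actual joint-extraction model (assumption)."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 2)

    def forward(self, x):
        return self.linear(x)

def train_one_config(lr, batch_size, clip, steps=100):
    model = DummyModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(batch_size, 16)         # stand-in batch
        y = torch.randint(0, 2, (batch_size,))  # stand-in labels
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        # Gradient clipping: the "clip" hyperparameter mentioned above.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
        optimizer.step()
    return loss.item()  # in practice: dev-set F1, not training loss

# Small grid over the three knobs recommended above (values are assumptions).
grid = itertools.product([1e-5, 2e-5, 3e-5],  # learning rate
                         [8, 16, 32],         # batch size
                         [0.25, 1.0, 5.0])    # clip norm
for lr, bs, clip in grid:
    score = train_one_config(lr, bs, clip)
    print(f"lr={lr:.0e} bs={bs} clip={clip}: {score:.4f}")
```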

Kobayashi-Wang (Author) commented

I get it, thank you very much for your reply!
