The websites the dataset was scraped from? #6

Closed
imr555 opened this issue Sep 4, 2022 · 1 comment

Comments

imr555 commented Sep 4, 2022

Since the Alexa web rankings shut down in May 2022 (https://www.alexa.com/topsites/countries/BD), it is no longer possible to retrieve the names of the Bangladeshi websites used.

It would be really useful if the names of the fifty Bangladeshi websites scraped to build the dataset could be released. That would help in understanding the nature of the data used to train the model, and in model interpretability experiments too.

abhik1505040 (Collaborator) commented

Pretraining data sources have been enumerated in the appendix of our paper.
