The websites the dataset was scraped from? #6

imr555 · 2022-09-04T09:53:13Z

Since the Alexa web rankings (https://www.alexa.com/topsites/countries/BD) were shut down in May 2022, it is no longer possible to retrieve the names of the Bangladeshi websites used.

It would be really useful if the names of the fifty Bangladeshi websites scraped to build the dataset could be released. This would help in understanding the nature of the data used to train the model, and would also help with model interpretability experiments.

abhik1505040 · 2022-09-10T08:20:32Z

Pretraining data sources have been enumerated in the appendix of our paper.

abhik1505040 closed this as completed Sep 10, 2022
