Original dataset sources

If you use these datasets in your projects, please cite the original sources.

In IWS we use the following datasets:

Amazon
A subset of the Amazon Review Data. We aggregate all categories with more than 100k reviews, sample 200k reviews from the aggregate, and split them into 160k training points and 40k test points.

https://nijianmo.github.io/amazon/index.html

He, R., and McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web (2016), International World Wide Web Conferences Steering Committee, pp. 507–517.
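
The Amazon subset construction described above (filter categories by size, sample, then split) can be sketched roughly as follows. This is a minimal illustration, not the code used in IWS: the function name `sample_and_split`, the dictionary input format, uniform sampling without replacement, and the fixed seed are all assumptions made for the example.

```python
import random


def sample_and_split(reviews_by_category, min_reviews=100_000,
                     n_sample=200_000, n_train=160_000, seed=0):
    """Hypothetical helper: pool large categories, sample, and split.

    reviews_by_category maps a category name to a list of reviews
    (an assumed input format, not the format of the original data).
    """
    # Keep only categories with more than `min_reviews` reviews and pool them.
    pool = [
        review
        for reviews in reviews_by_category.values()
        if len(reviews) > min_reviews
        for review in reviews
    ]
    # Uniform sampling without replacement is an assumption here.
    rng = random.Random(seed)
    sampled = rng.sample(pool, n_sample)
    # The first n_train sampled reviews form the training set, the rest the test set.
    return sampled[:n_train], sampled[n_train:]
```

With the defaults, the function returns 160k training reviews and 40k test reviews, provided the pooled categories contain at least 200k reviews; otherwise `random.Random.sample` raises a `ValueError`.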

IMDB
The Movie Review Sentiment dataset, which has 25k training samples and 25k test samples.

https://ai.stanford.edu/~amaas/data/sentiment/

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (June 2011), pp. 142–150.

Bias in Bios
We use the Bias in Bios dataset, from which we create binary classification tasks to distinguish difficult pairs among frequently occurring occupations. Specifically, we create the following subsets with equally sized train and test sets: journalist or photographer (n = 32 258), professor or teacher (n = 24 588), painter or architect (n = 12 236), professor or physician (n = 54 476).

BiasBios: http://aka.ms/biasbios

De-Arteaga, M., Romanov, A., Wallach, H., Chayes, J., Borgs, C., Chouldechova, A., Geyik, S., Kenthapadi, K., and Kalai, A. T. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (2019), pp. 120–128.
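
As a rough illustration of how an occupation-pair task like the Bias in Bios subsets above could be built, the sketch below filters biographies to two occupations, assigns binary labels, and splits the result into two equally sized halves. The function name `make_pair_task`, the `(text, occupation)` record format, the label encoding, and the random shuffle are assumptions for the example; the original preprocessing may differ.

```python
import random


def make_pair_task(records, occupation_a, occupation_b, seed=0):
    """Hypothetical helper: build a binary task from two occupations.

    records is a list of (text, occupation) tuples (an assumed format).
    Label 0 marks occupation_a and label 1 marks occupation_b.
    """
    # Keep only the two occupations of interest and attach binary labels.
    pair = [
        (text, 1 if occupation == occupation_b else 0)
        for text, occupation in records
        if occupation in (occupation_a, occupation_b)
    ]
    # Shuffle reproducibly before splitting (an assumption, not confirmed).
    rng = random.Random(seed)
    rng.shuffle(pair)
    # Equally sized train and test sets; one example is dropped if the count is odd.
    half = len(pair) // 2
    return pair[:half], pair[half:2 * half]
```

For example, `make_pair_task(records, "journalist", "photographer")` would return a train set and a test set of equal size drawn from those two occupations only.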