Filtered Union Bibliography

This automatically-generated file contains references from the main union bibliography that have been filtered for a single tag. Do not edit this file; instead, please update the main bibliography and tag references appropriately to have them show up here. Thank you!

The papers are listed in the same order as the main bibliography, i.e., by year of publication / release, then by surname / name of the first author.

McMillan-Major, A., Bender, E. M., & Friedman, B. (2023). Data Statements: From Technical Concept to Community Practice. ACM Journal on Responsible Computing. [paper]
Bender, E. M., Friedman, B., & McMillan-Major, A. (2021). A Guide for Writing Data Statements for Natural Language Processing. [paper]
Birhane, A., Prabhu, V. U., & Kahembwe, E. (2021). Multimodal datasets: misogyny, pornography, and malignant stereotypes. arXiv preprint arXiv:2110.01963. [paper]
Dodge, J., Sap, M., Marasovic, A., Agnew, W., Ilharco, G., Groeneveld, D., ... & Gardner, M. (2021, September). Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 1286–1305). Online and Punta Cana, Dominican Republic: Association for Computational Linguistics. [paper]
Moss, E., Watkins, E. A., Singh, R., Elish, M. C., & Metcalf, J. (2021). Assembling Accountability: Algorithmic Impact Assessment for the Public Interest. Available at SSRN 3877437. [paper]
Bird, S. (2020, December). Decolonising speech and language technology. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 3504-3519). doi:10.18653/v1/2020.coling-main.313 [paper]
Jo, E. S., & Gebru, T. (2020, January). Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 306-316). [paper]
Joshi, P., Santy, S., Budhiraja, A., Bali, K., & Choudhury, M. (2020). The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 6282-6293). Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.560 [paper]
Paullada, A., Raji, I. D., Bender, E. M., Denton, E., & Hanna, A. (2020). Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns, 2(11), 100388 (November 2021). [paper] [paper]
Kann, K., Cho, K., & Bowman, S. R. (2019). Towards realistic practices in low-resource natural language processing: the development set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3342–3349). Hong Kong, China: Association for Computational Linguistics. doi:10.18653/v1/D19-1329 [paper]
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229). [paper]
Raji, I. D., & Yang, J. (2019). About ML: Annotation and benchmarking on understanding and transparency of machine learning lifecycles. arXiv preprint arXiv:1912.06166. [paper]
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604. doi:10.1162/tacl_a_00041 [paper]
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. Communications of the ACM, 64(12) (December 2021), 86–92. doi:10.1145/3458723 [paper]
Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677. [paper]
Mieskes, M. (2017, April). A quantitative study of data in the NLP community. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing (pp. 23-29). [paper]
Bretonnel Cohen, K., Pestian, J. P., & Fort, K. (2015, June). Annotating suicide notes: ethical issues at a glance. In Proceedings of ETeRNAL (Éthique et Traitement Automatique des Langues), Caen, France. [paper]
Couillault, A., Fort, K., Adda, G., & De Mazancourt, H. (2014, May). Evaluating corpora documentation with regards to the ethics and big data charter. In International Conference on Language Resources and Evaluation (LREC). [paper]
Drugan, J., & Babych, B. (2010). Shared Resources, Shared Values? Ethical Implications of Sharing Translation Resources. In Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry (pp. 3-10). Association for Machine Translation in the Americas. [paper]