Skip to content

Latest commit

 

History

History
30 lines (26 loc) · 8.56 KB

t-data.md

File metadata and controls

30 lines (26 loc) · 8.56 KB

Filtered Union Bibliography

This automatically-generated file contains references from the main union bibliography that have been filtered for a single tag. Do not edit this file; instead, please update the main bibliography and tag references appropriately to have them show up here. Thank you!

The papers are listed in the same order as the main bibliography; e.g., by year of publication / release; then by surname / name of the first author.

Data

  • McMillan-Major, Angelina, Emily M. Bender and Batya Friedman. (2023). Data Statements: From Technical Concept to Community Practice, ACM Journal on Responsible Computing. [paper] Data published
  • Bender, Emily M., Friedman, B. and McMillan-Major, A. (2021). A Guide for Writing Data Statements for Natural Language Processing [paper] Data report
  • Birhane, A., Prabhu, V. U., & Kahembwe, E. (2021). Multimodal datasets: misogyny, pornography, and malignant stereotypes. arXiv preprint arXiv:2110.01963. [paper] Data preprint
  • Dodge, J., Sap, M., Marasovic, A., Agnew, W., Ilharco, G., Groeneveld, D., ... & Face, H. (2021, September). Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1286–1305, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. [paper] Data published
  • Moss, E., Watkins, E. A., Singh, R., Elish, M. C., & Metcalf, J. (2021). Assembling Accountability: Algorithmic Impact Assessment for the Public Interest. Available at SSRN 3877437. [paper] Data report
  • Bird, S. (2020, December). Decolonising speech and language technology. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 3504-3519). doi:10.18653/v1/2020.coling-main.313 [paper] Language Diversity Data published
  • Jo, E. S., & Gebru, T. (2020, January). Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 306-316). [paper] Data published
  • Joshi, P., Santy, S., Budhiraja, A., Bali, K., & Choudhury, M. (2020). The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, 6282-6293. doi:10.18653/v1/2020.acl-main.560 [paper] Data Language Diversity published
  • Paullada, A., Raji, I. D., Bender, E. M., Denton, E., & Hanna, A. (2020). Data and its (dis) contents: A survey of dataset development and use in machine learning research. Patterns, Volume 2, Issue 11, 12 November 2021, Pages 100388. [paper] Data published [paper] Data published
  • Kann, K., Cho, K., & Bowman, S. R. (2019). Towards realistic practices in low-resource natural language processing: the development set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3342–3349, Hong Kong, China. Association for Computational Linguistics. doi:10.18653/v1/D19-1329 [paper] Data published
  • Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency (pp. 220-229). [paper] Data published
  • Raji, I. D., & Yang, J. (2019). About ML: Annotation and benchmarking on understanding and transparency of machine learning lifecycles. arXiv preprint arXiv:1912.06166. [paper] Data preprint
  • Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604 doi:10.1162/tacl_a_00041 [paper] Data published
  • Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. Commun. ACM 64, 12 (December 2021), 86–92. DOI:https://doi.org/10.1145/3458723. [paper] Data published
  • Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677. [paper] Data preprint
  • Mieskes, M. (2017, April). A quantitative study of data in the NLP community. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 23-29). [paper] Data published
  • Bretonnel Cohen, K.; Pestian, J. P. & Fort, K. Annotating suicide notes : ethical issues at a glance. In Proc. of ETeRNAL (Ethique et Traitement Automatique des Langues), June 2015, Caen, France. [paper] Data published
  • Couillault, A., Fort, K., Adda, G., & De Mazancourt, H. (2014, May). Evaluating corpora documentation with regards to the ethics and big data charter. In International Conference on Language Resources and Evaluation (LREC). [paper] Data published
  • Drugan, J. & Babych, B. Shared Resources, Shared Values? Ethical Implications of Sharing Translation Resources. Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry, Association for Machine Translation in the Americas, 2010, 3-10. [paper] Data published