General Bibliography: Machine learning and Historical Research
GENERAL BIBLIOGRAPHY
American Historical Association. “Digital Literacy.” Accessed May 29, 2019. https://www.historians.org/jobs-and-professional-development/career-diversity-for-historians/career-diversity-resources/five-skills/digital-literacy.
Arnoult, Sophie, L. Petram, and P. Vossen. Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts. 2021, LaTeCH-CLfL. https://aclanthology.org/2021.latechclfl-1.3/
Arnoult, S. (Creator), Petram, L. (Contributor), Vossen, P. (Contributor), Roorda, D. (Contributor) & de Does, J. (Contributor), VOC GM NER corpus VU, 2022 DOI: 10.48338/vu01-hi67kl, https://publication.yoda.vu.nl/full/VU01/HI67KL.html
Bilansky, Alan. “Search, Reading, and the Rise of Database.” Digital Scholarship in the Humanities 32, no. 3 (2017): 511-527. https://doi.org/10.1093/llc/fqw023.
Blaney, Jonathan, and Judith Siefring. “A Culture of Non-Citation: Assessing the Digital Impact of British History Online and the Early English Books Online Text Creation Partnership.” Digital Humanities Quarterly 11, no. 1 (2017): 1-70. http://www.digitalhumanities.org/dhq/vol/11/1/000282/000282.html.
Boot, Peter, Ronald Haentjens Dekker, Marijn Koolen, and Liliana Melgar. 2017. Facilitating Fine-Grained Open Annotations of Scholarly Sources. In DH2017 Digital Humanities 2017 Conference Abstracts. McGill University and Université de Montréal, Montréal, Canada, 167--169. https://dh2017.adho.org/abstracts/198/198.pdf
Borrero, R et al. (2021), Seeking a Common Ground for the Nautical Archaeology Digital Library (NADL). Reflections on Science, Method, Theory and Templates, Virtual Archaeology Review, 12(24): 11-24, 2021, https://doi.org/10.4995/var.2021.14331
Brooks, M., Rowell, C., & Shorish, Y. (2019). A Critical Introduction to Metadata through Dublin Core. In L. Rodrigues & E. Pappas (Eds.), #DLFTeach Toolkit: Lesson Plans for Digital Library Instruction (1st ed.). Digital Library Federation Digital Library Pedagogy Working Group. https://doi.org/10.21428/65a6243c.d57138cc
Chassanoff, Alexandra, Historians and the Use of Primary Source Materials in the Digital Age. The American Archivist 1 September 2013; 76 (2): 458–480. doi: https://doi.org/10.17723/aarc.76.2.lh76217m2m376n28
Chassanoff, Alexandra M., Historians' Experiences Using Digitized Archival Photographs as Evidence. Doctoral dissertation, Chapel Hill (2016). https://typeset.io/pdf/historians-experiences-using-digitized-archival-photographs-i3smltqymc.pdf
Cohen, D. (2010). Is Google Good for History? Retrieved from http://www.dancohen.org/2010/01/07/is-google-good-for-history
Cole, Charles, Information Acquisition in History Ph.D. Students: Inferencing and the Formation of Knowledge Structures, The Library Quarterly, vol. 68, pp. 33-54 (1998). https://api.semanticscholar.org/CorpusID:140498151
Cordell, Ryan. “‘Q i-jtb the Raven:’ Taking Dirty OCR Seriously.” Book History 20 (2017): 188-225. https://doi.org/10.1353/bh.2017.0006.
Dalton, M.S., & Charnigo, L. (2004). Historians and their information sources. College & Research Libraries, 65(5), 400–425. https://www.semanticscholar.org/paper/Historians-and-Their-Information-Sources-Dalton-Charnigo/472e461515e289e967a92e4c183a95e913a5dccc
Data Nutrition Project. (2023). The Data Nutrition Project: Empowering data scientists and policymakers with practical tools to improve AI outcomes. Retrieved November 2023 from https://datanutrition.org
Dillion, D., N. Tandon, Y. Gu, K. Gray, Can AI language models replace human participants? Trends Cogn. Sci. 27, 597–600 (2023)
Dobreva, Milena, and Sudatta Chowdhury. “A User-Centric Evaluation of the Europeana Digital Library.” In The Role of Digital Libraries in a Time of Global Change: International Conference on Asian Digital Libraries 2010, edited by Gobinda Chowdhury, Chris Koo, and Jane Hunter, 148-157. Berlin: Springer, 2010.
Duff, W.M., & Johnson, C.A. (2002). Accidentally found on purpose: Information‐seeking behavior of historians in archives. The Library Quarterly, 72(4), 472–496.
Eager, Bron, 'AI Literature Reviews: Exploring Google’s NotebookLM for Analysing Academic Literature', blog post, August 10, 2024
Elena, T., Katifori, A., Vassilakis, C., Lepouras, G., & Halatsis, C. (2010). Historical research in archives: User methodology and supporting tools. Journal of Digital Library, 11, 25–36.
Elsevier, Insights 2024: Attitudes towards AI, July 2024
Gadd, Ian. “The Use and Misuse of Early English Books Online.” Literature Compass 6, no. 3 (May 2009): 680-92. https://doi.org/10.1111/j.1741-4113.2009.00632.x
Garcia, G.G., & Weilbach, C. (2023). If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History. Though focused on history, this paper provides valuable insights into using LLMs for research tasks, including summarization, and addresses the importance of prompt engineering and domain-specific knowledge.
Garrett, Jeffrey, Subject Headings in Full-Text Environments: The ECCO Experiment, College and Research Libraries, vol. 68, January 2007. DOI:10.5860/CRL.68.1.69
Gormly, B., Seale, M., Alpert-Abrams, H., Gustavson, A., Kemp, A., Lindquist, T., & Logsdon, A. (2019). Teaching with Digital Primary Sources: Literacies, Finding and Evaluating, Citing, Ethics, and Existing Models. #DLFteach Publications. https://doi.org/10.21428/65a6243c.6b419f2b
Graham, S. (2002). Historians and electronic resources: Patterns and use. Journal of the Association for History and Computing, 5(2). Retrieved from http://hdl.handle.net/2027/spo.3310410.0005.201
Gunn, S., & Faire, L. (2012). Research methods for history. Edinburgh, UK: Edinburgh University Press.
Hardinges, Jack, Elena Simperl, and Nigel Shadbolt, Around the Data Used to Train Foundation Models, Harvard Data Science Review • Special Issue 5: Grappling With the Generative AI Revolution, (2023) https://hdsr.mitpress.mit.edu/specialissue5
Hoekstra, Rik and Marijn Koolen. 2019. Data scopes for digital history research. Historical Methods: A Journal of Quantitative and Interdisciplinary History 52, 2 (2019), 79--94.
March 2018; 81 (1): 135–164. doi: https://doi.org/10.17723/0360-9081-81.1.135
Hosseini, Kasra, Kaspar Beelen, Giovanni Colavizza, and Mariona Coll Ardanuy. 2021a. Neural Language Models for Nineteenth-Century English. Journal of Open Humanities Data, 7:22.
Hosseini, Kasra, Kaspar Beelen, Giovanni Colavizza, and Mariona Coll Ardanuy. 2021b. Neural Language Models for Nineteenth-Century English (dataset; language model zoo)
Huang, S., & Siddarth, D. (2023, February 6). Generative AI and the Digital Commons. Working paper. Collective Intelligence Project. https://cip.org/research/generative-ai-digital-common
Keller, Paul, Betsy Masiello, Derek Slater, and Alek Tarkowski, Towards a Books Data Commons for AI Training (20?XX)
Koolen M., Kumpulainen S., Melgar-Estrada L., O'Brien H., Freund L., Arapakis I., Hoeber O., Lopatovska I. (2020), A Workflow Analysis Perspective to Scholarly Research Tasks, Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, pp. 183-192, Online publication date: 14-Mar-2020, https://dl.acm.org/doi/10.1145/3343413.3377969
Kumpulainen S., Keskustalo H., Zhang B., Stefanidis K. (2020), Historical reasoning in authentic research tasks, Journal of the Association for Information Science and Technology, 71:2, pp. 230-241, Online publication date: 1-Jan-2020, https://dl.acm.org/doi/10.1002/asi.24216
Leigh A., Scholer F., Thomas P., Elsweiler D., Joho H., Kando N., Smith C. (2021), Information Use and the Shaping of Archives, Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, pp. 367-370, Online publication date: 14-Mar-2021, https://dl.acm.org/doi/10.1145/3406522.3446010
Library of Congress. “Citing Primary Sources.” Accessed August 9, 2019. https://www.loc.gov/teachers/usingprimarysources/citing.html.
Lin, Chiao-Min, A Study of the Archival Information Needs and Use Behavior of Historians, Journal of Library and Information Studies, vol. 11, pp. 77-116 (2013). https://jlis.lis.ntu.edu.tw/files/journal/j37-4.pdf
Manjavacas, Enrique & Lauren Fonteyn. 2022. Adapting vs. Pre-training Language Models for Historical Languages. Journal of Data Mining & Digital Humanities jdmdh:9152. https://doi.org/10.46298/jdmdh.9152
Manjavacas, Enrique & Lauren Fonteyn. 2022. Non-Parametric Word Sense Disambiguation for Historical Languages. Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities (NLP4DH), 123-134. Association for Computational Linguistics. https://aclanthology.org/2022.nlp4dh-1.16
Martin, K., & Quan‐Haase, A. (2016). The role of agency in historians' experiences of serendipity in physical and digital information environments. Journal of Documentation, 72(6), 1008–1026.
Matusiak, Krystyna K. “User Navigation in Large-Scale Distributed Digital Libraries: The Case of the Digital Public Library of America.” Journal of Web Librarianship, 11, no.3-4 (2017): 157-171. https://doi.org/10.1080/19322909.2017.1356257.
Maxwell, A. (2010). Digital archives and history research: feedback from an end-user. Library Review, 59(1), 24–39.
Mitchell, M. (2023, April 12). Okay. Inspired by news & @Stealcase , let me clarify something. When AI companies release “open training data” for a model [Image attached] [Post]. Mastodon. https://mastodon.social/@mmitchell_ai/110187818225660060
Mollick, Ethan, Co-Intelligence: Living and Working with AI (April 2024)
Mollick, Ethan, 'One Useful Thing', substack (2023-). https://www.oneusefulthing.org/
Nockels, J., Gooding, P., Ames, S. et al. Understanding the application of handwritten text recognition technology in heritage contexts: a systematic review of Transkribus in published research. Arch Sci 22, 367–392 (2022). https://doi.org/10.1007/s10502-022-09397-0
Ozoani, E., Gerchick, M., & Mitchell, M. (2023). Model cards. Hugging Face. https://huggingface.co/blog/model-cards
Porter, B., Machery, E. AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably. Sci Rep 14, 26133 (2024). https://doi.org/10.1038/s41598-024-76900-1
Post, Colin, and Alexandra Chassanoff. 2021. “Beyond the Workflow: Archivists’ Aspirations for Digital Curation Practices.” Archival Science 21 (4): 413–32. https://doi.org/10.1007/s10502-021-09365-0
Putnam, Lara. “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast.” The American Historical Review 121, no. 2 (April 2016): 377-402. https://doi.org/10.1093/ahr/121.2.377.
Qin, Jian, and John D’Ignazio. “The Central Role of Metadata in a Science Data Literacy Course.” Journal of Library Metadata 10, no. 2-3 (2010): 188-204. https://doi.org/10.1080/19386389.2010.506379.
Raschka, Sebastian, Build a Large Language Model (From Scratch) (2024)
Riley, C.L. (2013). Beyond Ctrl-c, Ctrl-v: Teaching and learning history in the digital age. In T. Weller (Ed.), History in the digital age (pp. 149–169). New York: Routledge.
Rosenzweig, R. (2006). Can history be open source? Wikipedia and the future of the past. The Journal of American History, 93(1), 117–146.
Royal Historical Society, Generative AI, History and Historians, A Reading Guide, May 1, 2024
Rutner, Jennifer and Roger Schonfeld, Supporting the Changing Research Practices of Historians, Final Report from ITHAKA S+R (2012), https://sr.ithaka.org/wp-content/uploads/2015/08/supporting-the-changing-research-practices-of-historians.pdf
Schaul, K., Chen, S. Y., & Tiku, N. (2023, August 19). Inside the secret list of websites that make AI like ChatGPT sound smart. The Washington Post. https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/
Sinn, Donghee, and Nicholas Soares, “Historians' Use of Digital Archival Collections: The Web, Historical Scholarship, and Archival Research,” Journal of the American Society for Information Science and Technology 65, no. 9 (2014): 1794–1809, doi:10.1002/asi.23091. https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.23091
Society of American Archivists Reference, Access, and Outreach Section. “Teaching with Primary Sources.” Zotero library. https://www.zotero.org/groups/76402/teaching_with_primary_sources/items/collectionKey/2BKBRTH8.
Southwell, K.L. (2002). How researchers learn of manuscript resources at the Western History Collections. Archival Issues, 26(2), 91–109.
Tibbo, H.R. (2003). Primarily history in America: How U.S. historians search for primary materials at the dawn of the digital age. The American Archivist, 66, 9–50.
Touvron, H. et al. Llama 2: Open foundation and fine-tuned chat models. arXiv (2023). https://doi.org/10.48550/arXiv.2307.09288
Underwood, Ted. “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago.” Representations 127, no. 1 (August 2014): 64-72. https://doi.org/10.1525/rep.2014.127.1.64.
Uva, P. (1977). Information gathering habits of academic historians: Report of the pilot study. Syracuse, NY: Upstate Medical Center, State University of New York, Syracuse.
Varnum, Michael, Baumard, Nicolas, Atari, Mohammad, Gray, Kurt, Large Language Models based on historical text could offer informative tools for behavioral science, PNAS, October 9, 2024, 121 (42) e2407639121, https://doi.org/10.1073/pnas.2407639121
Yakel, Elizabeth, and Deborah A. Torres. "AI: Archival Intelligence and User Expertise." The American Archivist 66, no. 1 (2003): 51-78. http://www.jstor.org/stable/40294217.
Yan, Wanxin, Nakajima, Taira, Sawada, Ryo, Benefits and Challenges of Collaboration between Students and Conversational Generative Artificial Intelligence in Programming Learning: An Empirical Case Study, Educ. Sci. 2024, 14, 433. https://doi.org/10.3390/educsci1404043
Zhang, Jane. and Dayne Mauney, “When Archival Description Meets Digital Object Metadata: A Typological Study of Digital Archival Representation.” The American Archivist 76, no. 1 (2013): 174-195. https://doi.org/10.17723/aarc.76.1.121u85342062w155.
Yin, Ziqi et al., Should we respect LLMs? A Cross-Lingual study on the Influence of Prompt Politeness on LLM Performance. arXiv, 2024. https://arxiv.org/pdf/2402.14531
The MarineLives project was founded in 2012. It is a volunteer-led collaboration dedicated to the transcription, enrichment and publication of English High Court of Admiralty depositions.