These materials form a two-part, robust-yet-introductory workshop on text analysis of Chronicling America newspapers in R.
The first part introduces strategies for fuzzy string matching, using the OCR-derived text of the Perth Amboy Evening News; you'll need to download the data from Chronicling America first, as directed. The second part begins with the results of the first and explores a few possible methods for analyzing phrase use over time, page location, collocate words, and uniqueness.
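To give a sense of why fuzzy matching matters for OCR-derived text, here is a minimal sketch using base R's `agrepl()`. The sample lines are invented for illustration and are not taken from the workshop data, and the workshop files may well use a different function or different settings.

```r
# Hypothetical OCR lines (made up for this example, not workshop data).
ocr_lines <- c(
  "The Perth Amboy Evening News",
  "THE PERTH AMB0Y EVENING NEWS",   # zero in place of the letter O
  "Perth Arnboy Evening News",      # "m" misread as "rn"
  "Weather forecast for tomorrow"
)

# Exact, case-sensitive matching only finds the clean line.
exact <- grepl("Perth Amboy", ocr_lines, fixed = TRUE)

# Approximate matching: allow up to two edits (insertions, deletions,
# or substitutions) and ignore case.
fuzzy <- agrepl("Perth Amboy", ocr_lines,
                max.distance = 2, ignore.case = TRUE)

print(data.frame(line = ocr_lines, exact = exact, fuzzy = fuzzy))
```

Raising `max.distance` catches more OCR variants but also more false positives, so the value chosen here is only a starting point.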
These workshops were originally offered during April 2018 at Rutgers University-New Brunswick through the New Brunswick Libraries Graduate Specialists program and the Rutgers DH Initiative.
If you've found your way here via some other presentation of this material, such as the 2018 HathiTrust Research Center "Digging Deeper, Reaching Further" Series or the 2018 Seton Hall Digital Humanities Symposium, these files may look a bit different. Nothing to fear: this is the expanded and more hands-on-friendly iteration, from which all others derive.
ChronAm_Workshop_1.pdf and ChronAm_Workshop_2.pdf: These .pdf files are best for following along; they contain all code as well as sample outputs and figures.
ChronAm Workshop 1 for Users.Rmd and ChronAm Workshop 2 Users.Rmd: These .Rmd files are best for user participation. They are largely the same as the master .Rmd files, but instructions have been added to prompt users when to edit code, and some of the additional demonstrations from the master .Rmd files have been removed for simplicity.
ChronAm Workshop 1.Rmd and ChronAm Workshop 2.Rmd: Master .Rmd files, used to generate the .pdf files.
/Sample Data: Several .csv files of search results from Workshop 1, available for use in working through Workshop 2.
sn8503570_date+pages.csv: Metadata on the number of pages published by the Perth Amboy Evening News per month. Use this if you want to work through the code in Workshop 2 without downloading all the data from Chronicling America.
/Page Images: Individual files for the newspaper images used in the .pdf and master .Rmd files.