Current version: v2.0
OCR errors in Trove's digitised newspapers can be corrected by users. To help understand patterns in newspaper correction, this dataset has been created to record information about the number of articles with corrections.
The data was extracted from the Trove API using this notebook from the Trove newspapers section of the GLAM Workbench.
There are three files in the dataset:
corrections_by_year.csv – number of articles corrected in each publication year
corrections_by_category.csv – number of articles corrected in each Trove category
corrections_by_title.csv – number of articles corrected in each newspaper
The files are in CSV format and contain the following fields.
corrections_by_year.csv
term – the publication year
total_results – the number of articles with corrections
total_articles – the total number of articles
proportion – the proportion of articles with corrections
corrections_by_category.csv
term – the category name
total_results – the number of articles with corrections
total_articles – the total number of articles
proportion – the proportion of articles with corrections
corrections_by_title.csv
id – the Trove identifier of the newspaper title
title – the name of the newspaper
articles_with_corrections – the number of articles with corrections
total_articles – the total number of articles from the newspaper in Trove
percentage_with_corrections – the percentage of articles with corrections
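As a rough illustration of how the year-based fields fit together, the following Python sketch reads rows in the layout described for corrections_by_year.csv and recomputes the proportion from the two counts. The sample rows are invented for the example and are not taken from the dataset, the helper names `load_rows` and `recomputed_proportion` are hypothetical, and the sketch assumes that `proportion` is a fraction equal to `total_results` divided by `total_articles`.

```python
import csv
import io

# Hypothetical sample rows in the layout of corrections_by_year.csv.
# The numbers are made up for illustration; they are not from the dataset.
SAMPLE_CSV = """term,total_results,total_articles,proportion
1900,500,2000,0.25
1901,300,3000,0.1
"""


def load_rows(handle):
    """Read rows from a CSV file handle and convert numeric fields from strings."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "term": row["term"],
            "total_results": int(row["total_results"]),
            "total_articles": int(row["total_articles"]),
            "proportion": float(row["proportion"]),
        })
    return rows


def recomputed_proportion(row):
    """Assumed definition: corrected articles divided by all articles."""
    if row["total_articles"] == 0:
        return 0.0
    return row["total_results"] / row["total_articles"]


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Find the publication year with the highest share of corrected articles.
most_corrected = max(rows, key=lambda r: r["proportion"])

for row in rows:
    # Print the stored proportion next to the value recomputed from the counts.
    print(row["term"], row["proportion"], round(recomputed_proportion(row), 4))
```

To try this against the real file, `io.StringIO(SAMPLE_CSV)` could be replaced with `open("corrections_by_year.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory. The same pattern should apply to corrections_by_category.csv, which uses the same field names; corrections_by_title.csv uses different field names and reports a percentage rather than a proportion.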
This repository is part of the GLAM Workbench.
If you think this project is worthwhile, you might like to sponsor me on GitHub.