A data curation record supports reuse of data and reproducibility of claims by documenting data source(s), data types and formats, and data quality, as well as the methods and tools used to subset, transform, augment, and derive insight from data. With this record, another researcher should be able to:
(1) understand how data are organized
(2) access the methods and tools used to support analysis
(3) review the data cleaning and transformation processes
(4) identify data source(s)
When, where, and by whom the data were created
What the data describe topically (e.g. Black Death mortality rates)
Brief description of data features (e.g. number of records, mixture of text and tabular data)
Licensing (e.g. CC-BY-SA, All Rights Reserved)
Quality of data (e.g. OCR vs. hand-transcribed text, uncompressed vs. compressed images)
text, .txt, 100 files, 10 MB
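A summary line like the one above can be produced by hand, but for larger collections a short script may help. The following is a minimal Python sketch, not a prescribed method; the function name `summarize_formats` and the choice to group files by extension are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def summarize_formats(root):
    """Map each file extension under root to (file count, total bytes)."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder label.
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}
```

Calling `summarize_formats("path/to/data")` returns one entry per extension, which can then be copied into the record as format, number of files, and total size.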
yyyymmdd_authorlastname_authorfirstname_title.txt
19401021_hemingway_ernest_forwhomthebelltolls.txt
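A naming convention is easier to maintain when file names can be checked against it automatically. The sketch below is one possible way to do that in Python. The regular expression is an assumption about how the pattern should be read (lowercase letters only in the name parts, an eight-digit date), and `parse_filename` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed reading of yyyymmdd_authorlastname_authorfirstname_title.txt
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<last>[a-z]+)_(?P<first>[a-z]+)_(?P<title>[a-z0-9]+)\.txt$"
)


def parse_filename(name):
    """Return the fields encoded in a file name, or None if it does not fit."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject eight-digit strings that are not real calendar dates.
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return fields
```

A name that follows the convention yields a dictionary with `date`, `last`, `first`, and `title` keys. A name with a stray character, such as an extra underscore before the extension, yields `None` and can be flagged for correction.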
augmentation - e.g. geocoded location data
transformation - e.g. converted date formats from mm/dd/yy to yyyy/mm/dd
cleaning - e.g. corrected Rchrd Jmes to Richard James
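Recording a transformation as code makes it easier for another researcher to repeat it. As an illustration, the date conversion mentioned above might look like the following Python sketch. The function name `convert_date` and the `century` parameter are assumptions: a two-digit year does not say which century it belongs to, so the curator has to supply that and should note the choice in the record.

```python
from datetime import datetime


def convert_date(value, century=1900):
    """Convert a 'mm/dd/yy' string to 'yyyy/mm/dd'.

    century is an assumption supplied by the curator, because a
    two-digit year alone is ambiguous.
    """
    month, day, year = (int(part) for part in value.split("/"))
    # datetime raises ValueError for impossible dates such as 02/30/40.
    parsed = datetime(century + year, month, day)
    return f"{parsed.year:04d}/{parsed.month:02d}/{parsed.day:02d}"
```

For example, `convert_date("10/21/40")` gives `1940/10/21`, while `convert_date("01/05/03", century=2000)` gives `2003/01/05`.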
method | approach | algorithm | tool
text analysis | topic modeling | latent Dirichlet allocation | MALLET
Where the source data came from, indicated by a citation and a persistent identifier if available
Michigan State University Libraries. “Feeding America”. East Lansing, MI: Michigan State University. https://www.lib.msu.edu/feedingamerica/
Copyright and reuse status of the data (e.g. CC-BY 4.0, All Rights Reserved)
Humanities Data Curation Record
Thomas Padilla & Brandon Locke
CC-BY 4.0