Humanities Data Curation Record

A data curation record supports reuse of data and reproducibility of claims by documenting data source(s), data types and formats, data quality, and the methods and tools used to subset, transform, augment, and derive insight from data. With this record, another researcher should be able to:

(1) understand how data are organized
(2) access the methods and tools used to support analysis
(3) follow the data cleaning and transformation processes
(4) identify data source(s)

Summary

When, where, and by whom the data were created

What the data describe topically (e.g. Black Death mortality rates)

Describe data features briefly (e.g. no. of records, mixture of text and tabular data)

Licensing (e.g. CC-BY-SA, All Rights Reserved)

Quality

Quality of data (e.g. OCR vs. hand-transcribed text, uncompressed vs. compressed images)

Type, Format, Extent, Size

text, .txt, 100 files, 10 MB
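
The format, extent, and size values can be measured rather than estimated. The following is a minimal sketch, assuming the files sit under a single directory and that size is reported in decimal megabytes; the function name `describe_extent` and its parameters are illustrative and not part of the record template.

```python
from pathlib import Path


def describe_extent(directory, extension=".txt"):
    """Count files with the given extension and total their size on disk."""
    # Search the directory and its subdirectories for matching files.
    files = [p for p in Path(directory).rglob(f"*{extension}") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return {
        "format": extension,
        "files": len(files),
        # Assumption: 1 MB = 1,000,000 bytes (decimal megabytes).
        "size_mb": round(total_bytes / 1_000_000, 2),
    }
```

Calling `describe_extent("corpus")` on a hypothetical `corpus` directory returns a dictionary whose values can be copied into the line above.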

Filenaming Conventions

yyyymmdd_authorlastname_authorfirstname_title.txt

19401021_hemingway_ernest_forwhomthebelltolls.txt
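
A documented convention can also be checked by machine. The sketch below parses a filename against the yyyymmdd_authorlastname_authorfirstname_title.txt pattern. It assumes each part is lowercase letters or digits with no internal underscores, which the template does not state; `FILENAME_RE` and `parse_filename` are hypothetical names.

```python
import re
from datetime import date

# yyyymmdd_authorlastname_authorfirstname_title.txt
# Assumption: name and title parts contain only lowercase letters and digits.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<last>[a-z0-9]+)_(?P<first>[a-z0-9]+)_(?P<title>[a-z0-9]+)\.txt$"
)


def parse_filename(name):
    """Return the parts of a conforming filename, or None if it does not conform."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    raw = match.group("date")
    try:
        # Reject impossible dates such as month 13.
        parsed = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    except ValueError:
        return None
    return {
        "date": parsed,
        "author_last": match.group("last"),
        "author_first": match.group("first"),
        "title": match.group("title"),
    }
```

Under these assumptions, a name with a stray underscore before the extension or an impossible date returns `None`, which makes it easy to list nonconforming files before sharing a dataset.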

Modifications

augmentation - e.g. geocoded location data

transformation - e.g. converted date formats from mm/dd/yy to yyyy/mm/dd

cleaning - e.g. Rchrd Jmes to Richard James
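
Recording a modification is more useful when the step itself can be rerun. As an illustration of the date transformation above, here is a minimal sketch that converts mm/dd/yy to yyyy/mm/dd. Two-digit years are ambiguous, so the century is an explicit parameter; the default of 1900 and the function name `convert_date` are assumptions for this example only.

```python
from datetime import date


def convert_date(value, century=1900):
    """Convert a mm/dd/yy string to yyyy/mm/dd.

    Assumption: every date in the dataset falls within one known century.
    """
    month, day, year = (int(part) for part in value.split("/"))
    if not 0 <= year <= 99:
        raise ValueError(f"expected a two-digit year, got {year}")
    # date() raises ValueError for impossible dates such as 13/01/40.
    parsed = date(century + year, month, day)
    return f"{parsed.year:04d}/{parsed.month:02d}/{parsed.day:02d}"
```

For example, `convert_date("10/21/40")` gives `1940/10/21`. Stating the century choice in the record matters because a library default could silently place historical dates in the wrong century.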

Methods & Tools

method | approach | algorithm | tool

text analysis | topic modeling | latent Dirichlet allocation | MALLET

Source

Where the source data came from, indicated by a citation and, if available, a persistent identifier

Michigan State University Libraries. “Feeding America”. East Lansing, MI: Michigan State University. https://www.lib.msu.edu/feedingamerica/

Reuse License

Copyright and reuse status of the data (e.g. CC-BY 4.0, All Rights Reserved)


Humanities Data Curation Record
Thomas Padilla & Brandon Locke
CC-BY 4.0