datapraxis/hdcr


Humanities Data Curation Record

A data curation record supports reuse of data and reproducibility of claims by documenting data source(s), data types and formats, and data quality, as well as the methods and tools used to subset, transform, augment, and derive insight from data. With this record, another researcher should be able to:

(1) understand how data are organized
(2) access the methods and tools used to support analysis
(3) review the data cleaning and transformation processes
(4) identify data source(s)

Summary

When, where, and by whom the data were created

What the data describe topically (e.g. Black Death mortality rates)

Describe data features briefly (e.g. no. of records, mixture of text and tabular data)

Licensing (e.g. CC-BY-SA, All Rights Reserved)

Quality

Quality of data (e.g. OCR vs. hand-transcribed text, uncompressed vs. compressed images)

Type, Format, Extent, Size

text, .txt, 100 files, 10 MB
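
The extent and size values for an entry like the one above can be gathered programmatically rather than by hand. The following is a minimal sketch in Python, not part of the record template itself. The function names `summarize_files` and `format_summary` are hypothetical, and the sketch assumes the data sit in a single directory tree and that size is reported in decimal megabytes.

```python
from pathlib import Path


def summarize_files(directory, suffix=".txt"):
    """Return (file count, total size in bytes) for files ending in `suffix`.

    Assumption: all data files live under one directory tree.
    """
    files = [p for p in Path(directory).rglob(f"*{suffix}") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


def format_summary(file_type, suffix, count, total_bytes):
    """Format a 'type, format, extent, size' line for the record.

    Assumption: size is reported in decimal megabytes (1 MB = 1,000,000 bytes).
    """
    size_mb = total_bytes / 1_000_000
    return f"{file_type}, {suffix}, {count} files, {size_mb:.1f} MB"
```

Calling `format_summary("text", ".txt", *summarize_files("corpus"))` on a hypothetical `corpus` directory would produce a line in the same shape as the example entry.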

Filenaming Conventions

yyyymmdd_authorlastname_authorfirstname_title.txt

19401021_hemingway_ernest_forwhomthebelltolls.txt
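
A documented filenaming convention can also be checked automatically, which helps catch files that drift from the pattern. The sketch below is one possible way to do this in Python. The name `parse_filename` and the regular expression are illustrative assumptions: the pattern assumes lowercase ASCII letters for names, lowercase letters and digits for the title, and no underscores inside any field.

```python
import re
from datetime import date

# Assumed reading of yyyymmdd_authorlastname_authorfirstname_title.txt:
# lowercase ASCII names, alphanumeric title, no underscores inside a field.
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<last>[a-z]+)_(?P<first>[a-z]+)_(?P<title>[a-z0-9]+)\.txt$"
)


def parse_filename(name):
    """Return the filename's fields as a dict, or None if it breaks the convention."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    raw = fields["date"]
    try:
        # date() rejects impossible calendar dates such as 30 February.
        fields["date"] = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    except ValueError:
        return None
    return fields
```

For a name that follows the convention, such as `19401021_hemingway_ernest_forwhomthebelltolls.txt`, the function returns the date, author names, and title as separate fields. Any name that does not fit yields `None`, so nonconforming files can be listed and corrected.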

Modifications

augmentation - e.g. geocoded location data

transformation - e.g. converted date formats from mm/dd/yy to yyyy/mm/dd

cleaning - e.g. Rchrd Jmes to Richard James
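
Recording a transformation is easier to reproduce when the exact rule is written down as code. The following Python sketch illustrates the date conversion named above, from mm/dd/yy to yyyy/mm/dd. The function name `convert_date` and the `century` parameter are assumptions introduced for illustration. A two-digit year does not say which century it belongs to, so the sketch requires the curator to state it rather than guessing.

```python
from datetime import date


def convert_date(value, century=1900):
    """Convert an 'mm/dd/yy' string to 'yyyy/mm/dd'.

    Assumption: every two-digit year in the dataset falls in `century`
    (e.g. century=1900 maps '40' to 1940). Raises ValueError on bad input.
    """
    month, day, year = (int(part) for part in value.strip().split("/"))
    if not 0 <= year <= 99:
        raise ValueError(f"expected a two-digit year, got {year!r}")
    # date() validates the calendar date and raises ValueError if impossible.
    converted = date(century + year, month, day)
    return f"{converted.year:04d}/{converted.month:02d}/{converted.day:02d}"
```

Noting the chosen century in the Modifications section alongside the conversion keeps the assumption visible to anyone reusing the data.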

Methods & Tools

method | approach | algorithm | tool

text analysis | topic modeling | latent Dirichlet allocation | MALLET

Source

Where the source data came from, indicated via citation and persistent identifier, if available

Michigan State University Libraries. “Feeding America”. East Lansing, MI: Michigan State University. https://www.lib.msu.edu/feedingamerica/

Reuse License

The copyright and reuse status of the data (e.g. CC-BY 4.0, All Rights Reserved)


Humanities Data Curation Record
Thomas Padilla & Brandon Locke
CC-BY 4.0
