Card Catalog Crowdsourcer

Mike Caprio edited this page Oct 3, 2017 · 14 revisions

A System To Enable Transcription Of Card Catalog Entries

Background

While fossil mammals may not attract as much public attention as the Museum's dinosaurs, their study (paleomammalogy) has always been a key component of vertebrate paleontology at the AMNH. The Museum has approximately 400,000 fossil mammal specimens, representing 46 extinct and extant orders, 2,808 extinct genera, and 7,599 species, housed on seven floors of the Museum's Childs Frick Building. More than half of all the genera of mammals known to science are present in the collection.

Data about the specimens are dispersed across the collection and associated archives, in card catalogs, shipping records, field notes, and maps. These disparate data sources are being consolidated in order to provide high-quality, georeferenced data to collection users. The transcription of card catalog entries is particularly time- and resource-intensive. Past projects scanned the catalog cards and had interns transcribe the data on them, following the guidelines linked below.

The fields from the cards that we are interested in begin with the catalog number (in the first sample card, this is the number on the upper left of the card; in the second sample card, the number appears in the field 053 Cat. No.). The end result would be four columns of data: catalog number (if the catalog number has a letter suffix, the suffix should be parsed out), verbatim descriptions, title, and description.
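As an illustration of parsing out the letter suffix, here is a minimal Python sketch. It assumes a transcribed catalog number is a run of digits optionally followed by a single letter (for example `12345` or `12345a`), possibly separated by a space or hyphen. That format, and the names `CatalogNumber` and `split_catalog_number`, are assumptions made for this example, not something taken from the cards or the transcription guidelines.

```python
import re
from typing import NamedTuple, Optional


class CatalogNumber(NamedTuple):
    """A catalog number split into its numeric part and optional letter suffix."""
    number: str            # kept as text so leading zeros are preserved
    suffix: Optional[str]  # None when the card shows no letter suffix


# Assumed format: digits, then an optional single letter, optionally
# separated by whitespace or a hyphen (e.g. "12345", "12345a", "12345-B").
_CATALOG_RE = re.compile(r"^\s*(\d+)\s*-?\s*([A-Za-z]?)\s*$")


def split_catalog_number(raw: str) -> CatalogNumber:
    """Split a transcribed catalog number into (number, suffix)."""
    match = _CATALOG_RE.match(raw)
    if match is None:
        # Do not guess: flag anything unexpected for human review.
        raise ValueError(f"Unrecognized catalog number: {raw!r}")
    number, suffix = match.groups()
    return CatalogNumber(number, suffix or None)


if __name__ == "__main__":
    for raw in ["12345", "12345a", "12345-B"]:
        print(raw, "->", split_catalog_number(raw))
```

Raising an error on anything that does not match, rather than guessing, keeps unusual catalog numbers visible so a transcriber can review them by hand. The pattern would need adjusting once the real range of catalog number formats is known.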

Solutions

  • A tool to parse text of verbatim data versus codified rule descriptions
  • Crowdsourcing site to enlist the help of volunteer transcribers

Resources