Peabody Notecard Pipeline

The Peabody archeological museum has thousands of typewritten notecards that index every object in its possession. To help with this indexing, I created an automatic OCR (Optical Character Recognition) pipeline that takes all the notecards and converts them into an easily searchable CSV file. I also helped build an internal Django website that lets work-duty students correct the OCR output. That code is not currently posted, but I can post it if there is interest; a photo of the website is below.
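The step from OCR text to a structured record could look roughly like the sketch below. It is a minimal, hypothetical illustration in Python, not this repository's code: it assumes the OCR engine has already produced plain text in which each field sits on its own line as a printed label, a colon, and a value (for example `Cat. No.: 1`). The real cards may be laid out differently. `FIELDS`, `normalize_label` and `parse_card` are illustrative names; the field names themselves are copied from the example output in this README.

```python
import re

# Field names copied from the example output in this README; the order is assumed.
FIELDS = ["CatNo", "AccNo", "OrigNo", "PhotoNo", "Name", "Site", "SiteNo",
          "Locality", "Situation", "Remarks", "Figured"]

# HYPOTHETICAL card layout: one "Label: value" pair per line of OCR text.
# The label stops at the first colon, so values may themselves contain colons.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def normalize_label(label: str) -> str:
    """Map a printed label such as 'Cat. No.' to a field name such as 'CatNo'."""
    return re.sub(r"[^A-Za-z]", "", label)


def parse_card(ocr_text: str) -> dict:
    """Build a record containing every known field; unrecognized lines are ignored."""
    record = {field: "" for field in FIELDS}
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # no colon on this line, e.g. OCR noise
        field = normalize_label(match.group(1))
        if field in record:
            record[field] = match.group(2)
    return record
```

Lines that do not match leave their fields blank rather than failing, which is the kind of gap a human correction pass would then fill in.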

Example

Example Notecard

Output:

{"CatNo": "1",
 "AccNo": "1",
 "OrigNo": "SC/2",
 "PhotoNo": "",
 "Name": "Butt of arrowhead",
 "Site": "Squibnocket Cliff.",
 "SiteNo": "M50/1",
 "Locality": "Squibnocket Head, southwest side of Martha's Vineyard, Mass.",
 "Situation": "On sand under shell just south of stake 1.",
 "Remarks": "",
 "Figured": ""}
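A minimal sketch of how records shaped like this output could be stored as a CSV and searched, using only Python's standard `csv` module. The column order and the helper names `write_csv` and `search` are assumptions for illustration; they are not taken from the pipeline itself.

```python
import csv
import io

# Column order copied from the example output; assumed to match the real CSV.
FIELDS = ["CatNo", "AccNo", "OrigNo", "PhotoNo", "Name", "Site", "SiteNo",
          "Locality", "Situation", "Remarks", "Figured"]


def write_csv(records, stream):
    """Write a header row, then one row per notecard record."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


def search(stream, term):
    """Return every row in which any field contains `term`, ignoring case."""
    needle = term.lower()
    return [
        row for row in csv.DictReader(stream)
        if any(needle in (value or "").lower() for value in row.values())
    ]


# Usage with the example record; an in-memory buffer stands in for a file.
record = dict.fromkeys(FIELDS, "")
record.update(
    CatNo="1", AccNo="1", OrigNo="SC/2", Name="Butt of arrowhead",
    Site="Squibnocket Cliff.", SiteNo="M50/1",
    Locality="Squibnocket Head, southwest side of Martha's Vineyard, Mass.",
    Situation="On sand under shell just south of stake 1.",
)
buffer = io.StringIO(newline="")
write_csv([record], buffer)
buffer.seek(0)
hits = search(buffer, "martha's")
print(hits[0]["SiteNo"])  # prints M50/1
```

`csv.DictWriter` quotes the `Locality` value automatically because it contains commas, so the apostrophe and commas survive the round trip unchanged.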

Django Website

About

Automatic OCR and correction of Notecards
