Crowdsourced history with Zooniverse + LOC's "By the People"

Description:

As we’ll discuss often this semester, digitization is not a panacea: it takes a lot of labor and time to turn a historical document into something a researcher can use quickly and easily online. Scanned printed items can be processed with Optical Character Recognition (OCR) software such as ABBYY FineReader, Adobe Acrobat Pro, or Tesseract OCR to create an approximation of their contents, which can then be searched, sorted, or edited. Handwritten, torn or damaged, or otherwise non-standardized items are much harder to make machine-readable and often require humans to do this work (Handwritten Text Recognition, or HTR, software is being developed, but it will be a while before it’s very accurate). As you know, many organizations don’t employ people to do this work; instead, they rely on crowdsourced transcription, tagging, editing, and other tasks to make the contents of these objects more accessible. Zooniverse and the Library of Congress’ By the People are two major initiatives in this vein, with simple user interfaces and large, involved communities of volunteers. This can be laudable work when it does not exploit volunteers’ labor, so you’ll be dipping your toes into one of these platforms to get an idea of how much work goes into the process.

Directions:

  1. Browse the open projects at By the People and Zooniverse and identify one to which you’d like to contribute. Sign up for an account on the platform of your choice.

  2. Take some time to review the platform’s standards and any directions specific to your project. By the People has a welcome guide; Zooniverse projects each have their own directions and sometimes tutorials to walk through (an example here).

  3. As you work through this assignment, consider:

    What are the pros and cons of crowdsourced data work like these initiatives?

    Did you learn anything new while working on your project?

    What can you glean about the project’s standards for ensuring accuracy, consistency, and attribution of labor?