
CUL Metadata Working Group Metadata Assessment Workshop Materials


cmharlow/CULMetadataAssessment


CUL MWG Metadata Assessment Workshop

Uris Lib B05 Classroom, 1 to 2:30 PM, Friday, March 31st

Logistics

We will be meeting in Uris Lib B05 Classroom from 1 to 2:30 PM today.

This classroom has computers available, and at the beginning we will walk everyone through setting up the hosted options for OpenRefine and Python/Bash, so you don't have to bring your own computer (though please do if you can).

Minimal Setup

If you will be using the hosted options for the tools, please [Takes about 10 minutes]:

If you are bringing your own laptop [Time depends on your setup and comfort]:

  • Install Python 2.7 and pip (pip is bundled with Python 2.7.9 and later);
  • Install OpenRefine 2.7rc1 (recommended; 2.7rc2 or 2.6 should also work).

If you have any trouble installing these, just use the hosted versions mentioned above for now, and we can chat at the end of the workshop about how to get your laptop set up for this work later on. We will take a few minutes at the start of the workshop to review setup for both options.

Agenda

We have a short time to cover a meaty topic, so treat this as an introduction to two methods for metadata assessment: a jumping-off point for your own daily practice.

Time        Section
1:00-1:10   Introduction / Setup (10 minutes)
1:10-1:20   Metrics for Metadata Assessment (10 minutes)
1:20-1:50   OpenRefine for Metadata Assessment (30 minutes)
            Includes: Loading a file, Facets, GREL, Regex, Completeness Rankings & Export
1:50-2:20   Python for Metadata Assessment (30 minutes)
            Includes: Harvest, General Report, Specific Field review, SORT/UNIQ/GREP & Export
2:20-2:30   Wrap Up / Next Steps (10 minutes)

Sample Data

I’ve gotten requests to work with the following data sources:

  • eCommons (DSpace DIM XML)
  • Fedora 4 (PCDM RDF/XML)
  • FGDC
  • MARC (MARC/XML, Binary MARC if time)
  • Solr (Documents)
  • SharedShelf (SS API Response)
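To get a feel for what the eCommons data looks like once it is flattened for review, here is a minimal Python sketch (standard library only; it should run on Python 2.7 or 3) that turns one DSpace Intermediate Metadata (DIM) record into a dictionary of field names and values. It assumes the usual DIM layout of `dim:field` elements carrying `mdschema`, `element`, and optional `qualifier` attributes. The function name `dim_to_record` and the sample record are made up for illustration and are not part of the workshop scripts.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URI used by DSpace Intermediate Metadata (DIM) records.
DIM_NS = "http://www.dspace.org/xmlns/dspace/dim"


def dim_to_record(xml_text):
    """Flatten one DIM record into {field_name: [values]} (hypothetical helper).

    Field names are built as schema.element[.qualifier], e.g. dc.date.issued.
    Fields with no text are skipped.
    """
    root = ET.fromstring(xml_text)
    record = defaultdict(list)
    # iter() walks the whole tree, so this also finds <dim:field> elements
    # when the <dim:dim> block is nested inside a larger XML response.
    for field in root.iter("{%s}field" % DIM_NS):
        parts = [field.get("mdschema"), field.get("element"), field.get("qualifier")]
        name = ".".join(p for p in parts if p)
        value = (field.text or "").strip()
        if value:
            record[name].append(value)
    return dict(record)


# Hypothetical sample record, for illustration only.
sample = """<dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
  <dim:field mdschema="dc" element="title">Example title</dim:field>
  <dim:field mdschema="dc" element="date" qualifier="issued">2017</dim:field>
  <dim:field mdschema="dc" element="subject">Soils</dim:field>
  <dim:field mdschema="dc" element="subject">Maps</dim:field>
  <dim:field mdschema="dc" element="description"></dim:field>
</dim:dim>"""

print(dim_to_record(sample))
```

The empty description is dropped, and the two subjects are kept together as a list under `dc.subject`, which makes repeated fields easy to count later.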

Metrics
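One common way to make "completeness" concrete is to score each record by the share of expected fields it actually fills in: completeness = (expected fields with a value) / (expected fields). Here is a minimal Python sketch of that score and a ranking built on it, assuming each record is a dictionary mapping field names to lists of values. `EXPECTED_FIELDS`, `completeness`, and `rank_by_completeness` are hypothetical names, and the field list is a placeholder rather than any CUL profile.

```python
# Hypothetical list of fields a local profile expects in every record;
# swap in whatever your collection's guidelines actually require.
EXPECTED_FIELDS = ["dc.title", "dc.creator", "dc.date.issued", "dc.subject", "dc.rights"]


def completeness(record, expected=EXPECTED_FIELDS):
    """Share of expected fields (0.0 to 1.0) with at least one non-blank value."""
    present = sum(1 for f in expected if any(v.strip() for v in record.get(f, [])))
    return float(present) / len(expected)


def rank_by_completeness(records, expected=EXPECTED_FIELDS):
    """Return (record_id, score) pairs, least complete first (ties broken by id)."""
    scores = [(rid, completeness(rec, expected)) for rid, rec in records.items()]
    return sorted(scores, key=lambda pair: (pair[1], pair[0]))


# Hypothetical records keyed by identifier.
records = {
    "item-1": {"dc.title": ["A"], "dc.creator": ["B"], "dc.date.issued": ["2017"],
               "dc.subject": ["Maps"], "dc.rights": ["CC BY"]},
    "item-2": {"dc.title": ["C"], "dc.subject": ["  "]},  # blank subject does not count
    "item-3": {"dc.title": ["D"], "dc.creator": ["E"], "dc.rights": ["CC0"]},
}

for rid, score in rank_by_completeness(records):
    print("%s\t%.0f%%" % (rid, score * 100))
```

Sorting least complete first puts the records most in need of attention at the top of the report, which is usually where cleanup starts.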

OpenRefine Worksheet

  • Overview of OpenRefine
  • Loading a File
  • Facets
  • GREL, or Google Refine Expression Language
  • Using Regular Expressions
  • Completeness Rankings
  • Export Reports

Python Metadata Breakers Worksheet

  • Overview of Python MetadataBreaker Scripts
  • Harvesting Metadata
  • General Report
  • Looking at a Specific Field
  • Using SORT, UNIQ, GREP, Regular Expressions
  • Export Reports
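A general report and a single-field review both boil down to counting. Here is a minimal Python sketch of each, assuming records have already been harvested and flattened into dictionaries of field names to value lists. `general_report`, `field_values`, and `value_frequencies` are hypothetical helpers, not functions from the MetadataBreaker scripts; the last one imitates the shell pipeline `sort | uniq -c | sort -rn`.

```python
from collections import Counter


def general_report(records):
    """(field, records using it, percent of all records), most-used fields first."""
    if not records:
        return []
    counts = Counter()
    for rec in records:
        # Count each field once per record, no matter how many values it repeats.
        counts.update(set(f for f, vals in rec.items() if vals))
    total = len(records)
    rows = [(field, n, 100.0 * n / total) for field, n in counts.items()]
    # Sort by count (descending), then field name, so ties print in a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def field_values(records, field):
    """Every value of one field across all records, one entry per value."""
    return [v for rec in records for v in rec.get(field, [])]


def value_frequencies(values):
    """Like `sort | uniq -c | sort -rn`: (count, value) pairs, most frequent first."""
    counts = Counter(values)
    return sorted(((n, v) for v, n in counts.items()), key=lambda p: (-p[0], p[1]))


# Hypothetical flattened records, for illustration only.
records = [
    {"dc.title": ["Map of Ithaca"], "dc.subject": ["Maps", "Ithaca (N.Y.)"]},
    {"dc.title": ["Soil report"], "dc.subject": ["Soils"], "dc.rights": ["CC BY"]},
    {"dc.title": ["Field notes"], "dc.subject": ["maps"]},
    {"dc.subject": ["Maps"]},
]

for field, n, pct in general_report(records):
    print("%-12s %3d  %5.1f%%" % (field, n, pct))

for n, value in value_frequencies(field_values(records, "dc.subject")):
    print("%7d %s" % (n, value))
```

The subject counts list `Maps` and `maps` separately, which is exactly the kind of inconsistency a specific-field review is meant to surface; `grep` (or Python's `re` module) is the natural next step for hunting down patterns like that.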
