Uris Lib B05 Classroom 1 to 2:30 PM, Friday March 31st
We will be meeting in Uris Lib B05 Classroom from 1 to 2:30 PM today.
This classroom has computers available, and we will be walking people through setting up the hosted options for OpenRefine and Python/Bash at the beginning (so you don’t have to bring your own computer, though please do if you can).
If you will be using the hosted options for the tools, please do the following (takes about 10 minutes):
- Sign up for an account at Python Anywhere (https://www.pythonanywhere.com/pricing/, choose the Free beginner account);
- Sign up for RefinePro (which offers each new account a free month trial period): https://app.refinepro.com/register/.
If you are bringing your own laptop, please do the following (time depends on your setup and comfort level):
- Install Python 2.7 and pip (usually included with 2.7);
- Install OpenRefine 2.7rc1 (rc1 recommended, rc2 or 2.6 should both also work).
If you have any trouble installing these, just use the hosted versions mentioned above for now, and we can chat at the end of the workshop about how to get your laptop set up for this work later on. We will take a few minutes at the start of the workshop to review setup for both options.
We have a short time to cover a meaty topic, so treat this as an introduction to two methods for doing this work and as a jumping-off point for your own daily practice.
Time | Section |
---|---|
1-1:10 | Introduction / Setup (10 minutes) |
1:10-1:20 | Metrics for Metadata Assessment (10 minutes) |
1:20-1:50 | OpenRefine for Metadata Assessment (30 minutes) |
-- | Includes: Loading a file, Facets, GREL, Regex, Completeness Rankings & Export |
1:50-2:20 | Python for Metadata Assessment (30 minutes) |
-- | Includes: Harvest, General Report, Specific Field review, SORT/UNIQ/GREP & Export |
2:20-2:30 | Wrap Up / Next Steps (10 minutes) |
I’ve gotten requests to work with the following data sources:
- eCommons (DIMS XML)
- Fedora 4 (PCDM RDF/XML)
- FGDC
- MARC (MARC/XML, Binary MARC if time)
- Solr (Documents)
- SharedShelf (SS API Response)

OpenRefine for Metadata Assessment:
- Overview of OpenRefine
- Loading a File
- Facets
- GREL or Google Refine Expression Language
- Using Regular Expressions
- Completeness Rankings
- Export Reports

Python for Metadata Assessment:
- Overview of Python MetadataBreaker Scripts
- Harvesting Metadata
- General Report
- Looking at a Specific Field
- Using SORT, UNIQ, GREP, Regular Expressions
- Export Reports
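
As a rough preview of what a "general report" and a specific field review can look like, here is a minimal sketch in plain Python. It is not the MetadataBreaker code: the sample records, the element names (`title`, `creator`, `date`) and the helper names `field_completeness` and `value_counts` are all made up for illustration, and real eCommons, MARC, or Fedora data will need namespace-aware parsing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample records, invented for illustration only.
SAMPLE = """<records>
  <record><title>Map of Ithaca</title><creator>Smith, J.</creator><date>1901</date></record>
  <record><title>Campus Plan</title><creator>Smith, J.</creator></record>
  <record><title>Untitled</title><date>1901</date></record>
</records>"""


def field_completeness(records, fields):
    """Return {field: fraction of records with a non-empty value}."""
    total = len(records)
    report = {}
    for field in fields:
        # A field counts as present only if it has non-whitespace text.
        filled = sum(1 for rec in records if (rec.findtext(field) or "").strip())
        report[field] = filled / float(total) if total else 0.0
    return report


def value_counts(records, field):
    """Count distinct values of one field, similar to `sort | uniq -c`."""
    values = (rec.findtext(field) for rec in records)
    return Counter(v.strip() for v in values if v and v.strip())


records = ET.fromstring(SAMPLE).findall("record")
completeness = field_completeness(records, ["title", "creator", "date"])
creators = value_counts(records, "creator")

# Print a small per-field completeness report, then the creator tallies.
for field, share in sorted(completeness.items()):
    print("%-8s %3.0f%%" % (field, share * 100))
print(creators.most_common())
```

The sketch sticks to the standard library and avoids version-specific syntax so that it should run under the Python 2.7 install recommended above as well as under Python 3. The value tally does in Python roughly what piping a field through SORT and UNIQ does in Bash.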