Martin Magdinier: Introduction to OpenRefine for Data Cleaning and Reconciliation

Upcoming Events

Join our Meetup group for more events! https://www.meetup.com/data-umbrella

Key Links

Resources

About the Event

OpenRefine is an open-source tool for working with messy data: cleaning it, transforming it, and converting it between formats. The talk has three parts. The first introduces OpenRefine: its purpose, its user base, and its history. The second is a tour of OpenRefine covering download and installation, data import, filtering and faceting, clustering, data cleaning techniques, and reconciliation services. The talk closes with an invitation to join the OpenRefine community and an overview of ways to contribute: coding, design, translation, documentation, or user support.
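
For readers new to faceting, the idea can be sketched outside OpenRefine in a few lines of Python. This is only an illustration of the concept, not OpenRefine's implementation: a text facet summarizes the distinct values in a column with their counts, and selecting values in several facets narrows the rows to those matching all selections. The rows, column names, and function names below are made up for the example.

```python
from collections import Counter

# Hypothetical sample rows; not taken from the dataset used in the talk.
sample_rows = [
    {"permit_type": "New Building", "status": "Issued"},
    {"permit_type": "Demolition", "status": "Issued"},
    {"permit_type": "New Building", "status": "Closed"},
    {"permit_type": "new building", "status": "Issued"},
]


def text_facet(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row[column] for row in rows)


def filter_rows(rows, **selected):
    """Keep rows that match every selected column value (combined facets)."""
    return [
        row for row in rows
        if all(row[col] == value for col, value in selected.items())
    ]


if __name__ == "__main__":
    # The facet exposes inconsistent spellings such as "new building".
    for value, count in text_facet(sample_rows, "permit_type").most_common():
        print(f"{count:>3}  {value}")
    print(filter_rows(sample_rows, permit_type="New Building", status="Issued"))
```

Seeing "New Building" and "new building" as separate facet entries is the kind of inconsistency that the clustering step in the demo is meant to resolve.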

## Timestamps
00:00 Data Umbrella introduction
03:35 What is OpenRefine?
05:00 History of OpenRefine (Freebase Gridworks to Google Refine to OpenRefine)
08:33 OpenRefine user base
10:42 Project statistics
11:34 Features of OpenRefine
14:00 Contributing to OpenRefine (use, promote, help, translate, fix, create, design)
19:40 Begin demo: example dataset of Toronto building permits
20:23 Running OpenRefine locally, installation
20:44 Download OpenRefine (openrefine.org/download)
21:45 Demo: reading in the data
24:15 Demo: export data from OpenRefine
24:38 Demo: working with the data
25:30 Demo: Text facet shows summary of different values
26:45 facet / filter
27:17 combine multiple facets
28:10 text filter
28:40 Cluster algorithm to clean text data (e.g., fingerprint function)
32:54 Cluster algorithm: n-Gram fingerprint
33:30 Cluster algorithm: Cologne phonetic
34:15 Cleaning: working with numerical data
35:20 find and replace: remove commas in numbers
37:49 working with dates
38:40 doing reconciliations in OpenRefine (merge multiple fields into one field)
41:12 Reconciliation Service: an API
41:32 about the dataset: Bathurst Street from Wiki Foundation
44:00 connect my dataset with Wikipedia data
44:45 Reconciliation service test bench (plus: clean street name data)
47:38 Example: Excel type code for editing data
55:26 Resources list
56:20 Q: In the Reconciliation service API, which API versions are supported by OpenRefine?
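
The fingerprint clustering shown in the demo (28:40) can be approximated in Python to make the idea concrete. The sketch below is a simplification under stated assumptions, not OpenRefine's own code: it trims and lowercases each value, folds accented characters to ASCII, drops ASCII punctuation, then sorts and de-duplicates the whitespace-separated tokens. Values that end up with the same key are grouped as candidates for merging. The function names and sample street names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified fingerprint key for a text value."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation (a simplification of full punctuation handling).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so word order and repeats do not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        key = fingerprint(value)
        if value not in groups[key]:
            groups[key].append(value)
    # Only keys with more than one distinct spelling are worth reviewing.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical street names, not the talk's data.
    names = ["Bathurst St.", "bathurst st", "St. Bathurst", "Queen St"]
    print(cluster(names))
```

A key-based method like this only catches differences in case, punctuation, accents, and word order. Misspellings and sound-alike variants need other methods, such as the n-gram fingerprint and Cologne phonetic algorithms listed at 32:54 and 33:30.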

#92

About the Speaker

Martin Magdinier is the OpenRefine Project Manager and has been a core contributor since 2013.

#python #opensource #datascience #dataanalytics

Video

Intro to OpenRefine for Data Cleaning and Reconciliation

Transcript