## Data cleaning: introducing OpenRefine

Goals for this lesson

  • Learn how to identify common problems in data
  • Fix them, and change your life forever

If you've ever prepared manually entered data for analysis, you know how long it can take to get your data into a format that's readable by your stats program, before you can even begin to analyse it. If you're using data that was created by others, this can be doubly difficult, because you can't anticipate all the issues in advance and may have more trouble following the data creator's logic.

Today, we're going to talk about problems with data and how to resolve them. We'll start with the Quartz Guide to Bad Data, then move on to my very favourite data cleaning tool, OpenRefine. Data Carpentry has a fantastic lesson on using this software that we can work through, and then I'd like to apply what we've learned by checking our class dataset for errors.
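
To make "problems with data" concrete before we open OpenRefine, here is a minimal Python sketch of one common issue in manually entered data: the same value typed several different ways. It is loosely modelled on the key-collision ("fingerprint") idea that OpenRefine offers for clustering, but it is a simplified illustration and not OpenRefine's actual implementation. The function names `fingerprint` and `find_inconsistent_values` and the sample entries are made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a text value to a normalised key so near-duplicates collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    # Split on whitespace, drop repeated tokens, and sort them,
    # so word order and extra spaces no longer matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def find_inconsistent_values(values):
    """Group values by fingerprint; return only groups with several spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical hand-typed entries, invented for illustration only.
entries = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Merriami, Dipodomys",
    "Chaetodipus baileyi",
    "Chaetodipus  baileyi",
    "Neotoma albigula",
]

for key, variants in find_inconsistent_values(entries).items():
    print(key, "->", variants)
```

Each group that gets printed is a set of spellings that probably mean the same thing and should be merged into one. A sketch like this only catches differences in case, spacing, punctuation, and word order; genuine misspellings need fuzzier matching, which is part of what makes a dedicated tool worthwhile.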

Resources:

  • The Quartz Guide to Bad Data is a comprehensive list of the many, many ways data can go wrong and what you can do to fix them.
  • Hadley Wickham's "Tidy Data" sets out the principles of "tidy" datasets and offers instructions on how to tidy messy data in R.
