Data Cleaning Techniques workshop

From the NYC Open Data Week 2022 event page:

You’ve got your data loaded, you start on your analysis, and… WHAM, missing values. WHAM, junk entries. WHAM, capitalization inconsistencies.

Data cleaning often feels like a chore, and we will often do as little as necessary. What if we took a more systemic approach? In this hands-on workshop, we’ll explore some common data issues to look for, tools, and techniques for cleaning it up, giving us better understanding of our data in the process and clearing the path for smoother data analysis and manipulation.

These examples use NYC's 311 dataset.
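As a rough illustration of the three issues named above (missing values, junk entries, capitalization inconsistencies), here is a minimal sketch. It assumes pandas is installed. The rows are made up for illustration and are not real 311 records, the column names only imitate the dataset's style, and `clean_text` is a hypothetical helper, not something this repository is known to provide.

```python
import pandas as pd

# Made-up rows that imitate the shape of 311 data; these are not real records.
raw = pd.DataFrame(
    {
        "Complaint Type": ["Noise", "noise", "NOISE ", None, "N/A"],
        "Borough": ["BROOKLYN", "Brooklyn", "Unspecified", "QUEENS", "queens"],
    }
)


def clean_text(column: pd.Series, junk: list) -> pd.Series:
    """Hypothetical helper: trim whitespace, unify capitalization,
    and turn known junk entries into missing values."""
    cleaned = column.str.strip().str.upper()
    # mask() replaces every value where the condition is True with a missing value.
    return cleaned.mask(cleaned.isin(junk))


cleaned = raw.assign(
    **{
        "Complaint Type": clean_text(raw["Complaint Type"], ["N/A"]),
        "Borough": clean_text(raw["Borough"], ["UNSPECIFIED"]),
    }
)

# Count the remaining values, including missing ones, to see the effect of cleaning.
print(cleaned["Complaint Type"].value_counts(dropna=False))
print(cleaned.isna().sum())
```

Which entries count as junk is a judgment call for each column, so the sketch passes the junk list in explicitly rather than hard-coding it.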

Running locally

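  1. Install repo2docker (it is published on PyPI as jupyter-repo2docker and needs Docker installed and running).
  2. cd into this directory and run jupyter-repo2docker -E .
  3. Open the URL that repo2docker prints in the terminal, then change the path in your browser to /lab to use JupyterLab.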