OpenSDP Data Janitor Tutorial (Stata)

Cleaning Raw Data

This tutorial has two objectives. The first is to demonstrate the process of cleaning a raw data file from start to finish. The second is to demonstrate Stata features that are critical for writing efficient code, along with the syntax of several commands needed for data cleaning. The tutorial concludes with a demonstration of how to reshape data from long to wide format.
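As a rough preview of the long-to-wide reshape mentioned above, the following is a minimal sketch rather than code taken from the tutorial files. The variable names (`sid`, `year`, `score`) and the data values are hypothetical and chosen only for illustration.

```stata
* Hypothetical long-format data: one row per student per year
clear
input sid year score
1 2010 310
1 2011 325
2 2010 298
2 2011 305
end

* Reshape to wide format: one row per student,
* with one score variable per year (score2010, score2011)
reshape wide score, i(sid) j(year)

* Inspect the result
list
```

Here `i()` names the variable that identifies each row in the wide data, and `j()` names the variable whose values become suffixes on the new wide variables.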

You will need Stata version 12 or higher installed on your computer to run this tutorial. Download the files and unzip them. Start Stata by opening the file in the programs subdirectory, and arrange the do-file editor and the main Stata window side by side so that you can see both. Work through the tutorial by reading the comments and running one or several lines of code at a time.

This tutorial was originally authored by the Strategic Data Project.

OpenSDP is an online, public repository of analytic code, tools, and training intended to foster collaboration among education analysts and researchers in order to accelerate the improvement of our school systems. The community is hosted by the Strategic Data Project, an initiative of the Center for Education Policy Research at Harvard University. We welcome contributions and feedback.