One of our earlier projects was to audit and wrangle an OpenStreetMap dataset. This is where Pandas began to shine for me. Our objective was to choose a region and apply data munging techniques to assess the quality of its data in terms of validity, accuracy, completeness, consistency, and uniformity. I selected my hometown, Livermore, to munge. Finishing it was indeed an accomplishment.
This project is connected to the Data Wrangling course. You have the choice between two databases for this project: SQL and MongoDB. For an explanation of the differences between these two databases, see this node. Where relevant, separate instructions are given below for each database choice.
After completing the project, you will be able to:
Assess the quality of the data for validity, accuracy, completeness, consistency, and uniformity.
Parse and gather data from popular file formats such as .csv, .json, .xml, and .html.
Process data from multiple files or very large files that can be cleaned programmatically.
Store, query, and aggregate data using MongoDB or SQL.
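To make the parsing and storage goals above a little more concrete, here is a minimal sketch in Python. It is not the course's reference solution. The tiny XML sample, the helper names `iter_street_names` and `street_type`, and the `streets` table are all assumptions made up for illustration; a real OpenStreetMap extract would be read from a file on disk and would be far larger. The sketch streams the XML with `iterparse` so the whole file never has to sit in memory, tallies street-name endings as a simple consistency check, and loads the rows into an in-memory SQLite database for querying.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature OSM extract, used here only so the example runs as-is.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node id="1" lat="37.68" lon="-121.76">
    <tag k="addr:street" v="First St"/>
  </node>
  <node id="2" lat="37.69" lon="-121.77">
    <tag k="addr:street" v="Second Street"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="addr:street" v="Third Street"/>
  </way>
</osm>
"""


def iter_street_names(source):
    """Yield (element_id, street) pairs while streaming through the XML."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way"):
            for tag in elem.iter("tag"):
                if tag.get("k") == "addr:street":
                    yield elem.get("id"), tag.get("v")
            elem.clear()  # drop this element's children once they are processed


def street_type(name):
    """Return the last word of a street name, e.g. 'St' or 'Street'."""
    return name.rsplit(" ", 1)[-1]


rows = list(iter_street_names(io.BytesIO(SAMPLE_OSM)))

# Consistency audit: mixed endings such as 'St' and 'Street' suggest cleanup work.
type_counts = Counter(street_type(street) for _, street in rows)

# Store and aggregate with SQL (in-memory SQLite, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streets (element_id TEXT, street TEXT)")
conn.executemany("INSERT INTO streets VALUES (?, ?)", rows)
total = conn.execute("SELECT COUNT(*) FROM streets").fetchone()[0]

print(type_counts)
print(total)
```

In this sample, the tally shows both "St" and "Street", which is the kind of inconsistency an audit would flag before the data is cleaned and loaded into the chosen database.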