Wrangle OpenStreetMap Data
Data Wrangling Project — Udacity Data Analyst Nanodegree
This project is part of the Data Analyst Nanodegree. Below you'll find the rest of the projects and I also wrote a short post about the experience.
Important note: The entire project is documented and explained in the OpenStreetMap.md file; I encourage you to start there. Below you'll find the project file structure.
app.py: calls all the functions and executes the program. To create the .csv files and import the data into the database in the data folder, just run python app.py and the script will take care of the rest. app.py can also run the audit.py functions, but those are commented out by default since they do not modify the data itself.
audit.py: this is the first look at the data. It programmatically checks for data validity, accuracy and other measures and prints its results in the terminal. It does not modify the data itself, only reports the issues it encounters.
The script consists of two similar modules:
- audit_nodes(): checks for
- audit_ways(): checks for
Running both at the same time could lead to parsing errors, so it is recommended to leave one of them commented out in the app.py script and run the other separately after the first has finished.
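As an illustration of a read-only audit pass over one element type at a time, here is a minimal sketch. It is not the code in audit.py: the function name audit_tag_keys, the "problem character" rule, and the sample data are all assumptions made for the example.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed rule for this sketch: a tag key is "problematic" if it contains
# anything other than letters, digits, underscores, or colons.
PROBLEM_CHARS = re.compile(r"[^a-z0-9_:]", re.IGNORECASE)


def audit_tag_keys(osm_source, element_name):
    """Tally tag keys on one element type ("node" or "way").

    Read-only: the source is parsed and counted, never modified.
    """
    key_counts = Counter()
    problem_keys = set()
    for _, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag == element_name:
            for tag in elem.iter("tag"):
                key = tag.get("k", "")
                key_counts[key] += 1
                if PROBLEM_CHARS.search(key):
                    problem_keys.add(key)
            elem.clear()  # free memory once the element has been inspected
    return key_counts, problem_keys


# Tiny hand-written sample standing in for a real .osm file.
SAMPLE = b"""<osm>
  <node id="1" lat="0.0" lon="0.0"><tag k="addr:street" v="Main St."/></node>
  <node id="2" lat="0.1" lon="0.1"><tag k="bad key" v="x"/></node>
  <way id="10"><nd ref="1"/><tag k="highway" v="residential"/></way>
</osm>"""

# One pass per element type, run one after the other.
node_counts, node_problems = audit_tag_keys(io.BytesIO(SAMPLE), "node")
way_counts, way_problems = audit_tag_keys(io.BytesIO(SAMPLE), "way")
print(node_counts, node_problems)
print(way_counts, way_problems)
```

Each call opens its own parse of the source, which mirrors the advice above to run the node audit and the way audit separately.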
to_csv.py: reads the data from the .osm file and exports it to .csv files. During the process, it ensures the export complies with the structure dictated by schema.py. For data validity it focuses on semantics rather than format, and unlike audit.py, to_csv.py treats and modifies (through fix.py) the data-related problems described in Part II of the OpenStreetMap.md file.
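The export step can be pictured with a short sketch like the one below. It is not the project's to_csv.py: the column lists, the export_nodes function, and the fix_street_name helper (a stand-in for whatever fix.py provides) are hypothetical and chosen only to show the parse, clean, and write flow.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layouts for this sketch; the real ones live in schema.py.
NODE_FIELDS = ["id", "lat", "lon", "user"]
TAG_FIELDS = ["id", "key", "value"]


def fix_street_name(value):
    """Hypothetical cleaning function: expand a trailing 'St.' abbreviation."""
    if value.endswith(" St."):
        return value[: -len("St.")] + "Street"
    return value


def export_nodes(osm_source, nodes_out, tags_out):
    """Stream nodes from an OSM source into two CSV outputs."""
    node_writer = csv.DictWriter(nodes_out, fieldnames=NODE_FIELDS)
    tag_writer = csv.DictWriter(tags_out, fieldnames=TAG_FIELDS)
    node_writer.writeheader()
    tag_writer.writeheader()
    for _, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag != "node":
            continue
        node_writer.writerow({field: elem.get(field) for field in NODE_FIELDS})
        for tag in elem.iter("tag"):
            value = tag.get("v")
            if tag.get("k") == "addr:street":
                value = fix_street_name(value)  # clean while exporting
            tag_writer.writerow(
                {"id": elem.get("id"), "key": tag.get("k"), "value": value}
            )
        elem.clear()


SAMPLE = b"""<osm>
  <node id="1" lat="0.0" lon="0.0" user="alice"><tag k="addr:street" v="Main St."/></node>
  <node id="2" lat="0.1" lon="0.1" user="bob"/>
</osm>"""

nodes_csv, tags_csv = io.StringIO(), io.StringIO()
export_nodes(io.BytesIO(SAMPLE), nodes_csv, tags_csv)
print(nodes_csv.getvalue())
print(tags_csv.getvalue())
```

In-memory buffers are used here so the sketch has no side effects; with real files, open the outputs with newline="" as the csv module documentation recommends.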
to_sql.py: after the data has been stored in .csv files, to_sql.py creates a database, osm.db, and the necessary tables matching the structure described in schema.py.
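For readers unfamiliar with this step, a minimal sketch of loading CSV rows into SQLite with the standard library follows. The table layout, the load_nodes function, and the sample rows are assumptions for illustration, not the contents of to_sql.py.

```python
import csv
import io
import sqlite3


def load_nodes(connection, nodes_csv):
    """Create a nodes table (assumed layout) and insert rows from a CSV source."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "id INTEGER PRIMARY KEY, lat REAL, lon REAL, user TEXT)"
    )
    rows = [
        (int(row["id"]), float(row["lat"]), float(row["lon"]), row["user"])
        for row in csv.DictReader(nodes_csv)
    ]
    # Parameterized inserts avoid quoting problems in user-supplied values.
    connection.executemany(
        "INSERT INTO nodes (id, lat, lon, user) VALUES (?, ?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)


# The project writes to osm.db; an in-memory database keeps this sketch
# free of side effects.
conn = sqlite3.connect(":memory:")
sample_csv = io.StringIO("id,lat,lon,user\n1,0.0,0.0,alice\n2,0.1,0.1,bob\n")
inserted = load_nodes(conn, sample_csv)
print(inserted, conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])
```

Swapping ":memory:" for a file path would persist the tables to disk instead.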
fix.py: contains all the data wrangling functions used by to_csv.py.
compress.py: takes an .osm file as input and outputs a k-reduced version of it. k is a parameter that can be changed in the code.
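One common way to produce a k-reduced sample is to keep every k-th top-level element. The sketch below assumes that interpretation; the function names iter_top_level and k_reduce are hypothetical and the code is not taken from compress.py.

```python
import io
import xml.etree.ElementTree as ET


def iter_top_level(osm_source, tags=("node", "way", "relation")):
    """Yield each completed top-level OSM element, freeing memory as we go."""
    context = ET.iterparse(osm_source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in tags:
            yield elem
            root.clear()


def k_reduce(osm_source, out, k):
    """Write every k-th top-level element to `out`; return how many were kept."""
    out.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n')
    kept = 0
    for index, elem in enumerate(iter_top_level(osm_source)):
        if index % k == 0:
            out.write(ET.tostring(elem, encoding="utf-8"))
            kept += 1
    out.write(b"</osm>\n")
    return kept


# Five nodes with ids 1..5 stand in for a real .osm file.
sample = "<osm>" + "".join(f'<node id="{i}"/>' for i in range(1, 6)) + "</osm>"
reduced = io.BytesIO()
kept = k_reduce(io.BytesIO(sample.encode("utf-8")), reduced, k=2)
print(kept)
print(reduced.getvalue().decode("utf-8"))
```

A larger k gives a smaller file, which is convenient for testing the rest of the pipeline before running it on the full extract.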
schema.py: the schema describing how the data will be exported from the .osm file to the database.
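To make the idea of a shared schema concrete, here is a small sketch using a plain dictionary. The table names, columns, and the conform helper are assumptions for illustration; the actual schema.py may be structured differently or rely on a validation library.

```python
# Hypothetical schema: table name -> {column name: Python type}.
SCHEMA = {
    "nodes": {"id": int, "lat": float, "lon": float, "user": str},
    "nodes_tags": {"id": int, "key": str, "value": str},
}


def conform(table, row):
    """Return a copy of `row` coerced to the schema's types.

    Raises ValueError if columns are missing, unexpected, or not convertible.
    """
    columns = SCHEMA[table]
    missing = set(columns) - set(row)
    extra = set(row) - set(columns)
    if missing or extra:
        raise ValueError(
            f"{table}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    return {name: cast(row[name]) for name, cast in columns.items()}


clean = conform("nodes", {"id": "1", "lat": "0.5", "lon": "1.5", "user": "alice"})
print(clean)
```

Keeping the layout in one place lets the export and the database steps agree on column names and types without duplicating them.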