Python Data Cleaning
Dave Elkington, CEO of InsideSales.com, says that up to 80% of a data analyst's job is cleaning data. The purpose of this project is to showcase my ability to use Python to turn dirty data into clean, organized datasets which can then be used for further analysis, SQL queries, and more.
How to View
This project uses Python 3.7.3, Jupyter Notebooks, and the NumPy, pandas, Matplotlib, and SciPy libraries.
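If you would rather run the notebook than just view it, an environment where the following imports succeed should be sufficient (the aliases here are the usual community conventions, not necessarily the exact ones used in the notebook):

```python
# Core libraries used by the project (Python 3.7.3); aliases follow common convention.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
```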
To view the project, open the file Unitowns Analysis - GitHub.ipynb in the project repository. If this file does not load, please head over to the project's page on Kyso, a website similar to GitHub.
In this project, I use Python to explore the effect of the 2008 recession on median housing prices across different cities in the United States. First, I clean each dataset and merge them as needed, resulting in workable datasets that are ready for analysis. My analysis at this stage is relatively simple, consisting of a t-test, summary statistics, and graphing.
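As a rough sketch of what that analysis step looks like (the city names, column names, and values below are placeholders for illustration, not data from the project):

```python
import pandas as pd
from scipy import stats

# Hypothetical cleaned frame: one row per city, with a flag marking university
# towns and a ratio of housing prices at the recession bottom vs. before it.
prices = pd.DataFrame({
    "city": ["Ann Arbor", "Detroit", "Madison", "Toledo"],
    "is_university_town": [True, False, True, False],
    "price_ratio": [0.95, 0.78, 0.93, 0.81],
})

# Split into the two groups being compared.
uni = prices.loc[prices["is_university_town"], "price_ratio"]
non_uni = prices.loc[~prices["is_university_town"], "price_ratio"]

# Welch's t-test on the two groups.
t_stat, p_value = stats.ttest_ind(uni, non_uni, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Simple summary statistics by group.
print(prices.groupby("is_university_town")["price_ratio"].describe())
```

Welch's variant (`equal_var=False`) is used in this sketch because the two groups need not have equal variances; the notebook itself may use a different form of the test.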
I use three datasets for this project. I encourage you to download each dataset and open it on your own computer so you can see its original state and format.
Median home sale price data was collected from Zillow Research. This file is
US GDP data is from the Bureau of Economic Analysis at the US Department of Commerce. The file is
University towns are the cities included in Wikipedia's list of university towns in the US. The file is university_towns.txt, which is simply the Wikipedia list copied and pasted into a .txt document.
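Cleaning that copy-pasted list is a typical example of the kind of work the notebook does. As a rough sketch, assuming state headers in the raw text end with an "[edit]" marker and town names follow on their own lines (an assumption about the file's layout for illustration, not a description of it):

```python
import pandas as pd

# Assumed raw format (not guaranteed): state names end with "[edit]",
# and each following line is a town, often with a parenthetical note.
rows = []
state = None
with open("university_towns.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        if line.endswith("[edit]"):
            # New state header.
            state = line.replace("[edit]", "").strip()
        else:
            # Town line: keep only the name, dropping any parenthetical.
            town = line.split(" (")[0].strip()
            rows.append({"State": state, "RegionName": town})

uni_towns = pd.DataFrame(rows, columns=["State", "RegionName"])
print(uni_towns.head())
```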
In Kyso, the files are located under the 'file' tab.
This project is based on assignments submitted for the University of Michigan's Applied Data Science with Python Specialization on Coursera. Some of the questions I answer are inspired by Coursera's questions. The code is 100% my own.