This is my first data analysis project in Python. Please take a look at README.md before diving into the project. It seems as though GitHub struggles to load the .ipynb file at times. As a result, I've uploaded the project and all supporting files to Kyso.io, a platform specifically for data science projects:
City_Zhvi_AllHomes.csv
README.md
Unitowns Analysis GitHub.ipynb
gdplev.xls
university_towns.txt

README.md

Python Data Cleaning

Purpose

Dave Elkington, CEO of InsideSales.com, says that up to 80% of a data analyst's job is cleaning data. The purpose of this project is to showcase my ability to use Python to turn dirty data into clean, organized datasets which can then be used for further analysis, SQL queries, and more.

How to View

This project uses Python 3.7.3, Jupyter Notebooks, and the NumPy, pandas, Matplotlib, and SciPy libraries.

To view the project, open the file Unitowns Analysis - GitHub.ipynb in the project repository. If this file does not load, please head over to the project's page on Kyso, a website similar to GitHub.

The Project

In this project, I use Python to explore the effect of the 2008 recession on median housing prices across different cities in the United States. First, I clean each dataset and merge them as needed, resulting in workable datasets that are ready for analysis. My analysis at this stage is relatively simple, consisting of a t-test, summary statistics and graphing.
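As a rough illustration of the t-test step, the sketch below compares two small groups of values with SciPy. The group names, the numbers, the choice of Welch's variant, and the 0.01 threshold are all assumptions made for this example; they are not the notebook's actual data, method, or results.

```python
from scipy import stats

# Made-up values, purely for illustration (NOT results from this project)
group_a = [0.95, 0.97, 0.93, 0.96, 0.98, 0.94]
group_b = [0.88, 0.91, 0.86, 0.90, 0.87, 0.89]

# Welch's t-test: compares the two means without assuming equal variances
result = stats.ttest_ind(group_a, group_b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Hypothetical significance threshold chosen for this sketch
different = p_value < 0.01
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, different = {different}")
```

Passing `equal_var=False` is a cautious default when the two groups may have different spreads; the notebook may well use the standard equal-variance test instead.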

The Data

I use three datasets for this project. I encourage you to download each dataset and open it on your own computer so you can see its original state and format.

Median home sale price data was collected from Zillow Research. This file is City_Zhvi_AllHomes.csv.

US GDP data is from the Bureau of Economic Analysis at the US Department of Commerce. The file is gdplev.xls.

University towns are the cities included in Wikipedia's list of university towns in the US. The file is university_towns.txt, which is Wikipedia's list copy-pasted into a .txt document.
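To give a sense of what cleaning a copy-pasted list like this can involve, here is a minimal sketch. It assumes a layout in which state lines end with "[edit]" and town lines look like "Town (University name)[1]"; that layout, the sample lines, and the function name parse_university_towns are assumptions for illustration, so check them against the actual file before relying on this.

```python
import re

def parse_university_towns(lines):
    """Return (state, town) pairs from lines in the ASSUMED Wikipedia layout."""
    rows = []
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith("[edit]"):
            # Assumed convention: a state heading ends with "[edit]"
            state = line[: -len("[edit]")].strip()
            continue
        # Keep only the text before the first " (", then drop a trailing footnote like [2]
        town = line.split(" (", 1)[0]
        town = re.sub(r"\[\d+\]$", "", town).strip()
        rows.append((state, town))
    return rows

# Hypothetical sample lines in the assumed layout
sample = [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Tuscaloosa (University of Alabama)",
    "Alaska[edit]",
    "Fairbanks (University of Alaska Fairbanks)[2]",
]
towns = parse_university_towns(sample)
print(towns)
```

The resulting list of pairs can be passed to pandas.DataFrame with columns such as "State" and "RegionName" (hypothetical names) so it can be merged with the other datasets.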

In Kyso, the files are located under the 'file' tab.

Acknowledgements

This project is based on assignments submitted to the University of Michigan's Applied Data Science with Python Specialization on Coursera. Some of the questions I answer are inspired by Coursera's questions. The code is 100% my own.
