Pittsburgh Crash Categorical Data Clustering Using K-Modes

The notebooks in this repository analyze data on automobile crashes reported in the City of Pittsburgh from 2010 to 2019. The data was provided by the Western Pennsylvania Regional Data Center and can be found at the following link: https://data.wprdc.org/dataset/allegheny-county-crash-data. A data dictionary describing each variable in the dataset can be found here: https://data.wprdc.org/dataset/allegheny-county-crash-data/resource/4df9a3c6-34c1-45a5-936e-80758f9f38a5.

pittsburgh-crash-categorical-clustering.ipynb

This notebook applies k-modes clustering to a filtered version of the raw dataset. The raw dataset includes records of over 121,000 car accidents in Allegheny County between 2004 and 2019. The filtered dataset limits the analysis to over 41,000 crashes that occurred within the City of Pittsburgh between 2010 and 2019. Numerical features and features identified as having low importance were removed from the dataset. K-modes clustering was then applied to the remaining categorical features, and each observed accident was assigned to a cluster.
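To make the clustering step concrete, here is a minimal from-scratch sketch of the k-modes idea: records are compared by counting mismatched categories, and each cluster is summarized by the most frequent category in every column. This is not the notebook's code. The feature values (`"rain"`, `"dark"`, `"intersection"`, and so on) are made-up placeholders rather than columns from the crash dataset, and the random initialization is a simplification of the initialization schemes that library implementations offer.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20, seed=0):
    """Assign each record (a tuple of categories) to one of k clusters."""
    rng = random.Random(seed)
    # Simplified initialization: k distinct records chosen at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by mismatch count.
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each cluster's mode column by column.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy records: (weather, lighting, location type).
records = [
    ("rain", "dark", "intersection"),
    ("rain", "dark", "intersection"),
    ("rain", "dark", "midblock"),
    ("clear", "daylight", "midblock"),
    ("clear", "daylight", "midblock"),
    ("clear", "daylight", "intersection"),
]
labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

For a dataset the size of the filtered crash data, a library implementation such as the `KModes` class in the `kmodes` Python package would likely be a better fit than this sketch, since it handles initialization and repeated restarts more carefully.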

Using the clusters assigned by k-modes, an exploratory data analysis was performed using data grouped by cluster. Clustering the dataset divides it into separate categories of accidents, and this analysis provides information about how the groups differ from one another. Plots of the geographic location of each crash were also generated for each cluster. These plots can be used to visually identify areas of increased crash density. Future analysis will include further clustering of geographical data using the clusters assigned by k-modes to identify locations of increased risk associated with specific categories of accidents.
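One simple way to see how clusters differ is to compare the share of each category within each cluster. The sketch below illustrates that comparison under stated assumptions: the `weather` values and the cluster labels are invented for the example, and the helper `category_shares` is a hypothetical name, not a function from the notebook.

```python
from collections import Counter, defaultdict


def category_shares(values, labels):
    """Share of each category within each cluster label."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[label][value] += 1
    return {
        label: {cat: n / sum(c.values()) for cat, n in c.items()}
        for label, c in counts.items()
    }


# Hypothetical feature values and cluster assignments for six crashes.
weather = ["rain", "rain", "clear", "clear", "clear", "rain"]
labels = [0, 0, 0, 1, 1, 1]

shares = category_shares(weather, labels)
for label in sorted(shares):
    print(label, shares[label])
```

If the data lives in a pandas DataFrame, `pd.crosstab` with `normalize="index"` gives the same kind of per-cluster breakdown in one call.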

A Medium article providing a more detailed description of this analysis can be found here.
