PCA Analysis on US Arrests Dataset

Project Description

This project contains a report on a principal component analysis (PCA) that I conducted on the US Arrests dataset, available on Kaggle at https://www.kaggle.com/datasets/halimedogan/usarrests.

A description of the data is given as: “This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.”

The aim of this project was to:

Reduce the dimensionality of the dataset through PCA.
Use hierarchical clustering to determine how many distinct groups there were.
Use K-means clustering to identify which states belonged to which cluster.
Attempt to gain some insight into what distinguishes the clusters from one another.
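
As a rough illustration of the first aim, here is a minimal sketch of standardising the features and running PCA. It is not the notebook's code: it uses plain NumPy (SVD) rather than any particular library, the data is a synthetic stand-in with the same shape as the real dataset (50 states by 4 features), the column names are assumed, and the 90% variance threshold is an arbitrary choice for the example.

```python
import numpy as np

# Synthetic stand-in for the real data: 50 rows (states) x 4 features.
# The column names and the scales below are assumptions for illustration only.
features = ["Murder", "Assault", "UrbanPop", "Rape"]
rng = np.random.default_rng(0)
scale = np.array([4.0, 80.0, 14.0, 9.0])
offset = np.array([8.0, 170.0, 65.0, 21.0])
X = rng.normal(size=(50, 4)) * scale + offset

# Standardise each column so features on large scales do not dominate the PCA.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the standardised data.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S        # coordinates of each row in principal component space
loadings = Vt.T       # rows are features, columns are components

# Share of the total variance explained by each component, and the running total.
explained_variance_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_variance_ratio)

# Keep the smallest number of components reaching 90% of the variance
# (the threshold is an assumption, not a value taken from the notebook).
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
reduced = scores[:, :n_components]

for name, row in zip(features, loadings):
    print(name, np.round(row[:n_components], 2))
print("cumulative explained variance:", np.round(cumulative, 3))
```

The loadings printed at the end are what a feature importance reading would be based on: a feature with a large absolute loading on a component contributes strongly to it.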

In the Jupyter Notebook (US_arrests_PCA_report.ipynb) you will see:

How I approached the cleaning of the dataset.
Any other pre-processing steps I took before PCA, such as examining the distributions and correlations of the features.
The PCA itself, with feature importance and a cumulative explained variance graph thrown in.
My hierarchical clustering approach, including my geeking out at how nice the complete linkage dendrogram looked.
Using the number of clusters from the hierarchical approach to set k, then performing k-means clustering.
Displaying the clusters and their states in a neatly formatted fashion.
And last but not least, some observations and guesses I made towards the story behind the clusters.
Just when you thought it couldn't get any more fun, my commentary and analysis are sprinkled throughout the entire document too.
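
The clustering steps in the list above can be sketched roughly as follows. This is a sketch under stated assumptions, not the notebook's code: the input is synthetic two-dimensional data standing in for PCA scores, the state names are hypothetical placeholders, and k is chosen with a simple "largest gap between merge heights" rule, which is only a stand-in for however the notebook reads k off the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import KMeans

# Synthetic stand-in for PCA scores: three well-separated groups, 50 rows in total.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 17.0]])
sizes = [17, 17, 16]
X = np.vstack([c + rng.normal(scale=0.5, size=(n, 2))
               for c, n in zip(centres, sizes)])
names = [f"state_{i:02d}" for i in range(len(X))]  # hypothetical labels

# Agglomerative clustering with complete linkage; column 2 holds merge heights.
link = linkage(X, method="complete")
heights = link[:, 2]

# Assumed heuristic: cut the tree inside the largest jump between merge heights.
# After merge number j (0-based) there are len(heights) - j clusters left.
gaps = np.diff(heights)
k = len(heights) - int(np.argmax(gaps))

# K-means with the k suggested by the hierarchical step.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# Group the row names by cluster label and print them.
clusters = {}
for name, label in zip(names, km.labels_):
    clusters.setdefault(int(label), []).append(name)

for label in sorted(clusters):
    members = clusters[label]
    print(f"Cluster {label} ({len(members)} states): {', '.join(members)}")
```

To see the dendrogram itself, `scipy.cluster.hierarchy.dendrogram(link)` draws it with Matplotlib; it is left out here so the sketch runs without a plotting backend.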

