# Best Neighborhoods to Rob in Pittsburgh
### According to Educational and Employment Data
#### _by Alejandro Ciuba_

```python
# Import the libraries used for the data analysis
import pandas as pd
import numpy as np
import matplotlib
```

# Introduction 

For my part of the project, I will be analyzing which neighborhoods would be the best to rob according to educational/income and job datasets provided by [The WPRDC](http://www.wprdc.org). First, I will analyze the educational/income dataset to determine which neighborhoods in Pittsburgh are the _"oldest"_ and _"richest"_, since that demographic of people would be perfect to mug. They are less likely to do any bodily damage to you should they fight back, and they have a lot of money; these factors would make these neighborhoods prime targets for muggings.

## Education/Income Dataset

The first dataset, [_"Education-Income 2010"_](https://data.wprdc.org/dataset/pgh/resource/f7b19c6c-aa66-419b-b0e1-9998d7ddfcbc), provides several important pieces of data for each neighborhood: the share of residents at each level of educational attainment, median income (for 1999 and 2009, in both original and inflation-adjusted dollars), and the estimated percentage of residents under the poverty line. Together, these columns will help us decide which neighborhoods are the best to rob from a money perspective.
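The poverty percentage appears to be the under-poverty count divided by the population for which poverty was calculated. Note that this denominator can differ from `Population (2010)`: for Allegheny Center it is 954 versus 933. The sketch below checks that reading against the first three rows of the `head(10)` output further down. The counts are copied from that table, and the check is an assumption about how the column was derived, not something the dataset documents.

```python
import pandas as pd

# Counts copied from the first three rows of the edu_inc.head(10) output
rows = pd.DataFrame({
    "Neighborhood": ["Allegheny Center", "Allegheny West", "Allentown"],
    "Est. Pop. for which Poverty Calc. (2010)": [954, 239, 2212],
    "Est. Pop. Under Poverty (2010)": [324, 12, 630],
    "Est. Percent Under Poverty (2010)": ["34.0%", "5.0%", "28.5%"],
})

# Assumed formula: percent under poverty = 100 * under poverty / population poverty was calculated for
recomputed = (
    100
    * rows["Est. Pop. Under Poverty (2010)"]
    / rows["Est. Pop. for which Poverty Calc. (2010)"]
).round(1)

# The reported column is text like "34.0%", so strip the sign before comparing
reported = rows["Est. Percent Under Poverty (2010)"].str.rstrip("%").astype(float)

print((recomputed == reported).all())  # prints True
```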

```python
# Load the dataset (the CSV downloaded from the WPRDC link above)
edu_inc = pd.read_csv("education-income.csv")

# Show the first 10 rows of the dataset
edu_inc.head(10)
```

| Unnamed: 0 | Neighborhood | Sector # | Population (2010) | Total Pop, 25 and older (2010) | Edu. Attainment: Less than High School (2010) | Edu. Attainment: High School Graduate (2010) | Edu. Attainment: Assoc./Prof. Degree (2010) | Edu. Attainment: Bachelor's Degree (2010) | Edu. Attainment: Postgraduate Degree (2010) | 1999 Median Income ('99 Dollars) | 2009 Median Income ('09 Dollars) | 1999 Median Income ('11 Dollars) | 2009 Med. Income ('13 Dollars) | Est. Pop. for which Poverty Calc. (2010) | Est. Pop. Under Poverty (2010) | Est. Percent Under Poverty (2010) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allegheny Center | 3 | 933 | 609 | 18.7% | 44.5% | 17.2% | 15.8% | 3.8% | $16,964 | $20,911 | $22,535 | $22,793 | 954 | 324 | 34.0% |
| 1 | Allegheny West | 3 | 462 | 239 | 9.2% | 28.9% | 6.7% | 44.8% | 10.5% | $26,638 | $41,761 | $35,386 | $45,519 | 239 | 12 | 5.0% |
| 2 | Allentown | 6 | 2500 | 1729 | 23.0% | 63.3% | 6.6% | 5.6% | 1.5% | $22,539 | $29,274 | $29,941 | $31,909 | 2212 | 630 | 28.5% |
| 3 | Arlington | 7 | 1869 | 1232 | 14.9% | 65.3% | 10.1% | 7.1% | 2.5% | $27,167 | $25,119 | $36,089 | $27,380 | 1779 | 361 | 20.3% |
| 4 | Arlington Heights | 7 | 244 | 166 | 18.1% | 74.1% | 0.0% | 7.8% | 0.0% | $18,646 | $9,417 | $24,769 | $10,265 | 293 | 169 | 57.7% |
| 5 | Banksville | 5 | 4144 | 3935 | 9.8% | 51.5% | 9.8% | 22.3% | 6.6% | $38,555 | $50,625 | $51,217 | $55,181 | 4170 | 243 | 5.8% |
| 6 | Bedford Dwellings | 15 | 1202 | 733 | 8.3% | 49.9% | 12.8% | 14.2% | 14.7% | $8,955 | $9,992 | $11,896 | $10,891 | 1203 | 589 | 49.0% |
| 7 | Beechview | 5 | 7974 | 5211 | 11.0% | 58.1% | 11.1% | 13.6% | 6.2% | $34,079 | $36,602 | $45,270 | $39,896 | 7450 | 1366 | 18.3% |
| 8 | Beltzhoover | 6 | 1925 | 1369 | 19.3% | 54.3% | 13.8% | 8.3% | 4.2% | $26,750 | $33,869 | $35,535 | $36,917 | 2066 | 485 | 23.5% |
| 9 | Bloomfield | 12 | 8442 | 6671 | 11.2% | 42.6% | 8.6% | 24.1% | 13.5% | $23,831 | $30,830 | $31,658 | $33,604 | 9192 | 1781 | 19.4% |
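As the output shows, the percentage and income columns are stored as text (`"18.7%"`, `"$20,911"`), so they need converting before they can be sorted or compared numerically. A minimal sketch of a general converter follows. `to_numeric_columns` is a hypothetical helper, not part of the original notebook, and the three sample rows are copied from the table above so the sketch runs without the CSV.

```python
import pandas as pd

# Three rows copied from the edu_inc.head(10) output above, so the sketch runs
# without the CSV file. In the notebook you would pass edu_inc instead.
sample = pd.DataFrame({
    "Neighborhood": ["Allegheny Center", "Allegheny West", "Allentown"],
    "Edu. Attainment: Less than High School (2010)": ["18.7%", "9.2%", "23.0%"],
    "2009 Median Income ('09 Dollars)": ["$20,911", "$41,761", "$29,274"],
    "Est. Percent Under Poverty (2010)": ["34.0%", "5.0%", "28.5%"],
})


def to_numeric_columns(df):
    """Hypothetical helper: return a copy of df with '%' and '$' text columns
    converted to numbers. Percentages keep their 0-100 scale; dollar amounts
    lose the '$' sign and thousands separators."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # already numeric, e.g. Population (2010)
        non_null = out[col].dropna().astype(str)
        if non_null.empty:
            continue
        if non_null.str.endswith("%").all():
            out[col] = pd.to_numeric(out[col].str.rstrip("%"), errors="coerce")
        elif non_null.str.startswith("$").all():
            digits = out[col].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
            out[col] = pd.to_numeric(digits, errors="coerce")
    return out


clean = to_numeric_columns(sample)
print(clean.dtypes)
```

Columns that are neither all-percent nor all-dollar (such as `Neighborhood`) are left unchanged, and `errors="coerce"` turns any unexpected text into `NaN` instead of raising.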


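We will look at each level of educational attainment separately, starting with the percentage of residents who did not finish high school. For that level, we keep the neighborhood name and the attainment column, drop missing values, convert the percentage strings to numbers, and sort the neighborhoods from highest to lowest.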

```python
# Keep only the neighborhood name and the "less than high school" percentage
lths_col = "Edu. Attainment: Less than High School (2010)"
edu_lths = edu_inc.loc[:, ["Neighborhood", lths_col]]

# Drop rows with missing values and renumber the index
edu_lths = edu_lths.dropna().reset_index(drop=True)

# Convert "18.7%"-style strings to floats (e.g. 18.7) in one vectorized step
edu_lths[lths_col] = edu_lths[lths_col].str.rstrip("%").astype(float)

# Sort neighborhoods from the highest to the lowest percentage
edu_lths = edu_lths.sort_values(by=lths_col, ascending=False)
edu_lths.head(10)
```

|  | Neighborhood | Edu. Attainment: Less than High School (2010) |
|---|---|---|
| 36 | Hays | 56.7 |
| 34 | Glen Hazel | 41.0 |
| 29 | Esplen | 34.5 |
| 22 | Crawford-Roberts | 27.6 |
| 41 | Homewood West | 26.8 |
| 81 | Terrace Village | 26.1 |
| 56 | Northview Heights | 24.9 |
| 48 | Marshall-Shadeland | 24.7 |
| 72 | Spring Garden | 23.7 |
| 19 | Central Oakland | 23.6 |
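The same steps can be repeated for the other four attainment levels. One way to do all five in a single pass is sketched below, assuming every attainment column name starts with `Edu. Attainment:` and stores `%` strings, as in the `head(10)` output earlier. `top_by_attainment` is a hypothetical helper, and the three sample rows are copied from that output so the sketch runs on its own.

```python
import pandas as pd

# Three rows copied from the edu_inc.head(10) output; in the notebook, pass edu_inc.
sample = pd.DataFrame({
    "Neighborhood": ["Allegheny West", "Banksville", "Bedford Dwellings"],
    "Edu. Attainment: Less than High School (2010)": ["9.2%", "9.8%", "8.3%"],
    "Edu. Attainment: High School Graduate (2010)": ["28.9%", "51.5%", "49.9%"],
    "Edu. Attainment: Assoc./Prof. Degree (2010)": ["6.7%", "9.8%", "12.8%"],
    "Edu. Attainment: Bachelor's Degree (2010)": ["44.8%", "22.3%", "14.2%"],
    "Edu. Attainment: Postgraduate Degree (2010)": ["10.5%", "6.6%", "14.7%"],
})


def top_by_attainment(df, n=10):
    """Hypothetical helper: map each attainment column to a table of the n
    neighborhoods with the highest percentage for that level."""
    results = {}
    edu_cols = [c for c in df.columns if c.startswith("Edu. Attainment:")]
    for col in edu_cols:
        # Same steps as the cell above: select, drop missing, convert, rank
        level = df[["Neighborhood", col]].dropna().copy()
        level[col] = level[col].str.rstrip("%").astype(float)
        results[col] = level.nlargest(n, col).reset_index(drop=True)
    return results


tops = top_by_attainment(sample, n=2)
for col, table in tops.items():
    print(col, table["Neighborhood"].tolist())
```

`DataFrame.nlargest` is used here as a shorter alternative to `sort_values` followed by `head`; in the notebook you would call `top_by_attainment(edu_inc)` to get the top 10 neighborhoods for every level at once.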
