Introduction
=====

In this world, there are givers and takers... and some holders. When a person looks for a new place to live, they may weigh a variety of factors: the median price of houses, the average household income, or even the price of parking. However, not everyone scouting a neighborhood plans to live there. The takers mentioned earlier often come in the form of criminals, and they might look at the exact same metrics, but with a different intention.

![criminality](https://media.giphy.com/media/NFzNOIBiFYlZS/giphy.gif)

**Quick Notes**
1) Due to the nature of money and privacy, money data was harder to come by than expected. But, nonetheless, we found some interesting data sets to look at and apply to our argument. 
2) Why the money focus? Money runs the world, baby. 
3) Initially, we were planning to do something along the lines of just making money and living in a wealthy place, but we soon, like many movie characters, turned to a life of crime. It's interesting and fun to look at life through the lens of someone you might not traditionally align with, so we kept at it.

And shoutout to the first group from last week for the similar idea.

![shoutout](https://media.giphy.com/media/bKFbckT4NUjh7qBjNj/giphy.gif)

In [43]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

#get rid of the null values, they are useless
park = pd.read_csv("parking.tsv", sep="\t")
mask = ~park['rate'].isnull()
masked_park = park[mask]
masked_park.iloc[0:5,0:]

#print(masked_park[['zone', 'rate']].sample(15))

#general cleaning
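#work on an explicit copy so the assignments below do not write into a slice of masked_park
df = masked_park[masked_park['rate'] != "Multi-Rate"].copy()
#strip currency symbols, units and the "after 2pm" notes, then convert the rate to a number
df['rate'] = df['rate'].replace({r'\$': '', ',': '', '/hr': '', '2 after 2pm': '', r'\(': '', r'\)': '', '2.50 after 2pm': ''}, regex=True).astype(float)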
df.loc[df['zone'].str.contains('Oakland'),'zone'] = 'Oakland'
df.loc[df['zone'].str.contains('Squirrel Hill'),'zone'] = 'Squirrel Hill'
df.loc[df['zone'].str.contains('SS & SSW'),'zone'] = 'SS & SSW'
df.loc[df['zone'].str.contains('Downtown'),'zone'] = 'Downtown'
df.loc[df['zone'].str.contains('Allentown'),'zone'] = 'Allentown'
df.loc[df['zone'].str.contains('Bloomfield'),'zone'] = 'Bloomfield'
df.loc[df['zone'].str.contains('Shadyside'),'zone'] = 'Shadyside'
df.loc[df['zone'].str.contains('NorthSide'),'zone'] = 'North Shore'
df.loc[df['zone'].str.contains('Lawrenceville'),'zone'] = 'Lawrenceville'
df.loc[df['zone'].str.contains('Uptown'),'zone'] = 'Uptown'
df.loc[df['zone'].str.contains('Strip Disctrict'),'zone'] = 'Strip District'
df.loc[df['zone'].str.contains('Mellon Park'),'zone'] = 'Mellon Park'
df.loc[df['zone'].str.contains('Mt. Washington'),'zone'] = 'Mt. Washington'
df.loc[df['zone'].str.contains('Northshore'),'zone'] = 'North Shore'
df.loc[df['zone'].str.contains('Carrick'),'zone'] = 'Carrick'
df.loc[df['zone'].str.contains('Brookline'),'zone'] = 'Brookline'
df.loc[df['zone'].str.contains('Sheridan Kirkwood Lot'),'zone'] = 'East Liberty'
df.loc[df['zone'].str.contains('Hill District'),'zone'] = 'Hill District'
df.loc[df['zone'].str.contains('Beacon Bartlett Lot'),'zone'] = 'Squirrel Hill'
df.loc[df['zone'].str.contains('Sidney Lot'),'zone'] = 'SS & SSW'
df.loc[df['zone'].str.contains('Carson'),'zone'] = 'SS & SSW'
df.loc[df['zone'].str.contains('Beechview'),'zone'] = 'Beechview'
df.loc[df['zone'].str.contains('East Liberty'),'zone'] = 'East Liberty'
df.loc[df['zone'].str.contains('West End'),'zone'] = 'West End'
df.loc[df['zone'].str.contains('Knoxville'),'zone'] = 'Knoxville'
df.loc[df['zone'].str.contains('Bakery Sq'),'zone'] = 'Bakery Square'

#drop any zone still named after a parking lot; probably shouldn't, but it barely changes the data
df = df[~df["zone"].str.contains("Lot")]

#clean up the duplicates (drop_duplicates returns a new frame, so keep the result)
df = df.drop_duplicates(subset=['zone', 'rate'], keep='last')
#sort for ascending
df = df.sort_values('rate', ascending = False)

#copy over the array
df2 = df[['zone','rate']]
df2 = df2.drop_duplicates(subset=['zone','rate'])
#print(df[['zone', 'rate']])

#purge the index to reset it
df2 = df2.reset_index(drop=True)

#df.plot(x ='zone', y='rate', kind = 'scatter', figsize=(10,10))

#group up duplicate zones and set the value to the mean
dfa = df2.groupby('zone').mean()

#sort again
dfa =dfa.sort_values('rate',ascending = False)

#turn zone back into a column, then expose the row position as the Parking Score
dfa = dfa.reset_index(drop=False)
dfa=dfa.rename_axis('Parking Score').reset_index()


hIncome = pd.read_csv("HouseholdIncome.csv")

hIncome = hIncome.drop(hIncome.index[71]) #drop South Shore b/c population was 8
hIncome = hIncome.drop(hIncome.index[21]) #drop Chateau b/c population was 3 

#sort through data and calculate average income 
i=0
avgArray = []
maxIndex = len(hIncome.index)
while i < maxIndex:
    totalIncome = 0
    i2 = 4
    neighborhoodPop = 0
    
    while i2 < 33:
    
        popTotal = hIncome.iloc[i, i2] 
        neighborhoodPop += popTotal
        
        if i2 == 4:
            totalIncome += (popTotal * 5000)
        elif i2 == 6:
            totalIncome += (popTotal * 12500)
        elif i2 == 8:
            totalIncome += (popTotal * 17500)
        elif i2 == 10:
            totalIncome += (popTotal * 22500)
        elif i2 == 12:
            totalIncome += (popTotal * 27500)
        elif i2 == 14:
            totalIncome += (popTotal * 32500)
        elif i2 == 16:
            totalIncome += (popTotal * 37500)
        elif i2 == 18:
            totalIncome += (popTotal * 42500)
        elif i2 == 20:
            totalIncome += (popTotal * 47500)
        elif i2 == 22:
            totalIncome += (popTotal * 55000)
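        #unreachable as written: i2 == 22 is already handled by the branch above, so the 67500 midpoint is never applied
        elif i2 == 22:
            totalIncome += (popTotal * 67500)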
        elif i2 == 24:
            totalIncome += (popTotal * 87500)
        elif i2 == 26:
            totalIncome += (popTotal * 112500)
        elif i2 == 28:
            totalIncome += (popTotal * 137500)
        elif i2 == 30:
            totalIncome += (popTotal * 175000) 
        elif i2 == 32:
            totalIncome += (popTotal * 250000)
        else: 
            totalIncome = totalIncome
        i2 += 2
         
    average = (totalIncome / neighborhoodPop)
    avgArray.append(average)
    i += 1
        
hIncome["Average Income"] = avgArray
hIncomeS = hIncome.sort_values("Average Income") #arrange from least to most wealthy 
hIncomeS["Average Income"].describe()
#find and print out highest income neighborhoods
hIncomeS = hIncomeS.reset_index(drop=True)
topIncome = hIncomeS[['Neighborhood', 'Average Income']]
topIncome = topIncome.sort_values("Average Income", ascending = False)


hv = pd.read_csv("Housing.csv")
hv = hv.sort_values('Median Home Value', ascending = False)
hve =hv.iloc[:,[0,21]]
hve = hve.dropna()
hve = hve.reset_index(drop=True)
#hve=hve.rename_axis('House Value Score').reset_index()
#clean up the data sets to all match for merging
for index, row in dfa.iterrows():
    zone = row['zone']
    hve.loc[hve['Neighborhood'].str.contains(zone),'Neighborhood'] = zone
    topIncome.loc[topIncome['Neighborhood'].str.contains(zone),'Neighborhood'] = zone
    #the four lines below rename to the current zone, so they only match on the first pass,
    #when zone is the top parking zone (Downtown in the output below); both
    #Central Business District and North Shore end up folded into that zone
    hve.loc[hve['Neighborhood'].str.contains('Central Business District'),'Neighborhood'] = zone
    topIncome.loc[topIncome['Neighborhood'].str.contains('Central Business District'),'Neighborhood'] = zone
    hve.loc[hve['Neighborhood'].str.contains('North Shore'),'Neighborhood'] = zone
    topIncome.loc[topIncome['Neighborhood'].str.contains('North Shore'),'Neighborhood'] = zone

#this is basically just sorting and cleaning the index
topIncome = topIncome.reset_index(drop=True)
topIncome = topIncome.groupby('Neighborhood').mean()
topIncome = topIncome.sort_values('Average Income', ascending =False)
topIncome = topIncome.reset_index()
topIncome=topIncome.rename_axis('Income Score').reset_index()
#ready to be merged based on the zone
merged = pd.merge(dfa, topIncome, left_on = 'zone', right_on = 'Neighborhood', how = 'inner').drop('Neighborhood', axis = 1)

#basic cleaning like before
hve = hve.reset_index(drop=True)
hve = hve.groupby('Neighborhood').mean()
hve =hve.sort_values('Median Home Value', ascending =False)
hve = hve.reset_index()
hve=hve.rename_axis('House Value Score').reset_index()
#basic merge like before
merged = pd.merge(merged, hve, left_on = 'zone', right_on = 'Neighborhood', how = 'inner').drop('Neighborhood', axis = 1)

#add by columns that are the scores
merged["Overall Score"] = merged[['House Value Score','Income Score', 'Parking Score']].sum(axis=1)
#sort to rank the neighborhoods
merged =merged.sort_values('Overall Score', ascending =True)
merged = merged.reset_index()
merged=merged.rename_axis('Rank').reset_index()
#print the merged set
merged

| | Rank | index | Parking Score | zone | rate | Income Score | Average Income | House Value Score | Median Home Value | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 5 | Strip District | 1.5 | 0 | 108655.606407 | 6 | 161800.0 | 11 |
| 1 | 1 | 4 | 7 | Squirrel Hill | 1.25 | 6 | 86603.691278 | 0 | 273150.0 | 13 |
| 2 | 2 | 0 | 0 | Downtown | 4.0 | 7 | 83781.526298 | 10 | 125800.0 | 17 |
| 3 | 3 | 3 | 6 | Shadyside | 1.5 | 22 | 68304.693141 | 1 | 264860.0 | 29 |
| 4 | 4 | 9 | 16 | Brookline | 1.0 | 19 | 68990.552834 | 29 | 82150.0 | 64 |
| 5 | 5 | 10 | 17 | Bloomfield | 1.0 | 35 | 58864.552991 | 24 | 92840.0 | 76 |
| 6 | 6 | 1 | 3 | Oakland | 1.958333 | 66 | 36983.885236 | 8 | 138912.5 | 77 |
| 7 | 7 | 11 | 18 | Beechview | 1.0 | 32 | 60493.495475 | 43 | 72400.0 | 93 |
| 8 | 8 | 6 | 10 | Lawrenceville | 1.0 | 37 | 58515.144292 | 58 | 57483.333333 | 105 |
| 9 | 9 | 12 | 19 | Carrick | 0.5 | 41 | 54918.618267 | 45 | 66900.0 | 105 |

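The zone cleanup in the cell above repeats one pattern many times: if a parking zone label contains a keyword, replace the whole label with a neighborhood name. A minimal plain-Python sketch of that idea follows. The keyword pairs are taken from the cell, but the helper `normalize_zone`, the table name, and the sample labels are hypothetical, and the sketch uses plain substring checks where the cell uses pandas `str.contains`.

```python
# Keyword -> neighborhood pairs copied from the cleanup cell (only a few shown).
KEYWORD_TO_NEIGHBORHOOD = [
    ("Oakland", "Oakland"),
    ("NorthSide", "North Shore"),
    ("Northshore", "North Shore"),
    ("Beacon Bartlett Lot", "Squirrel Hill"),
]

def normalize_zone(raw):
    """Apply each rule in order; a label that matches no rule is returned unchanged."""
    zone = raw
    for keyword, neighborhood in KEYWORD_TO_NEIGHBORHOOD:
        if keyword in zone:
            zone = neighborhood
    return zone

# Hypothetical raw labels, invented for illustration.
for label in ["408 - Oakland 4", "Northshore Lot X", "Somewhere Else"]:
    print(label, "->", normalize_zone(label))
```

Keeping the rules in one table makes it easier to see which labels are merged into which neighborhood, and a new rule is a one-line change.
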

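The income loop in the cell above estimates a neighborhood's average income by multiplying the number of households in each income bracket by that bracket's midpoint and dividing by the total number of households. The sketch below shows the same arithmetic on made-up numbers. The function name and the toy counts are hypothetical; only the first three midpoints (5000, 12500, 17500) come from the cell.

```python
def estimated_average_income(counts, midpoints):
    """Weight each bracket midpoint by the number of households in that bracket."""
    if len(counts) != len(midpoints):
        raise ValueError("need exactly one midpoint per bracket")
    households = sum(counts)
    if households == 0:
        raise ValueError("no households to average over")
    total = sum(count * mid for count, mid in zip(counts, midpoints))
    return total / households

# Toy example: three brackets only, with invented household counts.
midpoints = [5000, 12500, 17500]
toy_counts = [10, 20, 10]
print(estimated_average_income(toy_counts, midpoints))  # 475000 / 40 = 11875.0
```

Pairing counts and midpoints by position, with a length check, avoids hand-numbering one branch per column.
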
# Conclusion

## * The best neighborhood to break into a car is: ***Downtown***

## * The best neighborhood to burgle a house is: ***Squirrel Hill***

## * The best neighborhood to mug someone is: ***Strip District***



# What is the best neighborhood to steal in general? How do we determine that?
We sort each data set in descending order and use each neighborhood's position (its index) as its score for that metric. Then we add up the scores of the neighborhoods that appear in all three data sets, and the one with the lowest sum wins.
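
As a rough illustration of that scoring, here is a small plain-Python sketch. The neighborhood names and all numbers are toy values invented for the example, and `rank_scores` is a hypothetical helper, not part of the notebook.

```python
# Toy numbers, invented purely for illustration; they are not the project's data.
parking_rate = {"A": 4.0, "B": 1.5, "C": 1.0}
avg_income = {"A": 60000, "B": 90000, "C": 50000}
home_value = {"A": 120000, "B": 160000, "C": 200000}

def rank_scores(values):
    """Sort names by value, highest first, and return {name: position}."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered)}

score_tables = [rank_scores(t) for t in (parking_rate, avg_income, home_value)]

# Only neighborhoods present in every table can be scored (like an inner merge).
common = set(parking_rate) & set(avg_income) & set(home_value)
overall = {name: sum(table[name] for table in score_tables) for name in common}

# Lowest total wins.
ranking = sorted(overall, key=overall.get)
for name in ranking:
    print(name, overall[name])
```

With these toy values, B scores 1 + 0 + 1 = 2, A scores 0 + 1 + 2 = 3, and C scores 2 + 2 + 0 = 4, so B comes out on top.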

# Best Neighborhood to rob is: The home of *Wholey's* 
![fish](https://media.giphy.com/media/Qs6sTHhObxQSzP4esz/giphy.gif)
# Strip District
## Runners-up:
* Squirrel Hill
* Downtown
* Shadyside
