# Introduction

#### For this project, we were asked to find the best neighborhood in Pittsburgh using datasets that we could find or create about the neighborhoods themselves. 

#### Our motivation behind the datasets we chose was to find the safest neighborhood in Pittsburgh. Safety became the metric that guided our choice of the three datasets.

#### The three main datasets that we ended up choosing were: COVID-19 cases, crime rates in the neighborhood, and car crashes. We had some difficulties in finding data for car crashes, and the solution is mentioned when we get to that section.


#### Below, we imported pandas, numpy, and matplotlib into our file, so that we could use them later on.

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

# COVID-19

#### The first dataset that we will be analysing is COVID-19 cases across the city of Pittsburgh. We first read the COVID-19 data into a coviddata variable and dropped the 'Undefined' row, which is not tied to any neighborhood.

#### Next, we removed neighborhoods where fewer than 100 people had been tested, since such small samples could skew our results. After completing this, we calculated the percentage of positive cases per person tested and displayed the neighborhoods with the lowest percentages.

In [2]:
coviddata = pd.read_csv("covid_19_cases_by_place.csv", index_col="neighborhood_municipality", parse_dates=True)
coviddata.drop('Undefined', inplace = True)

# remove neighborhoods where fewer than 100 people were tested
index = coviddata[coviddata['indv_tested'] < 100 ].index 
coviddata.drop(index, inplace=True)

# create percentage column for number of cases per test
coviddata['percentage'] = (coviddata['cases']/coviddata['indv_tested'])*100

#sort data by lowest covid percentage
coviddata_sorted = coviddata.sort_values(by='percentage')

#make cleaner dataframe with rankings
covidPercent = list(coviddata_sorted.percentage)
covidNeighborhoods = list(coviddata_sorted.index)
covidRanks = pd.DataFrame(columns=['Neighborhood', 'Case Percentage'])
covidRanks['Neighborhood'] = covidNeighborhoods
covidRanks['Case Percentage'] = covidPercent

print('\033[1m' + '\n-----Top 10 neighborhoods for minimum covid cases-----'+ '\033[0m')
covidRanks.head(10)

-----Top 10 neighborhoods for minimum covid cases-----


|  | Neighborhood | Case Percentage |
|---|---|---|
| 0 | Squirrel Hill North (Pittsburgh) | 6.741168 |
| 1 | Edgeworth | 8.395802 |
| 2 | Friendship (Pittsburgh) | 8.809524 |
| 3 | Point Breeze (Pittsburgh) | 8.840263 |
| 4 | Shadyside (Pittsburgh) | 9.550725 |
| 5 | Edgewood | 9.705648 |
| 6 | Squirrel Hill South (Pittsburgh) | 10.123826 |
| 7 | Swisshelm Park (Pittsburgh) | 10.123967 |
| 8 | Regent Square (Pittsburgh) | 10.194175 |
| 9 | North Shore (Pittsburgh) | 10.441767 |


### Based on this dataset,

#### We can conclude that the best neighborhood in Pittsburgh is Squirrel Hill North, with a positive-case rate of about 6.7% per person tested. However, this only takes into account the number of COVID-19 cases in the area, so it is far from a final decision.
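
#### To illustrate why neighborhoods with fewer than 100 people tested were dropped, here is a minimal sketch using made-up numbers. The place names and counts are hypothetical and do not come from the real dataset; only the column names and the 100-test threshold match the cell above.

```python
import pandas as pd

# Hypothetical toy data: one sparsely tested place and one well-tested place
toy = pd.DataFrame(
    {"indv_tested": [5, 2000], "cases": [2, 200]},
    index=["Tiny Place", "Large Place"],
)

# Same percentage calculation as in the notebook
toy["percentage"] = toy["cases"] / toy["indv_tested"] * 100

# Two cases out of five tests already gives a 40% rate,
# even though the sample is far too small to be meaningful
print(toy)

# Keeping only places with at least 100 people tested removes the outlier
filtered = toy[toy["indv_tested"] >= 100]
print(filtered)
```

#### With so few tests, a single extra case moves the percentage by 20 points, which is why small samples were excluded before ranking.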

# Car Crashes

#### For this dataset, we struggled to find a solid set of data broken down by neighborhood. In the end, I spent about three hours counting the dots on the car crash map provided on the WPRDC website. The numbers may not be perfect, but I believe that I got a good count of the average number of crashes in a year.

#### Since I made the .csv file myself with no extra columns, it was relatively simple to load it into a variable, sort the list, and print it out in order from the fewest crashes to the most.

In [3]:
crashdata = pd.read_csv("Crash_Data.csv", parse_dates=True)
crashdata_sorted = crashdata.sort_values(by='CRASHES', ignore_index=True)
crashdata_sorted.head(10)

|  | NEIGHBORHOOD | CRASHES |
|---|---|---|
| 0 | Brunot Island | 0 |
| 1 | Herrs Island | 2 |
| 2 | Arlington Heights | 12 |
| 3 | Esplen | 13 |
| 4 | Sheraden | 14 |
| 5 | Friendship | 15 |
| 6 | Fairywood | 17 |
| 7 | Bedford Dwellings | 17 |
| 8 | Mt Oliver | 21 |
| 9 | Middle Hill | 24 |


### Based on this dataset, 
#### We can conclude that the best neighborhood as far as car crashes go is "Brunot Island", but since it and Herrs Island are not technically neighborhoods, we will omit them and go with the third best, which is "Arlington Heights".
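
#### The omission described above was done by reading the table, not in code. A minimal sketch of how it could be done with pandas is shown here. The small DataFrame reuses the first four rows of the output above, and the `not_neighborhoods` list and `crash_ranked` name are our own assumptions for illustration.

```python
import pandas as pd

# First four rows of the sorted crash table shown above
crashdata_sorted = pd.DataFrame({
    "NEIGHBORHOOD": ["Brunot Island", "Herrs Island", "Arlington Heights", "Esplen"],
    "CRASHES": [0, 2, 12, 13],
})

# Assumption: these two islands are the only entries to exclude
not_neighborhoods = ["Brunot Island", "Herrs Island"]

# Keep every row whose name is not in the exclusion list, then renumber
crash_ranked = (
    crashdata_sorted[~crashdata_sorted["NEIGHBORHOOD"].isin(not_neighborhoods)]
    .reset_index(drop=True)
)
print(crash_ranked.head(10))
```

#### Note that this renumbers the remaining rows, so Arlington Heights would move to the first position, whereas the averages later in this notebook use its original third-place position.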

# Crime Data

#### For this dataset, we were looking at all of the crime data that did NOT involve car crashes, since this was looked at previously. 

#### First, we read in the .csv file to the variable crimedata. 

#### Next, we set crimedata to ignore all occurrences of a crime that did not have a listed neighborhood, added a column for the number of occurrences in each neighborhood, and removed all of the duplicates that appeared.

#### Finally, after adding a "reports" column and sorting, we printed out the dataset up to the 10th best Neighborhood in this category.

In [4]:
crimedata = pd.read_csv("non_traffic_citations.csv", parse_dates=True)

# ignore occurrences with no listed neighborhood
crimedata = crimedata[crimedata["NEIGHBORHOOD"].str.contains('Unable To Retrieve Address')==False]

# add a column with the number of occurrences of each neighborhood
crimedata['counts'] = crimedata['NEIGHBORHOOD'].map(crimedata['NEIGHBORHOOD'].value_counts())

# remove all the duplicates so each neighborhood is listed once
crimedata = crimedata[~(crimedata.duplicated(['NEIGHBORHOOD']))].reset_index(drop=True)

# sort data by number of reported crimes
crimedata_sorted = crimedata.sort_values(by='counts', ignore_index=True)

#cleaner dataframe
crimeNeighborhood = list(crimedata_sorted.NEIGHBORHOOD)
crimeReports = list(crimedata_sorted.counts)
crimeRanks = pd.DataFrame(columns=[ 'neighborhood', 'reports'])
crimeRanks['neighborhood'] = crimeNeighborhood
crimeRanks['reports'] = crimeReports

print('\033[1m' + '-----Top 10 Neighborhoods with Lowest Reported Non-Traffic Crimes' + '\033[0m')
crimeRanks.head(10)

-----Top 10 Neighborhoods with Lowest Reported Non-Traffic Crimes


|  | neighborhood | reports |
|---|---|---|
| 0 | Chartiers City | 1 |
| 1 | Mt. Oliver Boro | 1 |
| 2 | Ridgemont | 1 |
| 3 | Outside State | 2 |
| 4 | St. Clair | 3 |
| 5 | Swisshelm Park | 4 |
| 6 | Mt. Oliver Neighborhood | 4 |
| 7 | Oakwood | 5 |
| 8 | New Homestead | 5 |
| 9 | Summer Hill | 5 |


### Based on this dataset,

#### We can conclude that "Chartiers City", "Mt. Oliver" (listed as "Mt. Oliver Boro"), or "Ridgemont" are the best neighborhoods to live in if you want to avoid non-traffic crime, since each had only one reported citation.
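
#### The counting step in this section (count the occurrences, then keep one row per neighborhood) can also be sketched more directly with `value_counts`. This is an alternative illustration and not the code we ran. The rows below are hypothetical stand-ins for the real .csv file, and the sketch uses an exact match on the placeholder text where our cell used `str.contains`.

```python
import pandas as pd

# Hypothetical citation rows standing in for non_traffic_citations.csv
citations = pd.DataFrame({
    "NEIGHBORHOOD": [
        "Oakwood",
        "Ridgemont",
        "Oakwood",
        "Unable To Retrieve Address",
        "Oakwood",
    ],
})

# Drop rows with no usable neighborhood (exact match, an assumption of this sketch)
known = citations[citations["NEIGHBORHOOD"] != "Unable To Retrieve Address"]

# value_counts already returns one entry per neighborhood,
# so no separate de-duplication step is needed
crime_ranks = (
    known["NEIGHBORHOOD"]
    .value_counts()
    .sort_values()
    .rename_axis("neighborhood")
    .reset_index(name="reports")
)
print(crime_ranks)
```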

# Averages

#### In this section, we take the average rank of the best options from the three datasets in order to see which neighborhood is the best across all three. The neighborhoods are loaded into a bestoptions variable, and each average is calculated from the positions at which that neighborhood appears in the top 20 lists (a lower number is better). Friendship received a -3 bonus on its score because it appeared in the top 20 of all three categories.

#### We then sort the list and print out the neighborhoods that rank top 20 in at least two categories.

In [5]:
# list of all neighborhoods that appear in the top 20 of at least two categories
bestoptions = ['Friendship', 'Swisshelm Park', 'Regent Square', 'Arlington Heights', 'Mt. Oliver', 'East Carnegie']
# Average rank: the mean of each neighborhood's top 20 rankings
friendship_avg = ((19+6+3)/3)-3  # Friendship receives a minus three bonus for appearing in the top 20 of all three categories
swisshelm_avg = (8+6)/2
regent_avg = (9+14)/2
arlington_avg = (3+12)/2
mtoliver_avg = (2+9)/2
ecarnegie_avg = (18+11)/2
avglist = [friendship_avg, swisshelm_avg, regent_avg, arlington_avg, mtoliver_avg, ecarnegie_avg]

avgRanks =  pd.DataFrame(columns=['Neighborhood', 'Average Rank'])
avgRanks['Neighborhood'] = bestoptions
avgRanks['Average Rank'] = avglist

avgRanks_sorted = avgRanks.sort_values(by='Average Rank', ignore_index=True)

print('\033[1m' + 'Neighborhoods which rank in the top 20 in at least two categories:' + '\033[0m')
avgRanks_sorted.head(10)

Neighborhoods which rank in the top 20 in at least two categories:


|  | Neighborhood | Average Rank |
|---|---|---|
| 0 | Mt. Oliver | 5.5 |
| 1 | Friendship | 6.333333 |
| 2 | Swisshelm Park | 7.0 |
| 3 | Arlington Heights | 7.5 |
| 4 | Regent Square | 11.5 |
| 5 | East Carnegie | 14.5 |
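
#### As a worked example of the scoring, Friendship's average is (19 + 6 + 3) / 3 - 3, which is about 6.33, while Mt. Oliver's is (2 + 9) / 2 = 5.5. The sketch below reproduces the same arithmetic with a small helper in place of one hand-written line per neighborhood. The rank values are copied from the cell above, while the `top20_ranks` dictionary, the `average_rank` function, and the constant names are our own illustrative assumptions.

```python
import pandas as pd

# Top 20 positions for each neighborhood, copied from the averages cell
top20_ranks = {
    "Friendship": [19, 6, 3],
    "Swisshelm Park": [8, 6],
    "Regent Square": [9, 14],
    "Arlington Heights": [3, 12],
    "Mt. Oliver": [2, 9],
    "East Carnegie": [18, 11],
}

N_CATEGORIES = 3  # COVID-19, car crashes, crime
BONUS = 3         # subtracted when a neighborhood is in the top 20 of all categories


def average_rank(ranks):
    """Mean of the given ranks, minus the bonus if every category is present."""
    avg = sum(ranks) / len(ranks)
    if len(ranks) == N_CATEGORIES:
        avg -= BONUS
    return avg


avgRanks = pd.DataFrame({
    "Neighborhood": list(top20_ranks),
    "Average Rank": [average_rank(r) for r in top20_ranks.values()],
}).sort_values(by="Average Rank", ignore_index=True)

print(avgRanks)
```

#### Writing the bonus as a rule keeps it consistent if another neighborhood ever lands in the top 20 of all three lists.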


# Conclusion

#### As a review, here are the best neighborhoods from each of our three datasets:

#### For staying Covid free: Squirrel Hill North
#### For avoiding car crashes: Arlington Heights
#### For avoiding crime: Chartiers City, Mt. Oliver, or Ridgemont




#### Based on the data above, in our opinion the best neighborhood in Pittsburgh is "Mt. Oliver". Two of our datasets strongly supported this, and while it isn't in the top 20 for COVID-19 cases, we feel that this can be overlooked in favor of how high it placed in the other two categories.

# Sources

https://data.wprdc.org/dataset/allegheny-county-covid-19-tests-cases-and-deaths

https://data.wprdc.org/dataset/allegheny-county-crash-data

https://data.wprdc.org/dataset/non-traffic-citations

https://data.wprdc.org/dataset/fire-incidents-in-city-of-pittsburgh