# Introduction

#### For this project, we were told to find the best neighborhood in Pittsburgh using datasets that we could find or create about the neighborhoods themselves. 

#### Our motivation behind the datasets we chose was to find the safest neighborhood in Pittsburgh. Safety became the metric that guided us in choosing our datasets.

#### The three main datasets that we ended up choosing were COVID-19 cases, crime rates in the neighborhoods, and fires in the neighborhoods. A smaller, extra dataset was separated from the crime rates in order to include car crashes, which was originally going to be one of our main datasets until we could not easily find data on the WPRDC website.

#### Below, we imported pandas, numpy, and matplotlib into our file, so that we could use them later on.

In [6]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

# COVID-19

#### Our first dataset that we will be analysing is COVID-19 cases across the city of Pittsburgh. We first read the COVID-19 data into a coviddata variable, and dropped the unneeded lines of data in the lines below.

#### Next, we removed neighborhoods that had tested fewer than 100 people, as such small samples could skew our results. After completing this, we displayed the neighborhoods with the highest and lowest percentages of positive cases per COVID-19 test.

In [8]:
coviddata = pd.read_csv("covid_19_cases_by_place.csv", index_col="neighborhood_municipality", parse_dates=True)

coviddata.drop('Undefined', inplace = True)
coviddata.drop(['update_date'], axis=1, inplace=True)

# Remove neighborhoods that tested fewer than 100 people
index = coviddata[coviddata['indv_tested'] < 100].index
coviddata.drop(index, inplace=True)

# Percentage of tested individuals who were positive cases
coviddata['percentage'] = (coviddata['cases'] / coviddata['indv_tested']) * 100
# Named min_pct/max_pct so the built-in min() and max() are not shadowed
min_pct = coviddata['percentage'].min()
max_pct = coviddata['percentage'].max()

print('\033[1m' + '-----Lowest Percentage of Positive Cases per Covid Test-----' + '\033[0m')  # ANSI escape codes print the header in bold
print(coviddata.iloc[coviddata['percentage'].argmin()])
print('min is', int(min_pct), '%')

print('\033[1m' + '\n-----Highest Percentage of Positive Cases per Covid Test-----' + '\033[0m')
print(coviddata.iloc[coviddata['percentage'].argmax()])
print('max is', int(max_pct), '%')

print('\033[1m' + '\n-----Top 15 neighborhoods for minimum covid cases-----'+ '\033[0m')
#sort data by lowest covid percentage
coviddata_sorted = coviddata.sort_values(by='percentage')

#make clean dataframe with rankings
covidNeighborhoods = list(coviddata_sorted.index)
covidRanks = pd.DataFrame(columns=['rank', 'neighborhood'])
covidRanks['neighborhood'] = covidNeighborhoods
covidrankings = list(covidRanks.index)
covidRanks['rank'] = covidrankings
covidRanks.head(15)

-----Lowest Percentage of Positive Cases per Covid Test-----
indv_tested    5548.000000
cases           374.000000
deaths            6.000000
percentage        6.741168
Name: Squirrel Hill North (Pittsburgh), dtype: float64
min is 6 %

-----Highest Percentage of Positive Cases per Covid Test-----
indv_tested    108.000000
cases           34.000000
deaths           0.000000
percentage      31.481481
Name: West Elizabeth, dtype: float64
max is 31 %

-----Top 15 neighborhoods for minimum covid cases-----


Unnamed: 0,rank,neighborhood
0,0,Squirrel Hill North (Pittsburgh)
1,1,Edgeworth
2,2,Friendship (Pittsburgh)
3,3,Point Breeze (Pittsburgh)
4,4,Shadyside (Pittsburgh)
5,5,Edgewood
6,6,Squirrel Hill South (Pittsburgh)
7,7,Swisshelm Park (Pittsburgh)
8,8,Regent Square (Pittsburgh)
9,9,North Shore (Pittsburgh)


### Based on this dataset,

#### We can conclude that the best neighborhood in Pittsburgh is Squirrel Hill North. However, this only takes into account the percentage of positive COVID-19 tests in the area, so it is far from a final decision.
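
#### To illustrate why the 100-test cutoff matters, here is a minimal sketch on toy data. The first two rows reuse the figures printed above; "Tiny Place" and its numbers are hypothetical and exist only to show how a small sample could otherwise take the top spot.

```python
import pandas as pd

# Toy data: the first two rows come from the output above,
# "Tiny Place" is a made-up example of a barely tested area.
toy = pd.DataFrame(
    {"indv_tested": [5548, 108, 40], "cases": [374, 34, 20]},
    index=["Squirrel Hill North (Pittsburgh)", "West Elizabeth", "Tiny Place"],
)

MIN_TESTED = 100  # same cutoff as in the cell above

# Keep only places where at least MIN_TESTED people were tested.
kept = toy[toy["indv_tested"] >= MIN_TESTED].copy()

# Positive cases per 100 people tested.
kept["percentage"] = kept["cases"] / kept["indv_tested"] * 100

lowest = kept["percentage"].idxmin()
highest = kept["percentage"].idxmax()
print(lowest, round(kept.loc[lowest, "percentage"], 2))
print(highest, round(kept.loc[highest, "percentage"], 2))
```

#### Without the cutoff, the hypothetical "Tiny Place" (20 cases out of 40 tests) would show a 50% positive rate and appear to be the worst area, even though only 40 people were tested there.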

# Car Crashes

#### For this dataset, we struggled to find a solid set of data. In the end, I spent about three hours counting the dots on the car crash map provided on the WPRDC website. The numbers may not be perfect, but I believe that I got a good count of the average number of crashes in a year.

#### Since I made the .csv file in Excel myself with no extra columns, it was relatively simple to load it into a variable, sort the list, and print it out in order from the fewest crashes to the most.

In [23]:
crashdata = pd.read_csv("Crash_Data.csv", parse_dates=True)
crashdata_sorted = crashdata.sort_values(by='CRASHES', ignore_index=True,)
crashdata_sorted.head(10)

Unnamed: 0,NEIGHBORHOOD,CRASHES
0,Brunot Island,0
1,Herrs Island,2
2,Arlington Heights,12
3,Esplen,13
4,Sheraden,14
5,Friendship,15
6,Fairywood,17
7,Bedford Dwellings,17
8,Mt Oliver,21
9,Middle Hill,24


### Based on this dataset, 
#### We can conclude that the best neighborhood as far as car crashes go is "Brunot Island", but since it and Herrs Island are not technically neighborhoods, we will omit them and go with the third best, which is "Arlington Heights".
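
#### The omission described above was done by reading the table. As a rough sketch, the same step could be done in code. The rows below are the first five from the sorted table, and the `not_neighborhoods` list is a name introduced here for illustration.

```python
import pandas as pd

# First five rows of the sorted crash table shown above.
crashes = pd.DataFrame({
    "NEIGHBORHOOD": ["Brunot Island", "Herrs Island", "Arlington Heights",
                     "Esplen", "Sheraden"],
    "CRASHES": [0, 2, 12, 13, 14],
})

# Places we decided not to treat as neighborhoods (illustrative list).
not_neighborhoods = ["Brunot Island", "Herrs Island"]

# Drop those rows, then sort from fewest crashes to most.
filtered = crashes[~crashes["NEIGHBORHOOD"].isin(not_neighborhoods)]
ranked = filtered.sort_values(by="CRASHES", ignore_index=True)

best = ranked.loc[0, "NEIGHBORHOOD"]
print(best)
```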

# Crime Data

#### For this dataset, we looked at all of the crime data that did NOT involve car crashes, since crashes were covered in the previous section.

#### First, we read the .csv file into the variable crimedata. After that, we dropped all of the unnecessary columns, and for this one, there were quite a few.

#### Next, we set crimedata to ignore all occurrences of a crime that did not have a listed neighborhood, added a column for the number of occurrences in each neighborhood, and removed all of the duplicates that appeared.

#### Finally, after adding a "Ranks" column, we printed out the dataset up to the 15th best Neighborhood in this category.

In [10]:
crimedata = pd.read_csv("non_traffic_citations.csv", parse_dates=True)
# Drop the columns we do not need in a single call
crimedata.drop(
    ['PK', 'CCR', 'GENDER', 'RACE', 'AGE', 'CITEDTIME', 'INCIDENTLOCATION',
     'INCIDENTTRACT', 'COUNCIL_DISTRICT', 'PUBLIC_WORKS_DIVISION', 'X', 'Y',
     'OFFENSES', 'ZONE'],
    axis=1, inplace=True,
)

# Ignore occurrences with no listed neighborhood
crimedata = crimedata[crimedata["NEIGHBORHOOD"].str.contains('Unable To Retrieve Address')==False]

# Add a column with the number of occurrences of each neighborhood
crimedata['counts'] = crimedata['NEIGHBORHOOD'].map(crimedata['NEIGHBORHOOD'].value_counts())

# Remove all the duplicates so each neighborhood is listed once
crimedata = crimedata[~(crimedata.duplicated(['NEIGHBORHOOD']))].reset_index(drop=True)

# Sort data by number of reported crimes
crimedata_sorted = crimedata.sort_values(by='counts', ignore_index=True)

#add ranks column
crimeNeighborhood = (crimedata_sorted.NEIGHBORHOOD)
crimeRanks = pd.DataFrame(columns=['rank', 'neighborhood'])
crimeRanks['neighborhood'] = crimeNeighborhood
crimerankings = list(crimeRanks.index)
crimeRanks['rank'] = crimerankings


print('\033[1m' + '-----Top 15 Neighborhoods with Lowest Reported Non-Traffic Crimes' + '\033[0m')
crimeRanks.head(15)



-----Top 15 Neighborhoods with Lowest Reported Non-Traffic Crimes


Unnamed: 0,rank,neighborhood
0,0,Chartiers City
1,1,Mt. Oliver Boro
2,2,Ridgemont
3,3,Outside State
4,4,St. Clair
5,5,Swisshelm Park
6,6,Mt. Oliver Neighborhood
7,7,Oakwood
8,8,New Homestead
9,9,Summer Hill


### Based on this dataset,

#### We can conclude that "Chartiers City" is the best neighborhood to live in if you want to avoid non-vehicular crime.
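
#### The count, de-duplicate, and sort steps used for the crime data can also be written more compactly with `value_counts`. This is only a sketch of an alternative, not the code we ran. The citation rows are hypothetical, and it matches the placeholder text exactly instead of using `str.contains`.

```python
import pandas as pd

# Hypothetical citation rows; only the NEIGHBORHOOD column matters here.
citations = pd.DataFrame({
    "NEIGHBORHOOD": [
        "Oakwood", "Summer Hill", "Oakwood", "Unable To Retrieve Address",
        "Summer Hill", "Oakwood", "Chartiers City", None,
    ]
})

# Drop rows with a missing or unresolved neighborhood.
named = citations.dropna(subset=["NEIGHBORHOOD"])
named = named[named["NEIGHBORHOOD"] != "Unable To Retrieve Address"]

# Count citations per neighborhood, fewest first.
counts = (
    named["NEIGHBORHOOD"]
    .value_counts()
    .sort_values()
    .rename_axis("neighborhood")
    .reset_index(name="counts")
)

# Rank starts at 0 to match the tables above.
counts.insert(0, "rank", range(len(counts)))
print(counts)
```

#### Because `value_counts` already returns one row per neighborhood, the separate duplicate-removal step is not needed in this version.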

# Fires

#### For this dataset, we first loaded Fire_Data.csv into a firedata variable, then ignored data points that had no neighborhood listed.

#### Next, we added a column of occurrences, removed all the duplicates, and sorted the data. Finally, we printed out the data so that the lowest reported fire number was on the top of the list.

In [11]:
firedata = pd.read_csv("Fire_Data.csv", parse_dates=True)

# Ignore data points with no neighborhood listed.
# Missing values are stored as NaN, not as the string 'NaN', so test with notna().
firedata = firedata[firedata["neighborhood"].notna()].copy()

# Add a column with the number of occurrences of each neighborhood
firedata['counts'] = firedata['neighborhood'].map(firedata['neighborhood'].value_counts())

# Remove all the duplicates so each neighborhood is listed once
firedata = firedata[~(firedata.duplicated(['neighborhood']))].reset_index(drop=True)

# sort data by number of fires
firedata_sorted = firedata.sort_values(by='counts', ignore_index=True,)
firedata_sorted.head(50)

# make cleaner dataframe with ranks
fireNeighborhood = (firedata_sorted.neighborhood)
fireRanks = pd.DataFrame(columns=['rank', 'neighborhood'])
fireRanks['neighborhood'] = fireNeighborhood
firerankings = list(fireRanks.index)
fireRanks['rank'] = firerankings

print('\033[1m' + '-----Top 15 Neighborhoods with Lowest Reported Fires' + '\033[0m')
fireRanks.head(15)

-----Top 15 Neighborhoods with Lowest Reported Fires


Unnamed: 0,rank,neighborhood
0,0,Mount Oliver Borough
1,1,Regent Square
2,2,East Carnegie
3,3,Mt. Oliver
4,4,St. Clair
5,5,Arlington Heights
6,6,Ridgemont
7,7,Chartiers City
8,8,Oakwood
9,9,Hays


### Based on this dataset,

#### We can conclude that "Mount Oliver Borough" is the best neighborhood if you wish to avoid fires.

# Conclusion

#### As a review, here are the four best neighborhoods, one from each of our datasets:

#### Squirrel Hill North (COVID-19)
#### Arlington Heights (Car Crashes)
#### Chartiers City (Crime Data)
#### Mount Oliver Borough (Fires)



#### Based on the data above, in our opinion, the best neighborhood in Pittsburgh is "". All of the datasets that we had reflected this idea, and we feel confident in the final decision.
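
#### Since each dataset produced a different winner, one possible way to combine them is to average each neighborhood's rank across the tables. The sketch below is not the method we used. The neighborhood names "A" through "D", the rank values, and the `clean_name` helper are all hypothetical. It assumes names only differ by a " (Pittsburgh)" suffix, which is not fully true of our real data (for example "Mt. Oliver Boro" and "Mount Oliver Borough").

```python
import pandas as pd

def clean_name(name: str) -> str:
    """Strip the ' (Pittsburgh)' suffix so names line up across tables."""
    return name.replace(" (Pittsburgh)", "").strip()

# Hypothetical rank tables (0 = best); these are not our real results.
covid = pd.DataFrame({"neighborhood": ["A (Pittsburgh)", "B (Pittsburgh)", "C (Pittsburgh)"],
                      "rank": [0, 1, 2]})
crime = pd.DataFrame({"neighborhood": ["B", "A", "D"], "rank": [0, 1, 2]})
fire = pd.DataFrame({"neighborhood": ["B", "C", "A"], "rank": [0, 1, 2]})

tables = {"covid": covid, "crime": crime, "fire": fire}

# One column of ranks per dataset, aligned on the cleaned neighborhood name.
combined = pd.concat(
    [
        t.assign(neighborhood=t["neighborhood"].map(clean_name))
         .set_index("neighborhood")["rank"]
         .rename(label)
        for label, t in tables.items()
    ],
    axis=1,
)

# Keep only neighborhoods that appear in every table.
combined = combined.dropna()

# Lower mean rank is better.
combined["mean_rank"] = combined.mean(axis=1)
overall = combined.sort_values("mean_rank")

best = overall.index[0]
print(overall)
```

#### A simple mean treats every dataset as equally important. Weighting the columns differently would be another reasonable choice.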

# Sources

https://data.wprdc.org/dataset/allegheny-county-covid-19-tests-cases-and-deaths

https://data.wprdc.org/dataset/allegheny-county-crash-data

https://data.wprdc.org/dataset/non-traffic-citations

https://data.wprdc.org/dataset/fire-incidents-in-city-of-pittsburgh