First, we do some initial setup by importing pandas and numpy.

In [1]:
import pandas as pd
import numpy as np

In [2]:
%matplotlib inline

We will use each neighborhood's Covid-19 data as one metric for determining the best neighborhood in Pittsburgh. Because Covid-19 will not disappear within the next few months, any neighborhood that purports to be the best in the city must have few Covid-19 cases and deaths, and extensive Covid-19 testing.

In [3]:
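# Index on the neighborhood name; parse only the update_date column as dates
# (parse_dates=True would instead try to parse the name index as dates).
covid_data = pd.read_csv("https://data.wprdc.org/datastore/dump/0f214885-ff3e-44e1-9963-e9e9062a04d1",
                         index_col="neighborhood_municipality",
                         parse_dates=["update_date"])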
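Passing `parse_dates=True` together with `index_col` asks pandas to parse the index (here, neighborhood names) as dates, which leaves `update_date` as plain text. Passing the column name targets the date column instead. The sketch below checks this offline on a `StringIO` copy of the five rows shown in the `head()` output; it assumes the WPRDC download has the same columns.

```python
from io import StringIO

import pandas as pd

# Five rows copied from the head() output in this notebook, standing in for
# the WPRDC download so the example runs without network access.
sample_csv = StringIO(
    "neighborhood_municipality,indv_tested,cases,deaths,update_date\n"
    "Aleppo,362,6,0,2020-10-26\n"
    "Allegheny Center (Pittsburgh),195,14,0,2020-10-26\n"
    "Allegheny West (Pittsburgh),94,3,0,2020-10-26\n"
    "Allentown (Pittsburgh),327,30,0,2020-10-26\n"
    "Arlington (Pittsburgh),255,25,1,2020-10-26\n"
)

# Neighborhood names become the index; only update_date is parsed as a date.
sample = pd.read_csv(
    sample_csv,
    index_col="neighborhood_municipality",
    parse_dates=["update_date"],
)

print(sample.dtypes)  # update_date should now be a datetime64 column
```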

The data describes the number of individuals tested, the number of cases, and the number of deaths in each neighborhood, which is exactly what we need to gauge each neighborhood's Covid-19 status. Here is a sample of the data:

In [4]:
covid_data.head(5)

neighborhood_municipality,indv_tested,cases,deaths,update_date
Aleppo,362,6,0,2020-10-26
Allegheny Center (Pittsburgh),195,14,0,2020-10-26
Allegheny West (Pittsburgh),94,3,0,2020-10-26
Allentown (Pittsburgh),327,30,0,2020-10-26
Arlington (Pittsburgh),255,25,1,2020-10-26


The first metric we will analyze is the number of cases in each neighborhood. This is arguably the most important factor, even above deaths, because deaths scale with cases: the more cases, the more deaths. While most neighborhoods have at least one case, a handful have none.

In [12]:
covid_data[['cases']].sort_values('cases').head(10)

neighborhood_municipality,cases
Trafford,0
St. Clair (Pittsburgh),0
Chateau (Pittsburgh),0
Mt. Oliver (Pittsburgh),0
Arlington Heights (Pittsburgh),1
Mcdonald,2
Haysville,2
Wall,2
Ridgemont (Pittsburgh),2
Chartiers City (Pittsburgh),2
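Because `sort_values` orders the counts from smallest to largest, every zero-case neighborhood appears before the first neighborhood with one case, so the table above implies that exactly four neighborhoods report none. A boolean mask counts them directly. The sketch uses a hand-built frame of the first six rows shown above; on the full data the same mask would be `covid_data['cases'] == 0`.

```python
import pandas as pd

# First six rows of the sorted output above, standing in for covid_data[["cases"]].
cases = pd.DataFrame(
    {"cases": [0, 0, 0, 0, 1, 2]},
    index=[
        "Trafford",
        "St. Clair (Pittsburgh)",
        "Chateau (Pittsburgh)",
        "Mt. Oliver (Pittsburgh)",
        "Arlington Heights (Pittsburgh)",
        "Mcdonald",
    ],
)

# Boolean mask that is True for neighborhoods reporting no cases at all.
no_cases = cases["cases"] == 0

print(no_cases.sum())  # 4
print(cases[no_cases].index.tolist())
```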


The second metric we will analyze is the number of deaths. A neighborhood with many Covid-19 deaths can hardly call itself the best in the city, so the best neighborhood should have few or no deaths. Because many neighborhoods have very few cases, many also have no deaths.

In [13]:
covid_data[['deaths']].sort_values('deaths').head(10)

neighborhood_municipality,deaths
Aleppo,0
Lower Lawrenceville (Pittsburgh),0
Marshall,0
Marshall-Shadeland (Pittsburgh),0
Mcdonald,0
Mckees rocks,0
Middle Hill (Pittsburgh),0
Mt. Oliver (Pittsburgh),0
New Homestead (Pittsburgh),0
North Braddock,0
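All ten rows above have zero deaths, so `head(10)` shows only an arbitrary slice of a larger tie, and which tied neighborhoods appear depends on the sort algorithm. The sketch below counts the ties and adds a secondary sort key. It uses the five rows from the `head()` output, and breaking ties by case count is an assumption for illustration, not part of the original analysis.

```python
import pandas as pd

# Five rows taken from the head() output earlier in this notebook.
sample = pd.DataFrame(
    {
        "cases": [6, 14, 3, 30, 25],
        "deaths": [0, 0, 0, 0, 1],
    },
    index=pd.Index(
        [
            "Aleppo",
            "Allegheny Center (Pittsburgh)",
            "Allegheny West (Pittsburgh)",
            "Allentown (Pittsburgh)",
            "Arlington (Pittsburgh)",
        ],
        name="neighborhood_municipality",
    ),
)

# How many neighborhoods are tied at zero deaths?
zero_deaths = int((sample["deaths"] == 0).sum())
print(zero_deaths)  # 4

# Assumed tie-breaker: among equal death counts, fewer cases ranks first,
# so the ordering no longer depends on how the sort handles ties.
ranked = sample.sort_values(["deaths", "cases"])
print(ranked.index.tolist())
```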


The third and last metric we will analyze is the number of individuals tested. In my opinion, this is the least important metric. A neighborhood with more actual cases is likely to have more tests, because contact tracing leads to testing for the people who have interacted with an infected individual. In addition, someone with symptoms is likely to go to the doctor and be tested.

However, testing should still take place even in neighborhoods with few cases. Thus, while this is the least important metric in my opinion, it should still be considered.

In [14]:
covid_data[['indv_tested']].sort_values('indv_tested').tail(10).iloc[::-1]

neighborhood_municipality,indv_tested
Mount Lebanon,9462
Penn Hills,6497
Monroeville,5235
Ross,5206
McCandless,4601
Bethel Park,4120
Baldwin Borough,3948
Bluff (Pittsburgh),3789
Plum,3781
Shaler,3767
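`DataFrame.nlargest` returns the top rows already in descending order, so when there are no ties at the cutoff it is a shorter equivalent of `sort_values(...).tail(10).iloc[::-1]`. The sketch checks this on the testing counts from the `head()` output, taking the top three rather than ten because the sample has only five rows.

```python
import pandas as pd

# Testing counts for the five neighborhoods in the head() output.
tested = pd.DataFrame(
    {"indv_tested": [362, 195, 94, 327, 255]},
    index=[
        "Aleppo",
        "Allegheny Center (Pittsburgh)",
        "Allegheny West (Pittsburgh)",
        "Allentown (Pittsburgh)",
        "Arlington (Pittsburgh)",
    ],
)

# Largest first, in one call.
top = tested.nlargest(3, "indv_tested")

# The notebook's original spelling of the same query, for comparison.
same = tested.sort_values("indv_tested").tail(3).iloc[::-1]

print(top)
print(top.equals(same))  # True
```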


In accordance with the comparative importance of these three metrics for Covid-19 described above, we will assign a score from 0 to 1 to each neighborhood. The number of cases will constitute 50% of this score; the number of deaths, 40%; and the number of tests, 10%.

The case component is scaled linearly so that the neighborhood with the most cases receives 0 and a neighborhood with no cases receives the full 0.5. The death component is scaled the same way, up to 0.4. The testing component is scaled linearly so that the neighborhood with the most tests receives the full 0.1 and a neighborhood with no tests receives 0.
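Written out, with $c_i$, $d_i$, and $t_i$ denoting neighborhood $i$'s cases, deaths, and individuals tested (symbols introduced here only for convenience), the score computed by the `iterrows` loop that follows is:

```latex
\text{score}_i
  = 0.5\left(1 - \frac{c_i}{c_{\max}}\right)
  + 0.4\left(1 - \frac{d_i}{d_{\max}}\right)
  + 0.1\,\frac{t_i}{t_{\max}}
```

A neighborhood with no cases, no deaths, and the most tests would therefore score exactly 1, while one with the most cases, the most deaths, and no tests would score 0.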

In [8]:
cases_max = covid_data['cases'].max()
deaths_max = covid_data['deaths'].max()
testing_max = covid_data['indv_tested'].max()

In [9]:
scores = {}
for index, row in covid_data.iterrows():
    cases_score = 0.5 * (1 - (row['cases'] / cases_max)) # 50%
    deaths_score = 0.4 * (1 - (row['deaths'] / deaths_max)) # 40%
    testing_score = 0.1 * (row['indv_tested'] / testing_max) # 10%
    scores[index] = cases_score + deaths_score + testing_score
scores
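The `iterrows` loop above can also be written column-wise, which avoids building a Python dict row by row. The sketch below defines a hypothetical helper, `covid_score` (not part of the original notebook), and applies it to the five rows from the `head()` output. Because the maxima are taken over this small sample, its scores differ from the full-data scores printed above.

```python
import pandas as pd

# The five rows from the head() output, standing in for covid_data.
covid_sample = pd.DataFrame(
    {
        "indv_tested": [362, 195, 94, 327, 255],
        "cases": [6, 14, 3, 30, 25],
        "deaths": [0, 0, 0, 0, 1],
    },
    index=[
        "Aleppo",
        "Allegheny Center (Pittsburgh)",
        "Allegheny West (Pittsburgh)",
        "Allentown (Pittsburgh)",
        "Arlington (Pittsburgh)",
    ],
)


def covid_score(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: the same 50/40/10 weighted score as the loop, per column."""
    cases_score = 0.5 * (1 - df["cases"] / df["cases"].max())      # 50%
    deaths_score = 0.4 * (1 - df["deaths"] / df["deaths"].max())   # 40%
    testing_score = 0.1 * (df["indv_tested"] / df["indv_tested"].max())  # 10%
    return (cases_score + deaths_score + testing_score).rename("score")


scores_series = covid_score(covid_sample)
print(scores_series)
```

On the full data, `covid_score(covid_data).to_frame()` should hold the same values as `scores_df`. One caveat applies to both versions: if a column's maximum were 0 (for example, no deaths anywhere), the division would produce NaN rather than a score.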

{'Aleppo': 0.8986976245061217,
 'Allegheny Center (Pittsburgh)': 0.8900950631134524,
 'Allegheny West (Pittsburgh)': 0.8984293449100044,
 'Allentown (Pittsburgh)': 0.8778149033380486,
 'Arlington (Pittsburgh)': 0.8715713715597718,
 'Arlington Heights (Pittsburgh)': 0.8899387682721274,
 'Aspinwall': 0.8902880076310641,
 'Avalon': 0.8345132149677674,
 'Baldwin Borough': 0.3246516231807847,
 'Baldwin Township': 0.8708117880164217,
 'Banksville (Pittsburgh)': 0.7427264900423259,
 'Bedford Dwellings (Pittsburgh)': 0.864705647910096,
 'Beechview (Pittsburgh)': 0.8677186478708356,
 'Bell Acres': 0.8940937840430548,
 'Bellevue': 0.8394330357868721,
 'Beltzhoover (Pittsburgh)': 0.8713494311666203,
 'Ben Avon': 0.8900647122904574,
 'Ben Avon Heights': 0.8972681549409515,
 'Bethel Park': 0.7014537859148651,
 'Blawnox': 0.8838211805121806,
 'Bloomfield (Pittsburgh)': 0.8354961665562219,
 'Bluff (Pittsburgh)': 0.8408990889333312,
 'Bon Air (Pittsburgh)': 0.8965614143483516,
 'Brackenridge': 0.88029

In [10]:
scores_df = pd.DataFrame.from_dict(scores, orient='index', columns=['score'])

By this Covid-19 metric, the following neighborhoods score highest in Pittsburgh:

In [15]:
scores_df.sort_values('score').tail(10).iloc[::-1]

neighborhood,score
Sewickley Heights,0.915579
Ridgemont (Pittsburgh),0.902286
Mt. Oliver (Pittsburgh),0.90055
St. Clair (Pittsburgh),0.900243
Chateau (Pittsburgh),0.90019
Trafford,0.900053
Polish Hill (Pittsburgh),0.899846
Chartiers City (Pittsburgh),0.898978
Wall,0.898946
Aleppo,0.898698
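To quantify how far ahead the top neighborhood is, the sketch below takes the ten scores displayed above (rounded to six decimal places, as printed) and compares the lead of first place over second place with the spread among places two through ten.

```python
import pandas as pd

# Top-ten scores copied from the output above, already in descending order.
top_scores = pd.Series(
    {
        "Sewickley Heights": 0.915579,
        "Ridgemont (Pittsburgh)": 0.902286,
        "Mt. Oliver (Pittsburgh)": 0.90055,
        "St. Clair (Pittsburgh)": 0.900243,
        "Chateau (Pittsburgh)": 0.90019,
        "Trafford": 0.900053,
        "Polish Hill (Pittsburgh)": 0.899846,
        "Chartiers City (Pittsburgh)": 0.898978,
        "Wall": 0.898946,
        "Aleppo": 0.898698,
    },
    name="score",
)

lead = top_scores.iloc[0] - top_scores.iloc[1]          # 1st place minus 2nd place
rest_spread = top_scores.iloc[1] - top_scores.iloc[-1]  # 2nd place minus 10th place

print(round(lead, 6), round(rest_spread, 6))  # 0.013293 0.003588
```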


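By a clear margin, Sewickley Heights has handled Covid-19 the best of any neighborhood in Pittsburgh: its score of 0.916 leads second-place Ridgemont (0.902) by about 0.013, more than three times the spread across places two through ten.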