# Covid cases throughout Allegheny County

One of the metrics we decided to use was the number of COVID-19 cases throughout the county (data last updated 5/15/2023).

In [2]:
import pandas as pd

import numpy as np

covid_cases = pd.read_csv("covid_cases_by_neighborhood_municipality.csv", index_col="_id")

In [25]:
covid_cases.head(221)

_id,neighborhood_municipality,infections,reinfections,deaths,hospitalizations,pcr_tests,positive_pcr_tests,update_date
92170,Aleppo,2188,169,19,28,2893,328,2023-05-15
92171,Allegheny Center (Pittsburgh),1703,195,5,19,3137,414,2023-05-15
92172,Allegheny West (Pittsburgh),524,76,0,5,1136,119,2023-05-15
92173,Allentown (Pittsburgh),3529,583,2,40,5171,679,2023-05-15
92174,Arlington (Pittsburgh),2607,397,9,36,4062,502,2023-05-15
...,...,...,...,...,...,...,...,...
92386,Whitehall,24844,2907,75,254,37120,4166,2023-05-15
92387,Wilkins,8411,936,16,107,13439,1665,2023-05-15
92388,Wilkinsburg,23248,3299,54,347,40053,4477,2023-05-15
92389,Wilmerding,2686,449,5,39,3814,493,2023-05-15


As we can see, there is a lot of data to analyze and not much room to list it all. So, let's look at summary statistics (mean, spread, and quartiles) for all of Allegheny County.

In [16]:
covid_cases.describe()

statistic,infections,reinfections,deaths,hospitalizations,pcr_tests,positive_pcr_tests
count,221.0,221.0,221.0,221.0,221.0,221.0
mean,8509.108597,1049.20362,15.886878,80.40724,13877.520362,1517.208145
std,10967.748354,1427.712407,25.166191,107.012347,18182.635591,1813.107754
min,49.0,0.0,0.0,0.0,66.0,6.0
25%,1901.0,221.0,3.0,18.0,2970.0,387.0
50%,4206.0,526.0,7.0,48.0,6991.0,843.0
75%,10200.0,1259.0,18.0,92.0,16848.0,1975.0
max,61217.0,8826.0,158.0,802.0,101828.0,10896.0


Now let's list the locations with the fewest and the most infections throughout the county.

In [36]:
covid_cases.groupby('infections').max()

infections,neighborhood_municipality,reinfections,deaths,hospitalizations,pcr_tests,positive_pcr_tests,update_date
49,Trafford,5,0,1,66,6,2023-05-15
59,Undefined (Pittsburgh),0,0,0,91,12,2023-05-15
133,Haysville,34,0,1,251,14,2023-05-15
159,Glenfield,16,0,3,355,46,2023-05-15
216,Arlington Heights (Pittsburgh),16,3,4,579,44,2023-05-15
...,...,...,...,...,...,...,...
47413,Bethel Park,5623,99,354,70683,8335,2023-05-15
48856,Mount Lebanon,5372,93,254,86681,8201,2023-05-15
51684,Monroeville,7024,136,612,71180,8137,2023-05-15
53323,Ross,6752,158,492,80411,8054,2023-05-15


As we can see, with only 49 infections, Trafford had the fewest infections of any location in Allegheny County, while Penn Hills had the most, matching the maximum of 61,217 reported by `describe()`.
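The same ranking can be obtained without `groupby` by asking pandas directly for the rows with the smallest and largest values. The sketch below is only an illustration: `sample` is a hypothetical stand-in built from the rows visible in the truncated output above, so its largest value is not the county-wide maximum. In the notebook itself, `covid_cases` would take its place.

```python
import pandas as pd

# Hypothetical stand-in for covid_cases: only the rows visible in the
# truncated output above (rows hidden behind "..." are not included).
sample = pd.DataFrame(
    {
        "neighborhood_municipality": [
            "Trafford", "Undefined (Pittsburgh)", "Haysville", "Glenfield",
            "Arlington Heights (Pittsburgh)", "Bethel Park", "Mount Lebanon",
            "Monroeville", "Ross",
        ],
        "infections": [49, 59, 133, 159, 216, 47413, 48856, 51684, 53323],
    }
)

# Five rows with the fewest infections, in ascending order.
least = sample.nsmallest(5, "infections")

# Four rows with the most infections, in descending order.
most = sample.nlargest(4, "infections")

print(least)
print(most)
```

Unlike `groupby('infections').max()`, this keeps one row per location even if two locations happen to share the same infection count.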

However, we also need to analyze reinfections and the number of deaths. Let's rank the locations by each.

In [49]:
covid_cases.groupby('neighborhood_municipality')['reinfections'].sum().sort_values()

neighborhood_municipality
Undefined (Pittsburgh)               0
Trafford                             5
Rosslyn Farms                        9
Arlington Heights (Pittsburgh)      16
Glenfield                           16
                                  ... 
Scott                             6440
Ross                              6752
Monroeville                       7024
McKeesport                        7388
Penn Hills                        8826
Name: reinfections, Length: 221, dtype: int64

In [48]:
covid_cases.groupby('neighborhood_municipality')['deaths'].sum().sort_values()

neighborhood_municipality
Glen Osborne                0
Undefined (Pittsburgh)      0
Haysville                   0
Thornburg                   0
Bradford Woods              0
                         ... 
McKeesport                114
Penn Hills                124
Monroeville               136
McCandless                140
Ross                      158
Name: deaths, Length: 221, dtype: int64

As we can see, for reinfections, the undefined Pittsburgh neighborhood ranks lowest with zero. Trafford shows up again, in second place with 5.

For deaths, multiple locations report zero, and the undefined Pittsburgh neighborhood shows up again.
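Because several locations tie at zero deaths, a sorted list truncated to five rows may not show all of them. A boolean mask selects every tied location instead. This is a minimal sketch: `deaths` here is a hypothetical Series containing only the ten rows visible in the output above, standing in for `covid_cases.groupby('neighborhood_municipality')['deaths'].sum()`.

```python
import pandas as pd

# Hypothetical stand-in: only the ten rows visible in the deaths output above.
deaths = pd.Series(
    {
        "Glen Osborne": 0,
        "Undefined (Pittsburgh)": 0,
        "Haysville": 0,
        "Thornburg": 0,
        "Bradford Woods": 0,
        "McKeesport": 114,
        "Penn Hills": 124,
        "Monroeville": 136,
        "McCandless": 140,
        "Ross": 158,
    },
    name="deaths",
)

# Keep every location whose death count is exactly zero.
zero_deaths = deaths[deaths == 0]

print(len(zero_deaths))
print(sorted(zero_deaths.index))
```

On the full dataset the count may be higher than in this sample, since the rows hidden behind the ellipsis are not included here.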

From what we can deduce here, several locations appear to be comparatively safe by these counts. A few places recur near the bottom of the lists we have seen: the undefined neighborhood in Pittsburgh, Trafford, Glenfield, and Arlington Heights.
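Spotting recurring places by eye can be replaced with a set intersection over the lowest-ranked locations for each metric. The sketch below is one possible way to do it, not the method used in this notebook: `recurring_lowest` is a hypothetical helper, and `sample` contains only the rows from the earlier outputs for which all three counts are visible.

```python
import pandas as pd

# Hypothetical stand-in for covid_cases, built from rows shown earlier.
sample = pd.DataFrame(
    {
        "neighborhood_municipality": [
            "Trafford", "Undefined (Pittsburgh)", "Haysville", "Glenfield",
            "Arlington Heights (Pittsburgh)", "Bethel Park", "Mount Lebanon",
            "Monroeville", "Ross",
        ],
        "infections": [49, 59, 133, 159, 216, 47413, 48856, 51684, 53323],
        "reinfections": [5, 0, 34, 16, 16, 5623, 5372, 7024, 6752],
        "deaths": [0, 0, 0, 0, 3, 99, 93, 136, 158],
    }
).set_index("neighborhood_municipality")


def recurring_lowest(df, metrics, n):
    """Return the places that rank among the n lowest values of every metric."""
    lowest_sets = [
        # keep="all" retains every row tied at the cutoff value.
        set(df.nsmallest(n, metric, keep="all").index)
        for metric in metrics
    ]
    return sorted(set.intersection(*lowest_sets))


recurring = recurring_lowest(sample, ["infections", "reinfections", "deaths"], n=3)
print(recurring)
```

With `n=3` on this sample, only Trafford and the undefined Pittsburgh neighborhood appear in all three lists. Raising `n` widens each list and lets more places through.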