### Exploring KPIs

I will analyse the datasets from the previous two tasks and look at the following:

- Total cases to deaths ratio per county.
- First to second dose completion ratio.
- Number of third doses taken per age group.

In [1]:
import pandas as pd
import plotly.express as px

### Total cases to deaths ratio per county.

In [14]:
totals_per_region = pd.read_excel("./Data/Folkhalsomyndigheten_Covid19.xlsx", sheet_name="Totalt antal per region")

In [15]:
totals_per_region.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Region                        21 non-null     object 
 1   Totalt_antal_fall             21 non-null     int64  
 2   Fall_per_100000_inv           21 non-null     float64
 3   Totalt_antal_intensivvårdade  21 non-null     int64  
 4   Totalt_antal_avlidna          21 non-null     int64  
dtypes: float64(1), int64(3), object(1)
memory usage: 972.0+ bytes


In [16]:
totals_per_region.isna().sum()

Region                          0
Totalt_antal_fall               0
Fall_per_100000_inv             0
Totalt_antal_intensivvårdade    0
Totalt_antal_avlidna            0
dtype: int64

There are no NaN values, so no cleaning is needed. We can also see that we won't need "Fall_per_100000_inv" and "Totalt_antal_intensivvårdade" for now, so we drop them.


In [17]:
totals_per_region = totals_per_region.drop(columns=["Fall_per_100000_inv", "Totalt_antal_intensivvårdade"])

In [18]:
totals_per_region

|   | Region | Totalt_antal_fall | Totalt_antal_avlidna |
|---|---|---|---|
| 0 | Blekinge | 30829 | 184 |
| 1 | Dalarna | 75091 | 544 |
| 2 | Gotland | 11874 | 82 |
| 3 | Gävleborg | 74803 | 754 |
| 4 | Halland | 108822 | 518 |
| 5 | Jämtland Härjedalen | 34347 | 197 |
| 6 | Jönköping | 89662 | 756 |
| 7 | Kalmar | 62810 | 385 |
| 8 | Kronoberg | 51460 | 410 |
| 9 | Norrbotten | 50755 | 459 |


Next, we add a "death_percentage" column to the DataFrame: total deaths as a percentage of total cases.

In [23]:
totals_per_region["death_percentage"] = (totals_per_region["Totalt_antal_avlidna"] / 
                                         totals_per_region["Totalt_antal_fall"]) * 100

In [24]:
totals_per_region

|   | Region | Totalt_antal_fall | Totalt_antal_avlidna | death_percentage |
|---|---|---|---|---|
| 0 | Blekinge | 30829 | 184 | 0.596841 |
| 1 | Dalarna | 75091 | 544 | 0.724454 |
| 2 | Gotland | 11874 | 82 | 0.690584 |
| 3 | Gävleborg | 74803 | 754 | 1.007981 |
| 4 | Halland | 108822 | 518 | 0.476007 |
| 5 | Jämtland Härjedalen | 34347 | 197 | 0.573558 |
| 6 | Jönköping | 89662 | 756 | 0.843167 |
| 7 | Kalmar | 62810 | 385 | 0.61296 |
| 8 | Kronoberg | 51460 | 410 | 0.796735 |
| 9 | Norrbotten | 50755 | 459 | 0.904344 |


It's interesting that the number of deaths does not seem to follow the number of total cases. At the bottom we have Halland, with over 100k cases but a death percentage under 0.5%, while Västernorrland, with a bit over half of Halland's cases, has a death percentage over 1.2%.

Of the three largest regions, Stockholm stands out with a death percentage of over 0.9%, while Västra Götaland and Skåne are close together, both at around 0.75%.
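The observation that deaths do not simply scale with case counts can be checked with a quick ranking. The sketch below is only an illustration: it uses the ten rows printed in the output above rather than the full 21-region sheet, and the names `sample`, `ranked` and `case_death_corr` are made up for this example. Because the sample is partial, the ranking and the correlation only describe those ten regions.

```python
import pandas as pd

# Only the ten rows shown in the notebook output above (the sheet has 21 regions).
sample = pd.DataFrame(
    {
        "Region": [
            "Blekinge", "Dalarna", "Gotland", "Gävleborg", "Halland",
            "Jämtland Härjedalen", "Jönköping", "Kalmar", "Kronoberg", "Norrbotten",
        ],
        "Totalt_antal_fall": [
            30829, 75091, 11874, 74803, 108822, 34347, 89662, 62810, 51460, 50755,
        ],
        "Totalt_antal_avlidna": [184, 544, 82, 754, 518, 197, 756, 385, 410, 459],
    }
)

# Same calculation as in the notebook: deaths as a percentage of total cases.
sample["death_percentage"] = (
    sample["Totalt_antal_avlidna"] / sample["Totalt_antal_fall"] * 100
)

# Rank the regions from highest to lowest death percentage.
ranked = sample.sort_values("death_percentage", ascending=False).reset_index(drop=True)
highest = ranked.iloc[0]["Region"]
lowest = ranked.iloc[-1]["Region"]

# Pearson correlation between case count and death percentage (sample only).
case_death_corr = sample["Totalt_antal_fall"].corr(sample["death_percentage"])

print(ranked[["Region", "Totalt_antal_fall", "death_percentage"]])
print(f"Highest: {highest}, lowest: {lowest}, correlation: {case_death_corr:.2f}")
```

Within this sample, Halland has the most cases and the lowest death percentage, which matches the point made above. Running the same lines on the full `totals_per_region` DataFrame would give the picture for all regions.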

In [33]:
# Order the bars from highest to lowest death percentage.
region_order = totals_per_region.sort_values("death_percentage", ascending=False)["Region"].tolist()

death_percentage_region_bar = px.bar(
    totals_per_region,
    x="Region",
    y="death_percentage",
    title="Percentage of deaths among COVID-19 cases per region",
    labels={"death_percentage": "Death Percentage", "Region": "Region"},
    category_orders={"Region": region_order},
)
death_percentage_region_bar.show()
# Save an interactive copy of the chart as standalone HTML.
death_percentage_region_bar.write_html("./Visualisations/Task_3_plotly_death_percentage_per_region.html")