### Exploring KPIs

I will analyse the datasets from the previous two tasks and look at the following:

- Total cases to deaths ratio per county.
- First to second dose completion ratio per age group.
- Number of third doses taken per age group.

In [39]:
import pandas as pd
import plotly.express as px

### Total cases to deaths ratio per county.

In [40]:
totals_per_region = pd.read_excel("./Data/Folkhalsomyndigheten_Covid19.xlsx", sheet_name="Totalt antal per region")

In [41]:
totals_per_region.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Region                        21 non-null     object 
 1   Totalt_antal_fall             21 non-null     int64  
 2   Fall_per_100000_inv           21 non-null     float64
 3   Totalt_antal_intensivvårdade  21 non-null     int64  
 4   Totalt_antal_avlidna          21 non-null     int64  
dtypes: float64(1), int64(3), object(1)
memory usage: 972.0+ bytes


In [42]:
totals_per_region.isna().sum()

Region                          0
Totalt_antal_fall               0
Fall_per_100000_inv             0
Totalt_antal_intensivvårdade    0
Totalt_antal_avlidna            0
dtype: int64

There are no NaN values, so no cleaning is needed. The "Fall_per_100000_inv" and "Totalt_antal_intensivvårdade" columns are not needed for this KPI, so we drop them for now.


In [43]:
totals_per_region.drop(["Fall_per_100000_inv","Totalt_antal_intensivvårdade"],axis=1, inplace=True)

In [44]:
totals_per_region

Unnamed: 0,Region,Totalt_antal_fall,Totalt_antal_avlidna
0,Blekinge,30829,184
1,Dalarna,75091,544
2,Gotland,11874,82
3,Gävleborg,74803,754
4,Halland,108822,518
5,Jämtland Härjedalen,34347,197
6,Jönköping,89662,756
7,Kalmar,62810,385
8,Kronoberg,51460,410
9,Norrbotten,50755,459


Next, we add a "death_percentage" column: total deaths as a percentage of total cases.

In [45]:
totals_per_region["death_percentage"] = (totals_per_region["Totalt_antal_avlidna"] / 
                                         totals_per_region["Totalt_antal_fall"]) * 100

In [46]:
totals_per_region

Unnamed: 0,Region,Totalt_antal_fall,Totalt_antal_avlidna,death_percentage
0,Blekinge,30829,184,0.596841
1,Dalarna,75091,544,0.724454
2,Gotland,11874,82,0.690584
3,Gävleborg,74803,754,1.007981
4,Halland,108822,518,0.476007
5,Jämtland Härjedalen,34347,197,0.573558
6,Jönköping,89662,756,0.843167
7,Kalmar,62810,385,0.61296
8,Kronoberg,51460,410,0.796735
9,Norrbotten,50755,459,0.904344


It's interesting that the number of deaths does not seem to track the total number of cases. At the bottom of the ranking is Halland, with over 100k cases but a death percentage under 0.5%, while Västernorrland, with a bit over half of Halland's cases, has a death percentage over 1.2%.

Of the three most populous regions, Stockholm stands out with a death percentage over 0.9%, while Västra Götaland and Skåne are close together, both at around 0.75%.
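One way to put a number on how loosely case counts and death percentage are related is a Pearson correlation. The sketch below is only an illustration: it uses the ten regions shown in the output above rather than the full 21-row table, and the names `sample`, `r`, `most_cases` and `lowest_pct` are introduced here for the example.

```python
import pandas as pd

# The ten regions shown in the output above (not the full 21-row table).
sample = pd.DataFrame({
    "Region": ["Blekinge", "Dalarna", "Gotland", "Gävleborg", "Halland",
               "Jämtland Härjedalen", "Jönköping", "Kalmar", "Kronoberg", "Norrbotten"],
    "Totalt_antal_fall": [30829, 75091, 11874, 74803, 108822,
                          34347, 89662, 62810, 51460, 50755],
    "Totalt_antal_avlidna": [184, 544, 82, 754, 518, 197, 756, 385, 410, 459],
})
sample["death_percentage"] = (sample["Totalt_antal_avlidna"] /
                              sample["Totalt_antal_fall"]) * 100

# Pearson correlation between case count and death percentage.
r = sample["Totalt_antal_fall"].corr(sample["death_percentage"])

# Region with the most cases versus region with the lowest death percentage.
most_cases = sample.loc[sample["Totalt_antal_fall"].idxmax(), "Region"]
lowest_pct = sample.loc[sample["death_percentage"].idxmin(), "Region"]

print(f"correlation: {r:.2f}")
print(f"most cases: {most_cases}, lowest death percentage: {lowest_pct}")
```

A coefficient close to zero would support the observation that the two do not move together. Running the same lines on the full `totals_per_region` DataFrame would give the figure for all 21 regions.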

In [47]:
# Regions ordered from highest to lowest death percentage (passed as a plain list)
region_order = totals_per_region.sort_values("death_percentage", ascending=False)["Region"].tolist()

death_percentage_region_bar = px.bar(
    totals_per_region,
    x="Region",
    y="death_percentage",
    title="Deaths as a percentage of COVID-19 cases per region",
    labels={"death_percentage": "Death Percentage", "Region": "Region"},
    category_orders={"Region": region_order},
)
death_percentage_region_bar.show()
death_percentage_region_bar.write_html("./Visualisations/Task_3_plotly_death_percentage_per_region.html")

### First to second dose completion ratio.

In [48]:
vaccine_data = pd.read_excel("./Data/folkhalsomyndigheten_Covid19_Vaccine.xlsx", sheet_name="Vaccinerade kommun och ålder")

In [49]:
vaccine_data["completion_rate"] = (vaccine_data["Antal minst 2 doser"] / vaccine_data["Antal minst 1 dos"])*100

In [50]:
vaccine_data

Unnamed: 0,Län,Län_namn,Kommun,Kommun_namn,Ålder,Befolkning,Antal minst 1 dos,Antal minst 2 doser,Antal 3 doser,Antal 4 doser,Andel minst 1 dos,Andel minst 2 doser,Andel 3 doser,Andel 4 doser,completion_rate
0,1,Stockholms län,114,Upplands Väsby,12-15,2422,1206,1046,,,0.497936,0.431874,,,86.733002
1,1,Stockholms län,114,Upplands Väsby,16-17,1203,839,755,,,0.697423,0.627598,,,89.988081
2,1,Stockholms län,114,Upplands Väsby,18-29,6692,4887,4469,1959.0,,0.730275,0.667812,0.292738,,91.446695
3,1,Stockholms län,114,Upplands Väsby,30-39,7332,5542,5240,2878.0,,0.755865,0.714675,0.392526,,94.550704
4,1,Stockholms län,114,Upplands Väsby,40-49,6946,5592,5429,3719.0,,0.805068,0.781601,0.535416,,97.085122
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2895,25,Norrbottens län,2584,Kiruna,50-59,3079,2878,2860,2482.0,,0.934719,0.928873,0.806106,,99.374566
2896,25,Norrbottens län,2584,Kiruna,60-69,2781,2648,2633,2434.0,,0.952175,0.946782,0.875225,,99.433535
2897,25,Norrbottens län,2584,Kiruna,70-79,2194,2115,2108,2034.0,1784.0,0.963993,0.960802,0.927074,0.813127,99.669031
2898,25,Norrbottens län,2584,Kiruna,80-89,1280,1256,1253,1220.0,1091.0,0.981250,0.978906,0.953125,0.852344,99.761146


We want the mean completion rate per age group. Each row is one municipality and age group, so this is an unweighted mean over municipalities.

In [51]:
mean_completion_rates = vaccine_data.groupby("Ålder")["completion_rate"].mean().reset_index()

In [52]:
mean_completion_rates

Unnamed: 0,Ålder,completion_rate
0,12-15,92.821832
1,16-17,94.509599
2,18-29,95.699239
3,30-39,97.038878
4,40-49,98.372727
5,50-59,99.066984
6,60-69,99.373774
7,70-79,99.620643
8,80-89,99.723614
9,90 eller äldre,99.708721


As expected, the completion rate for the first two doses rises with age. Every group from 50-59 up to 90 or older has a completion rate above 99%, and the rate falls steadily with age, down to just under 93% for 12-15 year olds. This is in line with the vaccine rollout (both first and second dose) prioritising older and more vulnerable people before younger and healthier ones.

The only age group that breaks the trend is "90 or older", whose completion rate is marginally lower than that of the "80-89" group (99.71% versus 99.72%).
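Because the rates above are unweighted means over municipalities, a small municipality counts as much as a large one. A possible alternative is to sum the dose counts per group first and then take the ratio. The sketch below shows the difference on the five Upplands Väsby rows from the `vaccine_data` output above; `pooled_completion_rate` is a hypothetical helper introduced for this example, and the rows are grouped by "Kommun_namn" only because the visible sample has one row per age group.

```python
import pandas as pd

# Five rows copied from the vaccine_data output above (Upplands Väsby only).
sample = pd.DataFrame({
    "Kommun_namn": ["Upplands Väsby"] * 5,
    "Ålder": ["12-15", "16-17", "18-29", "30-39", "40-49"],
    "Antal minst 1 dos": [1206, 839, 4887, 5542, 5592],
    "Antal minst 2 doser": [1046, 755, 4469, 5240, 5429],
})


def pooled_completion_rate(df, by):
    """Hypothetical helper: sum the dose counts per group, then take the ratio."""
    sums = df.groupby(by)[["Antal minst 1 dos", "Antal minst 2 doser"]].sum()
    return (sums["Antal minst 2 doser"] / sums["Antal minst 1 dos"]) * 100


# Unweighted: compute a rate per row, then average the rates.
row_rates = (sample["Antal minst 2 doser"] / sample["Antal minst 1 dos"]) * 100
unweighted = row_rates.groupby(sample["Kommun_namn"]).mean()

# Pooled: every vaccinated person counts equally.
pooled = pooled_completion_rate(sample, "Kommun_namn")

print(unweighted.round(2))
print(pooled.round(2))
```

On this sample the pooled rate comes out higher than the unweighted mean, because the larger age groups also have the higher completion rates. On the full data, calling `pooled_completion_rate(vaccine_data, "Ålder")` would give a dose-weighted counterpart to `mean_completion_rates`.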

In [58]:
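# Age groups ordered from highest to lowest completion rate (passed as a plain list)
age_order = mean_completion_rates.sort_values("completion_rate", ascending=False)["Ålder"].tolist()

mean_completion_rates_bar = px.bar(
    mean_completion_rates,
    x="Ålder",
    y="completion_rate",
    range_y=[90, 100],  # y-axis starts at 90 so the small differences are visible
    title="Completion rate of COVID-19 vaccine (dose 1 + 2) per age group",
    labels={"completion_rate": "Completion Rate", "Ålder": "Age"},
    category_orders={"Ålder": age_order},
)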
mean_completion_rates_bar.show()
mean_completion_rates_bar.write_html("./Visualisations/Task_3_plotly_mean_completion_rates_per_age_group.html")