**Key Performance Indicators (KPIs): Data Analysis based on COVID-19 Data**


In [2]:
import pandas as pd
import seaborn as sns
import plotly.io as pio
import plotly.express as px
import matplotlib.pyplot as plt 


# Path to the Excel file and the name of the worksheet to read
file_path = "Data/Folkhalsomyndigheten_Covid19.xlsx"
work_sheet = "Totalt antal per åldersgrupp"

# Read the worksheet into a DataFrame
df = pd.read_excel(file_path, sheet_name=work_sheet)
df.head()

|   | Åldersgrupp | Totalt_antal_fall | Totalt_antal_intensivvårdade | Totalt_antal_avlidna |
|---|---|---|---|---|
| 0 | Ålder_0_9 | 138071 | 109 | 17 |
| 1 | Ålder_10_19 | 355823 | 101 | 9 |
| 2 | Ålder_20_29 | 418506 | 285 | 41 |
| 3 | Ålder_30_39 | 493443 | 492 | 71 |
| 4 | Ålder_40_49 | 474702 | 997 | 172 |


**- A) KPI: Shows the share of COVID-19 cases that required intensive care across different age groups (stored in the column "Genomsnittlig_sjukdomsperiod").**

I started by cleaning the age group labels: removing the "Ålder_" prefix, turning "_plus" into "+" and the remaining underscores into hyphens, and replacing "Uppgift saknas" (data missing) with "Okänd åldersgrupp" (unknown age group). I displayed all 11 rows of the cleaned dataset for a quick overview. I then computed, for each age group, the number of intensive care cases as a percentage of total cases and displayed the first rows of the updated dataset.

Next, I used Plotly Express to create a bar chart of this percentage, which is stored in the column "Genomsnittlig_sjukdomsperiod" (average illness period). Despite the column name, the value is the intensive care share of cases, not a duration: the dataset contains no information about how long people were ill. The x-axis shows "Åldersgrupp" (age group), the y-axis is labeled "Genomsnittlig Sjukdomsperiod i procent (%)", and the bars are colored purple.

Finally, I saved the chart as an HTML file named "3A.Genomsnittlig Sjukdomsperiod per åldersgrupp.html" in the "Visualiseringar" directory. The goal of the code is to transform, analyze, and visualize the intensive care share of cases across age groups as a bar chart.

Data analysis: loading the data file and starting to process the data

In [3]:
# Format the "Age group" column by removing prefixes and replacing "Uppgift saknas" with "Okänd åldersgrupp"
df["Åldersgrupp"] = df["Åldersgrupp"].apply(lambda x: x.replace("Ålder_", "").replace("_plus", "+").replace("_", "-"))
df["Åldersgrupp"] = df["Åldersgrupp"].replace("Uppgift saknas", "Okänd åldersgrupp")

# Displaying rows of the DataFrame
df.head(11)

|   | Åldersgrupp | Totalt_antal_fall | Totalt_antal_intensivvårdade | Totalt_antal_avlidna |
|---|---|---|---|---|
| 0 | 0-9 | 138071 | 109 | 17 |
| 1 | 10-19 | 355823 | 101 | 9 |
| 2 | 20-29 | 418506 | 285 | 41 |
| 3 | 30-39 | 493443 | 492 | 71 |
| 4 | 40-49 | 474702 | 997 | 172 |
| 5 | 50-59 | 378468 | 1932 | 523 |
| 6 | 60-69 | 180079 | 2595 | 1422 |
| 7 | 70-79 | 87096 | 2394 | 4654 |
| 8 | 80-89 | 58170 | 612 | 8326 |
| 9 | 90+ | 26677 | 21 | 5420 |
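The lambda in the cell above applies plain Python string replacements row by row. A vectorized equivalent with pandas' `.str.replace` is sketched below. The sample Series is hypothetical, and the raw label "Ålder_90_plus" is an assumption inferred from the cleaned "90+" label, since the raw 90+ label does not appear in this section.

```python
import pandas as pd

# Hypothetical sample of raw labels; "Ålder_90_plus" is assumed, not shown in the source
raw = pd.Series(["Ålder_0_9", "Ålder_10_19", "Ålder_90_plus", "Uppgift saknas"])

cleaned = (
    raw.str.replace("Ålder_", "", regex=False)           # drop the prefix
       .str.replace("_plus", "+", regex=False)           # "90_plus" -> "90+"
       .str.replace("_", "-", regex=False)               # "0_9" -> "0-9"
       .replace("Uppgift saknas", "Okänd åldersgrupp")   # whole-value replacement
)

print(cleaned.tolist())
```

The order of the replacements matters: "_plus" has to be replaced before the generic "_", otherwise "90_plus" would become "90-plus". `regex=False` makes the patterns literal strings.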


In [4]:
len(df)


11

In [5]:
df["Åldersgrupp"].unique()


array(['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69',
       '70-79', '80-89', '90+', 'Okänd åldersgrupp'], dtype=object)

In [6]:
# Create a new DataFrame "ny_df" by selecting the "Åldersgrupp" and "Totalt_antal_fall" columns
ny_df = df[["Åldersgrupp", "Totalt_antal_fall"]]

# Display rows
ny_df.head(11)

|   | Åldersgrupp | Totalt_antal_fall |
|---|---|---|
| 0 | 0-9 | 138071 |
| 1 | 10-19 | 355823 |
| 2 | 20-29 | 418506 |
| 3 | 30-39 | 493443 |
| 4 | 40-49 | 474702 |
| 5 | 50-59 | 378468 |
| 6 | 60-69 | 180079 |
| 7 | 70-79 | 87096 |
| 8 | 80-89 | 58170 |
| 9 | 90+ | 26677 |


In [7]:
"""I calculate the percentage of intensive care cases relative to the 
total number of cases in a dataset, representing the average illness duration for 
intensive care patients. The total average illness duration is then printed."""

# Calculate the average illness duration 
df["Genomsnittlig_sjukdomsperiod"] = df["Totalt_antal_intensivvårdade"] / df["Totalt_antal_fall"] * 100 # Ratio of intensive care cases to total cases

# Print the total value of Genomsnittlig_sjukdomsperiod
total_genomsnittlig_sjukdomsperiod = df["Genomsnittlig_sjukdomsperiod"].sum()
print(f"The total average illness duration for intensive care patients in relation to the total\n number of cases is {total_genomsnittlig_sjukdomsperiod:.2f} %.")

df.head()

The total average illness duration for intensive care patients in relation to the total
 number of cases is 7.42 %.


|   | Åldersgrupp | Totalt_antal_fall | Totalt_antal_intensivvårdade | Totalt_antal_avlidna | Genomsnittlig_sjukdomsperiod |
|---|---|---|---|---|---|
| 0 | 0-9 | 138071 | 109 | 17 | 0.078945 |
| 1 | 10-19 | 355823 | 101 | 9 | 0.028385 |
| 2 | 20-29 | 418506 | 285 | 41 | 0.068099 |
| 3 | 30-39 | 493443 | 492 | 71 | 0.099708 |
| 4 | 40-49 | 474702 | 997 | 172 | 0.210027 |
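Because each age group has a different number of cases, adding up the per-group percentages does not give the share of all cases that needed intensive care. A minimal sketch of the difference, using the counts for the ten named age groups shown in this section. The "Okänd åldersgrupp" row is left out because its counts are not shown, so the sum computed here will not match the 7.42 % above.

```python
import pandas as pd

# Counts for the ten named age groups as shown in this section's output;
# the "Okänd åldersgrupp" row is omitted because its counts are not shown.
data = pd.DataFrame({
    "Åldersgrupp": ["0-9", "10-19", "20-29", "30-39", "40-49",
                    "50-59", "60-69", "70-79", "80-89", "90+"],
    "Totalt_antal_fall": [138071, 355823, 418506, 493443, 474702,
                          378468, 180079, 87096, 58170, 26677],
    "Totalt_antal_intensivvårdade": [109, 101, 285, 492, 997,
                                     1932, 2595, 2394, 612, 21],
})

# Per-group intensive care share in percent, as computed in the notebook
per_group = data["Totalt_antal_intensivvårdade"] / data["Totalt_antal_fall"] * 100

# Summing per-group percentages gives a number with no direct interpretation
summed = per_group.sum()

# Pooled share: all intensive care cases divided by all cases
pooled = data["Totalt_antal_intensivvårdade"].sum() / data["Totalt_antal_fall"].sum() * 100

print(f"Sum of per-group shares: {summed:.2f} %")
print(f"Pooled share:            {pooled:.2f} %")
```

The pooled share weights each age group by its number of cases, which is what "share of all cases" means.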


In [8]:
# Create a bar chart with Plotly Express to visualize average illness duration by age group
fig = px.bar(df, x="Åldersgrupp", y="Genomsnittlig_sjukdomsperiod", title="Genomsnittlig Sjukdomsperiod per åldersgrupp",
             labels={"Genomsnittlig_sjukdomsperiod": "Genomsnittlig Sjukdomsperiod i procent (%)", "Åldersgrupp": "Åldersgrupp"})

# Update trace colors to purple
fig.update_traces(marker=dict(color="purple"))

# Saving the chart as an HTML file
fig.write_html("Visualiseringar/3A.Genomsnittlig Sjukdomsperiod per åldersgrupp.html")

# Displaying the plot
fig.show()


**Conclusion** 

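Summed over all age groups, the per-group intensive care percentages come to 7.42%. This sum is not an overall rate, and it says nothing about how long patients were ill or how long they took to recover: the dataset only counts confirmed cases, intensive care admissions, and deaths. Per age group, the intensive care share rises steadily from the 10-19 group to the 70-79 group (2,394 of 87,096 cases, about 2.7%) and then drops in the 80-89 and 90+ groups. Differences between age groups may be explained by factors such as underlying health conditions, which this dataset does not record.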

**- B) KPI: Evaluates the risk of death in different age groups**

I start by displaying the first rows of the DataFrame "df" with df.head(). I then calculate the mortality rate for each age group by dividing the number of deaths by the total number of cases and multiplying by 100.

I also sum these per-group rates and print the result. A sum of percentages taken from groups of different sizes is not itself an overall rate, so it should not be read as the mortality rate of all cases; that would be total deaths divided by total cases. I then print the "Åldersgrupp" (age group) and "Dödlighetsfrekvens" (mortality rate) columns to show the rate for each age group.

Finally, I use Plotly Express to plot the mortality rate per age group as a bar chart with labeled axes and purple bars, and save it as an HTML file.
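In formula form, with $d_g$ deaths and $n_g$ confirmed cases in age group $g$ (notation introduced here for illustration), the per-group mortality rate and the overall (pooled) mortality rate are:

$$
r_g = \frac{d_g}{n_g} \times 100,
\qquad
r_{\text{total}} = \frac{\sum_g d_g}{\sum_g n_g} \times 100 = \sum_g \frac{n_g}{\sum_h n_h}\, r_g .
$$

The pooled rate is a case-weighted average of the group rates, so it always lies between the smallest and the largest $r_g$. The plain sum $\sum_g r_g$ has no such interpretation.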


In [9]:
df.head()


|   | Åldersgrupp | Totalt_antal_fall | Totalt_antal_intensivvårdade | Totalt_antal_avlidna | Genomsnittlig_sjukdomsperiod |
|---|---|---|---|---|---|
| 0 | 0-9 | 138071 | 109 | 17 | 0.078945 |
| 1 | 10-19 | 355823 | 101 | 9 | 0.028385 |
| 2 | 20-29 | 418506 | 285 | 41 | 0.068099 |
| 3 | 30-39 | 493443 | 492 | 71 | 0.099708 |
| 4 | 40-49 | 474702 | 997 | 172 | 0.210027 |


In [10]:
# Mortality rate per age group: deaths divided by total cases, multiplied by 100
df["Dödlighetsfrekvens"] = (df["Totalt_antal_avlidna"] / df["Totalt_antal_fall"]) * 100

# Sum of the per-group mortality rates (not the overall mortality rate of all cases)
total_dödlighetsfrekvensen = df["Dödlighetsfrekvens"].sum()
print(f"The sum of the per-age-group mortality rates is {total_dödlighetsfrekvensen:.2f}%.")

The sum of the per-age-group mortality rates is 41.71%.
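For comparison, a minimal sketch of the overall mortality rate, using the counts of the ten named age groups shown in this section (the "Okänd åldersgrupp" row is omitted because its counts are not shown). It also checks the weighted-average identity: total deaths divided by total cases equals the per-group rates averaged with the case counts as weights.

```python
import numpy as np
import pandas as pd

# Cases and deaths for the ten named age groups shown in this section
# ("Okänd åldersgrupp" omitted because its counts are not shown)
cases = pd.Series([138071, 355823, 418506, 493443, 474702,
                   378468, 180079, 87096, 58170, 26677])
deaths = pd.Series([17, 9, 41, 71, 172, 523, 1422, 4654, 8326, 5420])

rates = deaths / cases * 100  # per-group mortality rate in percent

# Overall mortality rate: total deaths over total cases
pooled = deaths.sum() / cases.sum() * 100

# The same number as a case-weighted average of the per-group rates
weighted = np.average(rates, weights=cases)

print(f"Pooled mortality rate: {pooled:.2f} %")
print(f"Case-weighted mean:    {weighted:.2f} %")
```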


In [11]:
# Print the DataFrame with columns "Åldersgrupp" and "Dödlighetsfrekvens"
print(df[["Åldersgrupp", "Dödlighetsfrekvens"]])

          Åldersgrupp  Dödlighetsfrekvens
0                 0-9            0.012313
1               10-19            0.002529
2               20-29            0.009797
3               30-39            0.014389
4               40-49            0.036233
5               50-59            0.138189
6               60-69            0.789653
7               70-79            5.343529
8               80-89           14.313220
9                 90+           20.317127
10  Okänd åldersgrupp            0.733945


In [12]:
# Create a bar chart with Plotly Express to visualize mortality rate by age group
fig = px.bar(df, x="Åldersgrupp", y="Dödlighetsfrekvens", title="Dödlighetsfrekvens per åldersgrupp",
             labels={"Dödlighetsfrekvens": "Dödlighetsfrekvens (%)", "Åldersgrupp": "Åldersgrupp"})

# Update the trace with a purple marker color
fig.update_traces(marker=dict(color="purple"))

# Saving the chart as an HTML file
fig.write_html("Visualiseringar/3B.Dödlighetsfrekvens per åldersgrupp.html")

# Display the plot
fig.show()


**Conclusion** 

The data show clear age-related differences in COVID-19 outcomes. Older age groups have far fewer confirmed cases than younger ones (87,096 in the 70-79 group compared with 493,443 in the 30-39 group) but far more deaths, with the 80-89 group alone accounting for 8,326. The mortality rate stays below 0.15% in every age group under 60, rises to 0.79% in the 60-69 group and 5.34% in the 70-79 group, and is highest in the 90+ group at 20.32%, reflecting a sharply elevated risk of death among the elderly. Intensive care admissions are concentrated in the 60-69 and 70-79 groups (2,595 and 2,394), and the 70-79 group has the highest intensive care share of any age group. The "Okänd åldersgrupp" (unknown age group) has a mortality rate of 0.73% and a low impact on total cases and deaths.

**- C) KPI: How does the number of deaths relate to the intensive care share of cases (the "Genomsnittlig_sjukdomsperiod" column), and are there noticeable patterns or differences among age groups?**

In [13]:
# Scatter plot of total deaths per age group; marker color shows the intensive care share and marker size the number of deaths
fig = px.scatter(df, x="Åldersgrupp", y="Totalt_antal_avlidna", color="Genomsnittlig_sjukdomsperiod",
                 size="Totalt_antal_avlidna", hover_data=["Åldersgrupp", "Totalt_antal_avlidna", "Genomsnittlig_sjukdomsperiod"],
                 title="Dödlighet vs. Sjukdomsperiod för intensivvårdspatienter",
                 labels={"Totalt_antal_avlidna": "Totalt Antal Avlidna", "Genomsnittlig_sjukdomsperiod": "Genomsnittlig sjukdomsperiod"},
                 template="plotly_dark")

fig.write_html("Visualiseringar/3C.Dödlighet vs. Sjukdomsperiod för intensivvårdspatienter.html")
fig.show()
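The scatter plot can be complemented with a direct comparison of the two rates per age group. The sketch below is an added illustration, not part of the original analysis: it uses the counts of the ten named age groups shown in this section (without "Okänd åldersgrupp"), finds where each rate peaks, and computes a Pearson correlation with pandas' `Series.corr`. With only ten aggregated groups, the correlation is descriptive at best.

```python
import pandas as pd

# Counts for the ten named age groups shown in this section;
# "Okänd åldersgrupp" is omitted because its counts are not shown.
data = pd.DataFrame(
    {
        "Totalt_antal_fall": [138071, 355823, 418506, 493443, 474702,
                              378468, 180079, 87096, 58170, 26677],
        "Totalt_antal_intensivvårdade": [109, 101, 285, 492, 997,
                                         1932, 2595, 2394, 612, 21],
        "Totalt_antal_avlidna": [17, 9, 41, 71, 172,
                                 523, 1422, 4654, 8326, 5420],
    },
    index=["0-9", "10-19", "20-29", "30-39", "40-49",
           "50-59", "60-69", "70-79", "80-89", "90+"],
)

# Both measures as percentages of confirmed cases
icu_share = data["Totalt_antal_intensivvårdade"] / data["Totalt_antal_fall"] * 100
mortality = data["Totalt_antal_avlidna"] / data["Totalt_antal_fall"] * 100

# Side-by-side view, one row per age group
print(pd.DataFrame({"icu_share_%": icu_share, "mortality_%": mortality}).round(3))

# Age group where each measure peaks
print("Highest intensive care share:", icu_share.idxmax())
print("Highest mortality rate:", mortality.idxmax())

# Pearson correlation across the ten groups (descriptive only, n = 10)
r = icu_share.corr(mortality)
print(f"Pearson r: {r:.2f}")
```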

**Conclusion**

The scatter plot shows that the intensive care share and the number of deaths do not peak in the same age group. The intensive care share is highest in the 70-79 group, while deaths are concentrated in the 80-89 and 90+ groups (8,326 and 5,420), where the intensive care share is lower: in the 90+ group, only 21 of 26,677 cases received intensive care. The highest mortality therefore does not coincide with the highest intensive care share, which suggests that the intensive care share alone is a poor indicator of the risk of death in the oldest age groups.