In [36]:
import pandas as pd
import plotly.express as px
import numpy as np

In [2]:
vaccindata = pd.read_excel("./Data/Covid19_Vaccine.xlsx", sheet_name="Vaccinerade kommun och ålder")
vaccindata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Län                  2900 non-null   int64  
 1   Län_namn             2900 non-null   object 
 2   Kommun               2900 non-null   int64  
 3   Kommun_namn          2900 non-null   object 
 4   Ålder                2900 non-null   object 
 5   Befolkning           2900 non-null   int64  
 6   Antal minst 1 dos    2900 non-null   int64  
 7   Antal minst 2 doser  2900 non-null   int64  
 8   Antal 3 doser        2320 non-null   float64
 9   Antal 4 doser        870 non-null    float64
 10  Andel minst 1 dos    2900 non-null   float64
 11  Andel minst 2 doser  2900 non-null   float64
 12  Andel 3 doser        2320 non-null   float64
 13  Andel 4 doser        870 non-null    float64
dtypes: float64(6), int64(5), object(3)
memory usage: 317.3+ KB


In [3]:
vaccindata.sample(3)

Unnamed: 0,Län,Län_namn,Kommun,Kommun_namn,Ålder,Befolkning,Antal minst 1 dos,Antal minst 2 doser,Antal 3 doser,Antal 4 doser,Andel minst 1 dos,Andel minst 2 doser,Andel 3 doser,Andel 4 doser
178,1,Stockholms län,181,Södertälje,80-89,3932,3720,3705,3511.0,2879.0,0.946083,0.942269,0.89293,0.732197
292,3,Uppsala län,331,Heby,18-29,1739,1402,1331,634.0,,0.80621,0.765382,0.364577,
2662,24,Västerbottens län,2418,Malå,18-29,386,313,293,114.0,,0.810881,0.759067,0.295337,


Finding out the number of counties in the dataset:

In [4]:
vaccindata["Län"].nunique()

21

Double-checking against the county names:

In [5]:
vaccindata["Län_namn"].nunique()

21

Alternatively, we can write a function that performs this double-check and prints the count if the two columns agree.

In [6]:
from Functions import *

In [7]:
count_and_check("Number of counties in the dataset: ", vaccindata, "Län", "Län_namn")

Number of counties in the dataset:  21
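
The `Functions` module is not shown in this notebook, so the following is only a minimal sketch of what `count_and_check` might look like. It assumes the function compares the number of unique codes with the number of unique names and prints the count when they agree. The body, the error handling and the demo frame are assumptions, not the original implementation.

```python
import pandas as pd


def count_and_check(label, df, code_col, name_col):
    """Print the number of unique codes if it matches the number of unique names.

    Hypothetical reimplementation: the real function lives in the
    notebook's Functions module, which is not shown here.
    """
    n_codes = df[code_col].nunique()
    n_names = df[name_col].nunique()
    if n_codes != n_names:
        raise ValueError(
            f"{code_col} has {n_codes} unique values but {name_col} has {n_names}"
        )
    print(label, n_codes)
    return n_codes


# Tiny made-up frame, only to exercise the function.
demo = pd.DataFrame({
    "Län": [1, 1, 3],
    "Län_namn": ["Stockholms län", "Stockholms län", "Uppsala län"],
})
demo_count = count_and_check(
    "Number of counties in the demo frame: ", demo, "Län", "Län_namn"
)
```

Comparing unique counts is a light check: it would not notice one code paired with two different names if another name were missing elsewhere. A stricter variant could group by the code column and require exactly one name per code.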


As we can see, there are 21 counties in the dataset, which matches the total number of counties in Sweden.

In [8]:
count_and_check("Number of municipalities in the dataset: ", vaccindata, "Kommun", "Kommun_namn")

Number of municipalities in the dataset:  290


As we can see, there are 290 municipalities in the dataset, which means it covers all Swedish municipalities.

In [9]:
dataset_population = vaccindata["Befolkning"].sum()
dataset_population

9092790

The population represented in the dataset is 9 092 790 people.

In [10]:
age_groups = vaccindata["Ålder"].unique()
age_groups

array(['12-15', '16-17', '18-29', '30-39', '40-49', '50-59', '60-69',
       '70-79', '80-89', '90 eller äldre'], dtype=object)

As we can see, the dataset contains no data on children aged 0-11.
So we cannot calculate the number of children under 18 from the dataset directly.
We are going to estimate it using the following steps:
1. Find the total population of Sweden in 2022.
2. Find the number of adults (18+) in the dataset.
3. Take the difference between these two numbers, which is the estimated number of children under 18 in Sweden.

According to [this source](https://www.macrotrends.net/countries/SWE/sweden/population), the population of Sweden in 2022 was 10,549,347 people.

In [11]:
total_population = 10549347

In [12]:
dataset_adults = vaccindata[~vaccindata["Ålder"].isin(['12-15', '16-17'])]["Befolkning"].sum()
dataset_adults

8347420
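
The filter above hard-codes the two under-18 labels. As a hedged alternative, the lower bound of each age group can be parsed from its label, so the adult selection does not depend on listing specific strings. The sketch below uses a small made-up frame (`demo`, with invented populations); only the label format mirrors the dataset.

```python
import pandas as pd

# Made-up populations; only the age labels mirror the dataset.
demo = pd.DataFrame({
    "Ålder": ["12-15", "16-17", "18-29", "90 eller äldre"],
    "Befolkning": [100, 50, 300, 20],
})

# Take the leading digits of each label as the lower bound of the group.
lower_bound = demo["Ålder"].str.extract(r"^(\d+)", expand=False).astype(int)

# Adults are all groups whose lower bound is at least 18.
demo_adults = demo.loc[lower_bound >= 18, "Befolkning"].sum()
print(demo_adults)
```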

In [13]:
total_children = total_population - dataset_adults
print("The number of children under the age of 18 in Sweden according to the dataset is ", total_children)

The number of children under the age of 18 in Sweden according to the dataset is  2201927


As we can see, the estimated number of children under the age of 18 in Sweden is 2 201 927. Note that this is an estimate: it combines the external population figure with the adult count from the dataset.

In [14]:
children_0_11 = total_population - dataset_population
children_0_11

1456557

Of these, children aged 0-11 (not represented in the dataset) number 1 456 557.
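
As a quick sanity check, the two estimates should be consistent with each other: the under-18 figure must equal the 0-11 figure plus the 12-17 population in the dataset. The sketch below repeats the arithmetic using the figures printed in this notebook; the variable `children_12_17` is introduced here only for illustration.

```python
# Figures copied from the outputs in this notebook.
total_population = 10_549_347    # external estimate for 2022
dataset_population = 9_092_790   # ages 12+ in the dataset
dataset_adults = 8_347_420       # ages 18+ in the dataset

children_under_18 = total_population - dataset_adults
children_0_11 = total_population - dataset_population
children_12_17 = dataset_population - dataset_adults

# The under-18 estimate must equal the 0-11 estimate plus the 12-17 groups.
assert children_under_18 == children_0_11 + children_12_17
print(children_0_11, children_12_17, children_under_18)
```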

In [47]:
ages_data = vaccindata.groupby("Ålder")["Befolkning"].sum()
ages_data = pd.concat([ages_data, pd.Series({'0-11': children_0_11})])
ages_data

12-15              503831
16-17              241539
18-29             1475950
30-39             1467590
40-49             1298156
50-59             1339798
60-69             1121922
70-79             1033113
80-89              496750
90 eller äldre     114141
0-11              1456557
dtype: int64

In [59]:
ages = vaccindata['Ålder'].unique()
ages = np.append(ages, '0-11')
ages

array(['12-15', '16-17', '18-29', '30-39', '40-49', '50-59', '60-69',
       '70-79', '80-89', '90 eller äldre', '0-11'], dtype=object)

We could draw a chart of the Swedish population right away, but I want to arrange the data into a DataFrame first in case I need further data manipulation.

In [62]:
age_dic = {"age_group": ages,
           "population": ages_data}

age_frame = pd.DataFrame(age_dic)
age_frame

Unnamed: 0,age_group,population
12-15,12-15,503831
16-17,16-17,241539
18-29,18-29,1475950
30-39,30-39,1467590
40-49,40-49,1298156
50-59,50-59,1339798
60-69,60-69,1121922
70-79,70-79,1033113
80-89,80-89,496750
90 eller äldre,90 eller äldre,114141
0-11,0-11,1456557
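
The cell above pairs a NumPy array of labels with a Series of values, which only lines up because both happen to be in the same order. A hedged alternative is to build the frame from the Series alone, so labels and values come from one source. The sketch below uses a made-up stand-in (`ages_demo`) rather than the real `ages_data`.

```python
import pandas as pd

# Stand-in for ages_data, with made-up numbers.
ages_demo = pd.Series({"12-15": 100, "16-17": 50, "0-11": 200})

# Labels and values come from the same Series, so they cannot drift apart.
age_frame_demo = ages_demo.rename_axis("age_group").reset_index(name="population")
print(age_frame_demo)
```

This variant gives a plain integer index instead of repeating the age label as the index, which is usually easier to work with in later manipulation.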


In [63]:
px.pie(age_frame, values="population", names="age_group", title="Swedish population by age group")