# Covid-19 Statistics Using Simulated Data

***

<br>

## Objectives of this Project

***

The objective of this project is to synthesise and simulate some data about Covid-19 cases using the NumPy Random package, based on real data from sources such as the Central Statistics Office (CSO) and the HSE. The sample size will be 100 and the variables that I will analyse will be:

- Age
- Gender
- Underlying Conditions
- Vaccination Status
- Hospitalisation
- Admission to ICU
- Death/Recovery

To break down this project, I am going to focus on the Age variable first. I will get some statistics from the web, simulate the age data using NumPy and put it into a Pandas DataFrame.<br>
From the CSO website, COVID-19 Deaths and Cases, Series 36: [04]

![Cases By Age](img/2021_12_01_18_27_25_covid_19_deaths_and_cases.png)

In [1]:
# Imports

# Numerical arrays
import numpy as np

# Pandas DataFrame
import pandas as pd

## Age Groups

From the above data on the CSO website, I have created some variables and a pandas DataFrame. I used NumPy Random Choice with the probabilities of each age group to simulate the data.

In [2]:
# Age groups as a list

age_groups = ['0-14', '15-24', '25-44', '45-64', '65-79', '80+']

In [3]:
# Number of cases in each age group, in the same order as age_groups
age_cases = [80308, 96151, 160951, 110574, 33681, 16071]
total_age_cases = sum(age_cases)  # 497736

In [4]:
# Function to calculate percentages [06]

def get_percent(numer, integer = False):
    # Share of all cases, expressed as a percentage
    percent = numer / total_age_cases * 100

    # Optionally truncate to a whole number
    if integer:
        return int(percent)
    return percent

In [5]:
# Calculate the percentages of cases for each age group [01]

for i, j in zip(age_cases, age_groups):
    percentage = get_percent(i)
    print(f"The percentage of age group {j} is {percentage}%")

The percentage of age group 0-14 is 16.134657730202356%
The percentage of age group 15-24 is 19.317670411623833%
The percentage of age group 25-44 is 32.33662021633958%
The percentage of age group 45-64 is 22.21539129176913%
The percentage of age group 65-79 is 6.766840252664062%
The percentage of age group 80+ is 3.228820097401032%


In [6]:
# Set the Random Number Generator with a seed value [07]

rng = np.random.default_rng(121) 

In [7]:
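# Generate a random sample from the age_groups list [08]
# The probabilities are the percentages above rounded to two decimal places,
# with 0-14 entered as 0.17 (rather than 0.16), which makes them sum to 1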

patient_ages = rng.choice(age_groups, size = 100, p = [0.17, 0.19, 0.32, 0.22, 0.07, 0.03])
patient_ages

array(['25-44', '15-24', '45-64', '25-44', '25-44', '45-64', '25-44',
       '25-44', '15-24', '0-14', '45-64', '25-44', '65-79', '25-44',
       '45-64', '25-44', '45-64', '15-24', '25-44', '0-14', '45-64',
       '15-24', '45-64', '25-44', '25-44', '25-44', '25-44', '25-44',
       '0-14', '15-24', '25-44', '15-24', '25-44', '0-14', '45-64',
       '15-24', '65-79', '15-24', '15-24', '25-44', '45-64', '25-44',
       '45-64', '80+', '45-64', '0-14', '45-64', '0-14', '80+', '25-44',
       '15-24', '15-24', '15-24', '45-64', '0-14', '15-24', '15-24',
       '25-44', '45-64', '0-14', '0-14', '45-64', '15-24', '25-44',
       '0-14', '15-24', '25-44', '25-44', '45-64', '0-14', '15-24',
       '45-64', '25-44', '15-24', '15-24', '80+', '25-44', '45-64',
       '0-14', '0-14', '15-24', '80+', '0-14', '25-44', '15-24', '0-14',
       '0-14', '15-24', '15-24', '15-24', '45-64', '45-64', '45-64',
       '0-14', '15-24', '25-44', '65-79', '65-79', '65-79', '65-79'],
      dtype='<U5')
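
The probabilities passed to the choice call above were typed in by hand. As a minimal sketch of an alternative, they can be derived from the case counts so that they sum to 1 without manual adjustment. The names `rounded` and `age_probs` are introduced here for illustration only, and because these probabilities differ slightly from the hand-entered ones, the sample drawn will not match the array shown above.

```python
import numpy as np

age_groups = ['0-14', '15-24', '25-44', '45-64', '65-79', '80+']
age_cases = np.array([80308, 96151, 160951, 110574, 33681, 16071])

# Shares rounded to two decimal places no longer sum to 1
rounded = np.round(age_cases / age_cases.sum(), 2)
print(rounded, rounded.sum())

# Unrounded shares (illustrative name) sum to 1 up to floating-point error
age_probs = age_cases / age_cases.sum()

rng = np.random.default_rng(121)
sample = rng.choice(age_groups, size=100, p=age_probs)
print(sample[:10])
```

Passing the rounded values directly would fail, since `Generator.choice` raises a `ValueError` when the probabilities do not sum to 1.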

In [8]:
# Put the patient_ages array into a Pandas DataFrame [09]

dfcovid = pd.DataFrame(data = patient_ages, columns = ['Age Group'])
dfcovid

|     | Age Group |
|-----|-----------|
| 0   | 25-44     |
| 1   | 15-24     |
| 2   | 45-64     |
| 3   | 25-44     |
| 4   | 25-44     |
| ... | ...       |
| 95  | 25-44     |
| 96  | 65-79     |
| 97  | 65-79     |
| 98  | 65-79     |


In [9]:
# Check the number of 25-44 year olds in the dataset

len(dfcovid.loc[dfcovid.loc[:,'Age Group'] == '25-44'])

27
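
Checking one age group at a time works, but all groups can be compared with their target probabilities in a single step. The following is a rough sketch, assuming the same age groups and hand-entered probabilities as above; `target`, `observed` and `comparison` are illustrative names, and the data is regenerated here only so that the block runs on its own.

```python
import numpy as np
import pandas as pd

age_groups = ['0-14', '15-24', '25-44', '45-64', '65-79', '80+']
target = pd.Series([0.17, 0.19, 0.32, 0.22, 0.07, 0.03], index=age_groups)

rng = np.random.default_rng(121)
df = pd.DataFrame({'Age Group': rng.choice(age_groups, size=100, p=target.values)})

# Share of each age group in the sample; groups never drawn become 0
observed = (df['Age Group']
            .value_counts(normalize=True)
            .reindex(age_groups, fill_value=0))

# Side-by-side view of the target and simulated proportions
comparison = pd.DataFrame({'target': target, 'observed': observed})
print(comparison)
```

With only 100 patients, the observed shares can be expected to drift a few percentage points from the targets.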

In [10]:
# Convert the age groups to actual ages [10]
# rng.integers excludes the upper bound, so (0, 15) gives ages 0 to 14
# The 80+ group is capped at 99

def actual_ages(x):
    if x['Age Group'] == '0-14':
        return rng.integers(0, 15, 1)
    elif x['Age Group'] == '15-24':
        return rng.integers(15, 25, 1)
    elif x['Age Group'] == '25-44':
        return rng.integers(25, 45, 1)
    elif x['Age Group'] == '45-64':
        return rng.integers(45, 65, 1)
    elif x['Age Group'] == '65-79':
        return rng.integers(65, 80, 1)
    else:
        return rng.integers(80, 100, 1)

In [11]:
# Use pd.DataFrame.apply to add the actual ages to the dfcovid dataframe
# [10], [11], [12], [13]

dfcovid['Age'] = dfcovid.apply(actual_ages, axis = 1)
dfcovid['Age'] = dfcovid['Age'].astype(int)
dfcovid

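|     | Age Group | Age |
|-----|-----------|-----|
| 0   | 25-44     | 41  |
| 1   | 15-24     | 19  |
| 2   | 45-64     | 50  |
| 3   | 25-44     | 33  |
| 4   | 25-44     | 39  |
| ... | ...       | ... |
| 95  | 25-44     | 44  |
| 96  | 65-79     | 65  |
| 97  | 65-79     | 71  |
| 98  | 65-79     | 74  |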
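
The row-by-row conversion could also be written as a lookup table followed by a single call to the integers method, which accepts arrays for its lower and upper bounds. This is only a sketch of that idea: `age_bounds`, `low` and `high` are illustrative names, the age groups are drawn uniformly here for brevity, and the ages produced will not match the table above.

```python
import numpy as np
import pandas as pd

# Inclusive lower and exclusive upper age for each group;
# capping '80+' at 99 mirrors the function above
age_bounds = {
    '0-14': (0, 15), '15-24': (15, 25), '25-44': (25, 45),
    '45-64': (45, 65), '65-79': (65, 80), '80+': (80, 100),
}

rng = np.random.default_rng(121)
df = pd.DataFrame({'Age Group': rng.choice(list(age_bounds), size=100)})

# Look up the bounds for every row, then draw all ages in one call
low = df['Age Group'].map(lambda g: age_bounds[g][0]).to_numpy()
high = df['Age Group'].map(lambda g: age_bounds[g][1]).to_numpy()
df['Age'] = rng.integers(low, high)

print(df.head())
```

Keeping the bounds in one dictionary means a change to an age band only has to be made in one place.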


## Genders

According to the CSO site, around 49% of patients with Covid-19 are female [04]. Below I have used the NumPy Random package to simulate this data and added a column called Gender to the dataframe.

In [12]:
# Proportion of cases that are female

males = 254330
females = 243277
total = females + males

ratio_females = females / total
ratio_females

0.48889384594670093

In [13]:
# Use rng.binomial to choose male (0) or female (1) [14]

patient_genders_binom = rng.binomial(1, ratio_females, 100)
patient_genders_binom

array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0], dtype=int64)
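
Each value above is a single trial with probability `ratio_females` of giving 1. As a hedged alternative, the labels could be drawn directly with the choice method, which removes the need to translate the codes afterwards. The variable `genders` is an illustrative name, and the draws will differ from the binomial array shown above.

```python
import numpy as np

males = 254330
females = 243277
ratio_females = females / (males + females)

rng = np.random.default_rng(121)

# One draw per patient, labelled directly instead of coded as 0 or 1
genders = rng.choice(['Male', 'Female'], size=100,
                     p=[1 - ratio_females, ratio_females])
print(genders[:10])
```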

In [14]:
# Add the new array patient_genders_binom as a column [15]

dfcovid['Gender'] = patient_genders_binom
dfcovid

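|     | Age Group | Age | Gender |
|-----|-----------|-----|--------|
| 0   | 25-44     | 41  | 0      |
| 1   | 15-24     | 19  | 0      |
| 2   | 45-64     | 50  | 1      |
| 3   | 25-44     | 33  | 1      |
| 4   | 25-44     | 39  | 0      |
| ... | ...       | ... | ...    |
| 95  | 25-44     | 44  | 1      |
| 96  | 65-79     | 65  | 1      |
| 97  | 65-79     | 71  | 0      |
| 98  | 65-79     | 74  | 0      |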


In [15]:
# Function to turn the 1s and 0s into genders [10]

def get_genders(x):
    if x['Gender'] == 0:
        return 'Male'
    else:
        return 'Female'

In [16]:
# Use pd.DataFrame.apply to add the genders to the dfcovid dataframe
# [10], [11], [12], [13]

dfcovid['Gender'] = dfcovid.apply(get_genders, axis = 1)
dfcovid.sort_values(by = ['Age'], inplace = True)
dfcovid

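|     | Age Group | Age | Gender |
|-----|-----------|-----|--------|
| 60  | 0-14      | 0   | Female |
| 93  | 0-14      | 0   | Male   |
| 78  | 0-14      | 1   | Male   |
| 54  | 0-14      | 1   | Female |
| 33  | 0-14      | 2   | Male   |
| ... | ...       | ... | ...    |
| 36  | 65-79     | 76  | Female |
| 81  | 80+       | 80  | Male   |
| 48  | 80+       | 82  | Male   |
| 43  | 80+       | 85  | Female |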


In [17]:
# Check the number of males in the dataset

len(dfcovid.loc[dfcovid.loc[:,'Gender'] == 'Male'])

53
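
The translation from codes to labels and the count above can also be sketched with `Series.map` and `value_counts`, which handle the whole column at once. This assumes the same coding as in the notebook (0 for male, 1 for female); the probability 0.489 is the ratio calculated earlier rounded to three places, and `counts` is an illustrative name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(121)
df = pd.DataFrame({'Gender': rng.binomial(1, 0.489, 100)})

# Translate the codes in one vectorised step: 0 -> Male, 1 -> Female
df['Gender'] = df['Gender'].map({0: 'Male', 1: 'Female'})

# Count both categories at once
counts = df['Gender'].value_counts()
print(counts)
```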

Here I have created a pandas dataframe that has an age group for each patient, their actual age and their gender according to the CSO statistics. The next column to create will be whether or not the patients have any underlying conditions.

## Underlying Health Conditions

## References

[01][DelftStack - Loop Through Multiple Lists in Python](https://www.delftstack.com/howto/python/how-to-loop-through-multiple-lists-in-python/)<br>
[02][HPSC - COVID-19 Vaccination Uptake in Ireland Weekly Report, Week 46 2021](https://www.hpsc.ie/a-z/respiratory/coronavirus/novelcoronavirus/vaccination/covid-19vaccinationuptakereports/COVID-19%20Vaccination%20Uptake%20in%20Ireland%20Weekly%20Report%20Week%2046%202021.pdf)<br>
[03][geohive - ICU, Acute Hospital & Testing Data](https://covid19ireland-geohive.hub.arcgis.com/pages/hospitals-icu--testing)<br>
[04][CSO - COVID-19 Deaths and Cases, Series 36](https://www.cso.ie/en/releasesandpublications/br/b-cdc/covid-19deathsandcasesseries36/)<br>
[05][CSO - COVID-19 Vaccination Statistics Series 1](https://www.cso.ie/en/releasesandpublications/br/b-cvac/covid-19vaccinationstatisticsseries1/)<br>
[06][skillsugar.com - How to Calculate a Percentage in Python](https://www.skillsugar.com/how-to-calculate-a-percentage-in-python)<br>
[07][numpy.org - Random Generator](https://numpy.org/doc/stable/reference/random/generator.html)<br>
[08][numpy.org - Choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice)<br>
[09][pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)<br>
[10][pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)<br>
[11][numpy.org - Integers](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html#numpy.random.Generator.integers)<br>
[12][stackoverflow - Adding a column in pandas df using a function](https://stackoverflow.com/questions/40045632/adding-a-column-in-pandas-df-using-a-function)<br>
[13][statology.org - How to Convert Pandas DataFrame Columns to int](https://www.statology.org/pandas-convert-column-to-int/)<br>
[14][numpy.org - Binomial](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.binomial.html#numpy.random.Generator.binomial)<br>
[15][re-thought.com - How to add new columns to Pandas dataframe?](https://re-thought.com/how-to-add-new-columns-in-a-dataframe-in-pandas/)<br>


## END