**Using EpidemicKabu to estimate the size of the epidemic waves**
In this notebook we build a database with the report date and an indicator of incident cases per date. We then use the library to estimate the epidemic waves, with the dates and the indicator as the main inputs. Finally, we use the library's output database to estimate the size of each wave.

***1. Building the database with the indicator:*** The indicator is the number of daily new cases divided by the total population of the country in that year, multiplied by 100.
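
As a minimal sketch of this computation on made-up numbers (the column names mirror the ones used later in this notebook; the values are invented for illustration only):

```python
import pandas as pd

# Toy data: the values are invented for illustration only
toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "New_cases": [50, 0, 200],
    "Population": [1_000_000, 1_000_000, 4_000_000],
})

# Daily new cases as a percentage of the population of that year
toy["Indicator"] = toy["New_cases"] / toy["Population"] * 100
print(toy)
```

Dividing by the population makes the curves of countries of very different sizes comparable on the same scale.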

In [1]:
import pandas as pd
import numpy as np

    The database with the daily cases by country

In [2]:
#The database with the daily cases by country
database = pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/uncoverCountries.csv")
database = database[["Date_reported","Country_code","Country","New_cases"]] 
database.head()

Unnamed: 0,Date_reported,Country_code,Country,New_cases
0,2020-01-03,BE,Belgium,0
1,2020-01-04,BE,Belgium,0
2,2020-01-05,BE,Belgium,0
3,2020-01-06,BE,Belgium,0
4,2020-01-07,BE,Belgium,0


In [4]:
#Building the Year column from the first four characters of each Date_reported
database["Year"] = database["Date_reported"].str[0:4]
database.head()

Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year
0,2020-01-03,BE,Belgium,0,2020
1,2020-01-04,BE,Belgium,0,2020
2,2020-01-05,BE,Belgium,0,2020
3,2020-01-06,BE,Belgium,0,2020
4,2020-01-07,BE,Belgium,0,2020
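
Taking the first four characters works because the dates are stored as text in the `YYYY-MM-DD` format. If that format were not guaranteed, one possible alternative (a sketch on toy data, not the approach used in this notebook) would be to parse the dates first and then take the year:

```python
import pandas as pd

# Toy data with the same column name as the cases database
toy = pd.DataFrame({"Date_reported": ["2020-01-03", "2021-12-31"]})

# Parse the text as dates, take the year and store it as text
toy["Year"] = pd.to_datetime(toy["Date_reported"]).dt.year.astype(str)
print(toy)
```

The year is kept as text here so that it has the same type as the `Year` column built above.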


    Loading the database with the total population by year and by country from:
    https://ourworldindata.org/population-sources


In [5]:
databaseCPowid= pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/populationOWID.csv").reset_index(drop=True)
databaseCPowid.head()

Unnamed: 0,Entity,Code,Year,Population (future projections),Population (historical estimates)
0,Afghanistan,AFG,2022,41128772.0,
1,Afghanistan,AFG,2023,42239856.0,
2,Afghanistan,AFG,2024,43372952.0,
3,Afghanistan,AFG,2025,44515788.0,
4,Afghanistan,AFG,2026,45667552.0,


In [6]:
databaseCPowid["Population"]=databaseCPowid['Population (future projections)'].combine_first(databaseCPowid['Population (historical estimates)'])
databaseCPowid = databaseCPowid.rename(columns = {"Entity":"Country"})
databaseCPowid = databaseCPowid[["Country","Year","Population"]]
databaseCPowid.head()

Unnamed: 0,Country,Year,Population
0,Afghanistan,2022,41128772.0
1,Afghanistan,2023,42239856.0
2,Afghanistan,2024,43372952.0
3,Afghanistan,2025,44515788.0
4,Afghanistan,2026,45667552.0
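
The behaviour of `combine_first` can be checked on a small example. In this sketch the column names and values are hypothetical: the result takes the value of the calling column where it is present and falls back to the other column where it is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: one is filled where the other is missing
toy = pd.DataFrame({
    "projection": [np.nan, 43.0, np.nan],
    "historical": [40.0, 41.0, np.nan],
})

# Keep "projection" where available, otherwise use "historical"
toy["Population"] = toy["projection"].combine_first(toy["historical"])
print(toy)
```

Rows where both columns are missing stay missing in the combined column.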


In [7]:
databaseCPowid=databaseCPowid[databaseCPowid["Year"].between(2020,2022,inclusive="both")]


In [8]:
databaseCPowid["Year"]=databaseCPowid["Year"].astype(str)

In [9]:
databaseCPowid.shape

(762, 3)

In [10]:
# Homogenizing the country names so that database and databaseCPowid can be joined
np.setdiff1d(np.array(database["Country"].unique()),np.array(databaseCPowid["Country"].unique()))

array(['Republic of Korea', 'The United Kingdom', 'Türkiye',
       'United States of America'], dtype=object)

In [11]:
def looking (pattern):
    return list(filter(lambda x : pattern in x, databaseCPowid["Country"].unique()))

In [12]:
looking("orea")

['North Korea', 'South Korea']

In [13]:
looking("ingdom")

['United Kingdom']

In [14]:
looking("rkey")

['Turkey']

In [15]:
looking("merica")

['American Samoa',
 'Latin America and the Caribbean (UN)',
 'North America',
 'Northern America (UN)',
 'South America']

In [16]:
looking("tates")

['United States', 'United States Virgin Islands']

In [17]:
old_strings = ['South Korea', 'United Kingdom', 'Turkey','United States']
new_strings = ['Republic of Korea', 'The United Kingdom', 'Türkiye',
       'United States of America']

In [18]:
databaseCPowid["Country"] = databaseCPowid["Country"].replace(old_strings,new_strings)
databaseCPowid.shape

(762, 3)

In [19]:
np.setdiff1d(np.array(database["Country"].unique()),np.array(databaseCPowid["Country"].unique()))

array([], dtype=object)
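
The empty array means that every country name in the cases database now has a counterpart in the population database. A complementary check, sketched here on toy data with invented values, is a left merge with `indicator=True`, which also flags rows that are missing for a specific year:

```python
import pandas as pd

# Toy tables with invented values; the names differ for one country
cases = pd.DataFrame({
    "Country": ["Türkiye", "Belgium"],
    "Year": ["2020", "2020"],
    "New_cases": [1, 2],
})
population = pd.DataFrame({
    "Country": ["Turkey", "Belgium"],
    "Year": ["2020", "2020"],
    "Population": [100.0, 200.0],
})

# Align the names before joining
population["Country"] = population["Country"].replace({"Turkey": "Türkiye"})

# Rows marked "left_only" have no population for that country and year
check = pd.merge(cases, population, on=["Country", "Year"],
                 how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched)
```

An empty `unmatched` table indicates that no case rows would be lost in an inner merge on `Country` and `Year`.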

    The complete database

In [20]:
#The complete database with the population by year by country by each date_reported and the indicator
complete = pd.merge(database,databaseCPowid, on = ["Country","Year"])
complete.head(4)

Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population
0,2020-01-03,BE,Belgium,0,2020,11561716.0
1,2020-01-04,BE,Belgium,0,2020,11561716.0
2,2020-01-05,BE,Belgium,0,2020,11561716.0
3,2020-01-06,BE,Belgium,0,2020,11561716.0


In [21]:
complete.shape

(14985, 6)

In [22]:
complete.Date_reported = pd.to_datetime(complete.Date_reported,errors = "coerce")
complete["Indicator"] = (complete["New_cases"]/complete["Population"])*100
complete.head(4)


Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population,Indicator
0,2020-01-03,BE,Belgium,0,2020,11561716.0,0.0
1,2020-01-04,BE,Belgium,0,2020,11561716.0,0.0
2,2020-01-05,BE,Belgium,0,2020,11561716.0,0.0
3,2020-01-06,BE,Belgium,0,2020,11561716.0,0.0


In [23]:
complete.to_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/uncoverCountriesIndicator.csv")
# Note: the result is not assigned, so complete itself is left unchanged
complete.drop_duplicates(["Year","Population"])
complete.head(4)

Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population,Indicator
0,2020-01-03,BE,Belgium,0,2020,11561716.0,0.0
1,2020-01-04,BE,Belgium,0,2020,11561716.0,0.0
2,2020-01-05,BE,Belgium,0,2020,11561716.0,0.0
3,2020-01-06,BE,Belgium,0,2020,11561716.0,0.0


***2. Using the EpidemicKabu library:*** Using the date and the indicator of incident cases to estimate the waves

In [3]:
import epidemickabu as ek

In [4]:
dataframe= pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/data/uncoverCountriesIndicator.csv")
datesName = "Date_reported"
casesName = "Indicator"
configFile= pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/data/configurationFile.csv")

In [5]:
dataframe

Unnamed: 0.1,Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population,Indicator
0,0,2020-01-03,BE,Belgium,0,2020,11561716.0,0.000000
1,1,2020-01-04,BE,Belgium,0,2020,11561716.0,0.000000
2,2,2020-01-05,BE,Belgium,0,2020,11561716.0,0.000000
3,3,2020-01-06,BE,Belgium,0,2020,11561716.0,0.000000
4,4,2020-01-07,BE,Belgium,0,2020,11561716.0,0.000000
...,...,...,...,...,...,...,...,...
14980,14980,2022-09-23,US,United States of America,82395,2022,338289856.0,0.024356
14981,14981,2022-09-24,US,United States of America,94613,2022,338289856.0,0.027968
14982,14982,2022-09-25,US,United States of America,53399,2022,338289856.0,0.015785
14983,14983,2022-09-26,US,United States of America,8849,2022,338289856.0,0.002616


In [6]:
len(dataframe["Date_reported"])

14985

In [28]:
# The dataframe with the dates and the cases by date, filtered to the United Kingdom (code GB)
database = pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/data/uncoverCountries.csv")
database = database[["Date_reported","Country_code","Country","New_cases"]]
databaseUK=database[database["Country_code"]=="GB"]
datesName = "Date_reported"
casesName = "Indicator"
databaseUK.head(3)

# The next dataframe has the kernel values for the countries
configFile= pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/data/configurationFile.csv")
configFile.head(3)

# the names of the output files
plotNameW = "Epidemic_curve_UK_W_exploringL"
dfNameW = "Epidemic_curve_UK_W_exploringL"
plotNamePV = "Epidemic_curve_UK_PV_exploringL"
dfNamePV = "Epidemic_curve_UK_PV_exploringL"

#Be sure to create the "./plots/" and "./dataframes/" folders in the folder from which you
#run the code, or set these variables to a specific directory
outFolderPlot= "/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/plots/"
outFolderDF= "/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/dataframes/"

# The thresholds are optional; the default is zero.
thresholdW = 0
thresholdPV = 0

In [29]:
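# Estimate the waves for one country; value is the country code used to look up its kernels in configFile
def kabuWavesF(database,datesName,casesName,value,plotName,dfName):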
    test = ek.waves(database,datesName,casesName,[configFile,"Code",value,"kernel1"],[configFile,"Code",value,"Kernel2"],plotName,dfName)
    test.run()

In [34]:
dataframe.groupby("Country").apply(lambda x : kabuWavesF(
    x[["Date_reported","Indicator"]],
    datesName,
    casesName,
    x["Country_code"].iloc[0],
    "Waves_"+ x["Country"].iloc[0]+" indicator",
    "Waves_"+ x["Country"].iloc[0]+" indicator"))

***3. Creating a database to estimate the size of the waves:***
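1. sum: The sum of the Indicator values from the start to the end of each wave
2. max: The maximum Indicator value within the wave
3. long: The number of dates that a wave spans (column spanDays in the output)
4. ratioSumSpan: The sum divided by spanDays, that is, the mean Indicator value per date within the wave
NOTE: All of these are estimated from the smoothed curve of the indicator, not from the raw data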
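
The following sketch shows these measures on a toy smoothed curve. It assumes, as the output files read in this section suggest, that the column `cutDatesW` equals 1 on the date where a new wave starts; the values are invented for illustration.

```python
import pandas as pd

# Toy smoothed curve with one cut date at the fourth row
toy = pd.DataFrame({
    "SmoothedCases": [1.0, 2.0, 1.0, 3.0, 5.0, 2.0],
    "cutDatesW": [0, 0, 0, 1, 0, 0],
})

# The running count of cut dates numbers the waves: 0, 0, 0, 1, 1, 1
toy["waveNum"] = (toy["cutDatesW"] == 1).cumsum()

# max, sum and number of dates of each wave
sizes = toy.groupby("waveNum")["SmoothedCases"].agg(["max", "sum", "count"])
sizes = sizes.rename(columns={"count": "spanDays"})
sizes["ratioSumSpan"] = sizes["sum"] / sizes["spanDays"]
print(sizes)
```

With this numbering, the rows before the first cut date form wave 0.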

In [7]:
dataframe

Unnamed: 0.1,Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population,Indicator
0,0,2020-01-03,BE,Belgium,0,2020,11561716.0,0.000000
1,1,2020-01-04,BE,Belgium,0,2020,11561716.0,0.000000
2,2,2020-01-05,BE,Belgium,0,2020,11561716.0,0.000000
3,3,2020-01-06,BE,Belgium,0,2020,11561716.0,0.000000
4,4,2020-01-07,BE,Belgium,0,2020,11561716.0,0.000000
...,...,...,...,...,...,...,...,...
14980,14980,2022-09-23,US,United States of America,82395,2022,338289856.0,0.024356
14981,14981,2022-09-24,US,United States of America,94613,2022,338289856.0,0.027968
14982,14982,2022-09-25,US,United States of America,53399,2022,338289856.0,0.015785
14983,14983,2022-09-26,US,United States of America,8849,2022,338289856.0,0.002616


In [40]:
sizeWavesDF = dataframe.groupby("Country").apply(lambda x : pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/dataframes/" + "Waves_"+ x["Country"].iloc[0]+" indicator" + ".csv"))
sizeWavesDF

Country,,Unnamed: 0,Date_reported,Indicator,SmoothedCases,cutDatesW
Belgium,0,0,2020-01-03,0.000000,3.967039e-09,0
Belgium,1,1,2020-01-04,0.000000,5.050803e-09,0
Belgium,2,2,2020-01-05,0.000000,6.405380e-09,0
Belgium,3,3,2020-01-06,0.000000,8.089473e-09,0
Belgium,4,4,2020-01-07,0.000000,1.017147e-08,0
...,...,...,...,...,...,...
United States of America,994,14980,2022-09-23,0.024356,1.628586e-02,0
United States of America,995,14981,2022-09-24,0.027968,1.608026e-02,0
United States of America,996,14982,2022-09-25,0.015785,1.588202e-02,0
United States of America,997,14983,2022-09-26,0.002616,1.569049e-02,0


In [37]:
len(dataframe)

14985

In [8]:
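def group(name):
    # Read the output file that the library wrote for one country
    d = pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/dataframes/" + name + ".csv")
    # Work on the smoothed curve instead of the raw indicator
    d["Indicator"] = d["SmoothedCases"]
    # Each row with cutDatesW == 1 starts a new wave number
    d["waveNum"] = (d['cutDatesW'] == 1).cumsum()
    n = d.groupby("waveNum")['Indicator'].agg(["max","sum"])
    # Number of dates in each wave
    n["spanDays"] = d.groupby("waveNum")['Indicator'].agg("count")
    n["ratioSumSpan"] = n["sum"]/n["spanDays"]
    return n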

In [9]:
sizeWavesDF = dataframe.groupby("Country").apply(lambda x : group("Waves_"+ x["Country"].iloc[0]+" indicator") )
sizeWavesDF.to_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabuLibrary/examples/data/wavesSizes.csv")
sizeWavesDF

Country,waveNum,max,sum,spanDays,ratioSumSpan
Belgium,0,0.010796,0.527542,170,0.003103
Belgium,1,0.099744,5.151652,199,0.025888
Belgium,2,0.035368,3.623331,169,0.021440
Belgium,3,0.132126,7.411447,173,0.042841
Belgium,4,0.316606,14.644044,87,0.168322
...,...,...,...,...,...
United States of America,2,0.063211,6.827758,189,0.036126
United States of America,3,0.018725,1.091631,86,0.012693
United States of America,4,0.045002,3.691912,139,0.026561
United States of America,5,0.192028,10.004389,150,0.066696
