**Using EpidemicKabu to estimate the size of the epidemic waves**
In this notebook we build a database with the report date and an indicator of incident cases for each date. We then pass the dates and the indicator to the library as its main input to estimate the waves. Finally, we use the library's output database to build a database with the size of each wave.

***1. Building the database with the indicator:*** The indicator is computed by dividing the daily new cases of each country by that country's total population for the corresponding year and multiplying by 100.
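
As a quick worked example of the indicator, the sketch below applies the formula to a toy frame. The population is the 2020 value for Belgium that appears later in this notebook; the case counts are made up purely for illustration.

```python
import pandas as pd

# Toy data: the case counts are hypothetical; the population is Belgium's 2020 value
toy = pd.DataFrame({
    "New_cases": [0, 1000, 2500],
    "Population": [11538604.0] * 3,
})

# Indicator = daily new cases as a percentage of the total population
toy["Indicator"] = toy["New_cases"] / toy["Population"] * 100
print(toy)
```

Expressing the cases as a percentage of the population makes the curves of countries with very different sizes comparable.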

In [4]:
import pandas as pd
import numpy as np

In [5]:
# Daily cases by country; .copy() avoids a SettingWithCopyWarning when columns are added later
database = pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/uncoverCountries.csv")
database = database[["Date_reported","Country_code","Country","New_cases"]].copy()
database.head()

Unnamed: 0,Date_reported,Country_code,Country,New_cases
0,2020-01-03,BE,Belgium,0
1,2020-01-04,BE,Belgium,0
2,2020-01-05,BE,Belgium,0
3,2020-01-06,BE,Belgium,0
4,2020-01-07,BE,Belgium,0
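
The cells in this notebook read from absolute paths on one machine. A minimal sketch of a more portable setup follows, assuming the CSV files sit in a `data` folder relative to the working directory; `DATA_DIR` and that layout are assumptions for illustration, not part of the original project.

```python
from pathlib import Path

# Hypothetical layout: adjust DATA_DIR to wherever the CSV files actually live
DATA_DIR = Path("exampleUseLibrary") / "data"

cases_path = DATA_DIR / "uncoverCountries.csv"
population_path = DATA_DIR / "countriesPopulation.csv"
print(cases_path)
print(population_path)
```

These `Path` objects can be passed directly to `pd.read_csv`.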


In [6]:
# Build the Year column from the first four characters of Date_reported (kept as a string)
database["Year"] = database["Date_reported"].str[:4]
database.head()

Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year
0,2020-01-03,BE,Belgium,0,2020
1,2020-01-04,BE,Belgium,0,2020
2,2020-01-05,BE,Belgium,0,2020
3,2020-01-06,BE,Belgium,0,2020
4,2020-01-07,BE,Belgium,0,2020


In [7]:
# Load the total population by year and country, from: https://data.worldbank.org/indicator/SP.POP.TOTL
# NOTE: the 2022 population by country is missing
databaseCP = pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/countriesPopulation.csv").reset_index(drop=True)
databaseCP.head()

Unnamed: 0,Country Name,2020,2021,2022
0,Aruba,106585.0,106537.0,106537.0
1,Africa Eastern and Southern,685112705.0,702976832.0,702976832.0
2,Afghanistan,38972230.0,40099462.0,40099462.0
3,Africa Western and Central,466189102.0,478185907.0,478185907.0
4,Angola,33428486.0,34503774.0,34503774.0


In [16]:
# Reshape the population table from wide (one column per year) to long (one row per country and year)
databaseCP2 = databaseCP.melt(id_vars="Country Name", var_name="Year", value_name="Population")
databaseCP2.sort_values("Country Name", inplace=True)
databaseCP2.reset_index(drop=True, inplace=True)
databaseCP2.head(3)

Unnamed: 0,Country Name,Year,Population
0,Afghanistan,2020,38972230.0
1,Afghanistan,2021,40099462.0
2,Afghanistan,2022,40099462.0
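
To make the reshaping step concrete, here is a small self-contained sketch of `melt` on two rows copied from the population table above. Note that the resulting `Year` values are strings, because they come from column names; this matches the string `Year` column built from `Date_reported`, which is what allows the two tables to be joined on that key.

```python
import pandas as pd

# Wide table: one row per country, one column per year (values taken from the output above)
wide = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "2020": [106585.0, 33428486.0],
    "2021": [106537.0, 34503774.0],
})

# Long table: one row per (country, year) pair
long = wide.melt(id_vars="Country Name", var_name="Year", value_name="Population")
long = long.sort_values(["Country Name", "Year"]).reset_index(drop=True)
print(long)
```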


In [9]:
# Harmonize the country names so that database and databaseCP2 can be joined
databaseCP2 = databaseCP2.rename(columns = {"Country Name":"Country"})
np.setdiff1d(np.array(database["Country"].unique()),np.array(databaseCP2["Country"].unique()))

array(['Republic of Korea', 'The United Kingdom', 'Türkiye',
       'United States of America'], dtype=object)

In [None]:
def looking(pattern):
    # Return the country names in databaseCP2 that contain the given substring
    return list(filter(lambda x : pattern in x, databaseCP2["Country"].unique()))

In [None]:
looking("orea")

["Korea, Dem. People's Rep.", 'Korea, Rep.']

In [None]:
looking("ingdom")

['United Kingdom']

In [None]:
looking("rkiye")

['Turkiye']

In [None]:
looking("merica")

['American Samoa',
 'Latin America & Caribbean',
 'Latin America & Caribbean (excluding high income)',
 'Latin America & the Caribbean (IDA & IBRD countries)',
 'North America']

In [None]:
looking("tates")

['Caribbean small states',
 'Other small states',
 'Pacific island small states',
 'Small states',
 'United States']

In [10]:
old_strings = ['Korea, Rep.', 'United Kingdom', 'Turkiye','United States']
new_strings = ['Republic of Korea', 'The United Kingdom', 'Türkiye',
       'United States of America']

In [11]:
databaseCP2["Country"] = databaseCP2["Country"].replace(old_strings,new_strings)

In [12]:
np.setdiff1d(np.array(database["Country"].unique()),np.array(databaseCP2["Country"].unique()))

array([], dtype=object)
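
An alternative way to catch names that fail to match is to let the join itself report them. The sketch below uses `how="left"` with `indicator=True` on toy data; the second population value is a placeholder, and the spelling mismatch is introduced on purpose.

```python
import pandas as pd

cases = pd.DataFrame({
    "Country": ["Belgium", "Türkiye"],
    "Year": ["2020", "2020"],
    "New_cases": [0, 5],            # hypothetical counts
})
population = pd.DataFrame({
    "Country": ["Belgium", "Turkiye"],   # spelling differs on purpose
    "Year": ["2020", "2020"],
    "Population": [11538604.0, 1000000.0],  # second value is a placeholder
})

# A left join keeps every case row; the _merge column says whether a population was found
checked = cases.merge(population, on=["Country", "Year"], how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].unique()
print(unmatched)
```

With the default inner join, rows for unmatched countries would be dropped silently, so a check of this kind is worth running before computing the indicator.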

In [15]:
# Complete database: population by year and country for each Date_reported, plus the indicator
complete = pd.merge(database, databaseCP2, on=["Country","Year"])
complete["Date_reported"] = pd.to_datetime(complete["Date_reported"], errors="coerce")
# Indicator: daily new cases as a percentage of the total population
complete["Indicator"] = (complete["New_cases"]/complete["Population"])*100
complete.to_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/uncoverCountriesIndicator.csv")
# The result of drop_duplicates is not assigned, so complete itself is unchanged
complete.drop_duplicates(["Year","Population"])
complete.head(4)

Unnamed: 0,Date_reported,Country_code,Country,New_cases,Year,Population,Indicator
0,2020-01-03,BE,Belgium,0,2020,11538604.0,0.0
1,2020-01-04,BE,Belgium,0,2020,11538604.0,0.0
2,2020-01-05,BE,Belgium,0,2020,11538604.0,0.0
3,2020-01-06,BE,Belgium,0,2020,11538604.0,0.0


***2. Using the EpidemicKabu library:*** We use the dates and the indicator of incident cases to estimate the waves.

In [1]:
from kabu import curves
from kabuWaves import waves
from kabuPeaksValleys import peaksValleys

ModuleNotFoundError: No module named 'kabu'

In [85]:
dataframe= pd.read_csv("/Users/linaruiz/Documents/EpidemicKabu_project/EpidemicKabu/exampleUseLibrary/data/uncoverCountriesIndicator.csv")
datesName = "Date_reported"
casesName = "Indicator"
configFile= pd.read_csv("/Users/linaruiz/Documents/projectEpidemicCurve/kabu/Kabu/ConfigFile.csv")

In [87]:
def kabuWavesF(database, datesName, casesName, value, plotName, dfName):
    # Relies on the global configFile loaded above; value is the country code
    test = waves(database, datesName, casesName, [configFile,"Code",value,"kernel1"], plotName, dfName)
    test.run()

In [None]:
dataframe.groupby("Country").apply(lambda x : kabuWavesF(
    x[["Date_reported","Indicator"]],
    datesName,
    casesName,
    x["Country_code"].iloc[0],
    "Waves_"+ x["Country"].iloc[0]+" confi + indicator",
    "Waves_"+ x["Country"].iloc[0]+" confi + indicator"))

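The cell above uses `groupby(...).apply` only for its side effects (one run of the library per country). A plain loop over the groups is an alternative that makes this explicit. In the sketch below, `run_waves` is a hypothetical stand-in for `kabuWavesF` that only records its arguments, so the example runs without the library installed; the toy values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date_reported": ["2020-01-03", "2020-01-04", "2020-01-03", "2020-01-04"],
    "Country_code": ["BE", "BE", "ES", "ES"],
    "Country": ["Belgium", "Belgium", "Spain", "Spain"],
    "Indicator": [0.0, 0.1, 0.0, 0.2],   # hypothetical values
})

calls = []

def run_waves(df, code, name):
    """Hypothetical stand-in for kabuWavesF: records what it was called with."""
    calls.append((code, name, len(df)))

# One call per country, with the same output name pattern used in the notebook
for country, grp in toy.groupby("Country"):
    name = "Waves_" + country + " confi + indicator"
    run_waves(grp[["Date_reported", "Indicator"]], grp["Country_code"].iloc[0], name)

print(calls)
```
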
***3. Creating a database to estimate the size of the waves:*** Three measures are computed for each wave:
1. sum: The sum of the Indicator values from the start to the end of the wave
2. max: The maximum Indicator value within the wave
3. max/Max: The wave maximum divided by the largest wave maximum of the same country
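
The three measures can be illustrated on a toy series. The sketch assumes, as the code later in this section does, that a row with `CutDays == 1` marks the start of a new wave; the indicator values and the column name `wave` are made up for the example.

```python
import pandas as pd

# Toy series: CutDays == 1 is assumed to mark the first day of a new wave
d = pd.DataFrame({
    "Indicator": [0.1, 0.3, 0.2, 0.5, 1.0, 0.4],
    "CutDays":   [0,   0,   1,   0,   0,   1],
})

# The running count of cut days gives every row the number of its wave
d["wave"] = (d["CutDays"] == 1).cumsum()

# Per-wave maximum and sum, then the maximum relative to the largest wave
sizes = d.groupby("wave")["Indicator"].agg(["max", "sum"])
sizes["max/Max"] = sizes["max"] / sizes["max"].max()
print(sizes)
```

Here the rows fall into waves 0, 1 and 2, and the wave with the largest peak gets `max/Max` equal to 1.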

In [17]:
sizeWavesDF = dataframe.groupby("Country").apply(lambda x : pd.read_csv("/Users/linaruiz/Documents/projectEpidemicCurve/kabu/Kabu/dataframes/" + "Waves_"+ x["Country"].iloc[0]+" confi + indicator" + ".csv"))
sizeWavesDF

NameError: name 'dataframe' is not defined

In [68]:
dfs.index

MultiIndex([(                 'Belgium',   0),
            (                 'Belgium',   1),
            (                 'Belgium',   2),
            (                 'Belgium',   3),
            (                 'Belgium',   4),
            (                 'Belgium',   5),
            (                 'Belgium',   6),
            (                 'Belgium',   7),
            (                 'Belgium',   8),
            (                 'Belgium',   9),
            ...
            ('United States of America', 989),
            ('United States of America', 990),
            ('United States of America', 991),
            ('United States of America', 992),
            ('United States of America', 993),
            ('United States of America', 994),
            ('United States of America', 995),
            ('United States of America', 996),
            ('United States of America', 997),
            ('United States of America', 998)],
           names=['Country', None], length=

In [90]:
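def group(name):
    # Read the per-country output written by the library
    d = pd.read_csv("/Users/linaruiz/Documents/projectEpidemicCurve/kabu/Kabu/dataframes/" + name + ".csv")
    # Label each wave: the counter increases at every row where CutDays == 1
    d["cunsum"] = (d["CutDays"] == 1).cumsum()
    # Per-wave maximum and sum of the indicator
    n = d.groupby("cunsum")["Indicator"].agg(["max","sum"])
    # Wave maximum relative to the largest wave maximum of the country
    n["max/Max"] = n["max"]/n["max"].max()
    return n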

In [92]:
sizeWavesDF = dataframe.groupby("Country").apply(lambda x : group("Waves_"+ x["Country"].iloc[0]+" confi + indicator") )
sizeWavesDF.to_csv("/Users/linaruiz/Documents/projectEpidemicCurve/data/wavesSizes.csv")
sizeWavesDF

Country,cunsum,max,sum,max/Max
Belgium,0,0.020245,0.529787,0.030868
Belgium,1,0.192320,5.148259,0.293231
Belgium,2,0.054214,3.639953,0.082660
Belgium,3,0.221039,7.773102,0.337020
Belgium,4,0.655864,14.027575,1.000000
...,...,...,...,...
Spain,3,0.023091,0.833464,0.062179
Spain,4,0.067292,2.772925,0.181198
Spain,5,0.371372,14.250602,1.000000
Spain,6,0.052748,1.528994,0.142036
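
When the saved file is read back, the two index levels come back as ordinary columns unless they are named in `index_col`. A minimal round-trip sketch follows, using an in-memory buffer instead of the real file and two rows copied from the table above.

```python
import io
import pandas as pd

# Two rows from the table above, with the same (Country, cunsum) index
sizes = pd.DataFrame(
    {"max": [0.020245, 0.192320], "sum": [0.529787, 5.148259]},
    index=pd.MultiIndex.from_tuples(
        [("Belgium", 0), ("Belgium", 1)], names=["Country", "cunsum"]
    ),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sizes.to_csv(buffer)
buffer.seek(0)

# Naming both index columns restores the MultiIndex
restored = pd.read_csv(buffer, index_col=["Country", "cunsum"])
print(restored)
```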
