# **Linear Combination: The case of Brazil**

>Now that we have built a database of SIRD simulations for each Brazilian State, we can treat Brazil as a whole as a linear combination of the 26 State simulations.
>The idea is to store every State's simulation data in a single table (or matrix), then sum them to obtain estimates for Brazil.

## Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from scipy import optimize
import plotly.graph_objects as go
import yaml
from datetime import datetime
from sklearn import metrics as mt

## SIRD model

In [2]:
# The SIRD model differential equations.
def SIRD(y, t, N, Beta, Gamma, Mu):
    S, I, R, D = y
    dSdt = -(Beta * I * S)/N
    dIdt = (Beta * I * S)/N  - Gamma * I - Mu * I
    dRdt = Gamma * I
    dDdt = Mu * I
    return dSdt, dIdt, dRdt, dDdt
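For reference, the `SIRD` function above encodes the following system, with $N$ the (effective) population, $\beta$ (`Beta`) the transmission rate, $\gamma$ (`Gamma`) the recovery rate and $\mu$ (`Mu`) the mortality rate. Adding the four right-hand sides gives zero, so $S + I + R + D$ stays equal to $N$ throughout a simulation:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta I S}{N}, \\
\frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I - \mu I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dD}{dt} &= \mu I, \\
\frac{d}{dt}\left(S + I + R + D\right) &= 0.
\end{aligned}
$$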

In [3]:
# Integrate the SIRD model for a parameter vector theta = (Beta, Gamma, Mu)

def SIRDsim(y0, t, N, theta):
  
  #Transmission rate
  Beta = theta[0]
  #Recovery rate per day
  Gamma = theta[1]
  #Mortality rate
  Mu = theta[2]

  # Integrate the SIRD equations over the time grid t.
  result = odeint(SIRD, y0, t, args=(N, Beta, Gamma, Mu))
  S, I, R, D = result.T
  return S, I, R, D
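As a quick sanity check of what `SIRDsim` computes, the sketch below integrates the same equations for a hypothetical population of 10,000 with the parameter values of `theta0` used in this notebook, and checks two properties that follow from the model: the compartments always sum to N, and D/R equals Mu/Gamma because both grow in proportion to I. The names `sird_rhs`, `N`, `y0` and `theta` here are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from scipy.integrate import odeint

# Same right-hand side as SIRD() above (hypothetical standalone copy)
def sird_rhs(y, t, N, beta, gamma, mu):
    S, I, R, D = y
    new_infections = beta * I * S / N
    return (-new_infections,
            new_infections - gamma * I - mu * I,
            gamma * I,
            mu * I)

N = 10_000                    # hypothetical population
y0 = (N - 1, 1, 0, 0)         # one initial infected individual
t = np.arange(120)            # one point per day
theta = (0.4, 0.1, 0.00292)   # Beta, Gamma, Mu (same values as theta0 in this notebook)

S, I, R, D = odeint(sird_rhs, y0, t, args=(N, *theta)).T

total = S + I + R + D         # should stay equal to N (up to integration error)
ratio = D[-1] / R[-1]         # should equal Mu / Gamma = 0.0292
print(abs(total - N).max(), ratio)
```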

In [4]:
#Least Squares Method

def QuadraticError(theta0, Sd, Id, Rd, Dd, y0, t, N):
    """ function to pass to optimize.leastsq
        The routine will square and sum the values returned by 
        this function""" 
    [S,I,R,D] = SIRDsim(y0, t, N, theta0)
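    # S is not fitted: the model conserves S + I + R + D = N and Sd = N - Id - Rd - Dd,
    # so the S error is redundant (it equals -(errorI + errorR + errorD) up to integration error)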
    errorI = I - Id
    errorR = R - Rd
    errorD = D - Dd
    EQ = np.concatenate([errorI,errorR,errorD])
    return EQ
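To illustrate what `optimize.leastsq` does with a residual function such as `QuadraticError`, here is a minimal sketch on synthetic data: curves are generated from known parameters, and the fit starts from a different guess. The helpers `sird_rhs`, `simulate` and `residuals`, as well as all numbers, are hypothetical; with noise-free data the fitted parameters should land close to the true ones.

```python
import numpy as np
from scipy.integrate import odeint
from scipy import optimize

def sird_rhs(y, t, N, beta, gamma, mu):
    S, I, R, D = y
    new_infections = beta * I * S / N
    return -new_infections, new_infections - gamma * I - mu * I, gamma * I, mu * I

def simulate(theta, y0, t, N):
    # Returns the S, I, R, D curves as four rows
    return odeint(sird_rhs, y0, t, args=(N, *theta)).T

def residuals(theta, Id, Rd, Dd, y0, t, N):
    # Same idea as QuadraticError: stack the I, R and D errors (S is redundant)
    _, I, R, D = simulate(theta, y0, t, N)
    return np.concatenate([I - Id, R - Rd, D - Dd])

# Hypothetical synthetic "observations" generated from known parameters
N, t = 1_000, np.arange(60)
y0 = (N - 1, 1, 0, 0)
true_theta = np.array([0.5, 0.1, 0.02])
_, Id, Rd, Dd = simulate(true_theta, y0, t, N)

theta0 = [0.45, 0.12, 0.015]  # deliberately different initial guess
best_theta, ier = optimize.leastsq(residuals, theta0, args=(Id, Rd, Dd, y0, t, N))
print(best_theta)  # should be close to true_theta
```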

## Fitting Automation

> Because Brazil is made up of many States, and the simulations must be rerun regularly to update the database, an automated, reusable fitting procedure saves considerable time and effort.

In [5]:
def SIRD_state_fitting_algorithm(State_Name, theta0, Tresearch, Tsim):
  """Fit the SIRD model to one State's data (or to Brazil with 'TOTAL')
     and return the observed and simulated curves.
     Args:
           - State_Name: Abbreviated name of a State, or 'TOTAL' | string
           - theta0: Initial guess of the parameters           | list of float Beta, Gamma, Mu
           - Tresearch: Number of days used for the fit         | int (days)
           - Tsim: Simulation duration                          | int (days)
     Returns:
           - DataFrame with columns Date, Sd, Id, Rd, Dd (data) and Ss, Is, Rs, Ds (simulation)
  """

  # Reading data - wcota
  data_path = 'https://raw.githubusercontent.com/wcota/covid19br/master/cases-brazil-states.csv'
  df = pd.read_csv(data_path, delimiter=",") 

  # Dataframe for the specific State
  df_state = df[df.state == State_Name].reset_index()

  # Creating new recovered column
  df_state["newRecovered"] = df_state["recovered"].diff().fillna(0)
  df_state["recovered"] = df_state["recovered"].fillna(0)

  # Creating active cases column (Infected)
  # Note: new deaths are not subtracted, so activeCases also counts cases that ended in death
  active_infected = [df_state["totalCases"].iloc[0]]
  for nc, nr in zip(df_state["newCases"].iloc[1:], 
                    df_state["newRecovered"].iloc[1:]):
      active_infected.append(active_infected[-1] + nc - nr)
  df_state["activeCases"] = active_infected

  # State population
  with open('/content/drive/MyDrive/ISMIN/Semestre 8/Stage International/Instituto Maua de Tecnologia/Maua Internship/05_Notebooks/DataSets/state_pop.yaml') as f:
    state_pop = yaml.safe_load(f)

  if (State_Name != 'TOTAL'):
    N = state_pop[State_Name]['population']

    # Effective population: alpha is the share of the State population reached by the
    # epidemic at day Tresearch, so alpha*N equals recovered + active + deaths on that day
    alpha = (df_state["recovered"][Tresearch] + df_state["activeCases"][Tresearch] + df_state["deaths"][Tresearch])/N
    N = alpha*N
  else:
    # For 'TOTAL', the same quantity is computed directly
    N = (df_state["recovered"][Tresearch] + df_state["activeCases"][Tresearch] + df_state["deaths"][Tresearch])
  
  # Considering the research duration for the parameters
  df_state_init = df_state.iloc[:Tresearch]
  
  # Data
  Id = df_state_init["activeCases"]
  Rd = df_state_init["recovered"]
  Dd = df_state_init["deaths"]
  Sd = N - Rd - Id - Dd

  # Initial Conditions
  S0, I0, R0, D0 = N-1, 1, 0, 0
  y0 = S0, I0, R0, D0

  # Time vector: one point per day
  t = np.arange(len(df_state_init.index.values))

  # Least-squares fit to find the optimal parameters
  (best_theta, kvg) = optimize.leastsq(QuadraticError, theta0, args=(Sd,Id,Rd,Dd,y0,t,N))

  # Fitted simulation
  df_state_sim = df_state.iloc[:Tsim] # Days of simulation
  timeline = df_state_sim.date
  tsim = np.arange(len(df_state_sim.index.values))  # same daily step as t
  [Ss,Is,Rs,Ds] = SIRDsim(y0, tsim, N, best_theta)

  data = {'Date': timeline, 'Sd': Sd, 'Id': Id, 'Rd': Rd, 'Dd': Dd, 'Ss': Ss, 'Is': Is, 'Rs': Rs, 'Ds': Ds}
  Final_df = pd.DataFrame(data)

  return Final_df
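The `activeCases` loop inside `SIRD_state_fitting_algorithm` is a running sum, so it can also be written with `cumsum`. The sketch below uses a hypothetical five-day toy frame with the same column names as the wcota data and checks that both versions agree; when `newCases` is the daily increment of `totalCases` and `recovered` starts at 0, both also equal `totalCases - recovered`.

```python
import pandas as pd

# Hypothetical toy series with the same columns as the wcota data used above
df = pd.DataFrame({
    "totalCases": [3, 5, 9, 12, 15],
    "newCases":   [3, 2, 4, 3, 3],
    "recovered":  [0, 1, 1, 4, 6],
})
df["newRecovered"] = df["recovered"].diff().fillna(0)

# Loop version, as in SIRD_state_fitting_algorithm
active = [df["totalCases"].iloc[0]]
for nc, nr in zip(df["newCases"].iloc[1:], df["newRecovered"].iloc[1:]):
    active.append(active[-1] + nc - nr)

# Vectorized version: first total plus the running sum of (newCases - newRecovered)
delta = df["newCases"] - df["newRecovered"]
delta.iloc[0] = 0  # day 0 is already counted in totalCases[0]
active_vec = df["totalCases"].iloc[0] + delta.cumsum()

print(list(active_vec))
```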

In [6]:
def SIRD_plotting(SIRD_df):
  # Plotting the results

  Sd,Id,Rd,Dd = SIRD_df['Sd'], SIRD_df['Id'], SIRD_df['Rd'], SIRD_df['Dd']
  Ss,Is,Rs,Ds = SIRD_df['Ss'], SIRD_df['Is'], SIRD_df['Rs'], SIRD_df['Ds']
  timeline = SIRD_df['Date']

  fig = go.Figure()

  fig.add_trace(go.Scatter(
      name="Active Cases - Model",
      x=timeline,
      y=Is,
      mode='lines',
      line=dict(width=3, dash="dash", color="#6a1b9a")
      ))

  fig.add_trace(go.Scatter(
      name="Recovered - Model",
      x=timeline,
      y=Rs,
      mode='lines',
      line=dict(width=3, dash="dash", color="#2e7d32")
      ))

  fig.add_trace(go.Scatter(
      name="Susceptible - Model",
      x=timeline,
      y=Ss,
      mode='lines',
      line=dict(width=3, dash="dash", color="#58ACFA")
      ))

  fig.add_trace(go.Scatter(
      name="Deceased - Model",
      x=timeline,
      y=Ds,
      mode='lines',
      line=dict(width=3, dash="dash", color="burlywood")
      ))

  fig.add_trace(go.Scatter(
      name="Active Cases - Data",
      x=timeline,
      y=Id,
      mode='markers',
      marker=dict(size=6, color="#38006b")
      ))
  
  fig.add_trace(go.Scatter(
      name="Recovered - Data",
      x=timeline,
      y=Rd,
      mode='markers',
      marker=dict(size=6, color="#005005")
      ))

  fig.add_trace(go.Scatter(
      name="Susceptible - Data",
      x=timeline,
      y=Sd,
      mode='markers',
      marker=dict(size=6, color="#0080FF")
      ))

  fig.add_trace(go.Scatter(
      name="Deceased - Data",
      x=timeline,
      y=Dd,
      mode='markers',
      marker=dict(size=6, color="burlywood")
      ))
  
  fig.update_layout(
      template='xgridoff',
      xaxis=dict(showgrid=False),
      xaxis_title='Date',
      yaxis_title='Individuals',
      legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.01,
        xanchor='right',
        x=0.95
        ),
      title_text="SIRD Model")

  fig.show()

## States Visualization

In [12]:
theta0 = [0.4, 0.1, 0.00292]
SIRD_state_fitting_algorithm('TO', theta0, 300, 300)

| | Date | Sd | Id | Rd | Dd | Ss | Is | Rs | Ds |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-18 | 95678.0 | 1.0 | 0.0 | 0 | 95678.000000 | 1.000000 | 0.000000 | 0.000000 |
| 1 | 2020-03-19 | 95678.0 | 1.0 | 0.0 | 0 | 95677.879055 | 1.070704 | 0.048101 | 0.002139 |
| 2 | 2020-03-20 | 95678.0 | 1.0 | 0.0 | 0 | 95677.749560 | 1.146407 | 0.099604 | 0.004430 |
| 3 | 2020-03-21 | 95677.0 | 2.0 | 0.0 | 0 | 95677.610908 | 1.227462 | 0.154748 | 0.006882 |
| 4 | 2020-03-22 | 95674.0 | 5.0 | 0.0 | 0 | 95677.462454 | 1.314248 | 0.213791 | 0.009508 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | 2021-01-07 | 1958.0 | 10115.0 | 82343.0 | 1263 | 11825.882558 | 758.689605 | 79556.366038 | 3538.061799 |
| 296 | 2021-01-08 | 1484.0 | 10202.0 | 82726.0 | 1267 | 11815.113669 | 733.249555 | 79591.033243 | 3539.603533 |
| 297 | 2021-01-09 | 1298.0 | 9888.0 | 83223.0 | 1270 | 11804.715263 | 708.653391 | 79624.537786 | 3541.093560 |
| 298 | 2021-01-10 | 1132.0 | 9657.0 | 83616.0 | 1274 | 11794.674421 | 684.873734 | 79656.918248 | 3542.533596 |
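Since `Sd` is defined as `N - Rd - Id - Dd` and the model conserves S + I + R + D, every row of the 'TO' output above should sum to the effective population N used for the fit. A quick check on a few printed rows (values copied from the table; nothing else is assumed):

```python
import numpy as np

# Sd, Id, Rd, Dd for days 0, 4, 295 and 298 of the 'TO' output above
data_rows = np.array([
    [95678.0, 1.0, 0.0, 0],
    [95674.0, 5.0, 0.0, 0],
    [1958.0, 10115.0, 82343.0, 1263],
    [1132.0, 9657.0, 83616.0, 1274],
])
# Ss, Is, Rs, Ds for day 298
sim_row = np.array([11794.674421, 684.873734, 79656.918248, 3542.533596])

print(data_rows.sum(axis=1))  # each data row sums to N
print(sim_row.sum())          # the simulated row sums to the same N, up to rounding
```

All data rows add up to 95679, which is therefore the N used for this fit: the recovered + active + deaths count at day `Tresearch`.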


In [13]:
theta0 = [0.4, 0.1, 0.00292]
SIRD_plotting(SIRD_state_fitting_algorithm('TO', theta0, 300, 300))

## Linear Combination

>Having built a procedure that fits the SIRD model to each State, we can now model the behaviour of Covid-19 in Brazil as a whole by taking a linear combination of the State results.

### SIRD Matrix

>>The idea is to build a table (a DataFrame with one row per State) from the outputs of *SIRD_state_fitting_algorithm*, grouping every State's simulation data in one place.

In [7]:
#State Population Importation
with open('/content/drive/MyDrive/ISMIN/Semestre 8/Stage International/Instituto Maua de Tecnologia/Maua Internship/05_Notebooks/DataSets/state_pop.yaml') as f:
  state_pop = yaml.safe_load(f)

#Matrix Initialisation (empty placeholder; SIRD_state_matrix builds the filled version)
SIRD_matrix = pd.DataFrame(columns=['State', 'Covid_Data', 'Sim_Data', 'Date'])

In [8]:
def SIRD_state_matrix(theta0, Tresearch, Tsim):
    
  #States' Population Importation
  with open('/content/drive/MyDrive/ISMIN/Semestre 8/Stage International/Instituto Maua de Tecnologia/Maua Internship/05_Notebooks/DataSets/state_pop.yaml') as f:
    state_pop = yaml.safe_load(f)

  #States loop
  States_table = []
  DataFrames_table = []
  
  for state in state_pop:
    States_table.append(state)
    DataFrames_table.append(SIRD_state_fitting_algorithm(state, theta0, Tresearch, Tsim))

  #Matrix Creation
  BigData = {'State': States_table, 'SIRD_DataFrame': DataFrames_table}
  SIRD_matrix = pd.DataFrame(BigData)

  return SIRD_matrix
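Storing whole DataFrames inside the cells of `SIRD_matrix` works, but it makes column-wise operations such as summing over States somewhat awkward. A possible alternative, sketched below under the assumption that every State frame has the same columns and a 0-based day index (as `SIRD_state_fitting_algorithm` returns), is a single long DataFrame built with `pd.concat` and indexed by (State, day). The `frames` dictionary is hypothetical toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical per-State outputs (only two columns kept for brevity)
frames = {
    "AC": pd.DataFrame({"Is": [1.0, 2.0, 4.0], "Ds": [0.0, 0.1, 0.3]}),
    "AL": pd.DataFrame({"Is": [2.0, 3.0, 5.0], "Ds": [0.0, 0.2, 0.4]}),
}

# One long DataFrame indexed by (State, day) instead of DataFrames stored in cells
long_df = pd.concat(frames, names=["State", "day"])

print(long_df.loc["AL"])                     # one State's simulation
print(long_df.groupby(level="day").sum())    # day-by-day sum over all States
```

`groupby(level="day").sum()` adds the States on their row index, which is the same alignment the combination loop in this notebook uses.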

In [9]:
theta0 = [0.44, 0.15, 0.00292]

SIRD_matrix = SIRD_state_matrix(theta0, 300, 300)
SIRD_matrix.head()

| | State | SIRD_DataFrame |
|---|---|---|
| 0 | AC | Date Sd Id ... ... |
| 1 | AL | Date Sd Id ... ... |
| 2 | AP | Date Sd Id ... ... |
| 3 | AM | Date Sd Id ... ... |
| 4 | BA | Date Sd Id ... ... |


In [10]:
# Visualisation of one State's data from the matrix
state_df = SIRD_matrix.loc[SIRD_matrix['State'] == 'SP'].reset_index()
SIRD_plotting(state_df['SIRD_DataFrame'][0])

### Combination

>The next step is to sum the State simulations stored in the data matrix to obtain the simulation for Brazil.
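Written out, for each compartment $X$ and each day index $t$ (the row index of every State's DataFrame), the combination is a linear combination of the State curves with all weights equal to one, where $\mathcal{S}$ is the set of States in `state_pop`. Because each State's curves satisfy $S_s + I_s + R_s + D_s = N_s$, the combined curves sum to the total of the States' effective populations:

$$
X_{\mathrm{BR}}(t) = \sum_{s \in \mathcal{S}} w_s \, X_s(t), \qquad w_s = 1, \qquad X \in \{S, I, R, D\},
$$

$$
S_{\mathrm{BR}}(t) + I_{\mathrm{BR}}(t) + R_{\mathrm{BR}}(t) + D_{\mathrm{BR}}(t) = \sum_{s \in \mathcal{S}} N_s .
$$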

In [15]:
#Initialization
Init = np.zeros(300)  # length must match the Tsim used to build SIRD_matrix
Combination_Result = pd.DataFrame({'Date': Init, 'Sd': Init, 'Id': Init, 'Rd': Init, 'Dd': Init, 'Ss': Init, 'Is': Init, 'Rs': Init, 'Ds': Init})

#States' Population Importation
with open('/content/drive/MyDrive/ISMIN/Semestre 8/Stage International/Instituto Maua de Tecnologia/Maua Internship/05_Notebooks/DataSets/state_pop.yaml') as f:
  state_pop = yaml.safe_load(f)

#States loop
# Rows are added by index label (day k of each State's own series), not by calendar date
for Selected_state in state_pop:
  State_df = SIRD_matrix.loc[SIRD_matrix['State'] == Selected_state].reset_index()
  State_df = State_df['SIRD_DataFrame'][0]

  # 'Date' is overwritten at each iteration, so it keeps the last State's dates
  Combination_Result['Date'] = State_df['Date']

  # Add this State's data (d) and simulated (s) compartments to the running totals
  for col in ['Sd', 'Id', 'Rd', 'Dd', 'Ss', 'Is', 'Rs', 'Ds']:
    Combination_Result[col] = Combination_Result[col] + State_df[col]
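The loop above adds rows that share the same index label, that is, day k of each State's own series. Since each State's series starts at its own first row in the wcota data (2020-03-18 for TO in the output above), States are summed on different calendar days. One alternative, not used in this notebook, is to align on `Date` before summing. The sketch below does this for the I, R and D columns, where a State with no record yet contributes zero cases; `combine_by_date` and the toy frames are hypothetical.

```python
import pandas as pd

# Hypothetical State outputs starting on different calendar dates
state_a = pd.DataFrame({"Date": ["2020-03-01", "2020-03-02", "2020-03-03"],
                        "Is": [1.0, 2.0, 4.0]})
state_b = pd.DataFrame({"Date": ["2020-03-02", "2020-03-03"],
                        "Is": [1.0, 3.0]})

def combine_by_date(frames, cols):
    """Sum the given columns across States, aligning rows on 'Date'.

    A State simply has no row (i.e. contributes 0) before its first record,
    which is only meaningful for the I, R and D compartments.
    """
    indexed = [f.set_index("Date")[cols] for f in frames]
    return pd.concat(indexed).groupby(level="Date").sum().sort_index()

combined = combine_by_date([state_a, state_b], ["Is"])
print(combined)
```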

### Visualization

>With the final DataFrame computed from the matrix, we can now plot it and assess how closely the linear combination matches the real data for Brazil.

In [16]:
SIRD_plotting(Combination_Result)

>>The linear combination reproduces the overall behaviour of the Brazilian data, but only approximately.
>>The next step is to quantify its accuracy and determine whether it performs better than a direct SIRD fit on the national ('TOTAL') dataset.

### Comparison: Linear Combination vs. Direct SIRD Simulation

In [23]:
theta0 = [0.4, 0.1, 0.00292]
Simulation = SIRD_state_fitting_algorithm('TOTAL', theta0, 300, 300)
SIRD_plotting(Simulation)
SIRD_plotting(Combination_Result)

In [26]:
from sklearn import metrics as mt

mt.r2_score(Simulation[['Sd','Id','Rd','Dd']], Simulation[['Ss','Is','Rs','Ds']])

0.40275270625241677

In [27]:
mt.r2_score(Combination_Result[['Sd','Id','Rd','Dd']], Combination_Result[['Ss','Is','Rs','Ds']])

0.5454004963177259
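For reference, when `mt.r2_score` receives several columns it computes R² = 1 - SS_res / SS_tot for each column and, with its default `multioutput='uniform_average'`, returns their unweighted mean, so the S, I, R and D columns count equally whatever their magnitude. The sketch below reproduces that computation in NumPy on hypothetical toy arrays; passing `multioutput='raw_values'` to `mt.r2_score` returns the per-column scores directly.

```python
import numpy as np

def r2_per_column(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed separately for each column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1 - ss_res / ss_tot

# Hypothetical observed / simulated values for two compartments (columns)
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 33.0], [5.0, 40.0]])

per_column = r2_per_column(y_true, y_pred)
print(per_column, per_column.mean())  # the mean is what r2_score returns by default
```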

## Conclusion

>Finally, we have built a model for Brazil as a whole by combining the State-level SIRD fits.
>Our pipeline now supports the analysis of each Brazilian State individually and, through the linear combination, of the whole country. The combination gives smoother results than the direct SIRD simulation and a higher R² score against the national data (0.545 versus 0.403).