# Introduction - Herd Immunity Estimate for Spain 

## Objectives
The goal of this project is to answer the following questions: 
 - Will Spain reach herd immunity? 
 - When will that happen?  


## Steps
In order to answer the questions defined above, we need to accomplish the following steps: 
1. Calculate the increase in daily deaths over the expected number
2. Estimate the real number of infected people up to two weeks ago
3. Fit a model to the evolution of infected people. The model will give us:
     - Predictions for future real infections and real deaths
     - R0
4. With R0 and the number of infections, we have all the values in the equation to determine whether we'll reach herd immunity
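
As a rough illustration of step 4, the classic herd immunity threshold for a well-mixed population is 1 - 1/R0. The sketch below is not the model used in this project: the function names are hypothetical, the R0 value is a placeholder and not an estimate, and the population is a rounded figure for Spain.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Share of the population that must be immune so each case infects fewer than one person."""
    if r0 <= 1:
        # With R0 <= 1 the epidemic dies out on its own, so no immune share is required.
        return 0.0
    return 1.0 - 1.0 / r0


def reaches_herd_immunity(immune_people: float, population: float, r0: float) -> bool:
    """Compare the immune share of the population against the threshold."""
    return immune_people / population >= herd_immunity_threshold(r0)


# Placeholder inputs, for illustration only (assumptions, not results of this project).
r0_example = 2.5
population_example = 47_000_000  # rounded population of Spain

threshold = herd_immunity_threshold(r0_example)
print(f"Threshold: {threshold:.0%} of the population")
print(reaches_herd_immunity(1_000_000, population_example, r0_example))
```

This simple formula assumes uniform mixing and lasting immunity, so the result should be read as an order of magnitude and not as a forecast.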

## Disclaimer
Do not take the conclusions of this project as established facts. I am neither a doctor nor an expert in epidemics. 





In [38]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

# Decrement in deaths due to confinement

We are going to calculate the deaths caused by events that should not occur during confinement. Then we are going to subtract this number from the expected daily deaths. 

This is a very basic approach, but since we have no data about how deaths are distributed across the year, we are going to assume they follow a uniform distribution. 

I got the information about causes of death per year in Spain from the following page: https://www.ine.es/jaxiT3/Tabla.htm?t=7947

In [35]:
deaths_causes = pd.read_csv("7947.csv", sep=";", thousands=".")

# These are the causes I have identified that should be reduced. 
non_existing_causes = ['090  Accidentes de tráfico',
                       '091  Otros accidentes de transporte',
                       '092  Caídas accidentales',
                       '093  Ahogamiento, sumersión y sofocación accidentales',
                       '099  Agresiones (homicidio)']

# Keep only the rows we need: both sexes combined and all age groups
deaths_causes = deaths_causes[deaths_causes["Sexo"] == "Total"]
deaths_causes = deaths_causes[deaths_causes["Edad"] == "Todas las edades"]

# Consider only the last 5 years available
deaths_causes = deaths_causes[deaths_causes["Periodo"] > 2013] 
deaths_causes = deaths_causes.loc[deaths_causes["Causa de muerte"].isin(non_existing_causes)]
# For each cause, keep the maximum yearly total over the selected period
deaths_causes = deaths_causes.groupby(by=["Causa de muerte"]).max() 
# Display only: the result is not assigned, so "Periodo" stays in deaths_causes
deaths_causes.drop(columns="Periodo")

| Causa de muerte | Sexo | Edad | Total |
|---|---|---|---|
| 090 Accidentes de tráfico | Total | Todas las edades | 1943 |
| 091 Otros accidentes de transporte | Total | Todas las edades | 217 |
| 092 Caídas accidentales | Total | Todas las edades | 3143 |
| 093 Ahogamiento, sumersión y sofocación accidentales | Total | Todas las edades | 3116 |
| 099 Agresiones (homicidio) | Total | Todas las edades | 325 |


In [36]:
# We are going to calculate the number of reduced deaths per day
deaths_causes["Total"].sum() / 365

23.956164383561642

# All-cause mortality


I got the information from this site: https://momo.isciii.es/public/momo/dashboard/momo_dashboard.html#nacional

In [33]:
mortality = pd.read_csv("data.csv")
mortality = mortality[mortality["nombre_sexo"] == "todos"]
mortality = mortality[mortality["nombre_gedad"] == "todos"]
mortality = mortality[mortality["ambito"] == "nacional"]


mortality.drop(columns=["ambito", "cod_ambito", "cod_ine_ambito", "nombre_ambito", "cod_sexo",
               "nombre_sexo", "cod_gedad", "nombre_gedad"], inplace=True)
mortality

|  | fecha_defuncion | defunciones_observadas | defunciones_observadas_lim_inf | defunciones_observadas_lim_sup | defunciones_esperadas | defunciones_esperadas_q01 | defunciones_esperadas_q99 |
|---|---|---|---|---|---|---|---|
| 0 | 2018-03-24 | 1155 | 1155.000000 | 1155.000000 | 1118.5 | 1007.45 | 1248.325 |
| 1 | 2018-03-25 | 1065 | 1065.000000 | 1065.000000 | 1121.0 | 1007.45 | 1229.170 |
| 2 | 2018-03-26 | 1143 | 1143.000000 | 1143.000000 | 1120.5 | 989.43 | 1229.170 |
| 3 | 2018-03-27 | 1164 | 1164.000000 | 1164.000000 | 1119.5 | 989.43 | 1213.890 |
| 4 | 2018-03-28 | 1179 | 1179.000000 | 1179.000000 | 1118.5 | 989.43 | 1230.720 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 744 | 2020-04-06 | 1576 | 1563.208148 | 1589.200704 | 1093.5 | 1016.80 | 1188.030 |
| 745 | 2020-04-07 | 1486 | 1469.219408 | 1503.240904 | 1092.5 | 1014.04 | 1176.990 |
| 746 | 2020-04-08 | 1022 | 996.534246 | 1048.020855 | 1093.5 | 998.97 | 1177.230 |
| 747 | 2020-04-09 | 781 | 732.317178 | 830.371116 | 1092.5 | 998.97 | 1176.540 |
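
A minimal sketch of how the two data sets could be combined: lower the MoMo expected deaths by the daily reduction estimated from the INE causes of death (8744 / 365), then compare against the observed deaths. The function `add_excess_deaths` and the columns `expected_adjusted` and `excess_deaths` are hypothetical names introduced here. The three sample rows are copied from the MoMo output so that the block runs on its own.

```python
import pandas as pd

# Daily reduction in deaths due to confinement: sum of the five INE causes over 365 days.
REDUCED_DEATHS_PER_DAY = 8744 / 365

# Three rows copied from the MoMo output, used here only as sample input.
sample = pd.DataFrame({
    "fecha_defuncion": ["2020-04-06", "2020-04-07", "2020-04-08"],
    "defunciones_observadas": [1576, 1486, 1022],
    "defunciones_esperadas": [1093.5, 1092.5, 1093.5],
})


def add_excess_deaths(df: pd.DataFrame, reduction_per_day: float) -> pd.DataFrame:
    """Return a copy with the adjusted baseline and the excess over that baseline."""
    out = df.copy()
    # Expected deaths once the causes that confinement should suppress are removed.
    out["expected_adjusted"] = out["defunciones_esperadas"] - reduction_per_day
    # Observed deaths above (positive) or below (negative) the adjusted baseline.
    out["excess_deaths"] = out["defunciones_observadas"] - out["expected_adjusted"]
    return out


result = add_excess_deaths(sample, REDUCED_DEATHS_PER_DAY)
print(result[["fecha_defuncion", "expected_adjusted", "excess_deaths"]])
```

The most recent dates in the MoMo table have observed counts with wider lower and upper limits, so their excess values are probably less reliable than those of earlier dates.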


# Next Steps:
 - Do the same at a regional level. Data from MoMo and INE is available by province.

# Limitations:
 - Due to the overload of the health system, the number of deaths during the first half of March may have been higher than expected.
 - Confinement is going to have an important impact on the transmission of the virus. We should consider that the transmission rate is going to change before and after confinement, and even depending on its severity.
    