<h1>COVID-19 - Regional visualization <span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Data-pre-processing" data-toc-modified-id="Data-pre-processing-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data pre-processing</a></span></li><li><span><a href="#Visualizations" data-toc-modified-id="Visualizations-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Visualizations</a></span></li></ul></div>

# Introduction

--------
This notebook... 


The *Data pre-processing* stage was done almost entirely by [@manu-jimenez](https://github.com/manu-jimenez) in their [notebook](https://github.com/manu-jimenez/covid19-manu/blob/master/covid19-manu.ipynb).



--------

COVID-19 data for Spain, from the Instituto de Salud Carlos III (ISCIII): https://covid19.isciii.es

Population data from the Instituto Nacional de Estadística (INE): https://www.ine.es/jaxiT3/Datos.htm?t=2915

--------

```python
# Libraries and figure settings
import io
import time
from datetime import datetime

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns

plt.style.use('fivethirtyeight')
```

# Data pre-processing

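```python
# Download the cumulative COVID-19 series (ISCIII)
req = requests.get("https://covid19.isciii.es/resources/serie_historica_acumulados.csv")
req.raise_for_status()  # stop on HTTP errors instead of parsing an error page
csv = req.content.decode('ISO-8859-1')  # the file is decoded as Latin-1

# Keep a local copy; writing the raw bytes avoids re-encoding with the platform default
filename = 'serie_historica_acumulados.csv'
with open(filename, 'wb') as f:
    f.write(req.content)

# Load population by region (Source: INE)
population = pd.read_csv('censo_espana_2019.csv', sep=';')
```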
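
A minimal sketch of the download step with a local cache, assuming the server still returns the file as ISO-8859-1 text, as the cell above does. `load_series` and its `refresh` parameter are hypothetical helpers, not part of the original notebook: the local copy is reused when it exists, so the notebook can run offline, and `requests` is imported only when a download is actually needed.

```python
import os


def load_series(url, path, refresh=False):
    """Return the CSV text, downloading it only when no local copy exists.

    Hypothetical helper: the cache policy (reuse `path` unless refresh=True)
    is an assumption, not the original notebook's behavior.
    """
    if refresh or not os.path.exists(path):
        import requests  # imported lazily: only needed when downloading

        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(path, 'wb') as f:
            f.write(resp.content)  # keep the raw bytes exactly as served
    # Assumed encoding, matching the decode('ISO-8859-1') call above
    with open(path, encoding='ISO-8859-1') as f:
        return f.read()
```

Usage: `csv = load_series("https://covid19.isciii.es/resources/serie_historica_acumulados.csv", 'serie_historica_acumulados.csv')`. Pass `refresh=True` to force a new download once the cached series is out of date.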

```python
# Reference dates: start of the quarantine and of the stricter quarantine
init_quarantine = np.datetime64('2020-03-15')
init_strict_quarantine = np.datetime64('2020-03-30')
```
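
The two reference dates are stored as `np.datetime64`. This section does not show how the notebook uses them, so the following is only one possible use, sketched as an assumption: an alternative time axis that counts days from the start of the quarantine, negative before it. `days_since` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def days_since(dates, ref):
    """Whole days from `ref` to each date (negative for dates before `ref`)."""
    dates = pd.to_datetime(pd.Series(dates))
    return (dates - pd.Timestamp(ref)).dt.days


init_quarantine = np.datetime64('2020-03-15')
init_strict_quarantine = np.datetime64('2020-03-30')

print(days_since(['2020-03-12', '2020-03-15', '2020-03-30'], init_quarantine).tolist())
# → [-3, 0, 15]
```

Applied to the dataframe, `days_since(df['Fecha'], init_quarantine)` would align every region on the same calendar event instead of on its own first record.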

```python
# Build the dataframe from the downloaded text.
# Without a connection, read a saved copy instead with the same options plus
# encoding='ISO-8859-1': './download_data/serie_historica_acumulados.csv'
df = pd.read_csv(io.StringIO(csv), sep=',', skipfooter=1, parse_dates=[1],
                 dayfirst=True, engine='python')  # skipfooter requires the python engine

# Rename the region-code column and strip the trailing space from 'Casos '
df.rename(columns={'CCAA Codigo ISO': 'Region', 'Casos ': 'Casos'}, inplace=True)

# Treat missing counts as zero
counts = ['Casos', 'Hospitalizados', 'UCI', 'Fallecidos', 'Recuperados']
df[counts] = df[counts].fillna(0)

# National total: sum all regions for each date (numeric columns only)
df = df.sort_values(by=['Region', 'Fecha']).reset_index(drop=True)
total_spain = df.groupby('Fecha').sum(numeric_only=True).reset_index()
total_spain['Region'] = 'Spain'

# Stack the national total on top of the regional rows
df = pd.concat([total_spain, df])

# Dia: days elapsed since each region's first record
df['Dia'] = df.groupby('Region')['Fecha'].transform(lambda x: x - x.iloc[0]).dt.days

# Keep only rows with more than 100 cumulative deaths
df = df[df['Fallecidos'] > 100]

df.head()
```

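|    | Fecha      | Casos   | Hospitalizados | UCI   | Fallecidos | Recuperados | Region | Dia  |
|---:|------------|--------:|---------------:|------:|-----------:|------------:|--------|-----:|
| 21 | 2020-03-12 | 4130.0  | 1858.0         | 270.0 | 120.0      | 181.0       | Spain  | 21.0 |
| 22 | 2020-03-13 | 5844.0  | 2120.0         | 296.0 | 134.0      | 512.0       | Spain  | 22.0 |
| 23 | 2020-03-14 | 7698.0  | 3096.0         | 383.0 | 285.0      | 517.0       | Spain  | 23.0 |
| 24 | 2020-03-15 | 9149.0  | 3435.0         | 416.0 | 306.0      | 530.0       | Spain  | 24.0 |
| 25 | 2020-03-16 | 11178.0 | 5136.0         | 563.0 | 491.0      | 1028.0      | Spain  | 25.0 |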
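
To make the pre-processing steps concrete, here is the same pipeline run on a tiny, made-up CSV. The layout (region code first, date second in dd/mm/yyyy, a trailing space in `'Casos '`, one footer note line) is an assumption modeled on the code above, not a copy of the ISCIII file; the region codes `R1`/`R2` are hypothetical, and the death threshold is lowered from 100 to 5 so the toy data survives the filter. `preprocess` is a hypothetical wrapper around the notebook's steps.

```python
import io

import pandas as pd

# Made-up data; the last line plays the role of the footer dropped by skipfooter=1
RAW = (
    "CCAA Codigo ISO,Fecha,Casos ,Hospitalizados,UCI,Fallecidos,Recuperados\n"
    "R1,01/04/2020,100,40,5,4,10\n"
    "R1,02/04/2020,150,60,8,9,20\n"
    "R2,01/04/2020,300,120,20,12,\n"
    "R2,02/04/2020,420,170,30,20,50\n"
    "Made-up footer note"
)

COUNTS = ['Casos', 'Hospitalizados', 'UCI', 'Fallecidos', 'Recuperados']


def preprocess(text, min_deaths):
    df = pd.read_csv(io.StringIO(text), sep=',', skipfooter=1, parse_dates=[1],
                     dayfirst=True, engine='python')
    df = df.rename(columns={'CCAA Codigo ISO': 'Region', 'Casos ': 'Casos'})
    df[COUNTS] = df[COUNTS].fillna(0)  # the empty Recuperados cell becomes 0

    # National total per date, stacked on top of the regional rows
    df = df.sort_values(by=['Region', 'Fecha']).reset_index(drop=True)
    total = df.groupby('Fecha').sum(numeric_only=True).reset_index()
    total['Region'] = 'Spain'
    df = pd.concat([total, df], ignore_index=True)

    # Dia is computed BEFORE the filter, so it counts from each region's first record
    df['Dia'] = df.groupby('Region')['Fecha'].transform(lambda x: x - x.iloc[0]).dt.days
    return df[df['Fallecidos'] > min_deaths]


sample = preprocess(RAW, min_deaths=5)
print(sample[['Region', 'Fecha', 'Casos', 'Fallecidos', 'Dia']])
```

Note that region `R1` enters the filtered data with `Dia == 1`, not 0: because `Dia` is computed before the threshold, it measures time since the first report, not since the region crossed the threshold.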


```python
# Add population by region (INE); regions without a match get NaN
df = df.merge(population, how='left', on='Region')

# Relative metrics, all expressed in %
df['Casos_per_capita'] = df['Casos'] / df['Population'] * 100             # cases / population
df['Hospitalizados_por_casos'] = df['Hospitalizados'] / df['Casos'] * 100  # hospitalized / cases
df['UCI_por_hospitalizados'] = df['UCI'] / df['Hospitalizados'] * 100     # ICU / hospitalized
df['Letalidad'] = df['Fallecidos'] / df['Casos'] * 100                    # case fatality rate
df['Mortalidad'] = df['Fallecidos'] / df['Population'] * 100              # deaths / population
df['Recuperados_por_casos'] = df['Recuperados'] / df['Casos'] * 100       # recovered / cases
```
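
All six metrics are percentages of one count over another; the key distinction is between *Letalidad* (deaths over confirmed cases) and *Mortalidad* (deaths over the whole population). A minimal sketch on made-up numbers, with one addition that is not in the original: a zero denominator is mapped to `NaN` instead of letting pandas return `inf` (for x/0 with x > 0) or `NaN` (for 0/0). The helper `pct` and the regions `A`/`B` are hypothetical.

```python
import numpy as np
import pandas as pd


def pct(num, den):
    """100 * num / den, with NaN wherever the denominator is zero."""
    return num / den.replace(0, np.nan) * 100


# Made-up figures for two hypothetical regions
toy = pd.DataFrame({
    'Region': ['A', 'B'],
    'Casos': [2000, 500],
    'Hospitalizados': [800, 0],
    'UCI': [80, 0],
    'Fallecidos': [150, 0],
    'Population': [1_000_000, 250_000],
})

toy['Letalidad'] = pct(toy['Fallecidos'], toy['Casos'])         # deaths / cases
toy['Mortalidad'] = pct(toy['Fallecidos'], toy['Population'])   # deaths / population
toy['UCI_por_hospitalizados'] = pct(toy['UCI'], toy['Hospitalizados'])
print(toy[['Region', 'Letalidad', 'Mortalidad', 'UCI_por_hospitalizados']])
```

For region `A`, 150 deaths give a case fatality of 7.5 % but a mortality of only 0.015 %, which is why the two metrics need separate scales when plotted.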

```python
df.columns
```

```
Index(['Fecha', 'Casos', 'Hospitalizados', 'UCI', 'Fallecidos', 'Recuperados',
       'Region', 'Dia', 'Population', 'Casos_per_capita',
       'Hospitalizados_por_casos', 'UCI_por_hospitalizados', 'Letalidad',
       'Mortalidad', 'Recuperados_por_casos'],
      dtype='object')
```

# Visualizations