<h1>COVID-19 - Regional visualization <span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Data-pre-processing" data-toc-modified-id="Data-pre-processing-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data pre-processing</a></span></li><li><span><a href="#Visualizations" data-toc-modified-id="Visualizations-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Visualizations</a></span></li></ul></div>

# Introduction

--------
This notebook... 


The *Data pre-processing* stage was done almost entirely by [@manu-jimenez](https://github.com/manu-jimenez) in his [notebook](https://github.com/manu-jimenez/covid19-manu/blob/master/covid19-manu.ipynb).

--------

Data from official source: https://covid19.isciii.es


--------

In [118]:
# Libraries and figure settings

import io
import numpy as np
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import time
import matplotlib
plt.style.use('fivethirtyeight')

# Data pre-processing

In [80]:
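# Get data and save a local copy

import os

req = requests.get("https://covid19.isciii.es/resources/serie_historica_acumulados.csv")
req.raise_for_status()  # stop here if the download failed
csv = req.content.decode('ISO-8859-1')

filename = './download_data/serie_historica_acumulados.csv'
os.makedirs(os.path.dirname(filename), exist_ok=True)
with open(filename, 'w', encoding='utf-8') as f:
    f.write(csv)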
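
The dataframe cell further down mentions reading a saved copy of the CSV when the download is not available. A minimal sketch of that fallback could look like the following. The helper name `load_csv_text`, the `fetch` callback and the sample text are hypothetical and not part of the original notebook; the stand-in fetcher avoids any network access.

```python
import os
import tempfile

def load_csv_text(path, fetch, encoding="utf-8"):
    """Return the CSV text stored at `path`; if the file is missing, call `fetch()` and cache it."""
    if os.path.exists(path):
        with open(path, encoding=encoding) as f:
            return f.read()
    text = fetch()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return text

# Stand-in fetcher with made-up data (assumption: the real one would wrap requests.get)
calls = []
def fake_fetch():
    calls.append(1)
    return "CCAA,Fecha,Casos\nAN,2020-02-20,0\n"

tmp_dir = tempfile.mkdtemp()
cache_path = os.path.join(tmp_dir, "download_data", "serie_historica_acumulados.csv")
first = load_csv_text(cache_path, fake_fetch)   # file missing: fetches and caches
second = load_csv_text(cache_path, fake_fetch)  # file present: read from disk
```

In the notebook, `fetch` could be a small function that performs the `requests.get` call and decodes the response, so the remote server is only contacted when no local copy exists.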

In [96]:
# Constants, variables and dictionaries

init_quarantine = np.datetime64('2020-03-15')
init_strict_quarantine = np.datetime64('2020-03-30')

nombres_ccaa = {'AN':'Andalucía', 'AR':'Aragón', 'AS':'Asturias', 'CB':'Cantabria', 'CE':'Ceuta',
                'CL':'Castilla Y León', 'CM':'Castilla-La Mancha', 'CN':'Canarias', 'CT':'Cataluña', 'EX':'Extremadura',
                'GA':'Galicia','IB':'Balears', 'MC':'Murcia', 'MD':'Madrid, Comunidad', 'ME':'Melilla',
                'NC':'Navarra','PV':'País Vasco', 'RI':'Rioja, La', 'VC':'Valencia, Comunidad'
               }

In [116]:
# Create dataframe. If the download is unavailable, use instead: df = pd.read_csv('./download_data/serie_historica_acumulados.csv')
df = pd.read_csv(io.StringIO(csv), sep=',', skipfooter=1, parse_dates=[1], na_values=0, infer_datetime_format=True, engine='python')

# Rename columns and map ISO region codes to region names
df.rename(columns={'CCAA Codigo ISO': 'Region', 'Casos ': 'Casos'}, inplace = True)
df.replace({'Region': nombres_ccaa}, inplace= True)

# Replace NaNs with 0
df = df.fillna(0)

# Compute total values (Spain) by summing the numeric columns over all regions per date
total_españa = df.groupby('Fecha').sum(numeric_only=True).reset_index()
total_españa['Region'] = 'España'

# DataFrame with both the regional values and the national (Spain) totals
df = pd.concat([total_españa, df], ignore_index=True)

# Add day column: days elapsed since the first row of each region (trick by @manu-jimenez)
df['Dia'] = df.groupby('Region')['Fecha'].transform(lambda x: x - x.iloc[0]).dt.days
df.sort_values(by=['Region','Fecha'], inplace = True)

df.head()

Unnamed: 0,Fecha,Casos,Hospitalizados,UCI,Fallecidos,Recuperados,Region,Dia
42,2020-02-20,0.0,0.0,0.0,0.0,0.0,Andalucía,0
61,2020-02-21,0.0,0.0,0.0,0.0,0.0,Andalucía,1
80,2020-02-22,0.0,0.0,0.0,0.0,0.0,Andalucía,2
99,2020-02-23,0.0,0.0,0.0,0.0,0.0,Andalucía,3
118,2020-02-24,0.0,0.0,0.0,0.0,0.0,Andalucía,4
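
To make the two pandas steps above easier to follow (national totals via `groupby(...).sum()` and the per-region day counter via `transform`), here is a small self-contained sketch on made-up data. The toy values and the variable names `toy`, `total` and `both` are illustrative assumptions, not the official figures.

```python
import pandas as pd

# Hypothetical toy data: two regions, cumulative cases on three dates
toy = pd.DataFrame({
    "Fecha": pd.to_datetime(["2020-02-20", "2020-02-21", "2020-02-23"] * 2),
    "Region": ["Andalucía"] * 3 + ["Aragón"] * 3,
    "Casos": [0, 1, 4, 0, 2, 3],
})

# National totals: sum the numeric columns across regions for each date
total = toy.groupby("Fecha").sum(numeric_only=True).reset_index()
total["Region"] = "España"

# Stack the national rows on top of the regional rows
both = pd.concat([total, toy], ignore_index=True)

# Day counter: days elapsed since the first row of each region
both["Dia"] = (
    both.groupby("Region")["Fecha"]
        .transform(lambda x: x - x.iloc[0])
        .dt.days
)
both = both.sort_values(["Region", "Fecha"]).reset_index(drop=True)
print(both)
```

Note that `x.iloc[0]` is the first row of each group in its current order, so the counter only measures days since the earliest date when each region's rows are already in chronological order before the `transform` call.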


# Visualizations