# COVID 19 Mexican Analysis
## Cesar Robles
### What is COVID-19?

**COVID-19** is the infectious disease caused by the coronavirus, *SARS-CoV-2*, which is a respiratory pathogen. WHO (World Health Organization) first learned of this new virus from cases in Wuhan, People’s Republic of China on 31 December 2019.

The most common symptoms of COVID-19 are:
* Fever
* Dry cough
* Fatigue

Other symptoms that are less common and may affect some patients include:
* Loss of taste or smell,
* Nasal congestion,
* Conjunctivitis (also known as red eyes),
* Sore throat,
* Headache,
* Muscle or joint pain,
* Different types of skin rash,
* Nausea or vomiting,
* Diarrhea,
* Chills or dizziness.

Symptoms are usually mild. Some people become infected but only have very mild symptoms or none at all.

Symptoms of severe COVID‐19 disease include:
* Shortness of breath,
* Loss of appetite,
* Confusion,
* Persistent pain or pressure in the chest,
* High temperature (above 38 °C).

Other less common symptoms are:
* Irritability,
* Confusion,
* Reduced consciousness (sometimes associated with seizures),
* Anxiety,
* Depression,
* Sleep disorders,
* More severe and rare neurological complications such as strokes, brain inflammation, delirium and nerve damage.

People of all ages who experience fever and/or cough associated with difficulty breathing or shortness of breath, chest pain or pressure, or loss of speech or movement should seek medical care immediately. If possible, call your health care provider, hotline or health facility first, so you can be directed to the right clinic.

[https://www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answers-hub/q-a-detail/q-a-coronaviruses#:~:text=symptoms]

### COVID-19 in Mexico

The virus was confirmed to have reached Mexico in February 2020. However, the National Council of Science and Technology (CONACYT) reported two cases of COVID-19 in mid-January 2020 in the states of Nayarit and Tabasco, one case per state. As of October, there had been nearly 800,000 confirmed cases of COVID-19 in Mexico and about 88,000 reported deaths. The Secretariat of Health, through the "Programa Centinela" (Spanish for "Sentinel Program"), nevertheless estimated in mid-July 2020 that there were more than 2,875,734 cases in Mexico, because it treated the total number of confirmed cases as a statistical sample.

[https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Mexico]

In [1]:
# Required libraries
import os
import glob
import json
import urllib.request

import numpy as np
import pandas as pd
import pandas_profiling
from pandas import json_normalize  # replaces the deprecated pandas.io.json import
from pandas.plotting import register_matplotlib_converters
from requests import request

import seaborn as sns
import matplotlib.pyplot as plt
import folium
import plotly
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Colour palette: confirmed, deaths, recovered, active
cnf, dth, rec, act = '#f7d619', '#d43d56', '#48cf46', '#6b9be8'

# Use to hide warnings
import warnings
warnings.filterwarnings('ignore')

from IPython.display import Markdown
%matplotlib inline

All the necessary files are stored in the Files directory. The following code extracts the data from a zip file.

In [2]:
os.system('unzip ../Files/db.zip')

0
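
The `unzip` call above depends on an external command being installed. As a portable alternative, here is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_archive` is hypothetical and is not part of the original notebook.

```python
import zipfile
from pathlib import Path

def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Assuming the archive layout used here, a call such as `extract_archive('../Files/db.zip', '../Files')` would place the CSV where the next cell expects to read it.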

In [None]:
data = pd.read_csv('../Files/mex_covid_daily.csv',engine='python')
print('Confirmed cases: {0:,.0f}'.format(len(data)))

In [None]:
data.head()

**Converting columns to datetime**

Convert every column that holds dates. Besides the update date (FECHA_ACTUALIZACION), there are three date columns (FECHA_INGRESO, FECHA_SINTOMAS and FECHA_DEF). The only problematic column is FECHA_DEF, because of its placeholder date ('9999-99-99').
* FECHA_INGRESO is the date when the person went to the hospital.
* FECHA_SINTOMAS is the date when the person first showed COVID-19 symptoms.
* FECHA_DEF is the date when the infected person died.

In [None]:
data['FECHA_ACTUALIZACION'] = pd.to_datetime(data['FECHA_ACTUALIZACION'])
data['FECHA_INGRESO'] = pd.to_datetime(data['FECHA_INGRESO'])
data['FECHA_SINTOMAS'] = pd.to_datetime(data['FECHA_SINTOMAS'])
# '9999-99-99' is a placeholder, not a valid date: coerce it
# (and any missing value) to NaT instead of raising an error
data['FECHA_DEF'] = pd.to_datetime(data['FECHA_DEF'], errors='coerce')
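
The `'9999-99-99'` placeholder is not a valid calendar date. The following minimal sketch uses an invented three-row series (not the real dataset) to show how `pd.to_datetime` with `errors='coerce'` turns such values, and missing ones, into `NaT`.

```python
import pandas as pd

# Invented sample: a real date, the placeholder, and a missing value
raw = pd.Series(['2020-04-12', '9999-99-99', None])

# Unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(raw, errors='coerce')

# A record has a death date only where parsing succeeded
has_death_date = parsed.notna()
print(parsed)
print(has_death_date.tolist())
```

With this approach the missing dates stay explicit as `NaT`, so later filters can use `notna()` rather than comparing against a sentinel string.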

Get the number of deceased people in Mexico.

In [None]:
print('Dead people: {0:,.0f}'.format(data.loc[(data['Status'].isin(['Dead'])),'ID_REGISTRO'].count()))

Select the most useful columns from the database.

In [None]:
data = data[['ID_REGISTRO','FECHA_INGRESO','FECHA_DEF',
 'SECTOR_DESC','SEXO_DESC','TIPO_PACIENTE_DESC','ENTIDAD_FEDERATIVA_RES',
 'ABREVIATURA_RES','INTUBADO_DESC', 'NEUMONIA_DESC','DIABETES_DESC','EPOC_DESC',
 'ASMA_DESC','INMUSUPR_DESC','OTRA_COM_DESC','CARDIOVASCULAR_DESC','OBESIDAD_DESC',
 'RENAL_CRONICA_DESC','TABAQUISMO_DESC','OTRO_CASO_DESC',
 'HIPERTENSION_DESC','MUNICIPIO_RES_DESC',
 'EDAD_CLASS','Status']]
print('Total confirmed people: {0:,.0f}'.format(len(data)))

Check the final number of cases in each status: dead, recovered (alive) and active.

In [None]:
data.groupby(['Status'],as_index=False)['ID_REGISTRO'].count()

Extract the confirmed cases by state. Mexico has 32 federal entities (31 states and Mexico City).

In [None]:
confirm = data.groupby(['ENTIDAD_FEDERATIVA_RES'],as_index=False)['ID_REGISTRO'].count()
confirm = confirm.rename(columns={'ID_REGISTRO':'Confirmed'})
confirm['Confirmed'].sum()

Get the total number of deaths by state.

In [None]:
muertos = data.loc[(data['Status'].isin(['Dead']))]
deceased = muertos.groupby(['ENTIDAD_FEDERATIVA_RES'],as_index=False)['ID_REGISTRO'].count()
deceased = deceased.rename(columns={'ID_REGISTRO':'Deceased'})
deceased.head()

Get the number of recovered people by state.

In [None]:
recuperados = data.loc[(data['Status'].isin(['Alive']))]
recovered = recuperados.groupby(['ENTIDAD_FEDERATIVA_RES'],as_index=False)['ID_REGISTRO'].count()
recovered = recovered.rename(columns={'ID_REGISTRO':'Recovered'})
recovered.head()

Get the total number of active cases by state.

In [None]:
activos = data.loc[(data['Status'].isin(['Active']))]
active = activos.groupby(['ENTIDAD_FEDERATIVA_RES'],as_index=False)['ID_REGISTRO'].count()
active = active.rename(columns={'ID_REGISTRO':'Active'})
active.head()

Merge all the counts into a single table for later use.

In [None]:
newData=pd.merge(confirm,deceased)
newData=pd.merge(newData,recovered)
newData=pd.merge(newData,active)
newData.head(32)
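
One caveat with the merge above: `pd.merge` defaults to an inner join, so a state missing from any one table (for example, a state with no active cases) would be dropped from the result. The sketch below shows the difference on invented data. The `state` column and the `_demo` frames are illustrative assumptions, not the notebook's real tables.

```python
import pandas as pd

# Invented example: state B has no active cases, so it is absent from active_demo
confirm_demo = pd.DataFrame({'state': ['A', 'B', 'C'], 'Confirmed': [10, 20, 30]})
active_demo = pd.DataFrame({'state': ['A', 'C'], 'Active': [1, 3]})

# Default inner join: state B disappears
inner = pd.merge(confirm_demo, active_demo)

# Left join keeps every state; the missing count is filled with zero
left = pd.merge(confirm_demo, active_demo, on='state', how='left').fillna({'Active': 0})
left['Active'] = left['Active'].astype(int)

print(len(inner), len(left))
```

If all 32 entities appear in every table, both joins give the same result; the left join only matters when a category is empty for some state.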

Set the latitude and longitude for every state in Mexico. This information will be used in the map.

In [None]:
states=[['AGUASCALIENTES',21.87945992,-102.2904135],
['BAJA CALIFORNIA',30.76405113,-116.0092603],
['BAJA CALIFORNIA SUR',26.01333335,-111.3516635],
['CAMPECHE',18.65365928,-91.82448019],
['CHIAPAS',16.74999697,-92.63337447],
['CHIHUAHUA',26.93335472,-105.6666358],
['COAHUILA DE ZARAGOZA',28.32998781,-100.8499789],
['COLIMA',18.92038129,-103.8799748],
['CIUDAD DE MÉXICO',19.44244244,-99.1309882],
['DURANGO',25.57005292,-103.5000238],
['GUANAJUATO',20.67001609,-101.4999909],
['GUERRERO',17.54997398,-99.5000096],
['HIDALGO',20.17043418,-98.73003076],
['JALISCO',19.77001935,-104.3699966],
['MICHOACÁN DE OCAMPO',19.67997316,-100.569996],
['MORELOS',18.92110476,-99.23999964],
['MÉXICO',19.41001548,-99.02998661],
['NAYARIT',21.81999758,-105.2200481],
['NUEVO LEÓN',25.1899986,-99.83998885],
['OAXACA',16.42999066,-95.01999882],
['PUEBLA',18.90002077,-98.44999618],
['QUERÉTARO',20.37998212,-100.0000308],
['QUINTANA ROO',21.20839057,-86.7114549],
['SAN LUIS POTOSÍ',22.00001243,-99.66999923],
['SINALOA',23.19999086,-106.2300381],
['SONORA',27.58000775,-109.9299931],
['TABASCO',18.40002545,-93.22997888],
['TAMAULIPAS',22.73335268,-98.95001734],
['TLAXCALA',19.31999514,-98.2300096],
['VERACRUZ DE IGNACIO DE LA LLAVE',17.93997601,-94.73999007],
['YUCATÁN',21.09998985,-89.27998743],
['ZACATECAS',22.35001691,-102.88001]]
states=pd.DataFrame(states,columns=['ENTIDAD_FEDERATIVA_RES','Latitude','Longitude'])
newData = newData.join(states.set_index('ENTIDAD_FEDERATIVA_RES'),on='ENTIDAD_FEDERATIVA_RES')
newData.head(32)

Calculate the mortality and recovery rates, as percentages of confirmed cases.

In [None]:
newData['Mortality_Rate']=newData['Deceased']/newData['Confirmed'] * 100
newData['Recovery_Rate']=newData['Recovered']/newData['Confirmed'] * 100
newData.head(32)

In [None]:
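# Sum only the count columns; state names, coordinates and rates are not additive
newData[['Confirmed', 'Deceased', 'Recovered', 'Active']].sum()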

Sort the table by the number of confirmed cases in each state.

In [None]:
newData=newData.sort_values(by=['Confirmed'],ascending=False)
newData

Graphical summary of the active, deceased and recovered cases.

In [None]:
# Subset the columns of interest
temp = newData[['Active', 'Deceased', 'Recovered']]

# Melt into long format: one row per (variable, value) pair
tm = temp.melt(value_vars=['Active', 'Deceased', 'Recovered'])

# Plot, listing the colours in the same order as the columns (active, deceased, recovered)
fig_1 = px.treemap(tm, path=["variable"], values="value", height=250, color_discrete_sequence=[act, dth, rec], title='Latest stats')
fig_1.data[0].textinfo = 'label+text+value'
fig_1.show()

### Grouping phase

Extracting the information based on its datetime.

**Confirmed cases by date**

In [None]:
confirmed=data.groupby(['FECHA_INGRESO'],as_index=False)['ID_REGISTRO'].count()
confirmed=confirmed.rename(columns={'ID_REGISTRO':'Confirmed'})
confirmed.head()

**Deceased cases by date**

In [None]:
dead = data.loc[(data['Status'].isin(['Dead']))]
dead = dead.groupby(['FECHA_INGRESO'],as_index=False)['ID_REGISTRO'].count()
dead = dead.rename(columns={'ID_REGISTRO':'Dead'})
dead.head()

**Recovered people by date**

In [None]:
recover = data.loc[(data['Status'].isin(['Alive']))]
recover = recover.groupby(['FECHA_INGRESO'],as_index=False)['ID_REGISTRO'].count()
recover = recover.rename(columns={'ID_REGISTRO':'Recover'})
recover.head()

**Active people by date**

In [None]:
active = data.loc[(data['Status'].isin(['Active']))]
active = active.groupby(['FECHA_INGRESO'],as_index=False)['ID_REGISTRO'].count()
active = active.rename(columns={'ID_REGISTRO':'Active'})
active.head()

**Creating the daily table for further analysis**

In this part, the confirmed, recovered, active and dead counts are combined by date. This dataframe will be used to forecast the number of confirmed cases and also the number of deaths.

In [None]:
daily = confirmed.join(active.set_index('FECHA_INGRESO'),on='FECHA_INGRESO')
daily = daily.join(recover.set_index('FECHA_INGRESO'),on='FECHA_INGRESO')
daily = daily.join(dead.set_index('FECHA_INGRESO'),on='FECHA_INGRESO')
daily = daily.fillna(0.0)
daily.head()
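
The separate group-and-join steps above can also be expressed in a single call. The following is a minimal sketch on invented rows, assuming the same `FECHA_INGRESO` and `Status` column names; it is an alternative formulation, not the method used in this notebook.

```python
import pandas as pd

# Invented sample: one row per patient
cases = pd.DataFrame({
    'FECHA_INGRESO': pd.to_datetime(['2020-03-01', '2020-03-01',
                                     '2020-03-02', '2020-03-02', '2020-03-02']),
    'Status': ['Alive', 'Dead', 'Alive', 'Active', 'Alive'],
})

# One row per date, one column per status; absent combinations are already 0
counts = pd.crosstab(cases['FECHA_INGRESO'], cases['Status'])

# Confirmed cases are the sum over all statuses
counts['Confirmed'] = counts.sum(axis=1)
print(counts)
```

Because `crosstab` fills missing combinations with zero and returns integers, the later `fillna` and `astype(int)` steps would not be needed with this formulation.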

Convert the columns to integer.

In [None]:
daily['Dead']=daily['Dead'].astype(int)
daily['Confirmed']=daily['Confirmed'].astype(int)
daily['Recover']=daily['Recover'].astype(int)
daily['Active']=daily['Active'].astype(int)
daily=daily.rename(columns={'FECHA_INGRESO':'Date'})

In [None]:
daily.head()

## Plotting the daily base

Plot the information to check how the cases behave over time. The following function uses the daily table and draws one bar per date.

In [None]:
def plot_daily(col, hue):
    fig = px.bar(daily, x="Date", y=col, title=col, 
                 color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()

Confirmed cases plot

In [None]:
plot_daily('Confirmed','#F9E010')

Deceased cases plot

In [None]:
plot_daily('Dead','#E2380A')

Recovered cases in a bar chart

In [None]:
plot_daily('Recover','#41E20A')

## Forecasting analysis

Facebook's Prophet library is used to perform the forecast analysis. This library was selected because it follows the *sklearn* model API: an instance of the Prophet class is created, and then its fit and predict methods are called.

The input to Prophet is always a dataframe with two columns: **ds** and **y**. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.

In [None]:
# Rename on a new frame rather than assigning to the columns of a slice
df = daily[['Date', 'Confirmed']].rename(columns={'Date': 'ds', 'Confirmed': 'y'})
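
Prophet's input requirements described above (a datetime `ds` column and a numeric `y` column) can be checked before fitting. Here is a minimal sketch; the helper `to_prophet_frame` and the two-row `demo` frame are hypothetical and only illustrate the expected shape.

```python
import pandas as pd

def to_prophet_frame(frame, date_col, value_col):
    """Return a new frame with the two columns Prophet expects: ds (datetime) and y (numeric)."""
    out = frame[[date_col, value_col]].rename(columns={date_col: 'ds', value_col: 'y'})
    out['ds'] = pd.to_datetime(out['ds'])   # raises if a date cannot be parsed
    out['y'] = pd.to_numeric(out['y'])      # raises if a value is not numeric
    return out.sort_values('ds').reset_index(drop=True)

# Invented sample with string dates and string counts, in reverse order
demo = pd.DataFrame({'Date': ['2020-03-02', '2020-03-01'], 'Confirmed': ['7', '5']})
prophet_input = to_prophet_frame(demo, 'Date', 'Confirmed')
print(prophet_input.dtypes)
```

Failing early on a bad date or a non-numeric count is usually easier to debug than an error raised inside the model fit.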

In [None]:
# importing fbprophet (published as `prophet` since version 1.0)
from fbprophet import Prophet

# model
m = Prophet(daily_seasonality=False,yearly_seasonality=False)

# fitting the model
m.fit(df)

The number of periods was chosen to predict the number of confirmed cases for the rest of the year. This analysis was performed to study the behaviour of the confirmed cases and to estimate how many people are likely to be infected, so that the corresponding actions can be taken to mitigate this illness.

In [None]:
future = m.make_future_dataframe(periods = 70) 
future.tail()

In [None]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(21)

In [None]:
from fbprophet.plot import plot_plotly
fig = plot_plotly(m, forecast)  # This returns a plotly Figure
fig.update_layout(
                  autosize=False,
                  width= 750,
                  height= 800,
    title_text='<b>Covid-19 Total cases Forecast</b>',
    title_x=0.5,)
fig.show()

The graphic above shows how closely the model follows the observed data. The main difficulty lies in the data itself: Mexico collects data every day, but unfortunately the weekend figures are less accurate than those for other days.

The Mexican sentinel program is very effective for simulating and estimating the number of infected people. However, due to fear of the disease, many people do not go out on weekends. This is the first hypothesis for explaining the graphic above.
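
The weekend hypothesis can be checked directly by comparing average counts on weekdays and weekends. The sketch below uses invented numbers in a frame shaped like `daily` (columns `Date` and `Confirmed`); the values are illustrative only and are not results from the real data.

```python
import pandas as pd

# Invented two-week sample; 2020-06-01 was a Monday
demo_daily = pd.DataFrame({
    'Date': pd.date_range('2020-06-01', periods=14, freq='D'),
    'Confirmed': [100, 110, 105, 120, 115, 60, 50,
                  130, 125, 135, 140, 138, 70, 65],
})

# dayofweek: Monday = 0, ..., Saturday = 5, Sunday = 6
demo_daily['is_weekend'] = demo_daily['Date'].dt.dayofweek >= 5

weekend_mean = demo_daily.loc[demo_daily['is_weekend'], 'Confirmed'].mean()
weekday_mean = demo_daily.loc[~demo_daily['is_weekend'], 'Confirmed'].mean()
print(weekday_mean, weekend_mean)
```

A consistently lower weekend mean on the real table would support the hypothesis, although it would not by itself separate lower hospital attendance from slower reporting.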

In [None]:
forecast.tail()

## Fitting the curve

Fitting this class of curve is very difficult because there is no particular method for doing so. The easiest approach is to use the predicted output of the Prophet library.

In [None]:
import datetime
import scipy
from plotly.offline import iplot

def plot_exponential_fit_data(d_df, title, delta, p0):
    d_df = d_df.sort_values(by=['Date'], ascending=True)
    d_df['x'] = np.arange(len(d_df)) + 1
    d_df['y'] = d_df['Confirmed']

    x = d_df['x'][:-delta]
    y = d_df['y'][:-delta]
    # Keep only as many fitted values as there are points in x
    y_fit = forecast['yhat'].iloc[:len(x)]
    
    traceC = go.Scatter(
        x=d_df['x'][:-delta], y=d_df['y'][:-delta],
        name="Confirmed (included for fit)",
        marker=dict(color="Red"),
        mode = "markers+lines",
        text=d_df['Confirmed'],
    )

    traceV = go.Scatter(
        x=d_df['x'][-delta-1:], y=d_df['y'][-delta-1:],
        name="Confirmed (validation)",
        marker=dict(color="blue"),
        mode = "markers+lines",
        text=d_df['Confirmed'],
    )
    
    traceP = go.Scatter(
        x=np.array(x), y=y_fit,
        name="Projected values (fit curve)",
        marker=dict(color="green"),
        mode = "lines",
        text=y_fit,
    )

    data = [traceC, traceV, traceP]

    layout = dict(title = 'Confirmed cases and curve projection',
          xaxis = dict(title = 'Day since first case', showticklabels=True), 
          yaxis = dict(title = 'Number of cases'),plot_bgcolor='rgb(255, 255, 255)',
          hovermode = 'closest'
         )
    fig = dict(data=data, layout=layout)
    iplot(fig, filename='covid-exponential-forecast')

p0 = (40, 0.7)
plot_exponential_fit_data(daily, 'I', 15, p0)

The graphic above illustrates the behaviour of the confirmed cases. The green line represents the projected values, which follow the behaviour of the data. The red line plots the actual confirmed cases used for the fit, and the blue line plots the most recent observations, held out for validation. In this case, the behaviour of the curve matches the reality of the outbreak in Mexico, which can be checked every day against the news reports.
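
The plotting function above is named for an exponential fit and receives a starting guess `p0`, but it draws the Prophet output rather than fitting an exponential. For comparison, here is a minimal sketch of one simple way to fit y ≈ a·exp(b·x): ordinary least squares on log(y). The helper `fit_exponential` and the synthetic data are assumptions for illustration, not part of the original analysis.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y ≈ a * exp(b * x) by least squares on log(y); requires y > 0."""
    slope, intercept = np.polyfit(x, np.log(y), 1)
    return np.exp(intercept), slope  # (a, b)

# Synthetic, noise-free data generated with a = 40 and b = 0.07
x_demo = np.arange(1, 31)
y_demo = 40 * np.exp(0.07 * x_demo)

a_hat, b_hat = fit_exponential(x_demo, y_demo)
print(a_hat, b_hat)
```

A log-linear fit only suits the early growth phase, where counts are strictly positive and roughly exponential; it weights small counts more heavily than a direct nonlinear fit would.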

In [None]:
daily.head()

## Time Series Forecasting

A *time series* is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Time Series analysis can be useful to see how a given asset, security or economic variable changes over time. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Time series are very frequently plotted via run charts (a temporal line chart). Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time. Interrupted time series analysis is the analysis of interventions on a single time series.

Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility.)

Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data.
[https://en.wikipedia.org/wiki/Time_series]

In [None]:
# Import statsmodel
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.arima.model import ARIMA  # statsmodels.tsa.arima_model was removed in newer releases
import math

from matplotlib.pylab import rcParams
rcParams['figure.figsize']  =  10, 6
pd.options.display.float_format = '{:.2f}'.format

In [None]:
df_time_series = daily[['Date', 'Confirmed']]
df_time_series = df_time_series.set_index('Date')
df_time_series.tail()

In [None]:
df_time_series.shape

### Time Series Analysis on total confirmed cases

Perform an additive seasonal decomposition of the "Confirmed" column and store the result in an object named decomposed_dataset.

In [None]:
# Additive Decomposition
from statsmodels.tsa.seasonal import seasonal_decompose
pd.plotting.register_matplotlib_converters()

def generate():
    # `freq` was renamed to `period` in statsmodels 0.11
    decomposed_dataset = seasonal_decompose(df_time_series, model='additive', period=1)
    figure  =  decomposed_dataset.plot()
    plt.show()
    return decomposed_dataset

decomposed_dataset = generate()
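
To make the idea of an additive decomposition concrete, the sketch below splits a synthetic daily series into trend and weekly seasonal parts by hand, using only pandas. The 7-day period and the synthetic data are assumptions for illustration; the cell above uses a period of 1.

```python
import numpy as np
import pandas as pd

# Synthetic series: linear trend plus a zero-mean weekly pattern
pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -3.0, 2.0])
t = np.arange(70)
series = pd.Series(10 + 2.0 * t + pattern[t % 7],
                   index=pd.date_range('2020-03-02', periods=70, freq='D'))

# Trend: centred 7-day moving average (NaN at the first and last 3 days)
trend = series.rolling(window=7, center=True).mean()

# Seasonal: average detrended value for each position in the week
detrended = series - trend
seasonal = detrended.groupby(t % 7).mean()
seasonal = seasonal - seasonal.mean()  # force the effects to sum to zero
print(seasonal.round(2).tolist())
```

On this noise-free example the estimated seasonal effects match the pattern used to build the series; with real data, whatever remains after removing trend and seasonality is the residual.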

### Analysing the stationarity of the time series

In [None]:
import itertools

# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 3)

# Generate all combinations of (p, d, q) triplets
pdq = list(itertools.product(p, d, q))

# Generate all combinations of seasonal (P, D, Q, s) with a seasonal period of 4
seasonal_pdq = [(x[0], x[1], x[2], 4) for x in list(itertools.product(p, d, q))]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))

In [None]:
from datetime import datetime

# Time the grid search over every (order, seasonal_order) combination
start_time = datetime.now()
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(df_time_series['Confirmed'],
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)

            results = mod.fit(disp=False)
            print('SARIMAX{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip combinations that fail to fit
            continue
end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time))
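
The grid above is large, and reading the best model off printed output is error-prone. The sketch below counts the combinations and shows one way to select the lowest AIC from stored scores. The AIC values in `scores` are hypothetical placeholders for illustration only, not results from this dataset, and `pick_best` is a hypothetical helper.

```python
import itertools

# Same grid as above: 3 values each for p, d and q, with seasonal period 4
p = d = q = range(0, 3)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(sp, sd, sq, 4) for sp, sd, sq in pdq]
n_models = len(pdq) * len(seasonal_pdq)
print(n_models)

def pick_best(scores):
    """Return the (order, seasonal_order) key with the lowest AIC."""
    return min(scores, key=scores.get)

# Hypothetical AIC values, for illustration only
scores = {
    ((0, 1, 1), (0, 1, 1, 4)): 2510.4,
    ((1, 1, 1), (0, 1, 1, 4)): 2389.7,
    ((1, 0, 0), (1, 0, 0, 4)): 2655.0,
}
best = pick_best(scores)
print(best)
```

In the real loop, the dictionary would be filled with `scores[(param, param_seasonal)] = results.aic` after each successful fit.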

### Fitting a SARIMAX Time Series Model

In [None]:
# Fit on the raw series, without any transformation
mod = sm.tsa.statespace.SARIMAX(df_time_series,
                                order=(2, 1, 2),
                                seasonal_order=(0, 2, 2, 4),
                                enforce_stationarity=False,
                                enforce_invertibility=False)

results = mod.fit(maxiter=100)

In [None]:
print(results.summary())

In [None]:
results.plot_diagnostics(figsize=(18, 8))
plt.show()

In [None]:
import pickle 
from statsmodels.tsa.statespace.sarimax import SARIMAXResults

# save model
results.save('model.pkl')
# load model
loaded = SARIMAXResults.load('model.pkl')

In [None]:
pred = loaded.get_prediction(start=pd.to_datetime('2020-01-13'), dynamic=False)
pred_ci = pred.conf_int()
ax = df_time_series['2020':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 4))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Total confirmed cases')
plt.legend()
plt.show()

## Some useful information

### Map of Mexico with confirmed, recovered and deceased cases

In [None]:
mexMap = folium.Map(location=[19,-102], tiles="cartodbpositron", zoom_start=4)

for lat, lon, value1,value2,value3, name in zip(newData['Latitude'], newData['Longitude'], newData['Confirmed'],newData['Recovered'],newData['Deceased'], newData['ENTIDAD_FEDERATIVA_RES']):
    folium.CircleMarker([lat, lon],
                        radius= (int((np.log(value1+1.00001))))*4,
                        popup = ('<strong>States</strong>: ' + str(name).capitalize() + '<br>'
                                '<strong>Confirmed</strong>: ' + str(value1) + '<br>'),
                        color='#ff6600',
                        
                        fill_color='#ff8533',
                        fill_opacity=0.4 ).add_to(mexMap)
    
    folium.CircleMarker([lat, lon],
                        radius= (int((np.log(value2+1.00001))))*3,
                        popup = ('<strong>States</strong>: ' + str(name).capitalize() + '<br>'
                                '<strong>Recovered</strong>: ' + str(value2) + '<br>'),
                        color='#008000',
                        
                        fill_color='#008000',
                        fill_opacity=0.4 ).add_to(mexMap)
    
    folium.CircleMarker([lat, lon],
                        radius= (int((np.log(value3+1.00001))))*2,
                        popup = ('<strong>States</strong>: ' + str(name).capitalize() + '<br>'
                                 '<strong>Deaths</strong>: ' + str(value3) + '<br>'),
                        color='#0000A0',
                        
                        fill_color='#0000A0',
                        fill_opacity=0.4 ).add_to(mexMap)
mexMap
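
The circle sizes in the map come from a truncated natural logarithm of the case count. The sketch below isolates that formula in a small function so its behaviour is easier to see; the name `marker_radius` is hypothetical, and `math.log` stands in for `np.log`.

```python
import math

def marker_radius(count, scale):
    """Radius used in the map: the integer part of ln(count + 1.00001), times a scale factor."""
    return int(math.log(count + 1.00001)) * scale

# Confirmed markers use scale 4, recovered use 3, deaths use 2
print(marker_radius(0, 4))       # no cases: radius 0
print(marker_radius(1000, 4))
print(marker_radius(60000, 4), marker_radius(160000, 4))
```

Because the logarithm is truncated to an integer before scaling, counts that differ considerably (60,000 and 160,000 in the example) can produce the same radius. Rounding after multiplying would give finer gradations.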

In [None]:
newData

### Top ten states by confirmed cases

In [None]:
def plot_topn_states(col, color=None, n=10):
    df = newData.sort_values(col, ascending=False).head(n)
    fig = px.bar(df, x=col, y='ENTIDAD_FEDERATIVA_RES', text=col, color='ENTIDAD_FEDERATIVA_RES', 
                 orientation='h', color_discrete_sequence=[color]
                 )
    fig.update_layout(title=col, xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      showlegend=False)
    fig.show()

In [None]:
# Pass n by keyword: the second positional argument is the colour
plot_topn_states('Confirmed', color=cnf, n=10)

### Top ten states with the most deaths

In [None]:
# Pass n by keyword: the second positional argument is the colour
plot_topn_states('Deceased', color=dth, n=10)

### Another way to display the confirmed cases along the Mexican territory.

In [None]:
temp = newData.groupby(['ENTIDAD_FEDERATIVA_RES'])['Confirmed'].sum().reset_index()
temp.head()
fig = px.treemap(temp, path=["ENTIDAD_FEDERATIVA_RES"], values="Confirmed", 
                 height=700, title='Number of Confirmed Cases', 
                 color_discrete_sequence = px.colors.qualitative.Vivid)
fig.data[0].textinfo = 'label+text+value'
fig.show()

### Graphical comparison between confirmed and deceased people.

In [None]:
temp = daily[['Date','Confirmed','Dead']]
temp = temp.melt(id_vars='Date', value_vars=['Confirmed', 'Dead'])
temp.head()
fig_c = px.line(temp, x="Date", y="value", color='variable', line_dash='variable', 
                color_discrete_sequence=[cnf, dth])  # confirmed, then deaths
fig_c.update_layout(title='Confirmed vs Deceased cases', 
                  xaxis_title='', yaxis_title='')
fig_c.show()

### Graphic distribution by age and gender

In [None]:
def print_missing_vals():
    print('Total no. of values : ', '{0:,.0f}'.format(data.shape[0]), 
          '\nNo. of missing values :', '{0:,.0f}'.format(data.shape[0]-temp.shape[0]), 
          '\nNo. of available values :', '{0:,.0f}'.format(data.shape[0]-(data.shape[0]-temp.shape[0])))

In [None]:
data['EDAD_CLASS'] = pd.Categorical(data['EDAD_CLASS'],categories=['Baby','Child','Teenage','Young Adult','Adult','Senior Adult'])
data = data.sort_values(by=['EDAD_CLASS'])

fig = make_subplots(
    rows=1, cols=2, column_widths=[0.8, 0.2],
    subplot_titles = ['Gender vs Age', ''],
    specs=[[{"type": "histogram"}, {"type": "pie"}]]
)
temp = data[['EDAD_CLASS', 'SEXO_DESC']].dropna()
print_missing_vals()
gen_grp = temp.groupby('SEXO_DESC').count()
fig.add_trace(go.Histogram(x=temp[temp['SEXO_DESC']=='MUJER']['EDAD_CLASS'], nbinsx=50, name='Female', marker_color='#FF69B4'), 1, 1)
fig.add_trace(go.Histogram(x=temp[temp['SEXO_DESC']=='HOMBRE']['EDAD_CLASS'], nbinsx=50, name='Male', marker_color='#000080'), 1, 1)

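# groupby sorts the groups alphabetically, so the counts arrive as HOMBRE (male), then MUJER (female)
fig.add_trace(go.Pie(values=gen_grp.values.reshape(-1).tolist(), labels=['Male', 'Female'], marker_colors = ['#000080', '#FF69B4']),1, 2)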

fig.update_layout(showlegend=False)
fig.update_layout(barmode='stack')
fig.data[2].textinfo = 'label+text+value+percent'

fig.show()

In [None]:
# Remove the extracted CSV without depending on a shell command
os.remove('../Files/mex_covid_daily.csv')