# Deadly Visualizations!!!

![Image](../images/viz_types_portada.png)

## Setup

First, we need a basic setup, which includes:

- Importing the libraries.

- Reading the dataset file (source [Instituto Nacional de Estadística](https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259942408928&p=1259942408928&pagename=ProductosYServicios%2FPYSLayout)).

- Creating a couple of columns and tables for the analysis.

__NOTE:__ some functions have already been written to help you through the challenge. However, feel free to write any additional code you need.
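
The read step in the setup passes `sep=';'` and `thousands='.'` to `pd.read_csv`, presumably because the file is semicolon-delimited and writes large counts with a dot as the thousands separator. A minimal sketch of what those two arguments do, using an in-memory sample whose rows are made up for illustration (they are not taken from `7947.csv`):

```python
import io

import pandas as pd

# Made-up sample in the same shape as the dataset file (not real data).
sample = io.StringIO(
    "Causa de muerte;Sexo;Edad;Periodo;Total\n"
    "001-102  I-XXII.Todas las causas;Total;Todas las edades;2018;1.234\n"
    "006  SIDA;Total;Todas las edades;2018;56\n"
)

# sep=';' splits the columns, thousands='.' turns '1.234' into the integer 1234.
parsed = pd.read_csv(sample, sep=";", thousands=".")
print(parsed["Total"].tolist())
print(parsed["Total"].dtype)
```

Without `thousands='.'`, pandas would read `1.234` as a float, which would silently shrink the larger counts.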

In [1]:
# imports

import sys
import re
sys.path.insert(0, "../modules")

import numpy as np
import pandas as pd

import plotly.express as px
import cufflinks as cf
cf.go_offline()

import module as mod     # helper functions are included in module.py

In [2]:
# read dataset

deaths = pd.read_csv('../data/7947.csv', sep=';', thousands='.')

deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 5 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.5+ MB


In [3]:
# add some columns...you'll need them later

deaths['cause_code'] = deaths['Causa de muerte'].apply(mod.cause_code)
deaths['cause_group'] = deaths['Causa de muerte'].apply(mod.cause_types)
deaths['cause_name'] = deaths['Causa de muerte'].apply(mod.cause_name)

deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
 5   cause_code       301158 non-null  object
 6   cause_group      301158 non-null  object
 7   cause_name       301158 non-null  object
dtypes: int64(2), object(6)
memory usage: 18.4+ MB
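
`module.py` is not reproduced in this notebook, so the following is only a guess at what helpers such as `mod.cause_code`, `mod.cause_name` and `mod.cause_types` might do, inferred from the resulting column values: the code looks like the leading token of `Causa de muerte`, the name looks like the remainder, and the group appears to depend on whether the code is a range. The names `split_cause` and `cause_group` are hypothetical.

```python
import re


def split_cause(label):
    """Split a label such as '006  SIDA' into (code, name)."""
    match = re.match(r"^(\d{3}(?:-\d{3})?)\s+(.*)$", label.strip())
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return match.group(1), match.group(2)


def cause_group(code):
    # Assumption: range codes such as '001-008' aggregate several causes.
    return "Multiple causes" if "-" in code else "Single cause"


code, name = split_cause("001-008  I.Enfermedades infecciosas y parasitarias")
print(code, "|", name, "|", cause_group(code))
print(split_cause("006  SIDA"), cause_group("006"))
```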


In [4]:
# let's check the categorical variables

var_list = ['Sexo', 'Edad', 'Periodo', 'cause_code', 'cause_name', 'cause_group']

categories = mod.cat_var(deaths, var_list)
categories

Unnamed: 0,categorical_variable,number_of_possible_values,values
0,cause_code,117,"[001-102, 001-008, 001, 002, 003, 004, 005, 00..."
1,cause_name,117,"[I-XXII.Todas las causas, I.Enfermedades infec..."
2,Periodo,39,"[2018, 2017, 2016, 2015, 2014, 2013, 2012, 201..."
3,Edad,22,"[Todas las edades, Menos de 1 año, De 1 a 4 añ..."
4,Sexo,3,"[Total, Hombres, Mujeres]"
5,cause_group,2,"[Multiple causes, Single cause]"


In [5]:
# we also need to create a causes table for the analysis

causes_table = deaths[['cause_code', 'cause_name']].drop_duplicates().sort_values(by='cause_code').reset_index(drop=True)

causes_table

Unnamed: 0,cause_code,cause_name
0,001,Enfermedades infecciosas intestinales
1,001-008,I.Enfermedades infecciosas y parasitarias
2,001-102,I-XXII.Todas las causas
3,002,Tuberculosis y sus efectos tardíos
4,003,Enfermedad meningocócica
...,...,...
112,098,Suicidio y lesiones autoinfligidas
113,099,Agresiones (homicidio)
114,100,Eventos de intención no determinada
115,101,Complicaciones de la atención médica y quirúrgica


In [6]:
# And some space for free-style Pandas!!! (e.g.: df['column_name'].unique())
deaths.head()

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
0,001-102 I-XXII.Todas las causas,Total,Todas las edades,2018,427721,001-102,Multiple causes,I-XXII.Todas las causas
1,001-102 I-XXII.Todas las causas,Total,Todas las edades,2017,424523,001-102,Multiple causes,I-XXII.Todas las causas
2,001-102 I-XXII.Todas las causas,Total,Todas las edades,2016,410611,001-102,Multiple causes,I-XXII.Todas las causas
3,001-102 I-XXII.Todas las causas,Total,Todas las edades,2015,422568,001-102,Multiple causes,I-XXII.Todas las causas
4,001-102 I-XXII.Todas las causas,Total,Todas las edades,2014,395830,001-102,Multiple causes,I-XXII.Todas las causas


In [7]:
# Years in the dataset
deaths['Periodo'].unique()

array([2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008,
       2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997,
       1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986,
       1985, 1984, 1983, 1982, 1981, 1980], dtype=int64)

In [8]:
# Sexes in the dataset
deaths['Sexo'].unique()

array(['Total', 'Hombres', 'Mujeres'], dtype=object)

In [9]:
# Cause groups in the dataset
deaths['cause_group'].unique()

array(['Multiple causes', 'Single cause'], dtype=object)

In [10]:
# Age groups in the dataset
deaths['Edad'].unique()

array(['Todas las edades', 'Menos de 1 año', 'De 1 a 4 años',
       'De 5 a 9 años', 'De 10 a 14 años  ', 'De 15 a 19 años  ',
       'De 20 a 24 años', 'De 25 a 29 años', 'De 30 a 34 años',
       'De 35 a 39 años', 'De 40 a 44 años', 'De 45 a 49 años',
       'De 50 a 54 años', 'De 55 a 59 años', 'De 60 a 64 años',
       'De 65 a 69 años', 'De 70 a 74 años  ', 'De 75 a 79 años  ',
       'De 80 a 84 años  ', 'De 85 a 89 años  ', 'De 90 a 94 años  ',
       '95 y más años'], dtype=object)
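
Several age labels in the output above end with stray spaces (for example `'De 10 a 14 años  '`), so an exact comparison against `'De 10 a 14 años'` would silently match nothing. One way to guard against that, sketched here on a small stand-in Series rather than the real column, is to normalize the labels with `str.strip()` before filtering:

```python
import pandas as pd

# Stand-in for deaths['Edad']; two labels copy the trailing spaces seen above.
edad = pd.Series(["De 5 a 9 años", "De 10 a 14 años  ", "De 15 a 19 años  "])

# Number of exact matches before cleaning.
print((edad == "De 10 a 14 años").sum())

# Strip leading and trailing whitespace, then compare again.
clean = edad.str.strip()
print((clean == "De 10 a 14 años").sum())
```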

In [11]:
# Causes of death in the dataset
deaths['Causa de muerte'].unique()

array(['001-102  I-XXII.Todas las causas',
       '001-008  I.Enfermedades infecciosas y parasitarias',
       '001  Enfermedades infecciosas intestinales',
       '002  Tuberculosis y sus efectos tardíos',
       '003  Enfermedad meningocócica', '004  Septicemia',
       '005  Hepatitis vírica', '006  SIDA',
       '007  VIH+ (portador, evidencias de laboratorio del VIH, ...)',
       '008  Resto de enfermedades infecciosas y parasitarias y sus efectos tardíos',
       '009-041  II.Tumores',
       '009  Tumor maligno del labio, de la cavidad bucal y de la faringe',
       '010  Tumor maligno del esófago',
       '011  Tumor maligno del estómago', '012  Tumor maligno del colon',
       '013  Tumor maligno del recto, de la porción rectosigmoide y del ano',
       '014  Tumor maligno del hígado y vías biliares intrahepáticas',
       '015  Tumor maligno del páncreas',
       '016  Otros tumores malignos digestivos',
       '017  Tumor maligno de la laringe',
       '018  Tumor maligno d

In [12]:
# Codes of the causes of death
deaths['cause_code'].unique()

array(['001-102', '001-008', '001', '002', '003', '004', '005', '006',
       '007', '008', '009-041', '009', '010', '011', '012', '013', '014',
       '015', '016', '017', '018', '019', '020', '021', '022', '023',
       '024', '025', '026', '027', '028', '029', '030', '031', '032',
       '033', '034', '035', '036', '037', '038', '039', '040', '041',
       '042-043', '042', '043', '044-045', '044', '045', '046-049', '046',
       '047', '048', '049', '050-052', '050', '051', '052', '053-061',
       '053', '054', '055', '056', '057', '058', '059', '060', '061',
       '062-067', '062', '063', '064', '065', '066', '067', '068-072',
       '068', '069', '070', '071', '072', '073', '074-076', '074', '075',
       '076', '077-080', '077', '078', '079', '080', '081', '082',
       '083-085', '083', '084', '085', '086-089', '086', '087', '088',
       '089', '090-102', '090', '091', '092', '093', '094', '095', '096',
       '097', '098', '099', '100', '101', '102'], dtype=object)

## Let's make some transformations

Even though the dataset is fairly clean, the information is completely denormalized, as you can see. For that reason, a collection of functions is available to generate the tables you might need:

- `row_filter(df, cat_var, cat_values)` => Filter rows by any value or group of values in a categorical variable.

- `nrow_filter(df, cat_var, cat_values)` => The inverse: exclude rows matching any value or group of values in a categorical variable.

- `groupby_sum(df, group_vars, agg_var='Total', sort_var='Total')` => Add deaths by a certain variable.

- `pivot_table(df, col, x_axis, value='Total')`=> Make some pivot tables, you might need them...
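
Since `module.py` is not reproduced here, the following plain-pandas sketch shows what helpers with these signatures could look like. It is an assumption based only on the descriptions above, not the actual implementation; in particular, the descending sort in `groupby_sum` and the index reset in the filters are guesses.

```python
import pandas as pd


def row_filter(df, cat_var, cat_values):
    # Keep rows whose cat_var is one of cat_values.
    return df[df[cat_var].isin(cat_values)].reset_index(drop=True)


def nrow_filter(df, cat_var, cat_values):
    # Drop rows whose cat_var is one of cat_values.
    return df[~df[cat_var].isin(cat_values)].reset_index(drop=True)


def groupby_sum(df, group_vars, agg_var="Total", sort_var="Total"):
    # Sum agg_var within each group; the descending order is an assumption.
    out = df.groupby(group_vars, as_index=False)[agg_var].sum()
    return out.sort_values(by=sort_var, ascending=False).reset_index(drop=True)


def pivot_table(df, col, x_axis, value="Total"):
    # One row per x_axis value, one column per col value.
    return df.pivot_table(index=x_axis, columns=col, values=value, aggfunc="sum")


# Toy data (made up) to exercise the sketches.
toy = pd.DataFrame({
    "Sexo": ["Hombres", "Mujeres", "Total", "Hombres", "Mujeres", "Total"],
    "Periodo": [2017, 2017, 2017, 2018, 2018, 2018],
    "Total": [3, 2, 5, 4, 1, 5],
})
by_sex = nrow_filter(toy, "Sexo", ["Total"])
print(groupby_sum(by_sex, ["Sexo"]))
print(pivot_table(by_sex, "Sexo", "Periodo"))
```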

__NOTE:__ be aware that each filtering function applies one filter at a time, so chain the calls when you need several. Feel free to filter in any other way you are comfortable with.
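
When several conditions are needed at once, a single boolean mask in plain pandas does the same job as a chain of filter calls. A sketch on made-up rows (the column names match the dataset, the numbers do not):

```python
import pandas as pd

# Made-up rows mixing aggregate and breakdown records.
toy = pd.DataFrame({
    "Sexo": ["Total", "Hombres", "Mujeres", "Hombres"],
    "Edad": ["Todas las edades", "Todas las edades", "Todas las edades", "De 1 a 4 años"],
    "cause_code": ["006", "006", "006", "006"],
    "Total": [10, 7, 3, 1],
})

# Combine the three conditions with & (each one wrapped in parentheses).
mask = (
    (toy["Sexo"] != "Total")
    & (toy["Edad"] == "Todas las edades")
    & (toy["cause_code"] == "006")
)
print(toy.loc[mask, ["Sexo", "Total"]])
```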

In [13]:
# Isolate the data for men and women
deaths_sexo = mod.row_filter(deaths, 'Sexo', ['Hombres', 'Mujeres'])
deaths_sexo.head()

Unnamed: 0,Causa de muerte,Sexo,Edad,Periodo,Total,cause_code,cause_group,cause_name
0,001-102 I-XXII.Todas las causas,Hombres,Todas las edades,2018,216442,001-102,Multiple causes,I-XXII.Todas las causas
1,001-102 I-XXII.Todas las causas,Hombres,Todas las edades,2017,214236,001-102,Multiple causes,I-XXII.Todas las causas
2,001-102 I-XXII.Todas las causas,Hombres,Todas las edades,2015,213309,001-102,Multiple causes,I-XXII.Todas las causas
3,001-102 I-XXII.Todas las causas,Mujeres,Todas las edades,2018,211279,001-102,Multiple causes,I-XXII.Todas las causas
4,001-102 I-XXII.Todas las causas,Mujeres,Todas las edades,2017,210287,001-102,Multiple causes,I-XXII.Todas las causas


In [14]:
#mod.nrow_filter(deaths, 'Sexo', ['Hombres', 'Mujeres'])

In [15]:
# AIDS deaths for men and women

dataset = mod.nrow_filter(deaths, 'Sexo', ['Total'])
dataset_1 = mod.row_filter(dataset, 'Edad', ['Todas las edades'])
dataset_2 = mod.row_filter(dataset_1, 'cause_code', ['006'])
dataset_3 = dataset_2[['Sexo', 'Edad', 'Periodo', 'Total']]
muertes_sida_HM = dataset_3.sort_values(by=['Periodo'])
muertes_sida_HM.head()

Unnamed: 0,Sexo,Edad,Periodo,Total
69,Mujeres,Todas las edades,1980,0
70,Hombres,Todas las edades,1980,0
77,Mujeres,Todas las edades,1981,0
71,Hombres,Todas las edades,1981,0
75,Mujeres,Todas las edades,1982,0


In [16]:
# AIDS deaths for both sexes combined
dataset = mod.row_filter(deaths, 'Sexo', ['Total'])
dataset_1 = mod.row_filter(dataset, 'Edad', ['Todas las edades'])
dataset_2 = mod.row_filter(dataset_1, 'cause_code', ['006'])
dataset_3 = dataset_2[['Sexo', 'Edad', 'Periodo', 'Total']]
muertes_sida_Todos = dataset_3.sort_values(by=['Periodo'])
muertes_sida_Todos.head()

Unnamed: 0,Sexo,Edad,Periodo,Total
38,Total,Todas las edades,1980,0
37,Total,Todas las edades,1981,0
36,Total,Todas las edades,1982,0
35,Total,Todas las edades,1983,0
34,Total,Todas las edades,1984,0


In [17]:
# Deaths aggregated by cause group
muerte_historica = mod.row_filter(deaths, 'cause_code', 
                                   ['001-008', '009-041', '042-043', '044-045', 
                                    '046-049', '050-052', '053-061', '062-067', '068-072', '073', 
                                    '074-076', '077-080', '081', '082', '083-085', '086-089', '090-102'])
muerte_historica_2 = muerte_historica[['cause_code', 'Total']]
muerte_historica_3 = muerte_historica_2.groupby(['cause_code'], 
                    as_index=False).sum().sort_values(by = 'Total', ascending= True)
muerte_historica_3

Unnamed: 0,cause_code,Total
12,081,3016
9,073,131656
13,082,187080
14,083-085,192240
2,042-043,197136
10,074-076,464248
0,001-008,984972
11,077-080,1339144
4,046-049,1611472
15,086-089,1727316
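
One caveat about the totals above: `muerte_historica` is filtered by `cause_code` only, so each sum still mixes the `Total` rows with the `Hombres` and `Mujeres` rows, and the `Todas las edades` rows with the individual age bands. The absolute figures therefore count each death more than once. A sketch of one possible fix on made-up numbers, restricting to the aggregate rows before summing:

```python
import pandas as pd

# Made-up rows: one cause with aggregate and breakdown rows side by side.
toy = pd.DataFrame({
    "cause_code": ["081"] * 6,
    "Sexo": ["Total", "Total", "Hombres", "Hombres", "Mujeres", "Mujeres"],
    "Edad": ["Todas las edades", "95 y más años"] * 3,
    "Total": [10, 10, 6, 6, 4, 4],
})

# Summing everything adds the aggregate rows on top of their own breakdowns.
naive = toy.groupby("cause_code", as_index=False)["Total"].sum()

# Keep only the rows that already aggregate over sex and age, then sum.
aggregate_rows = toy[(toy["Sexo"] == "Total") & (toy["Edad"] == "Todas las edades")]
deduped = aggregate_rows.groupby("cause_code", as_index=False)["Total"].sum()

print(naive)
print(deduped)
```

In this toy case the naive sum is four times the filtered one, because both the sex and the age dimension are counted twice.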


In [18]:
# Example 2
"""
group = ['cause_code','Periodo']
dataset_1 = mod.groupby_sum(deaths, group)
dataset_1
"""

"\ngroup = ['cause_code','Periodo']\ndataset_1 = mod.groupby_sum(deaths, group)\ndataset_1\n"

In [19]:
# Example 3
"""
dataset = mod.pivot_table(dataset, 'cause_code', 'Periodo')
dataset.head()
"""

"\ndataset = mod.pivot_table(dataset, 'cause_code', 'Periodo')\ndataset.head()\n"

## ...and finally, show me some insights with Plotly!!!

### Evolution of AIDS deaths

In [20]:
muertes_sida_Todos.iplot(kind='bar',
                         x='Periodo',
                         y='Total',
                         xTitle='Periodo',
                         yTitle='Número de muertes',
                         title='EVOLUCIÓN DE MUERTES POR SIDA EN ESPAÑA')

### Evolution of AIDS deaths in men and women

In [21]:
px.bar(muertes_sida_HM,
       x='Periodo',
       y='Total',
       color='Sexo',
       barmode='group',
       title='EVOLUCIÓN DE MUERTES POR SIDA EN HOMBRES Y MUJERES EN ESPAÑA')

In [22]:
px.bar(muerte_historica_3,
       x='Total',
       y='cause_code',
       orientation='h',
       title='MAYOR CAUSA DE MUERTE HISTÓRICA EN ESPAÑA')

In [37]:
code_meaning = mod.row_filter(deaths, 'cause_code', 
                                   ['001-008', '009-041', '042-043', '044-045', 
                                    '046-049', '050-052', '053-061', '062-067', '068-072', '073', 
                                    '074-076', '077-080', '081', '082', '083-085', '086-089', '090-102'])
code_meaning_2 = code_meaning[['cause_code', 'Causa de muerte']].groupby(['cause_code'], 
                    as_index=False).sum().sort_values(by = 'cause_code', ascending= True)
code_meaning_2

Unnamed: 0,cause_code,Causa de muerte
0,001-008,001-008 I.Enfermedades infecciosas y parasita...
1,009-041,009-041 II.Tumores009-041 II.Tumores009-041 ...
2,042-043,042-043 III.Enfermedades de la sangre y de lo...
3,044-045,"044-045 IV.Enfermedades endocrinas, nutricion..."
4,046-049,046-049 V.Trastornos mentales y del comportam...
5,050-052,050-052 VI-VIII.Enfermedades del sistema nerv...
6,053-061,053-061 IX.Enfermedades del sistema circulator...
7,062-067,062-067 X.Enfermedades del sistema respirator...
8,068-072,068-072 XI.Enfermedades del sistema digestivo...
9,073,073 XII.Enfermedades de la piel y del tejido ...
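
The repeated labels in the output above (for example `009-041 II.Tumores009-041 II.Tumores...`) arise because `sum()` on a string column concatenates the strings of every row in the group. If the goal is one readable label per code, dropping duplicates is probably closer to the intent, and the result can then be merged into a totals table for plotting. A sketch on small stand-in frames with made-up totals:

```python
import pandas as pd

# Stand-in for deaths[['cause_code', 'Causa de muerte']] with repeated rows.
toy = pd.DataFrame({
    "cause_code": ["009-041", "009-041", "006", "006"],
    "Causa de muerte": ["009-041  II.Tumores"] * 2 + ["006  SIDA"] * 2,
})

# One row per code instead of one concatenated string per code.
labels = toy.drop_duplicates().sort_values(by="cause_code").reset_index(drop=True)
print(labels)

# Made-up totals, only to show how the labels could be attached.
totals = pd.DataFrame({"cause_code": ["006", "009-041"], "Total": [1, 2]})
print(totals.merge(labels, on="cause_code"))
```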


In [24]:
muertes_sida_Todos.iplot(kind='line',
                         x='Periodo',
                         y='Total',
                         xTitle='Periodo',
                         yTitle='Número de muertes',
                         title='MUERTES POR SIDA')

In [None]:
"""
dataset_4['Total'].iplot(kind='hist',
                     title='MUERTES POR SIDA',
                     yTitle='Total',
                     xTitle='Muertes')
"""

In [None]:
# Cufflinks histogram
"""
dataset_column.iplot(kind='hist',
                     title='VIZ TITLE',
                     yTitle='AXIS TITLE',
                     xTitle='AXIS TITLE')
"""

In [None]:
# Cufflinks bar plot
'''
dataset_bar.iplot(kind='bar',
                  x='VARIABLE',
                  xTitle='AXIS TITLE',
                  yTitle='AXIS TITLE',
                  title='VIZ TITLE')
'''

In [None]:
# Cufflinks line plot
'''
dataset_line.iplot(kind='line',
                   x='VARIABLE',
                   xTitle='AXIS TITLE',
                   yTitle='AXIS TITLE',
                   title='VIZ TITLE')
'''

In [None]:
# Cufflinks scatter plot
'''
dataset_scatter.iplot(x='VARIABLE', 
                      y='VARIABLE', 
                      categories='VARIABLE',
                      xTitle='AXIS TITLE', 
                      yTitle='AXIS TITLE',
                      title='VIZ TITLE')
'''