# Deadly Visualizations!!!

![Image](../images/viz_types_portada.png)

## Setup

First, we need a basic setup, which includes:

- Importing the libraries.

- Reading the dataset file (source [Instituto Nacional de Estadística](https://www.ine.es/ss/Satellite?L=es_ES&c=Page&cid=1259942408928&p=1259942408928&pagename=ProductosYServicios%2FPYSLayout)).

- Creating a couple of columns and tables for the analysis.

__NOTE:__ some functions have already been written to help you through the challenge. However, feel free to write any other code you might need.

In [62]:
# imports

import sys
import re
sys.path.insert(0, "../modules")

import numpy as np
import pandas as pd

import plotly.express as px
import cufflinks as cf
cf.go_offline()

import module as mod     # functions are included in module.py

In [64]:
# read dataset

deaths = pd.read_csv('../data/7947.csv', sep=';', thousands='.')

deaths.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 5 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.5+ MB
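The `sep=';'` and `thousands='.'` arguments matter here: the file is semicolon-separated and writes large counts with a dot as the thousands separator. The following is a minimal sketch of the effect on a two-row sample; the sample text and its values are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Made-up sample in the same column layout as the dataset (values are illustrative only)
sample_text = (
    "Causa de muerte;Sexo;Edad;Periodo;Total\n"
    "001  Enfermedades infecciosas intestinales;Total;Todas las edades;2018;1.234\n"
    "002  Tuberculosis y sus efectos tardíos;Total;Todas las edades;2018;56\n"
)

# With thousands='.', "1.234" is read as the integer 1234
parsed = pd.read_csv(io.StringIO(sample_text), sep=';', thousands='.')

# Without it, "1.234" is read as the float 1.234, which silently corrupts the counts
naive = pd.read_csv(io.StringIO(sample_text), sep=';')

print(parsed['Total'].tolist())
print(naive['Total'].tolist())
```

Checking `deaths.info()` for an integer `Total` column, as done above, is a quick way to confirm the separator was interpreted as intended.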


In [65]:
# add some columns...you'll need them later

deaths['cause_code'] = deaths['Causa de muerte'].apply(mod.cause_code)
deaths['cause_group'] = deaths['Causa de muerte'].apply(mod.cause_types)
deaths['cause_name'] = deaths['Causa de muerte'].apply(mod.cause_name)

deaths.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301158 entries, 0 to 301157
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   Causa de muerte  301158 non-null  object
 1   Sexo             301158 non-null  object
 2   Edad             301158 non-null  object
 3   Periodo          301158 non-null  int64 
 4   Total            301158 non-null  int64 
 5   cause_code       301158 non-null  object
 6   cause_group      301158 non-null  object
 7   cause_name       301158 non-null  object
dtypes: int64(2), object(6)
memory usage: 18.4+ MB
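The helpers in `module.py` are not shown in this notebook. As a rough sketch of what they might do, the code below assumes each raw `Causa de muerte` label has the form `<code><whitespace><name>` and that a code containing a hyphen denotes a group of causes. Both assumptions, and the function bodies, are hypothetical.

```python
import re

# ASSUMPTION: raw labels look like "001-008  I.Enfermedades infecciosas y parasitarias"
LABEL_PATTERN = re.compile(r"^(\d{3}(?:-\d{3})?)\s+(.*)$")


def parse_label(label):
    """Split a raw label into (code, name); code is None if the pattern does not match."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        return None, label.strip()
    return match.group(1), match.group(2).strip()


def cause_code(label):
    """Hypothetical stand-in for mod.cause_code."""
    return parse_label(label)[0]


def cause_name(label):
    """Hypothetical stand-in for mod.cause_name."""
    return parse_label(label)[1]


def cause_types(label):
    """Hypothetical stand-in for mod.cause_types: a code range means a group of causes."""
    code = parse_label(label)[0]
    return 'Multiple causes' if code and '-' in code else 'Single cause'


example = "001-008  I.Enfermedades infecciosas y parasitarias"
print(cause_code(example), '|', cause_name(example), '|', cause_types(example))
```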


In [48]:
# let's check the categorical variables

var_list = ['Sexo', 'Edad', 'Periodo', 'cause_code', 'cause_name', 'cause_group']

categories = mod.cat_var(deaths, var_list)
categories

| | categorical_variable | number_of_possible_values | values |
|---|---|---|---|
| 0 | cause_code | 117 | [001-102, 001-008, 001, 002, 003, 004, 005, 00... |
| 1 | cause_name | 117 | [I-XXII.Todas las causas, I.Enfermedades infec... |
| 2 | Periodo | 39 | [2018, 2017, 2016, 2015, 2014, 2013, 2012, 201... |
| 3 | Edad | 22 | [Todas las edades, Menos de 1 año, De 1 a 4 añ... |
| 4 | Sexo | 3 | [Total, Hombres, Mujeres] |
| 5 | cause_group | 2 | [Multiple causes, Single cause] |
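A summary like the one above can be reproduced with plain pandas. This is only a sketch of one possible approach, not the actual `mod.cat_var` implementation; the function name `cat_var_sketch` and the toy data are hypothetical.

```python
import pandas as pd


def cat_var_sketch(df, columns):
    """Summarise each column: number of distinct values and the values themselves."""
    rows = [
        {
            'categorical_variable': col,
            'number_of_possible_values': df[col].nunique(),
            'values': df[col].unique().tolist(),
        }
        for col in columns
    ]
    summary = pd.DataFrame(rows)
    # Most varied columns first, matching the ordering of the table above
    return (summary.sort_values('number_of_possible_values', ascending=False)
                   .reset_index(drop=True))


# Toy data for illustration only
toy = pd.DataFrame({
    'Sexo': ['Total', 'Hombres', 'Mujeres', 'Total'],
    'Periodo': [2018, 2018, 2018, 2017],
})

summary = cat_var_sketch(toy, ['Periodo', 'Sexo'])
print(summary)
```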


In [66]:
# we also need to create a causes table for the analysis

causes_table = deaths[['cause_code', 'cause_name']].drop_duplicates().sort_values(by='cause_code').reset_index(drop=True)

causes_table


| | cause_code | cause_name |
|---|---|---|
| 0 | 001 | Enfermedades infecciosas intestinales |
| 1 | 001-008 | I.Enfermedades infecciosas y parasitarias |
| 2 | 001-102 | I-XXII.Todas las causas |
| 3 | 002 | Tuberculosis y sus efectos tardíos |
| 4 | 003 | Enfermedad meningocócica |
| ... | ... | ... |
| 112 | 098 | Suicidio y lesiones autoinfligidas |
| 113 | 099 | Agresiones (homicidio) |
| 114 | 100 | Eventos de intención no determinada |
| 115 | 101 | Complicaciones de la atención médica y quirúrgica |
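One use for a causes table is turning cause codes into readable labels, for example in chart legends. The sketch below builds a code-to-name dictionary from the first rows shown above and applies it to a small wide table; the wide table and its values are made up for illustration.

```python
import pandas as pd

# First rows of causes_table, as displayed above
causes_sample = pd.DataFrame({
    'cause_code': ['001', '001-008', '001-102', '002'],
    'cause_name': [
        'Enfermedades infecciosas intestinales',
        'I.Enfermedades infecciosas y parasitarias',
        'I-XXII.Todas las causas',
        'Tuberculosis y sus efectos tardíos',
    ],
})

# Lookup dictionary: cause code -> cause name
code_to_name = dict(zip(causes_sample['cause_code'], causes_sample['cause_name']))

# Hypothetical wide table whose columns are cause codes (made-up values)
wide = pd.DataFrame({'Periodo': [1980, 1981], '001': [5, 7], '002': [11, 13]})

# Columns without an entry in the dictionary (here 'Periodo') are left unchanged
labelled = wide.rename(columns=code_to_name)
print(labelled.columns.tolist())
```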


In [51]:
# And some space for free-style Pandas!!! (e.g.: df['column_name'].unique())

# Display unique values in a specific column
unique_values_sex = deaths['Sexo'].unique()
print("Unique values in 'Sexo':", unique_values_sex)

unique_values_age = deaths['Edad'].unique()
print("\nUnique values in 'Edad':", unique_values_age)

unique_values_period = deaths['Periodo'].unique()
print("\nUnique values in 'Periodo':", unique_values_period)

# You can continue this pattern for other columns as needed.


Unique values in 'Sexo': ['Total' 'Hombres' 'Mujeres']

Unique values in 'Edad': ['Todas las edades' 'Menos de 1 año' 'De 1 a 4 años' 'De 5 a 9 años'
 'De 10 a 14 años  ' 'De 15 a 19 años  ' 'De 20 a 24 años'
 'De 25 a 29 años' 'De 30 a 34 años' 'De 35 a 39 años' 'De 40 a 44 años'
 'De 45 a 49 años' 'De 50 a 54 años' 'De 55 a 59 años' 'De 60 a 64 años'
 'De 65 a 69 años' 'De 70 a 74 años  ' 'De 75 a 79 años  '
 'De 80 a 84 años  ' 'De 85 a 89 años  ' 'De 90 a 94 años  '
 '95 y más años']

Unique values in 'Periodo': [2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005
 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991
 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980]
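The `Edad` output above shows that several labels carry trailing spaces (for example `'De 10 a 14 años  '`), so an exact-match filter on the clean label would miss them. A minimal sketch of the problem and a possible fix with `str.strip()`, using three labels copied from the output:

```python
import pandas as pd

# Three labels as they appear in the output above (note the trailing spaces)
ages = pd.Series(['De 5 a 9 años', 'De 10 a 14 años  ', 'De 15 a 19 años  '])

# An exact comparison against the clean label finds nothing
matches_before = int((ages == 'De 10 a 14 años').sum())

# Stripping surrounding whitespace makes the labels comparable
clean_ages = ages.str.strip()
matches_after = int((clean_ages == 'De 10 a 14 años').sum())

print(matches_before, matches_after)
```

On the full dataset the same idea would be `deaths['Edad'] = deaths['Edad'].str.strip()`, applied before any filtering on age.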


## Let's make some transformations

Even though the dataset is pretty clean, the information is completely denormalized, as you can see. For that reason, a collection of functions is available in `module.py` to generate the tables you might need:

- `row_filter(df, cat_var, cat_values)` => Filter rows by any value or group of values in a categorical variable.

- `nrow_filter(df, cat_var, cat_values)` => The inverse filter: keep only the rows that do not match the given values.

- `groupby_sum(df, group_vars, agg_var='Total', sort_var='Total')` => Sum deaths grouped by one or more variables.

- `pivot_table(df, col, x_axis, value='Total')` => Build pivot tables; you might need them...

__NOTE:__ be aware that the filtering functions apply one filter at a time. Feel free to filter the data in any other way you need or feel comfortable with.
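Since the filtering functions work one variable at a time, several filters have to be chained. The sketch below shows how such helpers could be written with `isin`, and the equivalent single boolean mask. The `_sketch` functions are hypothetical stand-ins, not the code in `module.py`, and the toy values are made up.

```python
import pandas as pd


def row_filter_sketch(df, cat_var, cat_values):
    """Keep rows whose cat_var is one of cat_values."""
    return df[df[cat_var].isin(cat_values)]


def nrow_filter_sketch(df, cat_var, cat_values):
    """Keep rows whose cat_var is NOT one of cat_values."""
    return df[~df[cat_var].isin(cat_values)]


# Toy data for illustration only
toy = pd.DataFrame({
    'Sexo': ['Total', 'Hombres', 'Mujeres', 'Total', 'Hombres', 'Mujeres'],
    'Edad': ['Todas las edades'] * 3 + ['De 1 a 4 años'] * 3,
    'Total': [10, 6, 4, 2, 1, 1],
})

# Chained filters: drop the 'Total' sex rows, then keep only the all-ages rows
by_sex = nrow_filter_sketch(toy, 'Sexo', ['Total'])
by_sex_all_ages = row_filter_sketch(by_sex, 'Edad', ['Todas las edades'])

# The same selection written as one combined boolean mask
combined = toy[(toy['Sexo'] != 'Total') & (toy['Edad'] == 'Todas las edades')]

print(by_sex_all_ages)
```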

In [68]:
# Example 1
'''
dataset = mod.row_filter(deaths, 'Sexo', ['Total'])
dataset = mod.row_filter(dataset, 'Edad', ['Todas las edades'])
dataset.head()
'''

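# Note: without the filters above, 'Total' also sums the aggregate rows
# ('Sexo' == 'Total' and 'Edad' == 'Todas las edades'), so deaths are counted more than once.
grouped_data = mod.groupby_sum(deaths, group_vars=['cause_name'])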
grouped_data.head()


| | cause_name | Total |
|---|---|---|
| 0 | I-XXII.Todas las causas | 55654432 |
| 1 | IX.Enfermedades del sistema circulatorio | 19605948 |
| 2 | II.Tumores | 14237728 |
| 3 | X.Enfermedades del sistema respiratorio | 5943228 |
| 4 | Enfermedades cerebrovasculares | 5871364 |
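Because `Sexo` contains a `Total` row and `Edad` contains a `Todas las edades` row, summing the unfiltered table adds the aggregate rows to the detail rows. The sketch below illustrates this on a toy table with a single age group; `groupby_sum_sketch` is a hypothetical stand-in for `mod.groupby_sum`, and all values are made up.

```python
import pandas as pd


def groupby_sum_sketch(df, group_vars, agg_var='Total', sort_var='Total'):
    """Sum agg_var per group and sort the result in descending order."""
    return (df.groupby(group_vars, as_index=False)[agg_var].sum()
              .sort_values(sort_var, ascending=False)
              .reset_index(drop=True))


# Toy table: two causes, one real age group plus the 'Todas las edades' aggregate
rows = []
for cause, men, women in [('Cause A', 6, 4), ('Cause B', 2, 1)]:
    for age in ['Todas las edades', 'De 1 a 4 años']:
        rows.append((cause, 'Total', age, men + women))
        rows.append((cause, 'Hombres', age, men))
        rows.append((cause, 'Mujeres', age, women))
toy = pd.DataFrame(rows, columns=['cause_name', 'Sexo', 'Edad', 'Total'])

# Unfiltered: every death in the toy table is counted four times
naive = groupby_sum_sketch(toy, ['cause_name'])

# Filtered to the aggregate rows first: each death is counted once
only_totals = toy[(toy['Sexo'] == 'Total') & (toy['Edad'] == 'Todas las edades')]
filtered = groupby_sum_sketch(only_totals, ['cause_name'])

print(naive)
print(filtered)
```

Which rows to keep depends on the question: the aggregate rows for overall totals, or the detail rows for breakdowns by sex or age.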


In [69]:
# Example 2

group = ['cause_code','Periodo']
dataset = mod.groupby_sum(deaths, group)
dataset.head()




| | cause_code | Periodo | Total |
|---|---|---|---|
| 0 | 001-102 | 2018 | 1710884 |
| 1 | 001-102 | 2017 | 1698092 |
| 2 | 001-102 | 2015 | 1690272 |
| 3 | 001-102 | 2016 | 1642444 |
| 4 | 001-102 | 2012 | 1611800 |


In [70]:
# Example 3

dataset = mod.pivot_table(dataset, 'cause_code', 'Periodo')
dataset.head()


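| cause_code | Periodo | 001 | 001-008 | 001-102 | 002 | 003 | 004 | 005 | 006 | 007 | ... | 093 | 094 | 095 | 096 | 097 | 098 | 099 | 100 | 101 | 102 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980 | 1620 | 15768 | 1157376 | 5904 | 2008 | 3448 | 436 | 0 | 0 | ... | 4956 | 1432 | 184 | 692 | 16748 | 6608 | 1496 | 28 | 968 | 96 |
| 1 | 1981 | 1404 | 15124 | 1173544 | 6332 | 1656 | 3344 | 348 | 0 | 0 | ... | 4700 | 1200 | 156 | 1396 | 17472 | 6872 | 1284 | 336 | 908 | 208 |
| 2 | 1982 | 1308 | 13488 | 1146620 | 5352 | 1240 | 3104 | 316 | 0 | 0 | ... | 4864 | 956 | 200 | 1000 | 18616 | 7404 | 1228 | 440 | 1132 | 52 |
| 3 | 1983 | 1212 | 13100 | 1210276 | 5152 | 1072 | 3152 | 336 | 0 | 0 | ... | 4788 | 1464 | 148 | 884 | 18392 | 8724 | 1560 | 1276 | 1500 | 56 |
| 4 | 1984 | 1228 | 12928 | 1197636 | 4564 | 964 | 3704 | 424 | 0 | 0 | ... | 4716 | 1244 | 164 | 1020 | 14696 | 9972 | 1812 | 1144 | 1636 | 76 |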
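The long-to-wide reshaping shown above can be sketched with pandas' own `DataFrame.pivot_table`. The function below is a hypothetical stand-in with the same signature as `mod.pivot_table`; its body is an assumption, and the toy values are made up.

```python
import pandas as pd


def pivot_table_sketch(df, col, x_axis, value='Total'):
    """One row per x_axis value, one column per distinct value of col."""
    return (df.pivot_table(index=x_axis, columns=col, values=value, aggfunc='sum')
              .reset_index())


# Toy long-format table (made-up values)
long_df = pd.DataFrame({
    'cause_code': ['001', '001', '002', '002'],
    'Periodo': [1980, 1981, 1980, 1981],
    'Total': [5, 7, 11, 13],
})

wide_df = pivot_table_sketch(long_df, 'cause_code', 'Periodo')
print(wide_df)
```

Calling `reset_index()` turns `Periodo` back into a regular column, which is what lets the plotting cells pass `x='Periodo'`.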


## ...and finally, show me some insights with Plotly!!!

In [80]:
# Cufflinks histogram
'''
dataset_column.iplot(kind='hist',
                     title='VIZ TITLE',
                     yTitle='AXIS TITLE',
                     xTitle='AXIS TITLE')
'''
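# Assuming 'Periodo' is the column you want to visualize.
# Note: if `dataset` is still the pivoted table from Example 3, it has one row per year,
# so this histogram only counts years per bin; pick a cause column to see a distribution of totals.
column_to_visualize = 'Periodo'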

# Check if the column exists in the dataset
if column_to_visualize in dataset.columns:
    # Create a histogram using Cufflinks and Plotly
    dataset[column_to_visualize].iplot(
        kind='hist',
        title=f'Histogram of {column_to_visualize}',
        yTitle='Frequency',
        xTitle=column_to_visualize
    )
else:
    print(f"The column '{column_to_visualize}' does not exist in the dataset.")

In [85]:
# Cufflinks bar plot
'''
dataset_bar.iplot(kind='bar',
                  x='VARIABLE',
                  xTitle='AXIS TITLE',
                  yTitle='AXIS TITLE',
                  title='VIZ TITLE')
'''

# Assuming 'Periodo' is the column you want to visualize in the bar plot
column_to_visualize_bar = 'Periodo'

# Check if the column exists in the dataset
if column_to_visualize_bar in dataset.columns:
    # Create a bar plot using Cufflinks and Plotly
    dataset.iplot(
        kind='bar',
        x=column_to_visualize_bar,
        xTitle=column_to_visualize_bar,
        yTitle='Total',
        title=f'Bar Plot of Total per cause_code by {column_to_visualize_bar}'
    )
else:
    print(f"The column '{column_to_visualize_bar}' does not exist in the dataset.")


In [86]:
# Cufflinks line plot
'''
dataset_line.iplot(kind='line',
                   x='VARIABLE',
                   xTitle='AXIS TITLE',
                   yTitle='AXIS TITLE',
                   title='VIZ TITLE')
'''

# Assuming 'Periodo' is the column you want to visualize in the line plot
column_to_visualize_line = 'Periodo'

# Check if the column exists in the dataset
if column_to_visualize_line in dataset.columns:
    # Create a line plot using Cufflinks and Plotly
    dataset.iplot(
        kind='line',
        x=column_to_visualize_line,
        xTitle=column_to_visualize_line,
        yTitle='Total',
        title=f'Line Plot of Total per cause_code by {column_to_visualize_line}'
    )
else:
    print(f"The column '{column_to_visualize_line}' does not exist in the dataset.")
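Plotting every column of the pivoted table draws one trace per cause code, which gets crowded quickly. One option is to keep only the largest causes and reshape the table back to long format before plotting. The sketch below does this on a toy wide table; the values and the choice of two causes are illustrative assumptions.

```python
import pandas as pd

# Toy wide table in the same shape as the pivoted dataset (made-up values)
wide_df = pd.DataFrame({
    'Periodo': [1980, 1981],
    '001': [5, 7],
    '002': [11, 13],
    '003': [1, 2],
})

# Rank cause columns by their overall total and keep the two largest
top_causes = wide_df.drop(columns='Periodo').sum().nlargest(2).index.tolist()

# Back to long format: one row per (year, cause) pair
long_df = wide_df.melt(id_vars='Periodo', value_vars=top_causes,
                       var_name='cause_code', value_name='Total')

print(top_causes)
print(long_df)
```

A long table like this can be passed to `px.line(long_df, x='Periodo', y='Total', color='cause_code')` from the `plotly.express` import in the setup, or the selected columns can be plotted with `iplot` as in the cells above.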


In [90]:
# Cufflinks scatter plot
'''
dataset_scatter.iplot(x='VARIABLE', 
                      y='VARIABLE', 
                      categories='VARIABLE',
                      xTitle='AXIS TITLE', 
                      yTitle='AXIS TITLE',
                      title='VIZ TITLE')
'''
# Assuming 'Periodo' is the column you want to visualize on the x-axis
# and '001' is the numeric column you want to visualize on the y-axis
x_column_scatter = 'Periodo'
y_column_scatter = '001'

# Check if the columns exist in the dataset
if x_column_scatter in dataset.columns and y_column_scatter in dataset.columns:
    # Create a scatter plot using Cufflinks and Plotly
    dataset.iplot(
        kind='scatter',
        x=x_column_scatter,
        y=y_column_scatter,
        mode='markers',
        xTitle=x_column_scatter,
        yTitle=f'Total ({y_column_scatter})',
        title=f'Scatter Plot of {y_column_scatter} over {x_column_scatter}'
    )