# Module 18: Data Visualization for Data Analytics
- Using Plotly

## Data visualization principles
- Define the questions we want to answer
- Provide context, since not everyone has a full understanding of the problem
- Choose the most suitable charts
- Define and use color standards
    - Consider both balance and contrast
- Balance the design (texture, colors, shapes)
- Focus on key areas
- Align axes and document charts (add axis names, titles, and any other labels needed)
- Keep the chart simple
- Enable comparisons
- Anticipate the reader's next question

## Design principles
- https://coolinfographics.com/blog/2015/2/3/the-6-principles-of-design.html

## Visualization guidelines
https://medium.com/google-design/redefining-data-visualization-at-google-9bdcf2e447c6
- Be honest
- Lend a helping hand
- Delight users
- Give clarity of focus
- Embrace scale
- Provide structure

## Google's visualization guidelines
https://m2.material.io/design/communication/data-visualization.html#principles
- Chart types: category comparison, ranking, part-to-whole (percentage of a total), correlation, distribution, flows, relationships, etc.

## Color coding
https://en.wikipedia.org/wiki/Color_coding_in_data_visualization
- Use color to create associations: profit, loss, etc.
- Use different saturations for continuous data
- Use contrasting colors for comparisons
- Use color to emphasize important information
- Use colors that are easy to tell apart
- Use at most 7 colors
- Etc.

## Visualization framework for storytelling
https://www.sigmacomputing.com/blog/a-7-step-data-visualization-framework-for-stronger-storytelling

### Basic Plotly charts
- Bar chart
- Line chart
- Pie chart
- XY chart (scatter plot)

In [1]:
import pandas as pd
import numpy as np
import os

df = pd.read_csv('D:/Documentos/DataAnalysis/EBAC/Python/Modulo16/netflix_titles_2.csv')

In [2]:
df.sample(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_num,duration_unit
2574,s2575,TV Show,The Universe: Ancient Mysteries Solved,,Erik Thompson,,"May 2, 2020",2015,TV-PG,1 Season,"Docuseries, Science & Nature TV",From astronomical events to shapes and pattern...,1.0,season
8708,s8709,Movie,We're No Animals,Alejandro Agresti,"John Cusack, Paul Hipp, Kevin Morris, Alejandr...","United States, Argentina","August 15, 2017",2015,TV-MA,94 min,"Comedies, Dramas, Independent Movies","Unhappy with his commercial film work, a jaded...",94.0,min
1943,s1944,Movie,Jab We Met,Imtiaz Ali,"Shahid Kapoor, Kareena Kapoor, Tarun Arora, Da...",India,"September 28, 2020",2007,TV-14,143 min,"Comedies, International Movies, Music & Musicals",Changing fortunes await a wealthy but dejected...,143.0,min
3010,s3011,TV Show,Ares,,"Jade Olieberg, Tobias Kersloot, Lisa Smit, Fri...",Netherlands,"January 17, 2020",2020,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Horror","Aiming to become part of Amsterdam's elite, an...",1.0,season
8525,s8526,Movie,The Taking of Pelham 123,Tony Scott,"Denzel Washington, John Travolta, Luis Guzmán,...","United States, United Kingdom","September 1, 2019",2009,R,106 min,Action & Adventure,When a group of hijackers takes passengers abo...,106.0,min


In [3]:
import plotly.express as px

In [4]:
df_movies_year = df.groupby('release_year').size().rename('movies').reset_index()
df_movies_year

Unnamed: 0,release_year,movies
0,1925,1
1,1942,2
2,1943,3
3,1944,3
4,1945,4
...,...,...
69,2017,1032
70,2018,1147
71,2019,1030
72,2020,953


In [5]:
# Bar graph
fig = px.bar(df_movies_year, x='release_year', y='movies', title='TV shows and movies per year')
fig.show()

In [6]:
df_releasetype_year = df.groupby(['release_year', 'type']).size().rename('movies').reset_index()
fig = px.bar(data_frame=df_releasetype_year, x='release_year', y='movies', title='TV shows and movies per year', color='type', barmode='group')
fig.show()

In [7]:
# Zoom in on the part of the chart with the most data: drop every year before 2000
index_before_2000 = df_releasetype_year[(df_releasetype_year['release_year'] < 2000)].index
df_releasetype_year.drop(index_before_2000, inplace=True)

In [8]:
fig = px.bar(data_frame=df_releasetype_year, x='release_year', y='movies', color='type', barmode='group', text='movies')
fig.show()

In [9]:
# Line Graph
fig = px.line(df_movies_year, x='release_year', y='movies', title='TV shows and movies per year')
fig.show()

In [10]:
# Pie chart
df_rating = df.groupby('rating').size().rename('movies').reset_index()

df_rating

Unnamed: 0,rating,movies
0,66 min,1
1,74 min,1
2,84 min,1
3,G,41
4,NC-17,3
5,NR,80
6,PG,287
7,PG-13,490
8,R,799
9,TV-14,2160


In [11]:
fig = px.pie(df_rating.sort_values('movies'), names='rating', values='movies')
fig.show()
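
The rating counts above include values such as `66 min`, which look like durations that ended up in the `rating` column. A minimal sketch of one way to exclude them before plotting, assuming every misplaced value has the form `<number> min`; the `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample reproducing the issue seen in the rating counts
sample = pd.DataFrame({
    "rating": ["PG-13", "66 min", "TV-14", "74 min", "R", None],
})

# Flag values that match a duration pattern instead of a rating
looks_like_duration = sample["rating"].str.fullmatch(r"\d+ min", na=False)

# Keep everything else, then count titles per rating as in the cell above
clean = sample[~looks_like_duration]
rating_counts = clean.groupby("rating").size().rename("movies").reset_index()
```

The resulting `rating_counts` can be passed to `px.pie` in the same way as `df_rating`.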

## Correlation matrix and statistical charts

### Correlation matrix charts
- Heatmaps are used

In [12]:
df['num_words_title'] = df['title'].str.split().str.len()
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_num,duration_unit,num_words_title
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,min,4
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,season,3
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,season,1
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1.0,season,3
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2.0,season,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a...",158.0,min,1
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",2.0,season,2
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,88.0,min,1
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",88.0,min,1


In [13]:
fig = px.imshow(df.corr(numeric_only=True), text_auto=True)
fig.show()

### Statistical charts

In [14]:
df_movies = df[df['type'] == 'Movie']

# Histogram
fig = px.histogram(df_movies, x='duration_num', title='Distribution of movie length', nbins=30)
fig.show()

In [15]:
# Cumulative distribution (ECDF)
# Note: df also contains TV shows, whose duration_num counts seasons rather than minutes
fig = px.ecdf(df, x='duration_num')
fig.show()
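
An empirical cumulative distribution shows, for each value, the fraction of observations less than or equal to it. The sketch below computes that curve by hand with NumPy to make the idea concrete. It does not claim to be how Plotly computes it internally; the `ecdf` helper and its handling of missing values are assumptions of this example.

```python
import numpy as np

def ecdf(values):
    """Return the sorted values and the fraction of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    x = x[~np.isnan(x)]  # assumption: missing values are ignored
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Four hypothetical durations in minutes
x, y = ecdf([90, 120, 60, 90])
```

Here `y` reaches 0.75 at the second 90, meaning three of the four observations are 90 minutes or shorter.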

In [16]:
# Box plot
fig = px.box(df_movies, y='num_words_title', title='Box plot of title length')
fig.show()

In [17]:
# Violin Plot
fig = px.violin(df, y='num_words_title', x='rating')
fig.show()

In [18]:
# Strip plot
# Distributions split by another variable; the violin plot and the box plot can do this too

fig = px.strip(df_movies, x='duration_num', y='rating')
fig.show()

## Multi-panel charts and heatmaps

In [19]:
# Multiple charts: one panel per release year
fig = px.strip(df_movies[df_movies['release_year'] > 2015], x='duration_num', y='num_words_title', facet_col='release_year')
fig.show()

In [20]:
# One chart with 4 dimensions: x, y, color, and facets
fig = px.strip(df[df['release_year'] > 2015], x='duration_num', y='num_words_title', color='type', facet_col='release_year')
fig.show()

### Scatterplot

In [21]:
df_releaserating = df.groupby(['release_year', 'rating']).size().rename('movies').reset_index()
fig = px.scatter(df_releaserating, x='release_year', y='movies')
fig.show()

### Heatmaps

In [27]:
fig = px.density_heatmap(df[df['release_year'] > 2000], x='release_year', y='rating', facet_row='type')
fig.show()