# Data Visualization
- Basic concepts
- Using Plotly
- Basic charts
- Correlation charts
- Heatmaps

# Basic Principles (Theory)
- Decide what question the charts should answer
- Provide context
- Establish a visual hierarchy
- Use color coding
- Focus on key areas
- Simple charts communicate more than complex ones
- Enable comparisons so the data is easier to read

## Visualization guidelines
- Be honest
- Lend a helping hand
- Delight users
- Give clarity of focus
- Embrace scale
- Provide structure

## Color coding
- Use color to create associations: profit, loss, environment, country, etc.
- Use different saturation levels for continuous data
- Use contrasting colors for comparisons
- Use color to emphasize important information
- Choose colors that are easy to tell apart
- Use few colors to avoid visual overload (7 at most)
- Keep accessibility in mind
- Remember that well-chosen color makes a visualization faster to read and easier to understand
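
The "7 colors at most" guideline can be enforced before plotting by merging the smallest categories into a single bucket. The sketch below is only an illustration: the helper `cap_categories`, the label `"Other"`, and the counts are all hypothetical and do not come from the notebook's data.

```python
import pandas as pd


def cap_categories(counts: pd.Series, max_colors: int = 7,
                   other_label: str = "Other") -> pd.Series:
    """Keep the largest categories and merge the rest into one bucket,
    so that at most `max_colors` categories (and colors) remain."""
    ordered = counts.sort_values(ascending=False)
    if len(ordered) <= max_colors:
        return ordered
    kept = ordered.iloc[: max_colors - 1]
    other = pd.Series({other_label: ordered.iloc[max_colors - 1:].sum()})
    return pd.concat([kept, other])


# Made-up counts for nine hypothetical categories
category_counts = pd.Series(
    {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10, "F": 5, "G": 3, "H": 2, "I": 1}
)
capped = cap_categories(category_counts)
print(capped)
```

The total is preserved; only the long tail of small categories loses its individual colors.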

# Basic Charts
- Bar
- Line
- Pie
- Scatter plot (x, y)

## Importing libraries and loading the CSV

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("../../../Archivos-Analisis/netflix_titles2.csv")

In [4]:
df.sample(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_num,duration_unit
2578,s2579,Movie,Get In,Olivier Abbou,"Adama Niane, Stéphane Caillard, Paul Hamy, Edd...","France, Belgium","May 1, 2020",2019,TV-MA,98 min,"International Movies, Thrillers",When he returns from vacation and finds his ho...,98.0,min
5662,s5663,TV Show,Crazyhead,,"Cara Theobold, Susan Wokoma, Riann Steele, Ari...",United Kingdom,"December 16, 2016",2016,TV-MA,1 Season,"British TV Shows, International TV Shows, TV C...",Bowling alley worker Amy and nonconformist Raq...,1.0,season
6094,s6095,TV Show,Africa,,David Attenborough,United Kingdom,"April 28, 2016",2013,TV-PG,1 Season,"British TV Shows, Docuseries, International TV...",This five-part nature series chronicles fascin...,1.0,season
2796,s2797,Movie,Ultras,Francesco Lettieri,"Aniello Arena, Antonia Truppo, Ciro Nacca, Sim...",Italy,"March 20, 2020",2020,TV-MA,109 min,"Dramas, International Movies, Sports Movies",An aging soccer fanatic faces down the reality...,109.0,min
1415,s1416,Movie,An Imperfect Murder,James Toback,"Sienna Miller, Alec Baldwin, Charles Grodin, C...",United States,"January 13, 2021",2017,R,71 min,"Dramas, Thrillers",Haunted by a nightmare involving her abusive e...,71.0,min
84,s85,Movie,Omo Ghetto: the Saga,"JJC Skillz, Funke Akindele","Funke Akindele, Ayo Makun, Chioma Chukwuka Akp...",Nigeria,"September 10, 2021",2020,TV-MA,147 min,"Action & Adventure, Comedies, Dramas",Twins are reunited as a good-hearted female ga...,147.0,min
5764,s5765,Movie,Bombshell,Riccardo Pilizzeri,"Ande Cunningham, Mark Mitchinson, Mia Pistoriu...",New Zealand,"October 1, 2016",2016,TV-MA,86 min,Dramas,"In 1985, when a vessel protesting nuclear test...",86.0,min
1257,s1258,TV Show,TIGER & BUNNY,,"Hiroaki Hirata, Masakazu Morita, Koji Yusa, Mi...",Japan,"February 28, 2021",2011,TV-14,1 Season,"Anime Series, International TV Shows",In an alternate New York City protected by a b...,1.0,season
8031,s8032,TV Show,Skin Wars: Fresh Paint,,RuPaul,United States,"December 15, 2018",2016,TV-14,1 Season,Reality TV,"In each episode of this ""Skin Wars"" spinoff ho...",1.0,season
143,s144,Movie,Green Lantern,Martin Campbell,"Ryan Reynolds, Blake Lively, Peter Sarsgaard, ...",United States,"September 1, 2021",2011,PG-13,114 min,"Action & Adventure, Sci-Fi & Fantasy",Test pilot Hal Jordan harnesses glowing new po...,114.0,min


## Importing the Plotly library

In [5]:
import plotly.express as px

In [6]:
df_movies_year = df.groupby('release_year').size().rename('movies').reset_index()
df_movies_year

Unnamed: 0,release_year,movies
0,1925,1
1,1942,2
2,1943,3
3,1944,3
4,1945,4
...,...,...
69,2017,1032
70,2018,1147
71,2019,1030
72,2020,953
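
The chain used in cell `In [6]` (`groupby(...).size().rename(...).reset_index()`) can be easier to follow when split into steps. This is a minimal sketch on a made-up frame; `toy` and its values are illustrative and are not the Netflix data.

```python
import pandas as pd

# Made-up rows: two titles released in 2019, three in 2020
toy = pd.DataFrame({
    "release_year": [2019, 2019, 2020, 2020, 2020],
    "title": ["a", "b", "c", "d", "e"],
})

sizes = toy.groupby("release_year").size()  # Series of row counts, indexed by year
named = sizes.rename("movies")              # the Series name becomes the column name
tidy = named.reset_index()                  # move the year from the index into a column

print(tidy)
```

The result has one row per year and plain columns, which is the shape `px.bar` and `px.line` expect for their `x` and `y` arguments.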


## Bar chart

In [7]:
# Movies and series per year
fig = px.bar(df_movies_year, x='release_year', y='movies', title='Series and Movies per Year')
fig.show()

In [8]:
# Movies and series per year, for comparison
# First build a DataFrame grouped by year (release_year) and type (type)
df_release_type_year = df.groupby(['release_year', 'type']).size().rename('movies').reset_index()
df_release_type_year

Unnamed: 0,release_year,type,movies
0,1925,TV Show,1
1,1942,Movie,2
2,1943,Movie,3
3,1944,Movie,3
4,1945,Movie,3
...,...,...,...
114,2019,TV Show,397
115,2020,Movie,517
116,2020,TV Show,436
117,2021,Movie,277


In [9]:
# Plot it
fig = px.bar(df_release_type_year, x='release_year', y='movies', title='Series and Movies per Year', color='type')
fig.show()

In [10]:
# Place the bars side by side instead of stacking them
fig = px.bar(df_release_type_year, x='release_year', y='movies', title='Series and Movies per Year', color='type', barmode='group')
fig.show()


In [11]:
# Zoom in on 2000 onward: drop every row released before 2000
index_pre_2000 = df_release_type_year[(df_release_type_year['release_year'] < 2000)].index
df_release_type_year.drop(index_pre_2000, inplace=True)
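
The in-place `drop` above modifies `df_release_type_year`, so re-running earlier plotting cells afterwards would show the shortened range. A non-mutating alternative is to build a filtered copy with a boolean mask. The helper `from_year` below is hypothetical (it is not part of the notebook), and the frame is made up.

```python
import pandas as pd


def from_year(frame: pd.DataFrame, first_year: int,
              year_col: str = "release_year") -> pd.DataFrame:
    """Return a copy holding only the rows with year_col >= first_year."""
    return frame.loc[frame[year_col] >= first_year].copy()


# Made-up data shaped like the yearly counts
toy = pd.DataFrame({
    "release_year": [1995, 1999, 2000, 2005],
    "movies": [1, 2, 3, 4],
})

recent = from_year(toy, 2000)
print(recent)
print(len(toy))  # the original frame is left untouched
```

As with `drop`, the surviving rows keep their original index labels.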

In [12]:
df_release_type_year.sample(3)

Unnamed: 0,release_year,type,movies
80,2002,TV Show,7
78,2001,TV Show,5
89,2007,Movie,74


In [13]:
fig = px.bar(df_release_type_year, x='release_year', y='movies', title='Series and Movies per Year (2000 - 2021)', color='type', barmode='group', text='movies')
fig.show()

## Line chart

In [14]:
fig = px.line(df_movies_year, x='release_year', y='movies', title='Series and Movies per Year')
fig.show()

In [15]:
fig = px.line(df_release_type_year, x='release_year', y='movies', title='Series and Movies per Year (2000 - 2021)', color='type')
fig.show()

In [16]:
# Movies and series per year, broken down by rating
df_release_rating = df.groupby(['release_year', 'rating']).size().rename('movies').reset_index()
df_release_rating

Unnamed: 0,release_year,rating,movies
0,1925,TV-14,1
1,1942,TV-14,2
2,1943,TV-PG,3
3,1944,TV-14,2
4,1944,TV-PG,1
...,...,...,...
435,2021,TV-G,21
436,2021,TV-MA,270
437,2021,TV-PG,45
438,2021,TV-Y,26


In [17]:
fig = px.line(df_release_rating, x='release_year', y='movies', color='rating', title='Series and Movies per Year and Rating')
fig.show()

In [18]:
index_pre_2000 = df_release_rating[(df_release_rating['release_year'] < 2000)].index
df_release_rating.drop(index_pre_2000, inplace=True)

In [19]:
# Rating cleanup: a few rows hold a duration ('66 min', '74 min', '84 min') in the rating column
ix66 = df_release_rating[df_release_rating['rating'] == '66 min'].index
ix74 = df_release_rating[df_release_rating['rating'] == '74 min'].index
ix84 = df_release_rating[df_release_rating['rating'] == '84 min'].index

In [None]:
# Drop those rows
df_release_rating.drop(ix66, inplace=True)
df_release_rating.drop(ix74, inplace=True)
df_release_rating.drop(ix84, inplace=True)
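
The three lookups and three drops above target specific strings. A pattern-based mask would catch any rating that looks like a duration in one step. This is a sketch on made-up data and assumes the misplaced values always have the form `"<number> min"`.

```python
import pandas as pd

# Made-up aggregated rows; two of them carry a duration in the rating column
toy = pd.DataFrame({
    "release_year": [2010, 2010, 2015, 2017],
    "rating": ["PG", "74 min", "84 min", "TV-MA"],
    "movies": [4, 1, 1, 9],
})

# True where the rating is really a duration such as '74 min'
looks_like_duration = toy["rating"].str.match(r"\d+ min$", na=False)

clean = toy.loc[~looks_like_duration]
print(clean)
```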

In [23]:
df_release_rating.head(6)

Unnamed: 0,release_year,rating,movies
217,2000,G,2
218,2000,PG,4
219,2000,PG-13,10
220,2000,R,8
221,2000,TV-14,5
222,2000,TV-MA,1


In [24]:
df_release_rating.groupby('rating').count()

rating,release_year,movies
G,13,13
NC-17,3,3
NR,12,12
PG,22,22
PG-13,22,22
R,22,22
TV-14,22,22
TV-G,15,15
TV-MA,22,22
TV-PG,22,22


In [27]:
fig = px.bar(df_release_rating, x='release_year', y='movies', color='rating', title='Series and Movies per Year and Rating')
fig.show()

Returning to the theory for a moment: the question this chart is meant to answer is, how many series and movies are there per rating, per year?
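
The same question can also be answered as a table, by pivoting the long (year, rating, count) frame into one row per year and one column per rating. The sketch below uses made-up numbers; `toy` and `wide` are illustrative names.

```python
import pandas as pd

# Made-up counts, shaped like df_release_rating
toy = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001],
    "rating": ["PG", "R", "PG", "TV-MA"],
    "movies": [4, 8, 3, 1],
})

# One row per year, one column per rating; missing combinations become 0
wide = (
    toy.pivot(index="release_year", columns="rating", values="movies")
       .fillna(0)
       .astype(int)
)
print(wide)
```

A wide table like this is also the usual input shape for a heatmap.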

## Pie chart
- What kinds of titles (by rating) were included, in total

In [28]:
# Rating cleanup: three rows have their duration stored in the rating column
df.at[5541, 'duration_unit'] = 'min'
df.at[5794, 'duration_unit'] = 'min'
df.at[5813, 'duration_unit'] = 'min'

df.at[5541, 'duration_num'] = 74
df.at[5794, 'duration_num'] = 84
df.at[5813, 'duration_num'] = 66

df.at[5541, 'duration'] = '74 min'
df.at[5794, 'duration'] = '84 min'
df.at[5813, 'duration'] = '66 min'

# The rating is assumed to be 'G'
df.at[5541, 'rating'] = 'G'
df.at[5794, 'rating'] = 'G'
df.at[5813, 'rating'] = 'G'
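
The cell above repairs three rows by their index labels. A more general sketch finds the affected rows with a mask and moves the value across in one pass. It runs on a made-up frame, assumes the misplaced values always have the form `"<number> min"`, and reuses the notebook's own assumption that the true rating is `'G'`.

```python
import pandas as pd

# Made-up rows; the second one has its duration in the rating column
toy = pd.DataFrame({
    "rating": ["PG", "74 min", "TV-MA"],
    "duration": ["90 min", None, "2 Seasons"],
    "duration_num": [90.0, None, 2.0],
    "duration_unit": ["min", None, "season"],
})

misplaced = toy["rating"].str.match(r"\d+ min$", na=False)

# Move the misplaced value into the duration columns
toy.loc[misplaced, "duration"] = toy.loc[misplaced, "rating"]
toy.loc[misplaced, "duration_num"] = (
    toy.loc[misplaced, "rating"].str.extract(r"(\d+)", expand=False).astype(float)
)
toy.loc[misplaced, "duration_unit"] = "min"

# Assumption carried over from the notebook: the real rating was 'G'
toy.loc[misplaced, "rating"] = "G"

print(toy)
```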


In [31]:
# Number of titles per rating
df_release_rating_total = df.groupby(['rating']).size().rename('movies').reset_index()
df_release_rating_total

Unnamed: 0,rating,movies
0,G,44
1,NC-17,3
2,NR,80
3,PG,287
4,PG-13,490
5,R,799
6,TV-14,2160
7,TV-G,220
8,TV-MA,3207
9,TV-PG,863


In [33]:
fig = px.pie(df_release_rating_total, names='rating', values='movies', title='Series and Movies by Rating')
fig.show()