`EXPLORATORY DATA ANALYSIS (EDA) AND EXTRACT, TRANSFORM, LOAD (ETL)`
____________________________________________________________________________________

`This Jupyter notebook shows every step and process used to carry out the EDA and ETL stages, including the functions that will be consumed by the API.`
__________________________________________________________________________________________________________________

Import the required libraries

In [402]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import math
from sklearn.metrics import classification_report


Create the DataFrames from the datasets provided as comma-separated text files.

In [403]:
amazon_titles = pd.read_csv(r"C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Datasets/amazon_prime_titles.csv")
disney_titles = pd.read_csv(r"C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Datasets/disney_plus_titles.csv")
hulu_titles = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Datasets/hulu_titles.csv')
netflix_titles= pd.read_csv(r"C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Datasets/netflix_titles.csv")
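
The four reads above differ only in the file name, so the folder can be factored out. A minimal sketch, assuming the four CSV files sit in a single directory; `PLATFORM_FILES` and `load_titles` are hypothetical names that do not appear in the original notebook:

```python
from pathlib import Path

import pandas as pd

# File names taken from the cell above; the mapping itself is illustrative.
PLATFORM_FILES = {
    "amazon": "amazon_prime_titles.csv",
    "disney": "disney_plus_titles.csv",
    "hulu": "hulu_titles.csv",
    "netflix": "netflix_titles.csv",
}


def load_titles(base_dir):
    """Read each platform CSV under base_dir into a dict of DataFrames."""
    base = Path(base_dir)
    return {name: pd.read_csv(base / file_name) for name, file_name in PLATFORM_FILES.items()}
```

Calling `load_titles` with the dataset folder would then return all four frames keyed by platform name.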

`Exploratory data analysis`

In [404]:
# AMAZON
amazon_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ..."


In [405]:
#info
amazon_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9668 entries, 0 to 9667
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       9668 non-null   object
 1   type          9668 non-null   object
 2   title         9668 non-null   object
 3   director      7586 non-null   object
 4   cast          8435 non-null   object
 5   country       672 non-null    object
 6   date_added    155 non-null    object
 7   release_year  9668 non-null   int64 
 8   rating        9331 non-null   object
 9   duration      9668 non-null   object
 10  listed_in     9668 non-null   object
 11  description   9668 non-null   object
dtypes: int64(1), object(11)
memory usage: 906.5+ KB


In [406]:
# Check whether the dataset has null values
amazon_titles.isnull().sum()

show_id            0
type               0
title              0
director        2082
cast            1233
country         8996
date_added      9513
release_year       0
rating           337
duration           0
listed_in          0
description        0
dtype: int64

In [407]:
#DISNEY
disney_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!"


In [408]:
#Info
disney_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1450 entries, 0 to 1449
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       1450 non-null   object
 1   type          1450 non-null   object
 2   title         1450 non-null   object
 3   director      977 non-null    object
 4   cast          1260 non-null   object
 5   country       1231 non-null   object
 6   date_added    1447 non-null   object
 7   release_year  1450 non-null   int64 
 8   rating        1447 non-null   object
 9   duration      1450 non-null   object
 10  listed_in     1450 non-null   object
 11  description   1450 non-null   object
dtypes: int64(1), object(11)
memory usage: 136.1+ KB


In [409]:
# Check whether the dataset has null values
disney_titles.isnull().sum()

show_id           0
type              0
title             0
director        473
cast            190
country         219
date_added        3
release_year      0
rating            3
duration          0
listed_in         0
description       0
dtype: int64

In [410]:
#HULU
hulu_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       3073 non-null   object 
 1   type          3073 non-null   object 
 2   title         3073 non-null   object 
 3   director      3 non-null      object 
 4   cast          0 non-null      float64
 5   country       1620 non-null   object 
 6   date_added    3045 non-null   object 
 7   release_year  3073 non-null   int64  
 8   rating        2553 non-null   object 
 9   duration      2594 non-null   object 
 10  listed_in     3073 non-null   object 
 11  description   3069 non-null   object 
dtypes: float64(1), int64(1), object(10)
memory usage: 288.2+ KB


In [412]:
# Check whether the dataset has null values
hulu_titles.isnull().sum()

show_id            0
type               0
title              0
director        3070
cast            3073
country         1453
date_added        28
release_year       0
rating           520
duration         479
listed_in          0
description        4
dtype: int64

In [413]:
#NETFLIX
netflix_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."


In [414]:
#Info
netflix_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


In [415]:
# Check whether the dataset has null values
netflix_titles.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

<!-- In conclusion, there is a considerable number of null values that can be corrected, but overall the data are in good enough shape to continue with the ETL process. -->
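
Raw null counts are hard to compare across frames of different sizes: Amazon's 9513 null `date_added` values are about 98.4% of its 9668 rows, and Hulu's `cast` column is null in all 3073 rows. A small sketch that expresses nulls as percentages, side by side; `null_share` is a hypothetical helper and the frames below are tiny stand-ins for the four `*_titles` DataFrames:

```python
import pandas as pd


def null_share(frames):
    """Return the percentage of null values per column, one column per platform."""
    return pd.DataFrame({name: df.isnull().mean() * 100 for name, df in frames.items()}).round(1)


# Stand-in data; in the notebook these would be the four *_titles DataFrames.
demo = {
    "a": pd.DataFrame({"director": ["x", None], "title": ["t1", "t2"]}),
    "h": pd.DataFrame({"director": [None, None], "title": ["t3", "t4"]}),
}
summary = null_share(demo)
```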

`Extract, Transform, Load`

In [416]:
# Add a letra_plataforma field holding the initial of each platform
amazon_titles = amazon_titles.assign(letra_plataforma = pd.Series(['a'] * len(amazon_titles)))

In [417]:
hulu_titles = hulu_titles.assign(letra_plataforma=pd.Series(['h'] * len(hulu_titles)))


In [418]:
netflix_titles = netflix_titles.assign(letra_plataforma=pd.Series(['n'] * len(netflix_titles)))


In [419]:
disney_titles = disney_titles.assign(letra_plataforma=pd.Series(['d'] * len(disney_titles)))
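
Because every row of a frame receives the same letter, a scalar can be broadcast directly instead of building a Series of repeated values. A sketch with stand-in frames; the `frames` dict is a hypothetical container, not something the notebook defines:

```python
import pandas as pd

# Stand-in frames keyed by the platform initial.
frames = {
    "a": pd.DataFrame({"show_id": ["s1", "s2"]}),
    "d": pd.DataFrame({"show_id": ["s1"]}),
}

# assign() broadcasts a scalar to every row, so no repeated list is needed.
tagged = {letter: df.assign(letra_plataforma=letter) for letter, df in frames.items()}
```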

In [420]:
disney_titles.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!,d
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...,d
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,d
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!",d
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,d


In [421]:
hulu_titles.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma
0,s1,Movie,Ricky Velez: Here's Everything,,,,"October 24, 2021",2021,TV-MA,,"Comedy, Stand Up",​Comedian Ricky Velez bares it all with his ho...,h
1,s2,Movie,Silent Night,,,,"October 23, 2021",2020,,94 min,"Crime, Drama, Thriller","Mark, a low end South London hitman recently r...",h
2,s3,Movie,The Marksman,,,,"October 23, 2021",2021,PG-13,108 min,"Action, Thriller",A hardened Arizona rancher tries to protect an...,h
3,s4,Movie,Gaia,,,,"October 22, 2021",2021,R,97 min,Horror,A forest ranger and two survivalists with a cu...,h
4,s5,Movie,Settlers,,,,"October 22, 2021",2021,,104 min,"Science Fiction, Thriller",Mankind's earliest settlers on the Martian fro...,h


In [422]:
amazon_titles.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,a
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,a
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,a
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ...",a
4,s5,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,a


In [423]:
netflix_titles.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",n
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",n
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,n
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",n
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,n


`Concatenate the 4 platform datasets`

In [424]:

plataformas = pd.concat([disney_titles, netflix_titles, hulu_titles, amazon_titles], ignore_index=True)

In [425]:
plataformas.shape

(22998, 13)
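
The shape can be cross-checked against the row counts that the `info()` outputs reported for each platform; the 13 columns are the original 12 plus `letra_plataforma`. A quick arithmetic check:

```python
# Row counts as reported by the info() outputs earlier in the notebook.
row_counts = {"disney": 1450, "netflix": 8807, "hulu": 3073, "amazon": 9668}

expected_rows = sum(row_counts.values())
print(expected_rows)  # 22998, the first element of plataformas.shape
```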

In [426]:
# Import the ratings datasets
df1 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/1.csv')
df2 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/2.csv')
df3 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/3.csv')
df4 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/4.csv')
df5 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/5.csv')
df6 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/6.csv')
df7 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/7.csv')
df8 = pd.read_csv(r'C:/Users/rocio/OneDrive/Escritorio/FastAPI/Streaming_Platforms/Ratings/8.csv')

In [427]:
# Inspect the columns
list(df1.columns) # repeat for each of the 8 files

['userId', 'rating', 'timestamp', 'movieId']

`Concatenate the 8 ratings datasets`

In [428]:
# Concatenate the 8 datasets
ratings = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8], ignore_index=True)

In [429]:
# Inspect the size
ratings.shape

(11024289, 4)
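
The eight ratings files are named `1.csv` through `8.csv`, so loading, column inspection and concatenation can be done in one loop. A sketch under that naming assumption; `load_ratings` is a hypothetical helper:

```python
from pathlib import Path

import pandas as pd


def load_ratings(ratings_dir, n_files=8):
    """Read 1.csv ... n_files.csv from ratings_dir and stack them into one DataFrame."""
    paths = [Path(ratings_dir) / f"{i}.csv" for i in range(1, n_files + 1)]
    frames = [pd.read_csv(path) for path in paths]
    # Fail early if the files do not share the same columns.
    first_columns = list(frames[0].columns)
    for path, frame in zip(paths, frames):
        if list(frame.columns) != first_columns:
            raise ValueError(f"Unexpected columns in {path.name}")
    return pd.concat(frames, ignore_index=True)
```

The column check replaces the manual "repeat for each file" inspection with a single guard.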

`Transformations 2` 
`Work now starts on the new DataFrame`

In [430]:
#1) Build the id field from the platform initial + show_id
plataformas['id'] = plataformas['letra_plataforma'] + plataformas['show_id']
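
Each platform numbers its titles from `s1`, so `show_id` alone repeats after concatenation; the prefixed `id` is what should be unique. A sketch of that check on stand-in data:

```python
import pandas as pd

demo = pd.DataFrame({
    "letra_plataforma": ["d", "n", "n"],
    "show_id": ["s1", "s1", "s2"],
})
demo["id"] = demo["letra_plataforma"] + demo["show_id"]

# show_id repeats across platforms, but the prefixed id should not.
ids_are_unique = demo["id"].is_unique
```

On the real data, `plataformas['id'].is_unique` would be the equivalent check before using `id` as a merge key.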

In [431]:
# Verify
plataformas.head(20)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma,id
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!,d,ds1
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...,d,ds2
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,d,ds3
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!",d,ds4
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,d,ds5
5,s6,Movie,Becoming Cousteau,Liz Garbus,"Jacques Yves Cousteau, Vincent Cassel",United States,"November 24, 2021",2021,PG-13,94 min,"Biographical, Documentary",An inside look at the legendary life of advent...,d,ds6
6,s7,TV Show,Hawkeye,,"Jeremy Renner, Hailee Steinfeld, Vera Farmiga,...",,"November 24, 2021",2021,TV-14,1 Season,"Action-Adventure, Superhero",Clint Barton/Hawkeye must team up with skilled...,d,ds7
7,s8,TV Show,Port Protection Alaska,,"Gary Muehlberger, Mary Miller, Curly Leach, Sa...",United States,"November 24, 2021",2015,TV-14,2 Seasons,"Docuseries, Reality, Survival",Residents of Port Protection must combat volat...,d,ds8
8,s9,TV Show,Secrets of the Zoo: Tampa,,"Dr. Ray Ball, Dr. Lauren Smith, Chris Massaro,...",United States,"November 24, 2021",2019,TV-PG,2 Seasons,"Animals & Nature, Docuseries, Family",A day in the life at ZooTampa is anything but ...,d,ds9
9,s10,Movie,A Muppets Christmas: Letters To Santa,Kirk R. Thatcher,"Steve Whitmire, Dave Goelz, Bill Barretta, Eri...",United States,"November 19, 2021",2008,G,45 min,"Comedy, Family, Musical",Celebrate the holiday season with all your fav...,d,ds10


In [432]:
#2) Replace null values in the rating field with the string "G".

In [433]:
plataformas['rating'].value_counts(dropna=False)

TV-MA      3675
TV-14      3138
R          2154
13+        2117
TV-PG      1654
           ... 
115 min       1
61 min        1
28 min        1
64 min        1
157 min       1
Name: rating, Length: 106, dtype: int64

In [434]:
# First, check how many null values there are

In [435]:
plataformas['rating'].isnull().sum()

864

In [436]:
# Check how many values are already "G"

In [437]:
plataformas['rating'].value_counts()['G']

405

In [438]:
# Convert the null values to "G"

In [439]:
plataformas['rating'] = plataformas['rating'].fillna('G')

In [440]:
plataformas['rating'].value_counts(dropna=False)

TV-MA      3675
TV-14      3138
R          2154
13+        2117
TV-PG      1654
           ... 
115 min       1
61 min        1
28 min        1
64 min        1
157 min       1
Name: rating, Length: 105, dtype: int64

In [441]:
# Check the total number of "G" values after the fill (405 existing + 864 nulls = 1269)
plataformas['rating'].value_counts()['G']

1269
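
The `value_counts` output above lists entries such as `115 min` under `rating`, which look like durations that ended up in the wrong column. A sketch for flagging them, assuming misplaced values follow a `<number> min` or `<number> season(s)` pattern; the sample Series is illustrative:

```python
import pandas as pd

ratings_col = pd.Series(["TV-MA", "115 min", "G", None, "2 Seasons", "13+"])

# Flag strings made of a number followed by a duration unit.
looks_like_duration = ratings_col.str.match(r"^\d+\s+(?:min|seasons?)$", case=False, na=False)
suspicious = ratings_col[looks_like_duration]
```

Whether those rows should be moved to `duration` or simply cleared is a decision the notebook does not make, so the sketch only identifies them.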

In [442]:
#3) Any dates present must use the YYYY-mm-dd format
plataformas['release_year'].value_counts(dropna=False)

2019    2470
2020    2406
2021    2385
2018    2105
2017    1892
        ... 
1922       2
1926       2
1928       1
1924       1
1927       1
Name: release_year, Length: 101, dtype: int64

In [443]:
plataformas['date_added'] = plataformas['date_added'].str.lstrip()

In [444]:
plataformas['date_added'] = pd.to_datetime(plataformas['date_added'], format='%B %d, %Y').dt.strftime('%Y-%m-%d')
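
A small worked example of the same conversion, showing how leading whitespace and missing dates behave. It uses `errors="coerce"` so that any value not matching the format becomes missing instead of raising; that option is an addition here, not something the notebook uses:

```python
import pandas as pd

raw_dates = pd.Series([" November 26, 2021", "March 30, 2021", None])

# Strip whitespace, parse with the explicit format, then render as ISO dates.
parsed = pd.to_datetime(raw_dates.str.strip(), format="%B %d, %Y", errors="coerce")
iso_dates = parsed.dt.strftime("%Y-%m-%d")
```

Missing inputs stay missing after `strftime`, so the null count of `date_added` is preserved.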

In [445]:
plataformas.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma,id
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,2021-11-26,2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!,d,ds1
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,2021-11-26,1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...,d,ds2
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,2021-11-26,2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,d,ds3
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,2021-11-26,2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!",d,ds4
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,2021-11-25,2021,G,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,d,ds5


In [446]:
# Text fields must be lowercase
plataformas = plataformas.applymap(lambda s: s.lower() if type(s) == str else s)
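
`DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. A sketch that avoids both names by mapping column by column with `Series.map`, which should behave the same across versions; `lowercase_strings` is a hypothetical helper:

```python
import pandas as pd


def lowercase_strings(df):
    """Lower-case every string cell and leave numbers and missing values untouched."""
    return df.apply(lambda col: col.map(lambda v: v.lower() if isinstance(v, str) else v))


demo = pd.DataFrame({"title": ["Hawkeye", None], "release_year": [2021, 2015]})
lowered = lowercase_strings(demo)
```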

In [447]:
plataformas.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma,id
0,s1,movie,duck the halls: a mickey mouse christmas special,"alonso ramirez ramos, dave wasson","chris diamantopoulos, tony anselmo, tress macn...",,2021-11-26,2016,tv-g,23 min,"animation, family",join mickey and the gang as they duck the halls!,d,ds1
1,s2,movie,ernest saves christmas,john cherry,"jim varney, noelle parker, douglas seale",,2021-11-26,1988,pg,91 min,comedy,santa claus passes his magic bag to a new st. ...,d,ds2
2,s3,movie,ice age: a mammoth christmas,karen disher,"raymond albert romano, john leguizamo, denis l...",united states,2021-11-26,2011,tv-g,23 min,"animation, comedy, family",sid the sloth is on santa's naughty list.,d,ds3
3,s4,movie,the queen family singalong,hamish hamilton,"darren criss, adam lambert, derek hough, alexa...",,2021-11-26,2021,tv-pg,41 min,musical,"this is real life, not just fantasy!",d,ds4
4,s5,tv show,the beatles: get back,,"john lennon, paul mccartney, george harrison, ...",,2021-11-25,2021,g,1 season,"docuseries, historical, music",a three-part documentary from peter jackson ca...,d,ds5


#4) Split duration into two fields and convert the numeric part to an integer


In [448]:
plataformas[['duration_int', 'duration_type']] = plataformas['duration'].str.split(' ', n=1, expand=True)


In [449]:
plataformas.head(25)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,letra_plataforma,id,duration_int,duration_type
0,s1,movie,duck the halls: a mickey mouse christmas special,"alonso ramirez ramos, dave wasson","chris diamantopoulos, tony anselmo, tress macn...",,2021-11-26,2016,tv-g,23 min,"animation, family",join mickey and the gang as they duck the halls!,d,ds1,23,min
1,s2,movie,ernest saves christmas,john cherry,"jim varney, noelle parker, douglas seale",,2021-11-26,1988,pg,91 min,comedy,santa claus passes his magic bag to a new st. ...,d,ds2,91,min
2,s3,movie,ice age: a mammoth christmas,karen disher,"raymond albert romano, john leguizamo, denis l...",united states,2021-11-26,2011,tv-g,23 min,"animation, comedy, family",sid the sloth is on santa's naughty list.,d,ds3,23,min
3,s4,movie,the queen family singalong,hamish hamilton,"darren criss, adam lambert, derek hough, alexa...",,2021-11-26,2021,tv-pg,41 min,musical,"this is real life, not just fantasy!",d,ds4,41,min
4,s5,tv show,the beatles: get back,,"john lennon, paul mccartney, george harrison, ...",,2021-11-25,2021,g,1 season,"docuseries, historical, music",a three-part documentary from peter jackson ca...,d,ds5,1,season
5,s6,movie,becoming cousteau,liz garbus,"jacques yves cousteau, vincent cassel",united states,2021-11-24,2021,pg-13,94 min,"biographical, documentary",an inside look at the legendary life of advent...,d,ds6,94,min
6,s7,tv show,hawkeye,,"jeremy renner, hailee steinfeld, vera farmiga,...",,2021-11-24,2021,tv-14,1 season,"action-adventure, superhero",clint barton/hawkeye must team up with skilled...,d,ds7,1,season
7,s8,tv show,port protection alaska,,"gary muehlberger, mary miller, curly leach, sa...",united states,2021-11-24,2015,tv-14,2 seasons,"docuseries, reality, survival",residents of port protection must combat volat...,d,ds8,2,seasons
8,s9,tv show,secrets of the zoo: tampa,,"dr. ray ball, dr. lauren smith, chris massaro,...",united states,2021-11-24,2019,tv-pg,2 seasons,"animals & nature, docuseries, family",a day in the life at zootampa is anything but ...,d,ds9,2,seasons
9,s10,movie,a muppets christmas: letters to santa,kirk r. thatcher,"steve whitmire, dave goelz, bill barretta, eri...",united states,2021-11-19,2008,g,45 min,"comedy, family, musical",celebrate the holiday season with all your fav...,d,ds10,45,min


In [450]:
plataformas.dtypes

show_id             object
type                object
title               object
director            object
cast                object
country             object
date_added          object
release_year         int64
rating              object
duration            object
listed_in           object
description         object
letra_plataforma    object
id                  object
duration_int        object
duration_type       object
dtype: object

In [451]:
# Inspect the column (distinct values would come from calling .unique(), with parentheses)
plataformas['duration_int']

0         23
1         91
2         23
3         41
4          1
        ... 
22993     60
22994      4
22995     90
22996      1
22997    103
Name: duration_int, Length: 22998, dtype: object

In [452]:
plataformas['duration_int'] = plataformas['duration_int'].fillna(0)

In [453]:
plataformas['duration_int'] = plataformas['duration_int'].astype(int)
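
The split, null fill and integer cast can also be expressed with a single regular expression. A sketch on sample values, assuming every non-null duration has the form `<number> <unit>`:

```python
import pandas as pd

duration = pd.Series(["23 min", "2 seasons", None, "1 season"])

# Named groups become the column names of the extracted DataFrame.
parts = duration.str.extract(r"^(?P<duration_int>\d+)\s+(?P<duration_type>\w+)$")
# Missing durations become 0, mirroring the fillna(0) step above.
parts["duration_int"] = pd.to_numeric(parts["duration_int"], errors="coerce").fillna(0).astype(int)
```

Note that `duration_type` keeps both `season` and `seasons`, so any filter on that field has to account for the two spellings.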

In [454]:
plataformas.to_csv('plataformas.csv')

In [455]:
ratings.head()

Unnamed: 0,userId,rating,timestamp,movieId
0,1,1.0,1425941529,as680
1,1,4.5,1425942435,ns2186
2,1,5.0,1425941523,hs2381
3,1,5.0,1425941546,ns3663
4,1,5.0,1425941556,as9500


In [456]:
ratings.columns

Index(['userId', 'rating', 'timestamp', 'movieId'], dtype='object')

In [457]:
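# ratings has no 'Unnamed: 0' column (see ratings.columns above); errors='ignore' makes the drop a no-op instead of raising KeyError
ratings = ratings.drop(columns='Unnamed: 0', errors='ignore')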

In [None]:
ratings.head()

In [None]:
ratings = ratings.rename(columns={'rating': 'score'})

In [None]:
ratings.tail(20)

In [None]:
ratings['timestamp'] = pd.to_datetime(ratings['timestamp'], unit='s').dt.strftime('%Y-%m-%d')
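
The `timestamp` values shown in `ratings.head()` are ten-digit integers, which is consistent with Unix epoch seconds. Under that assumption, the unit has to be stated explicitly, because `pd.to_datetime` reads bare integers as nanoseconds. A sketch using two of the sample values:

```python
import pandas as pd

# Sample values copied from the ratings.head() output above.
timestamps = pd.Series([1425941529, 1425942435])

as_seconds = pd.to_datetime(timestamps, unit="s").dt.strftime("%Y-%m-%d")
as_default = pd.to_datetime(timestamps).dt.strftime("%Y-%m-%d")  # integers read as nanoseconds
```

With `unit="s"` both values fall on 2015-03-09; without it they collapse to 1970-01-01.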

In [None]:
ratings.head()

In [None]:
peliculas = pd.merge(ratings, plataformas, left_on='movieId', right_on='id')
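
The ratings table identifies titles through `movieId`, while the platforms table uses `id`, so the join keys have to be named on each side. A sketch on stand-in data that also reports ratings without a matching title; the titles and the unmatched `zz1` key are made up for illustration:

```python
import pandas as pd

ratings_demo = pd.DataFrame({
    "userId": [1, 1, 2],
    "score": [1.0, 4.5, 5.0],
    "movieId": ["as680", "ns2186", "zz1"],
})
titles_demo = pd.DataFrame({"id": ["as680", "ns2186"], "title": ["title a", "title n"]})

merged = ratings_demo.merge(
    titles_demo,
    left_on="movieId",
    right_on="id",
    how="left",
    validate="many_to_one",  # raises if an id appears more than once on the titles side
    indicator=True,          # adds a _merge column flagging unmatched ratings
)
unmatched = int((merged["_merge"] == "left_only").sum())
```

A left join is used here only to make unmatched ratings visible; the default inner join would silently drop them.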

`Next, we write the functions that will later be used to run the queries through the API.`

-----------------------------------------------------------------------------
|                           QUERIES TO IMPLEMENT                            |
-----------------------------------------------------------------------------

- 1) Movie with the longest duration, with optional filters for YEAR, PLATFORM and DURATION TYPE. (The function must be named get_max_duration(year, platform, duration_type))

- 2) Number of movies per platform, filtered by PLATFORM. (The function must be named get_count_platform(platform))

- 3) Number of movies per platform with a score above XX in a given year. (The function must be named get_score_count(platform, scored, year))

(swap the order of 2 and 3)

- 4) Actor who appears most often for a given platform and year. (The function must be named get_actor(platform, year))



In [None]:
#1) This function returns the title with the longest duration for a given year, platform and duration type (minutes for movies, seasons for series). In this example we use 2018 and Hulu.

def get_max_duration(year:int,platform:str,min_or_season:str):
    # Filter by year and platform
    df=movies_and_series[(movies_and_series['RELEASE_YEAR']==year) & (movies_and_series['PLATFORM']==platform)]
    if min_or_season == 'min':
        # Longest movie, measured in minutes
        a=df.MOVIE_DURATION.max()
        title=df[df.MOVIE_DURATION==a]['TITLE']
        title=title.to_list()
        title=title[0]
    else:
        # Series with the most seasons
        a=df.SEASONS.max()
        title=df[df.SEASONS==a]['TITLE']
        title=title.to_list()
        title=title[0]
    return title
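
The function above relies on a `movies_and_series` DataFrame that this notebook does not build. A sketch of the same query against the columns created earlier (`release_year`, `letra_plataforma`, `duration_int`, `duration_type`, `title`), with the year and platform filters optional as the requirement states. `max_duration_title` is a hypothetical name and the data are stand-ins:

```python
import pandas as pd


def max_duration_title(df, year=None, platform=None, duration_type="min"):
    """Return the title with the longest duration, or None when nothing matches."""
    mask = df["duration_type"].eq(duration_type)
    if year is not None:
        mask &= df["release_year"].eq(year)
    if platform is not None:
        mask &= df["letra_plataforma"].eq(platform)
    subset = df[mask]
    if subset.empty:
        return None
    # idxmax gives the row label of the largest duration within the filtered rows.
    return subset.loc[subset["duration_int"].idxmax(), "title"]


demo = pd.DataFrame({
    "title": ["short film", "long film", "a series"],
    "release_year": [2018, 2018, 2018],
    "letra_plataforma": ["h", "h", "h"],
    "duration_int": [90, 150, 3],
    "duration_type": ["min", "min", "seasons"],
})
```

Returning `None` for an empty selection avoids the `IndexError` that indexing an empty list would raise.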


In [None]:
get_max_duration(2018,'Hulu','min')

In [None]:
# option b: to be finished

In [None]:
#2) This function counts the movies and the series (separately) on a platform.
def get_count_platform(platform:str):
    platform = platform.replace("'","")
    platform = platform.capitalize()
    Count_platform = movies_and_series[(movies_and_series.PLATFORM == platform)]  # Mask the rows that match the parameter
    movies = int((Count_platform.TYPE == 'Movie').sum()) # Count the occurrences of each type (0 if there are none)
    series = int((Count_platform.TYPE == 'TV Show').sum())
    # Return the values as strings to make clear what each count refers to
    return platform, f'Movie: {movies}', f'TV Show: {series}'

In [None]:
get_count_platform("Netflix")
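
For comparison, a sketch of the same count against the lowercase columns built earlier in the notebook (`letra_plataforma`, `type`). `count_by_type` is a hypothetical name and the frame is a stand-in:

```python
import pandas as pd


def count_by_type(df, platform):
    """Count movies and TV shows for one platform letter; missing types count as 0."""
    counts = df.loc[df["letra_plataforma"] == platform, "type"].value_counts()
    return {"movie": int(counts.get("movie", 0)), "tv show": int(counts.get("tv show", 0))}


demo = pd.DataFrame({
    "letra_plataforma": ["n", "n", "n", "h"],
    "type": ["movie", "tv show", "movie", "movie"],
})
```

Using `.get` with a default keeps the function from failing when a platform has no titles of one type.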

In [None]:
#3) Number of movies per platform with a score above XX in a given year (the function must be named get_score_count(platform, scored, year))

# to be finished

In [None]:
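def get_score_count(df, platform, score, year=None):
    # Optionally keep only the rows for the requested release year
    if year is not None:
        df = df[df['RELEASE_YEAR'] == year]
    # Keep the rows for the platform whose score is above the threshold
    df = df[df['PLATFORM'] == platform]
    df = df[df['SCORE'] > score]
    return len(df)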


In [None]:
#4) This function returns the actor who appears most often for a given platform and year.
def get_actor(platform:str,year:int):
    platform = platform.replace("'","")
    platform = platform.capitalize()
    actores, repeticiones = list(), list()  # Two empty lists: one for each actor and one for the number of appearances
    # Apply a mask to get the cast strings for the platform and year, replacing nulls with ''
    Cast_list = list(movies_and_series[(movies_and_series.PLATFORM == platform) & (movies_and_series.RELEASE_YEAR == year)].CAST.fillna(''))

    for each in Cast_list:  # Iterate over each element, which is a comma-separated string of actors
        if not(each == '' or each is None):    # Check that it contains data
            list1 = each.split(",") # Split on commas to get a list whose elements are the actors
            for elem in list1:  # Iterate over this list of actors
                elem = elem.strip() # Remove surrounding whitespace
                # If the actor is already in 'actores', add 1 to 'repeticiones' at the same index
                if elem in actores: 
                    repeticiones[actores.index(elem)] += 1
                # Otherwise, append the actor to 'actores' and 1 to 'repeticiones'
                else:    
                    actores.append(elem)
                    repeticiones.append(1)
    if actores == []: return 'No hay datos' # If both lists are empty, return a message instead of raising an error
    # Return the platform, how many times the most frequent actor appears, and that actor's name
    return (platform, max(repeticiones), actores[repeticiones.index(max(repeticiones))])

In [None]:
get_actor('Netflix',2018)
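
The two parallel lists in `get_actor` need a linear search for every name. A sketch of the same tally with `collections.Counter`, which keeps the counts in a dictionary; `most_common_actor` is a hypothetical helper that takes the already-filtered cast column, and the names are placeholders:

```python
from collections import Counter

import pandas as pd


def most_common_actor(cast_column):
    """Return (actor, appearances) for the most frequent name, or None without cast data."""
    counts = Counter(
        name.strip()
        for cast in cast_column.dropna()
        for name in cast.split(",")
        if name.strip()
    )
    return counts.most_common(1)[0] if counts else None


cast_demo = pd.Series(["actor a, actor b", None, "actor b, actor c", ""])
```

Here `most_common_actor(cast_demo)` returns the pair `("actor b", 2)`.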