<div style="text-align: center;">
  <img src="https://github.com/Hack-io-Data/Imagenes/blob/main/01-LogosHackio/logo_naranja@4x.png?raw=true" alt="Hackio logo" />
</div>

# Data Cleaning Lab

In this lab we will use the complete Netflix DataFrame created in the first Pandas labs.

**Instructions:**

1. Read the statement of each exercise carefully.

2. Implement the solution in the code cell provided.

3. Document every function you create during the exercise.

4. After each plot, include an interpretation of it in a markdown cell.

In [85]:
import pandas as pd
import numpy as np

from pandas import ExcelWriter
import itertools

pd.set_option('display.max_columns', None)

df_union_final = pd.read_csv("datos/df_union_final.csv", index_col = 0)

df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,


## Part 1: Data Cleaning and Preparation

#### Exercise 1: Standardizing and cleaning columns

In this exercise you will clean and standardize some key columns so they are easier to handle and consistent across your analyses. Specifically, you will work with the `date_added` and `duration` columns to convert them into a uniform, structured format.

Instructions:

1. **Convert the `date_added` column**: The `date_added` column holds dates stored as text. Convert it to a `datetime` type that pandas can parse and operate on.

2. **Clean the `duration` column**: The `duration` column holds values in different formats such as "1 Season", "2 Seasons", "90 min", etc. Extract the number (either the number of seasons or the number of minutes) and create a new column called `duration_cleaned` with those standardized values.


**Expected result:**
You should get something like this:

| duration   | duration_cleaned |
|------------|-----------------|
| 1 Season   | 1               |
| 90 min     | 90              |
| 2 Seasons  | 2               |
| 45 min     | 45              |
| 3 Seasons  | 3               |

In [86]:
df_union_final["fecha"] = pd.to_datetime(df_union_final["date_added"], errors='coerce', format= '%B %d, %Y')
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28
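
Row 8802 above comes out as `NaT` even though its `date_added` reads "December 23, 2018". One possible cause, assumed here rather than verified against the file, is stray whitespace around the text, which makes a strict `format` fail. A minimal sketch on a toy Series (`raw_dates` is a hypothetical name):

```python
import pandas as pd

# Toy data: the first value carries a leading space (an assumed defect).
raw_dates = pd.Series([" December 23, 2018", "April 27, 2018", None])

# Strip surrounding whitespace before applying the strict format.
parsed = pd.to_datetime(raw_dates.str.strip(), format="%B %d, %Y", errors="coerce")

print(parsed)
```

Comparing `parsed.isna().sum()` with and without the strip shows how many rows were lost to formatting rather than to genuinely missing dates.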


In [87]:
# Raw string so that \d reaches the regex engine without an invalid-escape warning.
df_union_final["duration_clean"] = df_union_final["duration"].str.extract(r'(\d+)').astype("Int64")
df_union_final[["duration" , "duration_clean"]]

Unnamed: 0,duration,duration_clean
0,,
1,,
2,,
3,104 min,104
4,,
...,...,...
8802,2 Seasons,2
8803,,
8804,,
8805,110 min,110
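
`duration_clean` mixes two units: minutes for movies and seasons for TV shows. A possible refinement, sketched on toy data, keeps the unit next to the number so the two are not compared by accident. The column names `value` and `unit` are illustrative and not part of the exercise.

```python
import pandas as pd

durations = pd.Series(["1 Season", "90 min", "2 Seasons", None])

# Named groups become the column names of the resulting DataFrame.
parts = durations.str.extract(r"(?P<value>\d+)\s*(?P<unit>[A-Za-z]+)")

# Nullable integer dtype keeps missing durations as <NA>.
parts["value"] = pd.to_numeric(parts["value"]).astype("Int64")
# Fold "Season" and "Seasons" into a single label.
parts["unit"] = parts["unit"].str.lower().str.rstrip("s")

print(parts)
```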


#### Exercise 2: Normalizing the `rating` column

The `rating` column contains different ratings such as `PG`, `PG-13`, `R`, among others. Categorize these ratings into three groups:

- **'General Audience'** for ratings such as `G`, `PG`.

- **'Teens'** for ratings such as `PG-13`, `TV-14`.

- **'Adults'** for ratings such as `R`, `TV-MA`.


In [88]:
mapa_rating = {"G":"General Audience", "PG":"General Audience" , "PG-13": "Teens", "TV-14": "Teens", "R": "Adults", "TV-MA" : "Adults"}
df_union_final["clasificacion"] = df_union_final["rating"].map(mapa_rating)
df_union_final[["rating", "clasificacion"]]


Unnamed: 0,rating,clasificacion
0,TV-MA,Adults
1,TV-14,Teens
2,TV-G,
3,TV-G,
4,TV-14,Teens
...,...,...
8802,TV-Y7,
8803,TV-14,Teens
8804,TV-MA,Adults
8805,TV-MA,Adults
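
The mapping in the cell above covers only the six ratings named in the statement, so values such as `TV-G` or `TV-Y7` end up as missing. A sketch of a broader mapping follows. Where the extra ratings go (for example `TV-PG` under 'General Audience') is an assumption of this example, not something the exercise specifies.

```python
import pandas as pd

# Group assignments beyond G, PG, PG-13, TV-14, R and TV-MA are assumptions.
rating_groups = {
    "General Audience": ["G", "PG", "TV-G", "TV-PG", "TV-Y", "TV-Y7"],
    "Teens": ["PG-13", "TV-14"],
    "Adults": ["R", "NC-17", "TV-MA"],
}
# Invert to rating -> group so it can be passed to Series.map.
rating_map = {r: g for g, rs in rating_groups.items() for r in rs}

ratings = pd.Series(["TV-MA", "TV-G", "TV-Y7", "PG-13", "NR", None])
groups = ratings.map(rating_map)

# List whatever is still unmapped instead of guessing a group for it.
print(ratings[groups.isna() & ratings.notna()].unique())
```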


#### Exercise 3: Creating a custom column based on the cast

We will identify whether a key actor such as `Leonardo DiCaprio`, `Tom Hanks`, or `Morgan Freeman` appears in the cast.

Use `apply` and a lambda function to create a new column called `has_famous_actor` that contains `True` if any of these actors is in the `cast` list and `False` otherwise.

In [89]:
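def actor_bueno(valor_1):
    """Return True if the cast string mentions any of the key actors."""
    actores_clave = ["Leonardo DiCaprio", "Tom Hanks", "Morgan Freeman"]
    # Missing casts arrive as NaN (a float), so only strings are searched.
    if not isinstance(valor_1, str):
        return False
    # `cast` is one comma-separated string, so test for substrings
    # instead of comparing the whole string against the list.
    return any(actor in valor_1 for actor in actores_clave)

df_union_final["has_famous_actor"] = df_union_final["cast"].apply(lambda x: actor_bueno(x))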
df_union_final.sample(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor
7905,s2402,TV Show,The Woods,,"Grzegorz Damięcki, Agnieszka Grochowska, Huber...","Poland, United States","June 12, 2020",2020,TV-MA,,"Crime TV Shows, International TV Shows, TV Dramas",Evidence found on the body of a homicide victi...,,,,,,2020-06-12,,Adults,False
5155,s5460,TV Show,Off Camera,,Sam Jones,United States,"June 1, 2017",2015,TV-MA,,Stand-Up Comedy & Talk Shows,"Photographer, writer and director Sam Jones se...",,,,,,2017-06-01,,Adults,False
5888,s1970,Movie,Road To High & Low,Shigeaki Kubo,"Takanori Iwata, Nobuyuki Suzuki, Keita Machida...",Japan,"September 20, 2020",2016,TV-MA,,"Action & Adventure, International Movies",Three inseparable friends are torn when one of...,,,,,,2020-09-20,,Adults,False
2647,s6825,TV Show,Game Winning Hit,,"Lego Lee, Alice Ko, Afalean Lu, Tsai Chen-nan,...",Taiwan,"September 1, 2016",2009,TV-MA,,"International TV Shows, TV Comedies, TV Dramas",An army deserter hiding out in a small coastal...,,,,,,2016-09-01,,Adults,False
7119,s8287,Movie,The Eichmann Show,Paul Andrew Williams,"Martin Freeman, Anthony LaPaglia, Rebecca Fron...","United Kingdom, Lithuania","August 31, 2017",2015,TV-MA,,"Dramas, International Movies",This is the astonishing true story behind a mo...,,,,,,2017-08-31,,Adults,False
6705,s4813,Movie,TAU,Federico D'Alessandro,"Maika Monroe, Ed Skrein, Gary Oldman",United States,"June 29, 2018",2018,R,98 min,"Sci-Fi & Fantasy, Thrillers",Kidnapped by an inventor who uses her as a tes...,Science fiction/Thriller,"June 29, 2018",97.0,5.8,English,2018-06-29,98.0,Adults,False
5100,s7596,Movie,Norm of the North: Keys to the Kingdom,Tim Maltby,"Andrew Toth, Cole Howard, Maya Kay, Jennifer C...","India, United States","August 21, 2020",2019,TV-PG,,Children & Family Movies,When Norm the polar bear is framed for a crime...,,,,,,2020-08-21,,,False
3881,s5801,TV Show,Kulipari: An Army of Frogs,,"Mark Hamill, Keith David, Wendie Malick, Josh ...",United States,"September 2, 2016",2016,TV-Y7,,Kids' TV,"In a tale of bravery and heroism, fearless fro...",,,,,,2016-09-02,,,False
19,s5982,Movie,"10,000 B.C.",Roland Emmerich,"Steven Strait, Camilla Belle, Cliff Curtis, Jo...","United States, South Africa","June 1, 2019",2008,PG-13,,Action & Adventure,Fierce mammoth hunter D'Leh sets out on an imp...,,,,,,2019-06-01,,Teens,False
296,s6075,Movie,Aashayein,Nagesh Kukunoor,"John Abraham, Sonal Sehgal, Prateeksha Lonkar,...",India,"October 22, 2017",2010,TV-14,,"Dramas, International Movies","When he learns he has terminal cancer, a cynic...",,,,,,2017-10-22,,Teens,False
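
The statement asks for `apply` with a lambda. As a cross-check, the same membership test can be written with the vectorized `Series.str.contains`. This is a sketch on toy data, and `pattern` is a name introduced here.

```python
import re
import pandas as pd

famous = ["Leonardo DiCaprio", "Tom Hanks", "Morgan Freeman"]
# Escape each name and join with "|" to build one alternation pattern.
pattern = "|".join(re.escape(name) for name in famous)

cast = pd.Series([
    "Tom Hanks, Robin Wright",
    "Martin Freeman, Anthony LaPaglia",
    None,
])

# na=False turns missing casts into False instead of NaN.
has_famous_actor = cast.str.contains(pattern, na=False)
print(has_famous_actor.tolist())
```

Because full names are matched, "Martin Freeman" does not trigger a match for "Morgan Freeman".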


#### Exercise 4: Creating a custom column using conditional logic

We will create a column called `is_recent` that identifies whether a title was released in the last 5 years.

Create a function that marks a title with `True` if it is recent (released in the last 5 years) and `False` if it is not.

In [90]:
df_union_final["release_year_new"] = pd.to_datetime(df_union_final["release_year"], format='%Y')
df_union_final["is_recent"] = df_union_final["release_year_new"] >= pd.to_datetime('today') - pd.DateOffset(years=5)
df_union_final.sample(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent
7424,s8394,Movie,The Little Hours,Jeff Baena,"Alison Brie, Dave Franco, Kate Micucci, Aubrey...","Canada, United States","December 23, 2018",2017,R,,"Comedies, Independent Movies",Life at a convent takes an unruly turn when th...,,,,,,2018-12-23,,Adults,False,2017-01-01,False
6016,s7920,Movie,Saeed Mirza: The Leftist Sufi,"Kireet Khurana, Padmakumar Narasimhamurthy",Saeed Akhtar Mirza,India,"May 1, 2017",2016,TV-MA,61 min,"Documentaries, International Movies",This documentary profiles the celebrated Hindi...,,,,,,2017-05-01,61.0,Adults,False,2016-01-01,False
8095,s5018,Movie,Trailer Park Boys Live In F**kin' Dublin,"Mike Smith, John Paul Tremblay, Robb Wells","Mike Smith, John Paul Tremblay, Robb Wells, Pa...",Canada,"March 1, 2018",2014,TV-MA,81 min,Movies,The boys head to Ireland after winning a conte...,,,,,,2018-03-01,81.0,Adults,False,2014-01-01,False
2949,s4570,Movie,Harishchandrachi Factory,Paresh Mokashi,"Nandu Madhav, Vibhawari Deshpande, Ambarish De...",India,"October 1, 2018",2009,TV-PG,,"Dramas, International Movies",Against a backdrop of burgeoning social unrest...,,,,,,2018-10-01,,,False,2009-01-01,False
5078,s2966,TV Show,No Game No Life,,"Yoshitsugu Matsuoka, Ai Kayano, Yoko Hikasa, Y...",Japan,"February 1, 2020",2014,TV-MA,,"Anime Series, International TV Shows",Legendary gamer siblings Sora and Shiro are tr...,,,,,,2020-02-01,,Adults,False,2014-01-01,False
4409,s5515,TV Show,Mar de Plástico,,"Rodolfo Sancho, Belén López, Pedro Casablanc, ...",Spain,"April 27, 2017",2017,TV-MA,2 Seasons,"Crime TV Shows, International TV Shows, Spanis...",In a town in southern Spain where racial tensi...,,,,,,2017-04-27,2.0,Adults,False,2017-01-01,False
4290,s3629,Movie,Léa & I,Camille Shooshani,"Léa Moret, Camille Shooshani",,"August 2, 2019",2019,TV-MA,84 min,Documentaries,Best friends Léa and Camille explore the notio...,,,,,,2019-08-02,84.0,Adults,False,2019-01-01,False
3005,s6949,Movie,Heal,Kelly Noonan,"Kelly Noonan, Deepak Chopra, Michael Beckwith,...",United States,"February 1, 2019",2017,TV-14,107 min,Documentaries,"Stories from spiritual leaders, physicians and...",,,,,,2019-02-01,107.0,Teens,False,2017-01-01,False
6628,s4830,Movie,Sunday's Illness,Ramón Salazar,"Bárbara Lennie, Susi Sánchez, Miguel Ángel Sol...",Spain,"June 15, 2018",2017,PG-13,113 min,"Dramas, Independent Movies, International Movies",Decades after being abandoned as a young child...,,,,,,2018-06-15,113.0,Teens,False,2017-01-01,False
6839,s8185,Movie,The Adventures of Tintin,Steven Spielberg,"Jamie Bell, Andy Serkis, Daniel Craig, Nick Fr...","United States, New Zealand, United Kingdom","November 20, 2019",2011,PG,107 min,Children & Family Movies,This 3-D motion capture adapts Georges Remi's ...,,,,,,2019-11-20,107.0,General Audience,False,2011-01-01,False
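
The statement asks for a function. One way to write it, sketched here, compares whole years instead of timestamps. It assumes "last 5 years" means `release_year >= reference_year - 5`. Under that reading a 2019 title counts as recent when the reference year is 2024, which differs from the timestamp comparison in the cell above. `reference_year` and `window` are parameters introduced for the example.

```python
from datetime import date

import pandas as pd


def is_recent(release_year, reference_year=None, window=5):
    """Return True if release_year falls within `window` years of reference_year."""
    if reference_year is None:
        reference_year = date.today().year
    return release_year >= reference_year - window


years = pd.Series([2017, 2019, 2020])
# Fixing reference_year keeps the result reproducible across runs.
flags = years.apply(is_recent, reference_year=2024)
print(flags.tolist())
```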


#### Exercise 5: Classifying movies by decade

In this exercise, your goal is to categorize the release years of the movies or series into decades. The `release_year` column contains the release year, and you must create a new column called `decade` that indicates the corresponding decade, such as "1990s", "2000s", etc.


In [91]:
df_union_final["decade"] = ((df_union_final["release_year"] // 10) * 10).astype(str) + "s"
df_union_final.sample(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade
6801,s4477,TV Show,Terrorism Close Calls,,,"United States, Czech Republic","October 26, 2018",2018,TV-MA,,"Crime TV Shows, Docuseries, International TV S...",Law enforcement officials look back on attempt...,,,,,,2018-10-26,,Adults,False,2018-01-01,False,2010s
7324,s3218,Movie,The Island,Toka McBaror,"Sambasa Nzeribe, Segun Arinze, Tokunbo Idowu, ...",Nigeria,"November 29, 2019",2018,TV-14,,"Dramas, International Movies, Thrillers",When a colonel uncovers controversial intel ab...,,,,,,2019-11-29,,Teens,False,2018-01-01,False,2010s
7854,s5288,TV Show,The Vampire Diaries,,"Nina Dobrev, Paul Wesley, Ian Somerhalder, Ste...",United States,"September 4, 2017",2016,TV-14,8 Seasons,"TV Dramas, TV Mysteries, TV Sci-Fi & Fantasy","Trapped in adolescent bodies, feuding vampire ...",,,,,,2017-09-04,8.0,Teens,False,2016-01-01,False,2010s
1356,s5712,Movie,Carlos Ballarta: El amor es de putos,"Jan Suter, Raúl Campos Delgado",Carlos Ballarta,Mexico,"November 21, 2016",2016,TV-MA,,Stand-Up Comedy,"Carlos Ballarta mocks daily life in Mexico, in...",,,,,,2016-11-21,,Adults,False,2016-01-01,False,2010s
4122,s2115,Movie,Little Singham: Kaal Ka Badla,Prakash Satam,"Anamaya Verma, Arushi Talwar, Ganesh Divekar, ...",,"August 19, 2020",2020,TV-Y7,70 min,"Children & Family Movies, Comedies","Little Singham’s biggest enemy, the demon Kaal...",,,,,,2020-08-19,70.0,,False,2020-01-01,True,2020s
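
The decade comes from integer arithmetic: 1994 // 10 is 199, and 199 * 10 is 1990. If `release_year` ever contained missing values, a plain `astype(str)` would turn them into text. The sketch below, on toy data, uses pandas nullable dtypes so that a missing year stays missing. Whether the real column has gaps is not checked here.

```python
import pandas as pd

years = pd.Series([1994, 2003, None, 2020])  # float64 because of the None

# Int64 is the nullable integer dtype; floor-divide then scale back up.
decade_start = (years.astype("Int64") // 10) * 10

# The nullable string dtype propagates <NA> through concatenation.
decade = decade_start.astype("string") + "s"
print(decade.tolist())
```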


#### Exercise 6: Extracting information

To practice extracting information:

1. **Extract the first actor** from the list in the `cast` column and create a new column called `first_actor`.

2. **Extract the director's first name** and store it in a column called `first_name_director`.


In [92]:
df_union_final["first_actor"] = df_union_final["cast"].str.split(",").str[0]
df_union_final[["cast", "first_actor"]]


Unnamed: 0,cast,first_actor
0,"Yoo Ah-in, Park Shin-hye",Yoo Ah-in
1,"Helen Mirren, Gengher Gatti",Helen Mirren
2,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Adipati Dolken
3,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Adipati Dolken
4,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Jake Short
...,...,...
8802,,
8803,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",Li Nanxing
8804,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",Si-kyung Sung
8805,Baek Yoon-sik,Baek Yoon-sik


In [93]:
df_union_final["first_director"] = df_union_final["director"].str.split(",").str[0]
df_union_final[["director", "first_director"]]

Unnamed: 0,director,first_director
0,Cho Il,Cho Il
1,"Sabina Fedeli, Anna Migotto",Sabina Fedeli
2,Rako Prijanto,Rako Prijanto
3,Rako Prijanto,Rako Prijanto
4,Michael Kennedy,Michael Kennedy
...,...,...
8802,,
8803,,
8804,,
8805,Hong-seon Kim,Hong-seon Kim
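
The cell above keeps the first listed director in full. If the statement is read literally, as asking for that director's first name, one more split on whitespace is needed. A sketch on toy data follows. Taking the first token as the given name is a simplifying assumption that does not hold for every naming convention.

```python
import pandas as pd

director = pd.Series(["Sabina Fedeli, Anna Migotto", "Rako Prijanto", None])

# First director in the comma-separated list, without surrounding spaces.
first_director = director.str.split(",").str[0].str.strip()

# First whitespace-separated token of that name; missing values stay missing.
first_name_director = first_director.str.split().str[0]
print(first_name_director.tolist())
```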


#### Exercise 7: Cleaning the `cast` column

The `cast` column contains a list of actors separated by commas. Your goal is to carry out the following tasks:

1. **Replace the null values** in the `cast` column with "sin información".

2. **Count the number of actors** in each entry and create a new column called `num_cast`.

3. **Normalize the names**: Make sure the actors' names are in a consistent format (for example, remove extra spaces).


In [94]:
df_union_final["cast"] = df_union_final["cast"].fillna("sin_informacion")
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade,first_actor,first_director
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08,,Adults,False,2020-01-01,True,2020s,Yoo Ah-in,Cho Il
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01,,Teens,False,2019-01-01,False,2010s,Helen Mirren,Sabina Fedeli
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21,,,False,2018-01-01,False,2010s,Adipati Dolken,Rako Prijanto
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28,104,,False,2020-01-01,True,2020s,Adipati Dolken,Rako Prijanto
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10,,Teens,False,2018-01-01,False,2010s,Jake Short,Michael Kennedy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,sin_informacion,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT,2,,False,2012-01-01,False,2010s,,
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27,,Teens,False,2016-01-01,False,2010s,Li Nanxing,
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19,,Adults,False,2015-01-01,False,2010s,Si-kyung Sung,
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28,110,Adults,False,2017-01-01,False,2010s,Baek Yoon-sik,Hong-seon Kim


In [103]:
df_union_final["num_cast"] = df_union_final["cast"].apply(lambda x: len(x.split(",")))
df_union_final[["cast", "num_cast"]]

Unnamed: 0,cast,num_cast
0,"Yoo Ah-in, Park Shin-hye",2
1,"Helen Mirren, Gengher Gatti",2
2,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",8
3,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",8
4,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",8
...,...,...
8802,sin_informacion,1
8803,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",5
8804,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",5
8805,Baek Yoon-sik,1
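
Two details of the cells above are worth a second look: the placeholder `sin_informacion` is counted as one actor, and the name normalization step from the statement is not shown. A possible approach, sketched with two hypothetical helpers (`normalize_cast` and `count_cast`):

```python
import pandas as pd

PLACEHOLDER = "sin_informacion"  # value used by fillna in the notebook


def normalize_cast(cast):
    """Collapse repeated spaces and trim each comma-separated name."""
    names = (" ".join(name.split()) for name in cast.split(","))
    return ", ".join(name for name in names if name)


def count_cast(cast):
    """Count the names in a cast string; the placeholder counts as zero."""
    if cast == PLACEHOLDER:
        return 0
    return sum(1 for name in cast.split(",") if name.strip())


cast = pd.Series(["  Yoo  Ah-in ,Park Shin-hye ", "Baek Yoon-sik", PLACEHOLDER])
clean_cast = cast.apply(normalize_cast)
num_cast = clean_cast.apply(count_cast)
print(pd.DataFrame({"cast": clean_cast, "num_cast": num_cast}))
```

Both helpers expect strings, so they should run after the null values have been filled.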



#### Exercise 9: Identifying recurring directors

In this exercise, you must identify the directors who appear more than once in the dataset. Follow these steps:

1. **Replace the null values** in the `director` column with "sin información".

3. **Count how many times each director appears** in the column created in Exercise 6.

4. **Filter the directors who appear more than once** and create a new column called `recurrent_director` that indicates "Yes" if the director appears several times or "No" otherwise.

In [96]:
df_union_final["director"] = df_union_final["director"].fillna("sin_informacion")
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade,first_actor,first_director,num_cast
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08,,Adults,False,2020-01-01,True,2020s,Yoo Ah-in,Cho Il,2
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01,,Teens,False,2019-01-01,False,2010s,Helen Mirren,Sabina Fedeli,2
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21,,,False,2018-01-01,False,2010s,Adipati Dolken,Rako Prijanto,8
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28,104,,False,2020-01-01,True,2020s,Adipati Dolken,Rako Prijanto,8
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10,,Teens,False,2018-01-01,False,2010s,Jake Short,Michael Kennedy,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,sin_informacion,sin_informacion,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT,2,,False,2012-01-01,False,2010s,,,1
8803,s4915,TV Show,海的儿子,sin_informacion,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27,,Teens,False,2016-01-01,False,2010s,Li Nanxing,,5
8804,s7102,TV Show,마녀사냥,sin_informacion,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19,,Adults,False,2015-01-01,False,2010s,Si-kyung Sung,,5
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28,110,Adults,False,2017-01-01,False,2010s,Baek Yoon-sik,Hong-seon Kim,1
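
The counting and flagging steps of this exercise can be sketched with `value_counts` and `map`. The example runs on toy data and leaves missing directors as NaN in `first_director`, so they are not counted as one very prolific director. `director_count` is a helper column introduced here.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"first_director": ["Rako Prijanto", "Rako Prijanto", "Cho Il", None, None]}
)

# value_counts drops missing values by default.
counts = toy["first_director"].value_counts()

# Look up each row's director in the counts; missing directors map to NaN.
toy["director_count"] = toy["first_director"].map(counts)

# NaN > 1 evaluates to False, so rows without a director get "No".
toy["recurrent_director"] = np.where(toy["director_count"] > 1, "Yes", "No")
print(toy)
```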
