<div style="text-align: center;">
  <img src="https://github.com/Hack-io-Data/Imagenes/blob/main/01-LogosHackio/logo_naranja@4x.png?raw=true" alt="esquema" />
</div>

# Data Cleaning Lab

In this lab we will use the complete Netflix DataFrame created in the first Pandas labs.

**Instructions:**

1. Read the statement of each exercise carefully.

2. Implement the solution in the code cell provided.

3. Document every function created during the exercise.

4. After each plot, include its interpretation in a markdown cell.

In [135]:
import pandas as pd
import numpy as np

from pandas import ExcelWriter
import itertools

pd.set_option('display.max_columns', None)

df_union_final = pd.read_csv("datos/df_union_final.csv", index_col = 0)

df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,


## Part 1: Data Cleaning and Preparation

#### Exercise 1: Standardizing and cleaning columns

In this exercise you will clean and standardize some key columns so they are easier to handle and more consistent in your analyses. Specifically, you will work with the `date_added` and `duration` columns to convert them to a uniform, structured format.

Instructions:

1. **Convert the `date_added` column**: The `date_added` column contains dates stored as text. Convert it to a `datetime` type that pandas can parse and operate on.

2. **Clean the `duration` column**: The `duration` column holds values in different formats such as "1 Season", "2 Seasons", "90 min", etc. Extract the number (either the number of seasons or the number of minutes) and create a new column called `duration_cleaned` with those standardized values.


**Expected result:**
You should get something like this:

| duration   | duration_cleaned |
|------------|-----------------|
| 1 Season   | 1               |
| 90 min     | 90              |
| 2 Seasons  | 2               |
| 45 min     | 45              |
| 3 Seasons  | 3               |

In [136]:
df_union_final["fecha"] = pd.to_datetime(df_union_final["date_added"], errors='coerce', format= '%B %d, %Y')
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28
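
Row 8802 ends up as `NaT` even though its `date_added` text looks well formed. One possible cause, and this is an assumption because the raw file is not shown here, is stray whitespace around the text, which a strict `format` rejects. A minimal sketch on made-up values (the names `fechas`, `sin_strip` and `con_strip` are illustrative only):

```python
import pandas as pd

# Toy values, not taken from the dataset; the second one has a leading space.
fechas = pd.Series(["September 8, 2020", " December 23, 2018", None])

# Strict parsing: text that does not match the format exactly is coerced to NaT.
sin_strip = pd.to_datetime(fechas, format="%B %d, %Y", errors="coerce")

# Stripping surrounding whitespace first lets the padded value parse.
con_strip = pd.to_datetime(fechas.str.strip(), format="%B %d, %Y", errors="coerce")

# Compare how many values failed to parse in each case.
print(sin_strip.isna().sum(), con_strip.isna().sum())
```

If the counts differ on the real column, whitespace is the likely culprit; if they match, the unparsed rows need a closer look.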


In [137]:
df_union_final["duration_clean"] = df_union_final["duration"].str.extract(r'(\d+)').astype("Int64")
df_union_final[["duration" , "duration_clean"]]

Unnamed: 0,duration,duration_clean
0,,
1,,
2,,
3,104 min,104
4,,
...,...,...
8802,2 Seasons,2
8803,,
8804,,
8805,110 min,110
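
Extracting only the number discards whether it counted minutes or seasons. As a hedged sketch on toy data, a regular expression with named groups can keep both pieces; the column names `valor` and `unidad` are hypothetical and not part of the exercise:

```python
import pandas as pd

# Toy values in the formats described in the exercise, plus a missing entry.
duracion = pd.Series(["1 Season", "90 min", "2 Seasons", None])

# Each named group becomes a column: the digits and the unit word after them.
partes = duracion.str.extract(r"(?P<valor>\d+)\s*(?P<unidad>\w+)")

# Convert the digits to a nullable integer so missing rows stay missing.
partes["valor"] = pd.to_numeric(partes["valor"]).astype("Int64")

print(partes)
```

Keeping the unit makes it possible to avoid comparing, say, 2 seasons against 90 minutes later on.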


#### Exercise 2: Normalizing the `rating` column

The `rating` column contains different ratings such as `PG`, `PG-13`, `R`, among others. Categorize these ratings into three groups:

- **'General Audience'** for ratings such as `G`, `PG`.

- **'Teens'** for ratings such as `PG-13`, `TV-14`.

- **'Adults'** for ratings such as `R`, `TV-MA`.


In [138]:
mapa_rating = {"G":"General Audience", "PG":"General Audience" , "PG-13": "Teens", "TV-14": "Teens", "R": "Adults", "TV-MA" : "Adults"}
df_union_final["clasificacion"] = df_union_final["rating"].map(mapa_rating)
df_union_final[["rating", "clasificacion"]]


Unnamed: 0,rating,clasificacion
0,TV-MA,Adults
1,TV-14,Teens
2,TV-G,
3,TV-G,
4,TV-14,Teens
...,...,...
8802,TV-Y7,
8803,TV-14,Teens
8804,TV-MA,Adults
8805,TV-MA,Adults
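
Ratings that are absent from the dictionary, such as `TV-G` and `TV-Y7` in the output above, map to missing values. The sketch below extends the mapping and labels anything still unmapped. The group assigned to each extra rating (`TV-Y`, `TV-Y7`, `TV-G`, `TV-PG`, `NC-17`) and the `"Unknown"` label are assumptions, not part of the exercise statement:

```python
import pandas as pd

# Extended mapping; the groups chosen for the extra ratings are assumptions.
mapa_rating_ampliado = {
    "G": "General Audience", "PG": "General Audience",
    "TV-Y": "General Audience", "TV-Y7": "General Audience",
    "TV-G": "General Audience", "TV-PG": "General Audience",
    "PG-13": "Teens", "TV-14": "Teens",
    "R": "Adults", "TV-MA": "Adults", "NC-17": "Adults",
}

# Toy ratings, including one outside the mapping and one missing value.
ratings = pd.Series(["TV-MA", "TV-G", "TV-Y7", "PG-13", "UR", None])

# map() leaves unmapped keys as NaN; fillna() makes those rows explicit.
clasificacion = ratings.map(mapa_rating_ampliado).fillna("Unknown")

print(clasificacion.tolist())
```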


#### Exercise 3: Creating a custom column based on the cast

We will identify whether a key actor such as `Leonardo DiCaprio`, `Tom Hanks`, or `Morgan Freeman` appears in the cast.

Use `apply` and a lambda function to create a new column called `has_famous_actor` that contains `True` if any of these actors is in the `cast` list and `False` otherwise.

In [139]:
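ACTORES_FAMOSOS = ["Leonardo DiCaprio", "Tom Hanks", "Morgan Freeman"]

def actor_bueno(valor_1):
    """Return True if any famous actor appears in the cast string, False otherwise."""
    # A missing cast is NaN (a float), so it cannot contain any name.
    if not isinstance(valor_1, str):
        return False
    # The cast is one comma-separated string, so look for each name inside it.
    return any(actor in valor_1 for actor in ACTORES_FAMOSOS)

df_union_final["has_famous_actor"] = df_union_final["cast"].apply(lambda x: actor_bueno(x))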
df_union_final.sample(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor
8416,s5106,TV Show,Wadi,,"Syafie Naswip, Ardell Aryana, Naim Daniel, Sha...",,"December 29, 2017",2015,TV-14,,"International TV Shows, TV Dramas","After the death of his father, a passive and g...",,,,,,2017-12-29,,Teens,False
3179,s7009,Movie,Hotel Transylvania 3: Summer Vacation,Genndy Tartakovsky,"Adam Sandler, Selena Gomez, Kevin James, Kathr...",United States,"January 24, 2019",2018,PG,,"Children & Family Movies, Comedies",It's love at first sight for Dracula when he m...,,,,,,2019-01-24,,General Audience,False
4280,s3547,TV Show,Luo Bao Bei,,"Hana Burnett, Natalia-Jade Jonathan, Leo Tang,...","China, United Kingdom","August 31, 2019",2018,TV-Y,,"British TV Shows, Kids' TV",A bright and spirited seven-year-old girl uses...,,,,,,2019-08-31,,,False
8726,s323,TV Show,You're My Destiny,,"Joe Chen, Ethan Juan, Baron Chen, Bianca Bai, ...",Taiwan,"August 3, 2021",2008,TV-MA,,"International TV Shows, Romantic TV Shows, TV ...",A young woman's romantic cruise ends in a twis...,,,,,,2021-08-03,,Adults,False
2306,s4868,TV Show,Evil Genius,"Trey Borzillieri, Barbara Schroeder",,United States,"May 11, 2018",2018,TV-MA,,"Crime TV Shows, Docuseries, TV Mysteries",This baffling true crime story starts with the...,,,,,,2018-05-11,,Adults,False
6790,s4736,Movie,Tere Naal Love Ho Gaya,Mandeep Kumar,"Riteish Deshmukh, Genelia D'Souza, Tinnu Anand...",India,"August 2, 2018",2012,TV-14,127 min,"Comedies, International Movies, Romantic Movies",Mini isn't eager to wed the rich suitor who's ...,,,,,,2018-08-02,127.0,Teens,False
8338,s277,TV Show,Valeria,Inma Torrente,"Diana Gómez, Silma López, Paula Malia, Teresa ...",Spain,"August 13, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, Spa...",A writer in creative and marital crises finds ...,,,,,,2021-08-13,2.0,Adults,False
2780,s6871,Movie,Golmaal: Fun Unlimited,Rohit Shetty,"Ajay Devgn, Tusshar Kapoor, Arshad Warsi, Shar...",India,"December 31, 2019",2006,TV-14,148 min,"Comedies, International Movies",Between thwarting crooks and wooing the belle ...,,,,,,2019-12-31,148.0,Teens,False
6407,s809,Movie,Sniper: Legacy,Don Michael Paul,"Tom Berenger, Chad Michael Collins, Doug Allen...","United States, Bulgaria","June 2, 2021",2014,R,98 min,Action & Adventure,When a troubled sniper begins killing officers...,,,,,,2021-06-02,98.0,Adults,False
6186,s1237,Movie,Sentinelle,Julien Leclercq,"Olga Kurylenko, Marilyn Lima, Michel Nabokoff,...",France,"March 5, 2021",2021,TV-MA,81 min,"Action & Adventure, Dramas, International Movies",Transferred home after a traumatizing combat m...,Action,"March 5, 2021",80.0,4.7,French,2021-03-05,81.0,Adults,False
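
The exercise asks for `apply` with a lambda. As a cross-check, the same flag can be sketched with the vectorized `str.contains`. This is an alternative under assumptions, shown on toy data; the names `actores`, `patron` and `cast` are illustrative:

```python
import re
import pandas as pd

actores = ["Leonardo DiCaprio", "Tom Hanks", "Morgan Freeman"]

# Build one alternation pattern; re.escape guards against regex metacharacters.
patron = "|".join(re.escape(a) for a in actores)

# Toy cast strings, including a missing value.
cast = pd.Series(["Tom Hanks, Robin Wright", "Adam Sandler", None])

# na=False turns missing casts into False instead of NaN.
has_famous_actor = cast.str.contains(patron, na=False)

print(has_famous_actor.tolist())
```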


#### Exercise 4: Creating a custom column using conditional logic

We will create a column called `is_recent` that identifies whether a title was released in the last 5 years.

Write a function that marks a title with `True` if it is recent (released in the last 5 years) and `False` if it is not.

In [140]:
df_union_final["release_year_new"] = pd.to_datetime(df_union_final["release_year"], format='%Y')
df_union_final["is_recent"] = df_union_final["release_year_new"] >= pd.to_datetime('today') - pd.DateOffset(years=5)
df_union_final.sample(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent
911,s6281,Movie,Below Her Mouth,April Mullen,"Erika Linder, Natalie Krill, Sebastian Pigott,...",Canada,"August 1, 2017",2016,TV-MA,,"Dramas, Independent Movies, International Movies",An engaged fashion editor begins a torrid affa...,,,,,,2017-08-01,,Adults,False,2016-01-01,False
1454,s302,Movie,Chennai Express,Rohit Shetty,"Shah Rukh Khan, Deepika Padukone, Nikitin Dhee...",India,"August 5, 2021",2013,TV-14,135 min,"Action & Adventure, Comedies, International Mo...",What could have been a sad journey turns joyfu...,,,,,,2021-08-05,135.0,Teens,False,2013-01-01,False
6880,s8202,Movie,The Bad Education Movie,Elliot Hegarty,"Jack Whitehall, Joanna Scanlan, Iain Glen, Eth...",United Kingdom,"December 15, 2018",2015,TV-MA,,Comedies,Britain's most ineffective but caring teacher ...,,,,,,2018-12-15,,Adults,False,2015-01-01,False
3946,s7259,TV Show,La Viuda Negra,Alejandro Lozano,"Ana Serradilla, Julián Román, Ramiro Meneses, ...","Colombia, Mexico, United States","January 15, 2019",2016,TV-14,,"Crime TV Shows, International TV Shows, Spanis...","Beautiful and ruthless Griselda Blanco, known ...",,,,,,2019-01-15,,Teens,False,2016-01-01,False
7899,s3731,Movie,The Wolf's Call,Antonin Baudry,"François Civil, Omar Sy, Mathieu Kassovitz, Re...",France,"June 20, 2019",2019,TV-14,116 min,"Dramas, International Movies, Thrillers","With nuclear war looming, a military expert in...",,,,,,2019-06-20,116.0,Teens,False,2019-01-01,False
2581,s2403,Movie,From A to B,Ali F. Mostafa,"Fahad Albutairi, Shadi Alfons, Fadi Rifaai, Sa...","United Arab Emirates, Jordan, Lebanon, Saudi A...","June 11, 2020",2014,TV-MA,104 min,"Comedies, Dramas, International Movies",To celebrate the memory of their pal who passe...,,,,,,2020-06-11,104.0,Adults,False,2014-01-01,False
679,s562,Movie,Austin Powers in Goldmember,Jay Roach,"Mike Myers, Beyoncé Knowles-Carter, Seth Green...",United States,"July 1, 2021",2002,PG-13,,"Action & Adventure, Comedies",The world's most shagadelic spy continues his ...,,,,,,2021-07-01,,Teens,False,2002-01-01,False
3130,s492,Movie,Home Again,Hallie Meyers-Shyer,"Reese Witherspoon, Michael Sheen, Candice Berg...",United States,"July 8, 2021",2017,PG-13,,"Comedies, Dramas, Romantic Movies",A newly single mom takes in three young male f...,,,,,,2021-07-08,,Teens,False,2017-01-01,False
590,s3665,TV Show,Anohana: The Flower We Saw That Day,,"Miyu Irino, Ai Kayano, Haruka Tomatsu, Takahir...",Japan,"July 15, 2019",2011,TV-14,,"Anime Series, International TV Shows, Teen TV ...",A teen haunted by the spirit of an old friend ...,,,,,,2019-07-15,,Teens,False,2011-01-01,False
7061,s5211,TV Show,The Day I Met El Chapo,,Kate del Castillo,United States,"October 20, 2017",2017,TV-MA,,"Crime TV Shows, Docuseries, International TV S...",Mexican superstar actress Kate del Castillo re...,,,,,,2017-10-20,,Adults,False,2017-01-01,False
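
Because the comparison above uses today's date, the result changes depending on when the notebook is run. A minimal sketch of a function-based version with an explicit reference year follows; the function name `es_reciente`, its parameters, and the reference year 2024 are all assumptions chosen for illustration:

```python
import pandas as pd

def es_reciente(anio, anio_referencia, ventana=5):
    """Return True if `anio` falls within `ventana` years before `anio_referencia`."""
    return anio >= anio_referencia - ventana

# Toy release years; 2024 is an arbitrary reference year for the example.
release_year = pd.Series([2016, 2019, 2021])
is_recent = release_year.apply(lambda anio: es_reciente(anio, anio_referencia=2024))

print(is_recent.tolist())
```

Passing the reference year as a parameter keeps the result reproducible and makes the 5-year window easy to adjust.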


#### Exercise 5: Classifying titles by decade

In this exercise, your goal is to group the release years of the movies or series into decades. The `release_year` column contains the release year, and you must create a new column called `decade` that indicates the corresponding decade, such as "1990s", "2000s", etc.


In [141]:
df_union_final["decade"] = ((df_union_final["release_year"] // 10) * 10).astype(str) + "s"
df_union_final.sample(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade
6594,s8107,Movie,Striptease,Andrew Bergman,"Demi Moore, Burt Reynolds, Armand Assante, Vin...",United States,"January 1, 2021",1996,R,117 min,"Comedies, Dramas",A former FBI employee works as a stripper to f...,,,,,,2021-01-01,117.0,Adults,False,1996-01-01,False,1990s
3335,s3938,TV Show,Imposters,,"Inbar Lavi, Rob Heaps, Parker Young, Marianne ...",United States,"April 5, 2019",2017,TV-MA,2 Seasons,"Crime TV Shows, International TV Shows, TV Com...","Supported by a team of fellow thieves, a con a...",,,,,,2019-04-05,2.0,Adults,False,2017-01-01,False,2010s
7891,s5442,Movie,The Wishing Tree,Manika Sharma,"Shabana Azmi, Makrand Deshpande, Harshpreet Ka...",India,"June 5, 2017",2017,TV-14,117 min,"Children & Family Movies, Dramas, Music & Musi...",Five children living on the edge of a forest b...,,,,,,2017-06-05,117.0,Teens,False,2017-01-01,False,2010s
84,s6010,Movie,3 Heroines,Iman Brotoseno,"Reza Rahadian, Bunga Citra Lestari, Tara Basro...",Indonesia,"January 5, 2019",2016,TV-PG,124 min,"Dramas, International Movies, Sports Movies",Three Indonesian women break records by becomi...,,,,,,2019-01-05,124.0,,False,2016-01-01,False,2010s
3193,s1536,TV Show,How To Ruin Christmas,,,South Africa,"December 16, 2020",2020,TV-MA,,"International TV Shows, TV Comedies, TV Dramas",Prodigal daughter Tumi tries to make things ri...,,,,,,2020-12-16,,Adults,False,2020-01-01,True,2020s


#### Exercise 6: Extracting information

To practice extracting information:

1. **Extract the first actor** from the list in the `cast` column and create a new column called `first_actor`.

2. **Extract the director's first name** and store it in a column called `first_name_director`.


In [142]:
df_union_final["first_actor"] = df_union_final["cast"].str.split(",").str[0]
df_union_final[["cast", "first_actor"]]


Unnamed: 0,cast,first_actor
0,"Yoo Ah-in, Park Shin-hye",Yoo Ah-in
1,"Helen Mirren, Gengher Gatti",Helen Mirren
2,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Adipati Dolken
3,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Adipati Dolken
4,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Jake Short
...,...,...
8802,,
8803,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",Li Nanxing
8804,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",Si-kyung Sung
8805,Baek Yoon-sik,Baek Yoon-sik


In [143]:
df_union_final["first_director"] = df_union_final["director"].str.split(",").str[0]
df_union_final[["director", "first_director"]]

Unnamed: 0,director,first_director
0,Cho Il,Cho Il
1,"Sabina Fedeli, Anna Migotto",Sabina Fedeli
2,Rako Prijanto,Rako Prijanto
3,Rako Prijanto,Rako Prijanto
4,Michael Kennedy,Michael Kennedy
...,...,...
8802,,
8803,,
8804,,
8805,Hong-seon Kim,Hong-seon Kim
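
The cell above keeps the first director in full. If the first name alone is wanted, as the statement asks, one more split on whitespace is a possible approach. This sketch uses toy data and assumes the first token of the name is the given name, which does not hold for every naming convention:

```python
import pandas as pd

# Toy director values: several directors, a single director, and a missing one.
director = pd.Series(["Sabina Fedeli, Anna Migotto", "Rako Prijanto", None])

# Take the first director from the comma-separated list and trim spaces.
first_director = director.str.split(",").str[0].str.strip()

# Take the first whitespace-separated token as the first name (an assumption).
first_name_director = first_director.str.split().str[0]

print(first_name_director.tolist())
```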


#### Exercise 7: Cleaning the `cast` column

The `cast` column contains a list of actors separated by commas. Your goal is to carry out the following tasks:

1. **Replace the null values** in the `cast` column with "sin información" (no information).

2. **Count the number of actors** in each entry and create a new column called `num_cast`.

3. **Normalize the names**: Make sure the actors' names are in a consistent format (for example, remove extra spaces).


In [144]:
df_union_final["cast"] = df_union_final["cast"].fillna("sin_informacion")
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade,first_actor,first_director
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08,,Adults,False,2020-01-01,True,2020s,Yoo Ah-in,Cho Il
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01,,Teens,False,2019-01-01,False,2010s,Helen Mirren,Sabina Fedeli
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21,,,False,2018-01-01,False,2010s,Adipati Dolken,Rako Prijanto
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28,104,,False,2020-01-01,True,2020s,Adipati Dolken,Rako Prijanto
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10,,Teens,False,2018-01-01,False,2010s,Jake Short,Michael Kennedy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,,sin_informacion,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT,2,,False,2012-01-01,False,2010s,,
8803,s4915,TV Show,海的儿子,,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27,,Teens,False,2016-01-01,False,2010s,Li Nanxing,
8804,s7102,TV Show,마녀사냥,,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19,,Adults,False,2015-01-01,False,2010s,Si-kyung Sung,
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28,110,Adults,False,2017-01-01,False,2010s,Baek Yoon-sik,Hong-seon Kim


In [145]:
df_union_final["num_cast"] = df_union_final["cast"].apply(lambda x: len(x.split(",")))
df_union_final[["cast", "num_cast"]]

Unnamed: 0,cast,num_cast
0,"Yoo Ah-in, Park Shin-hye",2
1,"Helen Mirren, Gengher Gatti",2
2,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",8
3,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",8
4,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",8
...,...,...
8802,sin_informacion,1
8803,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",5
8804,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",5
8805,Baek Yoon-sik,1
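
Splitting on commas counts the placeholder as one actor, as row 8802 shows. The sketch below treats the placeholder as zero actors and also covers the normalization step by collapsing extra spaces. It runs on toy data, and the helper name `normalizar_cast` is hypothetical:

```python
import pandas as pd

SIN_INFO = "sin_informacion"  # placeholder used when the cast is missing

# Toy cast strings with irregular spacing and one placeholder entry.
cast = pd.Series(["Yoo Ah-in,  Park Shin-hye ", SIN_INFO, "Baek Yoon-sik"])

def normalizar_cast(texto):
    """Return the list of actor names with extra whitespace removed."""
    nombres = [" ".join(nombre.split()) for nombre in texto.split(",")]
    # Drop empty fragments and the placeholder so they are not counted as actors.
    return [n for n in nombres if n and n != SIN_INFO]

listas = cast.apply(normalizar_cast)
num_cast = listas.apply(len)
cast_normalizado = listas.apply(", ".join)

print(num_cast.tolist())
```

Whether a missing cast should count as 0 or stay as 1 is a design choice; counting it as 0 avoids mixing it with titles that have a single actor.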



#### Exercise 9: Identifying recurring directors

In this exercise, you must identify the directors who appear more than once in the dataset. Follow these steps:

1. **Replace the null values** in the `director` column with "sin información" (no information).

2. **Count how many times each director appears** in the column created in exercise 6.

3. **Filter the directors who appear more than once** and create a new column called `recurrent_director` that indicates "Yes" if the director appears several times or "No" otherwise.

In [146]:
df_union_final["director"] = df_union_final["director"].fillna("sin_informacion")
df_union_final

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Genre,Premiere,Runtime,IMDB Score,Language,fecha,duration_clean,clasificacion,has_famous_actor,release_year_new,is_recent,decade,first_actor,first_director,num_cast
0,s2037,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,"September 8, 2020",2020,TV-MA,,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ...",,,,,,2020-09-08,,Adults,False,2020-01-01,True,2020s,Yoo Ah-in,Cho Il,2
1,s2305,Movie,#AnneFrank - Parallel Stories,"Sabina Fedeli, Anna Migotto","Helen Mirren, Gengher Gatti",Italy,"July 1, 2020",2019,TV-14,,"Documentaries, International Movies","Through her diary, Anne Frank's story is retol...",,,,,,2020-07-01,,Teens,False,2019-01-01,False,2010s,Helen Mirren,Sabina Fedeli,2
2,s2482,Movie,#FriendButMarried,Rako Prijanto,"Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,"May 21, 2020",2018,TV-G,,"Dramas, International Movies, Romantic Movies","Pining for his high school crush for years, a ...",,,,,,2020-05-21,,,False,2018-01-01,False,2010s,Adipati Dolken,Rako Prijanto,8
3,s2325,Movie,#FriendButMarried 2,Rako Prijanto,"Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,"June 28, 2020",2020,TV-G,104 min,"Dramas, International Movies, Romantic Movies",As Ayu and Ditto finally transition from best ...,,,,,,2020-06-28,104,,False,2020-01-01,True,2020s,Adipati Dolken,Rako Prijanto,8
4,s5974,Movie,#Roxy,Michael Kennedy,"Jake Short, Sarah Fisher, Booboo Stewart, Dann...",Canada,"April 10, 2019",2018,TV-14,,"Comedies, Romantic Movies",A teenage hacker with a huge nose helps a cool...,,,,,,2019-04-10,,Teens,False,2018-01-01,False,2010s,Jake Short,Michael Kennedy,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s6178,TV Show,忍者ハットリくん,sin_informacion,sin_informacion,Japan,"December 23, 2018",2012,TV-Y7,2 Seasons,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",,,,,,NaT,2,,False,2012-01-01,False,2010s,,,1
8803,s4915,TV Show,海的儿子,sin_informacion,"Li Nanxing, Christopher Lee, Jesseca Liu, Appl...",,"April 27, 2018",2016,TV-14,,"International TV Shows, TV Dramas","Two brothers start a new life in Singapore, wh...",,,,,,2018-04-27,,Teens,False,2016-01-01,False,2010s,Li Nanxing,,5
8804,s7102,TV Show,마녀사냥,sin_informacion,"Si-kyung Sung, Se-yoon Yoo, Dong-yup Shin, Ji-...",South Korea,"February 19, 2018",2015,TV-MA,,"International TV Shows, Korean TV Shows, Stand...",Four Korean celebrity men and guest stars of b...,,,,,,2018-02-19,,Adults,False,2015-01-01,False,2010s,Si-kyung Sung,,5
8805,s5023,Movie,반드시 잡는다,Hong-seon Kim,Baek Yoon-sik,South Korea,"February 28, 2018",2017,TV-MA,110 min,"Dramas, International Movies, Thrillers",After people in his town start turning up dead...,,,,,,2018-02-28,110,Adults,False,2017-01-01,False,2010s,Baek Yoon-sik,Hong-seon Kim,1


In [147]:
df_union_final["first_actor"].value_counts().reset_index()

Unnamed: 0,first_actor,count
0,Shah Rukh Khan,26
1,Akshay Kumar,23
2,Amitabh Bachchan,20
3,David Attenborough,20
4,Adam Sandler,20
...,...,...
5395,Warren Christie,1
5396,Alberto Ammann,1
5397,Balthazar Murillo,1
5398,Taiga Nakano,1


In [163]:
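def contar_director(valor, conteos):
    """Return "Yes" if `valor` appears more than once according to `conteos`, otherwise "No"."""
    # Missing directors (NaN) are not in value_counts(), so .get() falls back to 0.
    if conteos.get(valor, 0) > 1:
        return "Yes"
    else:
        return "No"

# Count each first director once, up front, instead of rescanning the column for every row.
conteos_director = df_union_final["first_director"].value_counts()
df_union_final["recurrent_director"] = df_union_final["first_director"].apply(lambda x: contar_director(x, conteos_director))
df_union_final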
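
`Series.count()` takes no value argument (it returns the number of non-null entries), so per-director counts need `value_counts()` instead. As an alternative to a row-wise `apply`, here is a vectorized sketch on toy data; the variable names are illustrative, and mapping missing directors to "No" is an assumption:

```python
import numpy as np
import pandas as pd

# Toy first-director values, including a repeated name and a missing one.
first_director = pd.Series(["Rako Prijanto", "Rako Prijanto", "Cho Il", None])

# Look up each row's director in the frequency table; missing values become NaN.
apariciones = first_director.map(first_director.value_counts())

# NaN > 1 evaluates to False, so missing directors are labeled "No".
recurrent_director = pd.Series(
    np.where(apariciones > 1, "Yes", "No"), index=first_director.index
)

print(recurrent_director.tolist())
```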