<a href="https://colab.research.google.com/github/fralfaro/MAT281/blob/main/docs/labs/lab_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# MAT281 - Laboratory No. 03





**Objective**: Apply advanced data manipulation and analysis techniques with pandas to a real dataset of Netflix content, reinforcing good practices and efficient methods without resorting to `groupby`, `merge`, `pivot`, or `join`.



**Dataset**:

We will work with the file `netflix_titles.csv`, which contains information about the titles available on the Netflix platform up to 2021.

| Variable       | Class     | Description                                                                  |
|----------------|-----------|------------------------------------------------------------------------------|
| show_id        | character | Unique identifier of the title in the Netflix catalog.                       |
| type           | character | Type of content: 'Movie' or 'TV Show'.                                       |
| title          | character | Title of the content.                                                        |
| director       | character | Name of the director (may be null).                                          |
| cast           | character | List of main actors (may be null).                                           |
| country        | character | Country or countries where the content was produced.                         |
| date_added     | date      | Date on which the title was added to the Netflix catalog.                    |
| release_year   | integer   | Original release year of the title.                                          |
| rating         | character | Age rating (for example: 'PG-13', 'TV-MA').                                  |
| duration       | character | Length of the content (minutes, or number of seasons for series).            |
| listed_in      | character | Categories or genres under which the content is classified.                  |
| description    | character | Short synopsis of the content.                                               |




In [None]:
import pandas as pd

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/fralfaro/MAT281/main/docs/labs/data/netflix_titles.csv')
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...



### Part 1: Cleaning and preparation

1. Review and describe the dataset:

   * How many rows and columns does it have?
   * What data types does it contain?
   * How many null values are there per column?

2. Convert the `date_added` column to a date type.

3. Create auxiliary columns with `assign`:

   * Year (`year_added`)
   * Month (`month_added`)


In [None]:
def num_filas_columnas(df):
    """Print the number of rows and columns of the dataset."""
    filas, columnas = df.shape
    print(f'The dataset has {filas} rows and {columnas} columns')

def tipos_datos(df):
    """Return the data type of each column."""
    return df.dtypes

def valores_nulos(df):
    """Return the number of null values per column."""
    return df.isnull().sum()


num_filas_columnas(df)
display(tipos_datos(df))
valores_nulos(df)

The dataset has 8807 rows and 12 columns


Unnamed: 0,0
show_id,object
type,object
title,object
director,object
cast,object
country,object
date_added,object
release_year,int64
rating,object
duration,object


Unnamed: 0,0
show_id,0
type,0
title,0
director,2634
cast,825
country,831
date_added,10
release_year,0
rating,4
duration,3
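
The three checks above can also be gathered into a single summary table. A minimal sketch on a small made-up frame (the `toy` data and the `resumen` name are illustrative assumptions, not part of the lab):

```python
import pandas as pd

# Hypothetical miniature of the Netflix table, used only for illustration.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "release_year": [2019, 2020, 2021, 2021],
})

# One row per column: dtype, number of nulls, and percentage of nulls.
resumen = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "nulls": toy.isna().sum(),
    "null_pct": (toy.isna().mean() * 100).round(1),
})
print(toy.shape)
print(resumen)
```

Keeping the counts and the percentages side by side makes it easier to judge which columns (here `director`) need attention before filtering.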


In [None]:
df['new_date_added'] = pd.to_datetime(df['date_added'], errors='coerce')  # missing or unparseable dates become NaT
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021-09-24
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021-09-24
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021-09-24
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021-09-24


In [None]:
df_with_dates = df.assign(
    year_added=df['new_date_added'].dt.year,
    month_added=df['new_date_added'].dt.month
)
df_with_dates.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021-09-24,2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021-09-24,2021.0,9.0
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021-09-24,2021.0,9.0
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021-09-24,2021.0,9.0
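
`year_added` and `month_added` print as `2021.0` and `9.0` because rows with a missing or unparseable date hold `NaT`, which forces a float dtype. One option, sketched here on made-up data, is to cast to pandas' nullable `Int64` dtype inside `assign`; the `toy` frame is an assumption for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({"date_added": ["September 25, 2021", None, "January 1, 2020"]})

parsed = pd.to_datetime(toy["date_added"], errors="coerce")  # missing value -> NaT
toy = toy.assign(
    year_added=parsed.dt.year.astype("Int64"),    # nullable integer, keeps <NA>
    month_added=parsed.dt.month.astype("Int64"),
)
print(toy.dtypes)
```

Comparisons such as `year_added > 2018` behave the same way with either dtype; the nullable integer mainly improves how the column is displayed.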


### Part 2: Advanced pandas techniques

4. Use `.loc` to select movies (`type == 'Movie'`) that were added after 2018.

5. Use `str.contains()` and `str.extract()`:

   * Filter titles that contain the word 'love' (case-insensitive).
   * Extract the duration in minutes for movies from the `duration` column.

6. Apply `explode()` to the `listed_in` column to get one row per genre.

7. Obtain the top 10 most frequent genres using `value_counts()`.

8. Apply `where()` and `mask()` to flag movies longer than 120 minutes as long content in a new column.

9. Use `.loc` to filter movies that meet all of the following:

   * More than 100 minutes long.
   * Rating equal to `'R'`.
   * Country equal to `'United States'`.

10. Use `.style` to visually format the top 10 longest movies.

In [None]:
df_new = df_with_dates.loc[(df_with_dates['type']=='Movie') & (df_with_dates['year_added']>2018)].copy()
df_new.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021-09-24,2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-24,2021.0,9.0
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-24,2021.0,9.0
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-23,2021.0,9.0


In [None]:
df_loveintitle=df_new[df_new['title'].str.contains('love', case=False)]
df_loveintitle.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,duration_minutes,long_content
158,s159,Movie,Love Don't Cost a Thing,Troy Byer,"Nick Cannon, Christina Milian, Kenan Thompson,...",United States,"September 1, 2021",2003,PG-13,101 min,"Comedies, Romantic Movies",A nerdy teen tries to make himself cool by ass...,2021-09-01,2021.0,9.0,101,No
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,"September 1, 2021",2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2021-09-01,2021.0,9.0,103,No
206,s207,Movie,"LSD: Love, Sex Aur Dhokha",Dibakar Banerjee,"Nushrat Bharucha, Anshuman Jha, Neha Chauhan, ...",India,"August 27, 2021",2010,TV-MA,112 min,"Dramas, Independent Movies, International Movies",This provocative drama examines how the voyeur...,2021-08-27,2021.0,8.0,112,No
227,s228,Movie,Really Love,Angel Kristi Williams,"Kofi Siriboe, Yootha Wong-Loi-Sing, Michael Ea...",United States,"August 25, 2021",2020,TV-MA,95 min,"Dramas, Independent Movies, Romantic Movies",A rising Black painter tries to break into a c...,2021-08-25,2021.0,8.0,95,No
246,s247,Movie,Man in Love,Yin Chen-hao,"Roy Chiu, Ann Hsu, Tsai Chen-nan, Chung Hsin-l...",,"August 20, 2021",2021,TV-MA,115 min,"Dramas, International Movies, Romantic Movies",When he meets a debt-ridden woman who's caring...,2021-08-20,2021.0,8.0,115,No
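
`str.contains('love', case=False)` matches the substring anywhere in the title, so a title such as 'The Glove' or 'Beloved' would also pass. If the task is read as the standalone word, a word-boundary pattern is one option. A small sketch on mostly invented titles:

```python
import pandas as pd

# Titles chosen for illustration only.
titles = pd.Series(["Love in a Puff", "The Glove", "Beloved", "Really LOVE", None])

# Substring match: any title containing the letters 'love'.
substring = titles.str.contains("love", case=False, na=False)
# Whole-word match: 'love' delimited by word boundaries.
whole_word = titles.str.contains(r"\blove\b", case=False, regex=True, na=False)

print(titles[substring].tolist())
print(titles[whole_word].tolist())
```

Passing `na=False` treats missing titles as non-matches, so the result can be used directly as a boolean mask.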


In [None]:
# Extract the numeric sequence (matched by \d+) that is stored as text in `duration`, e.g. '90 min' -> 90
df_new['duration_minutes'] = df_new['duration'].str.extract(r'(\d+)', expand=False).astype(int)
display(df_new[['title', 'duration', 'duration_minutes']].head())

Unnamed: 0,title,duration,duration_minutes
0,Dick Johnson Is Dead,90 min,90
6,My Little Pony: A New Generation,91 min,91
7,Sankofa,125 min,125
9,The Starling,104 min,104
12,Je Suis Karl,127 min,127
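
On a frame that still contains TV shows, `\d+` would also capture the number in values such as '2 Seasons', and `.astype(int)` raises an error when `duration` is missing (the null count above reports 3 missing durations in the full dataset). A more defensive sketch, on invented values, anchors the pattern to `min` and converts with `pd.to_numeric`:

```python
import pandas as pd

duration = pd.Series(["90 min", "2 Seasons", None, "125 min"])

# Capture digits only when they are followed by 'min'; expand=False returns a Series.
minutes = pd.to_numeric(
    duration.str.extract(r"(\d+)\s*min", expand=False),
    errors="coerce",
)
print(minutes)
```

Rows that are not expressed in minutes end up as missing values instead of being silently mixed in with the movie durations.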


In [None]:
# Work on a copy so the original dataset is not altered. When I applied explode() directly,
# it did not separate the categories and left rows with more than one category.
df_generos = df_new.copy()
# explode() only expands list-like values, so the comma-separated string is split into a list first.
df_generos['listed_in'] = df_generos['listed_in'].astype(str).str.split(', ')
# Apply explode() to the 'listed_in' column to get one row per genre.
df_generos = df_generos.explode('listed_in')

display(df_generos[['title', 'listed_in']].head(10))


Unnamed: 0,title,listed_in
0,Dick Johnson Is Dead,Documentaries
6,My Little Pony: A New Generation,Children & Family Movies
7,Sankofa,Dramas
7,Sankofa,Independent Movies
7,Sankofa,International Movies
9,The Starling,Comedies
9,The Starling,Dramas
12,Je Suis Karl,Dramas
12,Je Suis Karl,International Movies
13,Confessions of an Invisible Girl,Children & Family Movies
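
The split-then-explode step can also be written as a single chain with `assign`, which avoids modifying a copy in place. A minimal sketch on two small rows (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Sankofa", "Kate"],
    "listed_in": ["Dramas, Independent Movies", "Action & Adventure"],
})

# explode() only expands list-like cells, so the string is split first.
genres = (
    toy.assign(listed_in=toy["listed_in"].str.split(", "))
       .explode("listed_in")
       .reset_index(drop=True)
)
print(genres)
```

`reset_index(drop=True)` is optional; without it the exploded rows keep the index of the title they came from, as in the output above.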


In [None]:
top_generos=df_generos['listed_in'].value_counts().head(10)
top_generos

Unnamed: 0_level_0,count
listed_in,Unnamed: 1_level_1
International Movies,1593
Dramas,1511
Comedies,1135
Action & Adventure,568
Children & Family Movies,439
Independent Movies,438
Romantic Movies,437
Documentaries,405
Thrillers,380
Horror Movies,232


In [None]:
df_new['long_content'] = 'No'  # Initialize the column with 'No'
df_new['long_content'] = df_new['long_content'].mask(df_new['duration_minutes'] > 120, 'Sí')  # Set 'Sí' where the duration is > 120
df_new.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,duration_minutes,long_content
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0,90,No
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021-09-24,2021.0,9.0,91,No
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-24,2021.0,9.0,125,Sí
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-24,2021.0,9.0,104,No
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-23,2021.0,9.0,127,Sí
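
Task 8 mentions both `where()` and `mask()`. They mirror each other: `where` keeps values where the condition is true and replaces the rest, while `mask` replaces values where the condition is true. A short sketch on made-up durations showing that both produce the same labels:

```python
import pandas as pd

minutes = pd.Series([90, 125, 104, 127])
base = pd.Series("No", index=minutes.index)

with_mask = base.mask(minutes > 120, "Sí")     # replace where the condition is True
with_where = base.where(minutes <= 120, "Sí")  # keep where the condition is True

print(with_mask.tolist())
print(with_mask.equals(with_where))
```

Which one reads better depends on whether the condition describes the rows to change (`mask`) or the rows to keep (`where`).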


In [None]:
df_peliculas=df.loc[df['type'] == 'Movie'].copy()
df_peliculas['duration_minutes'] = df_peliculas['duration'].str.extract(r'(\d+)').astype(float)
df_filtered = df_peliculas.loc[
    (df_peliculas['duration_minutes'] > 100) &
    (df_peliculas['rating'] == 'R') &
    (df_peliculas['country'] == 'United States')
]
df_filtered

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,duration_minutes
48,s49,Movie,Training Day,Antoine Fuqua,"Denzel Washington, Ethan Hawke, Scott Glenn, T...",United States,"September 16, 2021",2001,R,122 min,"Dramas, Thrillers",A rookie cop with one day to prove himself to ...,2021-09-16,122.0
81,s82,Movie,Kate,Cedric Nicolas-Troyan,"Mary Elizabeth Winstead, Jun Kunimura, Woody H...",United States,"September 10, 2021",2021,R,106 min,Action & Adventure,"Slipped a fatal poison on her final job, a rut...",2021-09-10,106.0
131,s132,Movie,Blade Runner: The Final Cut,Ridley Scott,"Harrison Ford, Rutger Hauer, Sean Young, Edwar...",United States,"September 1, 2021",1982,R,117 min,"Action & Adventure, Classic Movies, Cult Movies","In a smog-choked dystopian Los Angeles, blade ...",2021-09-01,117.0
139,s140,Movie,Do the Right Thing,Spike Lee,"Danny Aiello, Ossie Davis, Ruby Dee, Richard E...",United States,"September 1, 2021",1989,R,120 min,"Classic Movies, Comedies, Dramas","On a sweltering day in Brooklyn, simmering rac...",2021-09-01,120.0
144,s145,Movie,House Party,Reginald Hudlin,"Christopher Reid, Christopher Martin, Robin Ha...",United States,"September 1, 2021",1990,R,104 min,"Comedies, Cult Movies","Grounded by his strict father, Kid risks life ...",2021-09-01,104.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8678,s8679,Movie,Vincent N Roxxy,Gary Michael Schultz,"Emile Hirsch, Zoë Kravitz, Emory Cohen, Zoey D...",United States,"September 2, 2017",2016,R,101 min,"Dramas, Thrillers","In rural Louisiana, a terse loner forges a red...",2017-09-02,101.0
8691,s8692,Movie,Wakefield,Robin Swicord,"Bryan Cranston, Jennifer Garner, Jason O'Mara,...",United States,"March 2, 2019",2016,R,109 min,Dramas,An unhappy father and lawyer quits his suburba...,2019-03-02,109.0
8751,s8752,Movie,Wish I Was Here,Zach Braff,"Zach Braff, Kate Hudson, Donald Faison, Joey K...",United States,"August 16, 2018",2014,R,106 min,"Comedies, Dramas, Independent Movies","With his acting career moribund, Aidan Bloom s...",2018-08-16,106.0
8754,s8755,Movie,Wolves,Bart Freundlich,"Michael Shannon, Carla Gugino, Taylor John Smi...",United States,"March 29, 2019",2016,R,109 min,"Dramas, Independent Movies, Sports Movies",A promising high school basketball player has ...,2019-03-29,109.0
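
The equality test `country == 'United States'` keeps only titles whose `country` field is exactly that string; co-productions such as 'United States, Ghana, ...' are excluded. If those should count too (an assumption about how the task is read), one possible variant matches 'United States' as one entry of the comma-separated list:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States", "United States, Ghana", "India", None],
    "rating": ["R", "R", "R", "R"],
    "duration_minutes": [122.0, 125.0, 130.0, 110.0],
})

# True when 'United States' appears as a whole entry of the comma-separated field.
in_us = toy["country"].str.contains(r"(?:^|, )United States(?:,|$)", na=False)

filtered = toy.loc[(toy["duration_minutes"] > 100) & (toy["rating"] == "R") & in_us]
print(filtered["title"].tolist())
```

The non-capturing groups keep the pattern from matching a longer country name that merely contains the same words.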


In [None]:
mas_largas=df_peliculas.sort_values('duration_minutes', ascending=False).head(10)
mas_largas.style.background_gradient()  # I did not know what else to do with .style

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,duration_minutes
4253,s4254,Movie,Black Mirror: Bandersnatch,,"Fionn Whitehead, Will Poulter, Craig Parkinson, Alice Lowe, Asim Chaudhry",United States,"December 28, 2018",2018,TV-MA,312 min,"Dramas, International Movies, Sci-Fi & Fantasy","In 1984, a young programmer begins to question reality as he adapts a dark fantasy novel into a video game. A mind-bending tale with multiple endings.",2018-12-28 00:00:00,312.0
717,s718,Movie,Headspace: Unwind Your Mind,,"Andy Puddicombe, Evelyn Lewis Prieto, Ginger Daniels, Darren Pettie, Simon Prebble, Rhiannon Mcgavin, Kate Seftel",,"June 15, 2021",2021,TV-G,273 min,Documentaries,"Do you want to relax, meditate or sleep deeply? Personalize the experience according to your mood or mindset with this Headspace interactive special.",2021-06-15 00:00:00,273.0
2491,s2492,Movie,The School of Mischief,Houssam El-Din Mustafa,"Suhair El-Babili, Adel Emam, Saeed Saleh, Younes Shalabi, Hadi El-Gayyar, Ahmad Zaki, Hassan Moustafa",Egypt,"May 21, 2020",1973,TV-14,253 min,"Comedies, Dramas, International Movies",A high school teacher volunteers to transform five notorious misfits into model students — and has unintended results.,2020-05-21 00:00:00,253.0
2487,s2488,Movie,No Longer kids,Samir Al Asfory,"Said Saleh, Hassan Moustafa, Ahmed Zaki, Younes Shalabi, Nadia Shukri, Karima Mokhtar",Egypt,"May 21, 2020",1979,TV-14,237 min,"Comedies, Dramas, International Movies","Hoping to prevent their father from skipping town with his mistress, four rowdy siblings resort to absurd measures to stop him.",2020-05-21 00:00:00,237.0
2484,s2485,Movie,Lock Your Girls In,Fouad El-Mohandes,"Fouad El-Mohandes, Sanaa Younes, Sherihan, Ahmed Rateb, Ijlal Zaki, Zakariya Mowafi",,"May 21, 2020",1982,TV-PG,233 min,"Comedies, International Movies, Romantic Movies",A widower believes he must marry off his three problematic daughters before he can pursue his real goal of marrying his secret love.,2020-05-21 00:00:00,233.0
2488,s2489,Movie,Raya and Sakina,Hussein Kamal,"Suhair El-Babili, Shadia, Abdel Moneim Madbouly, Ahmed Bedir",,"May 21, 2020",1984,TV-14,230 min,"Comedies, Dramas, International Movies","When robberies and murders targeting women sweep early 20th-century Egypt, the hunt for suspects leads to two shadowy sisters. Based on a true story.",2020-05-21 00:00:00,230.0
166,s167,Movie,Once Upon a Time in America,Sergio Leone,"Robert De Niro, James Woods, Elizabeth McGovern, Treat Williams, Tuesday Weld, Burt Young, Joe Pesci, Danny Aiello, William Forsythe, James Hayden","Italy, United States","September 1, 2021",1984,R,229 min,"Classic Movies, Dramas",Director Sergio Leone's sprawling crime epic follows a group of Jewish mobsters who rise in the ranks of organized crime in 1920s New York City.,2021-09-01 00:00:00,229.0
7932,s7933,Movie,Sangam,Raj Kapoor,"Raj Kapoor, Vyjayanthimala, Rajendra Kumar, Lalita Pawar, Achala Sachdev, Hari Shivdasani, Raj Mehra, Iftekhar",India,"December 31, 2019",1964,TV-14,228 min,"Classic Movies, Dramas, International Movies","Returning home from war after being assumed dead, a pilot weds the woman he has long loved, unaware that she had been planning to marry his best friend.",2019-12-31 00:00:00,228.0
1019,s1020,Movie,Lagaan,Ashutosh Gowariker,"Aamir Khan, Gracy Singh, Rachel Shelley, Paul Blackthorne, Kulbhushan Kharbanda, Raghuvir Yadav, Yashpal Sharma, Rajendranath Zutshi, Rajesh Vivek, Aditya Lakhia","India, United Kingdom","April 17, 2021",2001,PG,224 min,"Dramas, International Movies, Music & Musicals","In 1890s India, an arrogant British commander challenges the harshly taxed residents of Champaner to a high-stakes cricket match.",2021-04-17 00:00:00,224.0
4573,s4574,Movie,Jodhaa Akbar,Ashutosh Gowariker,"Hrithik Roshan, Aishwarya Rai Bachchan, Sonu Sood, Poonam Sinha, Suhasini Mulay, Ila Arun, Raza Murad, Kulbhushan Kharbanda, Abeer Abrar",India,"October 1, 2018",2008,TV-14,214 min,"Action & Adventure, Dramas, International Movies","In 16th-century India, what begins as a strategic alliance between a Mughal emperor and a Hindu princess becomes a genuine opportunity for true love.",2018-10-01 00:00:00,214.0




### Challenge Question

11. What are the most frequent combinations of genre and rating in the dataset?
    (Hint: use `value_counts` with `subset=["genre", "rating"]` after applying `explode()`.)
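
A minimal sketch of the hint on invented rows. The `genre` column name comes from the hint itself; the data is an assumption. Without the `explode()` step, each full `listed_in` string would be counted as a single category.

```python
import pandas as pd

toy = pd.DataFrame({
    "listed_in": ["Dramas, Comedies", "Dramas", "Comedies, Dramas", "Documentaries"],
    "rating": ["TV-MA", "TV-MA", "TV-14", "TV-MA"],
})

# Split the genre string into a list, expand to one row per genre, then count the pairs.
combos = (
    toy.assign(genre=toy["listed_in"].str.split(", "))
       .explode("genre")
       .value_counts(subset=["genre", "rating"])
)
print(combos.head(10))
```

The result is a Series indexed by (`genre`, `rating`) pairs and sorted from most to least frequent.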



### Bonus: Duplicate analysis and cleaning

12. Are there movies with the same name (`title`) but a different release year (`release_year`)?
13. How many unique titles are there in total in the `title` column?





In [None]:
# Count how often each (listed_in, rating) pair occurs. `listed_in` is not exploded here, so each full genre string counts as one category.
combinaciones_mas_frecuentes = df.value_counts(subset=['listed_in', 'rating']).head(10)
combinaciones_mas_frecuentes

Unnamed: 0_level_0,Unnamed: 1_level_0,count
listed_in,rating,Unnamed: 2_level_1
Stand-Up Comedy,TV-MA,284
"Dramas, International Movies",TV-MA,154
"Dramas, Independent Movies, International Movies",TV-MA,142
"Dramas, International Movies",TV-14,139
"Comedies, Dramas, International Movies",TV-14,139
Kids' TV,TV-Y,117
Documentaries,TV-MA,115
"Children & Family Movies, Comedies",PG,99
Documentaries,TV-14,98
"Dramas, International Movies, Romantic Movies",TV-14,92


In [None]:
# Find duplicated titles. keep=False keeps every occurrence: without it, a title repeated 3 times returned only 2 rows.
# I tested this on the genre table, since the original df has no duplicated titles.
titulos_duplicados = df[df.duplicated('title', keep=False)].sort_values('title')
# Group the rows that share a title and keep the groups with more than one distinct release year; nunique() counts distinct values.
titulos_con_distinto_año = titulos_duplicados.groupby('title').filter(lambda x: x['release_year'].nunique() > 1)
print("There are", len(titulos_con_distinto_año), "duplicated titles with different release years.")

There are 0 duplicated titles with different release years.
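
Since the lab asks to avoid `groupby`, the same check can be sketched with `drop_duplicates` and `duplicated` alone: after collapsing identical (`title`, `release_year`) pairs, any title that still repeats must have more than one release year. The data below is invented for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Kate", "Kate", "Wolves", "Wolves", "Sankofa"],
    "release_year": [2021, 2021, 2016, 1999, 1993],
})

# Keep one row per distinct (title, release_year) pair.
pairs = toy[["title", "release_year"]].drop_duplicates()
# A title that still appears more than once has several release years.
different_year = pairs[pairs.duplicated("title", keep=False)]
print(different_year["title"].unique().tolist())
```

Here 'Kate' is repeated with the same year and is therefore not reported, while 'Wolves' is.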


In [None]:
print("There are", df['title'].nunique(), "unique titles.")

# In conclusion, I think I should have done this with functions and by creating new datasets along the way, since that would have made the requested tasks much easier. Something to do next time.

There are 8807 unique titles.
