

# MAT281 - Laboratory No. 03





**Objective**: Apply advanced data manipulation and analysis techniques with pandas to a real dataset of Netflix content, reinforcing good practices and efficient methods without resorting to `groupby`, `merge`, `pivot`, or `join`.



**Dataset**:

We will work with the file `netflix_titles.csv`, which contains information about the titles available on Netflix through 2021.

| Variable       | Class     | Description                                                                  |
|----------------|-----------|------------------------------------------------------------------------------|
| show_id        | string    | Unique identifier of the title in the Netflix catalog.                       |
| type           | string    | Type of content: 'Movie' or 'TV Show'.                                       |
| title          | string    | Title of the content.                                                        |
| director       | string    | Name of the director (may be null).                                          |
| cast           | string    | List of main actors (may be null).                                           |
| country        | string    | Country or countries where the content was produced.                         |
| date_added     | date      | Date on which the title was added to the Netflix catalog.                    |
| release_year   | integer   | Original release year of the title.                                          |
| rating         | string    | Age rating (for example: 'PG-13', 'TV-MA').                                  |
| duration       | string    | Length of the content (minutes for movies, number of seasons for series).    |
| listed_in      | string    | Categories or genres under which the content is listed.                      |
| description    | string    | Short synopsis of the content.                                               |




In [107]:
import pandas as pd
# import datetime

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/fralfaro/MAT281/main/docs/labs/data/netflix_titles.csv')
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...



### Part 1: Cleaning and preparation

1. Inspect and describe the dataset:

   * How many rows and columns does it have?
   * Which data types does it contain?
   * How many null values are there per column?

2. Convert the `date_added` column to a datetime type.

3. Create auxiliary columns with `assign`:

   * Year (`year_added`)
   * Month (`month_added`)


In [108]:
'''Inspect and describe the dataset'''
df.info()
# 8807 rows and 12 columns
# Dtypes: one integer column (release_year) and 11 object (string) columns
# Null values appear in director, cast, country, date_added, rating and duration

'''Convert date_added to datetime'''
df["date_added"] = df["date_added"].str.strip()
df["new_date_added"] = pd.to_datetime(df["date_added"])
# Auxiliary columns created with assign, as the exercise asks
df = df.assign(
    year_added=df["new_date_added"].dt.year,
    month_added=df["new_date_added"].dt.month,
)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


### Part 2: Advanced pandas techniques

4. Use `.loc` to select movies (`type == 'Movie'`) that were added after 2018.

5. Use `str.contains()` and `str.extract()`:

   * Filter titles that contain the word 'love' (case-insensitive).
   * Extract the runtime in minutes for movies from the `duration` column.

6. Apply `explode()` to the `listed_in` column to get one row per genre.

7. Get the top 10 most frequent genres using `value_counts()`.

8. Apply `where()` and `mask()` to flag movies longer than 120 minutes as long content in a new column.

9. Use `.loc` to filter movies that satisfy:

   * More than 100 minutes long.
   * Rating equal to `'R'`.
   * Country equal to `'United States'`.

10. Use `.style` to format the top 10 longest movies visually.
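
Items 4 and 9 both combine several boolean conditions inside `.loc`. A minimal sketch of that pattern follows, assuming a hypothetical `toy` frame with a few of the dataset's column names. The mask names `es_pelicula` and `despues_2018` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with a subset of the dataset's columns
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "year_added": [2018, 2020, 2019, 2021],
    "rating": ["R", "TV-MA", "R", "PG"],
})

# Naming each mask avoids the parentheses that inline comparisons need around &
es_pelicula = toy["type"] == "Movie"
despues_2018 = toy["year_added"] > 2018  # strictly after 2018

# .loc takes the row mask first and, optionally, the columns to keep
recientes = toy.loc[es_pelicula & despues_2018, ["title", "year_added"]]

# Further conditions are chained with & in the same way
recientes_r = toy.loc[es_pelicula & despues_2018 & (toy["rating"] == "R")]

print(recientes)
print(recientes_r)
```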

In [109]:
'''Select rows with type == "Movie" added from 2018 onward'''
# Note: >= 2018 also keeps titles added during 2018; use > 2018 for strictly after 2018
peliculas = df.loc[(df["year_added"] >= 2018) & (df["type"] == 'Movie')]
peliculas.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021-09-24,2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-24,2021.0,9.0
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-24,2021.0,9.0
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-23,2021.0,9.0


In [110]:
'''Filter titles that contain the word "love" (case-insensitive)'''
love = df.loc[df['title'].str.contains("love", case=False)]
love.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added
25,s26,TV Show,Love on the Spectrum,,Brooke Satchwell,Australia,"September 21, 2021",2021,TV-14,2 Seasons,"Docuseries, International TV Shows, Reality TV",Finding love can be hard for anyone. For young...,2021-09-21,2021.0,9.0
158,s159,Movie,Love Don't Cost a Thing,Troy Byer,"Nick Cannon, Christina Milian, Kenan Thompson,...",United States,"September 1, 2021",2003,PG-13,101 min,"Comedies, Romantic Movies",A nerdy teen tries to make himself cool by ass...,2021-09-01,2021.0,9.0
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,"September 1, 2021",2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2021-09-01,2021.0,9.0
206,s207,Movie,"LSD: Love, Sex Aur Dhokha",Dibakar Banerjee,"Nushrat Bharucha, Anshuman Jha, Neha Chauhan, ...",India,"August 27, 2021",2010,TV-MA,112 min,"Dramas, Independent Movies, International Movies",This provocative drama examines how the voyeur...,2021-08-27,2021.0,8.0
227,s228,Movie,Really Love,Angel Kristi Williams,"Kofi Siriboe, Yootha Wong-Loi-Sing, Michael Ea...",United States,"August 25, 2021",2020,TV-MA,95 min,"Dramas, Independent Movies, Romantic Movies",A rising Black painter tries to break into a c...,2021-08-25,2021.0,8.0
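
A short sketch of case-insensitive matching with `str.contains`, assuming a hypothetical `titulos` Series. It also shows that a plain substring match accepts words such as "Glove", which a word-boundary pattern can exclude if the whole word is wanted.

```python
import pandas as pd

# Hypothetical titles, including a missing value
titulos = pd.Series(["Love Actually", "Glove Story", "Inception", None, "LOVE"])

# case=False ignores case; na=False treats missing titles as non-matches
contiene = titulos.str.contains("love", case=False, na=False)

# \b word boundaries keep "Glove" from matching the word "love"
palabra = titulos.str.contains(r"\blove\b", case=False, na=False, regex=True)

print(titulos[contiene].tolist())
print(titulos[palabra].tolist())
```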


In [111]:
'''Extract movie runtime in minutes'''
# assign builds a new DataFrame, which avoids the SettingWithCopyWarning
peliculas = peliculas.assign(
    minutos=peliculas["duration"].str.extract(r"(\d+) min", expand=False).astype(int)
)
minutos = peliculas
minutos.head()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,minutos
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0,90
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021-09-24,2021.0,9.0,91
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-24,2021.0,9.0,125
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-24,2021.0,9.0,104
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-23,2021.0,9.0,127
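
The cell above can cast straight to `int` because every selected movie has a duration of the form `N min`. When rows without a match are present (TV shows, missing values), that cast fails. A minimal sketch of one way to handle this, assuming a hypothetical `duracion` Series:

```python
import pandas as pd

# Hypothetical durations mixing movies, a series and a missing value
duracion = pd.Series(["90 min", "2 Seasons", "125 min", None])

# (\d+) captures only digits; expand=False returns a Series instead of a DataFrame
minutos = duracion.str.extract(r"(\d+) min", expand=False)

# Rows without a match are NaN, so convert to numbers and then to the
# nullable Int64 dtype, which can hold integers alongside missing values
minutos = pd.to_numeric(minutos).astype("Int64")

print(minutos)
```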


In [112]:
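'''One row per genre'''
# genero is the same object as df, so df["listed_in"] also becomes a column of lists.
# Splitting on "," keeps the space after each comma, so every genre after the first
# carries a leading space (" Dramas"); splitting on ", " would avoid that.
genero = df
genero["listed_in"] = genero["listed_in"].str.split(",")
genero = genero.explode("listed_in")
genero.head()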

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,International TV Shows,"After crossing paths at a party, a Cape Town t...",2021-09-24,2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,TV Dramas,"After crossing paths at a party, a Cape Town t...",2021-09-24,2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,TV Mysteries,"After crossing paths at a party, a Cape Town t...",2021-09-24,2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,Crime TV Shows,To protect his family from a powerful drug lor...,2021-09-24,2021.0,9.0


In [113]:
'''Most frequent genres'''
genero = genero["listed_in"].value_counts()
print("The 10 most frequent genres are:")
genero.head(10)

The 10 most frequent genres are:


listed_in,count
International Movies,2624
Dramas,1600
Comedies,1210
Action & Adventure,859
Documentaries,829
Dramas,827
International TV Shows,774
Independent Movies,736
TV Dramas,696
Romantic Movies,613
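
"Dramas" appears twice in the table above because the genres were split on `","`, which leaves a leading space on every genre after the first. The sketch below reproduces the effect on a hypothetical `toy` frame and shows that splitting on `", "` merges the labels. It illustrates the cause and does not recompute the lab's counts.

```python
import pandas as pd

# Hypothetical toy data in the same "A, B" format as listed_in
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "listed_in": ["Dramas, Comedies", "Comedies, Dramas", "Dramas"],
})

# Splitting on "," alone leaves " Comedies" and " Dramas" with a leading space
sucio = toy["listed_in"].str.split(",").explode().value_counts()

# Splitting on ", " yields clean labels, so each genre is counted once
limpio = toy["listed_in"].str.split(", ").explode().value_counts()

print(sucio)
print(limpio)
```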


In [114]:
'''Flag long movies'''
# Start from "No" and let mask overwrite rows longer than 120 minutes with "Sí";
# assign returns a new DataFrame, which avoids the SettingWithCopyWarning
largo = pd.Series("No", index=peliculas.index)
peliculas = peliculas.assign(Contenido_largo=largo.mask(peliculas["minutos"] > 120, "Sí"))
peliculas.head()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,minutos,Contenido_largo
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021-09-25,2021.0,9.0,90,No
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021-09-24,2021.0,9.0,91,No
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021-09-24,2021.0,9.0,125,Sí
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021-09-24,2021.0,9.0,104,No
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021-09-23,2021.0,9.0,127,Sí
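
The exercise names both `where()` and `mask()`. They are mirror images: `mask` replaces values where the condition holds, while `where` keeps them. A minimal sketch with hypothetical `minutos` and `base` Series:

```python
import pandas as pd

# Hypothetical runtimes and a default label for every row
minutos = pd.Series([90, 125, 104, 127])
base = pd.Series("No", index=minutos.index)

# mask replaces values where the condition is True
con_mask = base.mask(minutos > 120, "Sí")

# where keeps values where the condition is True and replaces the rest,
# so the same result needs the negated condition
con_where = base.where(minutos <= 120, "Sí")

print(con_mask.tolist())
print(con_mask.equals(con_where))
```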


In [115]:
'''Filter: longer than 100 minutes, rating "R", country "United States"'''
peliculas = peliculas.loc[(peliculas["minutos"] > 100) & (peliculas["rating"] == "R") & (peliculas["country"] == "United States")]
peliculas.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,minutos,Contenido_largo
48,s49,Movie,Training Day,Antoine Fuqua,"Denzel Washington, Ethan Hawke, Scott Glenn, T...",United States,"September 16, 2021",2001,R,122 min,"Dramas, Thrillers",A rookie cop with one day to prove himself to ...,2021-09-16,2021.0,9.0,122,Sí
81,s82,Movie,Kate,Cedric Nicolas-Troyan,"Mary Elizabeth Winstead, Jun Kunimura, Woody H...",United States,"September 10, 2021",2021,R,106 min,Action & Adventure,"Slipped a fatal poison on her final job, a rut...",2021-09-10,2021.0,9.0,106,No
131,s132,Movie,Blade Runner: The Final Cut,Ridley Scott,"Harrison Ford, Rutger Hauer, Sean Young, Edwar...",United States,"September 1, 2021",1982,R,117 min,"Action & Adventure, Classic Movies, Cult Movies","In a smog-choked dystopian Los Angeles, blade ...",2021-09-01,2021.0,9.0,117,No
139,s140,Movie,Do the Right Thing,Spike Lee,"Danny Aiello, Ossie Davis, Ruby Dee, Richard E...",United States,"September 1, 2021",1989,R,120 min,"Classic Movies, Comedies, Dramas","On a sweltering day in Brooklyn, simmering rac...",2021-09-01,2021.0,9.0,120,No
144,s145,Movie,House Party,Reginald Hudlin,"Christopher Reid, Christopher Martin, Robin Ha...",United States,"September 1, 2021",1990,R,104 min,"Comedies, Cult Movies","Grounded by his strict father, Kid risks life ...",2021-09-01,2021.0,9.0,104,No


In [116]:
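'''Format the top 10 longest movies'''
# peliculas now holds only the movies filtered in the previous cell,
# so this top 10 is computed within that subset
peliculas = peliculas.sort_values(by="minutos", ascending=False)
peliculas.head(10).style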

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,new_date_added,year_added,month_added,minutos,Contenido_largo
3227,s3228,Movie,The Irishman,Martin Scorsese,"Robert De Niro, Al Pacino, Joe Pesci, Harvey Keitel, Ray Romano, Bobby Cannavale, Anna Paquin, Stephen Graham, Stephanie Kurtzuba, Kathrine Narducci, Welker White, Jesse Plemons, Jack Huston, Domenick Lombardozzi, Louis Cancelmi, Paul Herman, Gary Basaraba, Marin Ireland, Sebastian Maniscalco, Steven Van Zandt",United States,"November 27, 2019",2019,R,209 min,Dramas,Hit man Frank Sheeran looks back at the secrets he kept as a loyal member of the Bufalino crime family in this acclaimed film from Martin Scorsese.,2019-11-27 00:00:00,2019.0,11.0,209,Sí
7957,s7958,Movie,Schindler's List,Steven Spielberg,"Liam Neeson, Ben Kingsley, Ralph Fiennes, Caroline Goodall, Jonathan Sagall, Embeth Davidtz, Małgorzata Gebel, Shmulik Levy, Mark Ivanir, Beatrice Macola, Friedrich von Thun, Andrzej Seweryn",United States,"April 1, 2018",1993,R,195 min,"Classic Movies, Dramas","Oskar Schindler becomes an unlikely humanitarian, spending his entire fortune to help save 1,100 Jews from Auschwitz during World War II.",2018-04-01 00:00:00,2018.0,4.0,195,Sí
341,s342,Movie,Magnolia,Paul Thomas Anderson,"John C. Reilly, Philip Baker Hall, Tom Cruise, Julianne Moore, Philip Seymour Hoffman, William H. Macy, Jeremy Blackman, Jason Robards, Melinda Dillon, April Grace, Luis Guzmán, Ricky Jay, Alfred Molina, Michael Murphy, Melora Walters",United States,"August 1, 2021",1999,R,189 min,"Dramas, Independent Movies","Through chance, history and divine intervention, a cast of eclectic characters weaves and warps through each other's lives on a random day in California.",2021-08-01 00:00:00,2021.0,8.0,189,Sí
392,s393,Movie,Django Unchained,Quentin Tarantino,"Jamie Foxx, Christoph Waltz, Leonardo DiCaprio, Kerry Washington, Samuel L. Jackson, Walton Goggins, Dennis Christopher, James Remar, David Steen, Dana Gourrier, Nichole Galicia, Laura Cayouette, Ato Essandoh, Sammi Rotibi, Escalante Lundy, Don Johnson",United States,"July 24, 2021",2012,R,165 min,"Action & Adventure, Dramas","Accompanied by a German bounty hunter, a freed slave named Django travels across America to free his wife from a sadistic plantation owner.",2021-07-24 00:00:00,2021.0,7.0,165,Sí
2863,s2864,Movie,There Will Be Blood,Paul Thomas Anderson,"Daniel Day-Lewis, Paul Dano, Kevin J. O'Connor, Ciarán Hinds, Dillon Freasier, Sydney McCallister, David Willis, David Warshofsky, Colton Woodward, Russell Harvard",United States,"March 1, 2020",2007,R,158 min,"Dramas, Independent Movies","An ambitious prospector strikes it rich and turns a simple village into a boomtown, stoking the ire of a charismatic young preacher.",2020-03-01 00:00:00,2020.0,3.0,158,Sí
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey Jr., Anthony Edwards, Brian Cox, Elias Koteas, Donal Logue, John Carroll Lynch, Dermot Mulroney, Chloë Sevigny",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a pair of cops investigate San Francisco's infamous Zodiac Killer in this thriller based on a true story.",2019-11-20 00:00:00,2019.0,11.0,158,Sí
2393,s2394,Movie,Da 5 Bloods,Spike Lee,"Delroy Lindo, Jonathan Majors, Clarke Peters, Norm Lewis, Isiah Whitlock Jr., Mélanie Thierry, Paul Walter Hauser, Jasper Pääkkönen, Johnny Nguyen, Chadwick Boseman",United States,"June 12, 2020",2020,R,156 min,"Action & Adventure, Dramas",Four African American veterans return to Vietnam decades after the war to find their squad leader's remains — and a stash of buried gold. From Spike Lee.,2020-06-12 00:00:00,2020.0,6.0,156,Sí
564,s565,Movie,Boogie Nights,Paul Thomas Anderson,"Mark Wahlberg, Burt Reynolds, Julianne Moore, Heather Graham, William H. Macy, Don Cheadle, Philip Seymour Hoffman, Luis Guzmán, John C. Reilly, Philip Baker Hall",United States,"July 1, 2021",1997,R,155 min,"Comedies, Dramas, Independent Movies",A well-endowed busboy is taken in by a tight-knit group of 1970s porn actors and transforms himself into skin flick celebrity Dirk Diggler.,2021-07-01 00:00:00,2021.0,7.0,155,Sí
7113,s7114,Movie,Jackie Brown,Quentin Tarantino,"Pam Grier, Samuel L. Jackson, Robert Forster, Bridget Fonda, Michael Keaton, Robert De Niro, Michael Bowen, Chris Tucker, LisaGay Hamilton, Tommy 'Tiny' Lister, Hattie Winston, Sid Haig",United States,"August 1, 2019",1997,R,154 min,"Dramas, Thrillers","When an aging flight attendant's caught smuggling cash and forced to help with an investigation, she hatches a clever plan to make off with the dough.",2019-08-01 00:00:00,2019.0,8.0,154,Sí
7802,s7803,Movie,Pulp Fiction,Quentin Tarantino,"John Travolta, Samuel L. Jackson, Uma Thurman, Harvey Keitel, Tim Roth, Amanda Plummer, Maria de Medeiros, Ving Rhames, Eric Stoltz, Rosanna Arquette, Christopher Walken, Bruce Willis",United States,"January 1, 2019",1994,R,154 min,"Classic Movies, Cult Movies, Dramas","This stylized crime caper weaves together stories featuring a burger-loving hit man, his philosophical partner and a washed-up boxer.",2019-01-01 00:00:00,2019.0,1.0,154,Sí




### Challenge question

11. What are the most frequent combinations of genre and rating in the dataset?
    (Hint: use `value_counts` with `subset=["genre", "rating"]` after applying `explode()`.)



### Bonus: Duplicate analysis and cleaning

12. Are there movies with the same name (`title`) but a different release year (`release_year`)?
13. How many unique titles are there in total in the `title` column?





In [119]:
'''Most frequent genre and rating combinations'''
# df["listed_in"] already holds lists (it was split in the explode cell above)
genero_rating = df.explode("listed_in")
genero_rating = genero_rating.value_counts(subset=["listed_in", "rating"])
print("The most frequent genre and rating combinations are:")
genero_rating.head(10)

The most frequent genre and rating combinations are:


listed_in,rating,count
International Movies,TV-MA,1074
International Movies,TV-14,1022
Dramas,TV-MA,616
Dramas,TV-14,428
TV Dramas,TV-MA,401
Comedies,TV-MA,400
Comedies,TV-14,393
International TV Shows,TV-MA,368
International TV Shows,TV-MA,346
Independent Movies,TV-MA,335
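
A self-contained sketch of the same counting pattern, assuming a hypothetical `toy` frame. Here the genres are split on `", "` so that no label carries a leading space, which is what makes "International TV Shows" with TV-MA appear twice in the table above.

```python
import pandas as pd

# Hypothetical toy data with genres in the "A, B" format
toy = pd.DataFrame({
    "listed_in": ["Dramas, Comedies", "Dramas", "Comedies"],
    "rating": ["R", "R", "PG"],
})

# One row per (title, genre) pair, without touching the original frame
exploded = toy.assign(listed_in=toy["listed_in"].str.split(", ")).explode("listed_in")

# DataFrame.value_counts counts each distinct (genre, rating) combination
combos = exploded.value_counts(subset=["listed_in", "rating"])

print(combos)
```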


In [138]:
'''Are there movies with the same title but a different release year?'''
# Keep one row per (title, release_year) pair; any title still repeated
# after that appears with more than one release year
df_new = df[["title", "release_year"]].drop_duplicates()
cant_duplicados = df_new["title"].duplicated().sum()
print(f"There are {cant_duplicados} duplicated titles with a different release year")

# nunique counts the distinct titles directly
titulos_unicos = df["title"].nunique()
print(f"There are {titulos_unicos} unique titles in the title column")



There are 0 duplicated titles with a different release year
There are 8807 unique titles in the title column
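
Since the real dataset has no repeated titles, the logic is easier to check on data that does. The sketch below uses a hypothetical `toy` frame in which "A" appears with two different years and "C" is an exact duplicate.

```python
import pandas as pd

# Hypothetical toy data: "A" has two release years, "C" is repeated verbatim
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C", "C"],
    "release_year": [2001, 2010, 2005, 1999, 1999],
})

# Drop exact (title, release_year) repeats so "C" is counted only once
pares = toy[["title", "release_year"]].drop_duplicates()

# Titles still repeated here must differ in release year
repetidos = pares.loc[pares["title"].duplicated(keep=False)]
n_distinto_anio = repetidos["title"].nunique()

# Number of distinct titles overall
n_unicos = toy["title"].nunique()

print(repetidos)
print(n_distinto_anio, n_unicos)
```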
