<a href="https://colab.research.google.com/github/fralfaro/MAT281/blob/main/docs/labs/lab_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# MAT281 - Lab No. 03





**Objective**: Apply advanced data manipulation and analysis techniques with pandas to a real dataset of Netflix content, reinforcing good practices and efficient methods without resorting to `groupby`, `merge`, `pivot`, or `join`.



**Dataset**:

We will work with the file `netflix_titles.csv`, which contains information about the titles available on Netflix up to 2021.

| Variable       | Class     | Description                                                                  |
|----------------|-----------|------------------------------------------------------------------------------|
| show_id        | character | Unique identifier of the title in the Netflix catalog.                       |
| type           | character | Type of content: 'Movie' or 'TV Show'.                                       |
| title          | character | Title of the content.                                                        |
| director       | character | Name of the director (may be null).                                          |
| cast           | character | List of main actors (may be null).                                           |
| country        | character | Country or countries where the content was produced.                         |
| date_added     | date      | Date on which the title was added to the Netflix catalog.                    |
| release_year   | integer   | Original release year of the title.                                          |
| rating         | character | Age rating (for example: 'PG-13', 'TV-MA').                                  |
| duration       | character | Duration of the content (minutes for movies, number of seasons for TV shows). |
| listed_in      | character | Categories or genres under which the content is classified.                  |
| description    | character | Brief synopsis of the content.                                               |




In [172]:
import pandas as pd

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/fralfaro/MAT281/main/docs/labs/data/netflix_titles.csv')
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...



### Part 1: Cleaning and preparation

1. Inspect and describe the dataset:

   * How many rows and columns does it have?
   * What data types does it contain?
   * How many null values are there per column?

2. Convert the `date_added` column to a date type.

3. Create auxiliary columns with `assign`:

   * Year (`year_added`)
   * Month (`month_added`)



In [173]:
# Task 1: dataset dimensions
df.shape

(8807, 12)

The dataset has 8807 rows and 12 columns.

In [174]:
# Task 1: data type of each column
df.dtypes

Unnamed: 0,0
show_id,object
type,object
title,object
director,object
cast,object
country,object
date_added,object
release_year,int64
rating,object
duration,object


In [175]:
# Single quotes inside the braces keep these f-strings valid on Python versions before 3.12
print(f"Class of 'show_id': {type(df['show_id'][0])}")
print(f"Class of 'type': {type(df['type'][0])}")
print(f"Class of 'title': {type(df['title'][0])}")
print(f"Class of 'director': {type(df['director'][0])}")
print(f"Class of 'cast': {type(df['cast'][1])}")  # row 0 has a null cast, so row 1 is inspected
print(f"Class of 'duration': {type(df['duration'][0])}")
print(f"Class of 'listed_in': {type(df['listed_in'][0])}")

Class of 'show_id': <class 'str'>
Class of 'type': <class 'str'>
Class of 'title': <class 'str'>
Class of 'director': <class 'str'>
Class of 'cast': <class 'str'>
Class of 'duration': <class 'str'>
Class of 'listed_in': <class 'str'>


The DataFrame contains integer data (`release_year`, dtype `int64`) and string data (`str` values stored in columns of dtype `object`).

In [176]:
df.isnull().sum()

Unnamed: 0,0
show_id,0
type,0
title,0
director,2634
cast,825
country,831
date_added,10
release_year,0
rating,4
duration,3


The `director` column has 2634 null values, the most of any column. It is followed by `country` with 831, `cast` with 825, `date_added` with 10, `rating` with 4, and `duration` with 3. The remaining columns have no null values.
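The same check can also be expressed as a share of rows instead of a raw count. The following is a minimal sketch on a small hypothetical DataFrame (`toy` and its values are invented for illustration, not taken from the Netflix file); it relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import pandas as pd

# Hypothetical toy data standing in for the Netflix table
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "rating": ["PG", "R", None, "PG"],
})

# Number of nulls per column, largest first
null_counts = toy.isnull().sum().sort_values(ascending=False)

# Share of nulls per column as a percentage (mean of a boolean mask)
null_pct = (toy.isnull().mean() * 100).round(1)

print(null_counts)
print(null_pct)
```

Sorting the counts puts the most incomplete column first, which is how the ranking in the paragraph above could be read off directly.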

In [177]:
# Task 2: parse date_added as datetime; values that cannot be parsed become NaT
df["date_added"] = pd.to_datetime(df["date_added"], errors="coerce")
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
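With `errors="coerce"`, any value that does not match the expected date layout silently becomes `NaT`. A minimal sketch of a more defensive parse follows, assuming (this is not verified against the file) that some raw values may carry stray leading or trailing whitespace; the series `raw` is hypothetical.

```python
import pandas as pd

# Hypothetical raw values: one clean, one with a leading space, one missing
raw = pd.Series(["September 25, 2021", " August 4, 2017", None])

# Strip whitespace first, then parse with an explicit format
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Count how many non-null inputs failed to parse
n_failed = int((parsed.isna() & raw.notna()).sum())
print(parsed)
print(n_failed)
```

Comparing the null mask before and after parsing, as `n_failed` does, shows whether coercion dropped any dates that were present in the raw column.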


In [178]:
# Task 3: auxiliary year and month columns
df = df.assign(year_added=df["date_added"].dt.year, month_added=df["date_added"].dt.month)
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021.0,9.0
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021.0,9.0
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021.0,9.0
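The output above shows `year_added` and `month_added` as floats (`2021.0`, `9.0`): the rows with a missing date force a float dtype so that `NaN` can be represented. A minimal sketch of one way to keep integer values, using pandas' nullable `Int64` dtype on hypothetical dates:

```python
import pandas as pd

# Hypothetical dates; the last one is missing on purpose
dates = pd.to_datetime(
    pd.Series(["September 25, 2021", "July 15, 2018", None]),
    format="%B %d, %Y",
    errors="coerce",
)

toy = pd.DataFrame({"date_added": dates})
toy = toy.assign(
    # Int64 (capital I) stores integers alongside <NA> instead of falling back to float
    year_added=toy["date_added"].dt.year.astype("Int64"),
    month_added=toy["date_added"].dt.month.astype("Int64"),
)
print(toy)
```

Whether this conversion is worthwhile depends on later use; comparisons such as `year_added > 2018` behave the same for the non-missing rows either way.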


### Part 2: Advanced pandas techniques

4. Use `.loc` to select movies (`type == 'Movie'`) that were added after 2018.

5. Use `str.contains()` and `str.extract()`:

   * Filter titles that contain the word 'love' (case-insensitive).
   * Extract the duration in minutes for movies from the `duration` column.

6. Apply `explode()` to the `listed_in` column to obtain one row per genre.

7. Obtain the top 10 most frequent genres using `value_counts()`.

8. Apply `where()` and `mask()` to flag movies longer than 120 minutes as long content in a new column.

9. Use `.loc` to filter movies that satisfy:

   * More than 100 minutes in duration.
   * Rating equal to `'R'`.
   * Country equal to `'United States'`.

10. Use `.style` to visually format the top 10 longest movies.

**Assumption:** From here on, we assume the tasks are independent of one another.

In [179]:
# Task 4
df_movies = df.loc[(df["type"]=="Movie") & (df["year_added"]>2018)]
df_movies.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0


In [180]:
# Task 5: titles containing 'love', ignoring case
df_love = df.loc[df["title"].str.contains("love", case=False)]
df_love.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
25,s26,TV Show,Love on the Spectrum,,Brooke Satchwell,Australia,2021-09-21,2021,TV-14,2 Seasons,"Docuseries, International TV Shows, Reality TV",Finding love can be hard for anyone. For young...,2021.0,9.0
158,s159,Movie,Love Don't Cost a Thing,Troy Byer,"Nick Cannon, Christina Milian, Kenan Thompson,...",United States,2021-09-01,2003,PG-13,101 min,"Comedies, Romantic Movies",A nerdy teen tries to make himself cool by ass...,2021.0,9.0
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,2021-09-01,2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2021.0,9.0
206,s207,Movie,"LSD: Love, Sex Aur Dhokha",Dibakar Banerjee,"Nushrat Bharucha, Anshuman Jha, Neha Chauhan, ...",India,2021-08-27,2010,TV-MA,112 min,"Dramas, Independent Movies, International Movies",This provocative drama examines how the voyeur...,2021.0,8.0
227,s228,Movie,Really Love,Angel Kristi Williams,"Kofi Siriboe, Yootha Wong-Loi-Sing, Michael Ea...",United States,2021-08-25,2020,TV-MA,95 min,"Dramas, Independent Movies, Romantic Movies",A rising Black painter tries to break into a c...,2021.0,8.0


In [181]:
# Task 5: duration in minutes
# expand=False makes str.extract return a Series; TV shows get NaN because their duration is given in seasons, not minutes
df_duration = df.assign(duration_2=df["duration"].str.extract(r'(\d+)\s+min', expand=False).astype(float))
df_duration = df_duration.loc[df_duration["type"]=="Movie"]
df_duration.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,duration_2
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0,90.0
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0,91.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0,125.0
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0,104.0
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0,127.0
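To see how the pattern behaves on both kinds of `duration` values, here is a minimal sketch on a hypothetical series. The second pattern (for seasons) is not part of the lab and is included only to illustrate that non-matching rows come back as `NaN`.

```python
import pandas as pd

# Hypothetical values mirroring the two formats found in `duration`
duration = pd.Series(["90 min", "2 Seasons", "125 min", "1 Season", None])

# Capture the digits that precede "min"; rows that do not match become NaN
minutes = duration.str.extract(r"(\d+)\s+min", expand=False).astype(float)

# "Seasons?" matches both "Season" and "Seasons"
seasons = duration.str.extract(r"(\d+)\s+Seasons?", expand=False).astype(float)

print(pd.DataFrame({"duration": duration, "minutes": minutes, "seasons": seasons}))
```

With `expand=False` and a single capture group, `str.extract` returns a Series, which can be passed to `assign` directly.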


In [182]:
# Task 6
# Turn each "listed_in" string into a list of genres
df_exp = df.assign(listed_in=df["listed_in"].str.split(", "))
# Explode the lists so that each genre gets its own row
df_exp = df_exp.explode("listed_in")
df_exp.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,International TV Shows,"After crossing paths at a party, a Cape Town t...",2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,TV Dramas,"After crossing paths at a party, a Cape Town t...",2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,TV Mysteries,"After crossing paths at a party, a Cape Town t...",2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,Crime TV Shows,To protect his family from a powerful drug lor...,2021.0,9.0
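Note that the exploded rows above keep the index label of the row they came from (label `1` appears three times). When unique row labels are preferred, `explode` accepts `ignore_index=True`. A minimal sketch on a hypothetical two-row table:

```python
import pandas as pd

# Hypothetical two-title table with comma-separated genres
toy = pd.DataFrame({
    "title": ["Blood & Water", "Ganglands"],
    "listed_in": ["International TV Shows, TV Dramas", "Crime TV Shows"],
})

exploded = (
    toy.assign(listed_in=toy["listed_in"].str.split(", "))  # strings -> lists
       .explode("listed_in", ignore_index=True)             # one row per genre, fresh 0..n-1 index
)
print(exploded)
```

Keeping the repeated labels (the default) is useful when the exploded rows need to be traced back to the original title; resetting them is mostly a matter of presentation.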


In [183]:
# Task 7
# Use df_exp from the previous task so that each genre is counted separately
df_exp["listed_in"].value_counts().head(10)  # Top 10 genres

Unnamed: 0_level_0,count
listed_in,Unnamed: 1_level_1
International Movies,2752
Dramas,2427
Comedies,1674
International TV Shows,1351
Documentaries,869
Action & Adventure,859
TV Dramas,763
Independent Movies,756
Children & Family Movies,641
Romantic Movies,616


In [184]:
# Task 8
# Extract the duration in minutes (NaN for TV shows)
df_long = df.assign(duration_2=df["duration"].str.extract(r'(\d+)\s+min', expand=False).astype(float))
# Start from an all-False flag and use mask to set it to True for movies longer than 120 minutes
is_long = df_long["duration_2"] > 120
df_long["long_content"] = pd.Series(False, index=df_long.index).mask(is_long, True)
# Drop the auxiliary column
df_long = df_long.drop(columns="duration_2")
df_long.loc[df_long["type"]=="Movie"].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,long_content
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0,False
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0,False
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0,True
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0,False
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0,True
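`where()` and `mask()` are mirror images: `where` keeps values where the condition is `True` and replaces the rest, while `mask` replaces values where the condition is `True`. A minimal sketch on hypothetical durations, using text labels instead of booleans to make the replacement visible:

```python
import pandas as pd

# Hypothetical durations in minutes; None stands for a TV show with no minute value
minutes = pd.Series([90.0, 125.0, None, 121.0])
is_long = minutes > 120  # comparisons with NaN evaluate to False

# where: keep "long" where the condition holds, otherwise use "not long"
via_where = pd.Series("long", index=minutes.index).where(is_long, "not long")

# mask: start from "not long" and overwrite where the condition holds
via_mask = pd.Series("not long", index=minutes.index).mask(is_long, "long")

print(via_where.tolist())
print(via_mask.tolist())
```

Both routes give the same labels; which one reads better depends on whether the default value or the exceptional value is the natural starting point.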


**Assumption:** For Task 9, we assume the goal is to filter the movies that satisfy all three conditions simultaneously.

In [185]:
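# Task 9
# Recompute the auxiliary "duration_2" column (minutes as float, NaN for TV shows)
df_filter = df.assign(duration_2=df["duration"].str.extract(r'(\d+)\s+min', expand=False).astype(float))
# Keep movies longer than 100 minutes, rated R, whose only listed country is the United States
df_filter = df_filter.loc[(df_filter["type"]=="Movie") & (df_filter["duration_2"]>100) & (df_filter["rating"]=="R") & (df_filter["country"]=="United States")]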
df_filter.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,duration_2
48,s49,Movie,Training Day,Antoine Fuqua,"Denzel Washington, Ethan Hawke, Scott Glenn, T...",United States,2021-09-16,2001,R,122 min,"Dramas, Thrillers",A rookie cop with one day to prove himself to ...,2021.0,9.0,122.0
81,s82,Movie,Kate,Cedric Nicolas-Troyan,"Mary Elizabeth Winstead, Jun Kunimura, Woody H...",United States,2021-09-10,2021,R,106 min,Action & Adventure,"Slipped a fatal poison on her final job, a rut...",2021.0,9.0,106.0
131,s132,Movie,Blade Runner: The Final Cut,Ridley Scott,"Harrison Ford, Rutger Hauer, Sean Young, Edwar...",United States,2021-09-01,1982,R,117 min,"Action & Adventure, Classic Movies, Cult Movies","In a smog-choked dystopian Los Angeles, blade ...",2021.0,9.0,117.0
139,s140,Movie,Do the Right Thing,Spike Lee,"Danny Aiello, Ossie Davis, Ruby Dee, Richard E...",United States,2021-09-01,1989,R,120 min,"Classic Movies, Comedies, Dramas","On a sweltering day in Brooklyn, simmering rac...",2021.0,9.0,120.0
144,s145,Movie,House Party,Reginald Hudlin,"Christopher Reid, Christopher Martin, Robin Ha...",United States,2021-09-01,1990,R,104 min,"Comedies, Cult Movies","Grounded by his strict father, Kid risks life ...",2021.0,9.0,104.0


In [193]:
# Task 10
# Function that applies the .style formatting
def make_pretty(styler):
    styler.set_caption("Top 10 Longest Movies")  # Caption
    styler.hide()  # Hide the index
    styler.hide(subset=["show_id","type","date_added","year_added","month_added","country","release_year","rating","description","cast","director"], axis=1)  # Hide columns
    headers = {
        'selector': 'th:not(.index_name)',
        'props': 'background-color: #000066; color: white;'
    }
    styler.set_table_styles([headers])  # Colour the header row
    styler.background_gradient(subset=["duration"], cmap='YlGnBu')  # Shade the "duration" column according to its values
    return styler

# Overwrite "duration" with the numeric duration in minutes (NaN for TV shows)
df_top = df.assign(duration=df["duration"].str.extract(r'(\d+)\s+min', expand=False).astype(float))
# Keep movies only
df_top = df_top.loc[df_top["type"]=="Movie"]
# Sort by duration, longest first, and keep the top 10
df_top = df_top.sort_values("duration", ascending=False)
df_top = df_top.head(10)
# Apply .style
df_top = df_top.style.pipe(make_pretty)
df_top

title,duration,listed_in
Black Mirror: Bandersnatch,312.0,"Dramas, International Movies, Sci-Fi & Fantasy"
Headspace: Unwind Your Mind,273.0,Documentaries
The School of Mischief,253.0,"Comedies, Dramas, International Movies"
No Longer kids,237.0,"Comedies, Dramas, International Movies"
Lock Your Girls In,233.0,"Comedies, International Movies, Romantic Movies"
Raya and Sakina,230.0,"Comedies, Dramas, International Movies"
Once Upon a Time in America,229.0,"Classic Movies, Dramas"
Sangam,228.0,"Classic Movies, Dramas, International Movies"
Lagaan,224.0,"Dramas, International Movies, Music & Musicals"
Jodhaa Akbar,214.0,"Action & Adventure, Dramas, International Movies"




### Challenge Question

11. What are the most frequent combinations of genre and rating in the dataset?
    (Hint: use `value_counts` with `subset=["genre", "rating"]` after applying `explode()`.)



### Bonus: Duplicate analysis and cleaning

12. Are there movies with the same name (`title`) but a different release year (`release_year`)?
13. How many unique titles are there in total in the `title` column?





In [187]:
# Task 11
# Same split-and-explode step as in Task 6
df_def = df.assign(listed_in=df["listed_in"].str.split(", "))
df_def = df_def.explode("listed_in")
# Count each (genre, rating) pair and show the 10 most frequent
df_def.value_counts(subset=["listed_in", "rating"]).head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,count
listed_in,rating,Unnamed: 2_level_1
International Movies,TV-MA,1130
International Movies,TV-14,1065
Dramas,TV-MA,830
International TV Shows,TV-MA,714
Dramas,TV-14,693
International TV Shows,TV-14,472
Comedies,TV-14,465
TV Dramas,TV-MA,434
Comedies,TV-MA,431
Dramas,R,375


The most frequent combinations of genre and rating are International Movies with TV-MA (1130 titles), International Movies with TV-14 (1065), and Dramas with TV-MA (830).
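As a small illustration of how `DataFrame.value_counts` handles a pair of columns, here is a minimal sketch on hypothetical data. The result is a Series indexed by `(genre, rating)` pairs, sorted from most to least frequent; `reset_index` turns it into a flat table (the column name `n_titles` is an arbitrary choice for this example).

```python
import pandas as pd

# Hypothetical, already-exploded genre/rating pairs
toy = pd.DataFrame({
    "listed_in": ["Dramas", "Dramas", "Comedies", "Dramas"],
    "rating": ["TV-MA", "TV-MA", "TV-14", "R"],
})

# One count per distinct (listed_in, rating) pair, most frequent first
pairs = toy.value_counts(subset=["listed_in", "rating"])
top_pair = pairs.index[0]

# Flatten the MultiIndex into ordinary columns
table = pairs.reset_index(name="n_titles")
print(table)
```

Rows where either column is null are dropped from the count by default, so the totals can be slightly lower than the number of exploded rows.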

In [188]:
# Task 12
# Count how many times each movie title appears in "title", ignoring case
df_bonus = df.loc[df["type"]=="Movie"]
title_lower = df_bonus["title"].str.lower()

# value_counts gives the frequency of each title; map attaches that frequency to every row
df_bonus = df_bonus.assign(title_counter=title_lower.map(title_lower.value_counts()))
df_bonus.loc[df_bonus["title_counter"]>1].head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,title_counter
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,2021-09-01,2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2021.0,9.0,2
303,s304,Movie,Esperando la carroza,Alejandro Doria,"Luis Brandoni, China Zorrilla, Antonio Gasalla...",Argentina,2021-08-05,1985,TV-MA,95 min,"Comedies, Cult Movies, International Movies",Cora has three sons and a daughter and she´s a...,2021.0,8.0,2
6705,s6706,Movie,Esperando La Carroza,Alejandro Doria,"Luis Brandoni, China Zorrilla, Antonio Gasalla...",Argentina,2018-07-15,1985,NR,95 min,"Comedies, Cult Movies, International Movies",Cora has three sons and a daughter and she´s a...,2018.0,7.0,2
7345,s7346,Movie,Love In A Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,2018-08-01,2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2018.0,8.0,2


We can see that some movies do share the same title (ignoring case). However, the repeated entries have the same release year in each case. Therefore, there are no movies with the same name but a different release year.
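The conclusion above is read off the table by eye. A minimal sketch of how the same question could be checked programmatically, without `groupby`, is shown below on hypothetical data (the "Remake" rows are invented to show what a positive case would look like). The idea: after removing exact `(title, year)` repeats, any title that still appears more than once must have at least two distinct release years.

```python
import pandas as pd

# Hypothetical movies; "Remake"/"remake" is an invented same-title, different-year case
toy = pd.DataFrame({
    "title": ["Love in a Puff", "Love In A Puff", "Sankofa", "Remake", "remake"],
    "release_year": [2010, 2010, 1993, 1990, 2020],
})

# Normalise case, then keep a single row per (title, year) pair
pairs = (
    toy.assign(title_key=toy["title"].str.lower())
       .drop_duplicates(subset=["title_key", "release_year"])
)

# Titles still duplicated here differ in release year
conflicts = pairs.loc[pairs["title_key"].duplicated(keep=False)]
print(conflicts)
```

An empty `conflicts` table would correspond to the answer given above: repeated titles exist, but none with differing release years.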

In [189]:
# Task 13
# Lower-case the titles so that differences in capitalisation do not count as distinct titles
titles_all = df["title"].str.lower()
titles_movies = df.loc[df["type"]=="Movie", "title"].str.lower()

print(f"Total number of titles: {len(titles_all)}")
print(f"Number of unique titles: {titles_all.nunique()}\n")
print(f"Total number of movies: {len(titles_movies)}")
print(f"Number of unique movie titles: {titles_movies.nunique()}")

Total number of titles: 8807
Number of unique titles: 8802

Total number of movies: 6131
Number of unique movie titles: 6129


There are 8802 unique titles (ignoring case). Of these, 6129 are unique movie titles.