# ETL

**STEP 1: Import the libraries used in the project**

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FormatStrFormatter
import seaborn as sns
import ast
import json

**STEP 2: Load the dataset and inspect it**

In [5]:
# Read the CSV file; low_memory=False parses it in a single pass so each column gets one consistent inferred dtype
df_movies = pd.read_csv('movies_dataset.csv', low_memory=False)
df_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45466 non-null  object 
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re

To prepare the DataFrame for joining with the data in `credits.csv`, the `id` field, which is stored as `object`, needs to be converted to numeric values.

In [3]:
# Try to convert a value from the 'id' column to an integer; return None if it cannot be parsed
def convert_id(value):
    try:
        return int(value)
    except ValueError:
        return None

# Drop the rows whose 'id' is null (NaN/NaT); .copy() avoids chained-assignment warnings
df_movies = df_movies[df_movies['id'].notna()].copy()

# Convert the remaining 'id' cells to numbers; non-numeric strings such as dates become None (NaN)
df_movies['id'] = df_movies['id'].apply(convert_id)
df_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45463 non-null  float64
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re
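
As a hedged alternative to the row-by-row `convert_id` helper, the same coercion can be sketched with `pd.to_numeric(..., errors='coerce')`. The `ids` Series below is invented for illustration; it only mimics a date-like string ending up in the `id` column.

```python
import pandas as pd

# Hypothetical sample: two valid ids and one misplaced date string.
ids = pd.Series(['862', '8844', '1997-08-20'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
numeric_ids = pd.to_numeric(ids, errors='coerce')

# Drop the unparseable rows, then cast to a true integer dtype.
clean_ids = numeric_ids.dropna().astype('int64')
print(clean_ids.tolist())
```

Dropping the NaN rows before the cast is what allows the column to become `int64`; while NaN values are present, pandas keeps the column as `float64`, which matches the `info()` output above.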

The `credits.csv` file is then loaded and inspected.

In [4]:
df_credits = pd.read_csv('credits.csv')
df_credits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45476 entries, 0 to 45475
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   cast    45476 non-null  object
 1   crew    45476 non-null  object
 2   id      45476 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 1.0+ MB


Now that the `id` field holds numeric values in both DataFrames (`int64` in `df_credits`, and `float64` in `df_movies` because of the values that could not be converted), the two are joined on this field.

In [5]:
df_merged = df_credits.merge(df_movies, on='id')
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45538 entries, 0 to 45537
Data columns (total 26 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   cast                   45538 non-null  object 
 1   crew                   45538 non-null  object 
 2   id                     45538 non-null  int64  
 3   adult                  45538 non-null  object 
 4   belongs_to_collection  4500 non-null   object 
 5   budget                 45538 non-null  object 
 6   genres                 45538 non-null  object 
 7   homepage               7792 non-null   object 
 8   imdb_id                45521 non-null  object 
 9   original_language      45527 non-null  object 
 10  original_title         45538 non-null  object 
 11  overview               44584 non-null  object 
 12  popularity             45535 non-null  object 
 13  poster_path            45152 non-null  object 
 14  production_companies   45535 non-null  object 
 15  pr
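
The merged frame has 45538 rows, more than either input. One plausible cause (an assumption, not verified against the data here) is repeated `id` values, since a merge emits one row for every matching pair of keys. The toy frames below are invented to show the effect and how `duplicated` and `drop_duplicates` could be used to check for it.

```python
import pandas as pd

# Hypothetical toy data: id 1 appears twice on each side.
credits = pd.DataFrame({'id': [1, 1, 2], 'cast': ['a', 'a', 'b']})
movies = pd.DataFrame({'id': [1, 1, 2], 'title': ['x', 'x', 'y']})

# Duplicated keys are matched pairwise: 2 x 2 rows for id 1, plus 1 row for id 2.
inflated = credits.merge(movies, on='id')

# Count the repeated keys, then merge again after removing them.
n_dup_credits = int(credits['id'].duplicated().sum())
deduped = credits.drop_duplicates('id').merge(movies.drop_duplicates('id'), on='id')
print(len(inflated), n_dup_credits, len(deduped))
```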

In [6]:
df_merged.head()

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


The DataFrame is now ready for the next stage: transformation.

**STEP 3: Transformations**<br>
            The main task is to unnest the columns that need it, so that the information they contain is easier to access.<br>
            The columns to be transformed are: `cast`, `crew`, `belongs_to_collection`, `genres`, `production_companies`, `production_countries` and `spoken_languages`.

To unnest the cells of the `belongs_to_collection` field, the `ast` module is used to evaluate the text strings as dictionaries. A new column named `franquicia` is created, holding only the collection's name (_name_).

In [7]:
# Use 'literal_eval()' from the 'ast' module to evaluate each text string as a dictionary.
# If the cell is a string that evaluates to a dict, keep its 'name' value; otherwise store an empty string.

df_merged['franquicia'] = df_merged['belongs_to_collection'].apply(lambda x: ast.literal_eval(x)['name'] if isinstance(x, str) and isinstance(ast.literal_eval(x), dict) else '').tolist()

df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,franquicia
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,Toy Story Collection
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,Grumpy Old Men Collection
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,Father of the Bride Collection
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45533,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,tt6209470,fa,...,0.0,90.0,"[{'iso_639_1': 'fa', 'name': 'فارسی'}]",Released,Rising and falling between a man and woman,Subdue,False,4.0,1.0,
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,0.0,360.0,"[{'iso_639_1': 'tl', 'name': ''}]",Released,,Century of Birthing,False,9.0,3.0,
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,0.0,90.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,A deadly game of wits.,Betrayal,False,3.8,6.0,
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,0.0,87.0,[],Released,,Satan Triumphant,False,0.0,0.0,
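
The lambda above calls `ast.literal_eval` twice per cell. A minimal sketch of a helper that parses each string only once is shown below; the name `collection_name` and the sample values are hypothetical. Unlike the lambda, it also returns `''` for malformed strings instead of raising.

```python
import ast

import pandas as pd


def collection_name(cell):
    """Return the collection's 'name', or '' when the cell is not a dict string."""
    if not isinstance(cell, str):
        return ''  # NaN and other non-string cells
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return ''  # malformed string
    return parsed.get('name', '') if isinstance(parsed, dict) else ''


# Hypothetical sample cells: a dict string, a missing value, a malformed string.
sample = pd.Series(["{'id': 10194, 'name': 'Toy Story Collection'}", float('nan'), "not a dict"])
names = sample.apply(collection_name).tolist()
print(names)
```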


The `genres` field is handled in a similar way: a new column (**géneros**) stores the genre or list of genres for each film.

In [9]:
df_merged['géneros']=df_merged['genres'].str.replace("'",'"')
df_merged['géneros']=df_merged['géneros'].apply(lambda x: [item['name'] for item in json.loads(x)]).tolist()

df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,franquicia,géneros
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0,Toy Story Collection,"[Animation, Comedy, Family]"
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0,,"[Adventure, Fantasy, Family]"
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0,Grumpy Old Men Collection,"[Romance, Comedy]"
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0,,"[Comedy, Drama, Romance]"
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0,Father of the Bride Collection,[Comedy]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45533,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,tt6209470,fa,...,90.0,"[{'iso_639_1': 'fa', 'name': 'فارسی'}]",Released,Rising and falling between a man and woman,Subdue,False,4.0,1.0,,"[Drama, Family]"
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,360.0,"[{'iso_639_1': 'tl', 'name': ''}]",Released,,Century of Birthing,False,9.0,3.0,,[Drama]
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,90.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,A deadly game of wits.,Betrayal,False,3.8,6.0,,"[Action, Drama, Thriller]"
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,87.0,[],Released,,Satan Triumphant,False,0.0,0.0,,[]
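
The quote replacement ran without errors on the genre names here, but it is fragile: a value containing an apostrophe would yield invalid JSON. The sketch below uses a made-up cell to show the failure and how `ast.literal_eval` reads the same string without any replacement.

```python
import ast
import json

# Hypothetical cell whose name contains an apostrophe.
cell = "[{'id': 1, 'name': \"Director's Cut\"}]"

# Replacing every single quote also rewrites the apostrophe, producing invalid JSON.
try:
    json.loads(cell.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval reads the Python-style literal directly, so no replacement is needed.
names = [item['name'] for item in ast.literal_eval(cell)]
print(json_ok, names)
```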


To unnest the `production_companies`, `production_countries`, `spoken_languages` and `cast` columns, a function was written that retrieves the value assigned to `name` in each field. The retrieved information is stored in a new field that will replace the original.<br>The function `desanidado_columnas(x)` takes a cell `x` from a DataFrame column as its argument. It unnests the cell's value and returns the names of the nested items as a list.

In [10]:
# 1. If the cell value is null or a boolean, return an empty list.
# 2. If the cell value is a string, use ast.literal_eval() to convert it into a data structure.
# 3. If that structure is a list, iterate over its items and extract each item's name via the 'name' key.
# 4. Return a list with the names of the unnested items.
# 5. If a SyntaxError or ValueError is raised along the way, return an empty list.
def desanidado_columnas(x):
    try:
        if pd.isnull(x) or isinstance(x, bool):
            return []
        elif isinstance(x, str):
            data = ast.literal_eval(x)
            if isinstance(data, list):
                return [item['name'] for item in data]
            else:
                return []
        else:
            return []
    except (SyntaxError, ValueError):
        return []

df_merged['productoras'] = df_merged['production_companies'].apply(desanidado_columnas)
df_merged['países'] = df_merged['production_countries'].apply(desanidado_columnas)
df_merged['doblajes'] = df_merged['spoken_languages'].apply(desanidado_columnas)
df_merged['elenco'] = df_merged['cast'].apply(desanidado_columnas)
df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,title,video,vote_average,vote_count,franquicia,géneros,productoras,países,doblajes,elenco
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,Toy Story,False,7.7,5415.0,Toy Story Collection,"[Animation, Comedy, Family]",[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney..."
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,Jumanji,False,6.9,2413.0,,"[Adventure, Fantasy, Family]","[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,..."
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,Grumpier Old Men,False,6.5,92.0,Grumpy Old Men Collection,"[Romance, Comedy]","[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop..."
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,Waiting to Exhale,False,6.1,34.0,,"[Comedy, Drama, Romance]",[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi..."
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,Father of the Bride Part II,False,5.7,173.0,Father of the Bride Collection,[Comedy],"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45533,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,tt6209470,fa,...,Subdue,False,4.0,1.0,,"[Drama, Family]",[],[Iran],[فارسی],"[Leila Hatami, Kourosh Tahami, Elham Korda]"
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,Century of Birthing,False,9.0,3.0,,[Drama],[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe..."
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,Betrayal,False,3.8,6.0,,"[Action, Drama, Thriller]",[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J..."
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,Satan Triumphant,False,0.0,0.0,,[],[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa..."
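
Once a column holds plain lists of names, it can be queried with standard pandas tools. As an illustrative sketch (the rows below are invented, shaped like the `países` column above), `explode` followed by `value_counts` counts how many films list each country.

```python
import pandas as pd

# Hypothetical rows shaped like the 'países' column produced above.
df = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'países': [['United States of America'], ['Canada', 'Germany'], []],
})

# explode() gives one row per list element; an empty list becomes a single NaN row.
exploded = df.explode('países')

# value_counts() ignores NaN by default.
counts = exploded['países'].value_counts()
print(counts.to_dict())
```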


To retrieve the director's name, the `crew` field is used: the `name` value is taken from the entries whose `job` is `Director`.

In [11]:
# Return the name of the first crew member whose job is 'Director', or None if there is none
def get_director_name(crew):
    crew_list = ast.literal_eval(crew)
    for member in crew_list:
        if member['job'] == 'Director':
            return member['name']
    return None

# Apply the function to the 'crew' column
df_merged['director'] = df_merged['crew'].apply(get_director_name)
df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,video,vote_average,vote_count,franquicia,géneros,productoras,países,doblajes,elenco,director
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,False,7.7,5415.0,Toy Story Collection,"[Animation, Comedy, Family]",[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney...",John Lasseter
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,False,6.9,2413.0,,"[Adventure, Fantasy, Family]","[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,...",Joe Johnston
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,False,6.5,92.0,Grumpy Old Men Collection,"[Romance, Comedy]","[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop...",Howard Deutch
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,False,6.1,34.0,,"[Comedy, Drama, Romance]",[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi...",Forest Whitaker
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,False,5.7,173.0,Father of the Bride Collection,[Comedy],"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim...",Charles Shyer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45533,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,tt6209470,fa,...,False,4.0,1.0,,"[Drama, Family]",[],[Iran],[فارسی],"[Leila Hatami, Kourosh Tahami, Elham Korda]",Hamid Nematollah
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,False,9.0,3.0,,[Drama],[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe...",Lav Diaz
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,False,3.8,6.0,,"[Action, Drama, Thriller]",[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J...",Mark L. Lester
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,False,0.0,0.0,,[],[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa...",Yakov Protazanov
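
`get_director_name` returns only the first match. If co-directed films matter, a variant that collects every director could look like the sketch below. The function name `get_director_names` and the sample crew string are hypothetical, and it returns an empty list for missing or malformed cells.

```python
import ast


def get_director_names(crew):
    """Return the names of all crew members whose job is 'Director'."""
    if not isinstance(crew, str):
        return []  # missing cell
    try:
        members = ast.literal_eval(crew)
    except (ValueError, SyntaxError):
        return []  # malformed string
    if not isinstance(members, list):
        return []
    return [m['name'] for m in members
            if isinstance(m, dict) and m.get('job') == 'Director' and 'name' in m]


# Hypothetical crew string with two directors and one writer.
crew_cell = ("[{'job': 'Director', 'name': 'Ana'}, "
             "{'job': 'Writer', 'name': 'Bo'}, "
             "{'job': 'Director', 'name': 'Cy'}]")
directors = get_director_names(crew_cell)
print(directors)
```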


Null values in the `revenue` and `budget` fields must be filled with the number 0.
To do this, I use the `fillna()` function with 0 as its argument.

In [12]:
df_merged['revenue'] = df_merged['revenue'].fillna(0)
df_merged['budget'] = df_merged['budget'].fillna(0)
df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,video,vote_average,vote_count,franquicia,géneros,productoras,países,doblajes,elenco,director
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,False,7.7,5415.0,Toy Story Collection,"[Animation, Comedy, Family]",[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney...",John Lasseter
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,False,6.9,2413.0,,"[Adventure, Fantasy, Family]","[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,...",Joe Johnston
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,False,6.5,92.0,Grumpy Old Men Collection,"[Romance, Comedy]","[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop...",Howard Deutch
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,False,6.1,34.0,,"[Comedy, Drama, Romance]",[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi...",Forest Whitaker
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,False,5.7,173.0,Father of the Bride Collection,[Comedy],"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim...",Charles Shyer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45533,"[{'cast_id': 0, 'character': '', 'credit_id': ...","[{'credit_id': '5894a97d925141426c00818c', 'de...",439050,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...",http://www.imdb.com/title/tt6209470/,tt6209470,fa,...,False,4.0,1.0,,"[Drama, Family]",[],[Iran],[فارسی],"[Leila Hatami, Kourosh Tahami, Elham Korda]",Hamid Nematollah
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,False,9.0,3.0,,[Drama],[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe...",Lav Diaz
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,False,3.8,6.0,,"[Action, Drama, Thriller]",[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J...",Mark L. Lester
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,False,0.0,0.0,,[],[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa...",Yakov Protazanov
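
`info()` reports `budget` as `object`, so `fillna(0)` alone leaves any text values in place. A hedged sketch of coercing to numbers before filling, using invented values:

```python
import pandas as pd

# Hypothetical budget values stored as text, including a missing and a malformed entry.
budget = pd.Series(['30000000', None, '/path.jpg', '0'])

# Coerce to numbers first (bad values become NaN), then fill the gaps with 0.
budget_num = pd.to_numeric(budget, errors='coerce').fillna(0)
print(budget_num.tolist(), budget_num.dtype)
```

With this order of operations the column ends up with a numeric dtype, so sums and ratios can be computed on it directly.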


Next, I drop the rows with a null `release_date` using `dropna`.

In [13]:

df_merged = df_merged.dropna(subset=['release_date'])
df_merged.info()

<class 'pandas.core.frame.DataFrame'>
Index: 45451 entries, 0 to 45537
Data columns (total 33 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   cast                   45451 non-null  object 
 1   crew                   45451 non-null  object 
 2   id                     45451 non-null  int64  
 3   adult                  45451 non-null  object 
 4   belongs_to_collection  4497 non-null   object 
 5   budget                 45451 non-null  object 
 6   genres                 45451 non-null  object 
 7   homepage               7779 non-null   object 
 8   imdb_id                45437 non-null  object 
 9   original_language      45440 non-null  object 
 10  original_title         45451 non-null  object 
 11  overview               44510 non-null  object 
 12  popularity             45451 non-null  object 
 13  poster_path            45112 non-null  object 
 14  production_companies   45451 non-null  object 
 15  product

The date values are cleaned up so that a new field with the release year (`release_year`) can be created afterwards.

In [15]:
# Date clean-up:
# Work on an explicit copy so the assignments below do not trigger chained-assignment warnings.
df_merged = df_merged.copy()

# Convert 'release_date' to datetime with pandas' to_datetime.
# errors='coerce' turns invalid cells into null values (NaT) instead of raising an error.
# Plain column assignment replaces the column, so it takes the datetime64 dtype.
df_merged['release_date'] = pd.to_datetime(df_merged['release_date'], errors='coerce')

# Create a new 'release_year' column holding the year extracted from the release date
df_merged['release_year'] = df_merged['release_date'].dt.year


In [16]:
df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,vote_average,vote_count,franquicia,géneros,productoras,países,doblajes,elenco,director,release_year
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,7.7,5415.0,Toy Story Collection,"[Animation, Comedy, Family]",[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney...",John Lasseter,1995
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,6.9,2413.0,,"[Adventure, Fantasy, Family]","[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,...",Joe Johnston,1995
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,6.5,92.0,Grumpy Old Men Collection,"[Romance, Comedy]","[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop...",Howard Deutch,1995
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,6.1,34.0,,"[Comedy, Drama, Romance]",[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi...",Forest Whitaker,1995
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,5.7,173.0,Father of the Bride Collection,[Comedy],"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim...",Charles Shyer,1995
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45532,"[{'cast_id': 1, 'character': 'Sir Robert Hode'...","[{'credit_id': '52fe44439251416c9100a899', 'de...",30840,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name...",,tt0102797,en,...,5.7,26.0,,"[Drama, Action, Romance]","[Westdeutscher Rundfunk (WDR), Working Title F...","[Canada, Germany, United Kingdom, United State...",[English],"[Patrick Bergin, Uma Thurman, David Morrissey,...",John Irvin,1991
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,9.0,3.0,,[Drama],[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe...",Lav Diaz,2011
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,3.8,6.0,,"[Action, Drama, Thriller]",[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J...",Mark L. Lester,2003
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,0.0,0.0,,[],[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa...",Yakov Protazanov,1917


A return-on-investment column named `return` is created by dividing `revenue` by `budget` (revenue / budget).<br>
When the data needed to compute it is not available, it takes the value 0.

In [26]:
# Make a deep copy of the DataFrame
df_merged_copy = df_merged.copy()

# Convert the 'revenue' and 'budget' columns to a numeric type
df_merged_copy['revenue'] = pd.to_numeric(df_merged_copy['revenue'], errors='coerce')
df_merged_copy['budget'] = pd.to_numeric(df_merged_copy['budget'], errors='coerce')

# Build a mask that identifies the rows with a nonzero 'budget'
mask = (df_merged_copy['budget'] != 0)

# Compute the 'return' ratio only for the rows selected by the mask
df_merged_copy.loc[mask, 'return'] = df_merged_copy.loc[mask, 'revenue'] / df_merged_copy.loc[mask, 'budget']
# Rows without a computable ratio get 0 (assign the result instead of using inplace on a slice)
df_merged_copy['return'] = df_merged_copy['return'].fillna(0)

# Assign the result back to the original DataFrame
df_merged['return'] = df_merged_copy['return']
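
The same rule (divide where possible, 0 otherwise) can also be written without an intermediate copy. This is a minimal sketch on a hypothetical `sample` DataFrame, not the notebook's own method. It assumes that a zero, missing, or non-numeric `budget`, or a missing `revenue`, all count as "no data available".

```python
import pandas as pd

# Hypothetical sample covering the edge cases: zero budget,
# missing revenue, and a non-numeric budget.
sample = pd.DataFrame(
    {
        "revenue": ["373554033", "0", None, "100"],
        "budget": ["30000000", "0", "5000", "abc"],
    }
)

revenue = pd.to_numeric(sample["revenue"], errors="coerce")
budget = pd.to_numeric(sample["budget"], errors="coerce")

# The ratio is kept only where budget is positive and revenue is known;
# every other row receives 0.
valid = budget.gt(0) & revenue.notna()
sample["return"] = (revenue / budget).where(valid, 0.0)

print(sample)
```

Using `Series.where` keeps the condition and the fallback value in a single expression, which avoids the chained assignment on a slice.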


In [27]:
df_merged

Unnamed: 0,cast,crew,id,adult,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,...,vote_count,franquicia,géneros,productoras,países,doblajes,elenco,director,release_year,return
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,tt0114709,en,...,5415.0,Toy Story Collection,"[Animation, Comedy, Family]",[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney...",John Lasseter,1995,12.451801
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,tt0113497,en,...,2413.0,,"[Adventure, Fantasy, Family]","[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,...",Joe Johnston,1995,4.043035
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,tt0113228,en,...,92.0,Grumpy Old Men Collection,"[Romance, Comedy]","[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop...",Howard Deutch,1995,0.0
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0114885,en,...,34.0,,"[Comedy, Drama, Romance]",[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi...",Forest Whitaker,1995,5.09076
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,tt0113041,en,...,173.0,Father of the Bride Collection,[Comedy],"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim...",Charles Shyer,1995,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45532,"[{'cast_id': 1, 'character': 'Sir Robert Hode'...","[{'credit_id': '52fe44439251416c9100a899', 'de...",30840,False,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name...",,tt0102797,en,...,26.0,,"[Drama, Action, Romance]","[Westdeutscher Rundfunk (WDR), Working Title F...","[Canada, Germany, United Kingdom, United State...",[English],"[Patrick Bergin, Uma Thurman, David Morrissey,...",John Irvin,1991,0.0
45534,"[{'cast_id': 1002, 'character': 'Sister Angela...","[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...",111109,False,,0,"[{'id': 18, 'name': 'Drama'}]",,tt2028550,tl,...,3.0,,[Drama],[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe...",Lav Diaz,2011,0.0
45535,"[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...","[{'credit_id': '52fe4776c3a368484e0c8387', 'de...",67758,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",,tt0303758,en,...,6.0,,"[Action, Drama, Thriller]",[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J...",Mark L. Lester,2003,0.0
45536,"[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '533bccebc3a36844cf0011a7', 'de...",227506,False,,0,[],,tt0008536,en,...,0.0,,[],[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa...",Yakov Protazanov,1917,0.0


The columns that will not be used, and those that have already been transformed, are dropped.

In [28]:
columns_to_drop = ['video', 'imdb_id', 'adult', 'original_title', 'poster_path', 'homepage', 'belongs_to_collection', 'géneros', 'production_companies', 'production_countries', 'spoken_languages', 'cast', 'crew']
df_merged = df_merged.drop(columns=columns_to_drop)
df_merged

Unnamed: 0,id,budget,genres,original_language,overview,popularity,release_date,revenue,runtime,status,...,vote_average,vote_count,franquicia,productoras,países,doblajes,elenco,director,release_year,return
0,862,30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",en,"Led by Woody, Andy's toys live happily in his ...",21.946943,1995-10-30,373554033.0,81.0,Released,...,7.7,5415.0,Toy Story Collection,[Pixar Animation Studios],[United States of America],[English],"[Tom Hanks, Tim Allen, Don Rickles, Jim Varney...",John Lasseter,1995,12.451801
1,8844,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",en,When siblings Judy and Peter discover an encha...,17.015539,1995-12-15,262797249.0,104.0,Released,...,6.9,2413.0,,"[TriStar Pictures, Teitler Film, Interscope Co...",[United States of America],"[English, Français]","[Robin Williams, Jonathan Hyde, Kirsten Dunst,...",Joe Johnston,1995,4.043035
2,15602,0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",en,A family wedding reignites the ancient feud be...,11.7129,1995-12-22,0.0,101.0,Released,...,6.5,92.0,Grumpy Old Men Collection,"[Warner Bros., Lancaster Gate]",[United States of America],[English],"[Walter Matthau, Jack Lemmon, Ann-Margret, Sop...",Howard Deutch,1995,0.0
3,31357,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",en,"Cheated on, mistreated and stepped on, the wom...",3.859495,1995-12-22,81452156.0,127.0,Released,...,6.1,34.0,,[Twentieth Century Fox Film Corporation],[United States of America],[English],"[Whitney Houston, Angela Bassett, Loretta Devi...",Forest Whitaker,1995,5.09076
4,11862,0,"[{'id': 35, 'name': 'Comedy'}]",en,Just when George Banks has recovered from his ...,8.387519,1995-02-10,76578911.0,106.0,Released,...,5.7,173.0,Father of the Bride Collection,"[Sandollar Productions, Touchstone Pictures]",[United States of America],[English],"[Steve Martin, Diane Keaton, Martin Short, Kim...",Charles Shyer,1995,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45532,30840,0,"[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name...",en,"Yet another version of the classic epic, with ...",5.683753,1991-05-13,0.0,104.0,Released,...,5.7,26.0,,"[Westdeutscher Rundfunk (WDR), Working Title F...","[Canada, Germany, United Kingdom, United State...",[English],"[Patrick Bergin, Uma Thurman, David Morrissey,...",John Irvin,1991,0.0
45534,111109,0,"[{'id': 18, 'name': 'Drama'}]",tl,An artist struggles to finish his work while a...,0.178241,2011-11-17,0.0,360.0,Released,...,9.0,3.0,,[Sine Olivia],[Philippines],[],"[Angel Aquino, Perry Dizon, Hazel Orencio, Joe...",Lav Diaz,2011,0.0
45535,67758,0,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",en,"When one of her hits goes wrong, a professiona...",0.903007,2003-08-01,0.0,90.0,Released,...,3.8,6.0,,[American World Pictures],[United States of America],[English],"[Erika Eleniak, Adam Baldwin, Julie du Page, J...",Mark L. Lester,2003,0.0
45536,227506,0,[],en,"In a small town live two brothers, one a minis...",0.003503,1917-10-21,0.0,87.0,Released,...,0.0,0.0,,[Yermoliev],[Russia],[],"[Iwan Mosschuchin, Nathalie Lissenko, Pavel Pa...",Yakov Protazanov,1917,0.0


Finally, the data type of the `popularity` field is changed to numeric.

In [29]:
df_merged['popularity'] = df_merged['popularity'].astype(float)
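
`astype(float)` raises a `ValueError` if any cell holds text that cannot be parsed as a number. As a hedged alternative, the sketch below uses `pd.to_numeric` with `errors='coerce'` on a hypothetical `sample` Series and counts how many values were lost in the conversion. Whether the real column contains such values is not shown in this notebook.

```python
import pandas as pd

# Hypothetical sample: one non-numeric string and one missing value.
sample = pd.Series(
    ["21.946943", "0.003503", "not a number", None], name="popularity"
)

# Unparseable strings become NaN instead of raising ValueError.
popularity = pd.to_numeric(sample, errors="coerce")

# Values that were present but could not be converted.
n_invalid = int(popularity.isna().sum() - sample.isna().sum())

print(popularity.dtype, n_invalid)
```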

At this point, the **ETL** stage can be closed.<br>
The DataFrame is saved to a `.csv` file.

In [None]:
# Save the DataFrame to a CSV file, omitting the index column
df_merged.to_csv('movies_ETL.csv', index=False)
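
One point to keep in mind when the file is read back: CSV stores list-valued cells as plain text. The sketch below assumes that columns such as `elenco` hold Python lists at save time, as the outputs above suggest. It uses a hypothetical `sample` DataFrame and an in-memory buffer in place of `movies_ETL.csv`, and restores the lists with `ast.literal_eval`.

```python
import ast
import io

import pandas as pd

# Hypothetical sample with a list-valued column, mirroring 'elenco'.
sample = pd.DataFrame(
    {"id": [862, 8844], "elenco": [["Tom Hanks", "Tim Allen"], ["Robin Williams"]]}
)

# An in-memory buffer stands in for the CSV file on disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
# After reading, each cell is a string such as "['Tom Hanks', 'Tim Allen']".
type_after_read = type(restored["elenco"].iloc[0])

# literal_eval safely parses the string back into a Python list.
restored["elenco"] = restored["elenco"].apply(ast.literal_eval)

print(type_after_read, restored["elenco"].iloc[0])
```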