# MRS

Here I'll work on a content-based movie recommender, building on the previous notebook.

Data source: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset

# Preparing Data

In [1]:
import pandas as pd
from ast import literal_eval

In [2]:
df = pd.read_csv('data/movies_metadata.csv')
# Transpose for easier exploration of this dataset with many cols
df.head(3).transpose()

| | 0 | 1 | 2 |
|---|---|---|---|
| adult | False | False | False |
| belongs_to_collection | {'id': 10194, 'name': 'Toy Story Collection', ... | | {'id': 119050, 'name': 'Grumpy Old Men Collect... |
| budget | 30000000 | 65000000 | 0 |
| genres | [{'id': 16, 'name': 'Animation'}, {'id': 35, '... | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... |
| homepage | http://toystory.disney.com/toy-story | | |
| id | 862 | 8844 | 15602 |
| imdb_id | tt0114709 | tt0113497 | tt0113228 |
| original_language | en | en | en |
| original_title | Toy Story | Jumanji | Grumpier Old Men |
| overview | Led by Woody, Andy's toys live happily in his ... | When siblings Judy and Peter discover an encha... | A family wedding reignites the ancient feud be... |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45466 non-null  object 
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re

In [4]:
df["adult"].value_counts()

adult
False                                                                                                                             45454
True                                                                                                                                  9
 - Written by Ørnås                                                                                                                   1
 Rune Balot goes to a casino connected to the October corporation to try to wrap up her case once and for all.                        1
 Avalanche Sharks tells the story of a bikini contest that turns into a horrifying affair when it is hit by a shark avalanche.        1
Name: count, dtype: int64

In [5]:
df["video"].value_counts()

video
False    45367
True        93
Name: count, dtype: int64

In [6]:
df["status"].value_counts()

status
Released           45014
Rumored              230
Post Production       98
In Production         20
Planned               15
Canceled               2
Name: count, dtype: int64

The columns 'adult', 'status' and 'video' are dominated by a single value, so let's remove them. Let's also remove 'poster_path', 'homepage' (too many null values), 'imdb_id', 'spoken_languages', 'overview' and 'tagline'.

Apart from these, let's drop the columns that aren't very useful for the recommender: 'original_title', 'belongs_to_collection', 'original_language', 'production_companies' and 'production_countries'.
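The phrase "dominated by a single value" can be checked directly. For each column, compute the share of non-null rows that hold its most frequent value. The sketch below is minimal: `dominant_share` is a hypothetical helper, and the DataFrame is toy data rather than the real dataset.

```python
import pandas as pd

def dominant_share(series: pd.Series) -> float:
    """Return the fraction of non-null values taken by the most frequent value."""
    # normalize=True turns counts into proportions; value_counts sorts descending
    return float(series.value_counts(normalize=True).iloc[0])

# Toy data standing in for real columns such as 'adult', 'video' and 'title'
toy = pd.DataFrame({
    "adult": [False, False, False, True],
    "video": [False, False, False, False],
    "title": ["A", "B", "C", "D"],
})

shares = {col: dominant_share(toy[col]) for col in toy.columns}
print(shares)  # {'adult': 0.75, 'video': 1.0, 'title': 0.25}

# Flag columns where one value covers at least 99% of the non-null rows
# (the threshold is an illustrative choice, not taken from the notebook)
near_constant = [col for col, share in shares.items() if share >= 0.99]
print(near_constant)  # ['video']
```

The value counts shown above give shares of roughly 0.9997 for 'adult' (45454 / 45466), 0.998 for 'video' and 0.992 for 'status'. A 0.99 threshold would flag all three columns.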

In [7]:
df = df.drop(
    [
        "adult",
        "status",
        "video",
        "poster_path",
        "original_title",
        "homepage",
        "imdb_id",
        "spoken_languages",
        "overview",
        "tagline",
        "belongs_to_collection",
        "original_language",
        "production_companies",
        "production_countries"
    ],
    axis=1,
)
df.head(3).transpose()

| | 0 | 1 | 2 |
|---|---|---|---|
| budget | 30000000 | 65000000 | 0 |
| genres | [{'id': 16, 'name': 'Animation'}, {'id': 35, '... | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... |
| id | 862 | 8844 | 15602 |
| popularity | 21.946943 | 17.015539 | 11.7129 |
| release_date | 1995-10-30 | 1995-12-15 | 1995-12-22 |
| revenue | 373554033.0 | 262797249.0 | 0.0 |
| runtime | 81.0 | 104.0 | 101.0 |
| title | Toy Story | Jumanji | Grumpier Old Men |
| vote_average | 7.7 | 6.9 | 6.5 |
| vote_count | 5415.0 | 2413.0 | 92.0 |


Now let's have a look at the dtypes.

## Converting dtypes to more appropriate ones

In [8]:
df.dtypes

budget           object
genres           object
id               object
popularity       object
release_date     object
revenue         float64
runtime         float64
title            object
vote_average    float64
vote_count      float64
dtype: object

First of all, let's handle the 'release_date' column.

In [9]:
# Convert 'release_date' column to datetime type
df['release_date'] = pd.to_datetime(df['release_date'], errors='coerce')
# Count the number of rows with bad date values
bad_date_count = df['release_date'].isnull().sum()
print(f"Number of rows with bad date values: {bad_date_count}")

Number of rows with bad date values: 90


Since 90 rows out of roughly 45,000 (about 0.2%) is negligible, we can safely remove them.

In [10]:
# Remove rows with null or NaT values
df = df.dropna(subset=['release_date'])
bad_date_count = df['release_date'].isnull().sum()
print(f"Number of rows with bad date values: {bad_date_count}")

Number of rows with bad date values: 0
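The count of bad dates relies on `errors="coerce"`. This setting makes `pd.to_datetime` return `NaT` for anything it cannot parse instead of raising an error. A small sketch on made-up values:

```python
import pandas as pd

# Toy release dates: one valid, one garbage string, one missing
raw = pd.Series(["1995-10-30", "not a date", None])

# errors="coerce" turns anything unparseable into NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)

bad = int(parsed.isna().sum())
print(f"Number of rows with bad date values: {bad}")  # 2

# Dropping the NaT values mirrors df.dropna(subset=["release_date"]) above
clean = parsed.dropna()
print(len(clean))  # 1
```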


The 'budget' column contains non-numeric values like '/ff9qCepilowshEtG2GYWwzt2bs4.jpg'. Let's strip all non-digit characters from it.

In [11]:
# Clean 'budget' column to remove non-numeric characters
df["budget"] = df["budget"].str.replace(r"\D", "", regex=True)
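One caveat: `str.replace(r"\D", "", regex=True)` deletes the non-digit characters but keeps the digits. A file name therefore becomes a wrong number rather than disappearing. Any such value still present at this point would be silently converted. `pd.to_numeric(..., errors="coerce")` is a safer alternative because it turns non-numeric strings into NaN. A comparison on toy values:

```python
import pandas as pd

budget = pd.Series(["30000000", "65000000", "/ff9qCepilowshEtG2GYWwzt2bs4.jpg"])

# Approach used above: strip every non-digit character (digits inside junk survive)
stripped = budget.str.replace(r"\D", "", regex=True)
print(stripped.tolist())  # ['30000000', '65000000', '9224']

# Alternative: parse as a number, turning anything non-numeric into NaN
numeric = pd.to_numeric(budget, errors="coerce")
print(numeric.tolist())  # [30000000.0, 65000000.0, nan]
```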

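Columns that hold whole numbers, such as 'runtime' and 'vote_count', have a float dtype (pandas stores numeric columns with missing values as float). Let's fill the missing values with 0 and convert these columns, along with 'budget', 'revenue', 'popularity' and 'id', to proper numeric types.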

In [12]:
# Specify columns and their new data types
dict_columns_to_convert = {
    "budget": "int64",
    "revenue": "int64",
    "runtime": "int",
    "vote_count": "int",
    "popularity": "float",
    "id": "int"
}
# 'budget' was already cleaned of non-numeric characters in the previous cell
# Fill NaN values with 0 (astype cannot cast NaN to an integer type)
cols_to_fill = list(dict_columns_to_convert.keys())
df[cols_to_fill] = df[cols_to_fill].fillna(0)
# Convert columns to their new data types
df = df.astype(dict_columns_to_convert)
# Check the data types of the DataFrame
print(df.dtypes)

budget                   int64
genres                  object
id                       int32
popularity             float64
release_date    datetime64[ns]
revenue                  int64
runtime                  int32
title                   object
vote_average           float64
vote_count               int32
dtype: object
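The cell above involves two details worth illustrating.

First, `astype` cannot cast NaN to a plain integer type, which is why the missing values are filled with 0 beforehand. pandas' nullable `"Int64"` dtype is an alternative: it keeps them as missing instead of treating them as 0.

Second, the `int32` in the output most likely comes from the `"int"` alias mapping to the platform's default integer. That integer is 32-bit on Windows with NumPy 1.x, and spelling out `"int64"` avoids the dependency.

A sketch on toy data:

```python
import pandas as pd

runtime = pd.Series([81.0, None, 104.0])

# Casting NaN to a NumPy integer type raises an error
try:
    runtime.astype("int64")
except ValueError as err:  # pandas raises ValueError (or a subclass of it) here
    print("cast failed:", err)

# Option 1 (used above): fill missing values with 0, then cast
filled = runtime.fillna(0).astype("int64")
print(filled.tolist())  # [81, 0, 104]

# Option 2: nullable integer dtype keeps missing values as <NA>
nullable = runtime.astype("Int64")
print(nullable.tolist())  # [81, <NA>, 104]
```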


## Handling of 'budget', 'revenue', and 'popularity' columns

The 'budget' and 'revenue' columns contain very large values, while 'popularity' has too many decimal places. Let's express budget and revenue in millions of dollars and round all three columns to 2 decimal places.

In [13]:
# Divide 'budget' and 'revenue' columns by million and round to 2 decimal places
df['budget'] = (df['budget'] / 1000000).round(2)
df['revenue'] = (df['revenue'] / 1000000).round(2)

# Round 'popularity' column to 2 decimal places
df['popularity'] = df['popularity'].round(2)
df.head(3)

| | budget | genres | id | popularity | release_date | revenue | runtime | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30.0 | [{'id': 16, 'name': 'Animation'}, {'id': 35, '... | 862 | 21.95 | 1995-10-30 | 373.55 | 81 | Toy Story | 7.7 | 5415 |
| 1 | 65.0 | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | 8844 | 17.02 | 1995-12-15 | 262.8 | 104 | Jumanji | 6.9 | 2413 |
| 2 | 0.0 | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... | 15602 | 11.71 | 1995-12-22 | 0.0 | 101 | Grumpier Old Men | 6.5 | 92 |


## Working with 'genres' column

In [14]:
# Parse the stringified Python literal (a list of dicts) into real objects;
# literal_eval handles single quotes directly, so no quote swapping is needed
df["genres"] = df["genres"].apply(
    lambda x: literal_eval(x) if isinstance(x, str) else []
)
# Extract the names of genres into a list and sort them alphabetically
df["genres"] = df["genres"].apply(
    lambda x: sorted([genre["name"] for genre in x]) if isinstance(x, list) else []
)
# Display the DataFrame with the extracted genre names
df[["title", "genres"]].head(3)

| | title | genres |
|---|---|---|
| 0 | Toy Story | [Animation, Comedy, Family] |
| 1 | Jumanji | [Adventure, Family, Fantasy] |
| 2 | Grumpier Old Men | [Comedy, Romance] |
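The 'genres' strings are Python literals (single-quoted dicts), not JSON, so `literal_eval` can parse them as they are. Swapping quotes first is unnecessary and breaks on any value that contains an apostrophe. A standard-library sketch with made-up strings:

```python
from ast import literal_eval

# A value shaped like the 'genres' column
genres_str = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
genres = literal_eval(genres_str)
print(sorted(g["name"] for g in genres))  # ['Animation', 'Comedy']

# A made-up value whose name contains an apostrophe; Python's repr
# switches to double quotes for such strings
tricky = "[{'id': 1, 'name': \"Director's Cut\"}]"
print(literal_eval(tricky)[0]["name"])  # Director's Cut

# Replacing every ' with " first corrupts the string
try:
    literal_eval(tricky.replace("'", '"'))
except (SyntaxError, ValueError) as err:
    print("parse failed:", type(err).__name__)
```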


In [15]:
# Flatten the list of genre names
flat_genre_names = [genre for sublist in df["genres"] for genre in sublist]
# Get the unique genre names
unique_genre_names = set(flat_genre_names)
# Print the unique genre names
print(f"There are {len(unique_genre_names)} unique genres.")
print(unique_genre_names)

There are 20 unique genres.
{'Action', 'War', 'Family', 'Adventure', 'Crime', 'Thriller', 'History', 'Comedy', 'Fantasy', 'Animation', 'TV Movie', 'Documentary', 'Western', 'Foreign', 'Music', 'Science Fiction', 'Romance', 'Horror', 'Drama', 'Mystery'}
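The set above shows which genres exist but not how often each one occurs. While 'genres' still holds lists, `Series.explode` puts one genre per row so that `value_counts` can count individual genres. A toy sketch:

```python
import pandas as pd

genres = pd.Series([
    ["Animation", "Comedy", "Family"],
    ["Adventure", "Family", "Fantasy"],
    ["Comedy", "Romance"],
    [],  # an empty list becomes a single NaN row after explode
])

# One row per (movie, genre) pair; the original index is repeated
exploded = genres.explode()
counts = exploded.value_counts()  # NaN is excluded by default
print(counts)
```

On the real column, `df["genres"].explode().value_counts()` would give per-genre totals. This differs from the `value_counts` of whole genre lists shown further down, which counts combinations.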


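The raw 'genres' column also contains faulty values like 'Carousel Productions' or 'Vision View Entertainment', which sound like production companies rather than genres. They don't appear in the output above, apparently because the rows holding them were dropped together with the bad release dates. To be safe, let's keep only values from an explicit list of valid genres. Note that this list also excludes 'Foreign', 'Music' and 'TV Movie'.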

In [16]:
# Define the list of valid genre names
valid_genres = {
    'Action', 'Adventure', 'Animation', 'Comedy', 'Crime', 'Documentary',
    'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Mystery',
    'Romance', 'Science Fiction', 'Thriller', 'War', 'Western'
}
# Keep only the valid genres in the 'genres' column
df["genres"] = df["genres"].apply(lambda x: [genre for genre in x if genre in valid_genres])

Now let's check again

In [17]:
flat_genre_names = [genre for sublist in df["genres"] for genre in sublist]
unique_genre_names = set(flat_genre_names)
print(f"There are {len(unique_genre_names)} unique genres.")
print(unique_genre_names)

There are 17 unique genres.
{'Action', 'Science Fiction', 'War', 'Romance', 'Crime', 'Documentary', 'Thriller', 'Horror', 'History', 'Mystery', 'Western', 'Family', 'Comedy', 'Fantasy', 'Drama', 'Adventure', 'Animation'}
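A set difference confirms which genres the whitelist removed. The sets below are copied from the printed outputs. 'Foreign', 'Music' and 'TV Movie' look like genuine genre labels rather than production companies, so dropping them is a deliberate choice.

```python
# Genres printed before filtering (copied from the output above)
before = {
    'Action', 'War', 'Family', 'Adventure', 'Crime', 'Thriller', 'History',
    'Comedy', 'Fantasy', 'Animation', 'TV Movie', 'Documentary', 'Western',
    'Foreign', 'Music', 'Science Fiction', 'Romance', 'Horror', 'Drama', 'Mystery',
}
# Whitelist used by the filter
valid_genres = {
    'Action', 'Adventure', 'Animation', 'Comedy', 'Crime', 'Documentary',
    'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Mystery',
    'Romance', 'Science Fiction', 'Thriller', 'War', 'Western',
}

# Set difference: genres that existed but were filtered out
dropped = before - valid_genres
print(sorted(dropped))  # ['Foreign', 'Music', 'TV Movie']
```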


In [18]:
df["genres"].value_counts().head(7)

genres
[Drama]              5617
[Comedy]             3873
[Documentary]        3164
[]                   2522
[Drama, Romance]     1951
[Comedy, Drama]      1845
[Comedy, Romance]    1325
Name: count, dtype: int64

One movie can belong to many genres, and one genre can apply to many movies: it's a many-to-many relationship. Ideally, such a relationship should be broken into two one-to-many (1:M) relationships connected through an intermediate (junction) table. However, because

- this project is mainly meant to show my SQL query-writing skills
- I'm applying for a junior data analyst position, a role that isn't expected to design databases
- the preparation part is already too long
- the maximum string length for genres is known (80 characters, for the movie titled 'Yu-Gi-Oh')

I'll keep things simple and join the genre names with a comma.
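For comparison, the normalized design described above can be sketched with pandas. `explode` turns the list column into a movie-genre junction table with one row per pair, linked to a lookup table of genres. The table names `genres_table` and `movie_genres` are hypothetical, and the data is a toy subset:

```python
import pandas as pd

# Toy subset of the movies table, with genres still stored as lists
movies = pd.DataFrame({
    "id": [862, 8844, 15602],
    "title": ["Toy Story", "Jumanji", "Grumpier Old Men"],
    "genres": [
        ["Animation", "Comedy", "Family"],
        ["Adventure", "Family", "Fantasy"],
        ["Comedy", "Romance"],
    ],
})

# Hypothetical lookup table: one row per distinct genre with a surrogate key
genre_names = sorted({g for gs in movies["genres"] for g in gs})
genres_table = pd.DataFrame({
    "genre_id": range(1, len(genre_names) + 1),
    "name": genre_names,
})

# Hypothetical junction table: one row per (movie, genre) pair
movie_genres = (
    movies[["id", "genres"]]
    .explode("genres")
    .merge(genres_table, left_on="genres", right_on="name")
    .loc[:, ["id", "genre_id"]]
    .rename(columns={"id": "movie_id"})
    .reset_index(drop=True)
)

print(genres_table)
print(movie_genres)
```

With this layout, filtering by genre becomes an equality match on `genre_id` instead of a substring search in a comma-separated string.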

In [19]:
# Convert the list of genres into a string with comma as a delimiter
df["genres"] = df["genres"].apply(lambda x: ", ".join(x) if x else None)

In [20]:
df["genres"].value_counts().head(7)

genres
Drama                     5617
Comedy                    3873
Documentary               3164
Drama, Romance            1951
Comedy, Drama             1845
Comedy, Romance           1325
Comedy, Drama, Romance    1153
Name: count, dtype: int64
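For the content-based recommender, the comma-joined strings can be turned back into one binary column per genre with `Series.str.get_dummies`. The separator must match the `", "` used in the join above. In this sketch on toy values, missing genres (the `None` produced for empty lists) become all-zero rows:

```python
import pandas as pd

genres = pd.Series(["Animation, Comedy, Family", "Comedy, Romance", None])

# Split on the same ", " delimiter used by the join; one 0/1 column per genre
dummies = genres.str.get_dummies(sep=", ")
print(dummies)
```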

In [21]:
df.head().transpose()

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| budget | 30.0 | 65.0 | 0.0 | 16.0 | 0.0 |
| genres | Animation, Comedy, Family | Adventure, Family, Fantasy | Comedy, Romance | Comedy, Drama, Romance | Comedy |
| id | 862 | 8844 | 15602 | 31357 | 11862 |
| popularity | 21.95 | 17.02 | 11.71 | 3.86 | 8.39 |
| release_date | 1995-10-30 00:00:00 | 1995-12-15 00:00:00 | 1995-12-22 00:00:00 | 1995-12-22 00:00:00 | 1995-02-10 00:00:00 |
| revenue | 373.55 | 262.8 | 0.0 | 81.45 | 76.58 |
| runtime | 81 | 104 | 101 | 127 | 106 |
| title | Toy Story | Jumanji | Grumpier Old Men | Waiting to Exhale | Father of the Bride Part II |
| vote_average | 7.7 | 6.9 | 6.5 | 6.1 | 5.7 |
| vote_count | 5415 | 2413 | 92 | 34 | 173 |


## Final steps of data preparation with Pandas

Time to reorder the columns, since the current order isn't very logical.

In [22]:
new_cols_order = [
    "id",
    "title",
    "release_date",
    "runtime",
    "genres",
    "budget",
    "revenue",
    "popularity",
    "vote_average",
    "vote_count"
]
df = df[new_cols_order]
df.head(3).transpose()

| | 0 | 1 | 2 |
|---|---|---|---|
| id | 862 | 8844 | 15602 |
| title | Toy Story | Jumanji | Grumpier Old Men |
| release_date | 1995-10-30 00:00:00 | 1995-12-15 00:00:00 | 1995-12-22 00:00:00 |
| runtime | 81 | 104 | 101 |
| genres | Animation, Comedy, Family | Adventure, Family, Fantasy | Comedy, Romance |
| budget | 30.0 | 65.0 | 0.0 |
| revenue | 373.55 | 262.8 | 0.0 |
| popularity | 21.95 | 17.02 | 11.71 |
| vote_average | 7.7 | 6.9 | 6.5 |
| vote_count | 5415 | 2413 | 92 |


Columns explanation:
- id - TMDB movie ID
- title - official title of the movie
- release_date - theatrical release date of the movie
- runtime - movie duration in minutes
- genres - genres associated with the movie, separated by commas
- budget - movie budget in millions of dollars
- revenue - total movie revenue in millions of dollars
- popularity - popularity score assigned by TMDB
- vote_average - average user rating on TMDB (0 to 10 scale)
- vote_count - number of user votes on TMDB

In [23]:
df.dtypes

id                       int32
title                   object
release_date    datetime64[ns]
runtime                  int32
genres                  object
budget                 float64
revenue                float64
popularity             float64
vote_average           float64
vote_count               int32
dtype: object

Let's save the cleaned-up dataset, which we'll use in the next chapters.

In [24]:
# df.to_csv("data/data.csv", index=False)

## Adding data from the two other datasets

In [25]:
credits = pd.read_csv('data/credits.csv')
keywords = pd.read_csv('data/keywords.csv')

In [26]:
credits

| | cast | crew | id |
|---|---|---|---|
| 0 | [{'cast_id': 14, 'character': 'Woody (voice)',... | [{'credit_id': '52fe4284c3a36847f8024f49', 'de... | 862 |
| 1 | [{'cast_id': 1, 'character': 'Alan Parrish', '... | [{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de... | 8844 |
| 2 | [{'cast_id': 2, 'character': 'Max Goldman', 'c... | [{'credit_id': '52fe466a9251416c75077a89', 'de... | 15602 |
| 3 | [{'cast_id': 1, 'character': "Savannah 'Vannah... | [{'credit_id': '52fe44779251416c91011acb', 'de... | 31357 |
| 4 | [{'cast_id': 1, 'character': 'George Banks', '... | [{'credit_id': '52fe44959251416c75039ed7', 'de... | 11862 |
| ... | ... | ... | ... |
| 45471 | [{'cast_id': 0, 'character': '', 'credit_id': ... | [{'credit_id': '5894a97d925141426c00818c', 'de... | 439050 |
| 45472 | [{'cast_id': 1002, 'character': 'Sister Angela... | [{'credit_id': '52fe4af1c3a36847f81e9b15', 'de... | 111109 |
| 45473 | [{'cast_id': 6, 'character': 'Emily Shaw', 'cr... | [{'credit_id': '52fe4776c3a368484e0c8387', 'de... | 67758 |
| 45474 | [{'cast_id': 2, 'character': '', 'credit_id': ... | [{'credit_id': '533bccebc3a36844cf0011a7', 'de... | 227506 |


In [27]:
credits.dtypes

cast    object
crew    object
id       int64
dtype: object

In [28]:
keywords

| | id | keywords |
|---|---|---|
| 0 | 862 | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... |
| 1 | 8844 | [{'id': 10090, 'name': 'board game'}, {'id': 1... |
| 2 | 15602 | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... |
| 3 | 31357 | [{'id': 818, 'name': 'based on novel'}, {'id':... |
| 4 | 11862 | [{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n... |
| ... | ... | ... |
| 46414 | 439050 | [{'id': 10703, 'name': 'tragic love'}] |
| 46415 | 111109 | [{'id': 2679, 'name': 'artist'}, {'id': 14531,... |
| 46416 | 67758 | [] |
| 46417 | 227506 | [] |


In [29]:
keywords.dtypes

id           int64
keywords    object
dtype: object

In [30]:
df = df.merge(credits, on='id')
df = df.merge(keywords, on='id')
df.head()

| | id | title | release_date | runtime | genres | budget | revenue | popularity | vote_average | vote_count | cast | crew | keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 862 | Toy Story | 1995-10-30 | 81 | Animation, Comedy, Family | 30.0 | 373.55 | 21.95 | 7.7 | 5415 | [{'cast_id': 14, 'character': 'Woody (voice)',... | [{'credit_id': '52fe4284c3a36847f8024f49', 'de... | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... |
| 1 | 8844 | Jumanji | 1995-12-15 | 104 | Adventure, Family, Fantasy | 65.0 | 262.8 | 17.02 | 6.9 | 2413 | [{'cast_id': 1, 'character': 'Alan Parrish', '... | [{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de... | [{'id': 10090, 'name': 'board game'}, {'id': 1... |
| 2 | 15602 | Grumpier Old Men | 1995-12-22 | 101 | Comedy, Romance | 0.0 | 0.0 | 11.71 | 6.5 | 92 | [{'cast_id': 2, 'character': 'Max Goldman', 'c... | [{'credit_id': '52fe466a9251416c75077a89', 'de... | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... |
| 3 | 31357 | Waiting to Exhale | 1995-12-22 | 127 | Comedy, Drama, Romance | 16.0 | 81.45 | 3.86 | 6.1 | 34 | [{'cast_id': 1, 'character': "Savannah 'Vannah... | [{'credit_id': '52fe44779251416c91011acb', 'de... | [{'id': 818, 'name': 'based on novel'}, {'id':... |
| 4 | 11862 | Father of the Bride Part II | 1995-02-10 | 106 | Comedy | 0.0 | 76.58 | 8.39 | 5.7 | 173 | [{'cast_id': 1, 'character': 'George Banks', '... | [{'credit_id': '52fe44959251416c75039ed7', 'de... | [{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n... |


In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46540 entries, 0 to 46539
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   id            46540 non-null  int32         
 1   title         46540 non-null  object        
 2   release_date  46540 non-null  datetime64[ns]
 3   runtime       46540 non-null  int32         
 4   genres        43933 non-null  object        
 5   budget        46540 non-null  float64       
 6   revenue       46540 non-null  float64       
 7   popularity    46540 non-null  float64       
 8   vote_average  46540 non-null  float64       
 9   vote_count    46540 non-null  int32         
 10  cast          46540 non-null  object        
 11  crew          46540 non-null  object        
 12  keywords      46540 non-null  object        
dtypes: datetime64[ns](1), float64(4), int32(3), object(5)
memory usage: 4.1+ MB
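The merged frame has 46,540 rows. That is more than the 45,376 movies left after dropping the 90 bad dates (45,466 - 90), which suggests that at least one of the frames contains repeated ids and the merge duplicated some movies. The sketch below uses toy frames to show three things:

- the effect of a repeated id on the merge;
- one possible guard, `drop_duplicates` before merging;
- pandas' `validate` argument, which raises `MergeError` when the expected key relationship does not hold.

```python
import pandas as pd

movies = pd.DataFrame({"id": [862, 8844], "title": ["Toy Story", "Jumanji"]})
# Toy 'keywords' frame in which id 862 appears twice
keywords = pd.DataFrame({
    "id": [862, 862, 8844],
    "keywords": ["kw_a", "kw_b", "kw_c"],
})

merged = movies.merge(keywords, on="id")
print(len(merged))  # 3: Toy Story now appears twice

# Number of repeated ids in the right-hand frame
print(int(keywords["id"].duplicated().sum()))  # 1

# One possible fix: keep a single row per id before merging
deduped = movies.merge(keywords.drop_duplicates(subset="id"), on="id")
print(len(deduped))  # 2

# validate= makes pandas raise instead of silently multiplying rows
try:
    movies.merge(keywords, on="id", validate="one_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)
```

Keeping the first row per id is a simplifying assumption. If the duplicated rows differ, they may deserve a closer look before choosing which one to keep.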


## Extracting director

In [32]:
def extract_director(crew_list):
    for crew_member in crew_list:
        if crew_member["job"] == "Director":
            return crew_member["name"]
    return None

# For extract_director to work, convert the string representations to actual lists of dictionaries
df["crew"] = df["crew"].apply(literal_eval)
# Extract the director's name for each movie
df["director"] = df["crew"].apply(extract_director)
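The sketch below runs `extract_director` on a made-up crew string shaped like the 'crew' column; only the 'job' and 'name' keys matter here. It also shows a limitation: the function returns only the first director it finds, so co-directed films keep just one name. The `extract_directors` variant is hypothetical and not part of the notebook.

```python
from ast import literal_eval

def extract_director(crew_list):
    # Same logic as above: first crew member whose job is "Director"
    for crew_member in crew_list:
        if crew_member["job"] == "Director":
            return crew_member["name"]
    return None

def extract_directors(crew_list):
    # Hypothetical variant: every director, joined like the genres column
    names = [m["name"] for m in crew_list if m["job"] == "Director"]
    return ", ".join(names) if names else None

# Made-up crew entry with two directors and a writer
crew_str = ("[{'job': 'Director', 'name': 'Director A'}, "
            "{'job': 'Screenplay', 'name': 'Writer B'}, "
            "{'job': 'Director', 'name': 'Director C'}]")
crew = literal_eval(crew_str)

print(extract_director(crew))   # Director A
print(extract_directors(crew))  # Director A, Director C
print(extract_director([]))     # None
```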

In [33]:
df.head(3)

| | id | title | release_date | runtime | genres | budget | revenue | popularity | vote_average | vote_count | cast | crew | keywords | director |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 862 | Toy Story | 1995-10-30 | 81 | Animation, Comedy, Family | 30.0 | 373.55 | 21.95 | 7.7 | 5415 | [{'cast_id': 14, 'character': 'Woody (voice)',... | [{'credit_id': '52fe4284c3a36847f8024f49', 'de... | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... | John Lasseter |
| 1 | 8844 | Jumanji | 1995-12-15 | 104 | Adventure, Family, Fantasy | 65.0 | 262.8 | 17.02 | 6.9 | 2413 | [{'cast_id': 1, 'character': 'Alan Parrish', '... | [{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de... | [{'id': 10090, 'name': 'board game'}, {'id': 1... | Joe Johnston |
| 2 | 15602 | Grumpier Old Men | 1995-12-22 | 101 | Comedy, Romance | 0.0 | 0.0 | 11.71 | 6.5 | 92 | [{'cast_id': 2, 'character': 'Max Goldman', 'c... | [{'credit_id': '52fe466a9251416c75077a89', 'de... | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... | Howard Deutch |


In [34]:
df["director"].value_counts().head(10)

director
John Ford           67
Michael Curtiz      66
Werner Herzog       55
Julien Duvivier     54
Alfred Hitchcock    53
Georges Méliès      51
Woody Allen         49
Jean-Luc Godard     47
Frank Capra         47
Sidney Lumet        46
Name: count, dtype: int64

In [35]:
df[df["director"] == "Martin Scorsese"][["title", "release_date", "runtime", "genres", "director"]].head(10)

| | title | release_date | runtime | genres | director |
|---|---|---|---|---|---|
| 15 | Casino | 1995-11-22 | 178 | Crime, Drama | Martin Scorsese |
| 109 | Taxi Driver | 1976-02-07 | 114 | Crime, Drama | Martin Scorsese |
| 407 | The Age of Innocence | 1993-09-17 | 139 | Drama, Romance | Martin Scorsese |
| 1177 | GoodFellas | 1990-09-12 | 145 | Crime, Drama | Martin Scorsese |
| 1192 | Raging Bull | 1980-11-14 | 129 | Drama | Martin Scorsese |
| 1302 | Cape Fear | 1991-11-15 | 128 | Crime, Thriller | Martin Scorsese |
| 1658 | Kundun | 1997-12-25 | 134 | Drama | Martin Scorsese |
| 1923 | The Last Temptation of Christ | 1988-08-12 | 164 | Drama | Martin Scorsese |
| 2371 | The Color of Money | 1986-10-07 | 119 | Drama | Martin Scorsese |
| 2873 | Bringing Out the Dead | 1999-10-22 | 121 | Drama | Martin Scorsese |


## Extracting top actors

In [36]:
def extract_actors(cast_list):
    top_actors = []
    for actor in cast_list[:5]:  # Select the top 5 actors
        top_actors.append(actor["name"])
    return ", ".join(top_actors)

# Convert the string representations to actual dictionaries
df["cast"] = df["cast"].apply(literal_eval)
# Extract the top 5 actor names for each movie
df["top_actors"] = df["cast"].apply(extract_actors)
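`extract_actors` takes the first five entries in list order, which assumes the cast list is already sorted by billing. If the entries carry an 'order' key, sorting on it first makes that intent explicit. TMDB cast records usually include this key, but treat it as an assumption here. A hypothetical variant on made-up data:

```python
def extract_actors_by_order(cast_list, n=5):
    """Hypothetical variant: top-n names ranked by the 'order' key, comma-joined."""
    # Entries without 'order' sort last instead of raising a KeyError
    ranked = sorted(cast_list, key=lambda actor: actor.get("order", float("inf")))
    return ", ".join(actor["name"] for actor in ranked[:n])

cast = [
    {"name": "Actor C", "order": 2},
    {"name": "Actor A", "order": 0},
    {"name": "Actor B", "order": 1},
    {"name": "Extra"},  # no 'order' key
]

print(extract_actors_by_order(cast, n=2))  # Actor A, Actor B
print(extract_actors_by_order(cast))       # Actor A, Actor B, Actor C, Extra
```

Like the original, the function returns an empty string for an empty cast list.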

In [37]:
df.head()

| | id | title | release_date | runtime | genres | budget | revenue | popularity | vote_average | vote_count | cast | crew | keywords | director | top_actors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 862 | Toy Story | 1995-10-30 | 81 | Animation, Comedy, Family | 30.0 | 373.55 | 21.95 | 7.7 | 5415 | [{'cast_id': 14, 'character': 'Woody (voice)',... | [{'credit_id': '52fe4284c3a36847f8024f49', 'de... | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... | John Lasseter | Tom Hanks, Tim Allen, Don Rickles, Jim Varney,... |
| 1 | 8844 | Jumanji | 1995-12-15 | 104 | Adventure, Family, Fantasy | 65.0 | 262.8 | 17.02 | 6.9 | 2413 | [{'cast_id': 1, 'character': 'Alan Parrish', '... | [{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de... | [{'id': 10090, 'name': 'board game'}, {'id': 1... | Joe Johnston | Robin Williams, Jonathan Hyde, Kirsten Dunst, ... |
| 2 | 15602 | Grumpier Old Men | 1995-12-22 | 101 | Comedy, Romance | 0.0 | 0.0 | 11.71 | 6.5 | 92 | [{'cast_id': 2, 'character': 'Max Goldman', 'c... | [{'credit_id': '52fe466a9251416c75077a89', 'de... | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... | Howard Deutch | Walter Matthau, Jack Lemmon, Ann-Margret, Soph... |
| 3 | 31357 | Waiting to Exhale | 1995-12-22 | 127 | Comedy, Drama, Romance | 16.0 | 81.45 | 3.86 | 6.1 | 34 | [{'cast_id': 1, 'character': "Savannah 'Vannah... | [{'credit_id': '52fe44779251416c91011acb', 'de... | [{'id': 818, 'name': 'based on novel'}, {'id':... | Forest Whitaker | Whitney Houston, Angela Bassett, Loretta Devin... |
| 4 | 11862 | Father of the Bride Part II | 1995-02-10 | 106 | Comedy | 0.0 | 76.58 | 8.39 | 5.7 | 173 | [{'cast_id': 1, 'character': 'George Banks', '... | [{'credit_id': '52fe44959251416c75039ed7', 'de... | [{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n... | Charles Shyer | Steve Martin, Diane Keaton, Martin Short, Kimb... |


In [57]:
df[df["title"] == "The Empire Strikes Back"][["title", "release_date", "runtime", "genres", "director", "top_actors"]]

| | title | release_date | runtime | genres | director | top_actors |
|---|---|---|---|---|---|---|
| 1161 | The Empire Strikes Back | 1980-05-17 | 124 | Action, Adventure, Science Fiction | Irvin Kershner | Mark Hamill, Harrison Ford, Carrie Fisher, Bil... |
