---
title: "Data Analysis of Quentin Tarantino's Works"
description: "An analysis of the works of Quentin Tarantino using data from The Movie Database"
author: "Bryce Stulman"
date: "03/11/2023"
date-modified: "03/13/2023"
---

# Data Analysis of Quentin Tarantino's Works
The purpose of this analysis is to better understand the works of filmmaker Quentin Tarantino. Tarantino is known for his stylized violence and profanity-riddled dialogue. His films are generally highly regarded, and he has won many [awards](https://www.imdb.com/name/nm0000233/awards), including Academy Awards for *Pulp Fiction* (1994) and *Django Unchained* (2012). Tarantino has been releasing new films every few years for the last three decades. This analysis uses data such as budget, revenue, cast, crew, and viewer sentiment to try to uncover some of the factors that may contribute to his continued success.

It covers his (currently) 11 movies:  

* *Reservoir Dogs* (1992)
* *Pulp Fiction* (1994)
* *Four Rooms* (1995)
* *Jackie Brown* (1997)
* *Kill Bill: Vol. 1* (2003)
* *Kill Bill: Vol. 2* (2004)
* *Death Proof* (2007)
* *Inglourious Basterds* (2009)
* *Django Unchained* (2012)
* *The Hateful Eight* (2015)
* *Once Upon a Time... in Hollywood* (2019)



### Questions that Can Be Answered with Data

Data can be used to address many statistical questions as well as help form connections between seemingly disparate pieces of information.
Some examples of questions include:  

* How did the budgets for Tarantino's films change over time (as he became more famous)?
* What are the relationships between budget, revenue, popularity, and rating for Tarantino's films? How are these related to:
    * Genre?
    * Cast?
    * Crew?
* Since Tarantino is known for repeatedly casting some actors and actresses, can examining which ones are in which movies reveal hidden connections related to success?
* Crew members are less well known. Do Tarantino's films reuse crew members in the same way?

---

## The Data

Data for this analysis is sourced from [The Movie Database (TMDB)](https://www.themoviedb.org), a community-built database with millions of users that has been in service since 2008, via its publicly available web API.  
For more information about TMDB see the [About page](https://www.themoviedb.org/about). For information about the API, including documentation, see the [API page](https://www.themoviedb.org/documentation/api).  

The [tmdbsimple](https://github.com/celiao/tmdbsimple) library provides easy access to the TMDB web API using Python.

TMDB catalogs the following data for movies and credits. The fields that are used for this analysis are shown in **bold**.

##### Movies: 

| Field | Datatype | Description |
| :--- | :------ | :--------- |
| adult | boolean | Whether the movie is an adult film |
| backdrop_path | string or null | Path for backdrop image |
| belongs_to_collection | null or object | A collection the movie belongs to |
| **budget** | integer | Budget of the movie, not adjusted for inflation |
| **genres** | array[object] | List of genres |
| homepage | string or null | URL for movie's webpage |
| id | integer | TMDB ID |
| imdb_id | string or null | IMDB ID |
| original_language | string | Original language of the movie |
| original_title | string | Original title of the movie |
| overview | string or null | Short description of the movie |
| **popularity** | number | Measure of how popular the movie is on TMDB (see [TMDB Popularity](#tmdb-popularity)) |
| poster_path | string or null | Path for poster image |
| **production_companies** | array[object] | List of production companies that worked on the movie |
| production_countries | array[object] | List of countries the movie was produced in |
| **release_date** | string | Release date of the movie |
| **revenue** | integer | Revenue made by the movie, not adjusted for inflation |
| **runtime** | integer or null | Duration of the movie (in minutes) |
| spoken_languages | array[object] | List of languages spoken in the movie |
| status | string | Production status of the movie |
| tagline | string or null | Tagline used for the movie |
| **title** | string | Title of the movie |
| video | boolean | Whether there is a video associated with the movie |
| **vote_average** | number | Average score among user ratings (max 10) |
| **vote_count** | integer | Number of user ratings |

##### Cast Credits:

| Field | Datatype | Description |
| :--- | :------ | :--------- |
| adult | boolean | Whether or not the person works on adult media |
| **gender** | integer | Integer representation of gender |
| id | integer | TMDB ID |
| known_for_department | string | Primary production role |
| **name** | string | Current name |
| original_name | string | Name before any changes (such as marriage) |
| **popularity** | number | Measure of how popular the person is on TMDB (see [TMDB Popularity](#tmdb-popularity)) |
| profile_path | string or null | Path to profile image |
| cast_id | integer | ID for this cast |
| **character** | string | Name of character played in the movie |
| credit_id | string | ID for this credit |
| order | integer | Order of this credit in the movie's credits |

##### Crew Credits:

| Field | Datatype | Description |
| :--- | :------ | :--------- |
| adult | boolean | Whether or not the person works on adult media |
| **gender** | integer | Integer representation of gender |
| id | integer | TMDB ID |
| known_for_department | string | Primary production role |
| **name** | string | Current name |
| original_name | string | Name before any changes (such as marriage) |
| **popularity** | number | Measure of how popular the person is on TMDB (see [TMDB Popularity](#tmdb-popularity)) |
| profile_path | string or null | Path to profile image |
| credit_id | string | ID for this credit |
| department | string | Department worked in |
| **job** | string | Role in production for the movie |


<a id='tmdb-popularity'></a>

#### TMDB Popularity

TMDB has a **popularity** score that it associates with every page on the site. This score is described as "[A] very important metric [that] helps us boost search results, adds an incredibly useful sort value for Discover, and is also just kind of fun to see items chart up and down." It is a measure of how many users are interacting with the given database entry on a regular basis.

For a list of which factors affect popularity, see the [TMDB Popularity page](https://developers.themoviedb.org/3/getting-started/popularity).

### Getting the Data

> **Note:** In order to use the TMDB web API we must first create a free account and request an API key that will be used when making requests. The following sections cover the process of retrieving the data directly from the API and require an additional file named **api_key** containing a single line with your API key. Since each API key is private, this file has not been provided. However, a snapshot of the API data has been saved as **tarantino.json** and can be used to complete this analysis.  
> 
> To skip the API and use the provided json, skip directly to the [Exploring the Data](#get-from-file) section.

The first step is to import the libraries that will be used to access the API and parse the response, and to set up the API key for the requests.

In [1]:
import tmdbsimple as tmdb
import json

with open('api_key', 'r') as file:
    API_KEY = file.read().strip()   # strip the trailing newline so the key is sent cleanly

tmdb.API_KEY = API_KEY

In order to get information about a specific title from the TMDB API we need to provide the title's ID in the request.  

Although the TMDB API provides [Discover](https://developers.themoviedb.org/3/discover) and [Search](https://developers.themoviedb.org/3/search/search-movies) routes to look for titles and get their IDs, this analysis deals with a fairly small number of titles and it is relatively easy to look them up by hand.
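
For a longer list of titles, the manual lookup could be replaced by filtering Search results. The sketch below is a hypothetical helper (`pick_movie_id` is not part of tmdbsimple) that picks an ID from a list of search results by exact title and release year. The sample results are hand-written stand-ins that only mimic the `id`, `title`, and `release_date` keys; they are not real API output. With tmdbsimple, a list like this is assumed to come from `tmdb.Search().movie(query=...)['results']`.

```python
def pick_movie_id(results, title, year):
    """Return the ID of the first result matching title and release year, else None."""
    for result in results:
        # release_date is an ISO string such as '1994-09-10'; it may be empty or missing
        release_year = (result.get('release_date') or '')[:4]
        if result.get('title') == title and release_year == str(year):
            return result['id']
    return None


# Hand-written stand-in for a search response (not real API output)
sample_results = [
    {'id': -1, 'title': 'Pulp Fiction', 'release_date': ''},  # hypothetical entry with no date
    {'id': 680, 'title': 'Pulp Fiction', 'release_date': '1994-09-10'},
]

print(pick_movie_id(sample_results, 'Pulp Fiction', 1994))  # 680
```

Matching on both title and year guards against remakes or unrelated entries that share a title, at the cost of returning `None` when the release date is missing.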

In [2]:
tarantino_tmdb_ids = {
    "Reservoir Dogs": 500,
    "Pulp Fiction": 680,
    "Four Rooms": 5,
    "Jackie Brown": 184,
    "Kill Bill: Vol. 1": 24,
    "Kill Bill: Vol. 2": 393,
    "Death Proof": 1991,
    "Inglourious Basterds": 16869,
    "Django Unchained": 68718,
    "The Hateful Eight": 273248,
    "Once Upon a Time... in Hollywood": 466272
}

With these IDs, requests can be made to the API to get [details](https://developers.themoviedb.org/3/movies/get-movie-details) about each title. An additional request can be made to get the [credits](https://developers.themoviedb.org/3/movies/get-movie-credits) for the title, including both the cast and the crew.  

The **tmdbsimple** library makes the process fairly painless.

In [3]:

tarantino_movies = []

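for title, movie_id in tarantino_tmdb_ids.items():
    # Reuse one Movies object for both requests; movie_id avoids shadowing the built-in id()
    tmdb_movie = tmdb.Movies(movie_id)
    movie = {}
    movie['info'] = tmdb_movie.info()
    movie['credits'] = tmdb_movie.credits()
    tarantino_movies.append(movie)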

### Data Wrangling

Like most web APIs, TMDB sends its responses in JSON format. While there are methods to extract the data from JSON objects, the results from the API are nested in a way that makes them complicated to use directly. Additionally, we are only interested in a subset of the provided fields. As such, it is helpful to prune the unnecessary fields and reformat the data to make it easier to use. A list of column names and a list of data rows for each table will make the data easy to work with.

The **gender** field is of special note because it is saved as an integer. The values are explained by TMDB staff [here](https://www.themoviedb.org/talk/58ee5022c3a3683ded00a887?language=en-US).

Also, since comparisons will be made between monetary values that were recorded many years apart, it will be necessary to adjust for inflation. The [cpi](https://pypi.org/project/cpi/) library can be used to adjust the budgets and revenue easily.
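
Conceptually, the adjustment is a ratio of price indexes. As a minimal sketch of the idea (not the `cpi` library's implementation), the hypothetical `adjust_for_inflation` helper below scales an amount by the ratio of the target-period index to the source-period index. The index values are placeholders chosen for illustration, not official CPI figures.

```python
def adjust_for_inflation(amount, index_then, index_now):
    """Scale a dollar amount from one price level to another."""
    if index_then <= 0:
        raise ValueError('index_then must be positive')
    return amount * index_now / index_then


# Placeholder index values (assumptions for illustration only)
INDEX_RELEASE_YEAR = 100.0
INDEX_TARGET_YEAR = 200.0

# A 1,200,000 budget doubles when the price index doubles
adjusted = adjust_for_inflation(1_200_000, INDEX_RELEASE_YEAR, INDEX_TARGET_YEAR)
print(int(adjusted))  # 2400000
```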

Let's create the lists and process the data.

In [4]:
from datetime import date   # date data structure to enable CPI conversions
import cpi

# Dictionary to convert gender from integer to value
tmdb_gender = {
    0: 'NS',    # Not Specified
    1: 'F',     # Female
    2: 'M',     # Male
    3: 'NB'     # Non-Binary
}

# Movie table - Movie information
movie_cols = ['title', 'release_date', 'runtime', 'budget', 'adj_budget', 'revenue', 'adj_revenue', 'popularity', 'vote_count', 'vote_average']
movie_rows = []

# Genre table - Relates movie and genres
genre_cols = ['title', 'genre']
genre_rows = []

# Production company table - Relates movie and production companies
prod_company_cols = ['title', 'company']
prod_company_rows = []

# Cast table - Relates movie to its cast
cast_cols = ['title', 'person', 'character']
cast_rows = []

# Crew table - Relates movie to its crew
crew_cols = ['title', 'person', 'job']
crew_rows = []

# Person table - Person information
person_cols = ['person', 'gender', 'popularity']
person_rows = []

# Process data
for movie in tarantino_movies:
    # Movie table
    date_parts = movie['info']['release_date'].split('-')
    cpi_date = date(int(date_parts[0]), int(date_parts[1]), int(date_parts[2]))
    
    movie_row = [
        movie['info']['title'],
        movie['info']['release_date'],
        movie['info']['runtime'],
        movie['info']['budget'],
        int(cpi.inflate(movie['info']['budget'], cpi_date)),
        movie['info']['revenue'],
        int(cpi.inflate(movie['info']['revenue'], cpi_date)),
        movie['info']['popularity'],
        movie['info']['vote_count'],
        movie['info']['vote_average']
    ]
    movie_rows.append(movie_row)

    # Genre table
    for genre in movie['info']['genres']:
        genre_row = [
            movie['info']['title'],
            genre['name']
        ]
        genre_rows.append(genre_row)
    
    # Production company table
    for company in movie['info']['production_companies']:
        company_row = [
            movie['info']['title'],
            company['name']
        ]
        prod_company_rows.append(company_row)
    
    # Cast / Person table
    for cast in movie['credits']['cast']:
        cast_row = [
            movie['info']['title'],
            cast['name'],
            cast['character']
        ]
        cast_rows.append(cast_row)

        person_row = [
            cast['name'],
            tmdb_gender[cast['gender']],
            cast['popularity']
        ]
        person_rows.append(person_row)

    # Crew / Person table
    for crew in movie['credits']['crew']:
        crew_row = [
            movie['info']['title'],
            crew['name'],
            crew['job']
        ]
        crew_rows.append(crew_row)

        person_row = [
            crew['name'],
            tmdb_gender[crew['gender']],
            crew['popularity']
        ]
        person_rows.append(person_row)

Let's save the results to facilitate the use of this notebook without an API key. The CSV files can then be loaded any time.

In [5]:
import csv
import os

# Make sure the output directory exists before writing
os.makedirs('./csv', exist_ok=True)

# Map each output file name to its column names and data rows
tables = {
    'movie': (movie_cols, movie_rows),
    'genre': (genre_cols, genre_rows),
    'prod_company': (prod_company_cols, prod_company_rows),
    'cast': (cast_cols, cast_rows),
    'crew': (crew_cols, crew_rows),
    'person': (person_cols, person_rows),
}

for name, (cols, rows) in tables.items():
    # newline='' prevents the csv module from writing blank lines between rows on Windows
    with open(f'./csv/{name}.csv', 'w', encoding='utf-8', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(cols)
        writer.writerows(rows)


<a id='get-from-file'></a>

### Exploring the Data

Now that the data has been processed, we can make the dataframes and begin with a cursory look at each table to examine some statistics and get an idea of the big picture for our data.

To skip the use of the API key, a snapshot of the data can be loaded from the provided set of CSV files.

In [6]:

# Important imports
from IPython.display import display     # tool for nicely displaying Pandas dataframes
import math
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import networkx as nx
from bokeh.plotting import figure, show, from_networkx
from bokeh.io import output_notebook

output_notebook() # Initialize bokeh

In [7]:
# Add the data to pandas dataframes
movie_df = pd.read_csv('./csv/movie.csv')
genre_df = pd.read_csv('./csv/genre.csv')
prod_company_df = pd.read_csv('./csv/prod_company.csv')
cast_df = pd.read_csv('./csv/cast.csv')
crew_df = pd.read_csv('./csv/crew.csv')
person_df = pd.read_csv('./csv/person.csv')

# A person who worked on several movies appears once per credit, so drop the exact duplicate rows
person_df.drop_duplicates(inplace=True)

### Movie Data

The Movie dataframe contains mostly numeric information about each of Tarantino's movies. Let's take a look.

In [8]:
display(movie_df)

|  | title | release_date | runtime | budget | adj_budget | revenue | adj_revenue | popularity | vote_count | vote_average |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Reservoir Dogs | 1992-09-02 | 99 | 1200000 | 2540721 | 2859750 | 6054857 | 43.988 | 12753 | 8.141 |
| 1 | Pulp Fiction | 1994-09-10 | 154 | 8000000 | 16019812 | 214179088 | 428888606 | 69.434 | 24804 | 8.489 |
| 2 | Four Rooms | 1995-12-09 | 98 | 4000000 | 7795960 | 4257354 | 8297541 | 23.624 | 2327 | 5.742 |
| 3 | Jackie Brown | 1997-12-25 | 154 | 12000000 | 22256912 | 39673162 | 73583508 | 21.078 | 5607 | 7.352 |
| 4 | Kill Bill: Vol. 1 | 2003-10-10 | 111 | 30000000 | 48514054 | 180906076 | 292549571 | 41.229 | 15565 | 8.0 |
| 5 | Kill Bill: Vol. 2 | 2004-04-16 | 136 | 30000000 | 47739893 | 152159461 | 242135882 | 31.17 | 12437 | 7.878 |
| 6 | Death Proof | 2007-05-22 | 113 | 25000000 | 35966751 | 31126421 | 44780649 | 24.237 | 4450 | 6.819 |
| 7 | Inglourious Basterds | 2009-08-19 | 153 | 70000000 | 97027808 | 321457747 | 445576295 | 80.251 | 19910 | 8.212 |
| 8 | Django Unchained | 2012-12-25 | 165 | 100000000 | 130299955 | 425368238 | 554254623 | 67.358 | 23794 | 8.161 |
| 9 | The Hateful Eight | 2015-12-25 | 188 | 44000000 | 55653651 | 155760117 | 197014075 | 31.074 | 12818 | 7.7 |


We can already see a wide variety of values for almost every column. Let's examine some statistical information about each of them.

In [9]:
display(movie_df.describe())

|  | runtime | budget | adj_budget | revenue | adj_revenue | popularity | vote_count | vote_average |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| count | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 |
| mean | 139.363636 | 38109090.0 | 52235320.0 | 172909000.0 | 248138600.0 | 44.801909 | 13276.818182 | 7.629818 |
| std | 30.000909 | 35406370.0 | 43002560.0 | 149682000.0 | 198336000.0 | 20.99184 | 7439.278498 | 0.784918 |
| min | 98.0 | 1200000.0 | 2540721.0 | 2859750.0 | 6054857.0 | 21.078 | 2327.0 | 5.742 |
| 25% | 112.0 | 10000000.0 | 19138360.0 | 35399790.0 | 59182080.0 | 27.6555 | 8593.5 | 7.393 |
| 50% | 153.0 | 30000000.0 | 47739890.0 | 155760100.0 | 242135900.0 | 41.229 | 12753.0 | 7.878 |
| 75% | 158.0 | 57000000.0 | 76340730.0 | 267818400.0 | 432638800.0 | 63.368 | 17737.5 | 8.151 |
| max | 188.0 | 100000000.0 | 130300000.0 | 425368200.0 | 554254600.0 | 80.251 | 24804.0 | 8.489 |


With these statistics we can make some early observations:

* Tarantino's films are generally profitable, with the mean adjusted revenue nearly five times the mean adjusted budget.
* Most of the standard deviations are high, showing that Tarantino works with a wide range of budgets and that his films vary widely in popularity and vote count on TMDB.
* The one column with a low standard deviation is the vote average, showing that, with a mean vote average of 7.6, Tarantino's films consistently score fairly well with TMDB users.
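
These observations can be checked directly against the summary statistics. The sketch below hard-codes the mean and standard deviation values reported in the table above and computes the ratio of mean adjusted revenue to mean adjusted budget. It also computes the coefficient of variation (standard deviation divided by mean), which is an addition here, used as one way to compare spread across columns with different units.

```python
# Mean and standard deviation values copied from the describe() output
summary = {
    'adj_budget': {'mean': 52235320.0, 'std': 43002560.0},
    'adj_revenue': {'mean': 248138600.0, 'std': 198336000.0},
    'popularity': {'mean': 44.801909, 'std': 20.99184},
    'vote_average': {'mean': 7.629818, 'std': 0.784918},
}

# Ratio of mean adjusted revenue to mean adjusted budget
revenue_to_budget = summary['adj_revenue']['mean'] / summary['adj_budget']['mean']
print(f'mean adj_revenue / mean adj_budget: {revenue_to_budget:.2f}')

# Coefficient of variation: spread relative to the mean (unitless)
cv = {col: stats['std'] / stats['mean'] for col, stats in summary.items()}
for col, value in cv.items():
    print(f'{col}: {value:.2f}')
```

A small coefficient of variation for `vote_average` alongside much larger ones for the money columns would support the reading that ratings are consistent while budgets and revenues vary widely.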

While statistics will give us information about individual columns, we are also interested in how the columns relate to each other. Let's make a pairplot, where each column is plotted against every other column. We will leave out unadjusted budget and revenue since they do not correlate well with the other measures, which are based on more recent activity.

In [10]:
# Pass the column names (not a dataframe) as the dimensions to plot
pairplot_cols = ['runtime', 'adj_budget', 'adj_revenue', 'popularity', 'vote_count', 'vote_average']
fig = px.scatter_matrix(movie_df, dimensions=pairplot_cols, width=700, height=700)
fig.update_layout(
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)
fig.show()

These plots reveal some more interesting observations:

* There seems to be some relationship between adjusted revenue and almost every other column.
* Runtime does not seem to have a relationship with any other column.
* There seems to be a close, non-linear relationship between vote count and vote average.
* Popularity is perhaps less closely related to vote count and vote average than expected.
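
To put a number on relationships like these, a correlation coefficient can be computed for any pair of columns. The sketch below implements Pearson's r in plain Python (the helper name `pearson_r` is introduced here for illustration) and applies it to the `adj_budget` and `adj_revenue` values of the ten rows shown in the movie table above. Pearson's r only captures linear association, so it is a rough complement to the plots rather than a replacement, and with so few films the result should be read cautiously.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# adj_budget and adj_revenue for the ten rows displayed in the movie table
adj_budget = [2540721, 16019812, 7795960, 22256912, 48514054,
              47739893, 35966751, 97027808, 130299955, 55653651]
adj_revenue = [6054857, 428888606, 8297541, 73583508, 292549571,
               242135882, 44780649, 445576295, 554254623, 197014075]

r = pearson_r(adj_budget, adj_revenue)
print(f'Pearson r (adj_budget vs adj_revenue): {r:.2f}')
```

With the dataframe loaded, `movie_df[pairplot_columns].corr()` would give the full matrix in one call, where `pairplot_columns` stands for the list of column names used in the pairplot.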

In addition to the numeric data, there is also genre data for each film. A histogram can be used to examine the genres that Tarantino favors.

In [11]:
display(genre_df.describe())

fig = px.histogram(genre_df, x=genre_df['genre'], height=400, width=600)
fig.update_layout(
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)
fig.show()

|  | title | genre |
| :--- | :--- | :--- |
| count | 27 | 27 |
| unique | 11 | 8 |
| top | Jackie Brown | Thriller |
| freq | 3 | 7 |


No surprises here. For anyone familiar with Tarantino's work, "Crime Thriller" was likely to be at the top of the list of expected genres.

Perhaps less obvious may be the production companies involved in Tarantino's films.

In [12]:
display(prod_company_df.describe())

fig = px.histogram(prod_company_df, x=prod_company_df['company'], height=700, width=800)
fig.update_layout(
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)
fig.show()

|  | title | company |
| :--- | :--- | :--- |
| count | 34 | 34 |
| unique | 11 | 20 |
| top | Inglourious Basterds | A Band Apart |
| freq | 6 | 6 |


We can see that most of the production companies Tarantino works with contribute to only a single film; however, there are a few that he has worked with repeatedly. It may be worth investigating which films they worked on and what Tarantino's relationship is to those companies.

### People Data

Next we'll examine the people behind the films. We have data for both actors/actresses and crew members.

Let's start with cast.

In [13]:
display(cast_df.describe())

|  | title | person | character |
| :--- | :--- | :--- | :--- |
| count | 551 | 551 | 545 |
| unique | 11 | 446 | 498 |
| top | Django Unchained | Quentin Tarantino | Tracker |
| freq | 112 | 9 | 7 |


There are 551 cast credits across Tarantino's 11 films, with *Django Unchained* accounting for 112 of them. Interestingly, the most frequently credited actor in Tarantino's films is Tarantino himself, with a total of 9 credits across the 11 films.  

Next up, crew.

In [14]:
display(crew_df.describe())

|  | title | person | job |
| :--- | :--- | :--- | :--- |
| count | 1572 | 1572 | 1572 |
| unique | 11 | 1113 | 281 |
| top | Kill Bill: Vol. 1 | Quentin Tarantino | Stunts |
| freq | 220 | 34 | 45 |


Less interesting is the fact that Tarantino has the most crew credits among his films, often serving as both writer and director. One interesting point is that the 1572 crew credits are spread across 1113 unique names, which is over two thirds of the total. This suggests that most of the crew is not reused between films, but there may be some key crew members who worked on several films.
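
One way to quantify reuse is to count, for each person, the number of distinct films they are credited on. The sketch below does this in plain Python over (title, person) pairs. The rows are hypothetical placeholders shaped like the crew table, not values from the dataset.

```python
from collections import defaultdict

# Hypothetical (title, person) credit rows; placeholders, not real data
credits = [
    ('Film A', 'Person 1'),
    ('Film A', 'Person 1'),   # same person credited twice on one film
    ('Film A', 'Person 2'),
    ('Film B', 'Person 1'),
    ('Film B', 'Person 3'),
    ('Film C', 'Person 1'),
]

# Collect the set of distinct films per person so that repeat credits
# on a single film are not counted as reuse
films_by_person = defaultdict(set)
for title, person in credits:
    films_by_person[person].add(title)

film_counts = {person: len(titles) for person, titles in films_by_person.items()}
reused = {person: n for person, n in film_counts.items() if n > 1}

print(film_counts)  # {'Person 1': 3, 'Person 2': 1, 'Person 3': 1}
print(reused)       # {'Person 1': 3}
```

With the real data, the same count could be obtained with `crew_df.groupby('person')['title'].nunique()`. Counting distinct films rather than raw credits matters because one person can hold several jobs on a single film.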

Lastly, we have information about the individual people who make up the cast and crew.

In [15]:
display(person_df.describe(include='all'))

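|  | person | gender | popularity |
| :--- | :--- | :--- | :--- |
| count | 1555 | 1555 | 1555.0 |
| unique | 1530 | 3 |  |
| top | Brett C. Smith | NS |  |
| freq | 2 | 725 |  |
| mean |  |  | 3.05327 |
| std |  |  | 8.152782 |
| min |  |  | 0.6 |
| 25% |  |  | 0.6 |
| 50% |  |  | 0.6 |
| 75% |  |  | 1.62 |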


Most interesting here is the popularity column. With a median of 0.6 and a 75th percentile of only 1.62, we can see that the vast majority of the people associated with Tarantino's films are not well known. This is not very surprising given the industry's focus on celebrities, but with a maximum of 175, there are clearly some really large outliers.

We can also see that there are some duplicates in the name column which could represent some discrepancy in the data, but it is also likely that it is simply due to the fact that people's names are not unique.

Let's see who the most popular people are.

In [16]:
display(person_df.sort_values('popularity', ascending=False).head(10))

|  | person | gender | popularity |
| :--- | :--- | :--- | :--- |
| 1864 | Sydney Sweeney | F | 140.582 |
| 1845 | Austin Butler | M | 94.319 |
| 1851 | Al Pacino | M | 68.403 |
| 1757 | Channing Tatum | M | 68.091 |
| 1523 | Leonardo DiCaprio | M | 66.736 |
| 1863 | Victoria Pedretti | F | 63.057 |
| 1360 | Brad Pitt | M | 56.496 |
| 303 | Salma Hayek | F | 56.348 |
| 429 | Robert De Niro | M | 54.01 |
| 109 | Samuel L. Jackson | M | 53.233 |


Unsurprisingly, super celebrities make up the most popular entries on TMDB, with Sydney Sweeney from the most recent film, *Once Upon a Time... in Hollywood*, topping the chart.

Another point of interest may be the gender breakdown between people who work on Tarantino's films.

In [17]:
# One histogram per group: all people, cast credits, and crew credits
fig = make_subplots(rows=1, cols=3, subplot_titles=('Overall', 'Cast', 'Crew'))

fig.add_trace(go.Histogram(x=person_df['gender']), row=1, col=1)

df = person_df.join(cast_df.set_index('person'), on='person', how='right')[['title', 'person', 'gender']]

fig.add_trace(go.Histogram(x=df['gender']), row=1, col=2)

df = person_df.join(crew_df.set_index('person'), on='person', how='right')[['title', 'person', 'gender']]

fig.add_trace(go.Histogram(x=df['gender']), row=1, col=3)

fig.update_traces(showlegend=False)

fig.update_layout(
    width=1000,
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)
fig.show()

We can see that a large amount of gender data is missing, and that no one has been recorded as Non-Binary. Unsurprisingly, the missing information is largely in the crew section, where most people are not well known and TMDB users are less inclined to update their entries.

Interestingly, there are about twice as many males as females among the cast of Tarantino's films, and the gap is even wider for the crew. However, this may be more indicative of industry-wide biases than of any specific bias of Tarantino or the production companies.

## Analysis

Now that we have taken a look at the data we have, let's see if we can draw any conclusions about what makes Tarantino's films successful.

Before we can begin, we must first determine the relative success between the films.

In [18]:
fig = px.bar(movie_df, x='adj_budget', y='title')
fig.update_layout(
    width=700,
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)

fig.show()

fig = px.bar(movie_df, x='adj_revenue', y='title')
fig.update_layout(
    width=700,
    margin={'l': 25, 'r': 25, 't': 25, 'b':25}
)

fig.show()
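
The two charts show budget and revenue separately. One possible way to combine them into a single measure of relative success, which is an assumption of this sketch rather than a metric defined above, is the return multiple: adjusted revenue divided by adjusted budget. The values are copied from the ten rows shown in the movie table.

```python
# (title, adj_budget, adj_revenue) for the ten rows displayed in the movie table
movies = [
    ('Reservoir Dogs', 2540721, 6054857),
    ('Pulp Fiction', 16019812, 428888606),
    ('Four Rooms', 7795960, 8297541),
    ('Jackie Brown', 22256912, 73583508),
    ('Kill Bill: Vol. 1', 48514054, 292549571),
    ('Kill Bill: Vol. 2', 47739893, 242135882),
    ('Death Proof', 35966751, 44780649),
    ('Inglourious Basterds', 97027808, 445576295),
    ('Django Unchained', 130299955, 554254623),
    ('The Hateful Eight', 55653651, 197014075),
]

# Return multiple: dollars of adjusted revenue per dollar of adjusted budget
multiples = {title: revenue / budget for title, budget, revenue in movies}

# Rank films from highest to lowest return multiple
ranked = sorted(multiples.items(), key=lambda item: item[1], reverse=True)
for title, multiple in ranked:
    print(f'{title}: {multiple:.2f}x')
```

Revenue is not profit, since the budget figure excludes costs such as marketing, so the multiple is best read as a rough ranking rather than a financial result.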

In [19]:
from bokeh.models import Range1d, Circle, ColumnDataSource, MultiLine, NodesAndLinkedEdges, LabelSet
from bokeh.palettes import Light3

G = nx.from_pandas_edgelist(cast_df, 'person', 'title', 'character')

degrees = dict(nx.degree(G))
nx.set_node_attributes(G, name='degree', values=degrees)
adjusted_node_size = dict([(node, 8*math.log(degree) + 5) for node, degree in nx.degree(G)])
nx.set_node_attributes(G, name='adjusted_size', values=adjusted_node_size)

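# Color movie nodes differently from person nodes
movie_titles = set(movie_df['title'])   # build once for fast membership tests
node_colors = {}

for node in G.nodes():
    if node in movie_titles:
        node_colors[node] = Light3[0]
    else:
        node_colors[node] = Light3[1]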

nx.set_node_attributes(G, name='color', values=node_colors)

G = nx.convert_node_labels_to_integers(G, label_attribute='name')

#Establish which categories will appear when hovering over each node
HOVER_TOOLTIPS = [
    ("Name", "@name"),
    ("Degree", "@degree")
]


title = 'Tarantino Cast Network'

plot = figure(tooltips = HOVER_TOOLTIPS,
              tools="pan,wheel_zoom,save,reset", active_scroll='wheel_zoom',
              x_range=Range1d(-20.1, 20.1), y_range=Range1d(-20.1, 20.1), title=title)

graph = from_networkx(G, nx.spring_layout, scale=20, center=(0,0))

graph.node_renderer.glyph = Circle(size='adjusted_size', fill_color='color')

graph.edge_renderer.glyph = MultiLine(line_alpha=0.3)

plot.renderers.append(graph)

show(plot)

In [20]:
G = nx.from_pandas_edgelist(crew_df, 'person', 'title', 'job')

degrees = dict(nx.degree(G))
nx.set_node_attributes(G, name='degree', values=degrees)
adjusted_node_size = dict([(node, 8*math.log(degree) + 5) for node, degree in nx.degree(G)])
nx.set_node_attributes(G, name='adjusted_size', values=adjusted_node_size)

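# Color movie nodes differently from person nodes
movie_titles = set(movie_df['title'])   # build once for fast membership tests
node_colors = {}

for node in G.nodes():
    if node in movie_titles:
        node_colors[node] = Light3[0]
    else:
        node_colors[node] = Light3[1]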

nx.set_node_attributes(G, name='color', values=node_colors)

G = nx.convert_node_labels_to_integers(G, label_attribute='name')

#Establish which categories will appear when hovering over each node
HOVER_TOOLTIPS = [
    ("Name", "@name"),
    ("Degree", "@degree")
]


title = 'Tarantino Crew Network'

plot = figure(tooltips = HOVER_TOOLTIPS,
              tools="pan,wheel_zoom,save,reset", active_scroll='wheel_zoom',
              x_range=Range1d(-20.1, 20.1), y_range=Range1d(-20.1, 20.1), title=title)

graph = from_networkx(G, nx.spring_layout, scale=20, center=(0,0))

graph.node_renderer.glyph = Circle(size='adjusted_size', fill_color='color')

graph.edge_renderer.glyph = MultiLine(line_alpha=0.3)

plot.renderers.append(graph)

show(plot)