# Cheating at Six Degrees of Kevin Bacon

I wanted to follow up on my LinkedIn post where I stated that programming is not about languages; it's about the processes and algorithms behind them. So, given that Python is not highlighted on my resume, I spent a little time writing some code to help cheat at [Six Degrees of Kevin Bacon](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon). I allowed myself to use data analytics libraries and pulled the data from the [IMDb Non-Commercial Datasets](https://developer.imdb.com/non-commercial-datasets/).

In [1]:
# Import Pandas for Dataframes / easy data manipulation
import pandas as pd
# Import networkx for graph manipulation
import networkx as nx

In [2]:
# Import data sets downloaded from IMDb
name_basics_df = pd.read_csv("name.basics.tsv.gz", sep='\t')
title_basics_df = pd.read_csv("title.basics.tsv.gz", sep='\t')
title_principals_df = pd.read_csv("title.principals.tsv.gz", sep='\t')
# It's okay console warning, everything is going to be okay
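
The IMDb files mark missing values with `\N`, which is why that marker shows up in the outputs further down. A minimal sketch of reading that format with `\N` treated as missing, using a tiny made-up sample rather than the real download (`sample_tsv` and `sample_df` are illustration-only names):

```python
import io

import pandas as pd

# Made-up sample in the same tab-separated layout as name.basics.tsv (not the real file)
sample_tsv = (
    "nconst\tprimaryName\tbirthYear\tdeathYear\n"
    "nm0000001\tFred Astaire\t1899\t1987\n"
    "nm0000003\tBrigitte Bardot\t1934\t\\N\n"
)

# na_values turns the \N marker into NaN; low_memory=False makes pandas
# infer each column's type from the whole file instead of chunk by chunk
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N", low_memory=False)

print(sample_df["deathYear"].isna().tolist())  # [False, True]
```

The rest of this post keeps the default loading, so `\N` appears as a literal string in the tables.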


In [3]:
# Change nconst to index for easy lookups
name_basics_df = name_basics_df.set_index("nconst")
# Show the first few rows just to see what the data looks like
name_basics_df.head()

| nconst | primaryName | birthYear | deathYear | primaryProfession | knownForTitles |
| --- | --- | --- | --- | --- | --- |
| nm0000001 | Fred Astaire | 1899 | 1987 | soundtrack,actor,miscellaneous | tt0050419,tt0053137,tt0045537,tt0072308 |
| nm0000002 | Lauren Bacall | 1924 | 2014 | actress,soundtrack | tt0075213,tt0037382,tt0117057,tt0038355 |
| nm0000003 | Brigitte Bardot | 1934 | \N | actress,soundtrack,music_department | tt0049189,tt0057345,tt0056404,tt0054452 |
| nm0000004 | John Belushi | 1949 | 1982 | actor,soundtrack,writer | tt0077975,tt0080455,tt0078723,tt0072562 |
| nm0000005 | Ingmar Bergman | 1918 | 2007 | writer,director,actor | tt0083922,tt0069467,tt0050976,tt0050986 |


In [4]:
# We are only going to get movies that are for a general audience
title_basics_df = title_basics_df.query('isAdult == 0 & titleType == "movie"')
# Change tconst to index for easy lookups
title_basics_df = title_basics_df.set_index("tconst")
# Show the first few rows just to see what the data looks like
title_basics_df.head()

| tconst | titleType | primaryTitle | originalTitle | isAdult | startYear | endYear | runtimeMinutes | genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tt0000009 | movie | Miss Jerry | Miss Jerry | 0 | 1894 | \N | 45 | Romance |
| tt0000147 | movie | The Corbett-Fitzsimmons Fight | The Corbett-Fitzsimmons Fight | 0 | 1897 | \N | 100 | Documentary,News,Sport |
| tt0000502 | movie | Bohemios | Bohemios | 0 | 1905 | \N | 100 | \N |
| tt0000574 | movie | The Story of the Kelly Gang | The Story of the Kelly Gang | 0 | 1906 | \N | 70 | Action,Adventure,Biography |
| tt0000591 | movie | The Prodigal Son | L'enfant prodigue | 0 | 1907 | \N | 90 | Drama |


In [5]:
# We will just get the actor credits
title_principals_df = title_principals_df.query('category == "actor" | category == "actress"')
# Show the first few rows just to see what the data looks like
title_principals_df.head()

| | tconst | ordering | nconst | category | job | characters |
| --- | --- | --- | --- | --- | --- | --- |
| 11 | tt0000005 | 1 | nm0443482 | actor | \N | ["Blacksmith"] |
| 12 | tt0000005 | 2 | nm0653042 | actor | \N | ["Assistant"] |
| 16 | tt0000007 | 1 | nm0179163 | actor | \N | \N |
| 17 | tt0000007 | 2 | nm0183947 | actor | \N | \N |
| 21 | tt0000008 | 1 | nm0653028 | actor | \N | ["Sneezing Man"] |


In [6]:
# We will now create a graph from all the actors to their films
actor_to_movies_graph = nx.Graph()

all_movie_ids = list(title_basics_df.index.unique())
all_actor_ids = list(title_principals_df["nconst"].unique())

node_list = all_movie_ids + all_actor_ids

# Create nodes
actor_to_movies_graph.add_nodes_from(node_list)

In [7]:
# Use inner join to ensure only movies are included in edges
title_principals_filtered_by_movies = title_principals_df.join(title_basics_df, how='inner', on='tconst')

# Create list of (movie, actor) edges; zipping the two columns avoids a slow row-by-row loop
edge_list = list(zip(title_principals_filtered_by_movies['tconst'],
                     title_principals_filtered_by_movies['nconst']))

# Add edges to the graph
actor_to_movies_graph.add_edges_from(edge_list)
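
To make the shape of this graph easier to picture, here is a minimal sketch on made-up ids (`toy_credits` and `toy_graph` are illustration-only names, not part of the notebook). Actors and movies are both nodes, every credit is an edge, and so a path between two actors alternates actor, movie, actor:

```python
import networkx as nx

# Made-up (movie id, actor id) pairs in the same shape as edge_list
toy_credits = [("tt1", "nm1"), ("tt1", "nm2"), ("tt2", "nm2"), ("tt2", "nm3")]

toy_graph = nx.Graph()
# add_edges_from also creates any node that does not exist yet
toy_graph.add_edges_from(toy_credits)

# nm1 and nm3 never share a movie, but both worked with nm2
toy_path = nx.shortest_path(toy_graph, "nm1", "nm3")
print(toy_path)  # ['nm1', 'tt1', 'nm2', 'tt2', 'nm3']
```

Two movies sit on that path, so in Kevin Bacon terms nm1 and nm3 are two degrees apart.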

In [8]:
def translate_shortest_path_to_human_readable_path(id_path):
    # Translate the shortest path into human readable strings
    readable_shortest_path = []
    for item in id_path:
        if item.startswith("nm"):
            readable_shortest_path.append("Actor: " + name_basics_df.at[item, "primaryName"])
        else:
            readable_shortest_path.append("Movie: " + title_basics_df.at[item, "primaryTitle"])
    return readable_shortest_path

# Create a function for finding the shortest path. It takes in two strings of actor names.
# For simplicity, there is no input validation: if a name doesn't exist in the data set,
# no paths are found and an empty list is returned. Additionally, IMDb has multiple people
# with the same name. Therefore, we must check every combination just to be safe.
def get_shortest_degrees_of_kevin_bacon_path(first_actor_name, second_actor_name):
    all_paths = []
    # Translate actor names to their nconst ids (the index of name_basics_df)
    first_actor_ids = name_basics_df.loc[name_basics_df.primaryName == first_actor_name].index
    second_actor_ids = name_basics_df.loc[name_basics_df.primaryName == second_actor_name].index
    # Use the networkx graph we created to find the shortest path for each actor combo
    for first_actor_id in first_actor_ids:
        for second_actor_id in second_actor_ids:
            try:
                shortest_path = nx.shortest_path(actor_to_movies_graph, first_actor_id, second_actor_id)
                all_paths.append(translate_shortest_path_to_human_readable_path(shortest_path))
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                # Either the two people are not connected, or one of them is not in the graph
                pass
    return all_paths
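
Since the point of this post is the algorithm rather than the language: on an unweighted graph like this one, a shortest path can be found with a breadth-first search. Below is a minimal plain-Python sketch of that idea on made-up ids. The `bfs_shortest_path` helper and `toy_adjacency` are hypothetical names for illustration, and this is not networkx's actual implementation.

```python
from collections import deque


def bfs_shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal, or None if there is none."""
    if start not in adjacency or goal not in adjacency:
        return None
    # previous[node] remembers which node we came from when we first reached it
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the breadcrumbs back to the start, then reverse them
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up graph: actors (nm...) link to movies (tt...); nm4 has no credits at all
toy_adjacency = {
    "nm1": ["tt1"],
    "tt1": ["nm1", "nm2"],
    "nm2": ["tt1", "tt2"],
    "tt2": ["nm2", "nm3"],
    "nm3": ["tt2"],
    "nm4": [],
}

print(bfs_shortest_path(toy_adjacency, "nm1", "nm3"))  # ['nm1', 'tt1', 'nm2', 'tt2', 'nm3']
print(bfs_shortest_path(toy_adjacency, "nm1", "nm4"))  # None
```

Because the search fans out one step at a time, the first time it reaches the goal is guaranteed to be along a path with the fewest edges.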

In [9]:
get_shortest_degrees_of_kevin_bacon_path("Will Smith", "Kevin Bacon")

[['Actor: Will Smith',
  'Movie: Concussion',
  'Actor: Alec Baldwin',
  "Movie: She's Having a Baby",
  'Actor: Kevin Bacon']]

In [10]:
get_shortest_degrees_of_kevin_bacon_path("James Stewart", "Barry Keoghan")

[['Actor: James Stewart',
  'Movie: The Magic of Lassie',
  'Actor: Mickey Rooney',
  'Movie: Babe: Pig in the City',
  'Actor: James Cromwell',
  'Movie: Owd Bob',
  'Actor: Colm Meaney',
  'Movie: Bring Them Down',
  'Actor: Barry Keoghan'],
 ['Actor: James Stewart',
  'Movie: The Ripper',
  'Actor: Scott Fulmer',
  'Movie: The Possession 2',
  'Actor: Tristan Riggs',
  "Movie: Dante's Hotel",
  'Actor: Marcia Rodd',
  'Movie: T.R. Baskin',
  'Actor: James Caan',
  'Movie: This Is My Father',
  'Actor: Colm Meaney',
  'Movie: Bring Them Down',
  'Actor: Barry Keoghan']]

In [11]:
get_shortest_degrees_of_kevin_bacon_path("Jennifer Lawrence", "Chris Pratt")

[['Actor: Jennifer Lawrence', 'Movie: Passengers', 'Actor: Chris Pratt'],
 ['Actor: Jennifer Lawrence',
  'Movie: The Hunger Games',
  'Actor: Liam Hemsworth',
  'Movie: Love and Honor',
  'Actor: Austin Stowell',
  'Movie: Higher Power',
  'Actor: Jordan Danger',
  'Movie: Stuck in the Middle',
  'Actor: Temple Baker',
  'Movie: When We Burn Out',
  'Actor: Christian Bland',
  'Movie: Motorcycle',
  'Actor: Chris Pratt']]
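
Because every namesake is checked, the function can return several paths, and only the shortest one really answers the question. Here is a small sketch that counts the movies in each path and keeps the shortest. The helper names `degrees_of_separation` and `closest_connection` are hypothetical, and the example paths are copied from the outputs above.

```python
def degrees_of_separation(readable_path):
    """Count the movies in an alternating actor/movie path."""
    return sum(1 for item in readable_path if item.startswith("Movie: "))


def closest_connection(all_paths):
    """Return the path with the fewest movies, or None when no path was found."""
    return min(all_paths, key=degrees_of_separation, default=None)


# Two paths taken from the outputs shown earlier in this post
example_paths = [
    ["Actor: Will Smith", "Movie: Concussion", "Actor: Alec Baldwin",
     "Movie: She's Having a Baby", "Actor: Kevin Bacon"],
    ["Actor: Jennifer Lawrence", "Movie: Passengers", "Actor: Chris Pratt"],
]

print([degrees_of_separation(path) for path in example_paths])  # [2, 1]
print(closest_connection(example_paths)[0])  # Actor: Jennifer Lawrence
```

In practice you would pass in the list returned by `get_shortest_degrees_of_kevin_bacon_path` for a single pair of names; the two pairs are mixed here only to keep the example short.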

And there it is: a simple six degrees solution written in a language I don't have on my resume. For full transparency, I used Python in college and used it at a job to process PDF documents for OCR. However, I don't go into it on my resume because it wasn't my main focus in that role. Categorizing a dev by what is on a sheet of paper is reductive; it is much better to look at a developer as a problem solver. Just my two cents. Thanks for reading, and I hope you all have a wonderful weekend, week, or whatever holiday is occurring when you read this.