# Notebook 5: Recommendation System


With the saved Doc2Vec model, I will now create three interactive functions that let users look for similar movie characters based on their preferences. `similar_characters()` takes a chosen character's id as input and returns the ten most similar characters, ranked by cosine similarity. Both `multi_filter()` and `quick_filter()` call `similar_characters()` internally. `multi_filter()` lets the user find a character by filtering on genre, sentiment, word count, and movie. `quick_filter()` lists the characters from every movie whose title matches the user's input, which makes the search much more streamlined.

In [1]:
import pandas as pd
import numpy as np
import pickle
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from IPython.display import Image, display, HTML
import time

Loading movie lines dataframe, movie info dataframe, Doc2Vec model, and list of genres:

In [2]:
mov_lines = pd.read_pickle("../data/mov_model.pkl")
mov_info = pd.read_pickle('../data/mov_combo_final.pkl')
model = Doc2Vec.load("../models/d2v.model")

with open('../data/genres.pkl', 'rb') as f:
    unique_genres = pickle.load(f)

In [3]:
# Check that both dataframes list the characters in the same order
# (the count below should equal the number of rows)
sum(mov_lines['character'] == mov_info['character'])

76127

In [4]:
mov_info.head(3)

Unnamed: 0,imdb_title,character,text,tokenized_text,word_count,vader,genre,imdb_url,pic_url
0,10 Things I Hate About You (1999),bartender,What can I get you? You forgot to pay!,"[What, can, I, get, you, You, forgot, to, pay]",9,"{'neg': 0.195, 'neu': 0.805, 'pos': 0.0, 'comp...","[Comedy, Drama, Romance]",https://www.imdb.com/title/tt0147800/,https://m.media-amazon.com/images/M/MV5BMmVhZj...
1,10 Things I Hate About You (1999),bianca,Did you change your hair? You might wanna thin...,"[Did, you, change, your, hair, You, might, wan...",1295,"{'neg': 0.108, 'neu': 0.726, 'pos': 0.166, 'co...","[Comedy, Drama, Romance]",https://www.imdb.com/title/tt0147800/,https://m.media-amazon.com/images/M/MV5BMmVhZj...
2,10 Things I Hate About You (1999),bianca and walter,The sound of a fifteen-year-old in labor.,"[The, sound, of, a, fifteen, year, old, in, la...",9,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...","[Comedy, Drama, Romance]",https://www.imdb.com/title/tt0147800/,https://m.media-amazon.com/images/M/MV5BMmVhZj...


The following function, `similar_characters()`, takes a chosen character's id and returns the ten most similar characters, ranked by the cosine similarity between the Doc2Vec vectors of their movie lines. It then prints the lines of whichever recommended character the user picks, provides an IMDb link for the movie that character is from, and displays a nice little photo.

In [5]:
def similar_characters(input_char_index):
    print("TOP 10 SIMILAR CHARACTERS")
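    # Obtain the top results for the character based on its index.
    # most_similar() returns (tag, similarity) pairs; unpack them directly so the
    # integer tags are not cast to floats by an intermediate NumPy array.
    similar = model.docvecs.most_similar(input_char_index, topn=10)
    similar_characters = [int(tag) for tag, _ in similar]
    similar_weights = [weight for _, weight in similar]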

    # Reduced dataframe
    df = mov_info.loc[similar_characters, :]
    df.reset_index(drop=True,inplace=True)
    df['similarity'] = similar_weights

    # similar character names:
    similar_characters = list(df['character'].map(lambda x: x.strip()))

    # displaying similar characters and their movie title, text, and scores
    display(df[['imdb_title','character', 'text', 'similarity','genre']])

    print("\nWhich recommended character's lines would you like to examine?")
    print("You can either select the character's id (0 to 9) or input the character name.\n")

    while True:
        choose_line = input("Response: ").strip()
        # Accept an exact character name first
        if choose_line in similar_characters:
            break
        # Otherwise accept a row id from 0 to 9
        if choose_line.isdigit() and int(choose_line) in range(10):
            choose_line = int(choose_line)
            break
        # Report every invalid response, including out-of-range numbers
        print('Error, response not found. Please check your spelling.')

    if type(choose_line) == int:

        character_name_to_print = df.loc[int(choose_line),'character']
        print(f'\n{character_name_to_print}')

        text_to_display = df.loc[int(choose_line),'text']
        print(f'"{text_to_display}"')

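    elif type(choose_line) == str:
        # Match the stripped name exactly rather than as a substring or regex
        text_to_display = df[df['character'].str.strip() == choose_line]['text'].values[0]
        print(f'"{text_to_display}"')

    # Look up the links within the ten recommendations, so an identical line
    # spoken by an unrelated character elsewhere cannot be picked up by mistake
    url = df[df['text'] == text_to_display]['imdb_url'].values[0]
    pic = df[df['text'] == text_to_display]['pic_url'].values[0]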

    print(f'\nFor more information about the selected movie character, please go to {url}')

    display(Image(url = pic, width = 400, height = 400))
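
To make the ranking step concrete, here is a minimal sketch of what a cosine-similarity top-N lookup does, written with plain NumPy rather than the Doc2Vec model. The helper name `top_n_cosine` and the toy vectors are hypothetical and only for illustration; the notebook itself relies on `most_similar()`.

```python
import numpy as np

def top_n_cosine(vectors, query_index, topn=10):
    """Return (index, cosine similarity) pairs for the rows closest to one row."""
    vectors = np.asarray(vectors, dtype=float)
    # Normalise each row to unit length so a dot product equals the cosine
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[query_index]
    # Exclude the query itself from the ranking
    sims[query_index] = -np.inf
    order = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in order]

# Toy "character vectors" (hypothetical values, not taken from the model)
toy_vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
print(top_n_cosine(toy_vectors, query_index=0, topn=2))
```

A score near 1 means two vectors point in almost the same direction, a score near 0 means they are unrelated, and a negative score means they point in opposite directions.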

I will arbitrarily select character 99 as a test:

In [6]:
mov_lines.loc[99,'character']

'cutler'

In [7]:
mov_lines.loc[99,'text']

"I'm not your lawyer until I see the money. Emil.  Take it easy.  Stay with me.  Sit down.  What do you need?  What are you looking for? This man is unarmed, officer.  He's surrendered. What are you hitting him for? Don't say anything. I'm coming with you. I'm invoking rights - this man is represented by counsel.  I'm coming with him. Don't say a word.  Don't respond to his taunting!  He's represented by counsel.  You want to speak to someone - you speak to me! Don't you put your hands on me, Detective. My client, Mr. Slovak, is a victim. What's happened is not his fault.  Emil was under the influence of his partner. At the trial, you'll see that my client will be vindicated... I brought you some letters.  It's really fan mail.  Women mostly.  One wants to buy you clothes, another sent a check. Another wants a check. Oh, sure. How're they treating you, alright  I want to get the cuffs off... but there's a little bit of a problem. Things out there are very negative right now for us.  We

Cutler appears to be a lawyer who is quite rough around the edges. Let's examine which other characters are recommended based on his text.

In [8]:
# example:
# hitman
similar_characters(99)

TOP 10 SIMILAR CHARACTERS


Unnamed: 0,imdb_title,character,text,similarity,genre
0,The Grifters (1990),maid,"What if it's wet? I'm glad you're better. Yes,...",0.359508,"[Crime, Drama, Thriller]"
1,Changeling (2008),nd kid,Walter was as tall as me.... I guess...,0.346835,"[Biography, Crime, Drama]"
2,Election (1999),student,"Come on, Tracy.",0.321051,"[Comedy, Drama, Romance]"
3,Chill Factor (1999),brynner,"Hemmings! Sam, I thought I told you to close ...",0.31898,"[Action, Adventure, Comedy]"
4,Man Trouble (1992),ext andy s house patio night,Socorro sets the tray down and departs for the...,0.310872,"[Comedy, Romance]"
5,Robin Hood: Prince of Thieves (1991),int nottingham cathedral day,Jesus hangs from the cross. A magnificent stai...,0.301107,"[Action, Adventure, Drama]"
6,The Last Boy Scout (1991),hitman,"Shit, he's packing. What should we do? There's...",0.301012,"[Action, Comedy, Crime]"
7,The Mechanic (2011),turkish envoy s voice,"No, we're literally locked in here. I'm suppos...",0.298745,"[Action, Crime, Thriller]"
8,Demolition Man (1993),old woman,"Buenos dias, senor.... Esta carne de rodentia....",0.298143,"[Action, Crime, Sci-Fi]"
9,Peggy Sue Got Married (1986),al,"You know, this is very exciting for all of us.",0.29797,"[Comedy, Drama, Fantasy]"



Which recommended character's lines would you like to examine?
You can either select the character's id (0 to 9) or input the character name.

Response: hitman
"Shit, he's packing. What should we do? There's no contract for him. Start walking. I'm right behind you."

For more information about the selected movie character, please go to https://www.imdb.com/title/tt0102266/


Since Doc2Vec is trained as an unsupervised model, it is hard to examine quantitatively whether the model is working properly. Judging by this recommendation, though, I wouldn't expect there to be much similarity between the lawyer Cutler and the hitman from 'The Last Boy Scout'. The only similarities that I can see are their terse sentences laced with curse words.
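
One rough sanity check that is sometimes used with Doc2Vec is to re-infer a vector for each training document and see whether the closest stored vector is that document's own. The sketch below shows the idea on plain arrays. The function name `self_match_rate` and the toy numbers are hypothetical; in this notebook the two arrays would have to come from the model's stored document vectors and from calling `model.infer_vector()` on each character's tokens, which is not done here.

```python
import numpy as np

def self_match_rate(trained, inferred):
    """Fraction of documents whose re-inferred vector is closest to its own trained vector."""
    trained = np.asarray(trained, dtype=float)
    inferred = np.asarray(inferred, dtype=float)
    t = trained / np.linalg.norm(trained, axis=1, keepdims=True)
    q = inferred / np.linalg.norm(inferred, axis=1, keepdims=True)
    # sims[i, j] is the cosine similarity between inferred i and trained j
    sims = q @ t.T
    best = sims.argmax(axis=1)
    return float(np.mean(best == np.arange(len(trained))))

# Hypothetical stand-ins for three documents
trained = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
inferred = [[0.9, 0.2], [0.1, 0.8], [0.2, 1.0]]
print(self_match_rate(trained, inferred))
```

A rate close to 1 would suggest the model behaves consistently, though it still says nothing about whether the recommendations feel right to a human reader.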

The following function, `multi_filter()`, is used in conjunction with `similar_characters()`. It lets the user filter the `mov_lines` dataframe step by step to select a character. The filters are applied in this order: the genre, whether the character's lines should have positive or negative sentiment, the maximum number of words in the character's lines, and the movie the character is from. The selected character is then passed to `similar_characters()` as its input.

In [9]:
def multi_filter():
    print("Welcome to Dansthemanwhosakid's movie character recommendation extravaganza!!!\n")
    print(f"With over {len(mov_info)} characters, you can select a character of your choice.\n")
    print("You can then see which characters are most similar to your selected character.\n")
    print("Which genre would you like to choose?")
    print(f'{unique_genres}\n')

    time.sleep(2)

    while True:
        choose_genre = input("Response: ").strip()
        if choose_genre in unique_genres:
            break
        # A membership test never raises, so report the error explicitly
        print('Error: response not found. Please check your spelling.')

    genre_mask = (mov_lines[choose_genre] == 1)

    print("\nDo you want your character to have positive or negative sentiment in their lines?\n")

    while True:
        choose_sentiment = input("Response: ").strip()
        if choose_sentiment in ['positive','negative']:
            break
        # A membership test never raises, so report the error explicitly
        print('Error: response not found. Please choose either positive or negative.')

    if choose_sentiment == 'positive':
        sentiment_mask = (mov_lines['vader_cmpd'] >= 0)
    else:
        sentiment_mask = (mov_lines['vader_cmpd'] < 0)

    print("\nHow many words should your character have up to in their movie lines?")

    while True:
        try:
            choose_words = int(input("\nResponse: "))
        except ValueError:
            # Non-numeric input: fall through to the error message below
            choose_words = 0
        if choose_words > 0:
            break
        print('Error! Please input a number greater than 0.')

    word_count_mask = (mov_lines['word_count'] < choose_words)

    print('\nHere are the following movies that you can choose from:')

    filtered_df = mov_lines[genre_mask & sentiment_mask & word_count_mask][['imdb_title','character','text']]
    movies = list(filtered_df['imdb_title'].unique())

    print(movies)
    print("\nPlease copy and paste a movie into the response:")

    while True:
        choose_mov = input("\nResponse: ")
        if choose_mov in movies:
            break
        # A membership test never raises, so report the error explicitly
        print('Error! Try again')

    mov_mask = (mov_lines['imdb_title'] == choose_mov)

    filtered_df = mov_lines[genre_mask & sentiment_mask & 
                            word_count_mask & mov_mask][['imdb_title','character','text']]

    display(filtered_df)

    character_list = list(filtered_df['character'])

    print("\nWhich character's lines would you like to examine?")
    print("You can either select the character's id or input the character name.\n")

    while True:
        choose_character = input("Response: ")
        # Accept an exact character name first
        if choose_character in character_list:
            break
        # Otherwise accept a row id from the filtered dataframe
        try:
            if int(choose_character) in filtered_df.index:
                choose_character = int(choose_character)
                break
        except ValueError:
            pass
        print('Error, response not found. Please check your spelling.')

    if type(choose_character) == int:
        character_id = int(choose_character)
    else:
        character_id = filtered_df[filtered_df['character'] == choose_character].index[0]

    character_name = mov_info.loc[character_id,'character']
    character_text = mov_info.loc[character_id,'text']

    print(f'\n{character_name}')
    print(f'\n{character_text}')

    # character_id already identifies the row, so read the links from it directly
    url = mov_info.loc[character_id, 'imdb_url']
    pic = mov_info.loc[character_id, 'pic_url']

    print(f'\nFor more information about the selected movie character, please go to {url}')

    display(Image(url = pic, width = 400, height = 400))
    
    similar_characters(character_id)
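
The filtering inside `multi_filter()` boils down to combining boolean masks with `&`. Here is a minimal, non-interactive sketch of that step on a toy dataframe. The column names mirror the ones used above (`Horror` as a one-hot genre column, `vader_cmpd`, `word_count`), but the rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for mov_lines (hypothetical rows)
toy = pd.DataFrame({
    'imdb_title': ['Movie A', 'Movie A', 'Movie B', 'Movie C'],
    'character': ['ann', 'bob', 'cat', 'dan'],
    'Horror': [1, 1, 0, 1],
    'vader_cmpd': [-0.6, 0.4, -0.2, -0.1],
    'word_count': [40, 30, 20, 500],
})

genre_mask = toy['Horror'] == 1
sentiment_mask = toy['vader_cmpd'] < 0
word_count_mask = toy['word_count'] < 60

# & keeps only the rows where every condition holds
filtered = toy[genre_mask & sentiment_mask & word_count_mask]
print(filtered[['imdb_title', 'character']])
```

Because each mask is built from the full dataframe, the masks can be combined in any order, and the surviving rows keep their original index, which is what lets the selected row id be passed straight to `similar_characters()`.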

In [10]:
### Examples picked:
### Horror, negative, 60, The Shining (1980), eileen
multi_filter()

Welcome to Dansthemanwhosakid's movie character recommendation extravaganza!!!

With over 76127 characters, you can select a character of your choice.

You can then see which characters are most similar to your selected character.

Which genre would you like to choose?
['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History', 'Horror', 'Music', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller', 'War', 'Western']

Response: Horror

Do you want your character to have positive or negative sentiment in their lines?

Response: negative

How many words should your character have up to in their movie lines?

Response: 60

Here are the following movies that you can choose from:
['A Bucket of Blood (1959)', 'A Nightmare on Elm Street (1984)', 'A Nightmare on Elm Street 3: Dream Warriors (1987)', 'A Nightmare on Elm Street 4: The Dream Master (1988)', "A Nightmare on Elm Street 2: Freddy's Reven

Unnamed: 0,imdb_title,character,text
67393,The Shining (1980),,"Danny, did you get tired of bombing the univer..."



Which character's lines would you like to examine?
You can either select the character's id or input the character name.

Response: 



Danny, did you get tired of bombing the universe? Jack, Jack, there's someone else in the hotel with us. There's a crazy woman in one of the rooms. She tried to strangle Danny.

For more information about the selected movie character, please go to https://www.imdb.com/title/tt0081505/


TOP 10 SIMILAR CHARACTERS


Unnamed: 0,imdb_title,character,text,similarity,genre
0,Lost Horizon (1937),close shot a squirrel,"A squirrel, near to Sondra, chatters excitedly.",0.344162,"[Adventure, Drama, Fantasy]"
1,Punch-Drunk Love (2002),cu healthy choice coupon,Barry's scissors cut out a coupon and reveal a...,0.333828,"[Comedy, Drama, Romance]"
2,A Perfect World (1993),eileen,He leaves at four. Not much traffic after lunc...,0.313151,"[Crime, Drama, Thriller]"
3,Platinum Blonde (1931),close shot stew,"In bed, asleep, all curled up, his head on his...",0.311018,"[Comedy, Romance]"
4,Galaxy Quest (1999),b,Tommy maneuvers the field with concentration a...,0.304611,"[Adventure, Comedy, Sci-Fi]"
5,Cellular (2004),is anyone up there,"Jessica calms herself, resolving to save her c...",0.294745,"[Action, Crime, Thriller]"
6,Midnight Cowboy (1969),sad woman,"No, I never had, well, whatever it is you call...",0.292662,[Drama]
7,Alien³ (1992),white haired man,Take it.,0.291957,"[Action, Horror, Sci-Fi]"
8,The Graduate (1967),elaine,Hello. You're living at home now. Is that righ...,0.289186,"[Comedy, Drama, Romance]"
9,The Hospital (1971),resident anesthesiologist,"There's no pulse, Doctor. There's no blood pre...",0.288564,"[Comedy, Drama, Mystery]"



Which recommended character's lines would you like to examine?
You can either select the character's id (0 to 9) or input the character name.

Response: eileen
"He leaves at four. Not much traffic after lunch. If you need me, I'll be right over here. Very polite. So is she dead or not? I'm so sorry. Nosy little feller'. I keep a cot in the back."

For more information about the selected movie character, please go to https://www.imdb.com/title/tt0107808/


This could be by chance, but the match between The Shining's character and A Perfect World's Eileen is better. Both characters' lines are quite ominous and deal with violence or death. The Shining's character mentions 'bombing' and a woman who tried to 'strangle' Danny, whereas Eileen asks, 'So is she dead or not?'

The last function searches directly for characters based on which movie they're in, which makes for a quicker filtering process.

In [11]:
def quick_filter():
    print("Welcome to Dansthemanwhosakid's movie character recommendation extravaganza!!!\n")
    print(f"With over {len(mov_info)} characters, you can select a character of your choice.\n")
    print("You can then see which characters are most similar to your selected character.\n")
    print("Which movie would you like to choose?\n")
    
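    # Note: this is a reference, not a copy, so lowercasing the titles below also
    # changes mov_info itself (later outputs therefore show lowercase titles)
    filtered_df = mov_info
    filtered_df['imdb_title'] = filtered_df['imdb_title'].map(lambda x: x.lower())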
    
    time.sleep(2)
    
    while True:
        # Lowercase the query so it can match the lowercased titles
        choose_mov = str(input("Response: ")).lower()
        # regex=False treats the query as a literal substring, so characters such as '(' are safe
        mov_mask = filtered_df['imdb_title'].str.contains(choose_mov, regex=False)
        if mov_mask.any():
            break
        print('Error: response not found. Please check your spelling.')
                        
    filtered_df = filtered_df[mov_mask][['imdb_title','character','text']]
    
    display(filtered_df)
    
    character_list = list(filtered_df['character'])

    print("\nWhich character's lines would you like to examine?")
    print("You can either select the character's id or input the character name.\n")

    while True:
        choose_character = input("Response: ")
        # Accept an exact character name first
        if choose_character in character_list:
            break
        # Otherwise accept a row id from the filtered dataframe
        try:
            if int(choose_character) in filtered_df.index:
                choose_character = int(choose_character)
                break
        except ValueError:
            pass
        print('Error, response not found. Please check your spelling.')

    if type(choose_character) == int:
        character_id = int(choose_character)
    elif type(choose_character) == str:
        character_id = filtered_df[filtered_df['character'] == choose_character].index[0]

    character_name = mov_info.loc[character_id,'character']
    character_text = mov_info.loc[character_id,'text']

    print(f'\n{character_name}')
    print(f'\n{character_text}')

    # character_id already identifies the row, so read the links from it directly
    url = mov_info.loc[character_id, 'imdb_url']
    pic = mov_info.loc[character_id, 'pic_url']

    print(f'\nFor more information about the selected movie character, please go to {url}')

    display(Image(url = pic, width = 400, height = 400))
    
    similar_characters(character_id)
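
Both `multi_filter()` and `quick_filter()` repeat the same prompt-and-validate loop several times. As a sketch of how that pattern could be factored out, the hypothetical helper `ask()` below keeps prompting until a validation function accepts the response. The `read` parameter is an assumption added here so the example can run with scripted answers instead of a live prompt.

```python
def ask(prompt, is_valid, error_message, read=input):
    """Keep prompting until is_valid(response) is true, then return the response."""
    while True:
        response = read(prompt)
        if is_valid(response):
            return response
        print(error_message)

# Scripted answers stand in for a user who first misspells the genre
answers = iter(['Horor', 'Horror'])
genre = ask(
    'Response: ',
    lambda r: r in ['Horror', 'Comedy'],
    'Error: response not found. Please check your spelling.',
    read=lambda _prompt: next(answers),
)
print(genre)
```

With a helper like this, each question in the two functions would reduce to one call with its own validation rule, and the error message would always be shown on an invalid response.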

In [12]:
### Examples picked:
### star wars, 53333, gordon
quick_filter()

Welcome to Dansthemanwhosakid's movie character recommendation extravaganza!!!

With over 76127 characters, you can select a character of your choice.

You can then see which characters are most similar to your selected character.

Which movie would you like to choose?

Response: star wars


Unnamed: 0,imdb_title,character,text
52820,star wars: episode iv - a new hope (1977),astro officer,"We count thirty Rebel ships, Lord Vader. But t..."
52821,star wars: episode iv - a new hope (1977),aunt beru,Luke? Luke! Come to dinner! Where are you goin...
52822,star wars: episode iv - a new hope (1977),bartender,We don't serve their kind here! Your droids. T...
52823,star wars: episode iv - a new hope (1977),base voice,"His computer's off. Luke, you switched off you..."
52824,star wars: episode iv - a new hope (1977),ben,Hello there! Come here my little friend. Don't...
52825,star wars: episode iv - a new hope (1977),ben s voice,"Run, Luke! Run! Luke, the Force will be with y..."
52826,star wars: episode iv - a new hope (1977),beru,"Luke, tell Owen that if he gets a translator t..."
52827,star wars: episode iv - a new hope (1977),biggs,"Just now. I wanted to surprise you, hot shot. ..."
52828,star wars: episode iv - a new hope (1977),camie,It was just wormie on another rampage. Don't w...
52829,star wars: episode iv - a new hope (1977),captain,Hold your fire. There are no life forms. It mu...



Which character's lines would you like to examine?
You can either select the character's id or input the character name.

Response: 53333

yoda

The very Republic is threatened, if involved the Sith are. Hard to see, the dark side is. Discover who this assassin is, we must. With this Naboo queen you must stay, Qui-Gon. Protect her. May the Force be with you. Master Qui-Gon more to say have you? A vergence, you say? But you do! Rrevealed your opinion is. Trained as a Jedi, you request for him? Tested he will be. Good, good, young one. How feel you? Afraid are you? See through you, we can. Afraid to lose her..I think. Eveything. Fear is the path to the dark side... fear leads to anger... anger leads to hate.. hate leads to suffering. A Jedi must have the deepest commitment, the most serious mind. I sense much fear in you. Then continue, we will. ...Correct you were, Qui-Gon. Clouded, this boy's future is. Masked by his youth. An apprentice, you have, Qui-Gon. Impossible, to take on a se

TOP 10 SIMILAR CHARACTERS


Unnamed: 0,imdb_title,character,text,similarity,genre
0,batman & robin (1997),gordon,"Miss Ivy, you've just met one of the most sini...",0.328585,"[Action, Sci-Fi]"
1,american shaolin (1991),establishing shot of door to the sleeping quar...,"Suddenly, it BURSTS open and in come the disci...",0.317977,[Action]
2,the invention of lying (2009),woman,I've just thought the chocolate sauce is diarr...,0.312965,"[Comedy, Fantasy, Romance]"
3,get shorty (1995),doris,"Harry Zimm. You look like a wet kiss. Well, ar...",0.306574,"[Comedy, Crime, Thriller]"
4,bean (1997),nurse,"Doctor ... Bean? Just in time, sir. Allow me. ...",0.301113,"[Adventure, Comedy, Family]"
5,the matrix reloaded (2003),cu trinity,"She slaps a magazine into her pistol, pulls th...",0.300249,"[Action, Sci-Fi]"
6,star trek ii: the wrath of khan (1982),intercom voice,Docking procedure complete. This is Starfleet ...,0.298779,"[Action, Adventure, Sci-Fi]"
7,hellraiser: hellseeker (video 2002),lange,Thanks for coming down Mr. Gooding. Has your h...,0.297167,"[Horror, Mystery, Thriller]"
8,neuromancer,wintermute,"We had to die, Case... Both of us... To break ...",0.295322,[Sci-Fi]
9,stir of echoes (1999),maggie,"Who you talkin' to, Jake? He turns to her and ...",0.295039,"[Horror, Mystery, Thriller]"



Which recommended character's lines would you like to examine?
You can either select the character's id (0 to 9) or input the character name.

Response: gordon
"Miss Ivy, you've just met one of the most sinister men in Gotham. There's no sign he came back here after the escape.  We pulled this off the surveillance cameras at Arkham. What happened?  How'd they get away? It's on top of police headquarters. Why, I'm Commissioner of Police.  I have the keys right here."

For more information about the selected movie character, please go to https://www.imdb.com/title/tt0118688/


In this `quick_filter()` example, there appears to be some overlap between Yoda from Star Wars: Episode I and Commissioner Gordon from Batman & Robin. Both characters seem to be in some kind of distress: Yoda exclaims 'Fear is the path to the dark side... fear leads to anger... anger leads to hate.. hate leads to suffering', and Gordon talks about 'one of the most sinister men in Gotham'.

## Conclusion:

As you can see, natural language processing combined with neural networks can be pretty fun! The objective of the project is to let a user find movies and characters based on the quotes and dialogue of a character they already know. Perhaps a particular movie character left a lasting impression, and the user wants to see a similar character in a different setting. There are a few issues, however. Though the data was acquired from two corpora, a few of the character lines are actually action and scene cues from a script. These non-dialogue sentences may drastically affect the quality of the model's recommendations. Furthermore, a different statistical model, such as Latent Dirichlet Allocation (LDA), could be used to explain why some documents are more similar to one another by allowing observations to be explained by unobserved groups. Perhaps a different recommendation system can be built with higher cosine similarities than the roughly 0.3 seen in the examples above.
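
As a starting point for the scene-cue problem, here is a minimal sketch of a heuristic that flags "character" names that look like script directions. The prefix list is an assumption drawn only from names that show up in the outputs above (for example `ext andy s house patio night` and `close shot a squirrel`), and the function name `looks_like_scene_cue` is hypothetical. It is not an exhaustive rule.

```python
import re

# Prefixes seen among the recommended "characters" above that are really
# scene headings or camera directions (an assumed, incomplete list)
CUE_PATTERN = re.compile(r'^(int|ext|cu|close shot|establishing shot)\b')

def looks_like_scene_cue(character_name):
    """Heuristically flag names that look like script directions rather than people."""
    return bool(CUE_PATTERN.match(character_name.strip().lower()))

names = [
    'ext andy s house patio night',
    'int nottingham cathedral day',
    'close shot a squirrel',
    'cu healthy choice coupon',
    'hitman',
    'gordon',
]
kept = [n for n in names if not looks_like_scene_cue(n)]
print(kept)
```

The word boundary in the pattern keeps real names such as `cutler` or `intercom voice` from being flagged. A filter like this would need to run before the Doc2Vec model is trained for it to affect the recommendations.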