In [1]:
# Install the Sentence-BERT library (sentence-transformers)
!pip install -U sentence-transformers

Successfully installed sacremoses-0.0.43 sentence-transformers-1.0.3 sentencepiece-0.1.95 tokenizers-0.10.1 transformers-4.4.2


In [4]:
# Import dataset
import pandas as pd
pd.set_option('display.max_colwidth', 500)

df = pd.read_csv("/content/drive/MyDrive/Full Projects/IMDb Comp Search (Complete)/imdb_semantic_search.csv")
df.head()

Unnamed: 0,Name,Story,Genres,Certificate
0,Bridgerton,"Wealth, lust, and betrayal set against the backdrop of Regency-era England, seen through the eyes of the powerful Bridgerton family.","Drama ,Romance",TV-MA
1,Cobra Kai,"Thirty years after their final confrontation at the 1984 All Valley Karate Tournament, Johnny Lawrence is at rock-bottom as an unemployed handyman haunted by his wasted life. However, when Johnny rescues a bullied kid, Miguel, from bullies, he is inspired to restart the notorious Cobra Kai dojo. However, this revitalization of his life and related misunderstandings find Johnny restarting his old rivalry with Daniel LaRousso, a successful businessman who may be happily married, but is missing...","Action ,Comedy ,Drama ,Sport",TV-14
2,The Mandalorian,"After the stories of Jango and Boba Fett, another warrior emerges in the Star Wars universe. The Mandalorian is set after the fall of the Empire and before the emergence of the First Order. We follow the travails of a lone gunfighter in the outer reaches of the galaxy far from the authority of the New Republic.","Action ,Adventure ,Sci-Fi",TV-14
3,Superstore,A look at the lives of employees at a big box store.,Comedy,TV-14
4,Game of Thrones,"In the mythical continent of Westeros, several powerful families fight for control of the Seven Kingdoms. As conflict erupts in the kingdoms of men, an ancient enemy rises once again to threaten them all. Meanwhile, the last heirs of a recently usurped dynasty plot to take back their homeland from across the Narrow Sea.","Action ,Adventure ,Drama ,Fantasy",TV-MA
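
The `Unnamed: 0` column in the preview above usually means the CSV was saved with its index as an unnamed first column. A minimal sketch of one way to avoid it, assuming that is the case here; the CSV text below is a two-row placeholder, not the real file:

```python
import io
import pandas as pd

# Placeholder CSV with an unnamed first column, mimicking a saved DataFrame index.
csv_text = ",Name,Certificate\n0,Bridgerton,TV-MA\n1,Cobra Kai,TV-14\n"

# Default read: the unnamed first column is loaded as "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the index instead of a data column.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

Either form works for the rest of the notebook, since only `Name`, `Story`, and `Certificate` are used later.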


In [5]:
# Set story list
# tv_ma = df[df['Certificate'] == 'TV-MA']
# all = df
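
The commented-out lines above hint at restricting the corpus to a single certificate. A minimal sketch of that filter on a toy DataFrame; the rows are placeholders in the shape of the preview, and `toy_df` is a hypothetical name, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for the IMDb DataFrame; the real notebook loads it from CSV.
toy_df = pd.DataFrame({
    "Name": ["Bridgerton", "Cobra Kai", "Superstore", "The Fugitive"],
    "Certificate": ["TV-MA", "TV-14", "TV-14", None],
})

# Boolean mask: rows with a missing certificate compare as False and are excluded.
tv_ma = toy_df[toy_df["Certificate"] == "TV-MA"]

# Reset the index so positional lookups into derived lists stay aligned with the subset.
tv_ma = tv_ma.reset_index(drop=True)
tvma_titles = tv_ma["Name"].tolist()
print(tvma_titles)
```

If a subset like this is used for search, the embeddings and the title, rating, and story lists all need to be built from the same filtered frame so that indices match.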

In [15]:
# Import library and utilities
from sentence_transformers import SentenceTransformer, util
import torch

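# Set the embedding model and maximum sequence length, and move the model to the GPU
embedder = SentenceTransformer('bert-base-uncased')
embedder.to('cuda')
# The attribute is max_seq_length; assigning max_seq_len has no effect on the model
embedder.max_seq_length = 512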

# tvma_stories = tv_ma['Story'].tolist()
# tvma_titles = tv_ma['Name'].tolist()

# Set feature lists for concatenation to semantic search results
titles = df['Name'].tolist()
ratings = df['Certificate'].tolist()
stories = df['Story'].tolist()

# Encode the corpus into embeddings (no training happens here)
# tvma_embeddings = embedder.encode(tvma_stories, convert_to_tensor=True)
story_embeddings = embedder.encode(stories, convert_to_tensor=True)
story_embeddings = story_embeddings.to('cuda')

In [36]:
# Define Semantic Search Function
def semantic_search(story):
  # set lists to capture results
  title_list = []
  rating_list = []
  story_list = []
  score_list = []
  # empty dataframe to display results 
  results = pd.DataFrame()



  # Find the closest 10 stories in the corpus to the query, based on cosine similarity
  top_k = min(10, len(story_embeddings))

  query_embeddings = embedder.encode(story, convert_to_tensor=True)
  query_embeddings = query_embeddings.to('cuda')

  # Use cosine similarity and torch.topk to find the top_k highest scores
  cos_scores = util.pytorch_cos_sim(query_embeddings, story_embeddings)[0]
  top_results = torch.topk(cos_scores, k=top_k)

  story = story.replace('.', '.\n')
  print("\n\n======================")
  print("\tSTORY")
  print("======================\n")
  print('',story)
  print("\n\n======================")
  print("    TOP 10 RESULTS")
  print("======================\n")
  
  # For each score and index returned by torch.topk, use the index to look up the feature lists
  # Move the score to the CPU and convert it to a 1D array
  for score, idx in zip(top_results[0], top_results[1]):
    title_list.append(titles[idx])
    rating_list.append(ratings[idx])
    story_list.append(stories[idx])
    score_list.append(score.cpu().numpy().flatten())

  # Add results to the DataFrame columns
  results['Title'] = title_list
  results['Rating'] = rating_list
  results['Story'] = story_list
  results['Score'] = score_list
  # Return the DataFrame
  return results
  
# Prompt the user for a story
story = input("""Enter Story: """)
# Pass the user input to the semantic search function
semantic_search(story)

Enter Story: One morning in an ordinary town, five people are shot dead in a seemingly random attack. All evidence points to a single suspect: an ex-military sniper who is quickly brought into custody. The man's interrogation yields one statement: Get Jack Reacher (Tom Cruise). Reacher, an enigmatic ex-Army investigator, believes the authorities have the right man but agrees to help the sniper's defense attorney (Rosamund Pike). However, the more Reacher delves into the case, the less clear-cut it appears.


	STORY

 One morning in an ordinary town, five people are shot dead in a seemingly random attack.
 All evidence points to a single suspect: an ex-military sniper who is quickly brought into custody.
 The man's interrogation yields one statement: Get Jack Reacher (Tom Cruise).
 Reacher, an enigmatic ex-Army investigator, believes the authorities have the right man but agrees to help the sniper's defense attorney (Rosamund Pike).
 However, the more Reacher delves into the case, the l

Unnamed: 0,Title,Rating,Story,Score
0,Vanished,TV-14,"The FBI examines various crimes all somehow relating to the kidnapping of Sara, U.S. senator Jeffrey Collins's second wife, from Atlanta. First FBI special agent Graham Kelton is in charge, who proved his value in a previous kidnapping case but was traumatized for life by its ending, costing the life of his own son Nathan, takes charge, but quickly finds he's not just dealing with ransom-thieves but facing a complex conspiratorial web involving the senator's crucial political role, notably i...",[0.91909117]
1,The Fugitive,,"Dr. Richard Kimble is framed for his wife's murder by a mysterious one-armed man. During sentencing Kimble escapes intending to catch the one-armed man and find out why he was framed. Following in hot pursuit is Inspector Philip Gerard, who is intending to bring in Kimble alive. But Gerard and the one-armed man are not the only thing Kimble has to worry about. The father of his late wife has hired bounty hunters who are willing to break the law to catch him, and in the age of internet tracki...",[0.9125941]
2,Instinct,TV-14,"Former CIA operative is lured back to his old life when the NYPD needs his help to stop a serial killer. Dr. Dylan Reinhart (Alan Cumming) is a gifted author and university professor living a quiet life teaching psychopathic behavior to packed classes of adoring students. But when top NYPD detective Lizzie Needham (Bojana Novakovic) appeals to him to help her catch a serial murderer who is using Dylan's first book as a tutorial, Dylan is compelled by the case, comes out of retirement and tap...",[0.9074605]
3,Lincoln Rhyme: Hunt for the Bone Collector,TV-14,"Inspired by the best-selling book, the enigmatic and notorious serial killer known only as ""The Bone Collector"" once terrified New York City...Until he seemingly disappeared. Now, three years later, when an elaborate murder points to his return, it brings former NYPD detective and forensic genius Lincoln Rhyme (Russell Hornsby) out of retirement and back into the fold. Rhyme has a personal connection to the case - a trap set by the killer left him paralyzed - but this time he's teaming up wi...",[0.9045589]
4,Barry,TV-MA,"Disillusioned at the thought of taking down another ""mark,"" depressed, low-level hit man Barry Berkman seeks a way out. When the Midwesterner reluctantly travels to Los Angeles to execute a hit on an actor who is bedding a mobster's wife, little does Barry know that the City of Angels may be his sanctuary. He follows his target into acting class and ends up instantly drawn to the community of eager hopefuls, especially dedicated student Sally, who becomes the object of his affection. While B...",[0.90067184]
5,Truth Be Told,TV-MA,"True crime podcaster Poppy Parnell is called to investigate the case of convicted killer Warren Cave, a man she incriminated after he murdered the father of two identical twins. Soon, Parnell must decide where the lines between guilty and innocent lie when Cave confesses to the fact that he was framed for the crime.",[0.8975978]
6,Deputy,TV-14,"The Los Angeles County Sheriff's Department is one of the largest police forces in the world, but when the elected Sheriff dies, an arcane rule in the county charter, forged back in the Wild West, suddenly thrusts the most unlikely man into the job. That man is Bill Hollister. A fifth-generation lawman, Bill is only interested in justice; his soul wears a white hat. The bad guys don't stand a chance, but neither do the politicos in the Hall of Justice. Under Bill's command is a county-wide c...",[0.8965649]
7,Berlin Station,TV-MA,"Follows Daniel Miller (Richard Armitage), who has just arrived at the CIA foreign station in Berlin, Germany. Miller has a clandestine mission: to uncover the source of a leak who has supplied information to a now-famous whistleblower named Thomas Shaw. Guided by veteran Hector DeJean (Rhys Ifans), Daniel learns to contend with the rough-and-tumble world of the field agent: agent-running, deception, and the dangers and moral compromises.",[0.8963322]
8,Raines,TV-14,"Los Angeles. Present day. Michael Raines, an eccentric but brilliant cop, solves murders in a very unusual way - he turns the victims into his partners. These visions are figments of Raines' imagination, and he knows it, but when he can't make the dead disappear, he works with them to find the killer. Through his discussions, along with the evidence, Raines' image of the victim changes until he has a clear picture of what really happened. Only when the case is closed do the visions end. Othe...",[0.8960431]
9,Project Blue Book,TV-14,"In this conspiratorial Sci-Fi thriller set some time after WWII and loosely based on the US government's real life Project Blue Book, Captain Michael Quinn and Dr. Allen Hynek are tasked by the US Air Force to investigate reports of UFOs and debunk them, or at least come up with rational explanations for them. While Quinn, a smooth and tough military type, doesn't care about anything other than doing the job he was given, at first, the more skeptical Hynek quickly becomes convinced that some...",[0.89547336]
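
The ranking step inside `semantic_search` comes down to cosine similarity followed by a top-k selection. A minimal sketch of that idea in NumPy, so it runs without a GPU or a model download; `cosine_top_k` and the hand-picked 2-D vectors are hypothetical illustrations, not the notebook's embeddings:

```python
import numpy as np

def cosine_top_k(query_vec, corpus_matrix, k):
    """Return (scores, indices) of the k corpus rows most similar to query_vec."""
    # Normalise to unit length so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_matrix / np.linalg.norm(corpus_matrix, axis=1, keepdims=True)
    scores = c @ q
    # Guard against asking for more results than the corpus holds.
    k = min(k, len(scores))
    # argsort is ascending; reverse it and keep the first k for the highest scores.
    top_idx = np.argsort(scores)[::-1][:k]
    return scores[top_idx], top_idx

# Hypothetical 2-D "embeddings" chosen by hand for illustration.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([2.0, 0.0])

top_scores, top_indices = cosine_top_k(query, corpus, k=2)
print(top_indices.tolist())
```

Because the vectors are normalised first, the query's magnitude does not affect the ranking; only its direction does. The returned indices play the same role as `top_results[1]` in the function above and can be used to look up titles, ratings, and stories.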
