<a href="https://colab.research.google.com/github/AyushiKashyapp/movie_question_answering_neo4j/blob/main/QandAusingNeo4j.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Movie Question Answering System using Neo4j Knowledge Database


*Reference*: [Tutorial](https://www.youtube.com/watch?v=dRO43qnkNqg&list=PLreVlKwe2Z0Sf3lEsAJ0E7kv60vdhnUR7&index=3&ab_channel=BhaveshBhatt)

**Installing required libraries**

- neo4j-driver: to connect to the Neo4j Movies sandbox database.
- gradio: to build a web demo for the question-answering chatbot.

In [1]:
!pip install -q neo4j-driver
!pip install -q gradio

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m208.3/208.3 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Building wheel for neo4j-driver (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.3/12.3 MB[0m [31m32.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.0/92.0 kB[0m [31m10.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m318.1/318.1 kB[0m [31m24.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m9.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━

In [2]:
from neo4j import GraphDatabase
import spacy
import gradio as gr
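Every driver cell in this notebook hard-codes the sandbox address and password. A minimal sketch of a helper that prefers environment variables and falls back to those sandbox values is shown here; `neo4j_settings` and the variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD` are assumptions for illustration, not something the Neo4j driver reads on its own.

```python
import os

# Sandbox values copied from the driver cells in this notebook; used only as fallbacks.
DEFAULT_URI = "bolt://3.84.44.98:7687"
DEFAULT_USER = "neo4j"
DEFAULT_PASSWORD = "death-assistance-affair"


def neo4j_settings(env=None):
    """Return (uri, user, password), preferring the (assumed) NEO4J_* variables."""
    env = os.environ if env is None else env
    return (
        env.get("NEO4J_URI", DEFAULT_URI),
        env.get("NEO4J_USER", DEFAULT_USER),
        env.get("NEO4J_PASSWORD", DEFAULT_PASSWORD),
    )


# Usage in a driver cell (requires the neo4j package):
# uri, user, password = neo4j_settings()
# driver = GraphDatabase.driver(uri, auth=basic_auth(user, password))
```

Sandbox credentials expire, so reading them from the environment lets the same notebook point at a new sandbox without editing every cell.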

# Analysing the Movies Database with Cypher Queries

**Get a list of all movies in the database with their release years**

In [14]:
from neo4j import GraphDatabase, basic_auth

# Create a driver that connects to the Neo4j sandbox over the Bolt protocol (bolt://) at the given IP address and port.
driver = GraphDatabase.driver(
  "bolt://3.84.44.98:7687",
  auth=basic_auth("neo4j", "death-assistance-affair"))

# Match every Movie node and return its title and release year, ordered by release year.
cypher_query = '''
MATCH (movie:Movie)
 RETURN movie.title as title, movie.released as release_date
 ORDER BY movie.released
'''
# Open a session on the "neo4j" database, run the query in a read transaction (execute_read replaces the deprecated read_transaction), and convert the result into a list of dictionaries with data().
with driver.session(database="neo4j") as session:
  result = session.execute_read(lambda tx: tx.run(cypher_query).data())
  for movie in result:
    print(f"The movie, {movie['title']} was released in {movie['release_date']}")

driver.close()



The movie, One Flew Over the Cuckoo's Nest was released in 1975
The movie, Top Gun was released in 1986
The movie, Stand By Me was released in 1986
The movie, Joe Versus the Volcano was released in 1990
The movie, A Few Good Men was released in 1992
The movie, Unforgiven was released in 1992
The movie, Hoffa was released in 1992
The movie, A League of Their Own was released in 1992
The movie, Sleepless in Seattle was released in 1993
The movie, Johnny Mnemonic was released in 1995
The movie, Apollo 13 was released in 1995
The movie, That Thing You Do was released in 1996
The movie, The Birdcage was released in 1996
The movie, Twister was released in 1996
The movie, The Devil's Advocate was released in 1997
The movie, As Good as It Gets was released in 1997
The movie, What Dreams May Come was released in 1998
The movie, You've Got Mail was released in 1998
The movie, When Harry Met Sally was released in 1998
The movie, The Matrix was released in 1999
The movie, Snow Falling on Cedars wa
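Because `data()` returns plain dictionaries keyed by the aliases in the `RETURN` clause (`title`, `release_date`), the result can be post-processed with ordinary Python. A minimal sketch, using three rows copied from the output above as a stand-in for a live query, groups the titles by release year; `rows` and `titles_by_year` are illustrative names.

```python
from collections import defaultdict

# Illustrative stand-in for tx.run(cypher_query).data(): one dict per record,
# keyed by the RETURN aliases. Titles and years are copied from the output above.
rows = [
    {"title": "Top Gun", "release_date": 1986},
    {"title": "Stand By Me", "release_date": 1986},
    {"title": "The Matrix", "release_date": 1999},
]


def titles_by_year(records):
    """Group movie titles under their release year, keeping the query's order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["release_date"]].append(record["title"])
    return dict(grouped)


for year, titles in titles_by_year(rows).items():
    print(year, ", ".join(titles))
```

Since the query already sorts by `movie.released`, the groups come out in chronological order.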

**A list of all the actors in "The Matrix".**

In [16]:
from neo4j import GraphDatabase, basic_auth

# Create a driver that connects to the Neo4j sandbox over the Bolt protocol (bolt://) at the given IP address and port.
driver = GraphDatabase.driver(
  "bolt://3.84.44.98:7687",
  auth=basic_auth("neo4j", "death-assistance-affair"))

# Match every Person with an ACTED_IN relationship to the movie titled "The Matrix" and return each actor's name.
cypher_query = '''
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie {title: "The Matrix"})
RETURN actor.name as actor_name
'''
# Open a session on the "neo4j" database, run the query in a read transaction, and convert the result into a list of dictionaries with data().
with driver.session(database="neo4j") as session:
  result = session.execute_read(lambda tx: tx.run(cypher_query).data())
  for actor in result:
    print(actor)

driver.close()



{'actor_name': 'Emil Eifrem'}
{'actor_name': 'Hugo Weaving'}
{'actor_name': 'Laurence Fishburne'}
{'actor_name': 'Carrie-Anne Moss'}
{'actor_name': 'Keanu Reeves'}


**Other movies that share actors with "The Matrix"**

In [5]:
from neo4j import GraphDatabase, basic_auth

# Create a driver that connects to the Neo4j sandbox over the Bolt protocol (bolt://) at the given IP address and port.
driver = GraphDatabase.driver(
  "bolt://3.84.44.98:7687",
  auth=basic_auth("neo4j", "death-assistance-affair"))

# Match the movie whose title is passed as the parameter $favorite, find the actors who acted in it, and return up to 20 distinct other movies those actors also acted in.
cypher_query = '''
MATCH (movie:Movie {title:$favorite})<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(rec:Movie)
 RETURN distinct rec.title as title LIMIT 20
'''
# Open a session on the "neo4j" database, run the query in a read transaction with favorite="The Matrix", and convert the result into a list of dictionaries with data().
with driver.session(database="neo4j") as session:
  results = session.execute_read(
    lambda tx: tx.run(cypher_query,
                      favorite="The Matrix").data())
  for record in results:
    print(record['title'])

driver.close()




Cloud Atlas
V for Vendetta
The Matrix Revolutions
The Matrix Reloaded
Something's Gotta Give
The Replacements
Johnny Mnemonic
The Devil's Advocate
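The two-hop pattern in the query above can be mimicked in plain Python to show what `DISTINCT` and `LIMIT` contribute. This is a sketch over a tiny hand-picked subset of `ACTED_IN` edges, not the full Movies graph; `ACTED_IN` and `recommend` are illustrative names. Cypher does not guarantee row order without `ORDER BY`, so `sorted` is used here only to make the output deterministic.

```python
# Toy ACTED_IN edges (actor -> movies): a small illustrative subset of the Movies graph.
ACTED_IN = {
    "Keanu Reeves": {"The Matrix", "Johnny Mnemonic", "The Devil's Advocate"},
    "Hugo Weaving": {"The Matrix", "V for Vendetta", "Cloud Atlas"},
    "Tom Hanks": {"Cloud Atlas", "Apollo 13"},
}


def recommend(favorite, acted_in, limit=20):
    """Mimic (movie {title: $favorite})<-[:ACTED_IN]-(actor)-[:ACTED_IN]->(rec)
    RETURN DISTINCT rec.title LIMIT $limit."""
    seen = []
    for actor, movies in acted_in.items():
        if favorite not in movies:
            continue  # this actor has no ACTED_IN edge to the favorite movie
        for rec in sorted(movies):
            # Cypher never reuses one relationship within a MATCH pattern, and each
            # actor has a single ACTED_IN edge to a given movie, so rec != favorite.
            if rec != favorite and rec not in seen:
                seen.append(rec)  # DISTINCT: keep each title once
                if len(seen) == limit:
                    return seen  # LIMIT: stop once enough rows are collected
    return seen


print(recommend("The Matrix", ACTED_IN))
```

Tom Hanks contributes nothing for "The Matrix" because he has no edge to it, which mirrors how the first hop of the Cypher pattern filters actors.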


# ChatBot

A chatbot that answers users' questions about the release dates of the movies they mention.

**Loading the spaCy model "en_core_web_sm"**

In [4]:
nlp = spacy.load("en_core_web_sm")

**A function to extract entities out of the questions asked about the movies.**

In [7]:
def extract_entity(question, entity_types):
  doc = nlp(question) # Run the spaCy pipeline on the question; recognized entities are stored in doc.ents.
  for ent in doc.ents: # Iterate over the recognized entities in order of appearance.
    if ent.label_ in entity_types: # Check whether the entity's label is one of the requested types (e.g. "PERSON", "WORK_OF_ART").
      return ent.text # Return the substring of the question that was recognized as the entity.
  return None # No entity of the requested types was found.
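`en_core_web_sm` uses the OntoNotes entity labels, which include `WORK_OF_ART` but no `MOVIE` label, and whether a title is tagged at all depends on how the question is phrased. A minimal sketch of the same first-match selection, with a regex fallback for titles written in double quotes, is shown here. The fallback is an assumption added for illustration (the notebook relies on spaCy alone); `pick_title` and `QUOTED` are hypothetical names, and `ents` stands in for `[(ent.text, ent.label_) for ent in nlp(question).ents]` so the sketch runs without spaCy.

```python
import re

# Hypothetical fallback: text between straight or curly double quotes.
QUOTED = re.compile(r'["“]([^"”]+)["”]')


def pick_title(question, ents, labels=("WORK_OF_ART",)):
    """Return the first entity whose label is in `labels`, else a quoted span, else None.

    `ents` is a list of (text, label) pairs standing in for spaCy's doc.ents.
    """
    for text, label in ents:
        if label in labels:
            return text  # same first-match rule as extract_entity
    match = QUOTED.search(question)
    return match.group(1) if match else None


print(pick_title('When was "Top Gun" released?', []))
```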

**A function to get the answers for the questions and keep the history of questions and their answers.**

In [8]:
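def chatbot(question, history=None):
  history = history or [] # The Gradio state may arrive as None on the first call; also avoids a shared mutable default list.
  output = get_answer(question) # Call get_answer on the user's question.
  history.append((question, output)) # Append the (question, answer) pair to the conversation history.
  return history, history # One copy is rendered by the chatbot output, the other is kept as state.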
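A default such as `history=[]` is evaluated once, when the function is defined, so every call that omits `history` would append to the same list. A minimal sketch of the turn logic that avoids this and runs without Neo4j is shown here; `chat_turn` and `echo_answer` are hypothetical names, and `echo_answer` only stands in for `get_answer`.

```python
def echo_answer(question):
    """Hypothetical stand-in for get_answer so the sketch runs without Neo4j."""
    return f"echo: {question}"


def chat_turn(question, history=None, answer_fn=echo_answer):
    """Append one (question, answer) turn and return the history twice:
    once for the chat display and once to be stored as state."""
    history = list(history or [])  # fresh list per call; never a shared default
    history.append((question, answer_fn(question)))
    return history, history


# Simulate two turns: the returned state is passed back in on the next call,
# which is what Gradio does with a "state" input/output pair.
display, state = chat_turn("first")
display, state = chat_turn("second", state)
print(display)
```

Copying with `list(...)` means each call returns a new list, so a previously stored state is never modified in place.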

**A function to match the movie in the question against the movies in the Neo4j database and return its release date.**

In [9]:
from neo4j import GraphDatabase, basic_auth

# The query cells above call driver.close(), so open a fresh driver for the chatbot to use.
driver = GraphDatabase.driver(
  "bolt://3.84.44.98:7687",
  auth=basic_auth("neo4j", "death-assistance-affair"))

def get_answer(question):
    # Extract the movie title from entities labelled WORK_OF_ART (en_core_web_sm has no MOVIE label, so that entry never matches).
    movie_title = extract_entity(question, ["WORK_OF_ART", "MOVIE"])
    if movie_title is None:
        return "Sorry, I don't understand what movie you're asking about." # Default response if no movie title is found in the question.

    if "release" not in question.lower(): # Only release-date questions are supported ("released" also contains "release").
        return "Sorry, I don't understand what you're asking."

    with driver.session(database="neo4j") as session: # Open a session with Neo4j.
        # Pass the title as a query parameter instead of formatting it into the string, so titles containing quotes (e.g. "You've Got Mail") still work.
        record = session.run(
            "MATCH (m:Movie) WHERE m.title = $title RETURN m.released AS released",
            title=movie_title).single()
    if record is None: # No movie with this title was found.
        return f"No information found for the movie '{movie_title}'"
    return f"The release date of '{movie_title}' is: {record['released']}"
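The routing and formatting in `get_answer` can be checked without a live database by passing the query function in. A sketch under that assumption: `answer_release`, `run_query`, `fake_run` and `YEARS` are illustrative names, not part of the notebook or the Neo4j driver, and the years are taken from the movie listing earlier in the notebook.

```python
RELEASE_QUERY = "MATCH (m:Movie {title: $title}) RETURN m.released AS released"


def answer_release(question, movie_title, run_query):
    """Route a question and format the reply.

    `run_query(query, **params)` is a hypothetical callable standing in for a
    Neo4j read; it returns the `released` value, or None if no movie matched.
    """
    if movie_title is None:
        return "Sorry, I don't understand what movie you're asking about."
    if "release" not in question.lower():
        return "Sorry, I don't understand what you're asking."
    # The title travels as a parameter, so quotes inside titles such as
    # "You've Got Mail" cannot break or alter the Cypher text.
    released = run_query(RELEASE_QUERY, title=movie_title)
    if released is None:
        return f"No information found for the movie '{movie_title}'"
    return f"The release date of '{movie_title}' is: {released}"


# In-memory stand-in for the database (years from the earlier listing).
YEARS = {"You've Got Mail": 1998, "Top Gun": 1986}


def fake_run(query, title):
    return YEARS.get(title)


print(answer_release("When was You've Got Mail released?", "You've Got Mail", fake_run))
```

With a live driver, `run_query` could wrap `session.run(query, **params).single()` and return the record's `released` value, or `None` when no record comes back.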

**Using gradio interface to launch a web interface for question inputs and answer outputs.**

In [10]:
gr.Interface(fn = chatbot,
             inputs = ["text",'state'],
             outputs = ["chatbot",'state']).launch(debug = True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://c57803d67cc6329527.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://c57803d67cc6329527.gradio.live


