### Build a Question Answering application over a Graph Database

- Initialize the credentials you downloaded when creating a free Neo4j AuraDB instance.

In [1]:
NEO4J_URI = "neo4j+s://a6208d63.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "rnl7AXFxrALS17r_57UEZ-VKjzuNlsMkyVlPwZyd4Z0"
## You can also store these in environment variables and read them from there

In [2]:
import os
## Setting up the environment variables
os.environ["NEO4J_URI"] = NEO4J_URI
os.environ["NEO4J_USERNAME"] = NEO4J_USERNAME
os.environ["NEO4J_PASSWORD"] = NEO4J_PASSWORD
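Reading the credentials back from the environment, as the comment in the first cell suggests, could look roughly like this. It is a minimal sketch: `load_neo4j_settings` and `REQUIRED_KEYS` are hypothetical names introduced for illustration and are not part of LangChain or the Neo4j driver.

```python
import os

# The three settings this notebook uses to connect to Neo4j.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_settings(env=None):
    """Hypothetical helper: read the Neo4j settings from a mapping.

    Defaults to os.environ and fails early if any setting is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

The returned dictionary can then feed the connection call used in the next cell, for example `Neo4jGraph(url=settings["NEO4J_URI"], username=settings["NEO4J_USERNAME"], password=settings["NEO4J_PASSWORD"])`, so the password never has to appear in the notebook itself.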

In [3]:
from langchain_community.graphs import Neo4jGraph
## Neo4jGraph connects to your database using NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD
graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)
## The graph object is now connected to the whole database
graph

<langchain_community.graphs.neo4j_graph.Neo4jGraph at 0x2484d152cf0>

In [4]:
## Movie dataset
URL = "https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv"
## Above is the raw data URL (the query below hard-codes the same URL)
movie_query = """
LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv' as row

MERGE(m:Movie{id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))    
"""
## Directors and actors are both stored as Person nodes
## The SET keyword assigns properties (released, title, imdbRating) to the Movie node
## Each FOREACH loop creates one kind of relationship: director to movie, actor to movie, movie to genre
## This single query loads the entire dataset
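To see what the three FOREACH loops do to a single CSV row, here is a plain-Python sketch of the same split-and-trim logic. It is only an illustration: `split_field` and `row_to_relationships` are hypothetical helpers, the triples use the movie title where the Cypher query merges on the movie id, and the `genres` value in the sample row is an assumption made for the example.

```python
def split_field(value, sep="|"):
    """Mimic Cypher's split(...) followed by trim(...) on each element."""
    return [part.strip() for part in value.split(sep)]


def row_to_relationships(row):
    """Return (start, relationship_type, end) triples for one CSV row."""
    title = row["title"]
    triples = []
    for director in split_field(row["director"]):
        triples.append((director, "DIRECTED", title))
    for actor in split_field(row["actors"]):
        triples.append((actor, "ACTED_IN", title))
    for genre in split_field(row["genres"]):
        triples.append((title, "IN_GENRE", genre))
    return triples


# Illustrative row: the names match the Casino answers shown later in this
# notebook; the genres value is an assumption made for the example.
sample_row = {
    "title": "Casino",
    "director": "Martin Scorsese",
    "actors": "Robert De Niro|Joe Pesci|Sharon Stone|James Woods",
    "genres": "Crime|Drama",
}

for triple in row_to_relationships(sample_row):
    print(triple)
```

Because the Cypher version uses MERGE rather than CREATE, a person who appears in several rows is matched to one existing Person node instead of being duplicated.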

In [5]:
movie_query

"\nLOAD CSV WITH HEADERS FROM\n'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv' as row\n\nMERGE(m:Movie{id:row.movieId})\nSET m.released = date(row.released),\n    m.title = row.title,\n    m.imdbRating = toFloat(row.imdbRating)\nFOREACH (director in split(row.director, '|') |\n    MERGE (p:Person {name:trim(director)})\n    MERGE (p)-[:DIRECTED]->(m))\nFOREACH (actor in split(row.actors, '|') |\n    MERGE (p:Person {name:trim(actor)})\n    MERGE (p)-[:ACTED_IN]->(m))\nFOREACH (genre in split(row.genres, '|') |\n    MERGE (g:Genre {name:trim(genre)})\n    MERGE (m)-[:IN_GENRE]->(g))    \n"

In [None]:
graph.query(movie_query)  ## Execute the query defined above

[]

In [None]:
graph.refresh_schema()  ## Refresh the cached graph schema
print(graph.schema)  ## Shows the node properties, relationship properties and relationships

Node properties:
CEO {POB: STRING, name: STRING, YOB: INTEGER}
Company {name: STRING}
Entrepreneur {POB: STRING, name: STRING, YOB: INTEGER}
Country {name: STRING}
Person {name: STRING, born: INTEGER}
Movie {title: STRING, released: INTEGER, id: STRING, imdbRating: FLOAT}
User {name: STRING, city: STRING, userId: INTEGER, age: INTEGER}
Post {postId: INTEGER, content: STRING, timestamp: DATE_TIME}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Entrepreneur)-[:LIVES_IN]->(:Country)
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Movie)-[:IN_GENRE]->(:Genre)
(:User)-[:POSTED]->(:Post)
(:User)-[:FRIEND]->(:User)
(:User)-[:LIKES]->(:User)


- Because you used the credentials downloaded for this Neo4j AuraDB instance, the graph (nodes, relationships and data) is visible in the AuraDB console once you refresh it.

- To insert the data we only had to call graph.query(), which executed the whole Cypher query inside the graph database.

In [29]:
import os
from dotenv import load_dotenv
load_dotenv()

groq_api_key = os.getenv("GROQ_API_KEY")

In [30]:
from langchain_groq import ChatGroq
llm = ChatGroq(groq_api_key=groq_api_key, model_name="llama-3.1-8b-instant")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000002486C21B8C0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000002486C2133B0>, model_name='llama-3.1-8b-instant', model_kwargs={}, groq_api_key=SecretStr('**********'))

- The plan: whenever the user asks a question, it goes to the LLM, which generates a Cypher query. That Cypher query is then run against the graph database, and the rows it returns are passed back to the LLM to phrase the final response.
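The flow described above can be sketched in plain Python. This is not how GraphCypherQAChain is implemented internally; `answer_question` and the three `fake_*` stand-ins are hypothetical names used so the sketch runs without an LLM or a database.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> Dict[str, Any]:
    """Question -> Cypher -> database rows -> natural-language answer."""
    cypher = generate_cypher(question, schema)  # step 1: the LLM writes Cypher
    context = run_query(cypher)                 # step 2: the graph database runs it
    result = summarize(question, context)       # step 3: the LLM phrases the answer
    return {"query": question, "cypher": cypher, "context": context, "result": result}


# Stand-ins so the sketch runs without an LLM or a database.
def fake_generate_cypher(question: str, schema: str) -> str:
    return 'MATCH (m:Movie {title: "Casino"})<-[:DIRECTED]-(p:Person) RETURN p.name'


def fake_run_query(cypher: str) -> Rows:
    return [{"p.name": "Martin Scorsese"}]


def fake_summarize(question: str, context: Rows) -> str:
    names = ", ".join(row["p.name"] for row in context)
    return f"{names} directed Casino."


out = answer_question(
    "Who was the director of the movie Casino",
    "(:Person)-[:DIRECTED]->(:Movie)",
    fake_generate_cypher,
    fake_run_query,
    fake_summarize,
)
print(out["result"])
```

Keeping the three steps as separate callables makes it easy to see where a failure comes from: a wrong Cypher query, an empty result set, or a final answer that ignores the returned rows.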

In [31]:
from langchain.chains import GraphCypherQAChain
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, allow_dangerous_requests=True, cypher_validation=True, verbose=True)
## Because verbose=True, the chain prints the generated Cypher and the retrieved context as it runs.
chain

GraphCypherQAChain(verbose=True, graph=<langchain_community.graphs.neo4j_graph.Neo4jGraph object at 0x000002484D152CF0>, cypher_generation_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['question', 'schema'], input_types={}, partial_variables={}, template='Task:Generate Cypher statement to query a graph database.\nInstructions:\nUse only the provided relationship types and properties in the schema.\nDo not use any other relationship types or properties that are not provided.\nSchema:\n{schema}\nNote: Do not include any explanations or apologies in your responses.\nDo not respond to any questions that might ask anything else than for you to construct a Cypher statement.\nDo not include any text except the generated Cypher statement.\n\nThe question is:\n{question}'), llm=ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000002486C21B8C0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000002486C2133B0>, model_name=

- GraphCypherQAChain ships with a default prompt template, so there is no need to specify one separately. As the output above shows, the template is used internally by an LLMChain.
- The default template takes two inputs: question and schema.
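To make the two inputs concrete, the template text printed in the chain output above can be filled in with ordinary string formatting. This is a sketch only: it uses plain `str.format` as a stand-in for LangChain's PromptTemplate, and the one-line schema is a shortened assumption rather than the full `graph.schema`.

```python
# Template text copied from the chain output above.
CYPHER_TEMPLATE = (
    "Task:Generate Cypher statement to query a graph database.\n"
    "Instructions:\n"
    "Use only the provided relationship types and properties in the schema.\n"
    "Do not use any other relationship types or properties that are not provided.\n"
    "Schema:\n"
    "{schema}\n"
    "Note: Do not include any explanations or apologies in your responses.\n"
    "Do not respond to any questions that might ask anything else than "
    "for you to construct a Cypher statement.\n"
    "Do not include any text except the generated Cypher statement.\n"
    "\n"
    "The question is:\n"
    "{question}"
)

# Shortened, illustrative schema; the real chain passes the full graph schema.
schema = "(:Person)-[:DIRECTED]->(:Movie)"
question = "Who was the director of the movie Casino"

prompt = CYPHER_TEMPLATE.format(schema=schema, question=question)
print(prompt)
```

Printing the filled-in prompt is a quick way to check exactly what the LLM sees before it writes any Cypher.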

In [33]:
response = chain.invoke({"query":"Who was the director of the movie Casino"})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: "Casino"})-[:IN_GENRE]->(g:Genre) RETURN p.name AS director
Full Context:
[{'director': 'Martin Scorsese'}, {'director': 'Martin Scorsese'}]

> Finished chain.


{'query': 'Who was the director of the movie Casino',
 'result': 'Martin Scorsese was the director of the movie Casino.'}

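- A simpler query gives the same answer: MATCH (m:Movie {title:"Casino"})<-[:DIRECTED]-(p:Person) RETURN p.name. The generated Cypher also traverses [:IN_GENRE], which returns one row per genre and most likely explains why the director appears twice in the context.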

In [None]:
response = chain.invoke({"query":"Who were the actors of the movie Casino"})
response
## Note: along with the answer, verbose mode also prints the full context returned by the query.



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino"})<-[:ACTED_IN]-(p:Person) RETURN p.name
Full Context:
[{'p.name': 'Robert De Niro'}, {'p.name': 'Joe Pesci'}, {'p.name': 'Sharon Stone'}, {'p.name': 'James Woods'}]

> Finished chain.


{'query': 'Who were the actors of the movie Casino',
 'result': 'Robert De Niro, Joe Pesci, Sharon Stone, James Woods were actors of the movie Casino.'}

In [43]:
response = chain.invoke({"query":"How many artists are there?"})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:DIRECTED|ACTED_IN]->(m:Movie) RETURN COUNT(DISTINCT p) AS artists
Full Context:
[{'artists': 1240}]

> Finished chain.


{'query': 'How many artists are there?', 'result': 'There are 1240 artists.'}

In [44]:
response = chain.invoke({"query":"How many movies has Tom Hanks acted in?"})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE p.name = 'Tom Hanks' RETURN COUNT(m)
Full Context:
[{'COUNT(m)': 3}]

> Finished chain.


{'query': 'How many movies has Tom Hanks acted in?',
 'result': "I don't know the answer."}

## Prompting Strategies for GraphDB with LLMs
- Earlier we used the LLM to generate the Cypher queries, executed them, and retrieved the results from the graph database.
- The LLM may not perform well on more complex questions, so it is worth improving the query generation mechanism.
- Here we use prompting strategies to improve graph database query generation.
- We will largely focus on methods for getting relevant database-specific information into the prompt.
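One common way to put database-specific information into the prompt is to add a few worked question-to-Cypher examples next to the schema. The sketch below shows that idea in plain Python, reusing two queries from the outputs above. `EXAMPLES` and `build_few_shot_prompt` are hypothetical names, the prompt wording is an assumption, and this is only one possible strategy, not necessarily the one used later.

```python
# Question/Cypher pairs taken from the chain outputs earlier in this notebook.
EXAMPLES = [
    {
        "question": "Who were the actors of the movie Casino",
        "cypher": 'MATCH (m:Movie {title: "Casino"})<-[:ACTED_IN]-(p:Person) RETURN p.name',
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "cypher": "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE p.name = 'Tom Hanks' RETURN COUNT(m)",
    },
]


def build_few_shot_prompt(schema, question, examples=EXAMPLES):
    """Hypothetical helper: combine schema, worked examples and the new question."""
    lines = [
        "Generate a Cypher statement for the question, using only this schema:",
        schema,
        "",
        "Examples:",
    ]
    for example in examples:
        lines.append(f"Question: {example['question']}")
        lines.append(f"Cypher: {example['cypher']}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Cypher:")
    return "\n".join(lines)


few_shot_prompt = build_few_shot_prompt(
    "(:Person)-[:DIRECTED]->(:Movie)\n(:Person)-[:ACTED_IN]->(:Movie)",
    "Who was the director of the movie Casino",
)
print(few_shot_prompt)
```

The examples show the model which labels, relationship directions and property names this particular database uses, which a generic instruction cannot convey.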