## **LLM-Based QA System**

The following notebook implements Task 1 and Task 2. To run it, you need the pandas, openpyxl (used by pandas to read .xlsx files), openai, neo4j, langchain, langchain-community and langchain-openai libraries installed, as well as a valid OpenAI API key.
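Before running the cells, a quick check that the packages are importable can save a failed run halfway through. This is a minimal sketch using only the standard library. The module list is my assumption of the import names for the packages above (import names use underscores where the pip names use hyphens), and `missing_modules` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Assumed import names for the packages listed above
# (e.g. the pip package langchain-openai is imported as langchain_openai).
REQUIRED_MODULES = [
    "pandas",
    "openpyxl",
    "openai",
    "neo4j",
    "langchain",
    "langchain_community",
    "langchain_openai",
]


def missing_modules(modules):
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Install these packages first:", ", ".join(missing))
else:
    print("All required packages are available.")
```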

The credentials of the knowledge graph (KG) used here, the Neo4j Movie graph, are given in the code below. To create this KG, I opened the Neo4j Sandbox and created a new project based on the Movie graph.

### **Task 1**

In [None]:
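# Importing the required packages
# (Neo4jGraph and GraphCypherQAChain live in langchain-community;
# the old langchain.graphs / langchain.chains paths are deprecated aliases)
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_core.prompts import PromptTemplate
import pandas as pd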

Connecting to the ChatGPT large language model and establishing a connection to the Neo4j database

In [None]:
# Setting the OpenAI API key
OPENAI_API_KEY = "Put your OpenAI API key here"

# Setting up the ChatGPT LLM (temperature=0 keeps the output as repeatable as possible)
llm_openai = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY, temperature=0
)

# Connecting to the Neo4j database
movie_graph = Neo4jGraph(
    url="bolt://34.230.42.201",
    username="neo4j",
    password="produce-books-scab",
)
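Hard-coding the API key and the sandbox password works for a throwaway notebook, but the same values can be read from environment variables so they never end up in a shared file. A minimal stdlib sketch, assuming the variable names OPENAI_API_KEY, NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD (my choice of names); `Settings` and `load_settings` are hypothetical helpers.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    """Connection settings for the LLM and the Neo4j sandbox."""
    openai_api_key: str
    neo4j_url: str
    neo4j_username: str
    neo4j_password: str


# Assumed mapping from Settings fields to environment variable names.
ENV_NAMES = {
    "openai_api_key": "OPENAI_API_KEY",
    "neo4j_url": "NEO4J_URI",
    "neo4j_username": "NEO4J_USERNAME",
    "neo4j_password": "NEO4J_PASSWORD",
}


def load_settings() -> Settings:
    """Read credentials from the environment, failing early if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(**{field: os.environ[env] for field, env in ENV_NAMES.items()})
```

The returned values can then be passed to `ChatOpenAI(openai_api_key=...)` and `Neo4jGraph(url=..., username=..., password=...)` in the same way as the literals in the cell above.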

Creating a prompt template to generate Cypher queries from questions, and a Cypher QA chain that links the LLM, the graph and the prompt to produce the answers.

In [3]:
# Setting up the prompt template for generating Cypher queries from questions
CYPHER_GENERATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}"""

# Setting up the prompt
cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

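# Setting up a Cypher QA chain
# (allow_dangerous_requests=True acknowledges that the chain executes
# LLM-generated Cypher against the database)
cypher_chain = GraphCypherQAChain.from_llm(
    llm_openai,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)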
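Conceptually, the chain makes two LLM calls around one database query: it fills the prompt with the schema and question to get a Cypher statement, runs that statement against the graph, and then asks the LLM to phrase an answer from the returned rows (the "Full Context" in the logs). The sketch below models that flow with plain Python callables so it runs without OpenAI or Neo4j. `run_graph_qa`, `fake_llm` and `fake_db` are hypothetical stand-ins, not LangChain code; `top_k` mirrors the chain's option of the same name that caps how many rows are passed on, while the real chain also uses its own QA prompt and post-processes the generated Cypher, which this omits.

```python
from typing import Any, Callable, Dict, List


def run_graph_qa(
    question: str,
    schema: str,
    template: str,
    llm: Callable[[str], str],
    query_db: Callable[[str], List[Dict[str, Any]]],
    top_k: int = 10,
) -> str:
    """Simplified model of a text-to-Cypher QA chain."""
    # Step 1: fill the prompt and ask the LLM for a Cypher statement.
    cypher = llm(template.format(schema=schema, question=question)).strip()
    # Step 2: run the statement and keep at most top_k rows as context.
    context = query_db(cypher)[:top_k]
    # Step 3: ask the LLM to phrase an answer from the question and context.
    return llm(f"Question: {question}\nContext: {context}\nAnswer:")


# Hypothetical stubs standing in for ChatOpenAI and Neo4jGraph.
TEMPLATE = "Schema:\n{schema}\nThe question is:\n{question}"


def fake_llm(prompt: str) -> str:
    if prompt.startswith("Schema:"):
        return 'MATCH (m:Movie {title: "Speed Racer"}) RETURN m.tagline'
    return "Speed has no limits."


def fake_db(cypher: str) -> List[Dict[str, Any]]:
    return [{"m.tagline": "Speed has no limits"}]


print(run_graph_qa("What is the slogan of speed racer?",
                   "(:Movie {title, tagline})", TEMPLATE, fake_llm, fake_db))
```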

Defining a function to transform natural language questions into Cypher queries and get an answer by executing the queries

In [None]:
def answer_question_using_cypher(cypher_chain_parameter, question):
    """
    Transform a natural language question into a Cypher query, execute it,
    and print the answer.

    :param cypher_chain_parameter: The Cypher QA chain to use
    :param question: The natural language question
    """
    # Exception handler
    try:
        # Getting and printing the response of the Cypher chain
        # (invoke replaces the deprecated run method; the chain's input key
        # is "query" and its output key is "result")
        answer = cypher_chain_parameter.invoke({"query": question})["result"]
        print(f"Question: {question}\nAnswer: ", answer)

    except Exception as e:
        # Printing the exception raised
        print("Problem answering the question: ", e)
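The helper above only prints, which makes it hard to compare the answers of the two prompts afterwards. A minimal sketch of a variant that collects (question, answer) pairs instead, assuming only that the chain object has an `invoke` method taking `{"query": ...}` and returning a dict with a "result" key. `collect_answers` and `StubChain` are hypothetical; the stub lets the logic run offline.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


class StubChain:
    """Hypothetical stand-in for GraphCypherQAChain so this runs offline."""

    def __init__(self, answers: Dict[str, str]):
        self.answers = answers

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        question = inputs["query"]
        if question not in self.answers:
            raise ValueError("no Cypher could be generated")
        return {"query": question, "result": self.answers[question]}


def collect_answers(chain: Any, questions: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (question, answer) pairs, with None where the chain raised."""
    results = []
    for question in questions:
        try:
            answer = chain.invoke({"query": question})["result"]
        except Exception as exc:  # keep going, as the notebook helper does
            print(f"Problem answering {question!r}: {exc}")
            answer = None
        results.append((question, answer))
    return results


chain = StubChain({"When is Al Pacino's birthday?": "Al Pacino was born in 1940."})
print(collect_answers(chain, ["When is Al Pacino's birthday?", "Unknown question"]))
```

With the real chains, the returned pairs could be turned into a table with `pd.DataFrame(results, columns=["Question", "Answer"])` to place the Task 1 and Task 2 answers side by side.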

Reading the questions (the evaluation dataset must contain a "Question" column)

In [6]:
# Reading the evaluation dataset
questions = pd.read_excel("evaluation_dataset.xlsx")

Transforming them into Cypher queries based on the initial prompt

In [6]:
# Iterating through the dataset and finding the answers to the questions
for question in questions["Question"]:
    answer_question_using_cypher(cypher_chain, question)

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = "Snow Falling on Cedars"
RETURN m.released
Full Context:
[{'m.released': 1999}, {'m.released': 1999}, {'m.released': 1999}, {'m.released': 1999}]

> Finished chain.
Question: What is the release date for Snow Falling on Cedars?
Answer:  The release date for Snow Falling on Cedars is 1999.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Person)-[:ACTED_IN]->(m:Movie {title: "Speed Racer"}) RETURN m.tagline
Full Context:
[{'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}, {'m.tagline': 'Speed has no limits'}]

> Finished chain.
Question: What is the slogan of speed racer?
Answ

### **Task 2**

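Refactoring the Cypher generation prompt. In Task 1 the generated queries matched through ACTED_IN even when the question was only about a movie, so the same value came back once per actor; the new prompt embeds the schema, adds stricter rules and gives few-shot examples.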
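One of the new rules asks for `DISTINCT`, which drops duplicate result rows. The effect can be illustrated on the Full Context printed for "Snow Falling on Cedars" in Task 1. This is a minimal stdlib sketch of DISTINCT-like row deduplication; `distinct_rows` is a hypothetical helper, not part of LangChain or Neo4j, and it assumes the row values are hashable.

```python
from typing import Any, Dict, List


def distinct_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop exact duplicate rows, keeping first-seen order.

    Assumes hashable values; list values such as r.roles would need to be
    converted to tuples first.
    """
    seen = set()
    unique = []
    for row in rows:
        # Dicts are unhashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Task 1 context: one row per actor, because the query matched through ACTED_IN.
task1_context = [{"m.released": 1999}] * 4
print(distinct_rows(task1_context))  # [{'m.released': 1999}]
```

In the graph itself the better fix is the one the new prompt also encourages: not traversing ACTED_IN at all when the question only concerns the movie, which is what the Task 2 queries below do.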

In [None]:
# Setting up the prompt template for generating Cypher queries from questions
CYPHER_GENERATION_TEMPLATE = f"""
You are a system that is expert in generating optimized Cypher queries for a Neo4j graph. The schema of the graph is:

{movie_graph.schema.replace("{", "{{").replace("}", "}}")}

Important Instructions

- Use only the provided relationship types and properties in the schema.
- Do not use any other relationship types or properties that are not provided.
- Use only the provided relationship directions in the schema.
- Return only the Cypher query. Do not include any extra text or explanations.
- Use a relationship only when it is needed to answer a question.
- Use the 'DISTINCT' keyword to keep only unique values. Keep duplicates only when the question explicitly asks for them.
- Assign a variable to a relationship before accessing its properties (e.g. '(p:Person)-[r:REVIEWED]->(m:Movie) ... RETURN r.rating, r.summary', '(p:Person)-[r:ACTED_IN]->(m:Movie) ... RETURN r.roles').
- Return all entities when there are multiple with the same highest or lowest value. Avoid using LIMIT 1 in such cases.
- Always use parentheses to group conditions and ensure the precedence of operations and operators, especially with logical operators (AND/OR).
- For finding the followers of a person use the incoming FOLLOWS relationships, otherwise use the outgoing FOLLOWS relationships.


Below are some examples of Cypher queries based on specific questions:

Example 1:
Question: What are the names of all persons?
Cypher Query: MATCH (p:Person) RETURN p.name

Example 2:
Question: What are the roles in movies?
Cypher Query: MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN r.roles

Example 3:
Question: Which actors have been in the most movies?
Cypher Query: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, COUNT(m) as total_movies WITH MAX(total_movies) as max_movies MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, COUNT(m) as total_movies, max_movies WHERE total_movies = max_movies RETURN p.name

Example 4:
Question: Who is the oldest actor?
Cypher Query: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH MIN(p.born) as min_born MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WITH p, min_born WHERE p.born = min_born RETURN DISTINCT p.name

Example 5:
Question: Which persons were born between 1950 and 2000?
Cypher Query: MATCH (p:Person) WHERE (p.born >= 1950 AND p.born <= 2000) RETURN p.name

Example 6:
Question: Which persons follow only one person?
Cypher Query: MATCH (p:Person) WHERE size([(p)-[:FOLLOWS]->(n:Person) | n]) = 1 RETURN p.name

Based on the examples and the instructions above, generate an optimized Cypher query for the following question:

Question: {{question}}
Cypher Query:
"""

# Setting up a new prompt
cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["question"],
)

# Setting up a new Cypher QA chain
cypher_chain_2 = GraphCypherQAChain.from_llm(
    llm_openai,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)
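The new template is built in two stages: the f-string inserts the schema immediately, and PromptTemplate later fills in the question using str.format-style braces. Because the schema text itself contains braces (Neo4jGraph typically lists node properties like `Movie {title: STRING, ...}`), they are doubled first so PromptTemplate treats them as literal text, while `{{question}}` in the f-string becomes the single placeholder `{question}`. A minimal stdlib sketch of the same escaping, with a made-up toy schema standing in for `movie_graph.schema`:

```python
# Toy schema (assumption); the real one comes from movie_graph.schema.
schema = "Node properties:\nMovie {title: STRING, released: INTEGER}"

# Double the braces so str.format-style templating treats them as literals.
escaped = schema.replace("{", "{{").replace("}", "}}")

# The f-string inserts the escaped schema now; {{question}} becomes {question}.
template = f"""Schema:
{escaped}

Question: {{question}}
Cypher Query:"""

# Filling the template later restores single braces and adds the question.
prompt = template.format(question="What is the slogan of Speed Racer?")
print(prompt)
```

Without the `replace` calls, `format` would try to read `{title: STRING, released: INTEGER}` as a placeholder and raise a `KeyError`.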

Transforming the questions into Cypher queries based on the new prompt

In [8]:
# Iterating through the dataset and finding the answers to the questions
for question in questions["Question"]:
    answer_question_using_cypher(cypher_chain_2, question)

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Snow Falling on Cedars"}) RETURN m.released
Full Context:
[{'m.released': 1999}]

> Finished chain.
Question: What is the release date for Snow Falling on Cedars?
Answer:  The release date for Snow Falling on Cedars is 1999.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Speed Racer"}) RETURN m.tagline
Full Context:
[{'m.tagline': 'Speed has no limits'}]

> Finished chain.
Question: What is the slogan of speed racer?
Answer:  Speed has no limits.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Al Pacino"}) RETURN p.born
Full Context:
[{'p.born': 1940}]

> Finished chain.
Question: When is Al Pacino's birthday?
Answer:  Al Pacino was born in 1940.


[1m> Entering new GraphCyp