## Building Q&A systems over graph databases

## Architecture
At a high level, the steps of most graph chains are:

1. Convert the question to a graph database query: the model converts the user input into a graph database query (e.g. Cypher).
2. Execute the graph database query: run the generated query against the database.
3. Answer the question: the model responds to the user input using the query results.
  

![Graph Q&A chain architecture](https://js.langchain.com/v0.2/assets/images/graph_usecase-34d891523e6284bb6230b38c5f8392e5.png)
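The three steps can be sketched as plain function composition. This is a minimal illustration, not LangChain's implementation: `answer_question` and the stub functions are hypothetical stand-ins, with the LLM and the database replaced by hard-coded callables so the flow runs without either.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def answer_question(
    question: str,
    generate_query: Callable[[str], str],
    run_query: Callable[[str], List[Row]],
    generate_answer: Callable[[str, List[Row]], str],
) -> str:
    """Hypothetical pipeline: question -> graph query -> results -> answer."""
    query = generate_query(question)        # 1. model converts the question to Cypher
    rows = run_query(query)                 # 2. database executes the query
    return generate_answer(question, rows)  # 3. model answers from the results


# Stubs standing in for the LLM and Neo4j (assumptions for illustration only)
def fake_query_generator(question: str) -> str:
    return 'MATCH (m:Movie {title: "Heat"}) RETURN m.imdbRating AS rating'


def fake_database(cypher: str) -> List[Row]:
    return [{"rating": 8.2}]


def fake_answer_generator(question: str, rows: List[Row]) -> str:
    return f"Heat has an IMDb rating of {rows[0]['rating']}."


answer = answer_question(
    "What is the IMDb rating of Heat?",
    fake_query_generator,
    fake_database,
    fake_answer_generator,
)
print(answer)  # Heat has an IMDb rating of 8.2.
```

Keeping each step behind its own callable makes it easy to swap the model or the database independently, which is the same separation the chain in this guide relies on.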

## Setup
Install the required libraries:

In [None]:
! pip install langchain langchain-community
! pip install langchain-openai tiktoken
! pip install neo4j
! pip install python-dotenv pandas

We default to OpenAI models in this guide.

In [2]:
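import os
from dotenv import load_dotenv

# Load environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

# ChatOpenAI reads OPENAI_API_KEY from the environment; keep a reference to confirm it was loaded
openai_api_key = os.getenv('OPENAI_API_KEY')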

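Next, define the Neo4j credentials. `Neo4jGraph` reads the connection details from the `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` environment variables.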

In [3]:
import os

# Connection details for a local Neo4j instance; replace the password with your own
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "Yourpassword"
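Before building the chain, it can help to fail early if a credential is missing. The helper below is a hypothetical sketch, not part of LangChain; it only assumes the four environment variable names used in this guide.

```python
import os
from typing import Iterable, List, Mapping, Optional

REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping; in the notebook, call missing_env_vars() with no
# arguments to check os.environ and raise an error if the returned list is non-empty.
example_env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j"}
print(missing_env_vars(env=example_env))  # ['OPENAI_API_KEY', 'NEO4J_PASSWORD']
```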

In [4]:
import pandas as pd

# Preview the movie dataset that will be loaded into Neo4j
data = pd.read_csv('https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv')
data.head(10)

|   | movieId | released | title | actors | director | genres | imdbRating |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1995-11-22 | Toy Story | Jim Varney\|Tim Allen\|Tom Hanks\|Don Rickles | John Lasseter | Adventure\|Animation\|Children\|Comedy\|Fantasy | 8.3 |
| 1 | 2 | 1995-12-15 | Jumanji | Robin Williams\|Bradley Pierce\|Kirsten Dunst\|Jo... | Joe Johnston | Adventure\|Children\|Fantasy | 6.9 |
| 2 | 3 | 1995-12-22 | Grumpier Old Men | Walter Matthau\|Ann-Margret\|Jack Lemmon\|Sophia ... | Howard Deutch | Comedy\|Romance | 6.6 |
| 3 | 4 | 1995-12-22 | Waiting to Exhale | Whitney Houston\|Lela Rochon\|Angela Bassett\|Lor... | Forest Whitaker | Romance\|Drama\|Comedy | 5.6 |
| 4 | 5 | 1995-12-08 | Father of the Bride Part II | Steve Martin\|Kimberly Williams-Paisley\|Diane K... | Charles Shyer | Comedy | 5.9 |
| 5 | 6 | 1995-12-15 | Heat | Al Pacino\|Robert De Niro\|Val Kilmer\|Jon Voight | Michael Mann | Action\|Crime\|Thriller | 8.2 |
| 6 | 7 | 1995-12-15 | Sabrina | Julia Ormond\|Harrison Ford\|Nancy Marchand\|Greg... | Sydney Pollack | Comedy\|Romance | 6.2 |
| 7 | 8 | 1995-12-22 | Tom and Huck | Jonathan Taylor Thomas\|Brad Renfro\|Eric Schwei... | Peter Hewitt | Children\|Adventure | 5.6 |
| 8 | 9 | 1995-12-22 | Sudden Death | Jean-Claude Van Damme\|Powers Boothe\|Raymond J.... | Peter Hyams | Action | 5.7 |
| 9 | 10 | 1995-11-17 | GoldenEye | Pierce Brosnan\|Famke Janssen\|Sean Bean\|Izabell... | Martin Campbell | Adventure\|Action\|Thriller | 7.2 |
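The import query treats the `actors`, `director`, and `genres` columns as pipe-delimited lists and splits them into separate nodes. The same reshaping can be previewed in pandas with `str.split` and `explode`. This sketch uses two complete rows copied from the preview (other columns omitted) rather than the full CSV, so it runs offline.

```python
import pandas as pd

# Two rows copied from the preview above; only the columns needed here
sample = pd.DataFrame(
    {
        "title": ["Toy Story", "Heat"],
        "genres": [
            "Adventure|Animation|Children|Comedy|Fantasy",
            "Action|Crime|Thriller",
        ],
    }
)

# Split each pipe-delimited string into a list, then emit one row per (title, genre) pair,
# the same shape the IN_GENRE relationships will take in the graph
pairs = (
    sample.assign(genre=sample["genres"].str.split("|"))
    .explode("genre")[["title", "genre"]]
    .reset_index(drop=True)
)
pairs["genre"] = pairs["genre"].str.strip()  # mirrors trim() in the Cypher import
print(pairs)
```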


The example below connects to a Neo4j database and populates it with example data about movies, the people who acted in or directed them, and their genres.

In [5]:
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph()

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movies_query)

[]
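To make the import query's logic concrete, here is a plain-Python sketch of what it does for one CSV row. Python sets stand in for `MERGE`, so a person or genre that appears twice maps to a single node, and empty values are skipped. `row_to_graph` is hypothetical and only illustrates the data model; it does not talk to Neo4j.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, str]        # (label, key): Movie keyed by id, others by name
Rel = Tuple[Node, str, Node]  # (start node, relationship type, end node)

# (CSV column, node label, relationship type, is the movie the start node?)
MAPPINGS = (
    ("director", "Person", "DIRECTED", False),
    ("actors", "Person", "ACTED_IN", False),
    ("genres", "Genre", "IN_GENRE", True),
)


def row_to_graph(row: Dict[str, str]) -> Tuple[Set[Node], Set[Rel]]:
    """Hypothetical mirror of the LOAD CSV query for a single row."""
    movie: Node = ("Movie", row["movieId"])
    nodes: Set[Node] = {movie}
    rels: Set[Rel] = set()
    for column, label, rel_type, movie_is_start in MAPPINGS:
        for raw in row.get(column, "").split("|"):
            name = raw.strip()  # like trim() in the Cypher query
            if not name:
                continue  # skip empty values
            node = (label, name)
            nodes.add(node)  # set semantics stand in for MERGE
            rels.add((movie, rel_type, node) if movie_is_start else (node, rel_type, movie))
    return nodes, rels


toy_story = {
    "movieId": "1",
    "actors": "Jim Varney|Tim Allen|Tom Hanks|Don Rickles",
    "director": "John Lasseter",
    "genres": "Adventure|Animation|Children|Comedy|Fantasy",
}
nodes, rels = row_to_graph(toy_story)
print(len(nodes), len(rels))  # 11 10
```

Because Person nodes are keyed only by name, someone who both directed and acted in a movie ends up as one node with two relationships, which matches the behavior of the two `MERGE (p:Person {name: ...})` clauses.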

## Graph schema
For an LLM to generate a Cypher statement, it needs information about the graph schema. When you instantiate a graph object, it retrieves the graph schema. If you later change the graph, run the `refresh_schema` method to update the schema information. Because the movie data was loaded after `graph` was created, the next cell refreshes the schema before printing it.

In [6]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}
Person {name: STRING}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
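The schema is ultimately plain text that gets placed into the LLM prompt. As an illustration, the hypothetical `format_schema` function below rebuilds a string in the same layout as the printed output from simple Python data structures. It is not how `Neo4jGraph` builds its schema (the real schema comes from querying the database), and it omits the empty relationship-properties section.

```python
from typing import Dict, Iterable, Tuple


def format_schema(
    node_props: Dict[str, Dict[str, str]],
    relationships: Iterable[Tuple[str, str, str]],
) -> str:
    """Render node properties and relationship patterns as schema text (hypothetical)."""
    lines = ["Node properties:"]
    for label, props in node_props.items():
        inner = ", ".join(f"{name}: {dtype}" for name, dtype in props.items())
        lines.append(f"{label} {{{inner}}}")
    lines.append("The relationships:")
    lines.extend(f"(:{start})-[:{rel}]->(:{end})" for start, rel, end in relationships)
    return "\n".join(lines)


schema_text = format_schema(
    {
        "Movie": {"imdbRating": "FLOAT", "id": "STRING", "released": "DATE", "title": "STRING"},
        "Person": {"name": "STRING"},
        "Genre": {"name": "STRING"},
    },
    [
        ("Movie", "IN_GENRE", "Genre"),
        ("Person", "DIRECTED", "Movie"),
        ("Person", "ACTED_IN", "Movie"),
    ],
)
print(schema_text)
```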


## Chain
Let's use a simple chain that takes a question, turns it into a Cypher query, executes the query, and uses the result to answer the original question.

![Graph Cypher QA chain workflow](https://js.langchain.com/v0.2/assets/images/graph_chain-6379941793e0fa985e51e4bda0329403.webp)


LangChain comes with a built-in chain for this workflow that is designed to work with Neo4j: `GraphCypherQAChain`.

In [7]:
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

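# temperature=0 reduces randomness in the generated Cypher
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# verbose=True prints the generated Cypher and the query results
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)
response = chain.invoke({"query": "What was the cast of the Casino?"})
response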



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (:Movie {title: "Casino"})<-[:ACTED_IN]-(actor:Person)
RETURN actor.name[0m
Full Context:
[32;1m[1;3m[{'actor.name': 'Sharon Stone'}, {'actor.name': 'Joe Pesci'}, {'actor.name': 'James Woods'}, {'actor.name': 'Robert De Niro'}][0m

[1m> Finished chain.[0m


{'query': 'What was the cast of the Casino?',
 'result': 'The cast of Casino included Sharon Stone, Joe Pesci, James Woods, and Robert De Niro.'}
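The first step inside the chain prompts the LLM with the schema and the question and asks for a Cypher statement. The sketch below shows that idea with an ordinary format string; the template wording and `build_cypher_prompt` are assumptions for illustration, not LangChain's built-in prompt. If you want to customize this step in the real chain, `GraphCypherQAChain.from_llm` also accepts a `cypher_prompt` argument.

```python
# Hypothetical template; the chain's actual default prompt is worded differently
CYPHER_TEMPLATE = """Task: Generate a Cypher statement to query a graph database.
Use only the node labels, relationship types, and properties in the schema.
Schema:
{schema}
Return only the Cypher statement, with no explanation.
Question: {question}"""

# Copy of the schema printed earlier (without the empty relationship-properties section)
SCHEMA = """Node properties:
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}
Person {name: STRING}
Genre {name: STRING}
The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)"""


def build_cypher_prompt(schema: str, question: str) -> str:
    """Fill the template; braces inside the schema value are not interpreted by format()."""
    return CYPHER_TEMPLATE.format(schema=schema, question=question)


prompt = build_cypher_prompt(SCHEMA, "What was the cast of the Casino?")
print(prompt)
```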

## Validating relationship direction
LLMs can struggle with relationship directions in generated Cypher statements. Because the graph schema is predefined, the chain can validate, and where possible correct, the relationship directions in generated Cypher statements when you set the `validate_cypher` parameter.
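The idea behind this check can be sketched with the schema's relationship triples: a hop is valid if its (start label, type, end label) appears in the schema, and it can be corrected if only the reversed triple does. The `fix_direction` helper below is a simplified, hypothetical illustration that operates on already-parsed hops; the chain works on the Cypher text itself, which this sketch does not parse.

```python
from typing import Optional

# Relationship patterns from the printed schema: (start label, type, end label)
SCHEMA_TRIPLES = {
    ("Movie", "IN_GENRE", "Genre"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
}


def fix_direction(left: str, rel_type: str, right: str, arrow: str) -> Optional[str]:
    """Return the arrow ('->' or '<-') that makes the hop between left and right valid.

    `arrow` is the direction written in the query: '->' means left to right,
    '<-' means right to left. Returns None if the hop is invalid either way.
    """
    start, end = (left, right) if arrow == "->" else (right, left)
    if (start, rel_type, end) in SCHEMA_TRIPLES:
        return arrow  # already valid
    if (end, rel_type, start) in SCHEMA_TRIPLES:
        return "<-" if arrow == "->" else "->"  # flip the direction
    return None  # no such relationship between these labels


# (m:Movie)-[:ACTED_IN]->(p:Person) is backwards; the schema says Person -> Movie
print(fix_direction("Movie", "ACTED_IN", "Person", "->"))  # <-
```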

In [8]:
chain = GraphCypherQAChain.from_llm(
    graph=graph, llm=llm, verbose=True, validate_cypher=True
)
response = chain.invoke({"query": "What was the cast of Toy Story?"})
response



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Toy Story"})
RETURN p.name[0m
Full Context:
[32;1m[1;3m[{'p.name': 'Jim Varney'}, {'p.name': 'Tim Allen'}, {'p.name': 'Tom Hanks'}, {'p.name': 'Don Rickles'}][0m

[1m> Finished chain.[0m


{'query': 'What was the cast of Toy Story?',
 'result': 'Jim Varney, Tim Allen, Tom Hanks, Don Rickles were part of the cast of Toy Story.'}
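The `Full Context` shown in the log is the raw query result: a list of dictionaries, one per row, keyed by the `RETURN` expression (here `p.name`). Before the answer step, those rows have to be turned into text for the LLM. The helper below is a hypothetical sketch of one way to do that, with a `top_k` cap on the number of rows; `GraphCypherQAChain` has its own `top_k` parameter for limiting results, and its own formatting may differ from this sketch.

```python
from typing import Any, Dict, List


def rows_to_context(rows: List[Dict[str, Any]], top_k: int = 10) -> str:
    """Render at most top_k result rows as 'key: value' text, one row per line (hypothetical)."""
    lines = []
    for row in rows[:top_k]:
        lines.append(", ".join(f"{key}: {value}" for key, value in row.items()))
    return "\n".join(lines)


# The Full Context rows from the Toy Story query above
rows = [
    {"p.name": "Jim Varney"},
    {"p.name": "Tim Allen"},
    {"p.name": "Tom Hanks"},
    {"p.name": "Don Rickles"},
]
print(rows_to_context(rows, top_k=2))
# p.name: Jim Varney
# p.name: Tim Allen
```

Capping the rows keeps the answer prompt small when a query returns many matches, at the cost of possibly omitting some results from the final answer.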

In [9]:
chain = GraphCypherQAChain.from_llm(
    graph=graph, llm=llm, verbose=True, validate_cypher=True
)
response = chain.invoke({"query": "What was the Genre of Toy Story?"})
response



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (:Movie {title: "Toy Story"})-[:IN_GENRE]->(g:Genre)
RETURN g.name;[0m
Full Context:
[32;1m[1;3m[{'g.name': 'Adventure'}, {'g.name': 'Animation'}, {'g.name': 'Children'}, {'g.name': 'Fantasy'}, {'g.name': 'Comedy'}][0m

[1m> Finished chain.[0m


{'query': 'What was the Genre of Toy Story?',
 'result': 'The genre of Toy Story was Adventure, Animation, Children, Fantasy, and Comedy.'}
