https://python.langchain.com/docs/use_cases/graph/quickstart/

langchain-openai==0.0.5, openai==1.21.2, langchain==0.1.16

**Start the Movie graph vector DB in Neo4j Desktop**

In [12]:
from dotenv import load_dotenv
import os
from langchain.chat_models import AzureChatOpenAI
# from langchain.schema import SystemMessage, HumanMessage
from langchain_community.graphs import Neo4jGraph
load_dotenv()
# Warning control
import warnings
warnings.filterwarnings("ignore")

**Load OpenAI**

In [13]:
model_name = "gpt-35-turbo-1106"
# model_name = "gpt-35-turbo"
azure_openai_api_key = os.environ["OPENAI_API_KEY"]
azure_openai_endpoint = os.environ["OPENAI_API_BASE"]

**Load the LLM (e.g., GPT-3.5)**

In [14]:
llm = AzureChatOpenAI(
    openai_api_version=os.getenv("OPENAI_API_VERSION"),
    azure_deployment=model_name,
    model_name=model_name,
    temperature=0.0)

**Add Neo4j credentials (this information needs to be kept secret)**

In [15]:
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "12345678"
NEO4J_DATABASE = 'neo4j'
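
Since the credentials should stay secret, one option is to read them from the environment (the notebook already calls `load_dotenv()`) rather than typing them into a cell. The following is a minimal sketch, not the notebook's method: the helper `load_neo4j_settings` and the environment variable names are assumptions, and the fallback values simply mirror the local defaults used above.

```python
import os

def load_neo4j_settings(env=os.environ):
    """Read Neo4j connection settings from environment variables.

    The variable names are assumptions for this sketch. The password has
    no default, so it never has to be written into the notebook.
    """
    return {
        "url": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],  # raises KeyError if missing
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }

# Example with an explicit mapping instead of the real environment
settings = load_neo4j_settings({"NEO4J_PASSWORD": "example-password"})
print(settings["url"], settings["database"])
```

The dictionary keys match the keyword arguments passed to `Neo4jGraph` in this notebook, so the result could be unpacked as `Neo4jGraph(**settings)`.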

In [16]:

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE)

**Print the graph database schema**

In [17]:
graph.refresh_schema()
print(graph.schema)

Node properties are the following:
Movie {imdbRating: FLOAT, taglineEmbedding: LIST, tagline: STRING, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING},Location {name: STRING},SimilarMovie {name: STRING}
Relationship properties are the following:

The relationships are the following:
(:Movie)-[:IN_GENRE]->(:Genre),(:Movie)-[:WAS_TAKEN_IN]->(:Location),(:Movie)-[:IS_SIMILAR_TO]->(:SimilarMovie),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)


**Questions**

In [18]:
q_one = "What was the cast of the Casino?"
q_two = "What are the most common genres for movies released in 1995?"
q_three = "What are the similar movies to the ones that Tom Hanks acted in?"

### **Chain**

**`Simple Agent (a)`:**

In [19]:
from langchain.chains import GraphCypherQAChain
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)

In [20]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {title: "Casino"})<-[:ACTED_IN]-(actor:Person)
RETURN actor.name
Full Context:
[{'actor.name': 'James Woods'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'Joe Pesci'}]

> Finished chain.
{'query': 'What was the cast of the Casino?', 'result': 'The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'}

LLM response: The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.


In [21]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)
WHERE m.released = date("1995-01-01")
MATCH (m)-[:IN_GENRE]->(g:Genre)
RETURN g.name, COUNT(*) AS count
ORDER BY count DESC
LIMIT 5;
Full Context:
[]

> Finished chain.
{'query': 'What are the most common genres for movies released in 1995?', 'result': "I'm sorry, I don't have the information to answer that question."}

LLM response: I'm sorry, I don't have the information to answer that question.
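
The empty context above is consistent with the generated filter `m.released = date("1995-01-01")`, which matches only movies released on that exact day rather than on any day in 1995. The sketch below reproduces the distinction with Python's `datetime.date` on made-up release dates. The Cypher shown in the second comment (`m.released.year = 1995`) is one possible rewrite, not something the chain produced.

```python
from datetime import date

# Hypothetical release dates, not taken from the Movie graph
released = {
    "Movie A": date(1995, 11, 22),
    "Movie B": date(1995, 1, 1),
    "Movie C": date(1996, 5, 10),
}

# Equivalent of: WHERE m.released = date("1995-01-01")
exact_day = [title for title, d in released.items() if d == date(1995, 1, 1)]

# Equivalent of: WHERE m.released.year = 1995
whole_year = [title for title, d in released.items() if d.year == 1995]

print(exact_day)
print(whole_year)
```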


In [22]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)-[:IS_SIMILAR_TO]->(similar:SimilarMovie)
RETURN similar.name;
Full Context:
[{'similar.name': 'Finding Nemo'}]

> Finished chain.
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer."}

LLM response: I don't know the answer.


**`Simple Agent (b)`:**

**Validating relationship direction**

LLMs can struggle with relationship directions in generated Cypher statements. Since the graph schema is predefined, we can validate and optionally correct relationship directions in the generated Cypher statements by using the `validate_cypher` parameter.

In [23]:
chain = GraphCypherQAChain.from_llm(
    graph=graph, llm=llm, verbose=True, validate_cypher=True
)
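
To make the idea of direction validation concrete, here is a simplified illustration in plain Python. It is not the LangChain implementation, which works on the Cypher text itself. The helper `fix_direction` is hypothetical, and the schema triples are copied from the relationships printed earlier.

```python
# Schema triples (start label, relationship type, end label) from the printed schema
SCHEMA = {
    ("Person", "ACTED_IN", "Movie"),
    ("Person", "DIRECTED", "Movie"),
    ("Movie", "IN_GENRE", "Genre"),
}

def fix_direction(start, rel_type, end):
    """Return the triple in a direction the schema allows, or None if neither fits."""
    if (start, rel_type, end) in SCHEMA:
        return (start, rel_type, end)
    if (end, rel_type, start) in SCHEMA:
        return (end, rel_type, start)  # the reversed direction is the valid one
    return None

print(fix_direction("Movie", "ACTED_IN", "Person"))
print(fix_direction("Person", "ACTED_IN", "Genre"))
```

A pattern written the wrong way round is flipped, while a pattern that matches the schema in neither direction is rejected.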

In [24]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {title: "Casino"})<-[:ACTED_IN]-(actor:Person)
RETURN actor.name
Full Context:
[{'actor.name': 'James Woods'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'Joe Pesci'}]

> Finished chain.
{'query': 'What was the cast of the Casino?', 'result': 'The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'}

LLM response: The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.


In [25]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)
WHERE m.released = date("1995-01-01")
MATCH (m)-[:IN_GENRE]->(g:Genre)
RETURN g.name, COUNT(*) AS count
ORDER BY count DESC
LIMIT 5;
Full Context:
[]

> Finished chain.
{'query': 'What are the most common genres for movies released in 1995?', 'result': "I don't know the answer."}

LLM response: I don't know the answer.


In [26]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)-[:IS_SIMILAR_TO]->(similar:SimilarMovie)
RETURN similar.name;
Full Context:
[{'similar.name': 'Finding Nemo'}]

> Finished chain.
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer."}

LLM response: I don't know the answer.


----------------------------------------

**`Improved Agents`: four steps**
1. Detect entities in the user input.
2. Match the entities to values in the database.
3. Define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.
4. Generate answers based on the database results.

### **Strategies to improve graph database query generation by mapping values from user inputs to the database**

When using the built-in graph chains, the LLM is aware of the graph schema but has no information about the values of properties stored in the database. Therefore, we can introduce a new step in the graph database QA system to map values accurately.

**Detecting entities in the user input**

We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database.

In [27]:
from typing import List

from langchain.chains.openai_functions import create_structured_output_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Entities(BaseModel):
    """Identifying information about entities."""

    names: List[str] = Field(
        ...,
        description="All the people or movies appearing in the text",
    )


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting people, movies, and years from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)


entity_chain = create_structured_output_chain(Entities, llm, prompt)

In [28]:
entities_q_two = entity_chain.invoke({"question": q_two})
print(entities_q_two)
entities_q_three = entity_chain.invoke({"question": q_three})
print(entities_q_three)

{'question': 'What are the most common genres for movies released in 1995?', 'function': Entities(names=['1995'])}
{'question': 'What are the similar movies to the ones that Tom Hanks acted in?', 'function': Entities(names=['Tom Hanks'])}


**Use a simple CONTAINS clause to match entities to the database. In practice, you might want to use a fuzzy search or a full-text index to allow for minor misspellings.**

In [32]:
match_query = """MATCH (p:Person|Movie)
WHERE p.name CONTAINS $value OR p.title CONTAINS $value
RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type
LIMIT 1
"""

def map_to_database(values) -> str:
    """
    Maps the values to entities in the database and returns the mapping information.

    Args:
        values (Entities): Extracted entities; values.names lists the strings to map.

    Returns:
        str: A string containing the mapping information of each value to entities in the database.
    """
    result = ""
    for entity in values.names:
        # Query the database to find the mapping for the entity
        response = graph.query(match_query, {"value": entity})
        try:
            result += f"{entity} maps to {response[0]['result']} {response[0]['type']} in database\n"
        except IndexError:
            # No match in the database for this entity, so skip it
            pass
    return result

In [34]:
print("2:", map_to_database(entities_q_two["function"]))
print("3:", map_to_database(entities_q_three["function"]))

2: 
3: Tom Hanks maps to Tom Hanks Person in database
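
The `CONTAINS` match used above needs the extracted text to appear verbatim in a stored name or title, so a misspelling maps to nothing. As a rough illustration of what fuzzy matching adds, here is a client-side sketch using Python's standard `difflib`. The list `known_values` and the helper `fuzzy_map` are hypothetical stand-ins. A real setup would more likely push this into the database, for example with a full-text index.

```python
from difflib import get_close_matches

# Hypothetical in-memory list standing in for names and titles stored in the graph
known_values = ["Tom Hanks", "Tom Cruise", "Casino", "Toy Story"]

def fuzzy_map(value, candidates, cutoff=0.6):
    """Return the candidate closest to value, or None if nothing is similar enough."""
    matches = get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_map("Tom Hanx", known_values))
print(fuzzy_map("1995", known_values))
```

The misspelled name still resolves to a stored value, while a year such as "1995" has no counterpart among names and titles and maps to nothing, as in the output above.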



**Custom Cypher generating chain**

We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement. We will be using the LangChain Expression Language (LCEL) to accomplish that.

In [35]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Generate Cypher statement based on natural language input
cypher_template = """Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Entities in the question map to the following database values:
{entities_list}
Question: {question}
Cypher query:"""  # noqa: E501

cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question, convert it to a Cypher query. No pre-amble.",
        ),
        ("human", cypher_template),
    ]
)

cypher_response = (
    RunnablePassthrough.assign(names=entity_chain)
    | RunnablePassthrough.assign(
        entities_list=lambda x: map_to_database(x["names"]["function"]),
        schema=lambda _: graph.get_schema,
    )
    | cypher_prompt
    | llm.bind(stop=["\nCypherResult:"])
    | StrOutputParser()
)
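
The two `RunnablePassthrough.assign` steps above each keep the incoming dictionary and add new keys computed from it. The data flow can be emulated in plain Python as sketched below. The `assign` helper and the `fake_*` functions are hypothetical stand-ins for illustration only, and the real chain calls the LLM and the database at these points.

```python
def assign(state, **steps):
    """Return a copy of state with one new key per step, each computed from state."""
    new_state = dict(state)
    for key, fn in steps.items():
        new_state[key] = fn(state)
    return new_state

# Stand-in functions; the real chain calls the LLM and the database here
def fake_entity_chain(state):
    return ["Tom Hanks"]

def fake_map_to_database(names):
    return "".join(f"{n} maps to {n} Person in database\n" for n in names)

state = {"question": "What are the similar movies to the ones that Tom Hanks acted in?"}
state = assign(state, names=fake_entity_chain)
state = assign(
    state,
    entities_list=lambda s: fake_map_to_database(s["names"]),
    schema=lambda _: "(:Person)-[:ACTED_IN]->(:Movie)",
)
print(sorted(state))
```

After both steps the dictionary holds `question`, `names`, `entities_list`, and `schema`, and the last three of these plus the question are what the prompt template consumes.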

In [36]:
cypher_q_three = cypher_response.invoke({"question": q_three})
print(cypher_q_three)

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)-[:IS_SIMILAR_TO]->(s:SimilarMovie)
RETURN s.name


**Generating answers based on database results**

Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer. Again, we will be using LCEL.

In [37]:
from langchain.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher validation tool for relationship directions
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in graph.structured_schema.get("relationships")
]
cypher_validation = CypherQueryCorrector(corrector_schema)

# Generate natural language response based on database results
response_template = """Based on the question, Cypher query, and Cypher response, write a natural language response:
Question: {question}
Cypher query: {query}
Cypher Response: {response}"""

response_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and Cypher response, convert it to a natural"
            " language answer. No pre-amble.",
        ),
        ("human", response_template),
    ]
)

chain = (
    RunnablePassthrough.assign(query=cypher_response)
    | RunnablePassthrough.assign(
        response=lambda x: graph.query(cypher_validation(x["query"])),
    )
    | response_prompt
    | llm
    | StrOutputParser()
)

In [38]:
chain.invoke({"question": q_one})

'The cast of the movie "Casino" includes James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'

In [39]:
chain.invoke({"question": q_two})

'The most common genres for movies released in 1995 are Comedy with 10 movies, Adventure with 6 movies, Action and Romance with 5 movies each, and Children with 4 movies.'

In [40]:
chain.invoke({"question": q_three})

'Similar movies to the ones that Tom Hanks acted in include "Toy Story" and "Finding Nemo".'

In [41]:
chain.invoke({"question": "How many of the movies have the Action genre?"})

'There are 5 movies in the database that have the Action genre.'

**Exercise**

In [56]:
chain.invoke({"question": "From the movies that were taken in United States, how many had the comedy genre?"})

ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Unexpected end of input: expected CYPHER, EXPLAIN, PROFILE or Query (line 0, column 0 (offset: 1))
""
 ^}
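
The error above reports that an empty statement (`""`) reached the database. One possible cause, which this output alone does not confirm, is that the validation step returned an empty string for a query it could not reconcile with the schema. A guard along the following lines would skip execution in that case. It is a sketch only: `run_if_nonempty` is a hypothetical helper and `fake_query` stands in for `graph.query`.

```python
def run_if_nonempty(cypher, run_query):
    """Execute cypher only if it contains text; otherwise return an empty result."""
    if not cypher or not cypher.strip():
        return []  # nothing to run, so the answer step sees an empty result
    return run_query(cypher)

# Stand-in for graph.query that records what it was asked to run
executed = []

def fake_query(cypher):
    executed.append(cypher)
    return [{"count": 1}]

print(run_if_nonempty("", fake_query))
print(run_if_nonempty("MATCH (m:Movie) RETURN count(m) AS count", fake_query))
```

In the chain above, such a guard could wrap the execution step, for example `response=lambda x: run_if_nonempty(cypher_validation(x["query"]), graph.query)`.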