# Neo4j & LangChain Graph Retrieval Augmented Generation

### Objective:

This notebook illustrates one way to generate a graph from a given text and save it into Neo4j. It is not meant as a production approach: in production, Neo4j should be populated through Cypher queries.

In [1]:
!pip install graphdatascience retry==0.9.2 langchain neo4j openai -q 


### Env configuration

In [2]:
from neo4j import GraphDatabase
import os
import json

In [3]:
# Neo4j configuration & constraints
neo4j_url = os.getenv("NEO4J_CONNECTION_URL")
neo4j_user = os.getenv("NEO4J_USER")
neo4j_password = os.getenv("NEO4J_PASSWORD")
gds = GraphDatabase.driver(neo4j_url, auth=(neo4j_user, neo4j_password))
gds

<neo4j._sync.driver.Neo4jDriver at 0x10842e850>

### Prompts for LLM

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import HumanMessagePromptTemplate
from langchain_core.messages import SystemMessage
from string import Template

In [5]:
# Chat model used for entity extraction (OpenAI here; ChatOllama is only used later for querying)
from langchain.prompts import ChatPromptTemplate
from timeit import default_timer as timer

chat_model = ChatOpenAI(temperature=0)

def get_chat_template(system_msg):
    # Pair a fixed system message with a human message that carries the chunk prompt
    chat_template = ChatPromptTemplate.from_messages(
        [
            SystemMessage(content=system_msg),
            HumanMessagePromptTemplate.from_template("{text}"),
        ]
    )
    return chat_template

In [6]:
products_prompt_template = """
From the Project Brief below, extract the following entities and relationships in the format described.
 0. ALWAYS FINISH THE OUTPUT. Never send partial responses.
 1. First, look for these entity types in the text and generate them in a comma-separated format matching the entity type definitions.
    The id property of each entity must be alphanumeric and unique among the entities. You will refer to this property to define the relationships between entities. Do not create new entity types that aren't mentioned below. The document must be summarized and stored inside the ProductType entity under the summary property. Generate as many entities as needed for the types below:
    Entity Types:
    label:'Product',id:string,name:string //Product mentioned in the brief; id property is the name of the product, in lowercase, with no special characters, spaces or hyphens; this is the general product, whose specific types are captured by the 'ProductType' label
    label:'Gender',id:string,types:string //Gender entity; id property is the gender of the product, in camel-case. Identify as many of the genders used as possible
    label:'Color',id:string,types:string //The available colors for the product; id property is the name of the color or colors, in camel-case
    label:'Size',id:string,types:string //The available sizes for the specific product; id property is the size
    label:'ProductType',id:string,name:string,summary:string //The specific product mentioned in the brief; id property is the name of the product, in lowercase, with no special characters, spaces or hyphens; this is the specific type of a general product
    
 2. Next, generate each relationship as a triple of head, relationship and tail. To refer to the head and tail entities, use their respective id property. Relationship properties should be listed within brackets, comma-separated. Relationships must follow the relationship types below. Generate as many relationships as needed:
    Relationship types:
    Product|PROVIDES|ProductType
    ProductType|WITH|Color
    ProductType|FOR|Gender
    ProductType|IN|Size

 3. The output must look like:
{
    "entities": [{"label":"Product","id":string,"name":string}],
    "relationships": ["sweaters|PROVIDES|relaxedturtleneck"] (an array of strings)
}

Project Brief:
$ctext
"""

### Process & helper functions

In [7]:
def process(chunk_prompt, system_msg):
    chat_template=get_chat_template(system_msg)
    result = chat_model(chat_template.format_messages(text=chunk_prompt))
    return result.content

# Function to take a series of chunks and a prompt template, and return a json-object of all the entities and relationships
def extract_entities_relationships(pages, prompt_template):
    start = timer()
    system_msg = "You are a helpful IT-project and account management expert who extracts information from documents."
    print(f"Running pipeline for {len(pages)} pages")
    results = []
    for document in pages:
        page_number = document.metadata.get('page')
        print(f"Extracting entities and relationships for page number: {page_number}")
        try:
            prompt = Template(prompt_template).substitute(ctext=document.page_content)
            result = process(prompt, system_msg=system_msg)
            results.append(json.loads(result))
        except Exception:
            # Any failure (e.g. a reply that is not valid JSON) skips this chunk and keeps going
            print(f"Error processing  page number {page_number}")
    end = timer()
    print(f"Pipeline completed in {end-start} seconds")
    return results


# Function to take a json-object of entities and relationships and generate cypher queries for creating them
def generate_cypher(json_obj):
    e_statements = []
    r_statements = []

    # Maps each normalized entity id to its label so relationships can be resolved
    e_label_map = {}

    # loop through our json object
    for i, obj in enumerate(json_obj):
        print(f"Generating cypher for file {i+1} of {len(json_obj)}")
        for entity in obj["entities"]:
            label = entity["label"]
            entity_id = entity["id"].replace("-", "").replace("_", "")
            properties = {k: v for k, v in entity.items() if k not in ["label", "id"]}
            cypher = f'MERGE (n:{label} {{id: "{entity_id}"}})'
            if properties:
                props_str = ", ".join(
                    [f'n.{key} = "{val}"' for key, val in properties.items()]
                )
                cypher += f" ON CREATE SET {props_str}"
            e_statements.append(cypher)
            e_label_map[entity_id] = label
        for rs in obj["relationships"]:
            # Each relationship must be a "head|TYPE|tail" string
            if isinstance(rs, str) and rs.count("|") == 2:
                src_id, rs_type, tgt_id = rs.split("|")
                src_id = src_id.replace("-", "").replace("_", "")
                tgt_id = tgt_id.replace("-", "").replace("_", "")
                if src_id in e_label_map and tgt_id in e_label_map:
                    src_label = e_label_map[src_id]
                    tgt_label = e_label_map[tgt_id]
                    cypher = f'MERGE (a:{src_label} {{id: "{src_id}"}}) MERGE (b:{tgt_label} {{id: "{tgt_id}"}}) MERGE (a)-[:{rs_type}]->(b)'
                    r_statements.append(cypher)
                else:
                    # Skip relationships that refer to an entity id the model never declared
                    print(f"Skipping relationship with unknown entity id: {rs}")
            else:
                print("Wrong Generated data, Try Again")

    with open("cyphers.txt", "w") as outfile:
        outfile.write("\n".join(e_statements + r_statements))

    return e_statements + r_statements

# Final function to bring all the steps together
def ingestion_pipeline(pages):
    # Extract the entities and relationships from each chunk and collect them into one json_object
    entities_relationships = []
    entities_relationships.extend(extract_entities_relationships(pages, products_prompt_template))
    # Generate and execute cypher statements
    cypher_statements = generate_cypher(entities_relationships)
    for i, stmt in enumerate(cypher_statements):
        print(f"Executing cypher statement {i+1} of {len(cypher_statements)}")
        try:
            if stmt != "":
                gds.execute_query(stmt)
        except Exception as e:
            # Append, so that earlier failures are not overwritten by later ones
            with open("failed_statements.txt", "a") as f:
                f.write(f"{stmt} - Exception: {e}\n")
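`generate_cypher` builds statements by pasting values into the query text, so a property value that contains a double quote produces invalid Cypher. One possible alternative is to pass values as query parameters instead. The sketch below is only an illustration under that assumption: `entity_to_query` and `_IDENTIFIER` are hypothetical names, and the label is still interpolated (after validation) because Cypher does not accept labels as parameters.

```python
import re

# Labels cannot be parameterized in Cypher, so only plain identifiers are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def entity_to_query(entity: dict) -> tuple:
    """Build a (query, parameters) pair that MERGEs one entity dict from the LLM output."""
    label = entity["label"]
    if not _IDENTIFIER.match(label):
        raise ValueError(f"Unexpected label: {label!r}")
    # Same id normalization as generate_cypher.
    entity_id = entity["id"].replace("-", "").replace("_", "")
    props = {k: v for k, v in entity.items() if k not in ("label", "id")}
    # All values travel as parameters, so quotes inside them need no escaping.
    query = f"MERGE (n:{label} {{id: $id}}) ON CREATE SET n += $props"
    return query, {"id": entity_id, "props": props}

query, params = entity_to_query(
    {"label": "ProductType", "id": "relaxed-turtleneck", "name": 'The "Relaxed" Turtleneck'}
)
print(query)
print(params)
```

With the driver created earlier, such a pair could be executed as `gds.execute_query(query, parameters_=params)`.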

In [8]:
import ipywidgets as widgets
from IPython.display import display
import tempfile
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pydantic import BaseModel

def process_pdf(file_info):
    if not file_info:
        print("No file uploaded")
        return

    # Extract the first item in the tuple which is the file information dictionary
    file_info_dict = file_info[0]

    # Extract the content of the uploaded file
    uploaded_file_content = file_info_dict['content']

    pdf_path = ""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as temp_file:
        temp_file.write(uploaded_file_content)
        pdf_path = temp_file.name

    if pdf_path == "":
        print("File upload error")
        return

    loader = PyPDFLoader(pdf_path)
    documents=loader.load()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1500,
        chunk_overlap=150,
        length_function=len,
        is_separator_regex=False,
    )
    pages = splitter.split_documents(documents)
    return pages


upload_button = widgets.FileUpload(
    accept='.pdf',  # Accept only .pdf files
    multiple=False  # Allow only one file to be uploaded
)

process_button = widgets.Button(description="Process PDF")


display(upload_button)

FileUpload(value=(), accept='.pdf', description='Upload')

In [9]:
pages=process_pdf(upload_button.value)

In [10]:
ingestion_pipeline(pages)

Running pipeline for 12 pages
Extracting entities and relationships for page number: 0
Extracting entities and relationships for page number: 0
Extracting entities and relationships for page number: 1
Extracting entities and relationships for page number: 1
Extracting entities and relationships for page number: 2
Extracting entities and relationships for page number: 2
Error processing  page number 2
Extracting entities and relationships for page number: 3
Extracting entities and relationships for page number: 3
Error processing  page number 3
Extracting entities and relationships for page number: 4
Extracting entities and relationships for page number: 4
Error processing  page number 4
Extracting entities and relationships for page number: 5
Extracting entities and relationships for page number: 5
Pipeline completed in 80.17167262500152 seconds
[{'entities': [{'label': 'Product', 'id': 'sweaters', 'name': 'Sweaters'}, {'label': 'ProductType', 'id': 'relaxedturtleneck', 'name': 'Relaxe
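`extract_entities_relationships` catches every exception, so the run above does not show why the chunks from pages 2 to 4 failed. One plausible cause (an assumption, not something the output confirms) is a reply that wraps the JSON in extra text, which makes `json.loads` fail. A tolerant parser along these lines could recover such replies; `parse_llm_json` and the sample replies are hypothetical:

```python
import json

def parse_llm_json(raw: str):
    """Return the parsed JSON from an LLM reply, or None if it cannot be recovered."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, dropping any text around the JSON object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = 'Sure! Here is the result: {"entities": [], "relationships": ["sweaters|PROVIDES|relaxedturtleneck"]} Let me know if you need more.'
print(parse_llm_json(reply))
print(parse_llm_json("truncated {"))
```

A reply that is cut off mid-object still returns `None`, so the calling code would need to skip or retry that chunk.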

### Play a bit with your graphs 🚀🚀

In [14]:
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph
from langchain.chat_models import ChatOllama

graph = Neo4jGraph(url=neo4j_url, username=neo4j_user,password=neo4j_password)
chain = GraphCypherQAChain.from_llm(
    ChatOllama(model="mistral-openorca:latest"), graph=graph, verbose=True,
)

In [16]:
chain.run("""
Which sweaters are provided? 
""")


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (n:Product) WHERE n.name CONTAINS "Sweater" AND NOT (n)-[:WITH]->() AND NOT (n)-[:FOR]->() AND NOT (n)-[:IN]->() RETURN n.name
Full Context:
[{'n.name': 'Sweaters'}, {'n.name': 'Relaxed Turtleneck Sweater'}, {'n.name': 'Drop Tail Down Sweater Jacket'}]

> Finished chain.


' There are three types of sweaters provided: Sweaters, Relaxed Turtleneck Sweater, and Drop Tail Down Sweater Jacket.'

In [21]:
chain.run("""
Do you have Short-Sleeve T-Shirt with white?
""")


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Product {name:'Short-Sleeve T-Shirt'}),
OPTIONAL MATCH (c:Color {types: 'white'})
WHERE (p)-[:WITH]->(c)
RETURN p,c;


ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'OPTIONAL': expected "(", "ALL", "ANY" or "SHORTEST" (line 2, column 1 (offset: 50))
"OPTIONAL MATCH (c:Color {types: 'white'})"
 ^}
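The generated statement is rejected because the first `MATCH` pattern ends with a comma, so the parser expects another pattern rather than `OPTIONAL MATCH`. A hand-written query for the same question might look like the sketch below. It rests on assumptions about the stored data: that the T-shirt is a `ProductType` node (as the extraction prompt defines it) and that the color name is held in the `id` property of `Color`. `color_availability_query` is a hypothetical helper.

```python
def color_availability_query(product_name: str, color: str):
    """Return a (query, parameters) pair asking whether a product type comes in a color."""
    query = (
        "MATCH (p:ProductType {name: $name})\n"
        # No comma after the first pattern; the relationship is part of the optional pattern.
        "OPTIONAL MATCH (p)-[:WITH]->(c:Color)\n"
        "WHERE toLower(c.id) = toLower($color)\n"
        "RETURN p.name AS product, c.id AS color"
    )
    return query, {"name": product_name, "color": color}

query, params = color_availability_query("Short-Sleeve T-Shirt", "white")
print(query)
```

The pair could be run against the graph object from above with `graph.query(query, params)`. If the product exists but has no matching color, the row comes back with `color` set to null.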