Topics: Data Modelling and Search Models
* LangSmith (for inspection and debugging)
* Semantic model extraction (continued)
* Graph QA using GraphCypherQAChain
* Graph QA using Vector Indices

# Chapter 1: LangSmith

[Documentation](https://docs.smith.langchain.com/)

[Website](https://www.langchain.com/langsmith)

In [8]:
!pip install -qU langsmith




In [None]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [None]:
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "YourKey"
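
Tracing only works when all three variables above are set to real values. The following is a minimal sketch of a sanity check that can be run before invoking a chain. The helper `missing_tracing_vars` is hypothetical (it is not part of LangSmith); it only inspects the environment and treats the placeholder `"YourKey"` as missing.

```python
import os

# Variable names are taken from the cell above; the helper itself is an assumption.
REQUIRED_TRACING_VARS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
)


def missing_tracing_vars(env=None):
    """Return the tracing variables that are unset, empty, or still the placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_TRACING_VARS:
        value = env.get(name, "")
        if not value or value == "YourKey":
            missing.append(name)
    return missing


# Example with an explicit dictionary instead of the real environment
example_env = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
    "LANGCHAIN_API_KEY": "YourKey",
}
print(missing_tracing_vars(example_env))
```

Calling `missing_tracing_vars()` without arguments checks the actual environment of the notebook kernel.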

In [None]:
!pip install -qU langchain-openai

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Please respond to the user's request only based on the given context."),
    ("user", "Question: {question}\nContext: {context}")
])
model = ChatOpenAI(model="gpt-3.5-turbo")
output_parser = StrOutputParser() # https://www.restack.io/docs/langchain-knowledge-langchain-stroutputparser-guide

chain = prompt | model | output_parser

question = "What are the place names and geopolitical entities mentioned in the context?"
context = "Germany is a country in Europe and its capital is Berlin."
chain.invoke({"question": question, "context": context})

# Chapter 2: Semantic Model Extraction

In [None]:
!pip install -q langchain-community langchain-openai langchain_experimental neo4j

In [None]:
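import getpass

from langchain_community.graphs import Neo4jGraph

url = "neo4j+s://f02e0524.databases.neo4j.io"
username = "neo4j"
# Prompt for the password at run time instead of storing it in the notebook
password = getpass.getpass("Enter your Neo4j password: ")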

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)

In [None]:
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [None]:
# From wikipedia: https://en.wikipedia.org/wiki/M%C3%BCnster
example_text = """
Münster is an independent city (Kreisfreie Stadt)
in North Rhine-Westphalia, Germany. It is in the northern part of the state and is considered to
 be the cultural centre of the Westphalia region. It is also a state district capital. Münster was the
  location of the Anabaptist rebellion during the Protestant Reformation and the site of the signing of the
   Treaty of Westphalia ending the Thirty Years' War in 1648. Today, it is known as the bicycle capital of Germany.
Münster gained the status of a Großstadt (major city) with more than 100,000 inhabitants in 1915.[4]
 As of 2014, there are 300,000[5] people living in the city, with about 61,500 students,[6]
 only some of whom are recorded in the official population statistics as having their primary residence in Münster.
 Münster is a part of the international Euregio region with more than 1,000,000 inhabitants (Enschede, Hengelo, Gronau, Osnabrück).
 Companies offering jobs in Münster include the Institute for Geoinformatics at the University of Münster,
 the Münster University of Applied Sciences, Reedu GmbH, con terra, the Deutsche Bank, IKEA, LIDL, REWE, ALDI and BASF Coatings.
"""

In [None]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo") # https://platform.openai.com/docs/models

llm_transformer = LLMGraphTransformer(llm=llm) # documentation, see https://python.langchain.com/docs/how_to/graph_constructing/

In [None]:
from langchain_core.documents import Document

documents = [Document(page_content=example_text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")

In [None]:
graph.add_graph_documents(graph_documents)
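
To make the extracted structure more tangible, here is a minimal sketch of what a graph document conceptually contains: nodes with an `id` and a `type`, and relationships that connect a source node to a target node. The dataclasses below are simplified stand-ins, not the LangChain classes, and the labels `City`, `Country` and `LOCATED_IN` are assumptions; the labels the LLM actually produces may differ. The `to_cypher` helper is purely illustrative and does not claim to reproduce how `add_graph_documents` writes to the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def to_cypher(rel):
    """Render one relationship as MERGE statements (illustrative; values are not escaped)."""
    return (
        f'MERGE (a:{rel.source.type} {{id: "{rel.source.id}"}}) '
        f'MERGE (b:{rel.target.type} {{id: "{rel.target.id}"}}) '
        f"MERGE (a)-[:{rel.type}]->(b)"
    )


muenster = Node("Münster", "City")
germany = Node("Germany", "Country")
located_in = Relationship(muenster, germany, "LOCATED_IN")
print(to_cypher(located_in))
```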

# Chapter 3: Graph QA using GraphCypherQAChain

In [None]:
!pip install  --quiet langchain langchain-openai langchain-community neo4j




In [None]:
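import getpass

from langchain_community.graphs import Neo4jGraph

url = "neo4j+ssc://f02e0524.databases.neo4j.io:7687"
username = "neo4j"
# Prompt for the password at run time instead of storing it in the notebook
password = getpass.getpass("Enter your Neo4j password: ")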

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)

In [None]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [6]:
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_openai import ChatOpenAI
import os

chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),  # alternative: gpt-3.5-turbo
    qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    verbose=True,
    allow_dangerous_requests=True
)

In [7]:
from langchain.prompts.prompt import PromptTemplate


CYPHER_GENERATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Cypher examples:
# Which cities lie left of Münster?
MATCH p=(C:City WHERE C.Name = "Münster") -[T:touches WHERE T.`Rel_Position` IN ["western","southwestern","northwestern"]]->(:City) RETURN p

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}"""
CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)
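
The template above contains a single Cypher example. When more examples are needed, it can help to assemble that section programmatically. The sketch below uses plain `str.format` as a stand-in for the prompt template; the helpers `escape_braces` and `format_examples` and the second example query are hypothetical. One detail worth noting: literal curly braces in an example query (such as a property map) have to be doubled, otherwise a format-style template interprets them as input variables.

```python
def escape_braces(text):
    """Double curly braces so that a format-style template treats them literally."""
    return text.replace("{", "{{").replace("}", "}}")


def format_examples(examples):
    """Render (question, cypher) pairs in the '# question' / query layout used above."""
    blocks = [f"# {question}\n{escape_braces(cypher)}" for question, cypher in examples]
    return "\n\n".join(blocks)


EXAMPLES = [
    (
        "Which cities lie left of Münster?",
        'MATCH p=(C:City WHERE C.Name = "Münster") '
        '-[T:touches WHERE T.`Rel_Position` IN ["western","southwestern","northwestern"]]->(:City) '
        "RETURN p",
    ),
    # Hypothetical second example; it uses a property map to show why escaping matters.
    (
        "Which node is Münster?",
        'MATCH (c:City {Name: "Münster"}) RETURN c',
    ),
]

template = (
    "Cypher examples:\n"
    + format_examples(EXAMPLES)
    + "\n\nThe question is:\n{question}"
)
print(template.format(question="Which cities lie right of Siegburg?"))
```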

In [8]:
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),  # alternative: gpt-3.5-turbo
    qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    verbose=True,
    allow_dangerous_requests=True,
    cypher_prompt=CYPHER_GENERATION_PROMPT
)

In [10]:
question_1 = "What is the population of Hessen?"
question_2 = "What is the geometry of Rheinland-Pfalz?"
question_3 = "What are the areas of Hessen and Niedersachsen? Is the area of Hessen bigger than the area of Niedersachsen?"
question_4 = "Is Düsseldorf the state capital of Nordrhein-Westfalen?"
question_5 = "Which cities lie in the district of Steinfurt?"
question_6 = "Which cities lie southern, southeastern and southwestern of Münster? The relative position is saved as a property in the touches relation. Also give me every touches relation between those cities."
question_7 = "Which cities lie within Steinfurt? Only return the Names."
question_8 = "Which cities lie right of Siegburg?"
question_9 = "Which is the nearest larger city with a population of more than 500000 in the area of Bocholt?"
question_10 = "Does Köln or Düsseldorf have a bigger population?"
question_11 = "In what cardinal direction is havixbeck from Münster?"
question_12 = "Was liegt rechts Bocholt?"  # German: "What lies to the right of Bocholt?"

chain.invoke(question_11)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH p=(C:City WHERE C.Name = "Münster") -[T:touches]->(H:City WHERE H.Name = "Havixbeck") RETURN T.`Rel_Position`
Full Context:
[{'T.`Rel_Position`': 'western'}]

> Finished chain.


{'query': 'In what cardinal direction is havixbeck from Münster?',
 'result': 'Havixbeck is located in the western cardinal direction from Münster.'}
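
Questions phrased with "left of" or "right of" have to be translated into values of the `Rel_Position` property. The sketch below shows one way to do this deterministically, without an LLM. Only the "left" group comes from the prompt example in this chapter; the other groups are assumed by symmetry, and the helper `neighbours_query` is hypothetical. The query uses parameters (`$name`, `$positions`) rather than string concatenation.

```python
# "left" is taken from the prompt example; the other groups are assumptions by symmetry.
DIRECTION_GROUPS = {
    "left": ["western", "southwestern", "northwestern"],
    "right": ["eastern", "southeastern", "northeastern"],
    "above": ["northern", "northeastern", "northwestern"],
    "below": ["southern", "southeastern", "southwestern"],
}


def neighbours_query(city, direction):
    """Return a parameterised Cypher query and its parameters for one direction."""
    if direction not in DIRECTION_GROUPS:
        raise ValueError(f"Unknown direction: {direction!r}")
    query = (
        "MATCH (c:City {Name: $name})-[t:touches]->(n:City) "
        "WHERE t.`Rel_Position` IN $positions "
        "RETURN n.Name AS name, t.`Rel_Position` AS position"
    )
    params = {"name": city, "positions": DIRECTION_GROUPS[direction]}
    return query, params


query, params = neighbours_query("Siegburg", "right")
print(query)
print(params)
```

With a connected `Neo4jGraph`, the pair could presumably be executed as `graph.query(query, params=params)`.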

# Chapter 4: Graph QA using Vector Indices

## Indexing

In [None]:
!pip install langchain openai wikipedia tiktoken neo4j langchain_openai langchain_community --quiet

In [None]:
# https://neo4j.com/developer-blog/knowledge-graph-rag-application/
# https://github.com/tomasonjo/blogs/blob/master/llm/devops_rag.ipynb
import getpass

from langchain_community.graphs import Neo4jGraph

url = "neo4j+s://9df9a03f.databases.neo4j.io:7687"
username = "neo4j"
# Prompt for the password at run time instead of storing it in the notebook
password = getpass.getpass("Enter your Neo4j password: ")

In [None]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

In [None]:
# create the index

import os
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name='D_ID',
    node_label="District",
    text_node_properties=["Name", "Geometry"],  # node properties that are combined into the embedded text
    embedding_node_property='embedding',
)
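
The `text_node_properties` argument decides which properties of a `District` node end up in the text that gets embedded. Roughly speaking, the selected properties are concatenated as key/value lines. The sketch below imitates that idea in plain Python; the exact separator and layout used by the library may differ, and both the helper `node_text` and the sample property values are assumptions.

```python
def node_text(properties, text_node_properties):
    """Concatenate the selected properties as one 'key: value' line each."""
    lines = []
    for key in text_node_properties:
        value = properties.get(key)
        # Missing properties are rendered as an empty value
        lines.append(f"{key}: {'' if value is None else value}")
    return "\n".join(lines)


# Toy node; the property values are made up for illustration
district = {"Name": "Steinfurt", "Geometry": "POINT (7.5 52.1)", "Population": 1}
print(node_text(district, ["Name", "Geometry"]))
```

Properties that are not listed (here `Population`) do not influence the embedding, which is worth keeping in mind when a retrieval question targets them.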

In [None]:
# see the index just created
vector_index.query(
    """SHOW INDEXES
       YIELD name, type, labelsOrTypes, properties, options
       WHERE type = 'VECTOR'
    """
)

## Retrieval

In [None]:
question1 = "How many states are in the database?"
question2 = "How many geometries are in the database?"
question3 = "What is the population of Hessen?"
question4 = "What is the area of Hessen?"
question5 = "What is the capital of Hessen?"
question6 = "What is the geometry of Hessen?"
question7 = "What are the geometries of Hessen and Niedersachsen?"
question8 = "What is the url of the geometry of Hessen?"

In [None]:
response = vector_index.similarity_search(question3)
response

In [None]:
response_with_score = vector_index.similarity_search_with_score(question3)
response_with_score
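
Conceptually, a similarity search embeds the question, compares that vector with the stored node embeddings, and returns the best matches together with a score. The following is a minimal in-memory sketch using cosine similarity. The three-dimensional vectors are toy values rather than real embeddings, the function names are hypothetical, and the scores reported by the database may be scaled differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_similarity_search(query_vec, indexed, k=4):
    """Return the k best (name, score) pairs, highest score first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": node name -> made-up embedding
toy_index = {
    "Hessen": [0.9, 0.1, 0.0],
    "Niedersachsen": [0.2, 0.8, 0.1],
    "Steinfurt": [0.0, 0.2, 0.9],
}

results = toy_similarity_search([1.0, 0.0, 0.0], toy_index, k=2)
for name, score in results:
    print(name, round(score, 3))
```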

## Generation: Example 1

In [None]:
# using documents as context
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

In [None]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [None]:
chain = prompt | llm

In [None]:
docs = response

chain.invoke({"context": docs, "question": question3})
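
In the cell above, the list of documents is passed to the prompt as it is, so the prompt contains the string representation of the whole list, metadata included. A common alternative is to join only the page contents into one context string. The sketch below shows this with a small stand-in class `Doc` (not the LangChain `Document`); the helper name `format_docs` and the sample texts are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Simplified stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the page contents, separated by blank lines, and drop the metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


toy_docs = [
    Doc("Name: Hessen", {"score": 0.9}),
    Doc("Name: Niedersachsen", {"score": 0.8}),
]
print(format_docs(toy_docs))
```

With real documents, the call would then look like `chain.invoke({"context": format_docs(docs), "question": question3})`.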

## Generation: Example 2

In [None]:
# Using a retriever as context
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

retriever = vector_index.as_retriever() # search_kwargs={"k": 1}

graph_chain = ({"context": retriever, "question": RunnablePassthrough()}
                | prompt
                | llm
                | StrOutputParser()
                )

graph_chain.invoke(question3)
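
The chain above is a sequence of steps in which each step receives the output of the previous one. The dictionary at the start fans the question out: the retriever turns it into context, while `RunnablePassthrough()` forwards it unchanged. The sketch below reproduces this data flow with plain functions. The retriever and the LLM are fakes that return placeholder strings, so only the plumbing is shown; all function names are hypothetical.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def retrieve(question):
    # Fake retriever: returns a placeholder instead of querying a vector index
    return [f"(toy document retrieved for: {question})"]


def build_inputs(question):
    # Mirrors {"context": retriever, "question": RunnablePassthrough()}
    return {"context": retrieve(question), "question": question}


def render_prompt(inputs):
    return TEMPLATE.format(**inputs)


def fake_llm(prompt_text):
    # Fake model: returns a message-like dictionary
    return {"content": f"(toy answer to a prompt of {len(prompt_text)} characters)"}


def parse_output(message):
    # Mirrors the output parser: keep only the text of the message
    return message["content"]


def run_chain(question):
    value = question
    for step in (build_inputs, render_prompt, fake_llm, parse_output):
        value = step(value)
    return value


print(run_chain("What is the population of Hessen?"))
```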

## Generation: Example 3

In [None]:
# Using a custom retriever as a context and post-processing of the answer
# https://python.langchain.com/docs/how_to/custom_retriever/
from typing import List
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class CustomRetriever(BaseRetriever):
    """Custom retriever that also returns the similarity score of each document.

    The scores are stored in the document metadata so that a custom ranking
    function can later combine them with the spatial similarity between the
    query and the document.
    """

    vector_index: Neo4jVector

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        """Synchronous implementation of the retriever."""
        docs = []
        # Iterate over (document, score) pairs; this also works for an empty result
        for doc, score in self.vector_index.similarity_search_with_score(query):
            print("***", doc)  # debug output: show each retrieved document
            doc.metadata["score"] = score
            docs.append(doc)
        return docs


def update_scores(docs):
    """Placeholder re-ranking step: scale each score by 10 and append it to the text."""
    for doc in docs:
        new_score = doc.metadata["score"] * 10
        doc.page_content = doc.page_content + "\nScore: " + str(new_score)
        doc.metadata["score"] = new_score
    return docs

In [None]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

retriever_r = CustomRetriever(vector_index=vector_index)

graph_chain = ({"context": retriever_r | update_scores, "question": RunnablePassthrough()}
                | prompt
                | llm
                | StrOutputParser()
                )

graph_chain.invoke(question6)
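
The `update_scores` function above only multiplies the score by 10; it marks the place where a custom ranking could be plugged in. As a sketch of what such a ranking might look like, the code below combines the vector score with a spatial similarity derived from a distance. Everything here is an assumption: the exponential decay, the 50 km scale, the weight `alpha`, and the toy candidates.

```python
import math


def spatial_similarity(distance_km, scale_km=50.0):
    """Map a distance to (0, 1]: 1.0 at distance 0, decaying exponentially."""
    return math.exp(-distance_km / scale_km)


def combined_score(vector_score, distance_km, alpha=0.7):
    """Weighted sum of the vector score and the spatial similarity."""
    return alpha * vector_score + (1 - alpha) * spatial_similarity(distance_km)


def rerank(candidates, alpha=0.7):
    """candidates: list of (name, vector_score, distance_km); best match first."""
    scored = [(name, combined_score(v, d, alpha)) for name, v, d in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy candidates: "A" matches the text better, "B" is spatially closer
toy_candidates = [("A", 0.80, 100.0), ("B", 0.75, 0.0)]
ranking = rerank(toy_candidates)
print([name for name, _ in ranking])
```

With `alpha=1.0` the spatial part is ignored and the order falls back to the plain vector score.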

# Cypher Queries

We will not dive deep into Cypher syntax in this course. The following queries should be enough to interact with the Neo4j database. If you need more, see the [documentation](https://neo4j.com/docs/cypher-cheat-sheet/5/aura-dbe/auradb-free).

In [None]:
// delete every node and edge
MATCH (n)
DETACH DELETE n

// create nodes and edges:
// follow the structure shown at https://github.com/aurioldegbelo/sis2024/blob/main/vector_data/data.cypher

// visualize the model of the graph database
CALL apoc.meta.graph()

# Project work

* Exercise 01: clarify what your search target is

* Exercise 02: elaborate on your data model (what are the entities and relationships?)

* Exercise 03: create a Neo4j account and a database instance

* Exercise 04: write an example Cypher query (CREATE) for your data (just a few instances) and upload it to the database to see if it works

* Exercise 05: write a script that generates a CREATE query by converting your original format (csv, tsv, json, ...) into a Cypher template
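
As a possible starting point for the last exercise, the sketch below converts CSV rows into one CREATE statement per row. It is a minimal illustration under several assumptions: the column names become property keys (so they must be valid Cypher identifiers), all values stay strings, and `json.dumps` is used to quote them. The function names and the sample data are hypothetical.

```python
import csv
import io
import json


def row_to_create(label, row):
    """Build a CREATE statement for one row; values are quoted as JSON strings."""
    props = ", ".join(
        f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in row.items()
    )
    return f"CREATE (:{label} {{{props}}})"


def csv_to_cypher(csv_text, label):
    """Convert CSV text with a header row into a list of CREATE statements."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_create(label, row) for row in reader]


sample_csv = "Name,State\nMünster,Nordrhein-Westfalen\n"
statements = csv_to_cypher(sample_csv, "City")
print(";\n".join(statements))
```

Relationships are not covered here; they would need a second pass that matches the nodes already created.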
