# 📡 Introduction to Retrieval Augmented Generation using Knowledge Graphs


The **Retrieval Augmented Generation** (RAG) approach involves dynamically retrieving relevant information from a knowledge base to assist in the generation of responses or predictions.

In our case, RAG lets us combine the structured knowledge of knowledge graphs (KGs) with the natural language capabilities of Large Language Models (LLMs).

By default, an LLM is two things: 1. a natural language interface to machines; 2. a knowledge base (its training data). But this knowledge base is opaque and inconsistent: the knowledge it produces is probabilistic and depends heavily on how the input is phrased, so it can output contradictory statements.

With RAG we can reduce the uncertainty of the LLM's black-box knowledge base by plugging in an external, trusted knowledge base, and use the LLM as a natural language interface to our own data.

But it is still a **really new field**. The 2 main approaches currently used are:

1. Vectorize the KG to provide better context to the LLM. This is the easiest approach (classic RAG):
    1. Iterate over the KG, generate an embedding for each concept (various strategies are possible here: text embeddings, GNNs...) and store the embeddings in a vector database.
    2. When a user asks a question, ask the LLM to reformulate it so that it is straight to the point, which makes the search for relevant concepts in the vector database more effective.
    3. Generate an embedding for the reformulated question, and search for the most similar concepts in the vector database.
    4. Provide the information about those concepts to the LLM as context, and ask it to answer the question using that context.

    The quality of the results depends a lot on the strategy used to generate the embeddings (do you embed only the concept descriptions? Do you include the links between concepts? How?)

2. Use the LLM to generate queries for your KG
    1. Get the "schema" of the queried KG (e.g. OWL ontology, SHACL shapes, VoID description)
    2. Provide the KG schema as input to the LLM
    3. Ask the LLM to generate a query to answer your question
    4. Execute the query
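
To make the first approach more concrete, here is a minimal sketch of the embed, search and build-context steps using only the standard library. Everything in it is an assumption made for illustration: the three concepts and their `example.org` URIs are invented, the "embedding" is a bag-of-words count vector instead of a neural model, and a plain dict stands in for the vector database. The question reformulation step is left out because it requires an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical toy concepts standing in for KG entries (not taken from a real ontology)
CONCEPTS = {
    "http://example.org/Protein": "protein: an organic polymer composed of amino acids",
    "http://example.org/Gene": "gene: a region of DNA that encodes a functional product",
    "http://example.org/Drug": "drug: a chemical substance used to treat a disease",
}
# Very common words are ignored so that they do not dominate the similarity score
STOP_WORDS = {"a", "an", "the", "is", "of", "what", "that", "to"}


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system would use a neural model)."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Embed every concept once and store the vectors (the dict plays the vector database)
index = {uri: embed(text) for uri, text in CONCEPTS.items()}


def retrieve(question: str, k: int = 2) -> list[str]:
    """Embed the question and return the URIs of the k most similar concepts."""
    q = embed(question)
    ranked = sorted(index, key=lambda uri: cosine(q, index[uri]), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Put the retrieved concepts in the prompt as context for the LLM."""
    context = "\n".join(f"- {CONCEPTS[uri]} ({uri})" for uri in retrieve(question))
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What is a protein?"))
```

The rest of this notebook follows the same shape, with FastEmbed producing the vectors, Qdrant storing them and LangChain wiring the steps together.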


## 🎯 Aim

This notebook presents a demo of **Retrieval Augmented Generation** (RAG) to faithfully resolve and use concepts from an OWL ontology, with conversation memory, using open source components:
* [LangChain](https://python.langchain.com) (cf. docs: [RAG with memory](https://python.langchain.com/docs/expression_language/cookbook/retrieval), [streaming RAG](https://python.langchain.com/docs/use_cases/question_answering/streaming))
* [FastEmbed embeddings](https://github.com/qdrant/fastembed)
* [Qdrant vectorstore](https://github.com/qdrant/qdrant)

Thanks to LangChain, you can easily swap the components used in this workflow for whatever you prefer:
* LLM (e.g. switch to Mistral)
* Vectorstore (e.g. switch to [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss), [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma), Milvus)
* Embedding model (e.g. switch to [HuggingFace sentence transformer](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers), OpenAI ADA)

## 🔗 Relevant resources

Programming frameworks:

* [LangChain](https://github.com/langchain-ai/langchain): build linear workflows for LLMs
* [LangGraph](https://github.com/langchain-ai/langgraph): build cyclic workflows for LLMs, with control loops
* [LlamaIndex](): framework to help build RAG systems (there is a lot of noise around it, but its top examples use global static variables for configuration, which is concerning for the rest of the code: it is unclear how you would deploy 2 indexes with different configurations)

Most existing research on RAG+KG uses property graphs such as Neo4j or NebulaGraph (the tooling for working with RDF graphs is not as good):
* [KG-RAG](https://github.com/BaranziniLab/KG_RAG): integrates ad-hoc KG data into LLM answers.
    1. Ask the LLM to extract disease entities from the question
    2. Query the KG to retrieve the context related to the extracted entities
    3. Feed the retrieved context to the LLM
* [Graph-LLM](https://github.com/CurryTang/Graph-LLM): paper and code for "Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs". Basically LLM + GNN using PyTorch Geometric (PyG). They use the LLM to enrich the information used to build the GNN.


## Questions

Your LLM should be able to answer these 2 consecutive questions (the second one is a follow-up to the first) using the [SemanticScience OWL ontology](https://github.com/MaastrichtU-IDS/semanticscience):

- "What is a protein?"
- "What is the URI for this concept?"

## 📦️ Install and import dependencies

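You will need an OpenAI API key: the cell below prompts for it if the `OPENAI_API_KEY` environment variable is not already set.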


In [1]:
import sys
!{sys.executable} -m pip install -q langchain langchain-community langchain-openai fastembed qdrant-client oxrdflib

from operator import itemgetter
import getpass
import os

from langchain.globals import set_debug
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import format_document
from langchain_openai import OpenAI
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import get_buffer_string
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Provide your OpenAI API Key")




## ⚗️ Define how to extract information to embed from the ontology

We will extend the LangChain document loader class to create a loader for OWL ontologies.

**TODO**: you will need to write the SPARQL query to extract the labels and descriptions of the concepts (classes and properties) from the ontology.

The current loader code expects your query to return the variables below:
- `?uri` of the concept
- `?label` containing the human readable label or description of this concept
- `?predicate` defining the retrieved label (e.g. rdfs:label, rdfs:comment, dcterms:description)
- `?type` of the concept (e.g. owl:Class or property)

Feel free to add more variables! You only need to add 1 line to the code below to store each new variable as concept metadata in the vectorstore.
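
If you are unsure where to start, here is one possible shape for the query, given as a starting point rather than the reference solution. The choice of types and predicates to keep is an assumption: check which annotation properties your ontology actually uses and adjust the two `FILTER ... IN` lists. The name `example_query` is hypothetical, and the small check at the end only verifies that the variable names read by the loader appear in the `SELECT` clause.

```python
# One possible query (an assumption, not the reference solution for this notebook)
example_query = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?uri ?predicate ?label ?type
WHERE {
    ?uri a ?type ;
        ?predicate ?label .
    FILTER (?type IN (owl:Class, owl:ObjectProperty, owl:DatatypeProperty, owl:AnnotationProperty))
    FILTER (?predicate IN (rdfs:label, rdfs:comment, dcterms:description))
    FILTER (isLiteral(?label))
}"""

# The loader reads these names as attributes of each result row
expected_variables = ["uri", "predicate", "label", "type"]
select_line = next(line for line in example_query.splitlines() if line.startswith("SELECT"))
missing = [v for v in expected_variables if f"?{v}" not in select_line]
print("Missing variables:", missing)
```

With this shape, a concept that has both a label and a description yields 2 rows, and therefore 2 documents in the vectorstore that share the same URI.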

In [2]:
from typing import Any, List, Optional

from langchain_community.document_loaders.base import BaseLoader
from langchain_core.documents import Document
from rdflib import Graph

# TODO: the SPARQL query to retrieve labels and descriptions of classes and properties
extract_classes_query = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?uri ?predicate ?label ?type
WHERE {
    TODO
}"""

class OntologyLoader(BaseLoader):
    """Load an OWL ontology and extract classes and properties as documents."""

    def __init__(self, ontology_url: str, format: Optional[str] = None):
        self.ontology_url = ontology_url
        self.format = format
        self.graph = Graph(store="Oxigraph")
        self.graph.parse(self.ontology_url, format=self.format)

    def load(self) -> List[Document]:
        """Load and return documents (classes and properties) from the OWL ontology."""
        docs: List[Document] = []
        for cls in self.graph.query(extract_classes_query):
            docs.append(self._create_document(cls))
        return docs

    def _create_document(self, result_row: Any) -> Document:
        """Create a Document object from a query result row."""
        label = str(result_row.label)
        return Document(
            page_content=label,
            # NOTE: you can include more metadata retrieved by the SPARQL query here
            metadata={
                "label": label,
                "uri": str(result_row.uri),
                "type": str(result_row.type),
                "predicate": str(result_row.predicate),
                "ontology": self.ontology_url,
            },
        )


## 🌀 Initialize local vectorstore and LLM

Use the ontology loader we just created, with the embedding model we chose.

In [None]:
flag_embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5", max_length=512)
loader = OntologyLoader("https://semanticscience.org/ontology/sio.owl", format="xml")
docs = loader.load()

# Split the documents into chunks if necessary
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Qdrant.from_documents(
    splits,
    flag_embeddings,
    collection_name="ontologies",
    location=":memory:",
    # path="./data/qdrant",
)
# vectorstore = FAISS.from_documents(documents=docs, embedding=flag_embeddings)
# K is the number of source documents retrieved
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

llm = OpenAI(temperature=0)
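
To see what `chunk_size=1000` and `chunk_overlap=200` mean in the cell above, here is a deliberately simplified sketch of splitting with overlap. It is not how `RecursiveCharacterTextSplitter` is implemented (that splitter tries to cut at separators such as paragraph breaks first); the hypothetical `split_with_overlap` function just slides a fixed-size window over the text. Since most ontology labels and descriptions are much shorter than 1000 characters, they typically pass through as a single chunk.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if len(text) <= chunk_size:
        return [text]  # short texts (most labels) are kept whole
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text) - chunk_overlap, step)]


long_text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(long_text)
print([len(chunk) for chunk in chunks])
print(split_with_overlap("protein"))
```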

## 🧠 Initialize prompts and memory

**TODO**: Fill in the prompts

In [4]:

# Create the memory object that is used to add messages
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)
# Add a "memory" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)

# Prompt to reformulate the question using the chat history
reform_template = """TODO

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
REFORM_QUESTION_PROMPT = PromptTemplate.from_template(reform_template)

# Prompt to ask to answer the reformulated question
answer_template = """TODO
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(answer_template)

# Format how the ontology concepts are passed as context to the LLM
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(
    template="Concept label: {page_content} | URI: {uri} | Type: {type} | Predicate: {predicate} | Ontology: {ontology}"
)
def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    # print("Formatted docs:", doc_strings)
    return document_separator.join(doc_strings)
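
To get a feel for what the LLM will receive, the sketch below fills in the two TODO prompts with possible wording and renders the final prompt with plain `str.format`, without LangChain. The prompt texts are assumptions to adapt, the retrieved concepts are invented, and their `example.org` URIs are placeholders rather than real SIO identifiers. Each dict is flattened (page content plus metadata keys) because that is what the document prompt template consumes.

```python
# Hypothetical retrieved concepts: page_content plus the metadata keys set by the loader
retrieved = [
    {
        "page_content": "protein",
        "uri": "http://example.org/onto#Protein",
        "type": "http://www.w3.org/2002/07/owl#Class",
        "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
        "ontology": "http://example.org/onto.owl",
    },
    {
        "page_content": "An organic polymer composed of amino acids.",
        "uri": "http://example.org/onto#Protein",
        "type": "http://www.w3.org/2002/07/owl#Class",
        "predicate": "http://purl.org/dc/terms/description",
        "ontology": "http://example.org/onto.owl",
    },
]

document_template = (
    "Concept label: {page_content} | URI: {uri} | Type: {type} "
    "| Predicate: {predicate} | Ontology: {ontology}"
)

# Possible wording for the two TODO prompts (assumptions, adapt them to your needs)
example_reform_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

example_answer_template = """Answer the question using only the ontology concepts below, and always give the URI of the concepts you mention:
{context}

Question: {question}
"""

# Same joining logic as _combine_documents: one formatted line per concept, blank line between
context = "\n\n".join(document_template.format(**doc) for doc in retrieved)
final_prompt = example_answer_template.format(context=context, question="What is a protein?")
reform_prompt = example_reform_template.format(
    chat_history="Human: What is a protein?\nAI: An organic polymer composed of amino acids.",
    question="What is the URI for this concept?",
)
print(final_prompt)
```

Asking explicitly for URIs in the answer prompt is one way to encourage the model to copy identifiers from the context instead of inventing them.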


## ⛓️ Define the chain

`itemgetter("key")` returns a function that picks `"key"` from the dict produced by the previous step in the chain, so each step can select the values it needs.
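
The sketch below imitates this mechanism in plain Python. When a dict appears in a LangChain chain, each of its values is called with the output of the previous step and the results form the next dict. The `run_dict_step` helper is hypothetical and only mimics that behaviour; it is not part of LangChain.

```python
from operator import itemgetter
from typing import Any, Callable

# Output of a previous step in the chain
previous_output = {"question": "What is a protein?", "chat_history": []}

get_question = itemgetter("question")
print(get_question(previous_output))  # What is a protein?


def run_dict_step(step: dict[str, Callable[[dict], Any]], previous: dict) -> dict:
    """Call every value of the step with the previous output and collect the results by key."""
    return {key: fn(previous) for key, fn in step.items()}


next_step = {
    "question": itemgetter("question"),               # pass a value through unchanged
    "n_messages": lambda x: len(x["chat_history"]),   # or compute something from it
}
result = run_dict_step(next_step, previous_output)
print(result)
```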

In [5]:
# Reformulate the question using chat history
reformulated_question = {
    "reformulated_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | REFORM_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
}
# Retrieve the documents using the reformulated question
retrieved_documents = {
    "docs": itemgetter("reformulated_question") | retriever,
    "question": lambda x: print("💭 Reformulated question:", x["reformulated_question"]) or x["reformulated_question"],
    # "question": lambda x: x["reformulated_question"],
}
# Construct the inputs for the final prompt using retrieved documents
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# Generate the answer using the retrieved documents and answer prompt
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm,
    "docs": itemgetter("docs"),
}
# Put the chain together
final_chain = loaded_memory | reformulated_question | retrieved_documents | answer

def stream_chain(final_chain, memory: ConversationBufferMemory, inputs: dict[str, str]) -> dict[str, Any]:
    """Ask question, stream the answer output, and return the answer with source documents."""
    output = {"answer": ""}
    for chunk in final_chain.stream(inputs):
        # print(chunk, flush=True)
        if "docs" in chunk:
            output["docs"] = [doc.dict() for doc in chunk["docs"]]
            print("📚 Documents retrieved:")
            for doc in output["docs"]:
                print(f"· {doc['page_content']} ({doc['metadata']['uri']})")
            # print(json.dumps(output["docs"], indent=2))
        if "answer" in chunk:
            output["answer"] += chunk["answer"]
            print(chunk["answer"], end="", flush=True)
    # Add messages to chat history
    memory.save_context(inputs, {"answer": output["answer"]})
    return output
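
The role of the memory in `stream_chain` can be illustrated with a small stand-in class. This is a simplified imitation written for this explanation, not the LangChain implementation: `SimpleMemory` and its `buffer_string` method are hypothetical names, and the `Human:` / `AI:` prefixes are an approximation of what `get_buffer_string` produces. The answer text is invented.

```python
class SimpleMemory:
    """Minimal stand-in for a conversation buffer: keeps (role, text) pairs in order."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def save_context(self, inputs: dict[str, str], outputs: dict[str, str]) -> None:
        """Record one question/answer exchange."""
        self.messages.append(("Human", inputs["question"]))
        self.messages.append(("AI", outputs["answer"]))

    def buffer_string(self) -> str:
        """Render the history as text that can be placed in the reformulation prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


toy_memory = SimpleMemory()
toy_memory.save_context(
    {"question": "What is a protein?"},
    {"answer": "A protein is a polymer of amino acids."},
)
history = toy_memory.buffer_string()
print(history)
```

Without this history, a follow-up such as "What is the URI for this concept?" gives the reformulation step nothing to resolve "this concept" against, and the retriever would search for the wrong thing.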

## 🗨️ Ask questions

In [None]:
# set_debug(True)   # Uncomment to enable detailed LangChain debugging
output = stream_chain(final_chain, memory, {
    "question": "What is a protein?"
})

In [None]:
output = stream_chain(final_chain, memory, {
    "question": "What is the URI for this concept?"
})

## 🚀 Going further


How could this workflow be improved to answer more complex questions such as:
* Give me all properties that can be used for a Protein
* Write the SPARQL query to retrieve all human proteins from UniProt
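
One direction for the second question is the query generation approach described in the introduction: give the LLM a description of the KG schema and ask it to write the query. The sketch below only covers the parts around the LLM call, under several assumptions: the schema is an invented, heavily simplified snippet with a hypothetical `ex:` prefix (it is not the UniProt schema), `candidate_query` is hand-written to stand in for an LLM output, and `unknown_terms` is a hypothetical guard that flags terms absent from the schema before the query is executed.

```python
import re

# Hypothetical, heavily simplified schema description (a real one could come from OWL, SHACL or VoID)
SCHEMA = """PREFIX ex: <http://example.org/schema#>
ex:Protein a owl:Class .
ex:Taxon a owl:Class .
ex:organism a owl:ObjectProperty ; rdfs:domain ex:Protein ; rdfs:range ex:Taxon .
"""


def build_query_generation_prompt(schema: str, question: str) -> str:
    """Assemble the prompt asking the LLM for a SPARQL query that respects the schema."""
    return (
        "You write SPARQL queries. Use only the classes and properties from this schema:\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "Return only the SPARQL query."
    )


def unknown_terms(query: str, schema: str) -> set[str]:
    """Return the ex: terms used in the query but absent from the schema (possibly hallucinated)."""
    pattern = r"\bex:\w+"
    return set(re.findall(pattern, query)) - set(re.findall(pattern, schema))


prompt = build_query_generation_prompt(SCHEMA, "Retrieve all proteins and their organism")

# Hand-written stand-ins for what an LLM could return
candidate_query = "SELECT ?protein ?taxon WHERE { ?protein a ex:Protein ; ex:organism ?taxon }"
suspicious_query = "SELECT ?protein WHERE { ?protein a ex:Protein ; ex:hasGene ?gene }"

print(unknown_terms(candidate_query, SCHEMA))
print(unknown_terms(suspicious_query, SCHEMA))
```

When the guard reports unknown terms, the list can be sent back to the LLM with a request to fix the query, which is the kind of loop that a cyclic workflow is suited for.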