# Project name: 'GraphRAG Analysis: Extractive Question Answering with Knowledge Graphs and Language Models'

Contributor: Rajeev singh sisodiya

Project overview:
This project explores combining knowledge graphs and large language models (LLMs) for extractive question answering. The system takes a PDF document as input and answers user questions based on the information it contains.

Information Extraction:
Loads the PDF document with PyPDFLoader.
Splits the document content into smaller chunks using a RecursiveCharacterTextSplitter.
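RecursiveCharacterTextSplitter tries paragraph, line, and word boundaries before falling back to single characters, so its chunks differ from a plain sliding window. As a minimal sketch of what `chunk_size` and `chunk_overlap` mean (the `chunk_text` helper is hypothetical and is not part of LangChain), a fixed-size character window with overlap looks like this:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters.

    Hypothetical helper for illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators such as paragraph breaks and spaces.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, which keeps sentences that straddle a boundary visible to the retriever.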

Knowledge Graph Creation:
Utilizes an LLM (ChatOpenAI) to identify key entities and relationships within the document text.
Creates a knowledge graph in Neo4j based on the extracted entities and relationships.
Integrates FAISS (vector search) for document retrieval.

# Question Answering

Offers two retrieval methods:
FAISS retriever: Efficiently retrieves relevant document passages based on the question using vector similarity search.

Cypher-based Neo4j retriever:
Retrieves entities from the knowledge graph that match the user's query.
Employs another LLM (ChatOpenAI) to generate concise answers to the user's questions based on the retrieved information.
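Conceptually, the FAISS retriever ranks the stored chunk embeddings by their distance to the question's embedding and returns the closest k (k=2 in this notebook). LangChain's FAISS wrapper uses Euclidean (L2) distance with an exact, exhaustive index by default. A brute-force sketch in plain Python, assuming a hypothetical `top_k_l2` helper and toy 2-D vectors in place of 1536-dimensional OpenAI embeddings:

```python
import math


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors closest to query by Euclidean distance."""
    # Exhaustive search: score every stored vector, then keep the k smallest distances.
    scored = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]


chunk_vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
print(top_k_l2([0.9, 0.0], chunk_vectors))  # [1, 0]
```

A real FAISS index performs the same ranking far faster on large collections; the sketch only shows what "vector similarity search" computes.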

Evaluation:
Generates ground truth question-answer pairs for the processed document.
Evaluates the performance of the two retrieval methods (FAISS and Neo4j) with RAGAS metrics for RAG (Retrieval-Augmented Generation) pipelines, such as faithfulness, answer relevancy, context relevancy, and context recall.

This project demonstrates the potential of combining knowledge graphs and LLMs to build robust and informative question answering systems for textual data.

# Setting Up the Environment
First, let's set up our environment and import the necessary libraries:

In [3]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [5]:
!pip install langchain_openai

Collecting langchain_openai
  Downloading langchain_openai-0.1.23-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-core<0.3.0,>=0.2.35 (from langchain_openai)
  Downloading langchain_core-0.2.38-py3-none-any.whl.metadata (6.2 kB)
Collecting openai<2.0.0,>=1.40.0 (from langchain_openai)
  Downloading openai-1.43.0-py3-none-any.whl.metadata (22 kB)
Collecting tiktoken<1,>=0.7 (from langchain_openai)
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3.0,>=0.2.35->langchain_openai)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<0.2.0,>=0.1.75 (from langchain-core<0.3.0,>=0.2.35->langchain_openai)
  Downloading langsmith-0.1.114-py3-none-any.whl.metadata (13 kB)
Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain-core<0.3.0,>=0.2.35->langchain_openai)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting ht

In [7]:
!pip install langchain-community

Collecting langchain-community
  Downloading langchain_community-0.2.16-py3-none-any.whl.metadata (2.7 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting langchain<0.3.0,>=0.2.16 (from langchain-community)
  Downloading langchain-0.2.16-py3-none-any.whl.metadata (7.1 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.22.0-py3-none-any.whl.metadata (7.2 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain<0.3.0,>=0.2.16->langchain-community)
  Downloading langchain_text_splitters-0.2.4-py3-none-any.whl.metadata (2.3 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community)


In [11]:
!pip install neo4j

Collecting neo4j
  Downloading neo4j-5.24.0-py3-none-any.whl.metadata (5.7 kB)
Downloading neo4j-5.24.0-py3-none-any.whl (294 kB)
Installing collected packages: neo4j
Successfully installed neo4j-5.24.0


In [13]:
!pip install --upgrade ragas # Upgrade to the latest version of ragas



In [8]:
import ragas.metrics
print(dir(ragas.metrics))


['AnswerCorrectness', 'AnswerRelevancy', 'AnswerSimilarity', 'AspectCritique', 'ContextEntityRecall', 'ContextPrecision', 'ContextRecall', 'ContextUtilization', 'Faithfulness', 'FaithulnesswithHHEM', 'LabelledRubricsScore', 'NoiseSensitivity', 'ReferenceFreeRubricsScore', 'SummarizationScore', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_answer_correctness', '_answer_relevance', '_answer_similarity', '_context_entities_recall', '_context_precision', '_context_recall', '_faithfulness', '_noise_sensitivity', '_rubrics_based', '_summarization', 'answer_correctness', 'answer_relevancy', 'answer_similarity', 'base', 'context_entity_recall', 'context_precision', 'context_recall', 'context_utilization', 'critique', 'faithfulness', 'labelled_rubrics_score', 'noise_sensitivity_irrelevant', 'noise_sensitivity_relevant', 'reference_free_rubrics_score', 'summarization_score']


In [22]:
import warnings
warnings.filterwarnings('ignore')

import os
import asyncio

import nest_asyncio
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from dotenv import load_dotenv
from typing import List, Dict, Union

# OpenAI chat and embedding models from the langchain_openai package installed above
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Neo4jVector, FAISS

import langchain
from langchain.schema import BaseRetriever

from langchain_core.documents import Document
from langchain.schema import OutputParserException
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import PromptTemplate, ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

from neo4j import GraphDatabase
from ragas import evaluate

from ragas.metrics import faithfulness, answer_relevancy, context_recall


from datasets import Dataset
import random

import re
from tqdm.asyncio import tqdm

from concurrent.futures import ThreadPoolExecutor

# Load environment variables for API keys
load_dotenv()

# os.getenv expects the NAME of a variable defined in .env, never the secret itself.
# The variable names below are assumptions; match them to your .env file.
openai_api_key = os.getenv("OPENAI_API_KEY")
neo4j_url = os.getenv("NEO4J_URI")

neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")

# nest_asyncio applies a fix for asyncio running in Jupyter notebooks
nest_asyncio.apply()

# Setting Up Neo4j Connection
To use Neo4j as the graph database, let's set up the connection and create some utility functions.

# Verify Environment Variables

In [43]:
import os
from dotenv import load_dotenv

# Load the environment variables from the .env file
load_dotenv()

# Fetch environment variables by name (assumed names; match them to your .env file)
neo4j_url = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")

# Print values to verify (the password is only reported as set or missing)
if not neo4j_url or not neo4j_user or not neo4j_password:
    print("Error: One or more environment variables are missing or incorrect")
    print(f"Neo4j URL: {neo4j_url}")
    print(f"Neo4j User: {neo4j_user}")
    print(f"Neo4j Password: {'set' if neo4j_password else None}")
else:
    print("All environment variables are loaded correctly.")


Error: One or more environment variables are missing or incorrect
Neo4j URL: None
Neo4j User: None
Neo4j Password: None
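`os.getenv` takes the name of an environment variable, not its value, so passing the secret itself returns `None`. A minimal sketch of a stricter loader that fails loudly instead; the helper `require_env` and the variable names `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` are assumptions, so adjust them to whatever your `.env` file defines:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, raising if any are missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Assumed variable names; call after load_dotenv() has populated os.environ.
# creds = require_env("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
```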


In [44]:
import os
from dotenv import load_dotenv

# Explicitly specify the path to your .env file if it's not in the same directory
load_dotenv('.env')

# Retrieve the OpenAI API key and Neo4j credentials by variable name.
# Put your own OpenAI API key and Neo4j credentials in .env; never hard-code secrets in the notebook.
openai_api_key = os.getenv("OPENAI_API_KEY")  # assumed variable names; match them to your .env file
neo4j_url = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")

# Print values to verify they are loaded correctly (without revealing the password)
print("Neo4j URL:", neo4j_url)
print("Neo4j User:", neo4j_user)
print("Neo4j Password set:", bool(neo4j_password))

# Adjust the URI scheme to 'bolt+s' for Neo4j Aura if needed
if neo4j_url and neo4j_url.startswith("neo4j+s://"):
    neo4j_url = neo4j_url.replace("neo4j+s://", "bolt+s://")
    print("Corrected Neo4j URL:", neo4j_url)

from neo4j import GraphDatabase

# Create Neo4j driver instance and establish connection
try:
    driver = GraphDatabase.driver(neo4j_url, auth=(neo4j_user, neo4j_password))
    driver.verify_connectivity()  # the driver connects lazily, so check the connection explicitly
    print("Neo4j driver created successfully.")
except Exception as e:
    print(f"Error connecting to Neo4j: {e}")

# Function to clear the Neo4j instance
def clear_neo4j_data(tx):
    tx.run("MATCH (n) DETACH DELETE n")

# Ensure vector index exists in Neo4j
def ensure_vector_index(recreate=False):
    with driver.session() as session:
        result = session.run("""
            SHOW INDEXES
            YIELD name, labelsOrTypes, properties
            WHERE name = 'entity_index'
            AND labelsOrTypes = ['Entity']
            AND properties = ['embedding']
            RETURN count(*) > 0 AS exists
        """).single()

        index_exists = result['exists'] if result else False

        if index_exists and recreate:
            session.run("DROP INDEX entity_index")
            print("Existing vector index 'entity_index' dropped.")
            index_exists = False

        if not index_exists:
            session.run("""
                CALL db.index.vector.createNodeIndex(
                    'entity_index',
                    'Entity',
                    'embedding',
                    1536,
                    'cosine'
                )
            """)
            print("Vector index 'entity_index' created successfully.")
        else:
            print("Vector index 'entity_index' already exists. Skipping creation.")

# Add embeddings to entities in Neo4j (up to 100 entities per call)
def add_embeddings_to_entities(tx, embeddings):
    # Write each embedding only to the entity whose name was embedded
    query = """
        MATCH (e:Entity {name: $name})
        SET e.embedding = $embedding
    """

    entities = tx.run("MATCH (e:Entity) WHERE e.embedding IS NULL RETURN e.name AS name LIMIT 100").data()

    for entity in tqdm(entities, desc="Adding embeddings"):
        embedding = embeddings.embed_query(entity['name'])
        tx.run(query, name=entity['name'], embedding=embedding)

# Example of using the functions: clear the database, then make sure the vector index exists
with driver.session() as session:
    session.execute_write(clear_neo4j_data)  # execute_write replaces the deprecated write_transaction
    ensure_vector_index()
    # Once an 'embeddings' model is defined, entity embeddings can be added with:
    # session.execute_write(add_embeddings_to_entities, embeddings)
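`add_embeddings_to_entities` sends one query per entity. A common alternative is to embed a batch of names and write the whole batch in a single `UNWIND` query. A minimal sketch of the batching half in plain Python; `batch_rows`, `fake_embed` and `UNWIND_QUERY` are hypothetical, and in the notebook the rows would be passed as `tx.run(UNWIND_QUERY, rows=rows)` (LangChain's `embeddings.embed_documents(batch)` could also replace the per-name call):

```python
from typing import Callable, Iterator

# Illustrative Cypher: one round trip writes a whole batch of embeddings.
UNWIND_QUERY = """
UNWIND $rows AS row
MATCH (e:Entity {name: row.name})
SET e.embedding = row.embedding
"""


def batch_rows(names: list[str], embed: Callable[[str], list[float]],
               batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield lists of {"name", "embedding"} dicts with at most batch_size entries each."""
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        yield [{"name": name, "embedding": embed(name)} for name in batch]


def fake_embed(text: str) -> list[float]:
    # Stand-in for embeddings.embed_query; real OpenAI embeddings have 1536 dimensions.
    return [float(len(text)), 0.0]


batches = list(batch_rows(["Entity A", "Entity B", "Entity C"], fake_embed, batch_size=2))
print([len(b) for b in batches])  # [2, 1]
```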


# Data Processing and Graph Creation
Now, let's load our data and create our knowledge graph.

In [46]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.3.1-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-4.3.1-py3-none-any.whl (295 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.3.1


In [48]:
# PDF source: https://www.astrid-online.it/static/upload/tran/transcript-of-the-second-presidential-debate.pdf
# Load and process the PDF
pdf_path = "/content/transcript-of-the-second-presidential-debate.pdf"

loader = PyPDFLoader(pdf_path)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Function to create graph structure
def create_graph_structure(tx, texts):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    prompt = ChatPromptTemplate.from_template(
        """
        Given the following text, identify key entities and their relationships.

        Format the output as a list of tuples, each on a new line: (entity1, relationship, entity2)

        Text: {text}

        Entities and Relationships:
        """
    )

    for text in tqdm(texts, desc="Creating graph structure"):
        response = llm.invoke(prompt.format_messages(text=text.page_content))

        # Process the response and create nodes and relationships
        lines = response.content.strip().split('\n')

        for line in lines:
            if line.startswith('(') and line.endswith(')'):
                parts = line[1:-1].split(',')
                if len(parts) == 3:
                    entity1, relationship, entity2 = [part.strip() for part in parts]

                    # Create nodes and relationship
                    query = (
                        """
                        MERGE (e1:Entity {name: $entity1})
                        MERGE (e2:Entity {name: $entity2})
                        MERGE (e1)-[:RELATED {type: $relationship}]->(e2)
                        """
                    )
                    tx.run(query, entity1=entity1, entity2=entity2, relationship=relationship)
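The parsing inside `create_graph_structure` keeps only lines that start with "(" and split into exactly three comma-separated parts. Pulling that step into its own function makes it testable without calling the LLM or Neo4j. A minimal sketch with a hypothetical `parse_triples` helper; the regex also tolerates a leading list number or bullet, which is an assumption about how the model may format its answer:

```python
import re

# One "(entity1, relationship, entity2)" triple per line. An optional "1.", "1)", "-" or "*"
# marker in front is accepted; items may not contain commas.
TRIPLE_LINE = re.compile(r"^(?:\d+[.)]|[-*])?\s*\(\s*([^,]+?)\s*,\s*([^,]+?)\s*,\s*([^,]+?)\s*\)$")


def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Extract (entity1, relationship, entity2) triples from the LLM's line-based output."""
    triples = []
    for line in llm_output.strip().splitlines():
        match = TRIPLE_LINE.match(line.strip())
        if match:
            triples.append(match.groups())
    return triples


sample = "1. (Entity A, criticized, Entity B)\nNot a triple\n(X, y, z, w)\n- (C, supports, D)"
print(parse_triples(sample))
```

Lines with four parts or no parentheses are skipped, matching the `len(parts) == 3` check in the original loop.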

# Setting Up Retrievers
We'll set up two types of retrievers: one using FAISS for vector-based retrieval, and another using Neo4j for graph-based retrieval.

In [55]:
# Embeddings model (reads the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()

embedding_cache = {}

def get_embedding(text):
    """Return the embedding for text, calling the OpenAI API only the first time a text is seen."""
    if text not in embedding_cache:
        embedding_cache[text] = embeddings.embed_query(text)
    return embedding_cache[text]

# Create FAISS retriever (FAISS.from_documents requires the faiss-cpu package)
faiss_vector_store = FAISS.from_documents(texts, embeddings)
faiss_retriever = faiss_vector_store.as_retriever(search_kwargs={"k": 2})

# Neo4j retriever
def create_neo4j_retriever():
    with driver.session() as session:
        # Clear existing data
        session.run("MATCH (n) DETACH DELETE n")  # equivalent to the clear_neo4j_data function created earlier in code

        # Create graph structure
        session.execute_write(create_graph_structure, texts)

        # Add embeddings to entities
        max_attempts = 10
        attempt = 1

        while attempt <= max_attempts:
            count = session.execute_read(lambda tx: tx.run("MATCH (e:Entity) WHERE e.embedding IS NULL RETURN COUNT(e) AS count").single()['count'])
            if count == 0:
                break

            session.execute_write(add_embeddings_to_entities, embeddings)

            if attempt == max_attempts:
                print("Warning: Not all entities have embeddings after maximum attempts.")

            attempt += 1  # without this the loop never ends while entities still lack embeddings

        # Create Neo4j retriever
        neo4j_vector_store = Neo4jVector.from_existing_index(
            embeddings,
            url=neo4j_url,
            username=neo4j_user,
            password=neo4j_password,
            index_name="entity_index",
            node_label="Entity",
            text_node_property="name",
            embedding_node_property="embedding"
        )

        return neo4j_vector_store.as_retriever(search_kwargs={"k": 2})

# Cypher-based retriever
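def cypher_retriever(search_term: str) -> List[Document]:
    with driver.session() as session:
        # Match entities whose name occurs in the question (case-insensitive) and list
        # each outgoing relationship as "<relationship type> <related entity>"
        result = session.run("""
            MATCH (e:Entity)
            WHERE toLower($search_term) CONTAINS toLower(e.name)
            RETURN e.name AS name,
                   [(e)-[r:RELATED]->(related) | coalesce(r.type, 'RELATED') + ' ' + related.name] AS related
            LIMIT 2
        """, search_term=search_term)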

        documents = []
        for record in result:
            content = f"Entity: {record['name']}\nRelated: {', '.join(record['related'])}"
            documents.append(Document(page_content=content))

        return documents
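Reassigning `embeddings.embed_query` to a wrapper that itself calls `embeddings.embed_query` would recurse forever; a caching wrapper has to hold a reference to the original function instead. A minimal, library-independent sketch using `functools.lru_cache` (the `make_cached` helper and the counting stand-in are hypothetical):

```python
from functools import lru_cache
from typing import Callable


def make_cached(embed: Callable[[str], list[float]]) -> Callable[[str], list[float]]:
    """Wrap an embedding function so each distinct text is embedded only once."""

    @lru_cache(maxsize=None)
    def cached(text: str) -> tuple[float, ...]:
        # Store an immutable tuple so callers cannot modify the cached vector.
        return tuple(embed(text))

    return lambda text: list(cached(text))


calls = []


def counting_embed(text: str) -> list[float]:
    # Stand-in for embeddings.embed_query that records every real call.
    calls.append(text)
    return [float(len(text))]


cached_embed = make_cached(counting_embed)
cached_embed("abc")
cached_embed("abc")
print(calls)  # ['abc']
```

In the notebook this would look like `cached_embed = make_cached(embeddings.embed_query)`, leaving the embeddings object itself unchanged.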

# Creating RAG Chains
Now, let's create our RAG chains.

In [57]:
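from langchain_core.runnables import Runnable, RunnableLambda

def format_docs(docs: List[Document]) -> str:
    # Turn retrieved Documents into plain text for the {context} slot
    return "\n\n".join(doc.page_content for doc in docs)

def create_rag_chain(retriever):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo")

    template = """Answer the question based on the following context:
    {context}

    Question: {question}

    Answer:"""
    prompt = PromptTemplate.from_template(template)

    # LangChain retrievers (FAISS, Neo4jVector) are already Runnables;
    # plain functions such as cypher_retriever are wrapped in RunnableLambda
    retriever_runnable = retriever if isinstance(retriever, Runnable) else RunnableLambda(retriever)

    return (
        {"context": retriever_runnable | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm  # the LLM generates the answer from the filled-in prompt
        | StrOutputParser()
    )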


# Create RAG chains
faiss_rag_chain = create_rag_chain(faiss_retriever)
cypher_rag_chain = create_rag_chain(cypher_retriever)
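Retrievers return a list of Document objects, and interpolating that list directly into `{context}` puts the string form of each object into the prompt. A minimal sketch of the formatting step, with a small dataclass standing in for `langchain_core.documents.Document` so that it runs without LangChain (`Doc`, `format_docs` and `TEMPLATE` are illustrative names):

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str  # same attribute name as LangChain's Document


TEMPLATE = """Answer the question based on the following context:
{context}

Question: {question}

Answer:"""


def format_docs(docs: list[Doc]) -> str:
    """Join retrieved passages with blank lines so the LLM sees plain text."""
    return "\n\n".join(d.page_content for d in docs)


prompt_text = TEMPLATE.format(
    context=format_docs([Doc("Passage one."), Doc("Passage two.")]),
    question="What was discussed?",
)
print(prompt_text)
```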

# Evaluation Setup
To evaluate our RAG systems, we'll create a ground truth dataset and use the RAGAS framework.
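The ground-truth step asks the LLM for a numbered list of questions and then strips the numbering. Splitting on ". " also cuts unnumbered questions such as "What did Mr. Smith say?", whereas a regex anchored at the start of the line removes only the list marker. A minimal sketch with a hypothetical `parse_numbered_list` helper:

```python
import re

# A leading "1." or "1)" marker plus any whitespace around it.
NUMBER_PREFIX = re.compile(r"^\s*\d+[.)]\s*")


def parse_numbered_list(llm_output: str) -> list[str]:
    """Strip leading list numbers and drop blank lines."""
    items = []
    for line in llm_output.splitlines():
        item = NUMBER_PREFIX.sub("", line).strip()
        if item:
            items.append(item)
    return items


print(parse_numbered_list("1. What did Mr. Smith say?\n\n2) Who moderated?\nWhere was it held?"))
```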

In [58]:
def create_ground_truth(texts: List[Union[str, Document]], num_questions: int = 100) -> List[Dict]:
    llm_ground_truth = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.2)

    def get_text(item):
        return item.page_content if isinstance(item, Document) else item

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_text(' '.join(get_text(doc) for doc in texts))

    ground_truth = []

    question_prompt = ChatPromptTemplate.from_template(
        "Given the following text, generate {num_questions} diverse and specific questions that can be answered based on the information in the text. "
        "Provide the questions as a numbered list.\n\nText: {text}\n\nQuestions:"
    )

    # Keep each question together with the chunk it was generated from
    all_questions = []
    for split in tqdm(all_splits, desc="Generating questions"):
        response = llm_ground_truth.invoke(question_prompt.format_messages(num_questions=3, text=split))
        for line in response.content.strip().split('\n'):
            question = re.sub(r'^\s*\d+[.)]\s*', '', line).strip()  # drop only the "1." / "1)" marker
            if question:
                all_questions.append((question, split))

    random.shuffle(all_questions)
    selected_questions = all_questions[:num_questions]

    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    answer_prompt = ChatPromptTemplate.from_template(
        "Given the following text and question, provide a concise and accurate answer based only on the text. "
        "If the answer is not in the text, respond with 'Information not available in the given context.'\n\n"
        "Text: {text}\n\nQuestion: {question}\n\nAnswer:"
    )

    for question, source_text in tqdm(selected_questions, desc="Generating ground truth"):
        # Answer from the source chunk rather than from the model's general knowledge
        answer_response = llm.invoke(answer_prompt.format_messages(text=source_text, question=question))
        answer = answer_response.content.strip()

        ground_truth.append({
            "question": question,
            "answer": answer,
            "context": source_text,  # the chunk the question was generated from
        })

    return ground_truth

async def evaluate_rag_async(rag_chain, ground_truth, name):
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

    generated_answers = []
    for item in tqdm(ground_truth, desc=f"Evaluating {name}"):
        question = splitter.split_text(item["question"])[0]

        try:
            answer = await rag_chain.ainvoke(question)
        except AttributeError:
            answer = rag_chain.invoke(question)

        truncated_answer = splitter.split_text(str(answer))[0]
        truncated_context = splitter.split_text(item["context"])[0]
        truncated_ground_truth = splitter.split_text(item["answer"])[0]

        generated_answers.append({
            "question": question,
            "answer": truncated_answer,
            "contexts": [truncated_context],  # the ground-truth context, not passages returned by the retriever
            "ground_truth": truncated_ground_truth
        })

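    dataset = Dataset.from_pandas(pd.DataFrame(generated_answers))

    # context_relevancy is not exported by the installed ragas version (see the dir() output above),
    # so context_precision is used in its place
    from ragas.metrics import context_precision

    result = evaluate(
        dataset,
        metrics=[
            context_precision,
            faithfulness,
            answer_relevancy,
            context_recall,
        ]
    )

    return {name: result}
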
async def run_evaluations(rag_chains, ground_truth):
    results = {}
    for name, chain in rag_chains.items():
        result = await evaluate_rag_async(chain, ground_truth, name)
        results.update(result)
    return results

# Main execution function
async def main():
    # Ensure vector index
    ensure_vector_index(recreate=True)

    # Create retrievers
    neo4j_retriever = create_neo4j_retriever()

    # Create RAG chains
    faiss_rag_chain = create_rag_chain(faiss_retriever)
    neo4j_rag_chain = create_rag_chain(neo4j_retriever)

    # Generate ground truth
    ground_truth = create_ground_truth(texts)

    # Run evaluations
    rag_chains = {
        "FAISS": faiss_rag_chain,
        "Neo4j": neo4j_rag_chain
    }
    results = await run_evaluations(rag_chains, ground_truth)
    return results

# Run the main function
if __name__ == "__main__":
    nest_asyncio.apply()
    try:
        results = asyncio.run(asyncio.wait_for(main(), timeout=7200))  # 2 hour timeout
        # plot_results(results)  # not run: plot_results is not defined in this notebook

        # Print detailed results
        for name, result in results.items():
            print(f"Results for {name}:")
            print(result)
            print()
    except asyncio.TimeoutError:
        print("Evaluation timed out after 2 hours.")
    finally:
        # Close the Neo4j driver
        driver.close()
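Each row handed to `ragas.evaluate` in `evaluate_rag_async` has the columns `question`, `answer`, `contexts` (a list of strings) and `ground_truth`. RAGAS treats `contexts` as the passages given to the LLM, so scoring a retriever normally means passing the retrieved text there. A minimal sketch of building and checking such rows in plain Python; `build_row` is hypothetical, and the plain 500-character slice simplifies the notebook's "first chunk from the text splitter" truncation:

```python
def build_row(question: str, answer: str, retrieved_contexts: list[str],
              ground_truth: str, max_chars: int = 500) -> dict:
    """Assemble one evaluation row with the column names used in evaluate_rag_async."""
    if not isinstance(retrieved_contexts, list) or not all(isinstance(c, str) for c in retrieved_contexts):
        raise TypeError("retrieved_contexts must be a list of strings")
    return {
        "question": question[:max_chars],
        "answer": answer[:max_chars],
        # Passages the retriever returned for this question
        "contexts": [c[:max_chars] for c in retrieved_contexts],
        "ground_truth": ground_truth[:max_chars],
    }


row = build_row("Q?", "A" * 600, ["ctx"], "GT")
print(sorted(row))  # ['answer', 'contexts', 'ground_truth', 'question']
```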

# Results and Analysis of the Project
This project implements a system for extractive question answering using a combination of retrievers and large language models (LLMs) on a PDF document.

1. Functionalities

Information Extraction:

Successfully extracts text chunks from the PDF document using PyPDFLoader and RecursiveCharacterTextSplitter.

Knowledge Graph Creation:

Utilizes ChatOpenAI to identify key entities and relationships from the extracted text.
Creates a knowledge graph in Neo4j based on the identified entities and relationships.
Integrates FAISS for efficient document retrieval based on vector similarity.

2. Question Answering:

Offers two retrieval methods:

FAISS retriever: Efficiently retrieves relevant document passages based on the question using vector similarity search.

Cypher-based Neo4j retriever:
Retrieves entities from the knowledge graph that match the user's query.
Employs another LLM (ChatOpenAI) to generate concise answers to the user's questions based on the retrieved information.
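A `CONTAINS` test written as `e.name CONTAINS $search_term` only matches when the whole question is a substring of an entity name, which rarely happens. Testing the other direction, whether an entity name occurs in the question, is one way to find entities the user mentions; in Cypher that reads `WHERE toLower($search_term) CONTAINS toLower(e.name)`. A pure-Python sketch of that matching rule (the `match_entities` helper and its longest-name-first ordering are illustrative assumptions):

```python
def match_entities(question: str, entity_names: list[str], limit: int = 2) -> list[str]:
    """Return up to `limit` entity names that occur in the question, ignoring case.

    Longer names are tried first, so a specific entity is preferred over a shorter, more generic one.
    """
    q = question.lower()
    hits = [name for name in sorted(entity_names, key=len, reverse=True) if name.lower() in q]
    return hits[:limit]


print(match_entities("What did the moderator ask about the economy?",
                     ["Economy", "Moderator", "Tax plan"]))  # ['Moderator', 'Economy']
```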

Evaluation:

Generates ground truth question-answer pairs for the processed document.
Evaluates the performance of the two retrieval methods (FAISS and Neo4j) with RAGAS metrics for RAG (Retrieval-Augmented Generation) pipelines, such as faithfulness, answer relevancy, context relevancy, and context recall.

3. Expected Outcome:

The project is designed to evaluate the effectiveness of FAISS and Neo4j retrievers in retrieving relevant information for question answering. The RAG metrics would ideally show:

High faithfulness: The generated answers accurately reflect the retrieved information.

High answer relevancy: The generated answers directly address the user's question.

High context relevancy: The retrieved context snippets are relevant to the question and answer.

High context recall: The retrieved context captures most of the important information needed for answering the question.
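For reference, the RAGAS documentation defines faithfulness and context recall as ratios of statements (claims) that an LLM judges to be supported, so both lie between 0 and 1. These are the library's definitions, not results from this notebook:

$$
\text{faithfulness} = \frac{\text{number of statements in the answer that can be inferred from the retrieved context}}{\text{total number of statements in the answer}}
$$

$$
\text{context recall} = \frac{\text{number of ground-truth statements attributable to the retrieved context}}{\text{total number of statements in the ground truth}}
$$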

# Analysis of Missing Information

Unfortunately, the project doesn't include the plot_results function or the final printed results. However, based on the functionalities, we can infer that the analysis would involve examining the RAG metric scores for both FAISS and Neo4j retrievers.

Here are some pointers for further analysis:

Compare the RAG metric scores between FAISS and Neo4j retrievers. Identify which retrieval method performs better in terms of faithfulness, answer relevancy, context relevancy, and context recall.

Analyze potential reasons for the observed performance. Consider factors like the complexity of the knowledge graph, the effectiveness of entity and relationship identification by ChatOpenAI, and the quality of retrieved passages by FAISS or Neo4j.
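The metric-by-metric comparison suggested above can be tabulated once each retriever's scores are available as a metric-to-score dict (how the ragas result is converted to a dict is an assumption that depends on the installed version). All four metrics range from 0 to 1, and higher is better. A minimal sketch with a hypothetical `compare_scores` helper; it contains no scores from this project:

```python
def compare_scores(a: dict[str, float], b: dict[str, float],
                   name_a: str = "FAISS", name_b: str = "Neo4j") -> list[tuple[str, float, float, str]]:
    """For each metric present in both mappings, return (metric, score_a, score_b, better)."""
    rows = []
    for metric in sorted(a.keys() & b.keys()):
        score_a, score_b = a[metric], b[metric]
        if score_a > score_b:
            better = name_a
        elif score_b > score_a:
            better = name_b
        else:
            better = "tie"
        rows.append((metric, score_a, score_b, better))
    return rows
```

Metrics reported for only one retriever are skipped, so a missing score never counts as a win.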

Overall, the project demonstrates a promising approach for extractive question answering using LLMs and knowledge graphs. Analyzing the RAG metrics would provide valuable insights into the effectiveness of different retrieval methods and guide further improvements to the system.


