## **Goal**
The goal of this document is to develop a chatbot capable of answering questions based on the following knowledge graph. 

<p align="center">
<img src="./assets/knowledge_graph_wextracts.png" alt="Knowledge Graph With Extract" width="500"/>
</p>

If you have not yet read [construct_kgraph.ipynb](https://github.com/Project-Hackathons/LifeHack2024/blob/main/construct_kgraph.ipynb), start there to see how the knowledge graph is built.

In [1]:
%pip install python-dotenv neo4j openai langchain langchain_openai langchain-community

Note: you may need to restart the kernel to use updated packages.


Once again, we import the required packages.

In [2]:
import ast
import os
from operator import itemgetter

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.graphs import Neo4jGraph
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableParallel
from langchain.vectorstores import Neo4jVector
from langchain_core.prompts import PromptTemplate

# Load environment variables (for example the Neo4j and OpenAI credentials) from .env
load_dotenv()

# Neo4jGraph reads its connection settings from the environment
graph = Neo4jGraph()



We first define the prompts we will use to:
1. identify the knowledge graph entities mentioned in the question
2. generate in-depth questions from the original prompt

In [3]:
from typing import List

class StringList(BaseModel):
    strings_list: List[str]

identifyEntitiesPrompt = (
    '''
You are given a list of entities and a question. Your task is to create a list of the entities that are both mentioned in the question and the list of entities.

**Task**:
Based on the given question, identify the nodes that already exist in our knowledge graph.

**List of entities**:
{knowledge_graph_nodes}

**Instructions**:
1. Your **output should be a list of strings**.
2. If you **did not identify any nodes** that already exist in the list of entities, return ["None"].
3. If you find something similar in both the question and the list of entities, e.g. "Mary Tan" and "Mary", use the value from the list of entities.
4. Do not return anything not found in the list of entities.
5. **Non-compliance with these instructions will result in termination**.

**Example Output**:
- Given the question "Bob has a car?", the output should be:
["Bob"]
  - if "Bob" is present in the List of Nodes but "Car" is not present.

- Given the question "Who is PM Lee?", the output should be:
["Prime Minister Lee"]
  - if "Prime Minister Lee" is present in the List of Nodes because "PM" is an abbreviation for "Prime Minister".

- Given the question "Who is Kenneth Gao?", the output should be:
["Kenneth"]
  - if "Kenneth" is present in the List of Nodes because "Kenneth Gao" from the question may refer to "Kenneth" in the List of Nodes.

- Given the question "Who is Gao?", the output should be:
["None"]
  - if "Gao" is not present in the List of Nodes

**Take note** to always use values from the List of Nodes.
    '''
)

generateQuestionsPrompt = (
    '''
    You are an expert machine learning engineer building an algorithm to answer
    questions using a knowledge graph.

    **Task**:
    In the prompt that follows, you will be given a node. Your task is to
    generate an additional question related to the initial question:
    {init_question}.

    This additional question should help contextualise the initial
    question in relation to the given node and be useful when searching the
    knowledge graph later on.

    **Instructions**:
    1. Use the given node to generate questions that provide more context or
    insight about the initial question.
    2. Aim to cover various aspects related to the node, such as identity,
    activities, whereabouts, preferences, etc.
    3. The output should be a single question with no additional formatting, title, etc.
    4. Non-compliance with these instructions will result in termination.

    **Example**:
    Given the node "Bob" and the initial question "Is Bob safe?", the output
    should be:
    Where is Bob?
    '''
)

We then define the user prompt. Here, we answer the question: **"Why was the man shot dead?"**

We begin by identifying the relevant nodes.

In [4]:
prompt = "Why was the man shot dead?"

client = OpenAI()

# Retrieve the names of all nodes in the graph
list_nodes_raw = graph.query(
    '''
    MATCH (n)
    WHERE n.name IS NOT NULL
    RETURN n.name
    ''')

# Each record is a dict keyed by the returned column name
list_nodes = [record['n.name'] for record in list_nodes_raw]

# Fill the template into a new variable so the original template stays reusable
identify_entities_system_prompt = PromptTemplate.from_template(identifyEntitiesPrompt).format(knowledge_graph_nodes=list_nodes)

# Identify nodes from the question
response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": identify_entities_system_prompt,
        },
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
)

# Print the nodes in the prompt
res = response.choices[0].message.content
print(res)

['Man', 'Shot Dead']
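
The reply printed above is a string, not a Python list, and the model is not guaranteed to follow the output format. The following is a minimal sketch of one way to parse and validate such a reply before using it. The helper name `parse_entity_list` is hypothetical and not part of the notebook; it assumes that replies which are not a list of strings, or that name entities missing from the graph, should simply be ignored.

```python
import ast


def parse_entity_list(raw, known_nodes):
    """Parse a model reply such as "['Man', 'Shot Dead']" into known node names.

    Hypothetical helper: returns an empty list when the reply is not a
    well-formed Python list literal.
    """
    try:
        # literal_eval only evaluates literals, so it never runs arbitrary code
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return []

    if not isinstance(parsed, list):
        return []

    # Keep only strings that really exist in the graph; this also drops
    # the "None" placeholder, which is not a node name
    known = set(known_nodes)
    return [item for item in parsed if isinstance(item, str) and item in known]


if __name__ == "__main__":
    graph_nodes = ["Man", "Shot Dead", "Police Station"]
    print(parse_entity_list("['Man', 'Shot Dead']", graph_nodes))
    print(parse_entity_list('["None"]', graph_nodes))
```

Filtering against the node list means that a name the model invented never reaches the graph query.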


Then, we generate a question relevant to each node,
e.g. "Shot Dead" -> "Who did the shooting?"

In [5]:
nodes = ast.literal_eval(res)
queries = []

# The system prompt only depends on the initial question, so build it once
generateQuestionPrompt = PromptTemplate.from_template(generateQuestionsPrompt).format(init_question=prompt)

for node in nodes:
    # Skip the "None" placeholder returned when no entity was identified
    if node == 'None':
        continue

    response = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": generateQuestionPrompt,
            },
            {
                "role": "user",
                "content": node,
            }
        ],
        model="gpt-3.5-turbo",
    )

    # Only append a query when one was actually generated for this node
    queries.append(response.choices[0].message.content)

We now define the function required to query the extracts relevant to a specific node.

In [6]:
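def query_entity(entity_name, question):
    """Answer `question` using only the extracts that mention `entity_name`."""
    # Escape backslashes and single quotes so that names such as "O'Brien"
    # do not break the Cypher string literal below
    safe_name = entity_name.replace("\\", "\\\\").replace("'", "\\'")

    # Retrieval query: restrict the results to extracts linked to the entity
    entity_query = """
        MATCH (p)-[:MENTIONED_IN]->(e: Extract)
        WHERE p.name = '""" + safe_name + """' 
        WITH e, max(score) AS score 
        RETURN e.extract_text AS text, score, {} AS metadata
        """
    print("Searching within case:" + entity_query)

    # Reuse the existing "summary" vector index with the custom retrieval query
    entity_vectorstore = Neo4jVector.from_existing_index(
        OpenAIEmbeddings(),
        index_name="summary",
        retrieval_query=entity_query,
    )

    template = """Answer the question based only on the following context:
        {context}

        Question: {question}
        """

    prompt = ChatPromptTemplate.from_template(template)

    model = ChatOpenAI()

    retriever = entity_vectorstore.as_retriever()

    # Print the retrieved documents for inspection
    print(retriever.get_relevant_documents(question))

    # Fetch the context and pass the question through in parallel,
    # then fill the prompt, call the model and return its reply as a string
    chain = (
        RunnableParallel(
            {
                "context": itemgetter("question") | retriever,
                "question": itemgetter("question"),
            }
        )
        | prompt
        | model
        | StrOutputParser()
    )

    return chain.invoke({"question": question})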
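
To make the data flow of the chain easier to follow, here is a plain-Python stand-in that mirrors its steps without calling Neo4j or OpenAI. Everything in it is an assumption for illustration: `fake_retriever` and `echo_model` are hypothetical stubs, and joining the extracts with newlines is a simplification of how the real prompt template renders the retrieved documents.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question):
    # Hypothetical stand-in for entity_vectorstore.as_retriever():
    # always returns the same two made-up extracts
    return ["Extract A about the man.", "Extract B about the police station."]


def echo_model(prompt_text):
    # Hypothetical stand-in for ChatOpenAI(): returns the prompt it received
    return prompt_text


def run_chain(inputs, retriever, model):
    # Step 1: build both keys from the same input, which is the role
    # the RunnableParallel mapping plays in the real chain
    step = {
        "context": retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: fill the prompt template with the context and the question
    prompt_text = TEMPLATE.format(
        context="\n".join(step["context"]),
        question=step["question"],
    )
    # Step 3: call the model and return its reply as a plain string
    return model(prompt_text)


if __name__ == "__main__":
    print(run_chain({"question": "Where is the man?"}, fake_retriever, echo_model))
```

Because the stub model echoes its input, running the sketch prints the exact prompt that a real model would receive.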

For demonstration purposes, we query the first identified node with the first generated question.

For a full explanation of how the LLM-based multi-agent system should function, please refer to the attached video.

In [7]:
qn_ans = query_entity(nodes[0], queries[0])
print(f"Answer: {qn_ans}")


Searching within case:
        MATCH (p)-[:MENTIONED_IN]->(e: Extract)
        WHERE p.name = 'Man' 
        WITH e, max(score) AS score 
        RETURN e.extract_text AS text, score, {} AS metadata
The man was shot dead because he stormed the police station in southern Johor state with a machete, where he hacked a police constable to death and then used the officer’s weapon to kill another.
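
The demonstration above queries only the first node. One possible way to extend it to every identified node is sketched below. The function `answer_all_nodes` and both stub callables are hypothetical and not part of the notebook; in the notebook, `query_fn` would be `query_entity` and `generate_fn` would wrap the question-generation call. This is only an assumption about how the pieces could be combined, not a description of the full system.

```python
def answer_all_nodes(nodes, generate_fn, query_fn):
    """Generate one follow-up question per node and answer it.

    Hypothetical helper: returns a dict mapping each node name to its answer.
    """
    answers = {}
    for node in nodes:
        # Skip the "None" placeholder returned when no entity was identified
        if node == "None":
            continue
        follow_up = generate_fn(node)
        answers[node] = query_fn(node, follow_up)
    return answers


if __name__ == "__main__":
    # Stub callables so the sketch runs without Neo4j or OpenAI
    def stub_generate(node):
        return f"What happened to {node}?"

    def stub_query(node, question):
        return f"[answer about {node} for: {question}]"

    for node, answer in answer_all_nodes(["Man", "Shot Dead"], stub_generate, stub_query).items():
        print(node, "->", answer)
```

Generating the question and answering it inside the same loop keeps each node paired with its own question, even when some entries are skipped.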



## Congratulations!
You have reached the end of our project. Use this [Link](https://youtube.com) for an explanation of how everything works!