# University Artificially Intelligent Chatbot
## Week 3 Prototype

### Current checklist:


This checklist isn't exhaustive; more items may be added as the project progresses.

- *Data mining*
  - 🛠️ Policies
  - ❌ Campus locations
  - ❌ Societies
- *Data Storage*
  - ✅ Vector DB Identification
  - ✅ Embedding
- *Chatbot*  
  - ✅ Data retrieval
  - ✅ Conversational memory
  - ❌ User interface

### Future plans 

- Prompt engineering 
    - The prototype works acceptably with its current prompt, but refining the prompt may improve its answers further.
    
- More policy data
    - A small selection of University policies can currently be queried, though not all of them.


- Tools 
    - Currently, the LLM can use one tool (if it deems it necessary), which retrieves data from the DB. More tools could be added for general use, such as telling the time (for example, how long remains until a deadline).

- Manual Data Creation
    - While the LLM can gather some general information about the university from the policies it retrieves, it will be helpful (perhaps essential) to create an additional PDF of my own with some information about the university, such as the campus locations. This may also be the best way to encode data about societies? 
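As a rough idea of what the time tool mentioned under Tools might look like, here's a minimal sketch using only the standard library. The function name `time_until` and the example dates are made up for illustration; in the notebook it would presumably be wrapped with the same `@tool` decorator as the retrieval tool.

```python
from datetime import datetime, timedelta


def time_until(deadline: datetime, now: datetime) -> str:
    """Describe how long remains until a deadline in days, hours and minutes."""
    remaining = deadline - now
    if remaining <= timedelta(0):
        return "The deadline has already passed."
    days = remaining.days
    hours, leftover_seconds = divmod(remaining.seconds, 3600)
    minutes = leftover_seconds // 60
    return f"{days} days, {hours} hours and {minutes} minutes remain."


# Hypothetical deadline and "current" time, for illustration only.
example_deadline = datetime(2025, 3, 14, 15, 0)
example_now = datetime(2025, 3, 12, 9, 30)
print(time_until(example_deadline, example_now))
# prints: 2 days, 5 hours and 30 minutes remain.
```

Passing `now` in as an argument (rather than calling `datetime.now()` inside) keeps the function easy to test.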

### Initialisation

#### Imports and key variables

The imported methods and classes are described in further detail when they're used.

In [1]:
# Used to get the OpenAI API key from the system environment variables.
import os

# Initialises the LLM.
from langchain.chat_models import init_chat_model

# The vector DB used for this prototype. It's worked well thus far,
# so I'll likely use it in future versions, too.
from langchain_community.vectorstores import FAISS

# Allows for inputs to be sent to the LLM as a human (user input)
# and the system (specialised prompt that defines LLM behaviour).
from langchain_core.messages import HumanMessage, SystemMessage

# Allows for "tools" to be created. The LLM can use these tools
# to execute defined code, such as retrieving data from the FAISS DB.
from langchain_core.tools import tool

# Used to embed queries to the FAISS DB.
from langchain_openai import OpenAIEmbeddings

# Used to save the conversation to memory.
from langgraph.checkpoint.memory import MemorySaver

# LangGraph key functionality. Declares a clear and consistent 
# structure for the chatbot. LangGraph is described in detail in 
# its own notebook section.
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

# Get the OpenAI API key from environment variables so that 
# it's not visible in this code on GitHub. OpenAI would revoke the key if it leaked.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

# Sets the directory of the FAISS DB that's being loaded from.
    # Options:
    #   FAISS: Chunk size 1000, Overlap 200, UnstructuredPDFLoader in elements mode.
    #   FAISS-PyPDF: Chunk size 1000, Overlap 200, PyPDFLoader with default args.
FAISS_PATH = "FAISS-PyPDF"
# Experimentation showed that using the PyPDFLoader-based DB gave better results.

#### FAISS, embeddings and the LLM

LangChain makes it easy to switch embedding models by changing the ``model`` argument. However, queries must be embedded with the same model that was used to create the vector database: vectors produced by different models are not comparable, so a mismatch would return irrelevant chunks or fail outright, which could render the chatbot inoperable. The model here must therefore match the one used in the database embedding file, which is OpenAI's ``text-embedding-3-small`` in this prototype.

In [2]:
# Sets up the embedding model with the API key.
embedder = OpenAIEmbeddings(
    model = "text-embedding-3-small",
    api_key = OPENAI_API_KEY
)
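One way to guard against the embedding mismatch described above would be to record the model name next to the DB when it's built, then check it before querying. This is only a sketch of that idea: the manifest file name and both helper functions are hypothetical and aren't part of LangChain or this prototype.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a small JSON manifest is written next to the FAISS files
# when the DB is built. The file name is made up for this sketch.
MANIFEST_NAME = "embedding_manifest.json"


def write_manifest(db_dir: Path, model_name: str) -> None:
    """Record which embedding model was used to build the DB."""
    db_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"embedding_model": model_name}
    (db_dir / MANIFEST_NAME).write_text(json.dumps(manifest))


def check_manifest(db_dir: Path, model_name: str) -> None:
    """Raise ValueError if the DB was built with a different embedding model."""
    recorded = json.loads((db_dir / MANIFEST_NAME).read_text())["embedding_model"]
    if recorded != model_name:
        raise ValueError(
            f"DB was embedded with {recorded!r}, but queries would use {model_name!r}."
        )


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "FAISS-PyPDF"
    write_manifest(demo_dir, "text-embedding-3-small")
    check_manifest(demo_dir, "text-embedding-3-small")  # matches, so no error
    try:
        check_manifest(demo_dir, "text-embedding-3-large")
    except ValueError as error:
        mismatch_message = str(error)
        print(mismatch_message)
```

Failing loudly at load time is easier to debug than a chatbot that quietly retrieves the wrong chunks.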

LangChain also supports many different vector database options. This prototype uses a Facebook AI Similarity Search (FAISS) DB, primarily because it integrates easily with LangChain: the single call below is all that's necessary to load the stored data from the chosen ``FAISS_PATH``.

In [3]:
# Load the vector database.
db = FAISS.load_local(folder_path = FAISS_PATH,
                      embeddings = embedder,
                      allow_dangerous_deserialization=True) 

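LangChain's FAISS integration saves part of the database (the document store) as a pickle file, a serialised format that Python can load directly. However, loading a malicious pickle file can execute arbitrary code, which is why ``allow_dangerous_deserialization`` has to be enabled explicitly. The files used in this vector DB were generated by this project's own database embedding file and are not malicious, so it is fine to enable it here.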
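To show why pickle files deserve this caution, here's a small, harmless demonstration using only the standard library. The `Payload` class is invented for this example; it makes `pickle.loads` call `print`, where a real attack would name something far more damaging.

```python
import pickle


class Payload:
    """Stand-in for a malicious object hidden inside a pickle file."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call print(...)".
        # An attacker could name any importable callable here instead.
        return (print, ("This line ran during pickle.loads",))


data = pickle.dumps(Payload())

# Loading the bytes calls print(...) immediately. The value that comes
# back is print's return value (None), not a Payload instance.
result = pickle.loads(data)
```

This is why a pickle file should only be loaded if it comes from a trusted source.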

In [4]:
llm = init_chat_model("gpt-4o-mini")

``init_chat_model``, as its name suggests, initialises the model. This prototype, and most likely the final version, uses GPT-4o-mini because of its low cost relative to other models. While it is less capable than higher-end models like GPT-4 or reasoning models like o1/o3-mini, it can still handle the simple chatbot functionality of this prototype.

#### Tools

Currently, there's only one tool - the retriever itself, which will perform a semantic search on the FAISS DB based on the user's query. It returns the content of the 3 most similar chunks to the user's query, as well as which PDF they came from, though the user won't see that part.

In [6]:
@tool(response_format = "content_and_artifact")
def retrieve(query: str):
    # The docstring below is actually REQUIRED by LangChain's @tool decorator
    # (it becomes the tool's description), and this won't run without it.
    """Retrieves the 3 most relevant chunks for the user's query."""
    retrieved_docs = db.similarity_search(query, k = 3)
    
    # The chunk's content and the document it came from (e.g "Attendance.pdf")
    content = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return content, retrieved_docs
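To illustrate what "the 3 most similar chunks" means, here's a toy top-k search over made-up 3-dimensional vectors. It's a conceptual sketch only: it uses cosine similarity in plain Python, whereas the real FAISS index works on the actual embeddings and may use a different distance measure. The chunk names and numbers are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda name: cosine_similarity(query_vec, chunks[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up "embeddings"; real ones have far more dimensions.
toy_chunks = {
    "attendance": [0.9, 0.1, 0.0],
    "late submission": [0.1, 0.9, 0.1],
    "extensions": [0.2, 0.8, 0.3],
    "parking": [0.0, 0.1, 0.9],
}
toy_query = [0.1, 1.0, 0.2]

print(top_k(toy_query, toy_chunks, k=2))
# prints: ['late submission', 'extensions']
```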

### LangGraph

LangGraph is a newer library from the LangChain developers, which allows the actions in the RAG chain to be laid out explicitly as a graph of steps.
Most importantly for this prototype, it makes **conversational memory** extremely simple to implement.

In [7]:
# Initialise an empty graph. Nodes and edges are added later.
graph_builder = StateGraph(MessagesState)

A ``MessagesState`` holds the active conversation as a list of messages. It adds a large layer of abstraction to programming the chatbot, as this code never has to manage the conversation history itself.
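The append behaviour can be pictured with a toy version in plain Python. This is only a sketch of the idea, not LangGraph's implementation (its real reducer, `add_messages`, also handles things like message IDs); `apply_update` is a made-up name.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Toy append behaviour: new messages extend the history, never replace it."""
    return {"messages": state["messages"] + update["messages"]}


state = {"messages": []}
state = apply_update(state, {"messages": ["human: Hello!"]})
state = apply_update(state, {"messages": ["ai: Hi, how can I help?"]})

print(state["messages"])
# prints: ['human: Hello!', 'ai: Hi, how can I help?']
```

This is why the node functions in this notebook can return just their newest message and still end up with the full conversation in the state.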

In [8]:
def query_or_respond(state: MessagesState):
    # Allows the LLM to use the retrieve tool that was created earlier.
    rag_llm = llm.bind_tools([retrieve])
    
    # The LLM decides on its own if it needs to call the retrieve tool.
    response = rag_llm.invoke(state["messages"])
    
    # A key element of using a MessagesState is that it will append
    # the response to the conversation history rather than overwriting.
    return {"messages": [response]}


# A node is created for the retrieval tool.
# It's called tools, despite there only being one tool, in case more
# are added later; they'd become part of this list.
tools = ToolNode([retrieve])

# Generate a response using the retrieved content.
def generate(state: MessagesState):
    # To massively reduce token consumption (and therefore cost),
    # the most recent RAG context from tool calls is added to the 
    # prompt to stop the LLM searching the entire conversation history 
    # for something that was JUST said. 
    recent_tool_messages = []
    
    # To get the most recent ones, the list needs to be reversed
    # so that the most recent come first instead of last.
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            # If it's a normal message, stop.
            break
    
    # Put the tool messages in their original order in case 
    # the sequence of the retrieved context mattered. 
    tool_messages = recent_tool_messages[::-1]

    # Saves the context from the recent tool messages.
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    
    # This is the LLM's system prompt, which decides how the LLM behaves.
    system_message_content = (
        # This is formatted like this to follow the Ruff linter's line length rule.
        "You are an assistant to help new students get acclimated to Birmingham City "
        "University. Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. When referring to context, be specific and quote the context. "
        "Use three sentences maximum and keep the answer concise."
        "\n\n"
        f"{docs_content}" # RAG context is attached here
    )
    
    # The list of messages in the conversation.
    # Only adds messages that AREN'T tool calls, as tool calls
    # would hugely increase the input tokens used, and the LLM 
    # should (hopefully) have already said the useful info in its response
    # so it can use that instead.
    conversation_messages = [
        message for message in state["messages"] 
        
        if message.type in ("human", "system") 
        or (message.type == "ai" and not message.tool_calls)
    ]
    
    # The final prompt consists of the system prompt and the conversation.
    # This does mean that as a conversation continues, token cost will greatly increase.
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    # Get the LLM's response to the prompt and return the response.
    response = llm.invoke(prompt)
    return {"messages": [response]}
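The two filtering steps inside ``generate`` can be tried out in isolation with stand-in message objects. The `Msg` class and the example history below are hypothetical and only mimic the `type`, `content` and `tool_calls` attributes that the real LangChain messages expose.

```python
class Msg:
    """Hypothetical stand-in for a LangChain message object."""

    def __init__(self, type, content, tool_calls=None):
        self.type = type
        self.content = content
        self.tool_calls = tool_calls or []


def trailing_tool_messages(messages):
    """Collect the unbroken run of tool messages at the end, oldest first."""
    trailing = []
    for message in reversed(messages):
        if message.type != "tool":
            break
        trailing.append(message)
    return trailing[::-1]


def conversation_only(messages):
    """Keep human/system messages and AI messages that aren't tool calls."""
    return [
        m for m in messages
        if m.type in ("human", "system") or (m.type == "ai" and not m.tool_calls)
    ]


history = [
    Msg("human", "Hello!"),
    Msg("ai", "Hi! How can I help?"),
    Msg("human", "What is the late submission deadline?"),
    Msg("ai", "", tool_calls=[{"name": "retrieve"}]),
    Msg("tool", "chunk A"),
    Msg("tool", "chunk B"),
]

print([m.content for m in trailing_tool_messages(history)])
# prints: ['chunk A', 'chunk B']
print([m.content for m in conversation_only(history)])
# prints: ['Hello!', 'Hi! How can I help?', 'What is the late submission deadline?']
```

Only the newest retrieval results reach the system prompt, and the tool-call messages themselves are left out of the conversation sent to the LLM.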

#### Graph Building

After establishing all the functions that form the chatbot, the graph can be built using the ``graph_builder`` ``StateGraph`` that was made earlier.

In [9]:
# Add all the nodes.
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

<langgraph.graph.state.StateGraph at 0x1e5237a66b0>

``query_or_respond`` has **conditions**:
- If the user's query needs RAG context, the retrieve tool will be called.
- If it does not, the LLM will generate a response by itself.
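The two branches above can be sketched as a toy routing function. This only illustrates the decision and isn't LangGraph's own code: `route` and the dictionary-shaped messages are made up, and the `END` value here is assumed to mirror LangGraph's constant.

```python
END = "__end__"  # assumption: mirrors the value of LangGraph's END constant


def route(last_ai_message: dict) -> str:
    """Toy routing rule: go to "tools" only if the LLM asked for a tool call."""
    return "tools" if last_ai_message.get("tool_calls") else END


greeting_reply = {"content": "Hello! How can I help?", "tool_calls": []}
policy_reply = {
    "content": "",
    "tool_calls": [{"name": "retrieve", "args": {"query": "late submission deadline"}}],
}

print(route(greeting_reply))  # prints: __end__
print(route(policy_reply))    # prints: tools
```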

"What is the late submission deadline?" invokes the retrieval tool. 

"Hello!" does not.

In [10]:
graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition, 
    # If retrieval is not needed, generate a general answer.
    # If it is, call the retrieval tool.
    {END: END, "tools": "tools"},
)

# An edge is also needed between the tool call and response generation
# to ensure the response has the RAG context.
graph_builder.add_edge("tools", "generate")

# After the response is generated, the graph is done.
graph_builder.add_edge("generate", END)

<langgraph.graph.state.StateGraph at 0x1e5237a66b0>

To learn about LangGraph and memory, I used their documentation at https://langchain-ai.github.io/langgraph/concepts/persistence/

In [11]:
# Used to keep the ongoing conversation in memory.
memory = MemorySaver()

# Thread ID would allow for multiple sessions of the chatbot to run simultaneously.
# Different thread IDs have their own memory.
config = {"configurable": {"thread_id": "W3Prototyping"}}

# Compile the graph
graph = graph_builder.compile(checkpointer = memory)
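The effect of the thread ID can be pictured as a dictionary with one message list per thread. This is a conceptual sketch only, not how the checkpointer is actually implemented; `memory_by_thread` and `remember` are made-up names.

```python
from collections import defaultdict

# Toy stand-in for a checkpointer: one message list per thread ID.
memory_by_thread = defaultdict(list)


def remember(thread_id: str, message: str) -> list:
    """Append a message to the given thread's history and return that history."""
    memory_by_thread[thread_id].append(message)
    return memory_by_thread[thread_id]


remember("W3Prototyping", "What is the late submission deadline?")
remember("another-session", "Hello!")
remember("W3Prototyping", "And what about extensions?")

print(memory_by_thread["W3Prototyping"])
# prints: ['What is the late submission deadline?', 'And what about extensions?']
print(memory_by_thread["another-session"])
# prints: ['Hello!']
```

Like this dictionary, the in-memory saver only lasts as long as the Python process, so the conversation is lost when the kernel restarts.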

In [12]:
def query(user_query):
    # The parameter has its own name so it doesn't shadow the function.
    input_messages = [HumanMessage(user_query)]
    output = graph.invoke({"messages": input_messages}, config)
    output["messages"][-1].pretty_print()

## Prototype Main Loop

Once all earlier cells have been run, the cell below starts an infinite loop (broken by entering "quit" or pressing CTRL+C) that prompts the LLM.

In [None]:
while True:
    q = input("What is your query? ")
    if q != "quit":
        query(q)
    else: 
        break

The cell below outputs the entire conversation. It does include tool messages and the context they retrieved.

In [None]:
state = graph.get_state(config).values

for message in state["messages"]:
    message.pretty_print()