

# Setup Instructions

## 1. Create a Virtual Environment
```bash
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On Unix or MacOS:
source venv/bin/activate
```

## 2. Create Environment File
1. Create a new file named `.env` in the root directory of the project
2. Add the following line to the file:
   ```
   GROQ = "your_api_key_here"
   ```
   > **Note**: You can get a free API key by creating an account at [Groq's website](https://www.groq.com/). This is a free-of-charge service as requested for using free models.

## 3. Install Dependencies
You have two options:
- Install from requirements file:
  ```bash
  pip install -r requirements.txt
  ```
- Or run the first cell in the notebook which contains the package installations

After completing these steps, you should be able to run the notebook successfully.

In [None]:
%pip install -qU langchain-text-splitters
%pip install -qU langchain-community langchain-text-splitters "unstructured[md]" nltk
%pip install -qU langchain-openai
%pip install -qU sentence-transformers
%pip install -qU faiss-cpu
%pip install rank_bm25
%pip install -qU anthropic
%pip install langgraph
%pip install langchain-groq
%pip install pygraphviz

# Introduction
I'd like to share my ideas and thought process behind the development of this project:

Through my TDK and [thesis](https://diplomaterv.vik.bme.hu/hu/Theses/Megbeszelesek-szemelyre-szabott-osszegzese-es) work, I gained experience with multi-agent systems focused on creating realistic meeting transcripts. This involved converting unstructured information into detailed summaries, and further transforming those into structured formats like tasks and personal summaries. 

To leverage this previous work, I decided to use my existing meeting transcripts and detailed summaries as the data source for this project. I envisioned a system where these meeting summaries would be processed using a markdown-aware splitter for chunking, then embedded to enable efficient information retrieval. 

Given my thesis experience with multi-step and multi-agent workflows, I wanted to incorporate this approach into the current system too.

![Architecture](./figures/pwc.png)



The system consists of 2 main parts:

User Interaction - This component handles user questions through multiple agents:

- First, an agent analyzes the question and determines the steps needed for answering, following a "planning" approach
- Second, an agent uses the plan, task, and basic contextual information to identify what additional information is needed
- Third, this information is used to query the vector database for relevant content
- Finally, an agent combines all this information to generate the answer


RAG System - This handles information retrieval using both BM25 and FAISS:

- BM25 excels when words align directly
- FAISS captures semantic meaning
I have previous experience with both approaches, making them natural choices for this implementation


# RAG

Load files

In [58]:
from langchain_core.documents import Document
import os

def load_markdown_files(directory_path):
    documents = []
    for filename in os.listdir(directory_path):
        if filename.endswith('.md'):
            try:
                file_path = os.path.join(directory_path, filename)
                # Simple raw text loading
                with open(file_path, 'r', encoding='utf-8') as file:
                    content = file.read()
                    # Create a Document with the raw content and metadata
                    doc = Document(
                        page_content=content,
                        metadata={"source": filename}
                    )
                    documents.append(doc)
                print(f"Successfully loaded {filename}")
            except Exception as e:
                print(f"Error loading {filename}: {str(e)}")
    return documents

# Load all documents
markdown_directory = "data/note/"  
documents = load_markdown_files(markdown_directory)

Successfully loaded note_9.md
Successfully loaded note_14.md
Successfully loaded note_10.md
Successfully loaded note_11.md
Successfully loaded note_8.md
Successfully loaded note_15.md
Successfully loaded note_3.md
Successfully loaded note_7.md
Successfully loaded note_6.md
Successfully loaded note_2.md
Successfully loaded note_5.md
Successfully loaded note_18.md
Successfully loaded note_1.md
Successfully loaded note_4.md
Successfully loaded note_12.md
Successfully loaded note_16.md
Successfully loaded note_17.md
Successfully loaded note_13.md


Splitting the documents

In [70]:
from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

# Some of the used data does not contain # as for markdown it is needed so i added a basic recursive chunking method which splits on new lines
# This is a cool site to visualize the chunks: i used this and went with 1200: https://chunkviz.up.railway.app/

def smart_split_documents(documents):
    markdown_docs = []
    plaintext_docs = []
    
    # Categorize documents
    for doc in documents:
        if any(line.strip().startswith('#') for line in doc.page_content.split('\n')):
            markdown_docs.append(doc)
        else:
            plaintext_docs.append(doc)
    
    all_splits = []
    
    # Process markdown documents
    if markdown_docs:
        headers_to_split_on = [
            ("#", "Header 1"),
            ("##", "Header 2"),
            ("###", "Header 3"),
            ("####", "Header 4"),
        ]
        markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on)
        
        for doc in markdown_docs:
            try:
                splits = markdown_splitter.split_text(doc.page_content)
                all_splits.extend(splits)
            except Exception as e:
                print(f"Error splitting markdown document: {str(e)}")
    
    # Process plaintext documents
    if plaintext_docs:
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1200,
            chunk_overlap=0,
            length_function=len,
        )
        plaintext_splits = text_splitter.split_documents(plaintext_docs)
        all_splits.extend(plaintext_splits)
    
    print(f"Split {len(markdown_docs)} markdown documents and {len(plaintext_docs)} plaintext documents")
    print(f"Total chunks created: {len(all_splits)}")
    
    return all_splits

# Use the smart splitter
chunks = smart_split_documents(documents)


Split 11 markdown documents and 7 plaintext documents
Total chunks created: 268


Embeddings

In [71]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.storage import LocalFileStore
from langchain_community.retrievers import BM25Retriever

store = LocalFileStore("./shared_cache/")
# HuggingFace embeddings - I found this - to be honest i dont know too much about this model I used OpenAI embeddings before
embedder = CacheBackedEmbeddings.from_bytes_store(
    HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    store,
    namespace="hf-miniLM-l6",
    
)

# Create FAISS vectorstore
faiss_db = FAISS.from_documents(chunks, embedder)
faiss_retriever = faiss_db.as_retriever(search_kwargs={"k": 1})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(chunks, k=1)

In [81]:
# Test query
query = "door handle repair"  

# Get results from both retrievers
faiss_results = faiss_retriever.get_relevant_documents(query)
bm25_results = bm25_retriever.get_relevant_documents(query)

# Print results
print("FAISS Results:")
for i, doc in enumerate(faiss_results, 1):
    print(f"\n--- Result {i} ---")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.page_content}")

print("\nBM25 Results:")
for i, doc in enumerate(bm25_results, 1):
    print(f"\n--- Result {i} ---")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.page_content}")

FAISS Results:

--- Result 1 ---
Metadata: {}
Content: VII. Definition of Done  
The team established the following Definition of Done criteria: 1. Code review completion 2. Unit tests with at least 80% coverage 3. Integration tests for critical paths 4. Successful deployment to staging 5. No critical or high-priority bugs 6. API documentation (added by Michael Kim) 7. Accessibility testing to WCAG 2.1 AA standards (added by Liam Foster) 8. Security testing (added by Alex Rodriguez)  
VIII. Sprint Backlog Creation  
The team prioritized the following high-priority items: 1. User authentication 2. Basic database structure for user profiles 3. Basic dashboard structure and activity input forms  
Detailed breakdown for user authentication: - Security requirements: secure password hashing, rate limiting for login attempts, JWT token management with proper expiration, and secure session handling. - Backend implementation (Michael Kim): 3 days, including testing - Frontend implementation (Em

To be honest I'm not too satisfied with the accuracy. 
I did a small testing with it and I found that BM25 works a bit better for the kind of information i searched for. Mainly administrative tasks like pest control or door handle repair.

The tipical RAG pipeline look like these embedding and searching but what i found in my recent work as that the models need to a carefully collected contextual data. Since it is very easy to misslead the models. I can add these as cotnext to my prompts but Im 100 percent sure this wont make perfect results. Also right now the biggest challenge is getting these LLM based systems robust. With inconsistent context collection this just makes things worse. 

One important step i believe is to try to find what informations are needed to be able to answer the question. You will see in the next agentic pipeline I add several contextual discription to the model as context. This helps a lot when talking about robustness. These will go into the XML tags into the prompts. Also there are two steps one for planning and one for finding out what information could be needed to answer.

# Agentic system

### Prompts

In [119]:
PLANNER_PROMPT= """
You are a planning agent in a multi-agent QA pipeline. Your role is to analyze user questions and create structured plans for answering them. You have access to basic context information and work alongside a context collection agent.

Here is the basic context information available to you, these are same that the final anwswer agent will get:
These informations are about the company, the running project and its state as well as information about the employees.
<basic_context>
    <company_data>
    {COMPANY_DATA}
    </company_data>

    <project_general>
    {PROJECT_GENERAL}
    </project_general>

    <employee_profiles>
    {EMPLOYEE_PROFILES}
    </employee_profiles>

    <project_state>
    {PROJECT_STATE}
    </project_state>
</basic_context>

The user has asked the following question:
<user_question>
{USER_QUESTION}
</user_question>

Your task is to create a clear, step-by-step plan for answering this question. Follow these guidelines:

1. Carefully analyze the question to identify its main components and any sub-questions it might contain.

2. Consider what aspects of the question need to be addressed to provide a complete answer.

3. Break down complex questions into manageable parts if necessary.

4. Think about what types of information might be needed to answer each part of the question.

5. Consider how to structure the final answer in a logical and coherent manner.

6. Keep in mind the basic context information provided and how it might be relevant to answering the question.

7. If there are potential limitations or areas where more information might be needed, include steps to address these.

Create your plan as a numbered list of steps, with brief explanations for each step. This plan will guide the answer agent in providing a complete and accurate response.

Begin your response with "Here is the plan to answer the user's question:" and then provide your numbered list of steps. Each step should be concise but clear, explaining what needs to be done and why.

Remember to consider only the information available in the basic context and what could reasonably be inferred from the question itself. Do not assume access to information or capabilities not mentioned in the context.

Provide your plan inside <plan> tags.

"""

In [120]:
CONTEXT_COLLECTOR_PROMPT =  """
You are a context collection agent in a multi-agent QA pipeline. Your role is to identify what specific information needs to be retrieved from the knowledge base to answer the user's question effectively.

These informations are about the company, the running project and its state as well as information about the employees.
These are the information that the final agent will get too. If the information that is need to be able to answer the user question is in this context than you dont need to contaion that in the query, since the next agent will get these as you did.
<basic_context>
    <company_data>
    {COMPANY_DATA}
    </company_data>

    <project_general>
    {PROJECT_GENERAL}
    </project_general>

    <employee_profiles>
    {EMPLOYEE_PROFILES}
    </employee_profiles>

    <project_state>
    {PROJECT_STATE}
    </project_state>
</basic_context>


Here is the user's question:
<user_question>
{USER_QUESTION}
</user_question>

Here is the plan that has been developed to answer this question:
<plan>
{PLAN}
</plan>


Your task is to analyze the user's question and the provided plan to determine what specific information is needed to answer the question completely. Consider the following:

1. What key facts or data points are necessary to provide a comprehensive answer?
2. Are there any domain-specific concepts or terminology that need to be explained?
3. What background information might be required to fully understand the context of the question?
4. Are there any specific details mentioned in the plan that require additional information?

Based on your analysis, create a list of all the required information pieces. Each piece of information should be concise and specific.

Present your response as a comma-separated list of required information pieces. Do not include explanations or justifications for your choices. Simply list the information needed.

For example, your output might look like this:
<output>meeting summary from last sprint, project timeline details, team member responsibilities, budget allocation for Q3, client feedback on prototype</output>

Remember to focus on information that is directly relevant to answering the user's question and executing the provided plan. Avoid listing unnecessary or tangential information.

"""

In [121]:
ANSWER_PROMPT = """
You are an AI agent tasked with generating a correct and comprehensive answer to a user's question. Follow these instructions carefully:

Here is the question you need to answer:
<user_question>
{USER_QUESTION}
</user_question>

1. First, understand your role and the system you're working in:
These informations are about the company, the running project and its state as well as information about the employees.
<basic_context>
    <company_data>
    {COMPANY_DATA}
    </company_data>

    <project_general>
    {PROJECT_GENERAL}
    </project_general>

    <employee_profiles>
    {EMPLOYEE_PROFILES}
    </employee_profiles>

    <project_state>
    {PROJECT_STATE}
    </project_state>
</basic_context>

2. Review the following context, which includes the retrieved context from the knowledge base, and that might be needed to asnwer user question:
<context>
{CONTEXT}
</context>

3. Your task is to create a comprehensive answer. Follow this plan:
{PLAN}

4. Before providing your final answer, use a scratchpad to organize your thoughts and plan your response. Write your planning process inside <scratchpad> tags.

5. After planning, provide your final answer inside <answer> tags. Ensure your answer is clear, well-structured, and directly addresses the user's question while following the provided plan.

As a reminder you need to answer to the following user question:
<user_question>
{USER_QUESTION}
</user_question> 

Remember to incorporate relevant information from the context, maintain clarity and coherence, and stay focused on the specific question asked by the user.
If you cant answer since that information is not in your context window say that "I dont have clear data on that sorry".

"""

### Workflow State

In [122]:
from typing import TypedDict, Dict, Any, List
class QAState(TypedDict):
    #basic context about the company and project
    company_data: str	
    project_general: str	
    employee_profiles: str	
    project_state : str

    #user question
    question : str
    #plan to answer
    plan : str
    #query from collect context agent
    query: str
    #retrieved context from vector db
    retrieved_context: str
    #final answer
    answer: str

In [123]:
def load_markdown_to_str(file_path):
	with open(file_path, 'r', encoding='utf-8') as md_file:
		markdown_content = md_file.read()
	return markdown_content

### Workflow graph

In [129]:
from langgraph.graph import StateGraph, END
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage

class QAPipeline:
    def __init__(self, model):
        self.model = model
        self.state = QAState()
        graph = StateGraph(QAState)


        graph.add_node("plan_to_answer", self.plan_to_answer)
        graph.add_node("collect_context", self.collect_context)
        graph.add_node("query_context", self.query_context)
        graph.add_node("generate_answer", self.generate_answer)

        graph.add_edge("plan_to_answer","collect_context")
        graph.add_edge("collect_context","query_context")        
        graph.add_edge("query_context","generate_answer")
        graph.add_edge("generate_answer",END)
        
        graph.set_entry_point("plan_to_answer")
        self.graph = graph.compile()

        
    def initialize_state(self, user_question : str):
        print("--- INITIALIZE STATE NODE ---")
        self.state = QAState(
            company_data= load_markdown_to_str(os.path.join("data","company-general.md")),
            project_general= load_markdown_to_str(os.path.join("data","project-general.md")),
            employee_profiles= load_markdown_to_str(os.path.join("data","employee-profiles.md")),
            project_state= load_markdown_to_str(os.path.join("data","project-state.md")),

            question = user_question,

            plan = "",
            query = "",
            retrieved_context = "",
            answer = ""

        )
        return self.state
        
    def plan_to_answer(self, state: QAState) -> QAState:
        print("--- PLAN TO ANSWER NODE ---")
        formatted_prompt = PLANNER_PROMPT.format(
            COMPANY_DATA = self.state.get("company_data"),
            PROJECT_GENERAL = self.state.get("project_general"),
            EMPLOYEE_PROFILES = self.state.get("employee_profiles"),
            PROJECT_STATE = self.state.get("project_state"),
            USER_QUESTION = self.state.get("question"),
        )
        messages = [
            HumanMessage(content=formatted_prompt)
        ]
        response = self.model.invoke(messages)
        
        self.state["plan"] = response.content
        print(self.state.get("plan"))
        return self.state
    
    def collect_context(self, state : QAState) -> QAState:
        print("--- collect context node ---")
        formatted_prompt = CONTEXT_COLLECTOR_PROMPT.format(
            COMPANY_DATA = self.state.get("company_data"),
            PROJECT_GENERAL = self.state.get("project_general"),
            EMPLOYEE_PROFILES = self.state.get("employee_profiles"),
            PROJECT_STATE = self.state.get("project_state"),
            USER_QUESTION = self.state.get("question"),
            PLAN = self.state.get("plan")
        )
        messages = [
            HumanMessage(content=formatted_prompt)
        ]
        response = self.model.invoke(messages)
        self.state["query"] = response.content
        print(self.state.get("query"))
        return self.state

    
    def query_context(self, state: QAState):
        print("--- QUERY CONTEXT ---")
        
        # Split the query into a list of strings and clean them
        queries = [q.strip() for q in self.state.get("query").split(",")]
        print(f"Processing queries: {queries}")
        
        all_retrieved_contexts = []
        
        # Process each query separately
        for query in queries:
            print(f"\nRetrieving context for: '{query}'")
            
            # Get results from both retrievers
            faiss_results = faiss_retriever.get_relevant_documents(query)
            bm25_results = bm25_retriever.get_relevant_documents(query)
            
            # Format FAISS results
            faiss_context = "\n=== FAISS Results ===\n"
            for doc in faiss_results:
                faiss_context += f"\nSource: {doc.metadata.get('source', 'Unknown')}\n"
                faiss_context += f"{doc.page_content}\n"
                faiss_context += "-" * 50 + "\n"
            
            # Format BM25 results
            bm25_context = "\n=== BM25 Results ===\n"
            for doc in bm25_results:
                bm25_context += f"\nSource: {doc.metadata.get('source', 'Unknown')}\n"
                bm25_context += f"{doc.page_content}\n"
                bm25_context += "-" * 50 + "\n"
            
            # Combine results for this query
            query_context = f"\n=== Query: '{query}' ===\n"
            query_context += faiss_context + bm25_context
            
            all_retrieved_contexts.append(query_context)
        
        # Combine all retrieved contexts
        self.state["retrieved_context"] = "\n".join(all_retrieved_contexts)
        
        print(f"\nTotal retrieved context length: {len(self.state['retrieved_context'])} characters")
        return self.state
    
    def generate_answer(self, state : QAState) -> QAState:
        print("--- generating answer ---")
        formatted_prompt = ANSWER_PROMPT.format(
            USER_QUESTION = self.state.get("question"),
            COMPANY_DATA = self.state.get("company_data"),
            PROJECT_GENERAL = self.state.get("project_general"),
            EMPLOYEE_PROFILES = self.state.get("employee_profiles"),
            PROJECT_STATE = self.state.get("project_state"),
            CONTEXT = self.state.get("retrieved_context"),
            PLAN = self.state.get("plan")

        )
        messages = [
            HumanMessage(content=formatted_prompt)
        ]
        response = self.model.invoke(messages)
        self.state["answer"] = response.content
        print(self.state.get("answer"))
        return self.state
    
    def run(self, user_question: str):
        print("--- RUN ---")
        self.initialize_state(user_question)
        self.graph.invoke(self.state)
        return self.state



In [130]:
from  dotenv import load_dotenv
from IPython.display import Image
from langchain_groq import ChatGroq
import os

load_dotenv()
groq_api_key = os.getenv("GROQ")

model = ChatGroq(
    model="llama-3.3-70b-versatile",  # Replace with the exact model name if different
    temperature=0.1,               # Adjust temperature as needed
    max_tokens=2048,               # Set max tokens based on your requirements
    timeout=30,                    # Set timeout as needed
    max_retries=3,                  # Set max retries for API calls
    api_key = groq_api_key
)
agent = QAPipeline(model=model)
#Image(agent.graph.get_graph().draw_png())

Running example:

In [134]:
agent.run("How is responsible for backend development?")

--- RUN ---
--- INITIALIZE STATE NODE ---
--- PLAN TO ANSWER NODE ---
Here is the plan to answer the user's question:
<plan>
1. Identify the key components of the question, which are "responsible" and "backend development", to understand that the question is asking about the person or role in charge of backend development.
2. Review the basic context information provided, specifically the <employee_profiles> section, to find details about the roles and responsibilities of the employees at TechNova Solutions.
3. Look for the employee profile that mentions "Backend Developer" or responsibilities related to backend development, such as server-side logic, database management, or API development.
4. Extract the name and relevant details of the employee responsible for backend development to provide a direct answer to the question.
5. Consider if there are any additional details from the context that might be relevant to the answer, such as the technology stack used for backend development.


{'company_data': '# TechNova Solutions\n\n## Company Overview\nTechNova Solutions is a small, dynamic IT company specializing in web application development. With a team of 6 skilled professionals, they focus on creating innovative, user-friendly web solutions for small to medium-sized businesses.\n\n## Current Project: HealthTrack Pro\nTechNova is developing HealthTrack Pro, a comprehensive web application for personal health management. This application allows users to track their daily activities, nutrition, and health metrics, and provides insights and recommendations for a healthier lifestyle.\n\n## Team Structure\n1. ** Sarah Chen - Project Manager / Scrum Master**\n   - Oversees project progress, manages timelines, and facilitates communication\n   - Has a background in both frontend and backend development\n\n2. ** Alex Rodriguez - Senior Full-Stack Developer**\n   - Leads technical decisions and architecture design\n   - Proficient in both frontend and backend technologies\n\n

In [136]:
#<answer>
#Michael Kim is responsible for backend development at TechNova Solutions. As the Backend Developer, his responsibilities include developing and maintaining server-side logic, database management, and API development, utilizing technologies such as Node.js, Express.js, and PostgreSQL **
#</answer>

The pipeline successfully answered the question. In this case, while the planning agent might seem a little overkill, I found that generating a dynamic plan based on the circumstances can significantly enhance a pipeline's effectiveness. I implemented this approach in my thesis when working on meeting summarization. The key question was: How can we specify what the model should focus on when meetings constantly change? This led to the solution of implementing a planning agent.

This planning approach is valuable because we know that answers improve when we include structured planning steps. However, this only works effectively with sufficiently large models. There are many more advanced techniques available, such as reflection, tool use, and various prompting techniques like Step-Back, Chain of Thought (CoT), or ReAct, that could be used if the use case requires it.


What I found during my research back at uni was examining, what context is needed to execute tasks and what steps are required. Instead of using a single context window or conversation and adding steps to it (which becomes more expensive and unreliable), I implemented a multi-step approach using the QAPipeline class with langgraph. Consists of several steps: getting a plan, and getting the context. These are the 2 most important and then these informations are added to the final agent which answers the question. Hopefully with approach the reliablilty increases.

Currently, one of the biggest challenges is making LLM-based systems reliable, and overwhelming the model with long conversations is counterproductive. I also believe that the common interface to these systems could be improved. Since system reliability depends heavily on the question itself, we shouldn't allow users to type whatever they want. Anyway Im just no too convinced about chatbot interfaces, I think regular people can't use this as effectively compared to someone who have bunch of exprerience. 

In business settings, there are typically well-defined workflows, and instead of letting users ask free-form questions, I would prefer and approch where users are provided with buttons for complex task execution. When implementing these systems for professionals, we should let them determine the necessary context, let the system process it and do the job, and then have the employee (Human-In-The-Loop) verify the results, rather than having them type out workflow instructions. These are just my personal observations on the matter. 

Anyway, these are some of my thoughts after these intensive few months of writing my TDK and finishing up my Thesis. If you have time to look inside of that, I would happily share more details.
Cheers