<a href="https://colab.research.google.com/github/adammuhtar/semantic-information-retrieval/blob/main/notebooks/sierra-llama3-groq-crewai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SIERRA ⛰️ with crewAI 🚣‍♀️🚣‍♂️**

---

*- by Adam Muhtar*

This notebook details the steps to run an end-to-end agentic RAG framework, SIERRA (Semantic Information Encoding, Retrieval, and Reasoning Agents), powered by [crewAI](https://www.crewai.com/). At its core, the framework is a simple crew of AI agents that supports the report generation process for a technical subject such as liquidity risk management. The workflow has four stages:

1. **Semantic Search and Contextual Compression**:
   We first run a semantic search on a predefined corpus to retrieve information pertinent to a user's specific request. This process involves understanding the query's context, recognising key concepts, and pulling relevant documents or excerpts that align closely with the user's needs.

2. **Document Drafting**:
   Once the relevant data is gathered, an AI agent uses this information to compose a well-structured briefing document. The document is organised with an overview section that summarises the key findings, followed by detailed sections that delve deeper into specific research findings. This agent ensures that all critical information is covered effectively and is presented in a logical, coherent manner.

3. **Critical Review**:
   The next AI agent reviews the drafted document critically. It assesses both the content and the structure of the report, providing detailed feedback on various aspects such as clarity, completeness, relevance, and coherence. This critique includes specific suggestions for improvement, such as areas where additional information is needed, where simplification might be beneficial, or where the argument needs stronger supporting evidence.

4. **Report Refinement**:
   Following the critique, the initial drafting AI agent or another specialised agent rewrites the report, incorporating the feedback received. This involves adjusting the content where necessary, enhancing the clarity and flow of the information, and ensuring that all sections of the document now align more closely with the best practices of report writing and the specific demands of the briefing's audience.

This AI-driven workflow leverages the capabilities of multiple pre-defined agents with various specialisms, each focusing on different aspects of the process to ensure the final output is of high quality and meets the user's specific needs. The collaboration between these AI agents mimics an iterative approach typically seen in human teams, providing a way forward to incorporate these systems as knowledge worker co-pilots.

![vault-crew](./img/vault-crew-2.webp)

## **Table of Contents**

* [1. Notebook setup](#section-1)
* [2. Download corpus and define utility functions](#section-2)
* [3. Setup semantic search with contextual compressor](#section-3)
* [4. Define agents](#section-4)
* [5. Define tasks](#section-5)
* [6. Define crew](#section-6)

## 1. Notebook setup <a id="section-1"></a>

This notebook is run using [Google Colab](https://colab.research.google.com/), Google's hosted implementation of [Jupyter Notebooks](https://jupyter.org/). It requires the following packages to be installed:
* `crewai`
* `faiss-cpu`
* `langchain`
* `langchain-groq`
* `pymupdf`
* `rank_bm25`
* `sentence-transformers`

Running this Colab notebook requires a hardware-accelerated runtime; this instance runs on the Tesla T4 GPU (16 GB GDDR6 @ 320 GB/s) provided for free by Google. Additionally, we use [Llama 3](https://llama.meta.com/llama3/) hosted on [Groq](https://groq.com/) as the LLM behind the AI agents.
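
Before running the rest of the notebook, it can help to confirm that the dependencies are actually present in the runtime. The snippet below is a minimal sketch using only the standard library; the `REQUIRED` list is taken from the install cell, and the helper name `missing_packages` is our own illustration rather than part of any of these libraries.

```python
from importlib import metadata

# Distribution names as they would be passed to `pip install`.
REQUIRED = [
    "crewai", "faiss-cpu", "langchain", "langchain-groq",
    "pymupdf", "rank_bm25", "sentence-transformers",
]

def missing_packages(names):
    """Return the distribution names that are not installed in this runtime."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

print(missing_packages(REQUIRED))
```

An empty list means every package was found; anything listed still needs to be installed.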

In [None]:
# Check IP address details if there are restrictions running non-local servers
!curl ipinfo.io

{
  "ip": "35.223.96.102",
  "hostname": "102.96.223.35.bc.googleusercontent.com",
  "city": "Council Bluffs",
  "region": "Iowa",
  "country": "US",
  "loc": "41.2619,-95.8608",
  "org": "AS396982 Google LLC",
  "postal": "51502",
  "timezone": "America/Chicago",
  "readme": "https://ipinfo.io/missingauth"
}

In [None]:
# Query GPU device status/details
!nvidia-smi

Wed May  8 21:00:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
# Install dependencies
!pip install --quiet --upgrade crewai faiss-cpu langchain langchain-groq pymupdf rank_bm25 sentence_transformers

In [None]:
# Standard library imports
import logging
from pathlib import Path
import re
from typing import List
import textwrap
from urllib.parse import urlparse

# Third party imports
from crewai import Agent, Crew, Process, Task
from google.colab import userdata
from langchain.retrievers import BM25Retriever, EnsembleRetriever, ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_groq import ChatGroq
import requests
import torch
from tqdm import tqdm

In [None]:
# Configure logging
logging.basicConfig(
    level=logging.WARNING, format="%(asctime)s - %(levelname)s - %(message)s"
)

# Load the Groq API key from Colab secrets and instantiate the LLM
try:
    llm = ChatGroq(
        groq_api_key=userdata.get("Groq"),
        model="llama3-70b-8192",
        temperature=0
    )
    logging.info("Groq API credentials successfully loaded")
except FileNotFoundError:
    logging.error("Colab secrets not found")
    raise
except KeyError:
    logging.error("API credentials in Colab secrets not formatted correctly")
    raise
except Exception as e:
    logging.error(f"An error occurred while loading the API key: {e}")
    raise

## 2. Download corpus and define utility functions <a id="section-2"></a>

This notebook makes use of several of the most recently available Pillar 3 reports from Lloyds Banking Group:
* [Lloyds Banking Group plc 2023 Year-End Pillar 3 Disclosures](https://www.lloydsbankinggroup.com/assets/pdfs/investors/financial-performance/lloyds-banking-group-plc/2023/q4/2023-lbg-fy-pillar-3.pdf)
* [Lloyds Banking Group plc 2022 Year-End Pillar 3 Disclosures](https://www.lloydsbankinggroup.com/assets/pdfs/investors/financial-performance/lloyds-banking-group-plc/2022/full-year/2022-lbg-fy-pillar3.pdf)
* [Lloyds Banking Group plc 2021 Year-End Pillar 3 Disclosures](https://www.lloydsbankinggroup.com/assets/pdfs/investors/financial-performance/lloyds-banking-group-plc/2021/q4/2021-lbg-fy-pillar3.pdf)
* [Lloyds Banking Group plc 2020 Year-End Pillar 3 Disclosures](https://www.lloydsbankinggroup.com/assets/pdfs/investors/financial-performance/lloyds-banking-group-plc/2020/full-year/2020-lbg-fy-pillar-3.pdf)

The PDF files are downloaded manually into a folder named `pillar_3` within the same directory where the notebook is executed.
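
If you would rather script this step than place the files by hand, the following is a minimal sketch using only the standard library. The helper names `local_pdf_path` and `download_pdfs` are hypothetical, and the download itself is untested here: the host may reject scripted requests, in which case the manual route still applies.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

def local_pdf_path(url: str, target_dir: Path) -> Path:
    """Derive the local file path for a PDF from the last segment of its URL."""
    filename = Path(urlparse(url).path).name
    if not filename.lower().endswith(".pdf"):
        raise ValueError(f"URL does not point to a PDF file: {url}")
    return target_dir / filename

def download_pdfs(urls: list[str], target_dir: Path) -> list[Path]:
    """Download each URL into `target_dir`, skipping files that already exist."""
    target_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        path = local_pdf_path(url, target_dir)
        if not path.exists():
            urlretrieve(url, path)  # network call; may fail if scripted access is blocked
        paths.append(path)
    return paths
```

Calling `download_pdfs(urls, Path.cwd() / "pillar_3")` with the report URLs listed above would populate the folder that the PDF extraction cell reads from.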

In [None]:
# Function to pre-process multi-line string into a single line string
def clean_text(input_string: str) -> str:
    """
    Process a multi-line string to remove leading and trailing whitespace,
    replace newline characters with spaces, and collapse multiple spaces into a
    single space.

    Args:
        input_string (`str`): The multi-line string to be processed.

    Returns:
        `str`: The processed string.
    """
    return re.sub(" +", " ", input_string.strip().replace("\n", " "))

# Function to pre-process text into clean plain text
def preprocess_text(
    text: str,
    encoding: bool = True,
    lowercase: bool = False,
    remove_newlines: bool = True
) -> str:
    """
    Takes in a string and removes newline characters, tab characters, excess
    whitespaces, as well as regularizing common unicode characters.

    Args:
        * text (`str`): Text to pre-process
        * encoding (`bool`): Replace HTML-escaped apostrophes (`&#x27;`) with
        plain apostrophes. Default is `True`.
        * lowercase (`bool`): Returns the processed string in lowercase if set
        to `True`. Default is `False`.
        * remove_newlines (`bool`): Removes all newline characters in string.
        Default is `True`.

    Returns:
        * `str`: Pre-processed text
    """
    # Fix apostrophes/quotation marks
    _text = re.sub("[‘’]", "'", text)
    _text = re.sub("[“”]", '"', _text)

    if encoding:
        _text = re.sub("(&\\\\#x27;|&#x27;)", "'", _text)

    # Remove newlines, tabs, non-breaking spaces, excess backslashes/whitespaces
    if remove_newlines:
        _text = re.sub("[\n\r]+", " ", _text)
    _text = re.sub("[\t\xa0]+", " ", _text)
    _text = re.sub(r"\\+", "", _text)
    _text = re.sub(r"\s+", " ", _text).strip()

    if lowercase:
        _text = _text.lower()

    return _text

# Function to wrap text while preserving newlines
def wrap_with_newline(text: str, width: int = 80) -> str:
    """
    Wrap text to a specified width while preserving newlines.

    Args:
        * text (`str`): The text to wrap.
        * width (`int`): The maximum width of each line. Default is 80.

    Returns:
        * `str`: The wrapped text.
    """
    lines = text.split("\n")
    wrapped_lines = [textwrap.fill(line, width) for line in lines]
    return "\n".join(wrapped_lines)

In [None]:
# Extract text from PDFs
pillar_3_dir = Path.cwd() / "pillar_3"
docs = []
for pdf in tqdm(pillar_3_dir.rglob("*.pdf"), desc="Processing PDFs", unit="PDF"):
    docs.extend(PyMuPDFLoader(str(pdf)).load())

Processing PDFs: 4PDF [00:03,  1.27PDF/s]


## 3. Setup semantic search with contextual compressor <a id="section-3"></a>

We then create a vector database of our corpus by creating sentence-level embeddings from extracted texts. This allows us to:
* encode extracted texts from documents as vector embeddings.
* store these embeddings and their associated metadata.
* perform semantic similarity searches on these embeddings.

For this step, we use the BAAI General Embedding (BGE) model, based on this model checkpoint: https://huggingface.co/BAAI/bge-large-en-v1.5. At the time of writing, the BGE models are among the top performing models in the Hugging Face [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard) leaderboard.
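
The encoder below is configured with `normalize_embeddings=True`, so every embedding has unit length. Assuming the FAISS store uses a plain L2 (Euclidean) index, which we have not verified for every LangChain version, ranking by L2 distance is then equivalent to ranking by cosine similarity, because for unit vectors the squared distance equals 2 minus twice the dot product. The toy vectors here are made up purely to illustrate that identity.

```python
import math

def normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy 3-dimensional "embeddings" (illustrative numbers only).
query_vec = normalise([1.0, 2.0, 0.5])
passages = {
    "a": normalise([0.9, 2.1, 0.4]),
    "b": normalise([-1.0, 0.3, 2.0]),
    "c": normalise([2.0, 0.1, 0.1]),
}

# Highest cosine similarity first versus smallest distance first.
by_cosine = sorted(passages, key=lambda k: dot(query_vec, passages[k]), reverse=True)
by_l2 = sorted(passages, key=lambda k: squared_l2(query_vec, passages[k]))
print(by_cosine, by_l2)
```

Both orderings come out the same, which is why normalising the embeddings lets a distance-based index behave like a cosine-similarity search.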

We then pass the retrieved information and the user's query to an LLM (here, the same Llama 3 model hosted on Groq) that acts as a filter, removing any retrieved text that is not relevant to the query. This 'compresses' the context provided to the downstream agents and keeps potentially redundant information out of the final output.
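
To make the filtering idea concrete, here is a toy sketch of contextual compression. It is not the LangChain implementation: the `Passage` class and both functions are hypothetical, and a simple keyword matcher stands in for the LLM call, on the assumption that the extractor returns an empty string when a passage contains nothing relevant.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    """Hypothetical stand-in for a retrieved document."""
    source: str
    text: str

def keyword_extractor(query: str, text: str) -> str:
    """Stand-in for the LLM: keep only sentences sharing a word with the query."""
    query_terms = {w.lower().strip("?.,") for w in query.split() if len(w) > 3}
    kept = [
        s.strip() for s in text.split(".")
        if query_terms & {w.lower().strip("?.,") for w in s.split()}
    ]
    return ". ".join(kept) + "." if kept else ""

def compress(query, passages, extractor=keyword_extractor):
    """Apply the extractor to each passage and drop passages with nothing relevant."""
    compressed = []
    for p in passages:
        extract = extractor(query, p.text)
        if extract:
            compressed.append(Passage(p.source, extract))
    return compressed
```

Swapping `keyword_extractor` for a function that prompts an LLM gives the same shape of pipeline: each retrieved passage is either shortened to its relevant part or discarded.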

In [None]:
# Setup encoder for semantic search
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": True}
)


In [None]:
# User's query input
query = "What is the bank's contingency funding plans?"

# Setting up an ensemble retriever with BM25 and FAISS
faiss_retriever = FAISS.from_documents(
    documents=docs, embedding=bge_embeddings
).as_retriever(search_kwargs={"k": 10})
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 10
ensemble_retriever = EnsembleRetriever(
    retrievers=[faiss_retriever, bm25_retriever], weights=[0.8, 0.2]
)

# Setting up contextual compression pipeline
extractor = LLMChainExtractor.from_llm(llm)
compressed_retriever = ContextualCompressionRetriever(
    base_compressor=extractor, base_retriever=ensemble_retriever
)
retrieved_info = compressed_retriever.get_relevant_documents(query)

# Reformat retrieved information into a long string
compressed_context = ""
for i, doc in enumerate(retrieved_info, start=1):
    compressed_context += f"Source {i}: " + doc.metadata["title"] + "\n"
    compressed_context += "Page: " + str(doc.metadata["page"]) + "\n"
    compressed_context += "Content: " + preprocess_text(doc.page_content) + "\n\n"

print(wrap_with_newline(compressed_context))


Source 1: Pillar 3 disclosures
Page: 120
Content: Here is the extracted relevant part of the context: "An outline of the
bank`s contingency funding plans The Group maintains a Contingency Liquidity
Framework as part of the wider Recovery Plan which is designed to identify
emerging liquidity concerns at an early stage, so that mitigating actions can be
taken to avoid a more serious crisis developing. The Contingency Framework has a
foundation of robust and regular monitoring and reporting of KPIs, EWIs and Risk
Appetite by both GCT and Risk up to and including Board level. Where movements
in any of these metrics and indicator suites point to a potential issue, SME
teams and their Directors will escalate this information as appropriate."

Source 2: Pillar 3 disclosures
Page: 122
Content: Here is the extracted relevant part of the context: "An outline of the
bank`s contingency funding plans The Group maintains a Liquidity Contingency
Framework as part of the wider Recovery Plan which is d
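
The ensemble retriever above merges the FAISS and BM25 rankings with weights of 0.8 and 0.2. LangChain's documentation describes this merge as Reciprocal Rank Fusion; the sketch below shows a weighted form of that idea in plain Python. The constant `c=60` and the page identifiers are assumptions for illustration, not values read from the run above.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: each item scores weight / (c + rank) per list, summed."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["p120", "p122", "p45"]  # hypothetical FAISS order
sparse_ranking = ["p45", "p120", "p7"]   # hypothetical BM25 order
fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.8, 0.2])
print(fused)
```

A passage found by both retrievers (`p120`, `p45`) is promoted above one found only by the dense retriever, while the heavier dense weight keeps the BM25-only hit at the bottom.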

## 4. Define agents <a id="section-4"></a>

We use the crewAI framework to define personas of various AI agents, each with their own respective backstories and goals to achieve.

In [None]:
class LiquidityRiskAgents:
    def __init__(self):
        self.llm = llm

    def report_agent(self):
        goal_text = """
        Based on all information provided, conduct a thorough review of all
        liquidity risk-related findings, providing an in-depth report of
        their key findings and outputs detailing their risks and impacts. You
        should always cite your sources.
        """
        backstory_text = """
        You are the central bank's chief liquidity risk officer, ensuring the
        central bank governor is briefed with all the most important information
        relating to risks posed by liquidity risk management issues such as
        liquidity coverage ratio (LCR) and net stable funding ratio (NSFR).
        """
        return Agent(
            role="Liquidity Risk Report Writer",
            goal=clean_text(goal_text),
            backstory=clean_text(backstory_text),
            llm=self.llm,
            verbose=True,
            allow_delegation=False
        )

    def reviewer_agent(self):
        goal_text = """
        Based on the report and source information retrieved, conduct a thorough
        review of the report's liquidity risk-related findings.
        The detailed feedback could include aspects such as clarity, completeness,
        relevance, and coherence. The output should be a critique and points on
        ways to improve the report. Do not make up new facts, only use facts from
        the source information retrieved.
        """
        backstory_text = """
        You are the central bank's chief liquidity risk officer, ensuring that all
        reports containing information relating to liquidity risks such as liquidity
        coverage ratio (LCR) and net stable funding ratio (NSFR) are accurate
        with respect to the information provided.
        """
        return Agent(
            role="Liquidity Risk Report Reviewer",
            goal=clean_text(goal_text),
            backstory=clean_text(backstory_text),
            llm=self.llm,
            verbose=True,
            allow_delegation=False
        )

    def rewriter_agent(self):
        goal_text = """
        Rewrite the report to incorporate the critiques of the original report.
        Do not make up new facts, only use facts from the source information
        retrieved.
        """
        backstory_text = """
        You are the central bank's chief liquidity risk officer, ensuring the
        central bank governor is briefed with all the most important information
        relating to risks posed by liquidity risk management issues such as
        liquidity coverage ratio (LCR) and net stable funding ratio (NSFR).
        """
        return Agent(
            role="Liquidity Risk Report Writer",
            goal=clean_text(goal_text),
            backstory=clean_text(backstory_text),
            llm=self.llm,
            verbose=True,
            allow_delegation=False
        )

## 5. Define tasks <a id="section-5"></a>

We then use the crewAI framework to define the tasks that need to be performed by the various agents, through pre-defined task descriptions and expected output descriptions.
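
Each task description in the class below follows the same pattern: the instruction is collapsed to a single line, and the retrieved context is appended afterwards inside triple backticks so that its line structure survives. A minimal standalone sketch of that pattern, with a hypothetical helper name, looks like this:

```python
import re

def build_task_description(instruction: str, context: str) -> str:
    """Collapse the instruction to one line, then append the context verbatim
    inside triple backticks so its line structure is preserved."""
    one_line = re.sub(r"\s+", " ", instruction).strip()
    fence = "`" * 3  # delimiter marking where the source material starts and ends
    return f"{one_line}\n\n{fence}{context}{fence}"

print(build_task_description(
    """
    Compile the research findings,
    delimited by triple backticks:
    """,
    "Source 1: Pillar 3 disclosures\nPage: 120",
))
```

Appending the context after the whitespace clean-up matters: running the clean-up over the combined string would flatten the `Source` / `Page` / `Content` lines into one run of text.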

In [None]:
class ResearchTasks:
    def write_report(self, agent, query, retrieved_info):
        task_desc = clean_text(
            f"""
            For the given question '{query}', Compile all the research findings
            into a comprehensive briefing document. Ensure this document contains
            all the relevant entities and technical information provided from
            research, delimited by triple backticks:
            """
        )
        task_desc += "\n\n" + f"```{retrieved_info}```"
        output_desc = clean_text(
            """
            A well-structured briefing document that includes sections for
            the overview and detailed information on the various research findings.
            """
        )
        return Task(
            description=task_desc,
            agent=agent,
            expected_output=output_desc
        )

    def critique_report(self, agent, retrieved_info):
        task_desc = clean_text(
            """
            Write a critique of the original report based on the source
            information provided. The detailed feedback could include aspects
            such as clarity, completeness, relevance, and coherence. Do not make
            up new facts, only use facts from the source information retrieved,
            delimited by triple backticks:
            """
        )
        task_desc += "\n\n" + f"```{retrieved_info}```"
        output_desc = clean_text(
            """
            A series of points that lists detailed pointers of how to improve
            the report based on the facts.
            """
        )
        return Task(
            description=task_desc,
            agent=agent,
            expected_output=output_desc,
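            # NOTE: `write_report` resolves to the module-level Task created in
            # section 6, which must exist before this method is called
            context=[write_report]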
        )

    def rewrite_report(self, agent, retrieved_info):
        task_desc = clean_text(
            f"""
            Rewrite the report, utilising all the critique points and research
            findings into an updated comprehensive briefing document. Ensure this
            document sticks to source information retrieved, delimited by triple
            backticks:
            """
        )
        task_desc += "\n\n" + f"```{retrieved_info}```"
        output_desc = clean_text(
            """
            A well-structured briefing document that includes sections for
            the overview and detailed information on the various research findings,
            with citations showing where that information came from, where necessary.
            Do not repeat the original report or the source information.
            """
        )
        return Task(
            description=task_desc,
            agent=agent,
            expected_output=output_desc,
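            # NOTE: `critique_report` resolves to the module-level Task created in
            # section 6, which must exist before this method is called
            context=[critique_report]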
        )

## 6. Define crew <a id="section-6"></a>

Finally, we assemble the crew of agents and map them to their respective tasks. We then kick off the process and print the outputs generated by the crew. An audit trail of each agent's output is printed as the crew runs, and the final output is shown in the last cell.
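
As a mental model of the sequential process, the toy sketch below chains three stand-in functions so that each step receives the shared retrieved context plus the previous step's output. This is not how crewAI is implemented internally; `run_sequential` and the three step functions are hypothetical and only illustrate the data flow from draft to critique to rewrite.

```python
def run_sequential(steps, initial_context):
    """Run steps in order; each step sees the shared context and the previous output."""
    trail = []
    previous = None
    for name, step in steps:
        previous = step(initial_context, previous)
        trail.append((name, previous))  # keep an audit trail of every output
    return trail

# Hypothetical stand-ins for the three agents; real agents would call the LLM.
def draft(context, _previous):
    return f"DRAFT based on [{context}]"

def critique(context, previous):
    return f"CRITIQUE of <{previous}>"

def rewrite(context, previous):
    return f"FINAL addressing <{previous}>"

trail = run_sequential(
    [("write", draft), ("critique", critique), ("rewrite", rewrite)],
    initial_context="retrieved passages",
)
final_output = trail[-1][1]
print(final_output)
```

The `trail` list plays the role of the audit trail, and its last entry corresponds to the final report returned by the crew.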

In [None]:
# Create Agents
agents = LiquidityRiskAgents()
report_agent = agents.report_agent()
reviewer_agent = agents.reviewer_agent()
rewriter_agent = agents.rewriter_agent()

# Create Tasks
tasks = ResearchTasks()
write_report = tasks.write_report(
    agent=report_agent, query=query, retrieved_info=compressed_context
)
critique_report = tasks.critique_report(
    agent=reviewer_agent, retrieved_info=compressed_context
)
rewrite_report = tasks.rewrite_report(
    agent=rewriter_agent, retrieved_info=compressed_context
)

# Create Crew
boe_crew = Crew(
    agents=[report_agent, reviewer_agent, rewriter_agent],
    tasks=[write_report, critique_report, rewrite_report],
    process=Process.sequential,
    verbose=True
)

# Run the Crew
result = boe_crew.kickoff()

[DEBUG]: == Working Agent: Liquidity Risk Report Writer
[INFO]: == Starting Task: For the given question 'What is the bank's contingency funding plans?', Compile all the research findings into a comprehensive briefing document. Ensure this document contains all the relevant entities and technical information provided from research, delimited by triple backticks:

```Source 1: Pillar 3 disclosures
Page: 120
Content: Here is the extracted relevant part of the context: "An outline of the bank`s contingency funding plans The Group maintains a Contingency Liquidity Framework as part of the wider Recovery Plan which is designed to identify emerging liquidity concerns at an early stage, so that mitigating actions can be taken to avoid a more serious crisis developing. The Contingency Framework has a foundation of robust and regular monitoring and reporting of KPIs, EWIs and Risk Appetite by both GCT and Risk up to and including Board level. Where movements in any of t

In [None]:
# Final output of the crew
print(wrap_with_newline(result))

**Liquidity Risk Management Briefing Document**

**Overview**

This briefing document provides an overview of the bank's contingency funding
plans, which are an essential component of its overall liquidity risk management
strategy. The bank's contingency funding plans are designed to identify emerging
liquidity concerns at an early stage, enabling the bank to take mitigating
actions to avoid a crisis developing.

**Contingency Funding Plans**

The bank maintains a Contingency Liquidity Framework and a Liquidity Contingency
Framework as part of its wider Recovery Plan. These frameworks have a foundation
of robust and regular monitoring and reporting of key performance indicators
Chief Treasury (GCT) and Risk up to and including Board level (Source 1, Page
120; Source 2, Page 122).

The Contingency Frameworks are designed to identify emerging liquidity concerns
at an early stage, enabling the bank to take mitigating actions to avoid a
crisis developing. The frameworks involve regular mon