
# Simple Multi-Agent System
In this tutorial we will put together a simple multi-agent system designed to research AI.

We will also introduce a new tool for this system: internet search via DuckDuckGo (a search engine).

Note that we will use the larger Falcon model (Falcon3-10B-Instruct, about 10 billion parameters) because otherwise the results will be ... underwhelming.

We start with the usual code we have seen all session:

In [None]:
!pip install -q transformers accelerate bitsandbytes duckduckgo-search

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from duckduckgo_search import DDGS
from google.colab import userdata
import os

# If you haven't already added the secret in Colab:
# open the Secrets panel (key icon in the left sidebar), add a secret named
# 'HF_TOKEN' containing your Hugging Face access token, and enable notebook access.
try:
    hf_api_token = userdata.get('HF_TOKEN')
except Exception:  # raised if the secret is missing or notebook access is not granted
    hf_api_token = None

# Export the token under the name the Hugging Face libraries look for
if hf_api_token:
    os.environ["HF_TOKEN"] = hf_api_token
else:
    print("Hugging Face API token is missing. Please set it in Colab Secrets.")

# Load an instruction-following model (pick one of the two below)

# model_id = "tiiuae/falcon-7b-instruct" # faster but less accurate
model_id = "tiiuae/Falcon3-10B-Instruct" # this will be quite slow

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spread layers over the GPU (and the CPU if the GPU is full)
    torch_dtype=torch.float16,  # half precision: 2 bytes per parameter
    offload_buffers=True,       # when layers are offloaded to CPU, offload their registered
                                # buffers (non-parameter tensors) too, saving GPU memory
    low_cpu_mem_usage=True      # load the weights piece by piece instead of first building
                                # a second full copy of the model in RAM
)

# we also define a pipeline, as we will be repeatedly querying the LLM
# (the model above is already placed on devices in float16, so the
# pipeline does not need its own device_map/torch_dtype arguments)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
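The offloading options above exist because the weights alone are large. Here is a rough back-of-the-envelope estimate: parameters × bytes per parameter. It assumes a nominal 10 billion parameters and ignores activations, the KV cache and framework overhead, so real usage is higher.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: a nominal parameter count; activations, KV cache and
# CUDA/PyTorch overhead are ignored, so real usage is higher.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,  # what torch_dtype=torch.float16 selects
}

def weight_memory_gb(n_params, dtype):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

n_params = 10e9  # nominal "10B" model
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.0f} GB")
```

Even in float16, that is more than the 16 GB of the T4 GPU commonly offered on Colab's free tier. This is why `device_map="auto"` may place some layers on the CPU, and why generation with this model is slow.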

Now we can define our tool: DuckDuckGo search, via the `DDGS` class of the `duckduckgo_search` Python package.

In [None]:
# DuckDuckGo tool
def search_duckduckgo(query):  # pass the user query to DuckDuckGo
    with DDGS() as ddgs:

        # run a text search and keep the top 3 results
        # (a list of dicts with "title", "href" and "body" keys)
        results = ddgs.text(query, max_results=3)

    # combine the top three results (just the "body" snippet of each)
    return "\n".join([r["body"] for r in results])
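The tool above can raise an exception if DuckDuckGo rate-limits the request or the network fails, and that would stop the whole agent. Below is a minimal, more defensive sketch. The helper names `format_results` and `safe_search` are hypothetical and not part of `duckduckgo_search`. The search call is passed in as a function so the helpers can be tried without network access. The sketch assumes each result is a dict with `"title"`, `"href"` and `"body"` keys, as returned by `DDGS().text(...)`.

```python
# Hypothetical helpers: a more defensive version of the search tool.
# Assumption: each result is a dict with "title", "href" and "body" keys.

def format_results(results, max_chars=1500):
    """Turn a list of search-result dicts into one context string for the prompt."""
    lines = []
    for r in results or []:  # treat None as "no results"
        title = r.get("title", "").strip()
        body = r.get("body", "").strip()
        if body:
            lines.append(f"- {title}: {body}" if title else f"- {body}")
    context = "\n".join(lines)
    return context[:max_chars]  # crude cap so the prompt stays short

def safe_search(query, search_fn, max_results=3):
    """Call search_fn(query, max_results) without letting a failure crash the agent."""
    try:
        results = search_fn(query, max_results)
    except Exception as exc:  # e.g. network errors or rate limiting
        return f"(web search failed: {exc})"
    return format_results(results) or "(no search results found)"
```

With the real package, you could call `safe_search("AI breakthroughs in 2024", lambda q, n: DDGS().text(q, max_results=n))`. The agent then receives a short note instead of an exception when the search fails.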

With our tool set up, we can now create our agent. We will follow the object-oriented (OOP) approach of creating an object from a class template (we discussed this in ADA).

In [None]:
class ResearchAgent:
    # rules to apply when instantiating an object of this class
    def __init__(self, name, goal, llm, tools):
        self.name = name # name of this object/agent
        self.goal = goal # the agent's goal
        self.llm = llm # the text-generation pipeline wrapping our LLM
        self.tools = tools # the tools it can use (just DuckDuckGo in this case)

    def run_task(self, query):
        # extract context by using the search tool
        context = self.tools["search"](query)

        # prompt template
        prompt = f"""
          You are a {self.name}.
          Your goal is: {self.goal}
          Using the following information from web search:
          {context}
          Write a comprehensive and professional report on the goal.
          Please write no more than 200 words.
          Do not repeat yourself.
        """

        # generate the response
        response = self.llm(
            prompt,                  # the prompt above
            return_full_text=False,  # return only the newly generated answer, not the prompt
            max_new_tokens=256,      # maximum number of new tokens to generate
            # do_sample=True: choose each next token by sampling from the model's
            # probability distribution rather than always taking the most likely
            # token (greedy decoding), which makes the output a bit more varied
            do_sample=True,
            # temperature rescales that distribution: higher = more random/"creative"
            temperature=0.8,
        )[0]["generated_text"]  # [0] = the first (and only) generated sequence;
                                # "generated_text" holds its text
        return response
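The `do_sample` and `temperature` settings control how each next token is picked. The toy illustration below uses made-up scores for three candidate tokens, not Falcon's real outputs. Temperature divides the scores (logits) before the softmax. Values below 1 sharpen the distribution towards the most likely token, and values above 1 flatten it, so less likely tokens are sampled more often.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate next tokens (made-up numbers)
logits = [2.0, 1.0, 0.1]
for t in (0.5, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

With `do_sample=False`, generation would instead take the most likely token at every step (greedy decoding), and `temperature` would have no effect.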

Now that we have set up the class template, we can create an object of this type.

In [None]:
# Set up agent and tools

# first create the tools dictionary (tool name -> function); just one tool here
tools = {"search": search_duckduckgo}

# now create an object of the ResearchAgent Class template. As per the __init__ rules,
# it must have a name, a goal, an LLM connection (our pipeline) and the toolset.
agent1 = ResearchAgent(
    name="AI Trend Analyst",
    goal="Summarize the top 2024 AI breakthroughs",
    llm=pipe,
    tools=tools
)
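`ResearchAgent` relies on only two contracts. First, `tools["search"]` is a callable that takes a query string. Second, `llm(prompt, **kwargs)` returns a list like `[{"generated_text": ...}]`. That means you can check the wiring without loading the 10B model by passing stand-ins. The sketch below uses hypothetical `fake_search` and `fake_llm` functions that mimic those shapes and makes the same two calls that `run_task` makes.

```python
# Hypothetical stand-ins with the same shapes the agent expects.

def fake_search(query):
    """Pretend search tool: returns a fixed context string."""
    return f"Stub search result for: {query}"

def fake_llm(prompt, **generate_kwargs):
    """Pretend pipeline: a list with one dict per generated sequence."""
    return [{"generated_text": f"[stub report: {len(prompt)} prompt chars, "
                               f"max_new_tokens={generate_kwargs.get('max_new_tokens')}]"}]

fake_tools = {"search": fake_search}

# The same two steps run_task performs: search, then generate
context = fake_tools["search"]("AI breakthroughs in 2024")
report = fake_llm(f"Context:\n{context}",
                  return_full_text=False,
                  max_new_tokens=256)[0]["generated_text"]
print(report)
```

In the notebook, `ResearchAgent(name="Test", goal="Test goal", llm=fake_llm, tools=fake_tools).run_task("...")` would then return instantly. This is handy for debugging the agent logic before paying for a slow model call.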

And then let's put the agent to work!

In [None]:
# Run the agent
agent1_report = agent1.run_task("AI breakthroughs in 2024")
print("=== FINAL REPORT ===\n")
print(agent1_report)

Assuming you used the (slow) 10-billion-parameter model, the response should be quite good! Now let's build some more agents:

In [None]:
agent2 = ResearchAgent(
    name="AI Data Scientist",
    goal="Identify breakthrough AI technologies and research papers published in 2024",
    llm=pipe,
    tools={"search": search_duckduckgo}
)

agent3 = ResearchAgent(
    name="AI Ethics Officer",
    goal="Review ethical concerns and AI regulations that emerged in 2024",
    llm=pipe,
    tools={"search": search_duckduckgo}
)

Now we have two more agents doing slightly different searches to support slightly different goals. We should get three unique reports!

However, reading three reports sounds very boring. What we really want is another agent that can summarise all three individual reports into a single report. After all, AI should help us be lazier:

In [None]:
class SummariserAgent:
    def __init__(self, name, role, llm):
        self.name = name
        self.role = role
        self.llm = llm

    def summarise_reports(self, reports):
        # Combine the reports from the other agents, separated by blank lines
        combined_reports = "\n\n".join(reports)

        prompt = f"""
          You are a {self.role}.
          Your goal is to summarize the following reports into a cohesive, professional summary:

          {combined_reports}

          The summary should be at least three paragraphs in length.
        """

        response = self.llm(prompt,
                            return_full_text=False,
                            # twice the token budget of the research agents
                            # (256 for them, 512 for our summariser)
                            max_new_tokens=512,
                            do_sample=True,
                            temperature=0.8)[0]["generated_text"]
        return response
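`summarise_reports` receives the reports as plain strings, so the summariser cannot tell which agent wrote what. One option is to label each report with its author before combining them. The sketch below uses a hypothetical `combine_reports` helper. Using it would mean passing `(agent.name, report)` pairs, such as `[(agent1.name, agent1_report), ...]`, and calling it inside `summarise_reports`. That change is an assumption, not something the class above already does.

```python
def combine_reports(named_reports):
    """Join (agent_name, report) pairs into one labelled block of text."""
    sections = []
    for name, report in named_reports:
        sections.append(f"### Report from {name}\n{report.strip()}")
    return "\n\n".join(sections)

combined = combine_reports([
    ("AI Trend Analyst", "Report text one."),
    ("AI Ethics Officer", "Report text two."),
])
print(combined)
```

Longer prompts also mean more input tokens to process. If you want to check the prompt length, `len(tokenizer(combined)["input_ids"])` gives the exact token count for our model's tokenizer.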

And again we can create an object of this class template:

In [None]:
summariser = SummariserAgent(
    name="AI Research Summarizer",
    role="Senior AI Research Analyst",
    llm=pipe
)

Now we can call research agents #2 and #3:

In [None]:
torch.cuda.empty_cache() # release unused cached GPU memory held by PyTorch between runs

# Task 1 is commented out because we already ran agent 1 above

# Task 1: AI Trend Analyst
#agent1_report = agent1.run_task("AI breakthroughs in 2024")
#print(f"\n[{agent1.name}] Report:\n{agent1_report}\n")

# torch.cuda.empty_cache()

# Task 2: AI Data Scientist
agent2_report = agent2.run_task("AI research papers in 2024")
print(f"\n[{agent2.name}] Report:\n{agent2_report}\n") # print report

torch.cuda.empty_cache()

# Task 3: AI Ethics Officer
agent3_report = agent3.run_task("AI ethics and regulations in 2024")
print(f"\n[{agent3.name}] Report:\n{agent3_report}\n") # print report

torch.cuda.empty_cache()
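The cell above repeats the same run, print and clear-cache steps for each agent, and a small loop can express this more compactly. In the sketch below, `run_agents` and `EchoAgent` are hypothetical. `EchoAgent` only mimics the `name` and `run_task` interface of `ResearchAgent`, so the loop can be tried without the model. Cache clearing is passed in as an optional function.

```python
def run_agents(tasks, clear_cache=None):
    """Run (agent, query) pairs in order and collect each report by agent name."""
    reports = {}
    for agent, query in tasks:
        reports[agent.name] = agent.run_task(query)
        if clear_cache is not None:
            clear_cache()  # e.g. torch.cuda.empty_cache between agents
    return reports

class EchoAgent:
    """Stand-in with the same interface as ResearchAgent (name + run_task)."""
    def __init__(self, name):
        self.name = name

    def run_task(self, query):
        return f"{self.name} looked into: {query}"

reports = run_agents([
    (EchoAgent("AI Data Scientist"), "AI research papers in 2024"),
    (EchoAgent("AI Ethics Officer"), "AI ethics and regulations in 2024"),
])
for name, report in reports.items():
    print(f"[{name}] {report}")
```

In the notebook, this would be something like `run_agents([(agent2, "AI research papers in 2024"), (agent3, "AI ethics and regulations in 2024")], clear_cache=torch.cuda.empty_cache)`. It returns a dictionary of reports keyed by agent name.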

Again, decent-looking work (assuming you used the 10B model). Let's summarise!

In [None]:
# 🧠 Summarise all the reports
reports = [agent1_report, agent2_report, agent3_report]
final_report = summariser.summarise_reports(reports)

print(f"\nSummary Report:\n{final_report}\n")

torch.cuda.empty_cache()

A bit slow, but a good overall summary!

### __TASK__:
Can you create another agent that checks the work of the researchers and asks them to rewrite their report if it is not good enough? What would this agent class look like? How could we set up the flow of work to carry out these checks?

You don't necessarily need to write this code in full, but think about what it would look like.
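Here is one possible starting point rather than a full answer. It defines a hypothetical `ReviewerAgent` that asks a pipeline-style `llm` to reply APPROVED or REWRITE, and sends any feedback back to the researcher. It assumes the researcher's `run_task` accepts an optional `feedback` argument that gets added to its prompt; the notebook's `ResearchAgent` has no such parameter, so you would add it yourself. The stubs stand in for the real model so the control flow can be checked quickly.

```python
class ReviewerAgent:
    """Hypothetical reviewer: asks an LLM to approve a report or request a rewrite."""

    def __init__(self, llm, max_rounds=2):
        self.llm = llm                # same call shape as our text-generation pipeline
        self.max_rounds = max_rounds  # maximum number of rewrite requests

    def review(self, goal, report):
        prompt = (
            f"Goal: {goal}\nReport:\n{report}\n"
            "Reply with APPROVED if the report meets the goal, "
            "otherwise reply with REWRITE followed by what to fix."
        )
        verdict = self.llm(prompt, return_full_text=False,
                           max_new_tokens=128)[0]["generated_text"]
        return verdict.strip()

    def check_and_revise(self, agent, query):
        """Return (report, approved); stops after max_rounds rewrite requests."""
        report = agent.run_task(query)
        for _ in range(self.max_rounds):
            verdict = self.review(agent.goal, report)
            if verdict.upper().startswith("APPROVED"):
                return report, True
            # assumption: run_task accepts feedback and adds it to its prompt
            report = agent.run_task(query, feedback=verdict)
        return report, False  # the final rewrite has not been reviewed

# --- stubs so the flow can be tried without the real model ---
class StubResearcher:
    goal = "Summarize the top 2024 AI breakthroughs"

    def run_task(self, query, feedback=None):
        return "draft" if feedback is None else "improved draft"

def stub_llm(prompt, **kwargs):
    # approve only reports that have been revised
    ok = "improved draft" in prompt
    return [{"generated_text": "APPROVED" if ok else "REWRITE: add more detail"}]

reviewer = ReviewerAgent(stub_llm)
report, approved = reviewer.check_and_revise(StubResearcher(), "AI breakthroughs in 2024")
print(report, approved)
```

Capping the number of rounds matters with a real LLM. Otherwise a strict reviewer and a weak researcher could loop indefinitely.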