<a href="https://colab.research.google.com/github/MJMortensonWarwick/AIS-AI-Agents/blob/main/simple_multi_agent_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple Multi-Agent System


## DO THIS FIRST!!
You will need to use a Hugging Face token to make this work. Follow these steps:
1. Go to https://huggingface.co/
2. Click "Sign Up" in the top right corner.
3. Do the usual account sign up steps.
4. Make sure you go to your email and click on the link to confirm your account.
5. Once logged in, click on your icon in the top right corner and select "Access tokens" (right at the bottom of the menu).
6. Click "+ Create new token".
7. Give your token a name and then scroll to the bottom to click "Create". You can ignore all the other options.
8. Copy your token secret ("hf_...") and save it somewhere on your machine (e.g. Word or Notepad).
9. Back in Colab, click on the key icon on the left hand side.
10. Click on "+ Add new secret".
11. Give the new secret the Name HF_TOKEN (please copy exactly this name).
12. Paste in your token secret from step 8 as the Value.
13. Make sure Notebook access is slid to the right. If done it will go blue and show a tick.
14. Now select Runtime > Run all (it won't be quick)
15. Read on!

## Tutorial Starts Here
In this tutorial we will put together a simple multi-agent system designed to research AI.

We will introduce a new tool for this system ... internet search via DuckDuckGo (a search engine).

We will use [Microsoft Phi](https://azure.microsoft.com/en-us/products/phi), an example of a Small Language Model (SLM), although the largest of this family is 14 billion parameters (which feels pretty big to me but maybe that's because I'm old). SLMs are often used in the agent space as they can be downloaded (as we'll do here), trained on your own data and objectives, kept private, and run more easily on consumer-grade hardware. Quite often we don't need all of the knowledge and skills that an LLM may have, so it's a good trade-off.

The model we use will be the mini version. Really it's too small to do the task we will give it particularly well, but you could easily update this code to use a more powerful model from the Hugging Face model hub if you want to build your own agents (ideally look for "Instruct" models that are fine-tuned to follow instructions). It will just take more time to run or require better hardware (e.g. Colab Pro).

We start by setting everything up:

In [None]:
!pip install -q -U transformers accelerate bitsandbytes duckduckgo-search

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from duckduckgo_search import DDGS
from google.colab import userdata, output
import os

# If you haven't already added the secret in Colab, here's how:
# click the key icon on the left-hand side > "+ Add new secret" > name it 'HF_TOKEN'
# and paste in your Hugging Face token as the value

hf_api_token = userdata.get('HF_TOKEN')  # Make sure the secret is set in Colab

# Set the token if it's retrieved correctly, else print an error
if hf_api_token:
    # HF_TOKEN is the environment variable the Hugging Face libraries read
    os.environ["HF_TOKEN"] = hf_api_token
else:
    print("Hugging Face API Token is missing. Please set it in Colab Secrets.")

# Load a small instruction-following model
model_id = "microsoft/Phi-3-mini-4k-instruct" # this will be quite slow

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    dtype=torch.float16,
    offload_buffers=True, # offloads stuff like attention scores to CPU to free up GPU
    low_cpu_mem_usage=True
    # low_cpu_mem_usage loads only parts of the module into RAM at a given moment in
    # time, thus meaning we don't need to fill up RAM with the whole model
)

# we also define a pipeline as we will be repeatedly querying the model
pipe = pipeline(
       "text-generation",
       model=model,
       tokenizer=tokenizer,
       device_map="auto",
       dtype=torch.float16
   )

There's quite a lot going on here, so let's discuss a little.


*   We use our Hugging Face token to access their model hub (and other functions) and download the model we want.
*   We specify the Microsoft Phi Mini model ... probably too small but makes the runtime acceptable in a tutorial like this.
* We specify a tokeniser to convert text into token IDs (which the model then turns into embeddings) and back into text.
* 'Bits and Bytes' is used to quantise the model. This stores the weights at lower precision (4-bit here), which makes the model much smaller (and therefore easier to run) without losing much of its key intelligence.
* We then load the model, using tricks like the quantisation (above), offloading buffers to the CPU to free up GPU memory, and not loading the whole model into RAM at once (just loading parts when we need them). All of this is to make the model fit and run in the Colab environment.
* Lastly we specify a _pipeline_ which will allow us to interact with the model efficiently over multiple exchanges.
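
To make the quantisation point a little more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone need. It assumes Phi-3 Mini has roughly 3.8 billion parameters and ignores quantisation overhead, activations and the cache used during generation, so treat the numbers as approximations only. The function name `weight_memory_gb` is made up for this illustration.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: ~3.8 billion parameters (Phi-3 Mini); all overheads are ignored.
N_PARAMS = 3.8e9

def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half precision
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit quantised

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the 16-bit weights, which is what lets the model sit comfortably on a free Colab GPU.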



Now we can start to define tools. Let's start with a common, but useful, example, the humble calculator:

In [None]:
# Define calculator tool
def calculator(expression):
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

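In the above code, an agent model can pass a calculation as text. The calculator will evaluate the calculation (using _eval_) and pass the results back as text. Note that _eval_ will run any Python expression it is given, so it is fine for a tutorial but should not be used on untrusted input.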
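
If you wanted a calculator tool that only accepts arithmetic, one option is to parse the expression with Python's built-in `ast` module and allow a small set of operators. This is a minimal sketch rather than part of the tutorial's agent; the name `safe_calculator` and the choice of allowed operators are assumptions for illustration.

```python
import ast
import operator

# Operators the calculator will accept; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node):
    """Recursively evaluate a parsed expression made of numbers and allowed operators."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")

def safe_calculator(expression):
    """Same interface as the calculator above: text in, text out."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except Exception as e:
        return f"Error: {e}"

print(safe_calculator("2 + 3 * 4"))
print(safe_calculator("__import__('os').getcwd()"))
```

The first call returns the result as text, while the second is rejected with an error message because function calls are not in the allowed set.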

However, for this task (and to reduce the complexity of the tutorial) we will use just a single tool - DuckDuckGo search (via the DDGS class of the DuckDuckGo-Search Python package).

In [None]:
# DuckDuckGo tool
def search_duckduckgo(query): # pass user query to DuckDuckGo
    with DDGS() as ddgs:

        # query using the user query and get top 3 results
        results = ddgs.text(query, max_results=3)

    # combine the top three results (just the meta description)
    return "\n".join([r["body"] for r in results])
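
Search results can come back empty, or with an entry that has no snippet, and a small model has limited room for context. The sketch below shows one way the joining step could be made more defensive. The helper name `format_search_results`, the `max_chars` limit and the sample data are all hypothetical; only the `"body"` field is taken from the code above.

```python
def format_search_results(results, max_chars=1500):
    """Join result snippets into one context string, skipping empty ones."""
    snippets = [r.get("body", "").strip() for r in (results or [])]
    snippets = [s for s in snippets if s]
    if not snippets:
        return "No search results were found."
    # Trim the combined text so it does not crowd out the rest of the prompt
    return "\n".join(snippets)[:max_chars]

# Made-up results in the same shape as the "body" field used above
sample = [
    {"title": "A", "body": "First snippet."},
    {"title": "B"},                # missing "body"
    {"title": "C", "body": "  "},  # blank body
    {"title": "D", "body": "Second snippet."},
]

print(format_search_results(sample))
print(format_search_results([]))
```

In the search tool, the final `return` line could call a helper like this instead of joining the snippets directly.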

With our tool set up we can now create our agent. We will follow the OOP-style (Object-Oriented Programming) approach of creating an object from a class template. The idea is that this template (_ResearchAgent_) can be used to create multiple objects ... in this case agents that are tasked with performing different types of research.

In [None]:
class ResearchAgent:
    # rules to apply when instantiating an object of this class
    def __init__(self, name, goal, llm, tools):
        self.name = name # name of this object/agent
        self.goal = goal # the agent's goal
        self.llm = llm # the LLM we installed
        self.tools = tools # the tools it can use (just DuckDuckGo in this case)

    def run_task(self, query):
        # extract context by using the search tool
        context = self.tools["search"](query)

        # prompt template
        prompt = f"""
          You are a {self.name}.
          Your goal is: {self.goal}
          Using the following information from web search:
          {context}
          Write a comprehensive and professional report on the goal.
          Please write no more than 200 words.
          Do not repeat yourself.
        """

        # rules for how to create the response
        response = self.llm(
            prompt,  # the prompt above
            # only return the answer, not the prompt (full text)
            return_full_text=False,
            # maximum number of tokens to generate
            max_new_tokens=256,
            # sample each next token from the probability distribution rather
            # than always picking the most likely one; this allows the output
            # to be a bit more creative
            do_sample=True,
            # temperature controls how much creativity the LLM can have
            # (higher is more creative)
            temperature=0.8,
        )
        # [0] selects the first (and only) generated response;
        # "generated_text" is the text of that response
        return response[0]["generated_text"]
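
To see what temperature does when sampling, here is a small standalone sketch (not part of the agent). It applies a softmax with temperature to some made-up scores for three candidate next tokens. The function name and the scores are assumptions chosen purely for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.8, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability goes to the top-scoring token, while a higher temperature spreads it more evenly, so less likely tokens get picked more often.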

Now we have set up the Class template we can create an object of this type.

In [None]:
# Set up agent and tools

# first create the tools dictionary - just one tool
tools = {"search": search_duckduckgo}

# now create an object of the ResearchAgent Class template. As per the __init__ rules,
# it must have a name, a goal, an LLM connection (our pipeline) and the toolset.
agent1 = ResearchAgent(
    name="AI Trend Analyst",
    goal="Summarize the top 2024 AI breakthroughs",
    llm=pipe, # connect it to the pipeline we created earlier
    tools=tools
)

And then let's put the agent to work!

In [None]:
# Run the agent
agent1_report = agent1.run_task("AI breakthroughs in 2024")
output.clear()

Sadly there are lots of warning messages I can't suppress ... but they are harmless. Let's see the report:

In [None]:
print("=== FINAL REPORT ===\n")
print(agent1_report)

Even with a relatively small model the response should be quite good! Now let's build some more agents:

In [None]:
agent2 = ResearchAgent(
    name="AI Data Scientist",
    goal="Identify breakthrough AI technologies and research papers published in 2024",
    llm=pipe,
    tools={"search": search_duckduckgo}
)

agent3 = ResearchAgent(
    name="AI Ethics Officer",
    goal="Review ethical concerns and AI regulations that emerged in 2024",
    llm=pipe,
    tools={"search": search_duckduckgo}
)

Now we have two more agents doing slightly different searches to support slightly different goals. We should get three unique reports!

However, reading three reports sounds very boring. What we really want is another agent that can summarise all three individual reports into just one report. Because AI should help us be more lazy:

In [None]:
class SummariserAgent:
    def __init__(self, name, role, llm):
        self.name = name
        self.role = role
        self.llm = llm

    def summarise_reports(self, reports):
        # Combine the reports from other agents
        combined_reports = "\n".join(reports)

        prompt = f"""
          Your goal is to summarize the following reports into a cohesive, professional summary:

          {combined_reports}

          The report should be at least three paragraphs in length and should cover each of the individual reports sent to you.
        """

        response = self.llm(prompt,
                              return_full_text=False,
                              # double the token budget: 256 for the research agents,
                              # 512 for our summariser
                              max_new_tokens=512,
                              do_sample=True,
                              temperature=0.8)[0]["generated_text"]
        return response
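
Because both agent classes only call `self.llm(prompt, ...)` and then read `[0]["generated_text"]`, any callable that returns that shape can stand in for the pipeline. The sketch below uses this to try out agent logic without a GPU. `fake_llm` and `TinyAgent` are hypothetical names invented for this example and are not the classes defined in this tutorial.

```python
def fake_llm(prompt, **generation_kwargs):
    """Hypothetical stand-in for the pipeline: same return shape, no model."""
    n_words = len(prompt.split())
    return [{"generated_text": f"(stub reply to a {n_words}-word prompt)"}]

class TinyAgent:
    """Cut-down agent used only for this sketch."""
    def __init__(self, llm):
        self.llm = llm

    def run(self, text):
        prompt = f"Summarise the following:\n{text}"
        result = self.llm(prompt, return_full_text=False, max_new_tokens=64)
        return result[0]["generated_text"]

agent = TinyAgent(llm=fake_llm)
print(agent.run("one two three"))
```

Swapping `fake_llm` for the real pipeline needs no other changes, which is a handy way to check the flow of work before paying for slow model calls.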

And again we can create an object of this Class template:

In [None]:
summariser = SummariserAgent(
    name="AI Research Summarizer",
    role="Senior AI Research Analyst",
    llm=pipe
)

Now we can call research agents #2 and #3:

In [None]:
torch.cuda.empty_cache() # release unused cached GPU memory to reduce the risk of running out

# commented out as we ran this earlier

# Task 1: AI Trend Analyst
#agent1_report = agent1.run_task("AI breakthroughs in 2024")
#print(f"\n[{agent1.name}] Report:\n{agent1_report}\n")

# torch.cuda.empty_cache()

# Task 2: AI Data Scientist
agent2_report = agent2.run_task("AI research papers in 2024")
output.clear()

torch.cuda.empty_cache()

# Task 3: AI Ethics Officer
agent3_report = agent3.run_task("AI ethics and regulations in 2024")
output.clear()

torch.cuda.empty_cache()

In [None]:
print(f"[{agent2.name}] Report:\n{agent2_report}\n") # print report
print(f"\n[{agent3.name}] Report:\n{agent3_report}\n") # print report

Again, decent looking work. Let's summarise!

In [None]:
# Summarise all the reports
reports = [agent1_report, agent2_report, agent3_report]
final_report = summariser.summarise_reports(reports)
output.clear()

torch.cuda.empty_cache()

In [None]:
print(f"Summary Report:\n{final_report}")

A bit slow, but a good overall summary!

### __TASKS__:
1. Try modifying the roles and prompts of the agents to explore another area (e.g. your area of specialisation). How well does it perform? How might we better instruct the _SummariserAgent_ to create the sort of output we want?
2. Can you create another agent that checks the work of the researchers and asks them to rewrite if the response is not good? What would this agent class look like? How could we create the flow of work to complete these checks? You don't necessarily need to write this code in full but think about what it would look like.
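
As a starting point for thinking about task 2, here is one possible shape for the flow of work, sketched with stand-in functions instead of real agents. Everything here (`review_loop`, `stub_writer`, `stub_reviewer` and the "at least three words" rule) is hypothetical; in a real version the writer and reviewer would each call the LLM pipeline.

```python
def review_loop(write, review, max_rounds=3):
    """Ask `write` for a draft until `review` accepts it or rounds run out.

    write(feedback) -> draft text (feedback is None on the first round)
    review(draft)   -> (accepted, feedback)
    """
    feedback = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = write(feedback)
        accepted, feedback = review(draft)
        if accepted:
            return draft, round_number
    return draft, max_rounds  # give up and return the last draft

# Stand-ins for a researcher and a reviewer agent
def stub_writer(feedback):
    return "short" if feedback is None else "a longer, revised report"

def stub_reviewer(draft):
    ok = len(draft.split()) >= 3
    return ok, ("" if ok else "Please add more detail.")

final_draft, rounds_used = review_loop(stub_writer, stub_reviewer)
print(rounds_used, final_draft)
```

The cap on the number of rounds matters: without it, a strict reviewer and a weak writer could loop forever.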