# MultiAgents 
Notes from working through Alejandro's tutorial:
https://alejandro-ao.com/posts/agents/multi-agent-deep-research/


<ol>Smolagents: A minimalist, very powerful agent library that allows you to create and run multi-agent systems with a few lines of code. </ol>
<ol>Firecrawl: A robust search-and-scrape engine for LLMs to crawl, index, and extract web content.</ol>
<ol>Open models from Hugging Face to scrape and research the web.</ol>

We will build a multi-agent system in which a "Coordinator Agent" spawns multiple "Sub-Agent" instances to handle different subtasks.

!["Agents"](/mnt/data/projects/.immune/Personal/AI_Agents_Tutorial/open-deep-research-workflow-diagram.jpg)
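The coordinator and sub-agent pattern described above can be sketched in plain Python. This is only an illustration of the fan-out idea, not the smolagents API: `run_coordinator` and `echo_agent` are hypothetical names, and a real sub-agent would call a model and tools instead of returning a canned string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_coordinator(
    subtasks: List[str],
    sub_agent: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Fan the subtasks out to sub-agents and collect findings per subtask."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the subtasks.
        findings = list(pool.map(sub_agent, subtasks))
    return dict(zip(subtasks, findings))


def echo_agent(subtask: str) -> str:
    """Hypothetical stand-in for a real sub-agent."""
    return f"findings for: {subtask}"


report = run_coordinator(["background", "methods"], echo_agent)
print(report)
```

Each sub-agent only sees its own subtask, which is why the later steps put so much effort into producing clear, self-contained subtask descriptions.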


In [1]:
# conda activate torch_gpu_dna
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "Qwen/Qwen2.5-7B-Instruct"
# Kimi-K2-Thinking could not be downloaded, so we start with Qwen. The GPU is a Tesla T4, so we stick to Qwen 7B.

# Passing `load_in_4bit=True` directly is deprecated; use a BitsAndBytesConfig instead.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)

Loading checkpoint shards: 100%|██████████| 4/4 [00:14<00:00,  3.53s/it]


In [3]:
print(model.device)
print(model)

cuda:0
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 3584)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear4bit(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear4bit(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear4bit(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear4bit(in_features=3584, out_features=3584, bias=False)
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear4bit(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear4bit(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear4bit(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2
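The layer shapes in the printout above allow a rough estimate of why this model fits on a Tesla T4 (16 GB) in 4-bit. The sketch below counts only the `Linear4bit` weights and the embedding table. It ignores biases, norms, quantization constants, the output head (cut off in the printout), activations, and the KV cache, and it assumes the unquantized embedding is stored in 16-bit, so treat the numbers as a lower bound.

```python
# Dimensions read from the model printout.
HIDDEN = 3584      # q_proj / o_proj in and out features
KV_DIM = 512       # k_proj / v_proj out features
MLP_DIM = 18944    # gate_proj / up_proj out features
LAYERS = 28
VOCAB = 152064

# Linear4bit weights per decoder layer.
attention_params = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM
mlp_params = 3 * HIDDEN * MLP_DIM
quantized_params = LAYERS * (attention_params + mlp_params)

# Embedding is a plain Embedding module, assumed here to be 16-bit.
embedding_params = VOCAB * HIDDEN

quantized_gb = quantized_params * 0.5 / 1e9   # 4 bits = 0.5 bytes per weight
embedding_gb = embedding_params * 2 / 1e9     # 2 bytes per weight

print(f"4-bit linear weights: {quantized_params:,} params, ~{quantized_gb:.2f} GB")
print(f"embedding table:      {embedding_params:,} params, ~{embedding_gb:.2f} GB")
```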

# 1. Generating a Research Plan

In [4]:
PLANNER_SYSTEM_INSTRUCTIONS = """
You are a research planning assistant.

Your task is to produce a clear, structured research plan
for the given user query.

Requirements:
- Break the topic into major research dimensions or questions
- Identify key biological concepts, methods, and datasets
- Include both background and cutting-edge aspects
- The plan should be suitable for later decomposition into subtasks
- Do NOT write the final answer or conclusions

Output format:
- Plain text
- Use numbered sections and bullet points
- Be concise but comprehensive
- No markdown, no JSON, no code blocks
"""


In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Qwen/Qwen2.5-7B-Instruct"
# The tokenizer and model were already loaded in the first cell and are reused here.

def generate_research_plan(user_query: str) -> str:
    print("Generating the research plan for the query:", user_query)
    print("MODEL:", model_id)

    messages = [
        {"role": "system", "content": PLANNER_SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": user_query},
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        # Greedy decoding (do_sample=False) ignores temperature, so it is not passed.
        output = model.generate(
            **inputs,
            max_new_tokens=500,
            do_sample=False,
            repetition_penalty=1.1
        )

    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    print("\033[93mGenerated Research Plan\033[0m")
    print(f"\033[93m{response}\033[0m")

    return response.strip()
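To make the `apply_chat_template` step less opaque, here is a rough approximation of the ChatML-style string that Qwen chat templates produce. `render_chatml` is a hypothetical helper written for illustration. The real template ships with the tokenizer and may differ in details such as default system prompts or tool handling, so it should stay the source of truth.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate a ChatML-style prompt; not the tokenizer's actual template."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leaves the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt_preview = render_chatml(
    [
        {"role": "system", "content": "You are a research planning assistant."},
        {"role": "user", "content": "Research about immune cell aging"},
    ]
)
print(prompt_preview)
```

The open assistant turn at the end is what `add_generation_prompt=True` contributes, and it is also why the function slices off the prompt tokens before decoding the response.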

In [6]:
research_plan = generate_research_plan(
    "Research about immune cell aging using single-cell RNA-seq"
)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Generating the research plan for the query: Research about immune cell aging using single-cell RNA-seq
MODEL: Qwen/Qwen2.5-7B-Instruct
Generated Research Plan
1. Introduction to Immune Cell Aging
   - Definition of immune cell aging
   - Overview of age-related changes in immune function
2. Background on Single-Cell RNA Sequencing (scRNA-seq)
   - Principles of scRNA-seq technology
   - Advantages of scRNA-seq over bulk RNA sequencing
3. Key Biological Concepts Related to Immune Cell Aging
   - Senescence and exhaustion markers in immune cells
   - Epigenetic changes associated with aging
   - Intrinsic vs extrinsic factors influencing immune aging
4. Research Questions
   - How do gene expression patterns change during immune cell aging?
   - What specific cellular pathways are affected by aging?
   - Can we identify unique transcriptional signatures of aged immune cells?
5. Methods for Studying Immune Cell Aging Using scRNA-seq
   - Sample collection and preparation tec
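The warning printed at the start of this run appears because `do_sample=False` selects greedy decoding, which ignores sampling flags such as `temperature`. A minimal sketch with made-up logits shows why: temperature reshapes the sampling distribution, but the highest-scoring token stays the same, so the greedy choice does not depend on it.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def greedy_pick(logits: List[float]) -> int:
    """Greedy decoding: take the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


logits = [2.0, 1.0, 0.5]  # illustrative values, not from the model
sharp = softmax(logits, temperature=0.3)
flat = softmax(logits, temperature=1.5)

print("greedy choice:", greedy_pick(logits))
print("top-token probability at T=0.3:", round(sharp[0], 3))
print("top-token probability at T=1.5:", round(flat[0], 3))
```

To actually use a temperature, sampling has to be switched on with `do_sample=True`.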

<h4> Kimi-K2 </h4>
The previous model, Kimi-K2-Thinking, reasons much better, so it can be given long instructions.
Kimi-K2-Thinking (and similar "reasoning" models) has:
<ol>Strong instruction-following</ol>
<ol>Hidden chain-of-thought / internal planning</ol>
<ol>Better tolerance for long, nuanced constraints</ol>

So these worked well:
<ol>Rich requirements</ol>
<ol>Soft heuristics ("use your judgment")</ol>
<ol>Multi-objective planning</ol>

The Kimi-K2 models could:
<p>Think → structure → output JSON</p>

<h4> Qwen model </h4>
What changes with local Transformers (Qwen2.5-7B-Instruct):
<ol>4-bit quantization</ol>
<ol>No reasoning mode</ol>
<ol>No response_format enforcement</ol>

This means long instructions carry risks:
<ol>Model may explain itself</ol>
<ol>Model may summarize constraints</ol>
<ol>Model may violate JSON-only</ol>
<ol>Model may partially follow constraints</ol>

But long instructions also have benefits:
<ol>Better task decomposition</ol>
<ol>Better coverage</ol>
<ol>Fewer shallow subtasks</ol>
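Without `response_format` enforcement, one common workaround is to check the output and ask again when it is not valid JSON. The sketch below is one possible way to do that, not something from the tutorial: `generate_json_with_retries` is a hypothetical helper, and the `generate` argument stands in for any function that maps a prompt string to the model's text output.

```python
import json
from typing import Callable, Optional


def generate_json_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model until it returns a JSON object, or give up."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        hint = ""
        if last_error is not None:
            # Feed the failure back so the model can correct itself.
            hint = f"\nYour previous output was invalid ({last_error}). Return ONLY valid JSON."
        raw = generate(prompt + hint)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc.msg
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = "top-level value is not a JSON object"
    raise ValueError(f"No valid JSON after {max_attempts} attempts: {last_error}")


# Fake model: chatty on the first call, compliant on the second.
fake_replies = iter(["Sure, here is the plan:", '{"subtasks": []}'])
result = generate_json_with_retries(lambda _prompt: next(fake_replies), "Split this plan.")
print(result)
```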

# 2. Dividing into Subtasks
Each subtask is handed to its own sub-agent and tells that agent exactly what to research.
<h4> Shorter instructions for Qwen </h4>

In [8]:
import json
from pydantic import BaseModel, Field
from typing import List
from pprint import pprint

class Subtask(BaseModel):  # Pydantic model: validates the fields and can emit a JSON schema
    id: str = Field(
        ...,
        description="Short identifier for the subtask (e.g. 'A', 'history', 'drivers').",
    )
    title: str = Field(
        ...,
        description="Short descriptive title of the subtask.",
    )
    description: str = Field(
        ...,
        description="Clear, detailed instructions for the sub-agent that will research this subtask.",
    )

class SubtaskList(BaseModel):
    subtasks: List[Subtask] = Field(
        ...,
        description="List of subtasks that together cover the whole research plan.",
    )
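A small sketch of what these models buy us, assuming Pydantic v2 (the version that provides `model_json_schema`). The classes are trimmed copies of the ones above so the block runs on its own, and the JSON strings are made-up examples.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Subtask(BaseModel):
    id: str
    title: str
    description: str


class SubtaskList(BaseModel):
    subtasks: List[Subtask]


# A well-formed payload parses into typed objects.
good = '{"subtasks": [{"id": "A", "title": "Background", "description": "Summarize scRNA-seq basics."}]}'
parsed = SubtaskList.model_validate_json(good)
print(parsed.subtasks[0].title)

# A payload with a missing required field is rejected.
bad = '{"subtasks": [{"id": "A", "title": "Background"}]}'
try:
    SubtaskList.model_validate_json(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print("rejected:", exc.error_count(), "error(s)")
```

Validation is what turns a loosely followed prompt into a hard guarantee: anything the model returns either matches the schema or fails loudly.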


In [9]:
TASK_SPLITTER_SYSTEM_INSTRUCTIONS = f"""
You will be given a research plan.

Your job is to split it into subtasks.

Return ONLY valid JSON in the following schema:

{json.dumps(SubtaskList.model_json_schema(), indent=2)}

Rules:
- Do not include any explanation
- Do not include markdown
- Do not include text outside JSON
- Output must be valid JSON
"""

## Local generation with Transformers

In [10]:
def generate_json_response(prompt: str, max_new_tokens=1024):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        # Greedy decoding (do_sample=False) ignores temperature, so it is not passed.
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            eos_token_id=tokenizer.eos_token_id,
        )

    output_text = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True,
    )

    return output_text.strip()


#### JSON extraction + Pydantic validation
LLMs sometimes add extra text around the JSON, so extract the object safely.

In [11]:

def extract_json(text: str) -> str:
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found")

    brace_count = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            brace_count += 1
        elif text[i] == "}":
            brace_count -= 1
            if brace_count == 0:
                return text[start:i + 1]

    raise ValueError("Unbalanced JSON braces")
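Counting braces works for the outputs shown here, but it can cut the object short when a string value contains `{` or `}`. One alternative, sketched below under the assumption that the first parseable object is the one we want, is the standard library's `json.JSONDecoder.raw_decode`, which parses from a given position and ignores trailing text. `extract_first_json_object` is a hypothetical name.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and tolerates extra text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("No JSON object found")


noisy = 'Here you go: {"note": "a } inside a string", "count": 1} Hope that helps!'
extracted = extract_first_json_object(noisy)
print(extracted)
```

This version returns a parsed dictionary rather than a string, so the separate `json.loads` call would no longer be needed if it were swapped in.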


In [12]:
def split_into_subtasks(research_plan: str):

    prompt = f"""
{TASK_SPLITTER_SYSTEM_INSTRUCTIONS}

Research plan:
{research_plan}
"""

    raw_output = generate_json_response(prompt)

    json_text = extract_json(raw_output)
    data = json.loads(json_text)

    # Validate with Pydantic
    subtask_list = SubtaskList(**data)

    print("\033[93mGenerated The Following Subtasks\033[0m")
    for task in subtask_list.subtasks:
        print(f"\033[93m{task.title}\033[0m")
        pprint(task.description)
        print()

    return subtask_list.subtasks


In [13]:
subtasks = split_into_subtasks(research_plan)

Generated The Following Subtasks
Introduction to Immune Cell Aging
'Define immune cell aging and overview age-related changes in immune function.'

Background on Single-Cell RNA Sequencing
('Explain principles of scRNA-seq technology and its advantages over bulk RNA '
 'sequencing.')

Key Biological Concepts
('Identify senescence and exhaustion markers, epigenetic changes, and '
 'intrinsic vs extrinsic factors influencing immune aging.')

Research Questions
('Formulate questions about gene expression patterns, affected cellular '
 'pathways, and transcriptional signatures of aged immune cells.')

Methods for Studying Immune Cell Aging
('Detail sample collection, preparation, normalization, quality control, '
 'clustering, and differential expression analysis.')

Datasets and Resources
('Locate publicly available scRNA-seq datasets and relevant databases for '
 'human and murine immune systems.')

Cutting-Edge Aspects

In [14]:
import json
TASK_SPLITTER_SYSTEM_INSTRUCTIONS = f"""
You are a task decomposition engine.

You will be given a set of research instructions (a research plan).
Your job is to break this plan into a set of coherent, non-overlapping
subtasks that can be researched independently by separate agents.

Planning guidelines:
- 3 to 8 subtasks is usually a good range. Use your judgment.
- Subtasks should collectively cover the full scope of the original plan
  without unnecessary duplication.
- Prefer grouping by meaningful dimensions such as:
  time periods, regions, actors, themes, or causal mechanisms,
  depending on the topic.
- Do NOT include a final task that synthesizes results.
  That will be done later in another step.
- Each subtask description should be very clear and detailed about
  what the agent must research and produce.

Output requirements (STRICT):
- Return ONLY valid JSON
- Do NOT include explanations
- Do NOT include markdown
- Do NOT include text outside JSON
- Output MUST conform exactly to the following schema:

{json.dumps(SubtaskList.model_json_schema(), indent=2)}
"""

In [15]:
subtasks_long = split_into_subtasks(research_plan)

Generated The Following Subtasks
Definition and Overview of Immune Cell Aging
('Research the definition of immune cell aging and provide an overview of '
 'age-related changes in immune function. This includes identifying key terms, '
 'concepts, and mechanisms involved in the aging process of immune cells.')

Introduction to Single-Cell RNA Sequencing Technology
('Examine the principles of single-cell RNA sequencing (scRNA-seq) technology, '
 'including its methodology, advantages over bulk RNA sequencing, and how it '
 'can be used to study immune cell aging at the individual cell level.')

Key Biological Concepts Related to Immune Cell Aging
('Investigate senescence and exhaustion markers in immune cells, epigenetic '
 'changes associated with aging, and the distinction between intrinsic and '
 'extrinsic factors influencing immune aging. Provide a comprehensive '
 'understanding of these concepts and their relevance to the aging process.')
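One crude way to compare the two prompt variants is to measure how detailed the subtask descriptions are. The sketch below uses the first three descriptions from each run, copied from the outputs above. Word count is only a rough proxy for depth, and `mean_word_count` is a hypothetical helper.

```python
from typing import List


def mean_word_count(descriptions: List[str]) -> float:
    """Average number of whitespace-separated words per description."""
    return sum(len(text.split()) for text in descriptions) / len(descriptions)


short_prompt_descriptions = [
    "Define immune cell aging and overview age-related changes in immune function.",
    "Explain principles of scRNA-seq technology and its advantages over bulk RNA sequencing.",
    "Identify senescence and exhaustion markers, epigenetic changes, and "
    "intrinsic vs extrinsic factors influencing immune aging.",
]

long_prompt_descriptions = [
    "Research the definition of immune cell aging and provide an overview of "
    "age-related changes in immune function. This includes identifying key terms, "
    "concepts, and mechanisms involved in the aging process of immune cells.",
    "Examine the principles of single-cell RNA sequencing (scRNA-seq) technology, "
    "including its methodology, advantages over bulk RNA sequencing, and how it "
    "can be used to study immune cell aging at the individual cell level.",
    "Investigate senescence and exhaustion markers in immune cells, epigenetic "
    "changes associated with aging, and the distinction between intrinsic and "
    "extrinsic factors influencing immune aging. Provide a comprehensive "
    "understanding of these concepts and their relevance to the aging process.",
]

short_mean = mean_word_count(short_prompt_descriptions)
long_mean = mean_word_count(long_prompt_descriptions)
print(f"short prompt: {short_mean:.1f} words per description")
print(f"long prompt:  {long_mean:.1f} words per description")
```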