# Part 2: Agents vs. Workflows

## Intro: What is an Agent? 

- Agents are systems that can "_independently accomplish a task on our behalf_"
- Agents are composed of: 
    - **Task**: The goal that the agent wishes to accomplish for the user 
    - **Environment**: The space in which the agent operates to accomplish the task
    - **Tools**: The set of actions that the agent can take within the environment in order to accomplish the goal
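To make the three components concrete, here is a toy sketch in plain Python. The `Agent` class, the `write_file`/`list_files` tools and the dictionary used as the "environment" are hypothetical illustrations, not part of any framework. In a real agent, an LLM would decide which tool to call next; here we call the tools by hand.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: a tool takes the environment plus keyword arguments and returns an observation
Tool = Callable[..., str]


@dataclass
class Agent:
    task: str  # the goal to accomplish on the user's behalf
    environment: dict  # the state the agent operates in
    tools: dict[str, Tool] = field(default_factory=dict)  # the actions it may take

    def act(self, tool_name: str, **kwargs) -> str:
        # The agent only changes its environment through its tools
        return self.tools[tool_name](self.environment, **kwargs)


def write_file(env: dict, path: str, text: str) -> str:
    env.setdefault("files", {})[path] = text
    return f"wrote {len(text)} characters to {path}"


def list_files(env: dict) -> str:
    return ", ".join(sorted(env.get("files", {})))


agent = Agent(
    task="Create a notes.txt file",
    environment={"files": {}},
    tools={"write_file": write_file, "list_files": list_files},
)
print(agent.act("write_file", path="notes.txt", text="hello"))  # wrote 5 characters to notes.txt
print(agent.act("list_files"))  # notes.txt
```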

## Agents vs workflows

- If we have a clear idea of the steps needed to accomplish the task: **use workflows**
- If we don't have a clear idea of the steps needed to accomplish the task: **use agents**


Cheatsheet:

| Aspect           | Workflows                                         | Agents                                                      |
|------------------|---------------------------------------------------|-------------------------------------------------------------|
| Structure        | Predefined and sequential                         | Dynamic and autonomous                                      |
| Flexibility      | Best for predictable, repeatable tasks            | Suited for complex, open-ended problems                     |
| Control          | Driven by developer-written code paths            | LLM decides actions based on current context                |
| Tool Usage       | Fixed order or recipe                             | Chosen and sequenced by the agent as needed                 |
| Predictability   | Highly predictable                                | Variable; can adapt or retry based on feedback              |
| Main Use Cases   | Well-defined automations, document flows          | Coding agents, interactive assistants, task planners        |
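The "Control" and "Tool Usage" rows are the heart of the difference. The sketch below contrasts them with three hypothetical tools (`fetch`, `summarize`, `translate`) that just edit a dictionary. `choose_next_tool` is a hard-coded stand-in for the LLM's decision, an assumption made so the example runs offline.

```python
from typing import Callable, Optional


def fetch(state: dict) -> None:
    state["text"] = "raw article text"


def summarize(state: dict) -> None:
    state["summary"] = state["text"][:10]


def translate(state: dict) -> None:
    state["translation"] = state["summary"].upper()


TOOLS: dict[str, Callable[[dict], None]] = {
    "fetch": fetch,
    "summarize": summarize,
    "translate": translate,
}


def run_workflow(state: dict) -> dict:
    # Workflow: the developer hard-codes the order of the steps
    for step in ("fetch", "summarize", "translate"):
        TOOLS[step](state)
    return state


def choose_next_tool(state: dict) -> Optional[str]:
    # Stand-in for the LLM: inspects the current state and picks the next tool
    if "text" not in state:
        return "fetch"
    if "summary" not in state:
        return "summarize"
    return None  # goal reached: a summary exists, so stop


def run_agent(state: dict, max_steps: int = 10) -> dict:
    # Agent: the decision function chooses which tool to use and when to stop
    for _ in range(max_steps):
        tool = choose_next_tool(state)
        if tool is None:
            break
        TOOLS[tool](state)
    return state


print(run_workflow({}))  # always runs fetch, summarize and translate
print(run_agent({"text": "already have"}))  # skips fetch, never translates
```

The workflow is fully predictable; the agent adapts to what is already in its state, at the cost of predictability.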



# Workflow pattern: Evaluator-optimizer

![Evaluator-optimizer workflow diagram](./assets/evaluator-optimizer.png)

Let's set up some boilerplate:

In [1]:
import jupyter_black
from openai import OpenAI
import os
import json
from markitdown import MarkItDown
import re
from IPython.display import Markdown
from dotenv import load_dotenv

load_dotenv(dotenv_path=".envrc")

API_KEY = os.environ["DEEPSEEK_API_KEY"]
BASE_URL = "https://api.deepseek.com"
MODEL = "deepseek-chat"
# MODEL = "deepseek-reasoner"

jupyter_black.load()

A helper function that calls DeepSeek (with optional JSON output formatting):

In [2]:
def llm_call(prompt: str, with_json_output: bool = False) -> tuple[str | dict, str]:
    client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

    args = {
        "model": MODEL,
        "messages": [],
    }

    if with_json_output is True:
        json_prompt = """
        Output your response in JSON format with the keys specified in the prompt.
        Do not include any other text such as ```json or ```.
        The response should be directly parseable by json.loads.
        """.strip()
        args["messages"].append({"role": "system", "content": json_prompt})
        args["response_format"] = {"type": "json_object"}

    args["messages"].append({"role": "user", "content": prompt})
    response = client.chat.completions.create(**args)

    # if hasattr(response.choices[0].message, "reasoning_content"):
    #     reasoning = response.choices[0].message.reasoning_content
    #     final_response = response.choices[0].message.content

    reasoning = ""
    final_response = response.choices[0].message.content

    if with_json_output is True:
        return json.loads(final_response), reasoning

    return final_response, reasoning
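With `response_format={"type": "json_object"}` and the system instruction, `json.loads(final_response)` should normally succeed. Models do not always follow formatting instructions, though, so a defensive parser can be a useful fallback. `parse_json_response` below is a hypothetical helper (not part of the OpenAI SDK) that strips an optional ```` ```json ```` fence before parsing; it could replace the `json.loads` call in `llm_call`.

```python
import json
import re


def parse_json_response(text: str) -> dict:
    """Parse a model reply as JSON, tolerating an optional ```json ... ``` fence."""
    cleaned = text.strip()
    # If the whole reply is wrapped in a code fence, keep only what is inside it
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", cleaned, re.DOTALL)
    if fence:
        cleaned = fence.group(1)
    return json.loads(cleaned)


print(parse_json_response('```json\n{"evaluation": "PASS", "feedback": ""}\n```'))
```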

First, we create the generator, which produces a solution and improves it based on feedback:

In [3]:
def generate(prompt: str, task: str, context: str = "") -> tuple[str, str]:
    """Generate and improve a solution based on feedback."""
    full_prompt = (
        f"{prompt}\n{context}\nTask: {task}" if context else f"{prompt}\nTask: {task}"
    )
    response, thoughts = llm_call(full_prompt)
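    match = re.search(r"<RESPONSE>(.*?)</RESPONSE>", response, re.DOTALL)
    # Fall back to the raw reply if the model forgot the <RESPONSE> tags
    result = match.group(1) if match else response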

    print("\n=== GENERATION START ===")

    print("\n*** THOUGHTS START ***")
    print(thoughts)
    print("\n*** THOUGHTS END ***")

    print("\n*** RESULT START ***")
    print(result)
    print("\n*** RESULT END ***")

    print("=== GENERATION END ===\n")

    return thoughts, result
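`generate` pulls the answer out of the `<RESPONSE>` tags with a regular expression. The sketch below isolates that step in a hypothetical `extract_response` helper to show why the `re.DOTALL` flag matters: summaries span many lines, and without the flag `.` does not match newlines. The non-greedy `.*?` stops at the first closing tag. The helper returns `None` when the tags are missing, leaving the fallback decision to the caller.

```python
import re
from typing import Optional

RESPONSE_PATTERN = re.compile(r"<RESPONSE>(.*?)</RESPONSE>", re.DOTALL)


def extract_response(text: str) -> Optional[str]:
    """Return the text between the first <RESPONSE>...</RESPONSE> pair, or None."""
    match = RESPONSE_PATTERN.search(text)
    return match.group(1).strip() if match else None


sample = "Sure!\n<RESPONSE>\n# Title\n\nBody\n</RESPONSE>"
print(extract_response(sample))
# Without re.DOTALL the same pattern finds nothing, because the content spans lines
print(re.search(r"<RESPONSE>(.*?)</RESPONSE>", sample))  # None
```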

We also create the evaluator, which grades the generated content against the task and provides feedback:

In [4]:
def evaluate(prompt: str, content: str, task: str) -> tuple[str, str]:
    """Evaluate generated content against the task and return (evaluation, feedback)."""
    full_prompt = f"{prompt}\nOriginal task: {task}\nContent to evaluate: {content}"
    response, thoughts = llm_call(full_prompt, with_json_output=True)
    evaluation = response.get("evaluation")
    feedback = response.get("feedback")

    print("=== EVALUATION START ===")

    print("\n*** THOUGHTS START ***")
    print(thoughts)
    print("\n*** THOUGHTS END ***")

    print("\n*** STATUS START ***")
    print(f"Status: {evaluation}")
    print("\n*** STATUS END ***")

    print("\n*** FEEDBACK START ***")
    print(feedback)
    print("\n*** FEEDBACK END ***")

    print("=== EVALUATION END ===\n")

    return evaluation, feedback
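The main loop stops only when `evaluation == "PASS"` exactly, so a reply such as `"Pass"` or `"PASS."` would keep it running. A hypothetical `normalize_evaluation` helper could map the raw value onto the three labels the prompt asks for. Treating unknown values as `NEEDS_IMPROVEMENT` and a missing key as `FAIL` is our own assumption, not something the original notebook specifies.

```python
from typing import Optional

VALID_STATUSES = {"PASS", "NEEDS_IMPROVEMENT", "FAIL"}


def normalize_evaluation(raw: Optional[str]) -> str:
    """Map the evaluator's free-form status onto one of the three expected labels."""
    if not raw:
        return "FAIL"  # missing "evaluation" key: treat as a failed evaluation
    status = raw.strip().upper().replace(" ", "_").rstrip(".")
    return status if status in VALID_STATUSES else "NEEDS_IMPROVEMENT"


print(normalize_evaluation("Needs improvement"))  # NEEDS_IMPROVEMENT
```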

This is our main loop, which alternates between generating and evaluating:

In [5]:
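def loop(
    task: str, evaluator_prompt: str, generator_prompt: str, max_iterations: int = 5
) -> tuple[str, list[dict]]:
    """Keep generating and evaluating until requirements are met.

    Stops after `max_iterations` evaluations (an arbitrary cap) so that an
    evaluator that is never satisfied cannot make the loop run forever.
    """
    memory = []  # every result generated so far
    chain_of_thought = []  # thoughts and result of each generation

    thoughts, result = generate(generator_prompt, task)
    # TODO: add the reasoning content to the chain of thought
    memory.append(result)
    chain_of_thought.append({"thoughts": thoughts, "result": result})

    for attempt in range(1, max_iterations + 1):
        evaluation, feedback = evaluate(evaluator_prompt, result, task)
        if evaluation == "PASS" or attempt == max_iterations:
            # Either the evaluator is satisfied or we have run out of attempts
            return result, chain_of_thought

        # Give the generator all previous attempts plus the latest feedback
        context = "\n".join(
            [
                "Previous attempts:",
                *[f"- {m}" for m in memory],
                f"\nFeedback: {feedback}",
            ]
        )

        thoughts, result = generate(generator_prompt, task, context)
        memory.append(result)
        chain_of_thought.append({"thoughts": thoughts, "result": result})

    return result, chain_of_thought  # only reached if max_iterations < 1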
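Every iteration of the loop calls the API, so it can help to check the control flow offline first. The sketch below is a generic version of the same evaluator-optimizer idea, with the generator and evaluator passed in as functions. `run_evaluator_optimizer`, the stubs `fake_generate`/`fake_evaluate`, and the 300-word threshold are all hypothetical stand-ins for the LLM calls and prompts.

```python
from typing import Callable


def run_evaluator_optimizer(
    generate_fn: Callable[[str], str],
    evaluate_fn: Callable[[str], tuple[str, str]],
    max_iterations: int = 5,
) -> tuple[str, list[str]]:
    """Generic evaluator-optimizer loop with pluggable generate/evaluate steps."""
    attempts = [generate_fn("")]  # first draft, no feedback yet
    for _ in range(max_iterations):
        evaluation, feedback = evaluate_fn(attempts[-1])
        if evaluation == "PASS":
            return attempts[-1], attempts
        attempts.append(generate_fn(feedback))  # retry using the feedback
    return attempts[-1], attempts  # cap reached: the last draft was never evaluated


# Hypothetical stubs standing in for the two LLM calls
feedback_seen: list[str] = []


def fake_generate(feedback: str) -> str:
    feedback_seen.append(feedback)
    # Each round the "model" writes 100 more words than before
    return "word " * (100 * len(feedback_seen))


def fake_evaluate(draft: str) -> tuple[str, str]:
    n_words = len(draft.split())
    if n_words >= 300:
        return "PASS", ""
    return "NEEDS_IMPROVEMENT", f"Only {n_words} words; write at least 300."


final, history = run_evaluator_optimizer(fake_generate, fake_evaluate)
print(len(history), len(final.split()))  # 3 300
```

Injecting the two steps as functions keeps the loop logic testable independently of the prompts and the model.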

A simple function to fetch a paper from arXiv and convert it to Markdown text:

In [6]:
def get_text_from_arxiv_paper(url: str) -> str:
    md = MarkItDown(enable_plugins=True)
    result = md.convert(url)
    return result.text_content
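In this notebook, `get_text_from_arxiv_paper` is called with a PDF URL of the form `https://arxiv.org/pdf/<id>`. If you would rather pass a bare ID or an `abs/` page link, a small hypothetical helper such as `arxiv_pdf_url` can normalize it first. It only recognizes new-style IDs like `2506.13585` (four digits, a dot, four or five digits, optional version suffix); old-style IDs with an archive prefix are deliberately not handled in this sketch.

```python
import re


def arxiv_pdf_url(paper: str) -> str:
    """Turn a new-style arXiv ID, abs URL or pdf URL into https://arxiv.org/pdf/<id>."""
    match = re.search(r"(\d{4}\.\d{4,5}(?:v\d+)?)", paper)
    if match is None:
        raise ValueError(f"Could not find a new-style arXiv ID in {paper!r}")
    return f"https://arxiv.org/pdf/{match.group(1)}"


print(arxiv_pdf_url("https://arxiv.org/abs/2506.13585"))  # https://arxiv.org/pdf/2506.13585
```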

Finally, we create our prompts and a function that runs the loop on an arXiv paper:

In [7]:
evaluator_prompt = """

Evaluate the following summary. A good summary should:

1. Be understandable by an undergraduate student
2. Be formatted in Markdown, with proper headings and subheadings
3. Have a title and a clear structure
4. Have at least 500 words
5. Use correct grammar and spelling

You should only evaluate the content, not attempt to solve the task.
Only output "PASS" if all criteria are met and you have no further suggestions for improvements.
Output your evaluation concisely in the following format:

EXAMPLE JSON OUTPUT:
{
    "evaluation": "PASS, NEEDS_IMPROVEMENT, or FAIL",
    "feedback": "What needs improvement and why."
}
"""

generator_prompt = """
Your goal is to complete the task based on <user input>. If there is feedback
from your previous generations, reflect on it to improve your solution.

Output your answer concisely in the following format:

<RESPONSE>
Content of the response
</RESPONSE>
"""


def generate_agentic_summary_for(paper_url: str) -> tuple[str, list[dict]]:

    web_page_text = get_text_from_arxiv_paper(paper_url)

    task = f"""
    <user input>
    Write a summary of the following article:

    <article>
    {web_page_text}
    </article>

    </user input>
    """

    result, cot = loop(task, evaluator_prompt, generator_prompt)

    return result, cot
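Some of the evaluator's criteria (a word count, a title, subheadings) are mechanical, while others (readability for an undergraduate, grammar) need judgment. As a hypothetical refinement, the mechanical ones could be checked in plain Python before paying for an LLM evaluation, giving exact feedback. `cheap_checks` is our own illustrative helper; the word count is rough because it splits on whitespace, so heading markers count as words.

```python
import re


def cheap_checks(summary: str, min_words: int = 500) -> list[str]:
    """Check the mechanical criteria of evaluator_prompt without calling an LLM."""
    problems = []
    n_words = len(summary.split())  # rough: '#' markers also count as words
    if n_words < min_words:
        problems.append(f"Only {n_words} words; at least {min_words} required.")
    # A Markdown title is a line starting with a single '#' followed by a space
    if not re.search(r"^# \S", summary, re.MULTILINE):
        problems.append("Missing a top-level '# Title' heading.")
    # Subheadings use two to six '#' characters
    if not re.search(r"^#{2,6} \S", summary, re.MULTILINE):
        problems.append("No subheadings found.")
    return problems


print(cheap_checks("# Title\n\n## Section\n\nSome text."))
# ['Only 6 words; at least 500 required.']
```

If the list is non-empty, its entries could be fed back to the generator as feedback directly, skipping the LLM evaluator for that round.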

We run the workflow on a paper:

In [8]:
result, cot = generate_agentic_summary_for("https://arxiv.org/pdf/2506.13585")




=== GENERATION START ===

*** THOUGHTS START ***


*** THOUGHTS END ***

*** RESULT START ***

MiniMax-M1 is an open-weight, large-scale hybrid-attention reasoning model powered by a Mixture-of-Experts (MoE) architecture and a lightning attention mechanism. Developed from MiniMax-Text-01, it features 456 billion parameters (45.9B activated per token) and supports a context length of 1 million tokens—8x longer than DeepSeek R1. The lightning attention reduces FLOPs by 75% for 100K-token generations compared to DeepSeek R1, enhancing efficiency for long-context tasks. 

Key innovations include:
1. **CISPO**: A novel RL algorithm that clips importance sampling weights instead of token updates, improving training efficiency (2x faster than DAPO).
2. **Hybrid Attention**: Combines transformer blocks with linear attention (Transnormer blocks) for near-linear scaling in long sequences.
3. **Training Efficiency**: Full RL training completed in 3 weeks on 512 H800 GPUs ($534,700 cost).

MiniMa

In [9]:
Markdown(result)


# MiniMax-M1: A Breakthrough in Efficient Large-Scale Reasoning Models

## Introduction  
MiniMax-M1 represents a significant advancement in large reasoning models (LRMs), combining a hybrid Mixture-of-Experts (MoE) architecture with a novel **lightning attention mechanism**. Developed from MiniMax-Text-01, this open-weight model achieves unprecedented efficiency in processing long-context tasks while maintaining high performance across diverse benchmarks. With 456 billion parameters (45.9B activated per token) and support for **1 million tokens of context length**, MiniMax-M1 outperforms competitors like DeepSeek-R1 and Qwen3-235B in software engineering, tool use, and long-context understanding.

---

## Key Innovations  

### 1. **Lightning Attention Mechanism**  
   - Reduces FLOPs by **75%** for 100K-token generations compared to traditional attention.  
   - Enables **near-linear scaling** for long sequences via a hybrid design:  
     - Combines transformer blocks (softmax attention) with **Transnormer blocks** (linear attention).  
   - Supports **8x longer context** than DeepSeek R1 (1M vs. 128K tokens).  

### 2. **CISPO: A Novel RL Algorithm**  
   - **Clips importance sampling weights** instead of token updates, stabilizing training.  
   - Achieves **2x faster training** than DAPO (Yu et al., 2025) by preserving gradient contributions from all tokens.  
   - Empirical validation shows superior performance on mathematical reasoning benchmarks (e.g., AIME 2024).  

### 3. **Training Efficiency**  
   - Full RL training completed in **3 weeks** on 512 H800 GPUs ($534,700 cost).  
   - Techniques like **FP32 precision for LM heads** and **early truncation heuristics** mitigate instability during scaling.  

---

## Performance Highlights  

### Benchmark Results  
MiniMax-M1 excels in:  
- **Software Engineering (SWE-bench)**: 56% accuracy (vs. 57.6% for DeepSeek-R1-0528).  
- **Long Context (OpenAI-MRCR)**: Outperforms OpenAI o3 and Claude 4 Opus, trailing only Gemini 2.5 Pro.  
- **Agentic Tool Use (TAU-bench)**: Surpasses all open-weight models and Gemini 2.5 Pro.  

| Model            | AIME 2024 | LiveCodeBench | SWE-bench | TAU-bench |  
|------------------|-----------|---------------|-----------|-----------|  
| MiniMax-M1-80k   | 86.0%     | 70.1%         | 56.0%     | 73.4%     |  
| DeepSeek-R1-0528 | 88.0%     | 75.5%         | 57.6%     | 67.2%     |  

### Extended Thinking  
- **80K-token generation** version (MiniMax-M1-80k) shows consistent gains over the 40K variant, validating the benefits of scaled test-time compute.  

---

## Architectural and Training Insights  

### Hybrid Attention Design  
- Alternates **7 Transnormer blocks** (linear attention) with **1 transformer block** (softmax attention) per layer.  
- Theoretical FLOPs scale near-linearly with sequence length (Figure 1).  

### RL Training Pipeline  
1. **Continual Pretraining**: 7.5T tokens of reasoning-intensive data (70% STEM/code).  
2. **Supervised Fine-Tuning (SFT)**: Injects chain-of-thought (CoT) patterns for RL readiness.  
3. **RL with CISPO**: Curriculum includes verifiable (math, coding) and generative tasks (QA, creative writing).  

### Challenges Addressed  
- **Precision Mismatch**: FP32 precision for LM heads resolved training-inference probability gaps.  
- **Length Bias**: Online monitoring and reward shaping prevent reward hacking via verbose outputs.  

---

## Conclusion and Future Directions  
MiniMax-M1 sets a new standard for **efficient, scalable reasoning models**, with open-weight availability via [GitHub](https://github.com/MiniMax-AI/MiniMax-M1) and commercial API. Future work aims to:  
- Expand applications in **real-world workflows** (e.g., automated software engineering).  
- Further optimize **long-context agentic interactions**.  

By combining architectural efficiency with algorithmic innovations like CISPO, MiniMax-M1 paves the way for next-generation language model agents.


## **Exercise**
- We are using the `deepseek-chat` model. Can you change this notebook to use the reasoner (`deepseek-reasoner`)?
- How do we integrate the reasoning process into our evaluator-optimizer? (Hint: you need to change the code in two places!)


Great! Now it's time to move on to function-calling agents! No more workflows :)