## Code to Chapter 10 of LangChain for Life Science and Healthcare book, by Dr. Ivan Reznikov

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xPf4HrMBqymlXPADmv2P2RKa6byhXHxA?usp=sharing)

## Tutorial AutoGen - Multi-Agent Research System

This notebook demonstrates how to build a multi-agent system with Microsoft's AutoGen framework, used here through its original `flaml.autogen` package. The system consists of specialized agents that work together on a research task: collecting data, analyzing it, and generating a report. This approach shows how AI agents can collaborate on problems that would be hard for a single agent to handle alone.

## Key Concepts

- **Multi-Agent Collaboration**: Multiple AI agents with different roles working together
- **Automated Group Chat**: Agents communicate through structured conversations
- **Specialized Roles**: Each agent has a specific function (planning, coding, analysis, etc.)
- **Tool Integration**: Agents can execute code, scrape web data, and generate reports

## Installation and Setup

### Auto Generated Agent Chat: Performs Research with Multi-Agent Group Chat

`flaml.autogen` offers conversable agents powered by LLMs, tools, or humans, which can perform tasks collectively via automated chat. The framework supports tool use and human participation through multi-agent conversation.
Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).

In [1]:
!pip install -q flaml[autogen]
!pip install numpy==1.26.4



In [2]:
!pip freeze | grep "langc\|openai\|autogen\|FLAM"

FLAML==2.3.5
langchain==0.3.26
langchain-core==0.3.71
langchain-text-splitters==0.3.8
langcodes==3.5.0
openai==0.27.8


In [3]:
from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get("LC4LS_OPENAI_API_KEY")

## Model Configuration

### Setting Up the Language Model

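**Configuration Explanation**: `config_list_from_json` loads a list of endpoint configurations from the `OAI_CONFIG_LIST` environment variable or from a JSON file of that name, and `filter_dict` keeps only the entries whose `model` is `gpt-4o-mini`. GPT-4o-mini is chosen here because it offers a good balance of capability and cost for multi-agent scenarios, where a single task triggers many LLM calls.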

❗IMPORTANT:

<font color='red'>If you get an error, don't panic: restart the session and run all cells again. This is a known issue in Google Colab, typically after packages such as numpy are reinstalled.❗

Click Runtime -> Restart session, then run all cells.
</font>
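`config_list_from_json` looks for an environment variable or a JSON file named `OAI_CONFIG_LIST`, while the cells above only set `OPENAI_API_KEY`. If you do not already have such a file, one option is to build a one-entry list from the key and store it in the environment before running the next cell. This is a minimal sketch assuming a single `gpt-4o-mini` entry is all you need; the placeholder key exists only so the snippet runs outside Colab.

```python
import json
import os

# Assumption: OPENAI_API_KEY was set in an earlier cell; fall back to a
# placeholder so this sketch still runs elsewhere.
api_key = os.environ.get("OPENAI_API_KEY", "sk-placeholder")

# A config list is a JSON array of endpoint dicts; filter_dict later keeps
# only the entries whose "model" matches.
config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]

# config_list_from_json("OAI_CONFIG_LIST", ...) can read the list from this variable.
os.environ["OAI_CONFIG_LIST"] = json.dumps(config_list)
```

If an `OAI_CONFIG_LIST` file is already present in the working directory, this step is unnecessary.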


In [4]:
from flaml import autogen

config_list_gpt = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4o-mini"],  # keep only the gpt-4o-mini entries from the list
    },
)

### Global Model Settings

**Parameter Breakdown**:
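- **seed**: Cache seed for LLM responses; rerunning with the same seed reuses cached completions (making runs reproducible), while a new seed forces fresh calls
- **temperature**: Set to 0 for the most deterministic, focused responses (important for code generation), although outputs can still vary slightly
- **config_list**: The endpoint configurations loaded in the previous cell
- **request_timeout**: 120 seconds per request, to accommodate long completions on complex reasoning tasks
- **model**: Specifies the exact model variant to use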

In [5]:
gpt_config = {
    "seed": 42,  # change the seed for different trials
    "model": "gpt-4o-mini",
    "temperature": 0,
    "config_list": config_list_gpt,
    "request_timeout": 120
}
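The comment on `seed` suggests running separate trials. Rather than editing `gpt_config` in place, you could derive one config per seed with dict unpacking. In this sketch, `base_config` stands in for `gpt_config`, its `config_list` is a placeholder, and the seed values are arbitrary:

```python
# Stand-in for gpt_config from the cell above (config_list is a placeholder here).
base_config = {
    "seed": 42,
    "model": "gpt-4o-mini",
    "temperature": 0,
    "config_list": [{"model": "gpt-4o-mini"}],
    "request_timeout": 120,
}


def with_seed(config: dict, seed: int) -> dict:
    """Return a shallow copy of config with a different cache seed."""
    return {**config, "seed": seed}


# One config per trial; each could be passed as llm_config to a fresh set of agents.
trial_configs = [with_seed(base_config, s) for s in (42, 43, 44)]
```

Because `{**config, ...}` makes a shallow copy, all trial configs share the same `config_list` object, which is fine as long as nothing mutates it.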

## Agent Architecture

This system uses 7 specialized agents, each with a distinct role:

### 1. User Proxy Agent (Admin)

**Role**: Acts as the project manager and human proxy
- **Termination Logic**: Recognizes when tasks are complete
- **Auto-reply Limit**: Prevents infinite loops with max 10 consecutive responses
- **Decision Making**: Determines if the solution meets requirements

In [6]:
user_proxy = autogen.UserProxyAgent(
    name="Admin",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    llm_config=gpt_config,
    system_message="""Reply TERMINATE if the task has been solved at full satisfaction.
    Otherwise, reply CONTINUE, or the reason why the task is not solved yet."""
)
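The termination check can be tested on its own. Note that `x.get("content", "")` only falls back to `""` when the key is missing; if a message carries `"content": None` (which can happen, for example, with function-call messages), `.rstrip()` raises `AttributeError`. A slightly more defensive variant, shown as a standalone sketch with made-up sample messages:

```python
def is_termination_msg(message: dict) -> bool:
    # Same rule as the lambda above, but treats a missing or None content as empty.
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


samples = [
    {"content": "The report is saved. TERMINATE"},
    {"content": "TERMINATE\n"},                    # trailing whitespace is ignored
    {"content": "CONTINUE: the table is missing"},
    {"content": None},                             # e.g. a message carrying only a function call
]
results = [is_termination_msg(m) for m in samples]
```

The function could be passed as `is_termination_msg=is_termination_msg` in place of the lambda.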

### 2. Engineer Agent

**Responsibilities**:
- Write executable Python/shell code
- Debug and fix code errors
- Implement data scraping and processing logic
- Ensure code completeness (no partial solutions)

**Key Constraints**:
- Must provide complete, executable code
- Cannot suggest incomplete code requiring user modification
- Must analyze and fix execution errors independently

In [7]:
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=gpt_config,
    system_message='''Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type.
    The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
    Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
    If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes.
    If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption,
    collect additional info you need, and think of a different approach to try.
    ''',
)
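The Engineer's prompt asks for exactly one fenced code block with a script type per reply. That rule can be checked mechanically. The sketch below is an illustrative regex-based check written for this tutorial, not the parser AutoGen uses internally, and the set of accepted script types is an assumption:

```python
import re

FENCE = "`" * 3  # built this way so the snippet can sit inside a Markdown code fence

# A fenced block with a language tag: FENCE + lang + newline + body + newline + FENCE
CODE_BLOCK = re.compile(FENCE + r"(\w+)\n(.*?)\n" + FENCE, re.DOTALL)
ALLOWED_TYPES = {"python", "sh", "bash", "shell"}  # assumption


def check_engineer_reply(reply: str) -> list[str]:
    """Return the rule violations found in a single Engineer reply."""
    blocks = CODE_BLOCK.findall(reply)
    problems = []
    if not blocks:
        problems.append("no code block with a script type")
    elif len(blocks) > 1:
        problems.append("more than one code block")
    for lang, _body in blocks:
        if lang not in ALLOWED_TYPES:
            problems.append(f"unexpected script type: {lang}")
    return problems
```

An untagged block counts as "no code block with a script type", mirroring the prompt's request to specify the script type.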

### 3. Scientist Agent

**Expertise**:
- Data interpretation and analysis
- Scientific reasoning and conclusions
- Domain knowledge application
- Coordination with Engineer for data collection

In [8]:
scientist = autogen.AssistantAgent(
    name="Scientist",
    llm_config=gpt_config,
    system_message="""Scientist. You follow an approved plan.
    You are able to understand scientific data and make scientific conclusions. You don't write code.
    If a web data source is provided - ask the Engineer to scrape the data from that source.
    """
)

### 4. Report Generator Agent

**Function**: Specializes in creating formatted HTML reports. As an `AssistantAgent`, it cannot write files itself: it produces the report (or code that saves it), and the file is written to disk when the Executor runs that code.


In [9]:
report_generator = autogen.AssistantAgent(
    name="ReportGenerator",
    llm_config=gpt_config,
    system_message="Create an HTML report. Write the report to disk."
)
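The report requested in the query below (a score table, a chart, and a summary saved as `reports/autogen_report.html`) ultimately has to be produced by code that the Executor runs. Here is a standard-library sketch of the table-and-summary part, assuming scores arrive as `(model, stem, humanities)` tuples; the function names and page layout are illustrative, and the chart is left out:

```python
from html import escape
from pathlib import Path


def render_html_report(rows, summary: str) -> str:
    """Build a minimal HTML page with a score table and a summary paragraph.

    rows: iterable of (model_name, stem_score, humanities_score) tuples.
    """
    table_rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>{stem:.1f}</td><td>{hum:.1f}</td></tr>"
        for name, stem, hum in rows
    )
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>MMLU report</title></head>
<body>
<h1>MMLU: STEM vs Humanities</h1>
<table border="1">
<tr><th>Model</th><th>STEM (%)</th><th>Humanities (%)</th></tr>
{table_rows}
</table>
<h2>Summary</h2>
<p>{escape(summary)}</p>
</body>
</html>
"""


def write_html_report(rows, summary: str, path: str = "reports/autogen_report.html") -> Path:
    """Render the report and save it, creating the parent directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_html_report(rows, summary), encoding="utf-8")
    return out
```

Model names and the summary are HTML-escaped, so text scraped from the web cannot break the page structure.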

### 5. Planner Agent

**Strategic Role**:
- Creates comprehensive project plans
- Defines clear step-by-step workflows
- Specifies which agent handles each task
- Iterates on plans based on feedback

In [10]:
planner = autogen.AssistantAgent(
    name="Planner",
    system_message='''Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
    The plan may involve an engineer who can write code and a scientist who doesn't write code.
    Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist.
''',
    llm_config=gpt_config,
)
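The Planner is told to state which step is performed by the Engineer and which by the Scientist. To check that rule programmatically, one approach is to request a fixed step format and parse it. The `Step N (Owner): task` format below is a hypothetical convention, not something the prompt above enforces:

```python
import re

# Hypothetical plan format: one step per line, e.g. "Step 1 (Engineer): scrape the table".
STEP_LINE = re.compile(r"^Step\s+(\d+)\s*\((\w+)\):\s*(.+)$")
ALLOWED_OWNERS = {"Engineer", "Scientist"}


def parse_plan(plan_text: str):
    """Return (steps, problems): parsed steps and step lines with a missing or unknown owner."""
    steps, problems = [], []
    for line in plan_text.splitlines():
        line = line.strip()
        if not line.startswith("Step"):
            continue  # skip the explanation text around the steps
        match = STEP_LINE.match(line)
        if match is None:
            problems.append(f"no owner: {line}")
            continue
        number, owner, task = match.groups()
        if owner not in ALLOWED_OWNERS:
            problems.append(f"unknown owner {owner!r}: {line}")
        steps.append((int(number), owner, task))
    return steps, problems
```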

### 6. Executor Agent

**Technical Details**:
- **Automated Execution**: Never requires human input
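- **Code Lookup**: Scans the last 3 messages for code blocks to execute
- **Working Directory**: Creates/uses the "paper" directory for file operations
- **Safety**: Unless Docker is available, code runs directly on the host machine (the Colab VM) inside `paper/`, so it is not sandboxed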

In [11]:
executor = autogen.UserProxyAgent(
    name="Executor",
    system_message="Executor. Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "paper"},
)
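Conceptually, the Executor takes a code block from recent messages, saves it to a file in `work_dir`, runs it, and replies with the exit code and output. The sketch below imitates only the run step with the standard library; it is not AutoGen's implementation, and the file name `tmp_code.py` is an arbitrary choice:

```python
import subprocess
import sys
from pathlib import Path


def run_python(code: str, work_dir: str = "paper", timeout: int = 60) -> tuple[int, str]:
    """Run a Python snippet inside work_dir and return (exit_code, combined output).

    Like an executor without Docker, this runs on the host machine and is not sandboxed.
    Raises subprocess.TimeoutExpired if the snippet runs longer than `timeout` seconds.
    """
    wd = Path(work_dir)
    wd.mkdir(parents=True, exist_ok=True)
    script = wd / "tmp_code.py"
    script.write_text(code, encoding="utf-8")
    result = subprocess.run(
        [sys.executable, script.name],  # path is relative to cwd below
        cwd=wd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout + result.stderr
```

Returning both the exit code and the output is what lets the Engineer see a traceback and try again.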

### 7. Critic Agent

**Quality Assurance**:
- Reviews plans for completeness and accuracy
- Validates code logic and implementation
- Ensures verifiable information sources are included
- Provides constructive feedback for improvements


In [12]:
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.",
    llm_config=gpt_config,
)
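The Critic checks for verifiable sources by reading the messages. A cheap deterministic pre-check for the same rule is to look for URLs. This sketch uses a deliberately loose pattern (any `http(s)://` token, with trailing punctuation stripped), so it confirms that a source is cited, not that it is correct or reachable:

```python
import re

# Loose pattern: "http(s)://" followed by non-whitespace; not a full URL parser.
URL_PATTERN = re.compile(r"https?://\S+")


def find_sources(text: str) -> list[str]:
    """Return the http(s) URLs mentioned in a plan or report."""
    return [url.rstrip(".,;:!?)]}'\"") for url in URL_PATTERN.findall(text)]


def has_verifiable_source(text: str) -> bool:
    return bool(find_sources(text))
```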

## Group Chat Configuration

### Creating the Multi-Agent System

**System Architecture**:
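- **Agent Pool**: All 7 agents participate in the conversation
- **Message History**: Starts empty (`messages=[]`) and accumulates every message, so each agent sees the shared context
- **Round Limit**: At most 50 rounds, which prevents endless loops
- **Manager Role**: The `GroupChatManager` uses its LLM to pick the next speaker at each round and relays each message to the other agents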


In [13]:
groupchat = autogen.GroupChat(agents=[user_proxy, engineer, scientist, report_generator, planner, executor, critic], messages=[], max_round=50)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt_config)
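To make the manager's role concrete, here is a toy, LLM-free model of a group chat: agents share one message history, a selection function picks the next speaker, and the loop stops on `TERMINATE` or at the round limit. This is not AutoGen's implementation; round-robin selection stands in for the manager's LLM-based choice, and here `max_round` simply caps the total number of messages:

```python
from typing import Callable

Agent = Callable[[list[dict]], str]  # takes the shared history, returns a reply


def round_robin(messages: list[dict], names: list[str]) -> str:
    # Stand-in for the manager's LLM choice: the name after the last speaker.
    last = messages[-1]["name"]
    if last not in names:
        return names[0]
    return names[(names.index(last) + 1) % len(names)]


def run_group_chat(agents: dict[str, Agent], first_message: str,
                   select_next=round_robin, max_round: int = 50) -> list[dict]:
    """Toy group chat in which every agent sees the full shared history."""
    messages = [{"name": "Admin", "content": first_message}]
    names = list(agents)
    while len(messages) < max_round:
        speaker = select_next(messages, names)
        reply = agents[speaker](messages)
        messages.append({"name": speaker, "content": reply})
        if reply.rstrip().endswith("TERMINATE"):
            break
    return messages


agents = {
    "Planner": lambda msgs: "Plan: Engineer collects data, Scientist summarizes.",
    "Engineer": lambda msgs: "print('collecting')",
    "Scientist": lambda msgs: "Summary written. TERMINATE",
}
history = run_group_chat(agents, "Compare models", max_round=10)
```

Swapping `round_robin` for a function that asks an LLM "who should speak next?" is, roughly, the step the real manager adds.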

## Task Execution

### Research Query Definition

In [14]:
# https://yourgpt.ai/tools/llm-comparison-and-leaderboard
# https://artificialanalysis.ai/leaderboards/models
query = """List 10 LLMs with their STEM and Humanities scores on MMLU.
           Create a report that includes a bar chart of the models' scores and an overall table.
           Include a summary of the findings, and save the report as reports/autogen_report.html."""

**Task Complexity**: This query requires:
1. **Data Collection**: Finding MMLU benchmark results for 10 LLMs
2. **Data Processing**: Organizing STEM and Humanities scores
3. **Visualization**: Creating bar charts for comparison
4. **Report Generation**: Producing a comprehensive HTML report
5. **Analysis**: Summarizing key findings and insights
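Steps 2 and 5 amount to organizing two scores per model and deriving a few summary statistics. A sketch of that processing once scores have been collected; the tuple layout and the chosen statistics are assumptions, and any numbers you test it with should come from a cited source rather than from the model's memory:

```python
from statistics import mean


def summarize_scores(rows):
    """rows: list of (model, stem, humanities) tuples. Returns summary stats for the report."""
    stem = [r[1] for r in rows]
    hum = [r[2] for r in rows]
    return {
        "mean_stem": round(mean(stem), 2),
        "mean_humanities": round(mean(hum), 2),
        "best_stem": max(rows, key=lambda r: r[1])[0],
        "best_humanities": max(rows, key=lambda r: r[2])[0],
        # Positive gap: the model scores higher on STEM than on Humanities.
        "stem_minus_humanities": {m: round(s - h, 2) for m, s, h in rows},
    }
```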


### Initiating Multi-Agent Collaboration

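**Workflow Initiation**: This starts the multi-agent conversation. The intended flow is listed below; because the manager picks each speaker dynamically, the actual order can differ (in the recorded run below, the ReportGenerator answers first):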
1. **Planner** creates a research strategy
2. **Critic** reviews and refines the plan
3. **Engineer** implements data collection code
4. **Executor** runs the code and reports results
5. **Scientist** analyzes the data for insights
6. **Report Generator** creates the final HTML report
7. **Admin** evaluates completion and terminates when satisfied

## Expected Workflow

### Phase 1: Planning
- Planner proposes a structured approach to find MMLU data
- Critic ensures the plan includes verifiable sources
- Admin approves the refined plan

### Phase 2: Data Collection
- Engineer writes web scraping code for MMLU benchmarks
- Executor runs the code and reports results
- Scientist validates data quality and completeness

### Phase 3: Analysis & Visualization
- Engineer creates data visualization code (bar charts, tables)
- Scientist interprets the results and identifies patterns
- Executor generates the visual outputs

### Phase 4: Report Generation
- Report Generator creates structured HTML report
- Critic reviews for accuracy and completeness
- Admin confirms the final deliverable meets requirements


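Notice that you can give feedback to the manager after the plan is generated: the Admin agent is a `UserProxyAgent` with the default `human_input_mode`, so it asks you for input on its turns.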

In [16]:
user_proxy.initiate_chat(
    manager,
    message=query,
)

Admin (to chat_manager):

List 10 llms with result on STEM and Humanities results on MMLU.
           Create a report that incluces the bar chart of different models scores and an overall table.
           Include the summary of the findings in the report as reports/autogen_report.html.

--------------------------------------------------------------------------------
ReportGenerator (to chat_manager):

To create a report that lists 10 large language models (LLMs) with their performance results on the STEM and Humanities sections of the MMLU benchmark, we will follow these steps:

### Step 1: Identify 10 LLMs and Gather Data
Here’s a list of 10 LLMs along with their hypothetical performance results on the MMLU benchmark for both STEM and Humanities:

| Model Name         | STEM Score (%) | Humanities Score (%) |
|--------------------|----------------|-----------------------|
| GPT-3              | 85.0           | 80.0                  |
| BERT                | 78.0           | 82.0    

## Key Benefits of This Approach

1. **Specialization**: Each agent focuses on its core competency
2. **Quality Control**: Multiple agents can review work for accuracy
3. **Fault Tolerance**: Errors reported by the Executor can be fixed by the Engineer in later rounds; peer review reduces, but does not eliminate, mistakes such as unsupported numbers
4. **Scalability**: Complex tasks are broken down into manageable parts
5. **Transparency**: Every message, including generated code and execution results, is printed, so each step can be inspected

This multi-agent system demonstrates how AI can work collaboratively to solve complex research tasks that require multiple skills and perspectives.