# Multi-Agent Multimodal Reasoning with Nova 2 Omni

This notebook demonstrates multi-agent patterns for multimodal reasoning with Amazon Nova 2 Omni. Each agent calls the Bedrock Converse API directly through boto3 with reasoning enabled, and a coordinator agent synthesizes the findings of the specialized agents.

**Key Features:**
- Multi-agent orchestration patterns
- Specialized agents for different modalities
- Direct boto3 calls with reasoning configuration
- Agent coordination and result synthesis

**Note:** This notebook demonstrates multi-agent patterns without requiring the Strands framework.

---

## Setup

In [None]:
import json
import boto3
from typing import Literal
from botocore.config import Config
from pydantic import BaseModel, Field
from langchain_core.tools import tool

import nova_utils

MODEL_ID = "us.amazon.nova-2-omni-v1:0"
REGION_ID = "us-west-2"

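def get_bedrock_runtime():
    """Create a Bedrock Runtime client with a 2-minute read timeout."""
    # Reasoning-enabled requests can be slow, so raise the read timeout (in seconds)
    config = Config(read_timeout=2 * 60)
    return boto3.client(
        service_name="bedrock-runtime",
        region_name=REGION_ID,
        config=config
    )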

## Define Agent Tools

Create tools for agents to submit their findings.

In [None]:
class SafetyAssessmentResult(BaseModel):
    """Schema for safety assessment results."""
    identified_hazards: list[str] = Field(description="List of hazards identified")
    risk_level: Literal["low", "medium", "high", "critical"] = Field(description="Overall risk level")
    recommended_actions: list[str] = Field(description="Recommended safety actions")

@tool(args_schema=SafetyAssessmentResult)
def submit_safety_assessment(identified_hazards: list[str], risk_level: Literal["low", "medium", "high", "critical"], recommended_actions: list[str]) -> dict:
    """Submit safety assessment results."""
    return {
        "agent": "safety_analyzer",
        "hazards": identified_hazards,
        "risk_level": risk_level,
        "actions": recommended_actions
    }

class ComprehensiveReport(BaseModel):
    """Schema for comprehensive report."""
    summary: str = Field(description="Overall summary of findings")
    key_insights: list[str] = Field(description="Key insights from all agents")
    recommendations: list[str] = Field(description="Final recommendations")

@tool(args_schema=ComprehensiveReport)
def submit_comprehensive_report(summary: str, key_insights: list[str], recommendations: list[str]) -> dict:
    """Submit comprehensive report synthesizing all agent findings."""
    return {
        "agent": "coordinator",
        "summary": summary,
        "insights": key_insights,
        "recommendations": recommendations
    }

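def langchain_tool_to_bedrock(lc_tool):
    """Convert a LangChain tool into a Bedrock Converse toolSpec entry."""
    # The tool's Pydantic (v2) args schema doubles as its JSON input schema
    schema = lc_tool.args_schema.model_json_schema()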
    return {
        "toolSpec": {
            "name": lc_tool.name,
            "description": lc_tool.description,
            "inputSchema": {"json": schema}
        }
    }
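For reference, the dictionary returned by `langchain_tool_to_bedrock` follows the `toolSpec` envelope used in the `toolConfig` field of a Converse request. The sketch below builds the same shape from a hand-written JSON schema using only the standard library. `make_tool_spec` and `safety_schema` are illustrative names, and the schema is a trimmed approximation of what `SafetyAssessmentResult.model_json_schema()` would produce, not its exact output.

```python
import json


def make_tool_spec(name: str, description: str, json_schema: dict) -> dict:
    """Wrap a JSON schema in the toolSpec envelope (illustrative helper)."""
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": json_schema},
        }
    }


# Assumption: a hand-written, trimmed stand-in for the Pydantic-generated schema
safety_schema = {
    "type": "object",
    "properties": {
        "identified_hazards": {"type": "array", "items": {"type": "string"}},
        "risk_level": {
            "type": "string",
            "enum": ["low", "medium", "high", "critical"],
        },
        "recommended_actions": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["identified_hazards", "risk_level", "recommended_actions"],
}

spec = make_tool_spec(
    "submit_safety_assessment",
    "Submit safety assessment results.",
    safety_schema,
)
print(json.dumps(spec, indent=2))
```

Printing the spec this way is a quick check that the tool name, description, and required fields are what the model will see.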

## Define Agent Class

Create a simple agent class using direct boto3 calls.

In [None]:
class MultimodalAgent:
    """Agent using direct boto3 calls with reasoning."""
    
    def __init__(self, name: str, system_prompt: str, tools: list, reasoning_effort: str = "medium"):
        self.name = name
        self.system_prompt = system_prompt
        self.bedrock_tools = [langchain_tool_to_bedrock(t) for t in tools]
        self.reasoning_effort = reasoning_effort
        self.bedrock = get_bedrock_runtime()
    
    def analyze(self, content: list) -> dict:
        """Send content blocks to the model and return the raw Converse response."""
        request = {
            "modelId": MODEL_ID,
            "messages": [
                {
                    "role": "user",
                    "content": content
                }
            ],
            "system": [{"text": self.system_prompt}],
            "toolConfig": {"tools": self.bedrock_tools},
            "additionalModelRequestFields": {
                "reasoningConfig": {
                    "type": "enabled",
                    "maxReasoningEffort": self.reasoning_effort
                }
            }
        }
        
        response = self.bedrock.converse(**request)
        return response

safety_agent = MultimodalAgent(
    name="SafetyAnalyzer",
    system_prompt="You are a safety assessment expert. Analyze images for hazards, evaluate risk levels, and recommend safety actions. Use submit_safety_assessment to report findings.",
    tools=[submit_safety_assessment],
    reasoning_effort="high"
)

coordinator_agent = MultimodalAgent(
    name="Coordinator",
    system_prompt="You synthesize analyses from multiple agents. Review their findings and provide a comprehensive report. Use submit_comprehensive_report to submit it.",
    tools=[submit_comprehensive_report],
    reasoning_effort="high"
)

print("✅ Agents initialized")
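Because `analyze` returns the raw response, callers have to walk the content blocks themselves. A small helper can separate tool calls from plain text. The sketch below runs against a hand-built dictionary rather than a live call: `extract_tool_inputs`, `extract_text`, and `mock_response` are hypothetical names, and the mock mirrors only the fields this notebook reads (`output.message.content[*].toolUse.input` and `text`).

```python
def _content_blocks(response: dict) -> list:
    """Return the assistant content blocks, or an empty list if absent."""
    return response.get("output", {}).get("message", {}).get("content", [])


def extract_tool_inputs(response: dict) -> list[dict]:
    """Collect the input of every toolUse block in the response."""
    return [b["toolUse"]["input"] for b in _content_blocks(response) if "toolUse" in b]


def extract_text(response: dict) -> str:
    """Join all plain-text blocks, useful when the model skipped the tool."""
    return "\n".join(b["text"] for b in _content_blocks(response) if "text" in b)


# Assumption: a made-up response containing one text block and one tool call
mock_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Reporting findings."},
                {
                    "toolUse": {
                        "toolUseId": "tool-1",
                        "name": "submit_safety_assessment",
                        "input": {
                            "identified_hazards": ["oncoming traffic"],
                            "risk_level": "high",
                            "recommended_actions": ["use the crosswalk"],
                        },
                    }
                },
            ],
        }
    },
    "stopReason": "tool_use",
}

tool_inputs = extract_tool_inputs(mock_response)
print(tool_inputs[0]["risk_level"])
print(extract_text(mock_response))
```

Falling back to `extract_text` when `extract_tool_inputs` returns an empty list keeps a free-form answer from being silently discarded.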

## Multi-Agent Orchestrator

Coordinate multiple agents for collaborative reasoning.

In [None]:
class MultiAgentOrchestrator:
    """Orchestrates multiple agents."""
    
    def __init__(self, agents: dict, coordinator: MultimodalAgent):
        self.agents = agents
        self.coordinator = coordinator
    
    def run(self, tasks: dict) -> dict:
        """Run agents and synthesize results."""
        agent_results = {}
        
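        for agent_name, task_content in tasks.items():
            if agent_name not in self.agents:
                print(f"Skipping task for unknown agent: {agent_name}")
                continue
            print(f"\n=== Running {agent_name} ===")
            agent = self.agents[agent_name]
            response = agent.analyze(task_content)

            # Agents report findings through a tool call; keep its input as the result
            for content in response["output"]["message"]["content"]:
                if "toolUse" in content:
                    agent_results[agent_name] = content["toolUse"]["input"]
                    print(f"Results: {json.dumps(agent_results[agent_name], indent=2)}")
            if agent_name not in agent_results:
                print(f"Warning: {agent_name} returned no tool call; its findings are omitted")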
        
        print("\n=== Running Coordinator ===")
        synthesis_prompt = f"""Synthesize these analyses:

{json.dumps(agent_results, indent=2)}

Provide a final answer that integrates all findings."""
        
        coordinator_response = self.coordinator.analyze([{"text": synthesis_prompt}])
        
        for content in coordinator_response["output"]["message"]["content"]:
            if "toolUse" in content:
                final_result = content["toolUse"]["input"]
                print(f"Final Answer: {json.dumps(final_result, indent=2)}")
                return final_result
        
        return {"error": "No final answer"}

orchestrator = MultiAgentOrchestrator(
    agents={"safety": safety_agent},
    coordinator=coordinator_agent
)
print("✅ Orchestrator initialized")
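The `run` method above calls agents one after another. When agents do not depend on each other's output, their requests could be issued concurrently instead. The sketch below shows one way to do that with `concurrent.futures`, using stub agents in place of `MultimodalAgent` so that it runs without AWS credentials. `StubAgent` and `run_agents_concurrently` are hypothetical names, and whether concurrency helps in practice depends on account throttling limits.

```python
from concurrent.futures import ThreadPoolExecutor


class StubAgent:
    """Stand-in for MultimodalAgent that returns a canned response."""

    def __init__(self, name: str):
        self.name = name

    def analyze(self, content: list) -> dict:
        # Mimic the response shape the orchestrator reads
        tool_input = {"agent": self.name, "blocks_received": len(content)}
        return {"output": {"message": {"content": [{"toolUse": {"input": tool_input}}]}}}


def run_agents_concurrently(agents: dict, tasks: dict, max_workers: int = 4) -> dict:
    """Call analyze() for every task with a matching agent, in parallel threads."""
    names = [name for name in tasks if name in agents]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(agents[name].analyze, tasks[name]) for name in names}
        # result() re-raises any exception from the worker thread
        return {name: future.result() for name, future in futures.items()}


stub_agents = {"safety": StubAgent("safety"), "scene": StubAgent("scene")}
stub_tasks = {
    "safety": [{"text": "Assess hazards."}],
    "scene": [{"text": "Describe the scene."}, {"text": "List objects."}],
    "unknown": [{"text": "No agent handles this task."}],
}

responses = run_agents_concurrently(stub_agents, stub_tasks)
for name, response in responses.items():
    print(name, response["output"]["message"]["content"][0]["toolUse"]["input"])
```

Tasks without a matching agent are skipped, which matches the behavior of the sequential loop.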

---

## Example: Multi-Agent Image Analysis

Run the safety agent on an image, then have the coordinator agent synthesize its findings into a final report.

In [None]:
# Load image
image_path = "media/man_crossing_street.png"
image_bytes, image_format = nova_utils.load_image_as_bytes(image_path)

tasks = {
    "safety": [
        {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
        {"text": "Analyze this image for safety risks. Identify all hazards, evaluate the overall risk level, and recommend appropriate safety actions."}
    ]
}

print("=== Starting Multi-Agent Analysis ===")
result = orchestrator.run(tasks)

print("\n=== Final Synthesized Result ===")
print(json.dumps(result, indent=2))

---

## Key Takeaways

- **Multi-Agent Patterns**: Coordinate specialized agents for complex tasks
- **Direct boto3 Calls**: Pass the reasoning configuration to the model through `additionalModelRequestFields`
- **Agent Specialization**: Agents focus on specific modalities or tasks
- **Result Synthesis**: Coordinator agent integrates findings from all agents

## Next Steps

- Add more specialized agents for different modalities
- Implement voting mechanisms for consensus
- Build custom orchestration patterns
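As one possible starting point for the voting idea, the sketch below takes a majority vote over the `risk_level` values reported by several agents and breaks ties toward the more severe level. The agent names and results are invented for illustration, and the tie-breaking rule is an assumption rather than something this notebook prescribes.

```python
from collections import Counter

# Ordered from least to most severe, matching the SafetyAssessmentResult schema
SEVERITY = ["low", "medium", "high", "critical"]


def vote_risk_level(agent_results: dict[str, dict]) -> str:
    """Return the most common risk_level; ties go to the more severe level."""
    votes = Counter(
        result["risk_level"]
        for result in agent_results.values()
        if "risk_level" in result
    )
    if not votes:
        raise ValueError("no agent reported a risk_level")
    top_count = max(votes.values())
    tied_levels = [level for level, count in votes.items() if count == top_count]
    return max(tied_levels, key=SEVERITY.index)


# Assumption: made-up results from three hypothetical safety agents
example_results = {
    "safety_a": {"risk_level": "high"},
    "safety_b": {"risk_level": "medium"},
    "safety_c": {"risk_level": "high"},
}
consensus = vote_risk_level(example_results)
print(consensus)
```

The consensus value could then be passed to the coordinator alongside the individual findings, so that the final report reflects both the vote and the reasoning behind each agent's assessment.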