# Custom Tools for Multi-Modal Processing

Extend agents with custom tools using the `@tool` decorator. This notebook demonstrates how to process multi-modal content including images, videos, and documents with custom tool implementations.

## What You'll Learn

- Create custom tools with the `@tool` decorator
- Process images, videos, and documents
- Handle multi-modal content in agent workflows
- Implement error handling and validation in tools

## Prerequisites

- Completed [Notebook 01: Hello World](01-hello-world-strands-agents.ipynb)
- Understanding of Python functions and decorators
- Sample media files (provided in `data-sample/` directory)

In [None]:
import boto3
from strands import Agent
from strands.models import BedrockModel
from strands.tools import tool
from datetime import datetime

print("✅ Imports successful!")

## Creating a Simple Tool

Let's create a simple calculator tool with the `@tool` decorator:

In [None]:
@tool
def calculator(operation: str, a: float, b: float) -> float | str:
    """Performs basic mathematical operations.
    
    Args:
        operation: The operation to perform (add, subtract, multiply, divide)
        a: First number
        b: Second number
    
    Returns:
        The result of the operation, or an error message string if the
        operation is unknown or the divisor is zero
    """
    # Every entry is computed on each call; the conditional guards the division
    operations = {
        "add": a + b,
        "subtract": a - b,
        "multiply": a * b,
        "divide": a / b if b != 0 else "Error: Division by zero"
    }
    return operations.get(operation, "Invalid operation")

print("✅ Calculator tool created!")
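
The calculator above computes all four results on every call and reports problems as plain strings. As one possible variation, the sketch below dispatches through the standard `operator` module so only the requested operation runs, and raises `ValueError` on invalid input. The name `safe_calculate` is hypothetical, and the function is left undecorated so it runs without Strands installed; adding `@tool` would be the step that exposes it to an agent.

```python
import operator

# Map operation names to functions so only the requested one runs
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def safe_calculate(operation: str, a: float, b: float) -> float:
    """Hypothetical variant of the calculator with explicit validation."""
    if operation not in OPERATIONS:
        valid = ", ".join(sorted(OPERATIONS))
        raise ValueError(f"Unknown operation {operation!r}; expected one of: {valid}")
    if operation == "divide" and b == 0:
        raise ValueError("Division by zero")
    return OPERATIONS[operation](a, b)


print(safe_calculate("multiply", 156, 23))  # 3588
```

Raising an exception keeps the return type numeric, at the cost of relying on the caller to handle the error.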

## Using Tools with Agents

First, create the model instance that the agents will use:

In [None]:
# Setup Bedrock model
session = boto3.Session(region_name='us-east-1')
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    boto_session=session
)

Now let's create an agent that can use our calculator:

In [None]:
# Create agent with calculator tool
math_agent = Agent(
    model=bedrock_model,
    tools=[calculator],
    system_prompt="You are a helpful math assistant. Use the calculator tool to perform calculations."
)

print("✅ Math agent created with calculator tool!")

In [None]:
# Test the calculator tool
response = math_agent("What is 156 multiplied by 23?")
print(response)

Calling an agent returns an `AgentResult` object. Let's inspect its fields:

In [None]:
# Inspect the AgentResult object
print(f"Message: {response.message}")
print("-" * 100 + "\n")

print(f"Metrics: {response.metrics}")
print("-" * 100 + "\n")

print(f"State: {response.state}")
print("-" * 100 + "\n")

print(f"Stop Reason: {response.stop_reason}")
print("-" * 100 + "\n")

## Creating More Complex Tools

Let's create a tool that gets current time information:

In [None]:
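# Aliased so the `timezone` parameter below does not shadow the import
from datetime import timezone as dt_timezone

@tool
def get_current_time(timezone: str = "UTC") -> str:
    """Gets the current date and time in UTC.
    
    Args:
        timezone: The timezone (currently only UTC supported; other values are ignored)
    
    Returns:
        Current UTC date and time as a string
    """
    # datetime.now() without arguments returns local time, so request UTC explicitly
    now = datetime.now(dt_timezone.utc)
    return f"Current time (UTC): {now.strftime('%Y-%m-%d %H:%M:%S')}"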

print("✅ Time tool created!")
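
The time tool above supports UTC only. A minimal sketch of how other time zones could be supported, assuming Python 3.9+ and the standard `zoneinfo` module, is shown below. The function name `current_time_in` is hypothetical, and it is left undecorated so it runs without Strands installed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def current_time_in(tz_name: str = "UTC") -> str:
    """Return the current time in the named IANA time zone."""
    if tz_name.upper() == "UTC":
        tz = timezone.utc  # UTC needs no time zone database lookup
    else:
        try:
            tz = ZoneInfo(tz_name)
        except (KeyError, ValueError, OSError) as exc:
            # ZoneInfoNotFoundError is a KeyError subclass; malformed keys raise ValueError
            raise ValueError(f"Unknown time zone: {tz_name!r}") from exc
    now = datetime.now(tz)
    return f"Current time ({tz_name}): {now.strftime('%Y-%m-%d %H:%M:%S %Z')}"


print(current_time_in("UTC"))
```

Zones other than UTC depend on a time zone database being available on the system or through the `tzdata` package.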

In [None]:
# Create agent with multiple tools
assistant = Agent(
    model=bedrock_model,
    tools=[calculator, get_current_time],
    system_prompt="You are a helpful assistant with access to calculator and time tools."
)

response = assistant("What time is it? Also, what's 50 plus 75?")
print(response)

## Using Built-in Tools

Strands Agents includes pre-built tools for common tasks:

In [None]:
from strands_tools import image_reader, file_read
# video_reader_local is a custom tool
# (already implemented in video_reader_local.py)
from video_reader_local import video_reader_local

# Create agent with built-in tools and the custom video tool
multimodal_agent = Agent(
    model=bedrock_model,
    tools=[image_reader, file_read, video_reader_local],
    system_prompt="""You are a multi-modal assistant that can:
    - Read and analyze images
    - Process documents (PDF, CSV, DOCX, etc.)
    - Use advanced reasoning for complex tasks.
    - Analyze videos and provide detailed insights.
    """
)

print("✅ Multi-modal agent created with built-in tools!")

You can list the tools loaded in an agent with `agent.tool_names`. A JSON representation of the tools, including their descriptions and input parameters, is available from `agent.tool_registry.get_all_tools_config()`.

In [None]:
print(multimodal_agent.tool_names)

print(multimodal_agent.tool_registry.get_all_tools_config())
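
The descriptions and input parameters in the tool configuration come from each function's name, docstring, and type hints. The sketch below is not the Strands implementation; it only illustrates the idea with the standard `inspect` module. The helper `sketch_tool_spec` and the exact dictionary layout it returns are assumptions made for illustration.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON Schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_spec(func) -> dict:
    """Build a simplified, hypothetical tool spec from a function signature."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value must be supplied by the model
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.split("\n")[0],
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def calculator(operation: str, a: float, b: float = 1.0) -> float:
    """Performs basic mathematical operations."""
    return a + b


print(sketch_tool_spec(calculator))
```

This is why clear docstrings and accurate type hints matter: they are the only description of the tool that the model sees.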

In [None]:
# Example 1: Image analysis
print("=== 📸 IMAGE ANALYSIS ===")
image_result = multimodal_agent("Analyze the image data-sample/diagram.jpg in detail and describe everything you observe")
# print(image_result)
print("\n" + "="*80 + "\n")

In [None]:
# Example 2: Document analysis (if you have a PDF document)
print("=== 📄 DOCUMENT ANALYSIS ===")
doc_result = multimodal_agent("Summarize as json the content of the document data-sample/Welcome-Strands-Agents-SDK.pdf")
# print(doc_result)

In [None]:
# Example 3: Video analysis
print("=== 🎬 VIDEO ANALYSIS ===")
video_result = multimodal_agent("Analyze the video data-sample/climbing-video.mp4 and describe in detail the actions and scenes you observe")
print(video_result)
print("\n" + "="*80 + "\n")
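
The prompts above pass file paths straight to the agent, so a missing or unsupported file only surfaces once a tool runs. A minimal sketch of validating a path up front, using only the standard library, is shown below. The helper `validate_media_path` and the `SUPPORTED_SUFFIXES` allow-list are hypothetical; adjust the list to the formats your tools actually accept.

```python
from pathlib import Path

# Hypothetical allow-list of file extensions
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf", ".csv", ".docx", ".mp4"}


def validate_media_path(path: str) -> Path:
    """Return the path as a Path object, or raise if it is unusable."""
    candidate = Path(path)
    if candidate.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {candidate.suffix or '(none)'}")
    if not candidate.is_file():
        raise FileNotFoundError(f"No such file: {candidate}")
    return candidate


try:
    validate_media_path("data-sample/diagram.jpg")
    print("File looks usable")
except (ValueError, FileNotFoundError) as exc:
    print(f"Skipping analysis: {exc}")
```

The same checks could sit at the top of a custom tool so that the agent receives a clear error message instead of a stack trace.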

In [None]:
# Inspect the AgentResult object
print(f"Message: {video_result.message}")
print("-" * 100 + "\n")

print(f"Metrics: {video_result.metrics}")
print("-" * 100 + "\n")

print(f"State: {video_result.state}")
print("-" * 100 + "\n")

print(f"Stop Reason: {video_result.stop_reason}")
print("-" * 100 + "\n")

## Direct Tool Usage

You can also call tools directly from the agent:

In [None]:
print(multimodal_agent.tool_names)

In [None]:
# Example 4: Direct use of tools
video_analysis = multimodal_agent.tool.video_reader_local(
    video_path="data-sample/climbing-video.mp4",
    text_prompt="What are the main elements in this video?"
)

In [None]:
print(video_analysis)
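
The sketch below shows one way to pull the text out of a direct tool call, assuming the call returns a dictionary with a `status` field and a `content` list of blocks such as `{"text": ...}`. That shape is an assumption here, so compare it with the output printed above before relying on it. The helper `extract_text` and the sample dictionary are hypothetical.

```python
def extract_text(tool_result: dict) -> str:
    """Join the text blocks of a tool-result-shaped dictionary."""
    if tool_result.get("status") == "error":
        raise RuntimeError(f"Tool call failed: {tool_result.get('content')}")
    blocks = tool_result.get("content", [])
    # Skip blocks that carry something other than text
    return "\n".join(block["text"] for block in blocks if "text" in block)


# Hand-written stand-in for a real result, for illustration only
sample_result = {
    "toolUseId": "example-id",
    "status": "success",
    "content": [{"text": "first block"}, {"text": "second block"}],
}
print(extract_text(sample_result))
```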

### Additional Samples
This example builds an agent with the `video_reader` tool, which uses an Amazon S3 bucket to handle larger videos.

Before running it, set the bucket name in the `VIDEO_READER_S3_BUCKET` environment variable:

```bash
export VIDEO_READER_S3_BUCKET="YOUR-BUCKET-NAME"
```
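
An `export` run in a separate terminal does not reach a notebook kernel that is already running. A minimal sketch of checking the variable from Python, and failing early with a clear message, is shown below. The helper `require_env` is hypothetical, and when the tool actually reads the variable is not confirmed here, so set it before importing or calling the tool.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before using the video reader")
    return value


# In a notebook, the variable can be set for the current kernel like this:
# os.environ["VIDEO_READER_S3_BUCKET"] = "YOUR-BUCKET-NAME"
# bucket = require_env("VIDEO_READER_S3_BUCKET")
```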

In [None]:
from strands_tools import image_reader, file_read
# The S3-backed video_reader tool is already implemented in video_reader.py
from video_reader import video_reader

# Create agent with built-in tools and the S3-backed video tool
multimodal_agent = Agent(
    model=bedrock_model,
    tools=[image_reader, file_read, video_reader],
    system_prompt="""You are a multi-modal assistant that can:
    - Read and analyze images
    - Process documents (PDF, CSV, DOCX, etc.)
    - Use advanced reasoning for complex tasks.
    - Analyze videos and provide detailed insights.
    """
)

print("✅ Multi-modal agent created with built-in tools!")

In [None]:
# Example 5: Direct use of the S3-backed tool
# This agent registers video_reader, not video_reader_local
video_analysis = multimodal_agent.tool.video_reader(
    video_path="data-sample/moderation-video.mp4",
    text_prompt="What are the main elements in this video?"
)

In [None]:
print(video_analysis)

## Summary

In this notebook, you learned:

✅ How to create custom tools with the `@tool` decorator

✅ How to add tools to agents

✅ How to use built-in tools from `strands_tools`

✅ How to create agents with multiple tools

✅ How to call tools directly


### Next Steps

Continue to the next notebook to learn about Model Context Protocol (MCP) integration!