# CAMEL Cookbook - Object Detection with ACI.dev MCP Tools

**Description:** Learn how to build an object detection agent using CAMEL AI and ACI.dev's MCP protocol for seamless ML tasks. 

⭐ Star us on [GitHub](https://github.com/camel-ai/camel), join our [Discord](https://discord.gg/EXAMPLE), or follow us on [X](https://x.com/camelaiorg)

This cookbook shows how to build an object detection agent with CAMEL AI connected to ACI.dev's MCP tools. We'll create an agent that analyzes images, detects objects such as cars or trees, and explains the results in natural language, all without writing complex ML code.

**Key Learnings:**
- Why agents need tools to be truly useful.
- How MCP enables dynamic, context-aware tool usage for tasks like object detection.
- Setting up CAMEL with ACI.dev for real-time image analysis.
- Building and running your own object detection agent.
- Handling outputs with summaries, tables, and visualized results.

This setup uses CAMEL's `MCPToolkit` to connect to ACI.dev's MCP servers, powering object detection via Replicate's ML models.

### 📦 Installation

In [None]:
%pip install camel-ai aci-mcp aci-sdk

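Set up keys. The later cells read `ACI_API_KEY` and `GOOGLE_API_KEY` from the environment, so store both in a `.env` file or export them before starting the notebook:

In [1]:
import os
from dotenv import load_dotenv

# Load ACI_API_KEY and GOOGLE_API_KEY from a local .env file, if present
load_dotenv()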
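Since a missing key only surfaces later as a failed connection or model call, it can help to check for both keys up front. The following is a minimal sketch; the helper name `missing_keys` is hypothetical and not part of CAMEL or ACI.dev, while the two variable names are the ones read later in this cookbook.

```python
import os

# Keys read elsewhere in this cookbook via os.getenv(...)
REQUIRED_KEYS = ("ACI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing keys:", ", ".join(missing))
```

Running this right after loading the `.env` file gives an early, readable warning instead of an error deep inside the toolkit.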

### Define CAMEL agents

In [2]:
agent_name = "ObjectDetectionAgent"
system_prompt = """
You are a specialized Object Detection Agent. Your primary function is to use the `REPLICATE.run` tool for object detection and present the findings in a user-friendly format.
The user will provide a text prompt containing an image URL and a query. You must extract the `image` URL and the `query` object(s).
Immediately call the `REPLICATE.run` tool. The `input` must be a dictionary with two keys: `image` (the URL) and `query` (a string of the object(s)).
Do not ask for clarification; make a reasonable inference if the query is ambiguous.
After receiving the tool's output, format your response as follows:
- **Natural Language Summary:** Start with a detailed, friendly, insightful analysis of the detection results in plain English.
- **Markdown Table:** Create a markdown table with columns: 'Object', 'Confidence Score', and 'Bounding Box Coordinates'.
- **Result Image:** If the tool provides a URL for an image with bounding boxes, display it using markdown: `![Detected Objects](URL_HERE)`.
Whenever I give you a link, trigger the tool call, extract its outputs and links, and present them to me in proper markdown format with a detailed natural-language analysis of the tool call.
"""

### MCP servers configuration using ACI.dev

In [4]:
from camel.toolkits import MCPToolkit
mcp_config = {
    "mcpServers": {
        "aci_apps": {
            "command": "aci-mcp",
            "args": [
                "apps-server",
                "--apps=REPLICATE",
                "--linked-account-owner-id",
                "parthshr370"
            ],
            "env": {"ACI_API_KEY": os.getenv("ACI_API_KEY")},
        }
    }
}
mcp_toolkit = MCPToolkit(config_dict=mcp_config)
await mcp_toolkit.connect()


<camel.toolkits.mcp_toolkit.MCPToolkit at 0x72f3228b3ad0>
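Because `os.getenv("ACI_API_KEY")` returns `None` when the key is unset, the configuration above can be built with an invalid `env` entry without any immediate error. One possible way to guard against that is to build the dictionary through a small helper. This is only a sketch: `build_aci_mcp_config` is a hypothetical function, and passing several apps as a comma-separated `--apps` value is an assumption to verify against the `aci-mcp` documentation.

```python
import os


def build_aci_mcp_config(owner_id, apps=("REPLICATE",), api_key=None):
    """Return an MCP config dict shaped like the one used in this cookbook."""
    api_key = api_key or os.getenv("ACI_API_KEY")
    if not api_key:
        # Fail early instead of launching the server with a missing key
        raise RuntimeError("ACI_API_KEY is not set")
    # Assumption: multiple apps are joined with commas in a single flag
    apps_flag = "--apps=" + ",".join(apps)
    return {
        "mcpServers": {
            "aci_apps": {
                "command": "aci-mcp",
                "args": [
                    "apps-server",
                    apps_flag,
                    "--linked-account-owner-id",
                    owner_id,
                ],
                "env": {"ACI_API_KEY": api_key},
            }
        }
    }


# Placeholder values for illustration only
config = build_aci_mcp_config("your-owner-id", api_key="demo-key")
print(config["mcpServers"]["aci_apps"]["args"])
```

The resulting dictionary has the same shape as `mcp_config`, so it could be passed to `MCPToolkit(config_dict=...)` in the same way.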

In [5]:
from camel.agents import ChatAgent
from camel.messages import BaseMessage
from camel.models import ModelFactory
from camel.types import ModelPlatformType

tools = mcp_toolkit.get_tools()

# Initialize Gemini model
model = ModelFactory.create(
    model_platform=ModelPlatformType.GEMINI,
    model_type="gemini-2.5-flash",
    api_key=os.getenv("GOOGLE_API_KEY"),
    model_config_dict={"temperature": 0.0, "max_tokens": 4096},
)

# Create system message
sys_msg = BaseMessage.make_assistant_message(
    role_name=agent_name,
    content=system_prompt,
)

agent = ChatAgent(model=model, system_message=sys_msg, tools=tools, memory=None)

In [6]:
response = await agent.astep("What do you see in this picture? URL: https://images.pexels.com/photos/1054655/pexels-photo-1054655.jpeg")
print(response.msg.content)

I've analyzed the image you provided and here's what I found:

The image appears to feature a scenic landscape with several elements. The detection model identified multiple instances of 'person' and 'tree', suggesting a natural outdoor setting possibly with people present. It also confidently detected an 'animal', which could be a prominent feature in the scene. Furthermore, 'sky' and 'water' were identified, indicating an open environment with a body of water. A 'car' was also detected, which might be on a 'road' (though 'road' itself wasn't detected with high confidence, it's a reasonable inference given the presence of a car).

Here are the detailed detection results:

| Object | Confidence Score | Bounding Box Coordinates |
|---|---|---|
| animal | 0.517 | [1433, 384, 2550, 2140] |
| sky | 0.368 | [5, 4, 3452, 665] |
| water | 0.296 | [5, 1341, 3453, 1571] |
| tree | 0.253 | [9, 411, 1508, 970] |
| person | 0.269 | [1430, 381, 2559, 2136] |
| car | 0.261 | [895, 849, 978, 928] |
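Since the agent returns its detections as a markdown table, downstream code may want them as structured data, for example to keep only the more confident detections. The sketch below parses a table with the three columns requested in the system prompt. The function name `parse_detection_table` and the 0.3 threshold are illustrative assumptions, and the sample rows are copied from the output above.

```python
def parse_detection_table(markdown):
    """Parse rows of an 'Object | Confidence Score | Bounding Box' table."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a three-column table row
        label, score, box = cells
        try:
            confidence = float(score)
        except ValueError:
            continue  # header or separator row
        coords = [int(v) for v in box.strip("[]").split(",")]
        rows.append({"object": label, "confidence": confidence, "box": coords})
    return rows


sample = """
| Object | Confidence Score | Bounding Box Coordinates |
|---|---|---|
| animal | 0.517 | [1433, 384, 2550, 2140] |
| sky | 0.368 | [5, 4, 3452, 665] |
| water | 0.296 | [5, 1341, 3453, 1571] |
| tree | 0.253 | [9, 411, 1508, 970] |
| person | 0.269 | [1430, 381, 2559, 2136] |
| car | 0.261 | [895, 849, 978, 928] |
"""

rows = parse_detection_table(sample)
# Assumed threshold, chosen only for illustration
confident = [r for r in rows if r["confidence"] >= 0.3]
print([r["object"] for r in confident])
```

In the notebook, `response.msg.content` could be passed in place of `sample`, provided the model followed the requested table format.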