# CAMEL Cookbook - Using MCP Tools in CAMEL (along with ACI.DEV)

**Description:** Learn how to build an object detection agent using CAMEL AI and ACI.dev's MCP servers for seamless ML tasks.

⭐ Star us on [GitHub](https://github.com/camel-ai/camel), join our [Discord](https://discord.gg/EXAMPLE), or follow us on [X](https://x.com/camelaiorg)

This cookbook shows how to build a powerful object detection agent using CAMEL AI connected to ACI.dev's MCP tools. We'll create an agent that analyzes images, detects objects like cars or trees, and explains results in natural language, all without writing complex ML code.

**Key Learnings:**
- A brief history of function calling for LLMs.
- What the Model Context Protocol (MCP) is.
- How MCP enables dynamic, context-aware tool usage for tasks like object detection.
- Setting up CAMEL with ACI.dev for real-time image analysis.
- Building and running your own object detection agent.
- Handling outputs with summaries, tables, and visualized results.

### Introduction to the Model Context Protocol (MCP)

#### A Brief History of Function Calling and MCP

- **Pre-2023 - When LLMs Lacked Environmental Awareness**:
  - Tool usage implemented via prompt engineering 
  - Support provided at framework level (e.g., LangChain, CAMEL agents)
  - No native capabilities; relied on parsing unstructured model outputs
- **June 2023 - OpenAI Launches Native Function Calling Ability**:
  - Introduced in GPT-4 and GPT-3.5-turbo
  - Utilized structured JSON outputs to call tools and pass arguments
  - Enabled significantly more reliable and scalable tool integration
- **Nov 2024 - Anthropic Proposes MCP (Model Context Protocol)**:
  - Formalizes tool interaction using JSON-RPC 2.0 standard
  - Standardizes communication between AI systems and external tools/resources
- **2025 - Industry-Wide Adoption**:
  - OpenAI, DeepMind, and other major players adopt MCP
  - Function calling becomes a core capability for advanced agentic AI systems
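
To make the JSON-RPC 2.0 point concrete, here is a minimal sketch of what a tool invocation can look like on the wire. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification; the tool name, its argument, and the helper function names are illustrative assumptions, not part of this cookbook.

```python
import json


def make_tool_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def make_tool_call_result(request_id: int, text: str) -> dict:
    """Build the matching JSON-RPC 2.0 response carrying a text result."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # must echo the id of the request it answers
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


# Illustrative tool name and argument (assumptions for this sketch).
request = make_tool_call_request(1, "get_current_time", {"timezone": "Asia/Riyadh"})
print(json.dumps(request, indent=2))
```

Because every server speaks the same message shape, a client only needs one code path to call any tool, which is the standardization benefit described above.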


#### Why MCP? 

The power of standardization:
  
![](figs/MCP.png){fig-align="center"}


#### MCP Ecosystem

MCP Clients:
- Claude Desktop
- Coding tools: Cursor, Windsurf, Claude Code, Cline.
- Frameworks like CAMEL and LangChain.

MCP is gradually becoming a standard. Here are some useful MCP registries and server collections:

- <a href="https://www.aci.dev" target="_blank">ACI.dev</a>
- <a href="https://smithery.ai" target="_blank">Smithery</a>
- <a href="https://composio.dev/" target="_blank">Composio</a>
- <a href="https://mcp.run/" target="_blank">mcp.run</a>
- <a href="https://www.modelscope.cn/mcp" target="_blank">ModelScope</a>
- <a href="https://github.com/punkpeye/awesome-mcp-servers" target="_blank">Awesome MCP Servers</a>

#### CAMEL's Integration with MCP

In this section, we'll explore how CAMEL integrates with the Model Context Protocol to create a more powerful and flexible agent framework. Here's what we'll cover:

1. Agents using MCP tools
2. Exporting existing CAMEL tools as MCP servers
3. MCP search toolkits and MCP search agents
4. Exporting CAMEL agents as MCP servers

### 📦 Installation and Configuration

In [None]:
%pip install camel-ai aci-mcp aci-sdk

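Set the required keys (`ACI_API_KEY`, `LINKED_ACCOUNT_OWNER_ID`, `GOOGLE_API_KEY`, `REPLICATE_API_TOKEN`) in a `.env` file, then import the necessary libraries. The cells below call OpenAI models, so an `OPENAI_API_KEY` must also be available in the environment.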
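
A missing key usually surfaces later as a confusing connection or authentication error, so it can help to check up front. The following is a minimal sketch, meant to run after `load_dotenv()`; the `missing_keys` helper is a hypothetical name introduced here, and the key list simply mirrors the one above (extend it with whatever your model provider needs).

```python
import os

# Keys named in this cookbook; extend as needed for your model provider.
REQUIRED_KEYS = [
    "ACI_API_KEY",
    "LINKED_ACCOUNT_OWNER_ID",
    "GOOGLE_API_KEY",
    "REPLICATE_API_TOKEN",
]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing keys:", ", ".join(missing))
else:
    print("All required keys are set.")
```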

In [1]:
import os
from dotenv import load_dotenv

import logging
logging.getLogger().setLevel(logging.ERROR)

from camel.agents import ChatAgent
from camel.messages import BaseMessage
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.toolkits import MCPToolkit, PulseMCPSearchToolkit

### Hands-on with MCP servers!

In [4]:
agent = ChatAgent(model="gpt-4o")
response = agent.step("Hi GPT, what is the current time? I am in Riyadh.")
print(response.msg.content)


I'm sorry, but I don't have real-time capabilities to provide the current time. You might want to check a clock, smartphone, or a computer that has internet access to find the current time in Riyadh. Riyadh operates on Arabia Standard Time (AST), which is UTC+3.


So, the agent knows nothing about the current time! Let's find an MCP server that can give the agent this ability.
For example: https://github.com/modelcontextprotocol/servers/tree/main/src/time

In [2]:
config_dict = {
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time", "--local-timezone=Asia/Riyadh"]
    }
  }
}

async with MCPToolkit(config_dict=config_dict) as toolkit:
    print("Available tools:")
    print([tool.get_function_name() for tool in toolkit.get_tools()])

    agent = ChatAgent(model="gpt-4o", tools=toolkit.get_tools())
    response = await agent.astep("What is the current time? I am in Riyadh.")
    print(response.msg.content)

Available tools:
['get_current_time', 'convert_time']




The current time in Riyadh is 11:00 AM on July 31, 2025.


Great! Our agent is now aware of the current time!

What about other tools? Where can we find interesting and useful ones? There are many MCP registry platforms.
Let's use <a href="https://www.aci.dev" target="_blank">ACI.dev</a> as an example!

### Example of using Replicate on ACI.dev to do object detection!

Let's say we want to do object detection, as the company Tahakom does.

We can search for related MCP servers from within CAMEL:

In [9]:
search_toolkit = PulseMCPSearchToolkit()
search_toolkit.search_mcp_servers(
    query="Object detection",
    top_k=3,
)

{'servers': [{'name': 'ImageSorcery',
   'url': 'https://www.pulsemcp.com/servers/sunriseapps-imagesorcery',
   'external_url': None,
   'short_description': 'Provides powerful image manipulation capabilities including resizing, cropping, object detection, OCR text extraction, and finding objects based on text descriptions using Python with OpenCV and Ultralytics',
   'source_code_url': 'https://github.com/sunriseapps/imagesorcery-mcp',
   'github_stars': 85,
   'package_registry': 'pypi',
   'package_name': 'imagesorcery-mcp',
   'package_download_count': 7687,
   'EXPERIMENTAL_ai_generated_description': 'ImageSorcery MCP provides AI assistants with powerful image manipulation capabilities through a standardized interface. Built with Python using OpenCV and Ultralytics, it offers tools for basic operations like resizing, cropping, and rotating images, as well as advanced features including object detection, OCR text extraction, and finding objects based on text descriptions. The serve
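
The raw search result is verbose, so a compact summary makes candidates easier to compare. Below is a minimal sketch that assumes the result is a dictionary shaped like the output above (a `servers` list whose entries carry `name`, `github_stars`, `source_code_url`, and `package_name`); the `summarize_servers` helper is a hypothetical name, and the sample data is copied from that output.

```python
def summarize_servers(result: dict) -> list:
    """Reduce a search result to a few comparable fields, most-starred first."""
    rows = []
    for server in result.get("servers", []):
        rows.append(
            {
                "name": server.get("name"),
                "stars": server.get("github_stars"),
                "source": server.get("source_code_url"),
                "package": server.get("package_name"),
            }
        )
    # Treat a missing star count as 0 so sorting never fails on None.
    return sorted(rows, key=lambda row: row["stars"] or 0, reverse=True)


# Sample entry taken from the search output shown above.
sample = {
    "servers": [
        {
            "name": "ImageSorcery",
            "source_code_url": "https://github.com/sunriseapps/imagesorcery-mcp",
            "github_stars": 85,
            "package_registry": "pypi",
            "package_name": "imagesorcery-mcp",
        }
    ]
}

for row in summarize_servers(sample):
    print(f"{row['name']} ({row['stars']} stars): {row['source']}")
```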

For today's illustration, we will use the Replicate API.

**Replicate** is a developer-friendly platform that hosts a wide collection of open-source AI models for tasks such as:

- Text-to-image generation (e.g., Stable Diffusion, DALL·E)
- Audio transcription (e.g., Whisper)
- Chat and language models (e.g., LLaMA, Mistral)
- Image-to-image transformation (e.g., face restoration, background removal)

In ACI.dev, we can find the [Replicate MCP server](https://platform.aci.dev/apps/REPLICATE) with 2 tools:

- REPLICATE__MODEL_GROUNDING_DINO: Detects any object in an image based on a descriptive text query. It can detect objects it wasn't explicitly trained on.

- REPLICATE__MODEL_FLUX_1_1_PRO: Faster, better FLUX Pro. Text-to-image model with excellent image quality, prompt adherence, and output diversity.
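
A Grounding DINO call boils down to an image URL plus a text query. The sketch below assumes the tool input is a dictionary with `image` and `query` keys, which is the shape this cookbook's system prompt asks the agent to produce; the `build_detection_input` helper, the example URL, and the comma-separated query format are assumptions made for illustration. In the actual agent the LLM performs this extraction itself.

```python
import re

# Matches the first http(s) URL in free text.
URL_PATTERN = re.compile(r"https?://\S+")


def build_detection_input(prompt: str, query: str) -> dict:
    """Pull the image URL out of a user prompt and pair it with a text query."""
    match = URL_PATTERN.search(prompt)
    if match is None:
        raise ValueError("No image URL found in the prompt.")
    # Drop punctuation that commonly trails a URL at the end of a sentence.
    image_url = match.group(0).rstrip(".,;:!?)\"'")
    return {"image": image_url, "query": query}


# Hypothetical prompt and URL, for illustration only.
payload = build_detection_input(
    "Analyze this image for cars and people: https://example.com/street.jpg.",
    query="car, person",
)
print(payload)
```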

Let's start building our agent by setting the system prompt:

In [3]:
load_dotenv()

agent_name = "ObjectDetectionAgent"
system_prompt = """
You are a specialized Object Detection Agent. Your primary function is to use the `REPLICATE.run` tool for object detection and present the findings in a user-friendly format.
The user will provide a text prompt containing an image URL and a query. You must extract the `image` URL and the `query` object(s).
Immediately call the `REPLICATE.run` tool. The `input` must be a dictionary with two keys: `image` (the URL) and `query` (a string of the object(s)).
Do not ask for clarification; make a reasonable inference if the query is ambiguous.
After receiving the tool's output, format your response as follows:
- **Natural Language Summary:** Start with a detailed, friendly, insightful analysis of the detection results in plain English.
- **Markdown Table:** Create a markdown table with columns: 'Object', 'Confidence Score', and 'Bounding Box Coordinates'.
- **Result Image:** If the tool provides a URL for an image with bounding boxes, display it using markdown: `![Detected Objects](URL_HERE)`.
Whenever I give you a link, trigger the tool call, extract its outputs and links, and present them to me in proper markdown format with a detailed natural-language analysis of the tool call.
"""

### MCP servers configuration using ACI.dev

In [6]:
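mcp_config = {
    "mcpServers": {
        "aci_apps": {
            "command": "aci-mcp",
            "args": [
                "apps-server",
                "--apps=REPLICATE",
                "--linked-account-owner-id",
                # Replace with your own linked account owner ID
                # (the LINKED_ACCOUNT_OWNER_ID value from your .env file).
                "tahakom"
            ],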
            "env": {
                "ACI_API_KEY": os.getenv("ACI_API_KEY")
            }
        }
    }
}
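
Note that `os.getenv` returns `None` when a variable is unset, which would put an invalid value into the server's environment. As a minimal sketch, the configuration can be built by a small function that fails early instead; `build_aci_mcp_config` is a hypothetical helper introduced here, and it only reuses the command, flags, and keys shown in the configuration above.

```python
def build_aci_mcp_config(app_name: str, owner_id: str, api_key: str) -> dict:
    """Build an MCP server config for one ACI.dev app, rejecting empty values."""
    for label, value in (
        ("app_name", app_name),
        ("owner_id", owner_id),
        ("api_key", api_key),
    ):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return {
        "mcpServers": {
            "aci_apps": {
                "command": "aci-mcp",
                "args": [
                    "apps-server",
                    f"--apps={app_name}",
                    "--linked-account-owner-id",
                    owner_id,
                ],
                "env": {"ACI_API_KEY": api_key},
            }
        }
    }


# Placeholder values; in practice pass os.getenv("LINKED_ACCOUNT_OWNER_ID")
# and os.getenv("ACI_API_KEY") after load_dotenv() has run.
example_config = build_aci_mcp_config("REPLICATE", "my-owner-id", "my-api-key")
print(example_config["mcpServers"]["aci_apps"]["args"])
```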

Define the CAMEL agent with this configuration!

In [7]:
mcp_toolkit = MCPToolkit(config_dict=mcp_config)
await mcp_toolkit.connect()
tools = mcp_toolkit.get_tools()

# Initialize the OpenAI GPT-4.1 model
model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4_1,
)

# Create system message
sys_msg = BaseMessage.make_assistant_message(
    role_name=agent_name,
    content=system_prompt,
)

agent = ChatAgent(model=model, system_message=sys_msg, tools=tools, memory=None)

After creating the agent, let's try some examples that use Replicate to detect objects in images.

Now you can start the interactive chat loop to analyze any image URL you provide!

In [8]:
# Sample prompts for reference:
# 1. "Analyze the vegetable stall and identify all produce, including tomato, onion, cabbage, cucumber, zucchini, carrot, and beet, in this image: https://images.pexels.com/photos/2255935/pexels-photo-2255935.jpeg"
# 2. "Analyze the busy street scene and identify all vehicles, such as car, bus, and truck, as well as people, in this image: https://www.livemint.com/rf/Image-621x414/LiveMint/Period1/2012/10/01/Photos/Road621.jpg"
# 3. "Analyze the warehouse scene and identify persons, cardboard boxes, and conveyor belts in this image: https://media.business-humanrights.org/media/images/16278498935_dac4d8f223_o.2e16d0ba.fill-1000x1000-c50.jpg"

print("Welcome to the Object Detection Chat! Enter 'quit' to exit.")
print("Please provide an image URL and what objects you'd like to detect.")
print("Example format: Analyze this image for cars and people: [IMAGE_URL]")

while True:
    try:
        user_input = input("\nEnter your prompt: ").strip()
        
        if user_input.lower() == 'quit':
            print("Thank you for using the Object Detection Chat!")
            break
            
        if not user_input:
            print("Please enter a valid prompt with an image URL.")
            continue
            
        response = await agent.astep(user_input)
        print("\nAnalysis Results:")
        print(response.msg.content)
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        print("Please try again with a different image URL or prompt.")

Welcome to the Object Detection Chat! Enter 'quit' to exit.
Please provide an image URL and what objects you'd like to detect.
Example format: Analyze this image for cars and people: [IMAGE_URL]

Analysis Results:
- **Natural Language Summary:**
  Upon analyzing the vegetable stall in the provided image, several types of produce were successfully detected. Tomatoes and onions are well represented in various locations across the stall, with multiple bounding boxes showing their presence. Notably, carrots and cabbages are also detected, positioned in distinct sections. Zucchini and cucumber are each identified, demonstrating a diverse offering of fresh vegetables. There is also a detection for beet, though with somewhat lower confidence. Overall, the stall features an abundant and well-organized display of fresh vegetables, and all queried items have been identified with moderate confidence.

- **Markdown Table:**

| Object    | Confidence Score | Bounding Box Coordinates      |
|-------

Besides using a CAMEL agent to connect to MCP servers, we can also export a CAMEL agent as an MCP server in one line:

In [None]:
mcp = agent.to_mcp(name="ObjectDetectionAgent")
# mcp.run(transport="streamable-http") # run this in script, not notebook

After that, we can call it from other MCP clients such as Claude Desktop, Cursor, etc.

Finally, we disconnect the MCP toolkit. 

In [None]:
await mcp_toolkit.disconnect()

### Other use cases

Other resources from CAMEL-AI:

- SQL MCP Server: https://docs.camel-ai.org/cookbooks/mcp/agents_with_sql_mcp

- Pairing AI Agents with 600+ MCP Tools via ACI.dev: https://docs.camel-ai.org/cookbooks/mcp/camel_aci_mcp_cookbook

- CAMEL MCP Hub: https://mcp.camel-ai.org