# Multimodal Models & AI Communication

This notebook covers:
- **LMM**: Large Multimodal Models
- **MLLM**: Multimodal Large Language Models
- **Communication**: MCP, A2A, A2P protocols
- **Multi-Agent Systems**: Practical examples

---

# Part 1: LMM & MLLM - Multimodal Models

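## Difference:
- **MLLM**: An LLM extended with multimodal inputs (output is mainly text)
- **LMM**: A natively multimodal model (can also output multiple modalities)

The two terms are often used interchangeably in the literature; this notebook uses the split above as a working convention.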

## Examples:
- **MLLM**: GPT-4V, Claude 3
- **LMM**: Gemini Ultra, GPT-4o (audio + video)

In [None]:
# Install required packages (python-dotenv provides the `dotenv` import used in the setup cell)
!pip install openai anthropic pillow requests matplotlib soundfile python-dotenv -q

In [None]:
import os
import base64
import requests
from io import BytesIO
from PIL import Image
import matplotlib.pyplot as plt
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✓ Setup complete")

## Example 1: Image + Text → Text (MLLM)

In [None]:
def multimodal_query(image_url, text_query):
    """Query with image and text"""
    response = client.chat.completions.create(
        model="gpt-4o",  # gpt-4-vision-preview is deprecated; use a current vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text_query},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ],
        max_tokens=300
    )
    return response.choices[0].message.content

# Example
image_url = "https://images.unsplash.com/photo-1517849845537-4d257902454a?w=600"
query = "Describe this image and suggest a creative caption for social media."

print(f"Query: {query}\n")
print("Response:")
print(multimodal_query(image_url, query))
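
The example above passes a public URL. For a local file, the same `image_url` field can carry a base64 data URL instead. The following is a minimal sketch under that assumption; `to_data_url` and `build_image_message` are illustrative helper names, not part of the OpenAI SDK, and the placeholder bytes stand in for a real image file.

```python
import base64

def to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def build_image_message(text_query, image_bytes, mime_type="image/png"):
    """Build a user message in the same shape multimodal_query sends."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text_query},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, mime_type)}},
        ],
    }

# Placeholder bytes; in practice use open(path, "rb").read()
message = build_image_message("Describe this image.", b"\x89PNG placeholder")
print(message["content"][1]["image_url"]["url"][:30])
```

The resulting dict can be placed directly in the `messages` list of a chat completion request.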

## Example 2: Multiple Images Analysis

In [None]:
def compare_multiple_images(image_urls, query):
    """Analyze multiple images together"""
    content = [{"type": "text", "text": query}]
    
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    
    response = client.chat.completions.create(
        model="gpt-4o",  # gpt-4-vision-preview is deprecated; use a current vision-capable model
        messages=[{"role": "user", "content": content}],
        max_tokens=400
    )
    return response.choices[0].message.content

# Compare multiple images
urls = [
    "https://images.unsplash.com/photo-1517849845537-4d257902454a?w=400",
    "https://images.unsplash.com/photo-1548199973-03cce0bbc87b?w=400"
]

query = "Compare these images. What are the similarities and key differences?"
print(f"Query: {query}\n")
print("Analysis:")
print(compare_multiple_images(urls, query))

## Example 3: Document Understanding (Image + Text)

In [None]:
def analyze_document(document_image_url):
    """Extract and analyze document content"""
    query = """Analyze this document and provide:
    1. Document type
    2. Key information extracted
    3. Summary of main points
    4. Any action items or important dates"""
    
    return multimodal_query(document_image_url, query)

print("Document Understanding Capabilities:\n")
print("✓ Invoice processing")
print("✓ Receipt scanning")
print("✓ Form extraction")
print("✓ Chart/graph analysis")
print("✓ Presentation slides")
print("✓ Medical reports (with disclaimers)")
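
The prompt in `analyze_document` returns free text. If the prompt instead asks the model to answer in JSON, the reply still tends to arrive wrapped in prose or code fences, so it helps to parse it defensively. The sketch below assumes that setup; `extract_json_object` is a hypothetical helper and the sample reply is invented for illustration, not real model output.

```python
import json

def extract_json_object(reply_text):
    """Parse the span from the first '{' to the last '}' as JSON.

    Returns the parsed object, or None if no valid JSON object is found.
    """
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Illustrative reply only; not real model output
sample_reply = 'Here is the result:\n{"document_type": "invoice", "action_items": ["Pay by due date"]}'
parsed = extract_json_object(sample_reply)
print(parsed["document_type"] if parsed else "No JSON found")
```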

---
# Part 2: AI Communication Protocols

## Three Main Patterns:
1. **MCP** (Model Context Protocol): AI ↔ Tools/Data
2. **A2A** (Agent-to-Agent): AI ↔ AI
3. **A2P** (Agent-to-Person): AI ↔ Human

## Example 4: MCP - Model Context Protocol Simulation

In [None]:
# Simulate MCP server providing tools to AI
class MCPServer:
    """Simulated MCP Server providing tools"""
    
    def __init__(self, name):
        self.name = name
        self.tools = {}
    
    def register_tool(self, name, description, function):
        """Register a tool"""
        self.tools[name] = {
            "description": description,
            "function": function
        }
    
    def list_tools(self):
        """List available tools"""
        return [
            {"name": name, "description": info["description"]}
            for name, info in self.tools.items()
        ]
    
    def call_tool(self, tool_name, **kwargs):
        """Execute a tool"""
        if tool_name in self.tools:
            return self.tools[tool_name]["function"](**kwargs)
        return {"error": "Tool not found"}

# Create MCP servers
database_server = MCPServer("database")
filesystem_server = MCPServer("filesystem")

# Register tools
database_server.register_tool(
    "query_users",
    "Query user database",
    lambda limit=10: {"users": [f"user_{i}" for i in range(limit)]}
)

filesystem_server.register_tool(
    "read_file",
    "Read file contents",
    lambda path: {"content": f"Contents of {path}"}
)

print("MCP Servers Available:\n")
print("1. Database Server")
for tool in database_server.list_tools():
    print(f"   - {tool['name']}: {tool['description']}")

print("\n2. Filesystem Server")
for tool in filesystem_server.list_tools():
    print(f"   - {tool['name']}: {tool['description']}")
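
The simulation above calls Python methods directly. The actual Model Context Protocol exchanges JSON-RPC 2.0 messages, with `tools/list` for discovery and `tools/call` for invocation. The sketch below shows roughly what that dispatch could look like over a plain dict; it is a simplification that omits the initialization handshake, transports, and input schemas, and `TOOLS` and `handle_rpc` are hypothetical names.

```python
import json

# Hypothetical in-memory tool registry: name -> (description, callable)
TOOLS = {
    "query_users": (
        "Query user database",
        lambda limit=10: {"users": [f"user_{i}" for i in range(limit)]},
    ),
}

def handle_rpc(request):
    """Dispatch a JSON-RPC 2.0 request dict to a registered tool."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        tools = [{"name": n, "description": d} for n, (d, _) in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}

    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            # -32602 is the JSON-RPC "Invalid params" code
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
        _, func = TOOLS[name]
        output = func(**params.get("arguments", {}))
        # Tool output is returned as a text content item
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": json.dumps(output)}]}}

    # -32601 is the JSON-RPC "Method not found" code
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": "Method not found"}}

call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "query_users", "arguments": {"limit": 2}}}
print(json.dumps(handle_rpc(call), indent=2))
```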

In [None]:
# Simulate AI using MCP tools
class AIWithMCP:
    """AI agent that can use MCP tools"""
    
    def __init__(self, mcp_servers):
        self.servers = mcp_servers
    
    def process_request(self, user_request):
        """Process user request using available tools"""
        print(f"User Request: {user_request}\n")
        
        # Simulate AI deciding which tools to use
        if "users" in user_request.lower():
            print("AI: I'll query the database for user information...")
            result = self.servers["database"].call_tool("query_users", limit=5)
            print(f"Result: {result}")
            return f"Found {len(result['users'])} users: {', '.join(result['users'])}"
        
        elif "file" in user_request.lower():
            print("AI: I'll read the file...")
            result = self.servers["filesystem"].call_tool("read_file", path="/data/report.txt")
            print(f"Result: {result}")
            return result['content']
        
        return "I need more specific instructions."

# Test AI with MCP
ai = AIWithMCP({
    "database": database_server,
    "filesystem": filesystem_server
})

# Named user_requests so the list does not shadow the imported `requests` module
user_requests = [
    "Get the list of users from the database",
    "Read the file at /data/report.txt"
]

for req in user_requests:
    print("=" * 60)
    response = ai.process_request(req)
    print(f"\nAI Response: {response}\n")
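
In `AIWithMCP`, keyword matching stands in for the model's decision. With a real model, the tool list is usually sent along with the request and the model picks the tool itself. The sketch below converts simple tool descriptions into the `tools` list format used for function calling in the OpenAI Chat Completions API. No API call is made, `to_function_tools` is a hypothetical helper, and the parameter schemas are assumptions, since the simulated tools above do not declare any.

```python
def to_function_tools(tool_specs):
    """Convert simple tool specs into Chat Completions `tools` entries."""
    return [
        {
            "type": "function",
            "function": {
                "name": spec["name"],
                "description": spec["description"],
                "parameters": {
                    "type": "object",
                    "properties": spec.get("properties", {}),
                    "required": spec.get("required", []),
                },
            },
        }
        for spec in tool_specs
    ]

# Parameter schemas below are assumptions made for illustration
tool_specs = [
    {"name": "query_users", "description": "Query user database",
     "properties": {"limit": {"type": "integer"}}},
    {"name": "read_file", "description": "Read file contents",
     "properties": {"path": {"type": "string"}}, "required": ["path"]},
]

function_tools = to_function_tools(tool_specs)
print([t["function"]["name"] for t in function_tools])
```

The list would be passed as `tools=function_tools` to `client.chat.completions.create`, and any call the model chooses arrives in `response.choices[0].message.tool_calls`.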

## Example 5: A2A - Agent-to-Agent Communication

In [None]:
import json
from datetime import datetime

class Agent:
    """Base agent class for A2A communication"""
    
    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.inbox = []
    
    def send_message(self, to_agent, message_type, content):
        """Send message to another agent"""
        message = {
            "from": self.name,
            "to": to_agent.name,
            "type": message_type,
            "content": content,
            "timestamp": datetime.now().isoformat()
        }
        to_agent.receive_message(message)
        return message
    
    def receive_message(self, message):
        """Receive message from another agent"""
        self.inbox.append(message)
    
    def process_messages(self):
        """Process received messages and return each handler's result"""
        results = [self.handle_message(message) for message in self.inbox]
        self.inbox = []
        return results
    
    def handle_message(self, message):
        """Override in subclasses"""
        pass

# Specialized agents
class ResearchAgent(Agent):
    def handle_message(self, message):
        if message['type'] == 'research_request':
            print(f"\n{self.name}: Researching {message['content']['topic']}...")
            return {
                "findings": f"Research complete on {message['content']['topic']}",
                "sources": 10
            }

class WriterAgent(Agent):
    def handle_message(self, message):
        if message['type'] == 'write_request':
            print(f"\n{self.name}: Writing about {message['content']['topic']}...")
            return {
                "article": f"Article written about {message['content']['topic']}",
                "word_count": 500
            }

class EditorAgent(Agent):
    def handle_message(self, message):
        if message['type'] == 'edit_request':
            print(f"\n{self.name}: Editing content...")
            return {
                "edited": True,
                "improvements": ["Grammar", "Clarity", "Flow"]
            }

print("✓ Agent classes defined")
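
The handlers above return a dict, but nothing delivers that result to the agent that asked for it. One way to close the loop is to send the handler's result back as a reply message. The sketch below is self-contained and does not modify the `Agent` class; `ReplyingAgent` and `EchoResearcher` are hypothetical names used only for illustration.

```python
from datetime import datetime

class ReplyingAgent:
    """Agent whose handler results are sent back to the requester."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, to_agent, message_type, content):
        to_agent.inbox.append({
            "from": self,  # keep a reference so the receiver can reply
            "type": message_type,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })

    def process_messages(self):
        pending, self.inbox = self.inbox, []
        for message in pending:
            result = self.handle(message)
            # Reply only when the handler produced something
            if result is not None:
                self.send(message["from"], f"{message['type']}_result", result)

    def handle(self, message):
        return None  # base agents just consume messages

class EchoResearcher(ReplyingAgent):
    def handle(self, message):
        if message["type"] == "research_request":
            return {"findings": f"Research complete on {message['content']['topic']}"}
        return None

requester = ReplyingAgent("Requester")
researcher = EchoResearcher("Research-Bot")
requester.send(researcher, "research_request", {"topic": "AI"})
researcher.process_messages()
print(requester.inbox[0]["type"], requester.inbox[0]["content"])
```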

In [None]:
# Create multi-agent system
research_agent = ResearchAgent("Research-Bot", "Researcher")
writer_agent = WriterAgent("Writer-Bot", "Writer")
editor_agent = EditorAgent("Editor-Bot", "Editor")

# Simulate workflow
print("Multi-Agent Workflow: Creating an Article\n")
print("=" * 60)

# Step 1: Writer asks the researcher for background material
msg1 = writer_agent.send_message(
    research_agent,
    "research_request",
    {"topic": "Artificial Intelligence"}
)
print(f"\nMessage 1: {msg1['from']} → {msg1['to']}")
print(f"Type: {msg1['type']}")
print(f"Content: {msg1['content']}")
research_agent.process_messages()

# Step 2: Researcher hands the topic over to the writer
msg2 = research_agent.send_message(
    writer_agent,
    "write_request",
    {"topic": "AI Applications", "research": "..."}
)
print(f"\nMessage 2: {msg2['from']} → {msg2['to']}")
print(f"Type: {msg2['type']}")
writer_agent.process_messages()

# Step 3: Writer sends the draft to the editor
msg3 = writer_agent.send_message(
    editor_agent,
    "edit_request",
    {"article": "...", "style_guide": "AP"}
)
print(f"\nMessage 3: {msg3['from']} → {msg3['to']}")
print(f"Type: {msg3['type']}")
editor_agent.process_messages()

print("\n" + "=" * 60)
print("\n✓ Article creation complete via agent collaboration!")

## Example 6: A2P - Agent-to-Person Communication

In [None]:
class ConversationalAgent:
    """Agent for human interaction (A2P)"""
    
    def __init__(self, name, personality="helpful"):
        self.name = name
        self.personality = personality
        self.conversation_history = []
    
    def chat(self, user_message):
        """Respond to user message"""
        self.conversation_history.append({
            "role": "user",
            "message": user_message
        })
        
        # Simulate response generation
        response = self._generate_response(user_message)
        
        self.conversation_history.append({
            "role": "assistant",
            "message": response
        })
        
        return response
    
    def _generate_response(self, message):
        """Generate contextual response"""
        # Simplified response logic
        if "hello" in message.lower():
            return f"Hello! I'm {self.name}, your {self.personality} assistant. How can I help you today?"
        elif "help" in message.lower():
            return "I can help you with various tasks. What would you like to know?"
        elif "?" in message:
            return "That's a great question! Let me help you with that..."
        else:
            return "I understand. Let me assist you with that."
    
    def get_history(self):
        """Get conversation history"""
        return self.conversation_history

# Create conversational agent
assistant = ConversationalAgent("AI-Assistant", "friendly and helpful")

# Simulate conversation
print("A2P Communication Example\n")
print("=" * 60)

conversation = [
    "Hello!",
    "I need help with Python programming",
    "What are the best practices for writing clean code?",
    "Thank you!"
]

for user_msg in conversation:
    print(f"\nUser: {user_msg}")
    response = assistant.chat(user_msg)
    print(f"{assistant.name}: {response}")

print("\n" + "=" * 60)
print(f"\nConversation turns: {len(assistant.get_history()) // 2}")
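
`ConversationalAgent` stores history as `{"role", "message"}` entries, while chat APIs generally expect `{"role", "content"}` messages. If the canned responses were replaced by a model call, the history would need converting and trimming first. The sketch below assumes strictly alternating user and assistant turns; `to_chat_messages` and the system prompt are hypothetical.

```python
def to_chat_messages(history, system_prompt, max_turns=3):
    """Convert {"role", "message"} history entries into chat-API messages.

    Keeps only the last `max_turns` user/assistant pairs so the prompt
    stays bounded as the conversation grows.
    """
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    messages = [{"role": "system", "content": system_prompt}]
    for entry in recent:
        messages.append({"role": entry["role"], "content": entry["message"]})
    return messages

history = [
    {"role": "user", "message": "Hello!"},
    {"role": "assistant", "message": "Hello! How can I help you today?"},
    {"role": "user", "message": "I need help with Python programming"},
    {"role": "assistant", "message": "What would you like to know?"},
]

for m in to_chat_messages(history, "You are a friendly assistant.", max_turns=1):
    print(m)
```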

## Example 7: Complete Multi-Agent System

In [None]:
class Coordinator:
    """Coordinates multiple agents"""
    
    def __init__(self):
        self.agents = {}
    
    def register_agent(self, agent):
        """Register an agent"""
        self.agents[agent.name] = agent
    
    def orchestrate_task(self, task_description):
        """Coordinate agents to complete a task"""
        print(f"\nCoordinator: Starting task - {task_description}\n")
        
        # Break down task and assign to agents
        steps = [
            ("Research-Bot", "research_request", {"topic": task_description}),
            ("Writer-Bot", "write_request", {"topic": task_description}),
            ("Editor-Bot", "edit_request", {"content": "draft"})
        ]
        
        results = []
        for agent_name, msg_type, content in steps:
            if agent_name in self.agents:
                agent = self.agents[agent_name]
                print(f"→ Assigning to {agent_name}...")
                agent.receive_message({
                    "from": "Coordinator",
                    "to": agent_name,
                    "type": msg_type,
                    "content": content,
                    "timestamp": datetime.now().isoformat()  # same fields as Agent.send_message
                })
                agent.process_messages()
                results.append(f"{agent_name} completed")
        
        return results

# Create coordinator and register agents
coordinator = Coordinator()
coordinator.register_agent(research_agent)
coordinator.register_agent(writer_agent)
coordinator.register_agent(editor_agent)

print("Complete Multi-Agent System Demo")
print("=" * 60)

# Execute complex task
task = "Write an article about Machine Learning applications"
results = coordinator.orchestrate_task(task)

print("\nTask Results:")
for i, result in enumerate(results, 1):
    print(f"  {i}. {result}")

print("\n✓ Multi-agent task completed successfully!")
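
The coordinator above dispatches each step independently, so the editor never sees the writer's draft. A common variation is to feed each stage the output of the previous one. The sketch below models stages as plain functions to keep it self-contained; `run_pipeline` and the three stage functions are hypothetical stand-ins for the agents.

```python
def research(task):
    return {"topic": task, "findings": f"Research complete on {task}"}

def write(research_result):
    return {"draft": f"Article based on: {research_result['findings']}"}

def edit(write_result):
    return {"final": write_result["draft"].strip(), "edited": True}

def run_pipeline(task, stages):
    """Feed each stage the output of the previous one and keep a trace."""
    trace = []
    payload = task
    for name, stage in stages:
        payload = stage(payload)
        trace.append((name, payload))
    return payload, trace

final, trace = run_pipeline(
    "Machine Learning applications",
    [("Research-Bot", research), ("Writer-Bot", write), ("Editor-Bot", edit)],
)
for name, output in trace:
    print(f"{name}: {output}")
```

Keeping the trace alongside the final payload makes it easy to inspect what each stage contributed.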

---
# Summary

## Multimodal Models:

### MLLM (Multimodal LLM)
- ✅ Input: Text + Images (some models also accept audio)
- ✅ Output: Primarily text
- ✅ Examples: GPT-4V, Claude 3, Gemini Pro Vision
- ✅ Use: Visual Q&A, document understanding, image analysis

### LMM (Large Multimodal Model)
- ✅ Input: Any modality
- ✅ Output: Multiple modalities
- ✅ Examples: Gemini Ultra, GPT-4o
- ✅ Use: Complex multimodal tasks, video understanding

## Communication Protocols:

### 1. MCP (Model Context Protocol)
```
AI Model ↔ Tools/Resources
```
- **Purpose**: Standardized tool access
- **Use**: Databases, filesystems, APIs
- **Benefit**: Portable, reusable integrations

### 2. A2A (Agent-to-Agent)
```
AI Agent ↔ AI Agent
```
- **Purpose**: AI collaboration
- **Use**: Multi-agent workflows
- **Benefit**: Specialized expertise, parallel processing
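
The parallel-processing benefit applies when agents work on independent subtasks. A minimal sketch using the standard library's thread pool follows; `research_topic` is a hypothetical stand-in for an agent call, and the topics are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def research_topic(topic):
    """Stand-in for an agent doing independent work on one topic."""
    return f"Findings on {topic}"

topics = ["vision models", "speech models", "agent protocols"]

# Each topic is independent, so the calls can run concurrently.
# executor.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as executor:
    findings = list(executor.map(research_topic, topics))

for topic, finding in zip(topics, findings):
    print(f"{topic}: {finding}")
```

Threads suit this case because real agent calls are mostly I/O-bound (waiting on API responses).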

### 3. A2P (Agent-to-Person)
```
AI Agent ↔ Human
```
- **Purpose**: Human-AI interaction
- **Use**: Chatbots, assistants, interfaces
- **Benefit**: Natural language communication

## Key Patterns:

| Pattern | When to Use | Example |
|---------|-------------|----------|
| **Single Agent** | Simple tasks | Basic chatbot |
| **Multi-Agent** | Complex workflows | Research + Write + Edit |
| **With MCP** | Need external data | Database queries |
| **Orchestrated** | Coordinated tasks | Project management |

## Real-World Applications:

1. **Content Creation Pipeline**
   - Research Agent → Writer Agent → Editor Agent → Publisher

2. **Customer Support**
   - Reception → Data Lookup (MCP) → Policy Check → Response

3. **Software Development**
   - Architect → Developer → Tester → Deployment

4. **Data Analysis**
   - Data Collection (MCP) → Analysis → Visualization → Reporting

## Next Steps:
- Build your own multi-agent system
- Implement MCP servers for your tools
- Explore advanced orchestration patterns
- Combine multiple AI models for complex tasks