# Multi-Agent Communication and Coordination with ZeroMQ

This notebook demonstrates how multiple agents can communicate and coordinate using ZeroMQ (ZMQ) to accomplish a common, distributed task. We'll implement a simple pipeline pattern:

1.  **Task Generator Agent**: Creates tasks and distributes them.
2.  **Task Processor Agent(s)**: Receive tasks, perform simulated work, and send back results.
3.  **Result Collector Agent**: Gathers results from the processor agents.

This example uses ZMQ's PUSH/PULL socket types, which are well suited to task distribution and collection pipelines, plus a PUB/SUB pair that signals the workers when to start.
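
To make the PUSH/PULL idea concrete before the full pipeline, here is a minimal sketch that sends one JSON message from a PUSH socket to a PULL socket inside a single process. The `inproc://push-pull-demo` endpoint and the `push_pull_roundtrip` helper are assumptions made for illustration; the notebook itself uses TCP addresses and separate agents.

```python
import zmq

def push_pull_roundtrip(payload: dict, timeout_ms: int = 1000):
    """Send one JSON message PUSH -> PULL inside a single process."""
    context = zmq.Context()
    push = context.socket(zmq.PUSH)
    pull = context.socket(zmq.PULL)
    # LINGER=0 so close() never blocks on unsent messages in this demo
    push.setsockopt(zmq.LINGER, 0)
    pull.setsockopt(zmq.LINGER, 0)
    try:
        # inproc endpoints only work between sockets of the same context
        push.bind("inproc://push-pull-demo")   # hypothetical endpoint name
        pull.connect("inproc://push-pull-demo")
        push.send_json(payload)
        # Poll with a timeout instead of blocking forever on recv
        if pull.poll(timeout_ms, zmq.POLLIN):
            return pull.recv_json()
        return None
    finally:
        push.close()
        pull.close()
        context.term()

received = push_pull_roundtrip({"task_id": "demo-1", "data": "hello"})
print(received)
```

PUSH sockets only send and PULL sockets only receive, which is why the pipeline below needs a second PUSH/PULL pair to carry results back.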

## 1. Setup

Import the necessary libraries. `pyzmq` is the Python binding for ZeroMQ. We'll also use `threading` to run agents concurrently, `time` for simulating work, `json` for message serialization, `uuid` for unique task IDs, and `typing` for type hints.

In [None]:
import zmq
import time
import threading
import json
import uuid
from typing import Dict, Any

print(f"Using PyZMQ version: {zmq.pyzmq_version()}")

## 2. ZMQ Configuration and Message Structures

Define the network addresses for our ZMQ sockets and simple structures for messages.

In [None]:
# Addresses for ZMQ sockets
TASK_VENTILATOR_ADDR = "tcp://127.0.0.1:5557"  # For sending tasks from generator to processors
RESULT_COLLECTOR_ADDR = "tcp://127.0.0.1:5558" # For sending results from processors to collector
CONTROL_SYNC_ADDR = "tcp://127.0.0.1:5559" # For synchronizing start of workers

# Message structure examples (can be formalized with Pydantic if needed)
def create_task_message(data: Any) -> Dict[str, Any]:
    return {"task_id": str(uuid.uuid4()), "data": data, "timestamp": time.time()}

def create_result_message(task_id: str, result_data: Any, worker_id: str) -> Dict[str, Any]:
    return {"task_id": task_id, "result_data": result_data, "worker_id": worker_id, "timestamp": time.time()}
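
If the plain dictionaries above need more structure, one option is a standard-library dataclass that validates required fields when decoding. The sketch below is an assumption-laden alternative, not part of the notebook's pipeline: `TaskMessage`, `to_json`, and `from_json` are hypothetical names, and a dataclass is used here in place of the Pydantic option mentioned in the comment.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field
from typing import Any

@dataclass
class TaskMessage:
    """Hypothetical typed version of the task dictionary."""
    data: Any
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "TaskMessage":
        fields = json.loads(raw)
        # Fail early on missing keys instead of deep inside a worker
        missing = {"task_id", "data", "timestamp"} - fields.keys()
        if missing:
            raise ValueError(f"Task message missing fields: {sorted(missing)}")
        return cls(**fields)

original = TaskMessage(data="Task_1_payload")
decoded = TaskMessage.from_json(original.to_json())
print(decoded == original)
```

Note that the sentinel messages used later in this notebook omit `timestamp`, so a strict decoder like this one would need separate handling for them.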

## 3. Agent Implementations

Define the functions that will represent our agents.

### 3.1 Task Generator Agent

This agent generates a specified number of tasks and PUSHes them to the Task Processor agents.

In [None]:
def task_generator_agent(num_tasks: int, num_workers: int):
    """Generates tasks and sends them to processor agents."""
    context = zmq.Context()
    
    # Socket to send messages on (PUSH)
    ventilator_socket = context.socket(zmq.PUSH)
    ventilator_socket.bind(TASK_VENTILATOR_ADDR)
    print(f"[Generator] Task ventilator bound to {TASK_VENTILATOR_ADDR}")

    # Socket for worker synchronization (PUB)
    sync_socket = context.socket(zmq.PUB)
    sync_socket.bind(CONTROL_SYNC_ADDR)
    print(f"[Generator] Sync service bound to {CONTROL_SYNC_ADDR}")

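    # Wait for all workers to connect (simple sync).
    # A PUB socket drops messages sent before a subscriber has connected, so the
    # START signal below only reaches workers that are already subscribed.
    # In a real system, use a more robust handshake or service discovery.
    print(f"[Generator] Waiting for {num_workers} worker(s) to be ready...")
    time.sleep(num_workers * 0.5 + 1)  # Give workers time to start and connect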
    
    # Send synchronization signal to all workers
    print("[Generator] Sending start signal to workers.")
    sync_socket.send_string("START")

    print(f"[Generator] Starting to send {num_tasks} tasks...")
    for i in range(num_tasks):
        task_data = f"Task_{i+1}_payload"
        message = create_task_message(task_data)
        ventilator_socket.send_json(message)
        print(f"[Generator] Sent task: {message['task_id']} ({task_data})")
        time.sleep(0.1)  # Simulate some delay between sending tasks
    
    # Send one sentinel per worker to signal the end of the task stream.
    # This relies on PUSH distributing messages round-robin across connected
    # workers, so each worker is expected to receive exactly one sentinel.
    for _ in range(num_workers):
        ventilator_socket.send_json({"task_id": "END_OF_TASKS", "data": None})
        print("[Generator] Sent END_OF_TASKS signal.")

    print("[Generator] All tasks sent. Closing sockets.")
    ventilator_socket.close()
    sync_socket.close()
    context.term()
    print("[Generator] Finished.")
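
The one-sentinel-per-worker shutdown depends on how PUSH spreads messages across workers. The following is a pure-Python model of strict round-robin delivery, not ZMQ itself; `simulate_round_robin` is a hypothetical helper, and it assumes every worker is connected before the first task is sent and stays connected throughout.

```python
from collections import defaultdict
from itertools import cycle

def simulate_round_robin(num_tasks: int, num_workers: int):
    """Model strict round-robin delivery of tasks followed by sentinels."""
    messages = [f"Task_{i + 1}" for i in range(num_tasks)]
    messages += ["END_OF_TASKS"] * num_workers  # one sentinel per worker
    inbox = defaultdict(list)
    # Pair each message with the next worker in a repeating 1..N cycle
    for worker, message in zip(cycle(range(1, num_workers + 1)), messages):
        inbox[worker].append(message)
    return dict(inbox)

for worker, received in simulate_round_robin(10, 2).items():
    print(worker, received)
```

Because the sentinels are sent back to back, strict round-robin places each one in a different worker's queue. A real PUSH socket skips peers that are disconnected or whose queue is full, so this guarantee weakens if workers join late or leave early.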

### 3.2 Task Processor Agent

This agent (of which there can be multiple instances) PULLs tasks, simulates processing, and PUSHes results.

In [None]:
def task_processor_agent(worker_id: str):
    """Receives tasks, processes them, and sends results."""
    context = zmq.Context()
    
    # Socket to receive messages from (PULL)
    receiver_socket = context.socket(zmq.PULL)
    receiver_socket.connect(TASK_VENTILATOR_ADDR)
    print(f"[Worker-{worker_id}] Connected to task ventilator at {TASK_VENTILATOR_ADDR}")

    # Socket to send results to (PUSH)
    results_sender_socket = context.socket(zmq.PUSH)
    results_sender_socket.connect(RESULT_COLLECTOR_ADDR)
    print(f"[Worker-{worker_id}] Connected to result collector at {RESULT_COLLECTOR_ADDR}")

    # Socket to receive synchronization signal (SUB)
    sync_subscriber = context.socket(zmq.SUB)
    sync_subscriber.connect(CONTROL_SYNC_ADDR)
    sync_subscriber.subscribe("START") # Subscribe to the START signal
    print(f"[Worker-{worker_id}] Subscribed to sync service at {CONTROL_SYNC_ADDR}")

    # Wait for the start signal
    print(f"[Worker-{worker_id}] Waiting for start signal...")
    sync_message = sync_subscriber.recv_string()
    if sync_message == "START":
        print(f"[Worker-{worker_id}] Received START signal. Beginning task processing.")
    else:
        print(f"[Worker-{worker_id}] Received unexpected sync message: {sync_message}. Exiting.")
        receiver_socket.close()
        results_sender_socket.close()
        sync_subscriber.close()
        context.term()
        return

    while True:
        try:
            task_message = receiver_socket.recv_json()
            task_id = task_message.get("task_id")
            
            if task_id == "END_OF_TASKS":
                print(f"[Worker-{worker_id}] Received END_OF_TASKS. Shutting down.")
                # Tell the collector this worker is done (sent as WORKER_DONE, not as END_OF_TASKS)
                results_sender_socket.send_json({"task_id": "WORKER_DONE", "worker_id": worker_id})
                break
            
            task_data = task_message.get("data")
            print(f"[Worker-{worker_id}] Received task: {task_id} ({task_data})")
            
            # Simulate work
            processing_time = 0.5 + (int(worker_id) * 0.1) # Vary processing time slightly per worker
            time.sleep(processing_time)
            result_data = f"Result_for_({task_data})_by_Worker-{worker_id}"
            
            result_message = create_result_message(task_id, result_data, worker_id)
            results_sender_socket.send_json(result_message)
            print(f"[Worker-{worker_id}] Sent result for task: {task_id}")
            
        except zmq.error.ContextTerminated:
            print(f"[Worker-{worker_id}] Context terminated, exiting loop.")
            break
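        except Exception as e:
            print(f"[Worker-{worker_id}] Error processing task: {e}")
            # Potentially send an error message back or log it.
            # Exiting here skips WORKER_DONE and any remaining tasks, so the
            # collector, which counts messages, would keep waiting for them.
            break  # Exit on error for simplicity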

    print(f"[Worker-{worker_id}] Closing sockets.")
    receiver_socket.close()
    results_sender_socket.close()
    sync_subscriber.close()
    context.term()
    print(f"[Worker-{worker_id}] Finished.")
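
The worker above exits on the first processing error. One possible alternative is to turn a failure into an ordinary message so that every task still produces exactly one reply. This is a sketch under stated assumptions: `create_error_message`, `process_safely`, and the `status` field are hypothetical and do not exist in the notebook's code, and the collector would need to check `status` to tell results from errors.

```python
import time
from typing import Any, Callable, Dict

def create_error_message(task_id: str, worker_id: str, exc: Exception) -> Dict[str, Any]:
    """Hypothetical helper: describe a failed task in a result-like shape."""
    return {
        "task_id": task_id,
        "worker_id": worker_id,
        "status": "error",
        "error_type": type(exc).__name__,
        "error_detail": str(exc),
        "timestamp": time.time(),
    }

def process_safely(task_message: Dict[str, Any], worker_id: str,
                   handler: Callable[[Any], Any]) -> Dict[str, Any]:
    """Run handler on the task data; return a result or an error message."""
    task_id = task_message.get("task_id")
    try:
        result_data = handler(task_message["data"])
    except Exception as exc:
        # Convert the failure into a message instead of leaving the loop
        return create_error_message(task_id, worker_id, exc)
    return {
        "task_id": task_id,
        "result_data": result_data,
        "worker_id": worker_id,
        "status": "ok",
        "timestamp": time.time(),
    }

ok = process_safely({"task_id": "t1", "data": "abc"}, "1", str.upper)
failed = process_safely({"task_id": "t2", "data": None}, "1", str.upper)
print(ok["status"], failed["status"], failed["error_type"])
```

A worker built this way would send the returned dictionary with `send_json` in both cases, which keeps the number of messages reaching the collector equal to the number of tasks.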

### 3.3 Result Collector Agent

This agent PULLs results from the Task Processor agents and prints them.

In [None]:
def result_collector_agent(num_expected_tasks: int, num_workers: int):
    """Collects results from processor agents."""
    context = zmq.Context()
    
    # Socket to receive results on (PULL)
    results_receiver_socket = context.socket(zmq.PULL)
    results_receiver_socket.bind(RESULT_COLLECTOR_ADDR)
    print(f"[Collector] Result collector bound to {RESULT_COLLECTOR_ADDR}")

    collected_results = []
    workers_done_count = 0

    print(f"[Collector] Waiting for results... Expected {num_expected_tasks} task results and {num_workers} worker done signals.")
    
    # We expect num_expected_tasks results + num_workers WORKER_DONE signals
    total_expected_messages = num_expected_tasks + num_workers
    received_messages = 0

    while received_messages < total_expected_messages:
        try:
            result_message = results_receiver_socket.recv_json()
            received_messages += 1
            task_id = result_message.get("task_id")

            if task_id == "WORKER_DONE":
                workers_done_count += 1
                print(f"[Collector] Worker {result_message.get('worker_id')} reported done. ({workers_done_count}/{num_workers} workers done)")
            else:
                collected_results.append(result_message)
                print(f"[Collector] Received result for task: {task_id} from Worker-{result_message.get('worker_id')}. Data: {result_message.get('result_data')}")
            
        except zmq.error.ContextTerminated:
            print("[Collector] Context terminated, exiting loop.")
            break
        except Exception as e:
            print(f"[Collector] Error receiving result: {e}")
            break # Exit on error for simplicity

    print("\n[Collector] --- Summary ---")
    print(f"[Collector] Collected {len(collected_results)} results out of {num_expected_tasks} expected task results.")
    print(f"[Collector] {workers_done_count} workers reported done.")
    # for res in collected_results:
    #     print(f"  - Task {res['task_id']}: {res['result_data']} (from Worker-{res['worker_id']})")
    
    print("[Collector] Closing socket.")
    results_receiver_socket.close()
    context.term()
    print("[Collector] Finished.")
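
The collector's `recv_json()` call blocks until a message arrives, so it waits indefinitely if a worker exits without reporting. A common mitigation is to poll with a timeout first. The sketch below uses `zmq.Poller`; the `recv_json_with_timeout` helper and the `inproc://timeout-demo` endpoint are assumptions for illustration only.

```python
import zmq

def recv_json_with_timeout(socket, timeout_ms: int):
    """Return the next JSON message, or None if nothing arrives in time."""
    poller = zmq.Poller()
    poller.register(socket, zmq.POLLIN)
    events = dict(poller.poll(timeout_ms))  # maps ready sockets to event flags
    if socket in events:
        return socket.recv_json()
    return None

context = zmq.Context()
receiver = context.socket(zmq.PULL)
sender = context.socket(zmq.PUSH)
receiver.bind("inproc://timeout-demo")   # hypothetical endpoint name
sender.connect("inproc://timeout-demo")

# Nothing has been sent yet, so this returns None after about 100 ms
nothing_yet = recv_json_with_timeout(receiver, timeout_ms=100)

sender.send_json({"task_id": "WORKER_DONE", "worker_id": "1"})
message = recv_json_with_timeout(receiver, timeout_ms=1000)
print(nothing_yet, message)

sender.close(linger=0)
receiver.close(linger=0)
context.term()
```

In the collector loop, a `None` return could be treated as a signal to stop waiting and report which results are still missing.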

## 4. Orchestration and Demonstration

Now, let's run these agents concurrently using threads.

In [None]:
if __name__ == "__main__":
    NUM_TASKS_TO_GENERATE = 10
    NUM_PROCESSOR_WORKERS = 2

    print("--- Starting Multi-Agent ZMQ Demonstration ---")

    # Create threads for each agent
    collector_thread = threading.Thread(target=result_collector_agent, args=(NUM_TASKS_TO_GENERATE, NUM_PROCESSOR_WORKERS))
    
    worker_threads = []
    for i in range(NUM_PROCESSOR_WORKERS):
        worker_id = str(i + 1)
        thread = threading.Thread(target=task_processor_agent, args=(worker_id,))
        worker_threads.append(thread)
        
    # Start the generator last: workers must already be subscribed to the sync
    # service when START is published, or they will miss it and block forever.
    # The collector only needs to be started before results arrive.
    generator_thread = threading.Thread(target=task_generator_agent, args=(NUM_TASKS_TO_GENERATE, NUM_PROCESSOR_WORKERS))

    # Start threads
    print("\nStarting Result Collector Agent...")
    collector_thread.start()
    
    time.sleep(0.5) # Give collector a moment to bind

    print("\nStarting Task Processor Agents...")
    for thread in worker_threads:
        thread.start()
        
    time.sleep(1) # Give workers time to connect to sync service

    print("\nStarting Task Generator Agent...")
    generator_thread.start()

    # Wait for all threads to complete
    print("\nWaiting for agents to finish...")
    generator_thread.join()
    print("Generator agent joined.")
    for thread in worker_threads:
        thread.join()
    print("All worker agents joined.")
    collector_thread.join()
    print("Collector agent joined.")

    print("\n--- Multi-Agent ZMQ Demonstration Complete ---")
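
The fixed `time.sleep` calls in the orchestration are guesses about how long startup takes. When all agents run as threads in one process, a `threading.Event` is one way to wait for an explicit readiness signal instead. This is a minimal sketch with a stand-in agent; `agent_with_ready_signal` is hypothetical and performs no ZMQ work.

```python
import threading

def agent_with_ready_signal(ready: threading.Event, log: list):
    # Stand-in for socket.bind(...): real setup work would happen here
    log.append("bound")
    ready.set()  # Tell the orchestrator that setup is complete
    log.append("running")

log = []
collector_ready = threading.Event()
thread = threading.Thread(target=agent_with_ready_signal, args=(collector_ready, log))
thread.start()

# Block until the agent reports readiness, but never wait more than 5 seconds
started_in_time = collector_ready.wait(timeout=5.0)
thread.join()
print(started_in_time, log)
```

This only covers local setup such as binding. ZMQ connections and subscriptions are established asynchronously, so an event set right after `connect()` does not prove that a subscriber will receive the next published message, and agents in separate processes would need a message-based handshake instead.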

## 5. Conclusion

This notebook demonstrated a basic multi-agent system using ZeroMQ for communication and coordination. We saw:
- A **Task Generator** distributing work using a PUSH socket.
- Multiple **Task Processors** receiving work via PULL sockets, processing it, and sending results via PUSH sockets.
- A **Result Collector** gathering results using a PULL socket.
- A simple synchronization mechanism using PUB/SUB to coordinate the start of workers.

**Potential Extensions and Other Patterns:**
- **Error Handling**: More robust error reporting from workers.
- **Dynamic Workers**: Workers could join and leave the pool dynamically.
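- **Load Balancing**: PUSH hands out tasks round-robin regardless of how busy each worker is. ZMQ's REQ/REP or DEALER/ROUTER patterns, in which workers ask for work when they are ready, can offer more sophisticated load balancing.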
- **PUB/SUB for Broadcasts**: If agents need to broadcast information (e.g., state changes, events), PUB/SUB is suitable.
- **Complex Task Payloads**: Using structured data formats like JSON (as shown) or Protocol Buffers for messages.
- **Integration with AI Logic**: Each `task_processor_agent` could internally use an AI engine (like `AIEngine` from your framework) to perform its task.

ZeroMQ provides a flexible and powerful toolkit for building distributed applications, including multi-agent systems. The choice of ZMQ socket types and patterns depends heavily on the specific communication requirements of the agents.