# Lab: Model Context Protocol (MCP)

## Introduction

The Model Context Protocol (MCP) provides a standardized approach for communication with AI-powered applications, particularly Large Language Models (LLMs). As AI models become more sophisticated and capable of understanding complex **prompts**, utilizing external **tools**, and accessing diverse **resources**, MCP aims to simplify these interactions by offering a consistent interface.

This lab explores MCP's fundamental concepts, including its typical architecture and core components for managing the flow of **prompts**, data, and instructions for tool use. You'll learn how MCP functions over different transport mechanisms to facilitate structured and capable interactions with AI models.

### Model Context Protocol (MCP)

MCP is designed to define a clear structure for managing LLM **prompts**, carrying contextual information (which can include references to **tools** and **resources**), and handling data streams when an application (a client) communicates with a model service (a server). What follows is not the complete specification of the protocol, but a simplified version that illustrates the concepts of the lab.

Key objectives and potential benefits of using a protocol like MCP include:

* **Improved Interoperability**: Facilitating connections between applications and a variety of model backends.
* **Standardized Patterns**: Promoting common communication methods for leveraging **prompts**, requesting **tool** executions, and receiving model-generated content.
* **Streaming Capabilities**: Supporting the efficient flow of data, useful for real-time or interactive model responses.
* **Rich Context & Instruction Handling**: Providing defined ways to pass not just conversational history, but also detailed **prompts**, and specifications for how models should use available **tools** or access **resources**.
* **Extensibility**: Allowing for future enhancements or specialized data exchanges within the protocol's framework.

MCP draws on established communication principles (the full specification frames its messages as JSON-RPC 2.0) while adapting them to the specific needs of AI model interactions, such as managing conversational context, interpreting detailed **prompts**, orchestrating **tool** use, and streaming data.

#### Architecture

MCP interactions generally use a client-server architecture:

<div align="left">
  <img src="pictures/mcp-arch.png" alt="MCP Architecture" width="700">
</div>

1.  **Client**: An application that consumes model services. It forms and sends requests (containing **prompts**, data, and potentially **tool** calls) to the MCP server and processes the received responses.
2.  **Server**: A service that exposes one or more AI models via MCP. It listens for client requests, interprets **prompts**, manages **tool** execution or **resource** access as defined by the protocol, interacts with the model(s), and returns responses according to the protocol.
3.  **Transport Layer**: The underlying mechanism for message transmission. MCP defines the *what* (message structure and interaction flow) more than the *how* of transmission. This lab demonstrates MCP over:
    * Standard Input/Output (STDIO)
    * Server-Sent Events (SSE)
    * Streamable HTTP

This separation allows flexibility in choosing a transport suitable for the application's environment.
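
The separation between message structure and transport can be sketched in a few lines of Python. This is only an illustration: the `Transport` interface, the `InMemoryTransport` class, and the `echo_roundtrip` function are hypothetical names invented for this example, not part of any MCP SDK.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical interface: anything that can send and receive one message."""

    def send(self, message: str) -> None: ...

    def receive(self) -> str: ...


class InMemoryTransport:
    """Toy loopback transport standing in for STDIO, SSE, or Streamable HTTP."""

    def __init__(self) -> None:
        self._queue: list[str] = []

    def send(self, message: str) -> None:
        self._queue.append(message)

    def receive(self) -> str:
        return self._queue.pop(0)


def echo_roundtrip(transport: Transport, message: str) -> str:
    """Protocol-level logic that does not care which transport carries it."""
    transport.send(message)
    return transport.receive()


print(echo_roundtrip(InMemoryTransport(), '{"prompt": "hello"}'))
```

Because `echo_roundtrip` depends only on the interface, swapping the in-memory transport for a real one would not require changing the protocol logic.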

#### Components

MCP interactions involve several key elements and types of information:

1.  **Messages**: Defined units of information exchanged. Common types are:
    * `ContextRequest`: From client to server. This is the primary message for conveying the user's intent, including:
        * **Prompts**: The specific instructions or queries for the model.
        * Contextual Data: Relevant background information, conversation history, or data to be processed.
        * **Tool Calls/Resource Requests**: Specifications for any tools the model should use or resources it needs to access (if the MCP implementation supports this directly or via a structured format).
        * Parameters: Configuration for the model's response generation (e.g., temperature, max tokens).
    * `ModelResponseStart`: From server to client, signaling the beginning of a response stream, possibly with metadata.
    * `ModelResponseChunk`: From server to client, containing a part of the model's output (e.g., text, structured data, tool results), enabling streaming.
    * `ModelResponseEnd`: From server to client, indicating the end of the model's output for that request.
    * `ErrorResponse`: From server to client, used to communicate issues or errors.

2.  **Prompts**: The core input that guides the model's behavior. MCP facilitates the clear transmission of simple or complex prompts from the client to the server.

3.  **Tools & Resources**: Mechanisms by which a model can interact with external systems or data:
    * **Tools**: Pre-defined functions or capabilities that the model can be instructed to use (e.g., a code interpreter, a search engine API, a database query function). MCP messages can carry requests to invoke such tools and return their outputs.
    * **Resources**: External data sources or knowledge bases that the model might need to access to fulfill a request. MCP can facilitate referencing or providing these resources.
    * *(The specifics of how tools and resources are defined and invoked can vary between MCP implementations, sometimes handled through structured data within messages or via specific message types if the protocol is highly tool-oriented.)*

4.  **Streaming**: A core feature where the model's response (which might include text, or results from tool use) can be sent back in parts (chunks) as it's generated.
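
As a rough sketch, two of the message types above could be modelled as Python dataclasses and serialized to JSON. The field names (`prompt`, `history`, `tools`, `params`, `content`) and the `to_wire` helper are assumptions made for illustration; the simplified protocol described here does not prescribe them.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContextRequest:
    """Client-to-server request; field names are illustrative only."""

    prompt: str
    history: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    params: dict = field(default_factory=dict)


@dataclass
class ModelResponseChunk:
    """One streamed piece of the model's output."""

    content: str


def to_wire(message) -> str:
    """Serialize a message to JSON, tagging it with its type name."""
    return json.dumps({"type": type(message).__name__, **asdict(message)})


request = ContextRequest(
    prompt="What's the weather in London?",
    params={"temperature": 0.5},
)
print(to_wire(request))
print(to_wire(ModelResponseChunk(content="It is raining.")))
```

Tagging each payload with a `type` field is one simple way for a receiver to tell message kinds apart; other encodings would work equally well.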

#### Protocol (Interaction Flow)

A typical interaction sequence in MCP includes:

1.  **Connection**: The client connects to the MCP server via the chosen transport.
2.  **Request**: The client sends a `ContextRequest` message with the **prompt**, relevant context, parameters, and any **tool** or **resource** specifications.
    *Below is an illustrative structure for data within a request (often JSON), including how tool specifications and model parameters might look:*
    ```json
    {
      "context": {
        "prompt": "What's the weather in London and summarize the top news from there?",
        "history": []
      },
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
          }
        },
        {
          "type": "function",
          "function": {
            "name": "get_news_summary",
            "description": "Get top news for a city",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
          }
        }
      ],
      "params": {
        "temperature": 0.5
      }
    }
    ```
3.  **Response Initiation (Optional but common for streaming)**: The server may send a `ModelResponseStart` message.
4.  **Data Streaming / Tool Execution**: The server processes the request. This may involve invoking tools as specified. It sends `ModelResponseChunk` messages as parts of the response (text from the model, results from tools) become available.
5.  **Response Completion**: The server sends a `ModelResponseEnd` message once all data for the request has been sent.
6.  **Error Reporting**: If an issue arises, the server sends an `ErrorResponse`.
7.  **Disconnection**: The connection may be closed or kept alive for subsequent interactions.

This structured approach helps create more predictable and maintainable integrations with AI models, encompassing complex interactions involving prompts, tools, and resources.
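
The interaction flow above can be simulated without any network code. In this minimal sketch, the `serve` and `collect` functions and the dictionary layout of each message are hypothetical; a real server would call a model instead of echoing the prompt.

```python
from typing import Iterator


def serve(request: dict) -> Iterator[dict]:
    """Toy server: yields the message sequence for one request."""
    prompt = request.get("context", {}).get("prompt")
    if not prompt:
        yield {"type": "ErrorResponse", "error": "missing prompt"}
        return
    yield {"type": "ModelResponseStart"}
    # Stand-in for model output: echo the prompt back one word at a time.
    for word in f"You asked: {prompt}".split():
        yield {"type": "ModelResponseChunk", "content": word + " "}
    yield {"type": "ModelResponseEnd"}


def collect(messages: Iterator[dict]) -> str:
    """Toy client: assembles chunks until the end message arrives."""
    parts = []
    for message in messages:
        kind = message["type"]
        if kind == "ErrorResponse":
            raise RuntimeError(message["error"])
        if kind == "ModelResponseChunk":
            parts.append(message["content"])
        elif kind == "ModelResponseEnd":
            break
    return "".join(parts).strip()


print(collect(serve({"context": {"prompt": "ping", "history": []}})))
# prints: You asked: ping
```

The client treats `ModelResponseStart` as optional, matching step 3, and stops reading as soon as `ModelResponseEnd` arrives.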

## Objectives

By the end of this lab, you will:

- Understand the core principles of the Model Context Protocol (MCP) and why it's valuable for LLM applications
- Explore different transport mechanisms for MCP implementation (STDIO, SSE, and Streamable HTTP)
- Implement basic MCP clients and servers using Python
- Learn how to integrate LLMs with MCP
- Gain practical experience with standardized protocols for AI communication

This lab provides hands-on experience with an emerging standard in the AI ecosystem, giving you the skills to build more interoperable and flexible LLM-powered applications.

## Getting Started

Complete the following steps before running the notebooks in this lab:
  1. Install uv, make sure it is available on PATH, and create a Python 3.12 venv:

      - Linux/MacOS:
          - `curl -LsSf https://astral.sh/uv/install.sh | sh`
          - `source $HOME/.local/bin/env`
          - `uv python install 3.12`
          - `uv venv -p 3.12`
          - `source .venv/bin/activate`
          - `uv pip install ipykernel`

      - Windows:
          - TODO
  2. Select the venv as the notebook kernel
  <div align="left">
    <img src="pictures/kernel.png" alt="VSCode Jupyter UI hint" width="800">
  </div>

**You MUST restart the Jupyter kernel after running the automated dependency-install cell.**
<div align="left">
  <img src="pictures/restart.png" alt="VSCode Jupyter UI hint" width="800">
</div>

In [None]:
"""Install platform chat & mcp dependencies."""
!cd ../../.. && uv pip install --quiet -e ".[chat]" mcp
# Shut down the kernel so user must restart it to apply new pip installations.
# This is a workaround for the fact that Jupyter does not automatically
# pick up new installations in the current kernel.
!echo "Kernel will shut down to apply new pip installations, manual restart required."
import os
os._exit(0)

In [None]:
"""Setup background process manager for sse/http server demos."""
from src.notebooks.platform_services.lablib.bg_util import (
    kill_all_background_processes,
    kill_background_process,
    start_background_process,
)
from time import sleep
import subprocess as sp
print("lablib.bg_util process management functions are ready.")

In [None]:
"""Set up the lab chat model."""
from src.notebooks.platform_services.lablib.env_util import set_services_env

_, _, _ = set_services_env()

print("Chat env initialized successfully.")

## STDIO Transport

The STDIO (Standard Input/Output) transport is the simplest MCP transport to implement. It uses standard input and output streams to exchange MCP messages between client and server processes.

### Key Features

- **Simplicity**: Easy to implement with minimal dependencies
- **Process-to-Process**: Ideal for communication between parent and child processes
- **Piping Support**: Works well with Unix-style pipes and redirects

### How It Works

1. Client sends JSON-formatted MCP requests to the server's standard input
2. Server processes the requests and responds via its standard output
3. Client reads and parses the JSON responses from the server's output
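
The three steps above can be tried in isolation with a throwaway child process. This is a minimal sketch and not the lab's `client.py`: the inline `SERVER_CODE`, the `ask` helper, and the one-JSON-object-per-line framing are assumptions chosen for illustration.

```python
import json
import subprocess
import sys

# Hypothetical child "server": reads one JSON request per line from stdin
# and writes one JSON response per line to stdout.
SERVER_CODE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"type": "ModelResponseChunk", "content": request["prompt"].upper()}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""


def ask(prompt: str) -> dict:
    """Spawn the child server, send one request, and parse its reply."""
    result = subprocess.run(
        [sys.executable, "-c", SERVER_CODE],
        input=json.dumps({"prompt": prompt}) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    # The first line of the child's stdout is the JSON response.
    return json.loads(result.stdout.splitlines()[0])


print(ask("hello"))
```

Here the child exits once its stdin is closed, which is why a single `subprocess.run` call is enough; a long-lived server would keep the pipes open and exchange many messages.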

In [None]:
"""Run the STDIO demo"""
stdio_client_command = "uv run --active lablib/mcp/stdio/client.py"
client_result = sp.run(stdio_client_command, shell=True, capture_output=True, text=True)
print("--- STDIO Client Output ---")
print(client_result.stdout)
if client_result.stderr:
    print("--- STDIO Client Errors ---")
    print(client_result.stderr)
print("-------------------------")

## SSE Transport

Server-Sent Events (SSE) is a transport mechanism that enables servers to push updates to clients over HTTP connections.

### Key Features

- **One-Way Streaming**: Server can push multiple events to the client over a single connection
- **Simple HTTP-Based**: Uses standard HTTP, making it firewall-friendly and easy to implement
- **Automatic Reconnection**: Browser clients using the `EventSource` API reconnect automatically after a connection interruption
- **Event Typing**: Supports different event types for structured communication

### How It Works

1. Client establishes an HTTP connection to the server requesting an SSE stream
2. Server keeps the connection open and sends events as they occur
3. Each event is formatted as a block of text lines with fields such as `event`, `data`, and `id`, terminated by a blank line
4. Client processes these events as they arrive
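
To make the event format concrete, here is a small parser for an SSE text stream. It is a simplified sketch written for this lab (the `parse_sse` name is hypothetical) and covers only field parsing, comment lines, and multi-line `data`; it is not a replacement for a real SSE client.

```python
def parse_sse(stream: str) -> list[dict]:
    """Split an SSE text stream into events; a blank line ends each event."""
    events, current = [], {}
    for line in stream.splitlines():
        if not line:  # blank line: dispatch the event collected so far
            if current:
                events.append(current)
                current = {}
            continue
        if line.startswith(":"):  # comment line, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        value = value.removeprefix(" ")  # one leading space is not part of the value
        if field == "data" and "data" in current:
            current["data"] += "\n" + value  # multiple data lines are joined
        else:
            current[field] = value
    return events


sample = (
    "event: message\n"
    "id: 1\n"
    "data: first\n"
    "\n"
    ": keep-alive comment\n"
    "event: message\n"
    "id: 2\n"
    "data: second\n"
    "\n"
)
for event in parse_sse(sample):
    print(event)
```

An event that is not followed by a blank line is never dispatched, which mirrors how SSE clients discard an incomplete trailing event.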


In [None]:
"""Run the SSE demo"""
sse_server_command = "uv run lablib/mcp/sse/server.py"
sse_client_command = "uv run lablib/mcp/sse/client.py"

print(f"Starting SSE server: {sse_server_command}")
server_proc = start_background_process("sse_server", sse_server_command)
sleep(2)  # Give the server a moment to start listening before the client connects.

if server_proc and server_proc.poll() is None:
    print(f"Running SSE client: {sse_client_command}")
    client_result = sp.run(sse_client_command, shell=True, capture_output=True, text=True)
    print("--- SSE Client Output ---")
    print(client_result.stdout)
    if client_result.stderr:
        print("--- SSE Client Errors ---")
        print(client_result.stderr)
    print("-------------------------")
else:
    print("SSE server failed to start or exited prematurely. Client will not run.")

print("Killing SSE server...")
kill_background_process("sse_server")

In [None]:
"""Run the SSE Copilot demo"""
sse_server_command = "uv run lablib/mcp/sse/server.py"

print(f"Starting SSE server: {sse_server_command}")
server_proc = start_background_process("sse_server", sse_server_command)

output = """
        "my-test-mcp": {
            "url": "http://localhost:8000/math/sse"
        }
"""

print(f"Add the following to your llmesh/.vscode/mcp.json file:\n{output}")

In [None]:
"""Cleanup sse process"""
print("Manually killing sse server...")
kill_background_process("sse_server")

## Streamable-HTTP Transport

The Streamable HTTP transport is the go-forward replacement for the SSE transport.

### Key Features

- **Universal Compatibility**: Works with standard HTTP clients and servers
- **RESTful Design**: Follows standard RESTful API conventions

### How It Works

1. Client sends an HTTP POST request with an MCP request object
2. Server responds with chunked transfer encoding (HTTP/1.1)
3. Each chunk contains a complete MCP event object (usually a delta update)
4. Client processes chunks as they arrive and assembles the complete response
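
Step 4 can be sketched as a function that turns a stream of response chunks back into the full text. The newline-delimited JSON framing, the `delta` field, and the `assemble` name are assumptions made for this example; the lab server may frame its events differently.

```python
import json
from typing import Iterable


def assemble(chunks: Iterable[bytes]) -> str:
    """Concatenate the text deltas carried by newline-delimited JSON events."""
    buffer = b""
    parts = []
    for chunk in chunks:
        buffer += chunk
        # A network chunk may end mid-event, so only parse complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                parts.append(json.loads(line)["delta"])
    return "".join(parts)


# The second event is deliberately split across two chunks.
body = [b'{"delta": "Hel"}\n{"del', b'ta": "lo"}\n']
print(assemble(body))
# prints: Hello
```

Buffering until a full line is available keeps the client correct even when the boundaries of network chunks do not line up with the boundaries of events.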

**Note**: There is currently no Python SDK client for this transport, so the TypeScript client is used through the MCP Inspector.

<div align="left">
  <img src="pictures/inspector.png" alt="Inspector screen shot" width="800">
</div>

In [None]:
"""Run the HTTP demo"""

# Ensure background processes are killed before starting new ones
kill_all_background_processes()

# Start the HTTP server
start_background_process("mcp_server", "uv run lablib/mcp/streamable/server.py")

# Start the mcp inspector
start_background_process("mcp_inspector", "npx @modelcontextprotocol/inspector")

print("MCP inspector UI available at http://localhost:6274/")
print("MCP Streamable HTTP demo server available at http://localhost:8000/math/mcp")


In [None]:
# Manually kill streamable-http server and MCP inspector
# Will be automatically killed on kernel restart/exit
print("Manually terminating MCP servers...")
kill_background_process("mcp_server")
kill_background_process("mcp_inspector")

In [None]:
# To kill ALL processes managed by lablib.bg_util (if you started others):
print("Manually terminating ALL lablib-managed background processes...")
kill_all_background_processes()