# Introduction to the Model Context Protocol (MCP)
**For Computational Linguistics & NLP Students**

## 1. What is MCP?

In the world of software engineering, connecting different systems usually involves writing custom "glue code" for every single connection. If you wanted Claude, ChatGPT, and a local open-source model to all talk to your specific dataset, you would historically have to build three separate integrations.

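**The Model Context Protocol (MCP)** is an open standard that addresses this $m \times n$ integration problem: with $m$ AI applications and $n$ data sources or tools, custom glue code needs up to $m \times n$ integrations, whereas a shared protocol needs only $m$ clients and $n$ servers ($m + n$ implementations). Think of it like a **USB-C port for AI applications**.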

* **USB-C:** A standard port that lets you plug a mouse, keyboard, or drive into *any* computer without writing a new driver for each specific device.
* **MCP:** A standard protocol that lets you plug data (like a text corpus) and tools (like a tokenizer) into *any* LLM application (like Claude Desktop or an IDE) without custom integration code.
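
To make the integration count concrete, here is a small back-of-the-envelope calculation. The function names and the numbers (3 applications, 5 data sources) are illustrative assumptions, not figures from a real deployment.

```python
def integrations_without_standard(m_apps: int, n_sources: int) -> int:
    """Every application needs its own connector to every data source or tool."""
    return m_apps * n_sources


def integrations_with_standard(m_apps: int, n_sources: int) -> int:
    """Each application implements one client; each source implements one server."""
    return m_apps + n_sources


apps, sources = 3, 5  # illustrative numbers only
print(integrations_without_standard(apps, sources))  # 15
print(integrations_with_standard(apps, sources))     # 8
```

The gap widens quickly: every new application costs one client implementation instead of one connector per data source.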

### Core Concepts
MCP breaks interaction down into three primary primitives:

1.  **Resources:** Data that the model can read (like files, database rows, or API responses). Think of this as the "File GET" capability.
2.  **Tools:** Functions that the model can execute (like calculating statistics, running code, or querying a search engine). Think of this as the "Function Call" capability.
3.  **Prompts:** Pre-defined templates that help users use the server's capabilities effectively.

---

## 2. MCP vs. RESTful APIs: A Deep Dive

As students familiar with backend development (like FastAPI), you are likely used to REST APIs. It is crucial to understand that MCP is not just "another API style"; it represents a fundamental shift in **who** the client is and **how** capabilities are discovered.

### 2.1 The Philosophical Divide: Actions vs. Resources

The most important distinction between MCP and REST is their fundamental design philosophy:

**REST APIs are resource-oriented** (focused on nouns):
- They model the world as a collection of resources: `GET /blogs`, `POST /users`
- They use a fixed set of HTTP verbs (GET, POST, PUT, DELETE) for CRUD operations
- Designed for **Human-Computer Interfaces (HCI)**: Developers read documentation and write code

**MCP is action-oriented** (focused on verbs):
- It models the world as a collection of capabilities: `publishBlog()`, `analyzeText()`
- It uses method names that directly express what the agent wants to do
- Designed for **Agent-Computer Interfaces (ACI)**: AI models discover and invoke tools

### 2.2 Detailed Comparison Table

| Feature | REST API | Model Context Protocol (MCP) |
| :--- | :--- | :--- |
| **Primary User** | A Software Developer (writing code to hit endpoints) | An AI Model (discovering tools to solve problems) |
| **Design Philosophy** | Resource-centric ("What data exists?") | Action-centric ("What can I do?") |
| **Discovery** | **Manual**: You read Swagger/OpenAPI docs to know endpoints exist. | **Automatic**: The protocol includes a handshake where the server tells the LLM, "Here is everything I can do." |
| **State** | **Stateless** (usually): Each HTTP request is independent. | **Stateful Session**: A session (JSON-RPC over stdio or Streamable HTTP) is initialized once and then maintains the context lifecycle. |
| **Communication** | **HTTP verbs**: GET, POST, PUT, DELETE applied to resources | **RPC Methods**: Direct function calls like `calculateTTR(text)` |
| **Semantics** | **Low-level**: "Update the blog with ID 123" | **High-level**: "Archive all old blog posts from 2023" |
| **Error Handling** | HTTP Status Codes (404, 500). | Tool failures come back as a result flagged `isError: true`, with text the LLM reads to self-correct (e.g., "Tool Error: File not found, try listing files first"); protocol-level failures use JSON-RPC error objects. |
| **Transactionality** | No built-in support; each request is atomic | Can encapsulate multi-step operations server-side |
| **Message Format** | HTTP with various content types | JSON-RPC 2.0 |
| **Transport Options** | HTTP/HTTPS only | stdio and Streamable HTTP (which replaced the earlier HTTP+SSE transport); custom transports are permitted |

**Key Takeaway:** REST suits integrations that a developer wires up ahead of time against documented endpoints. MCP suits *flexible, exploratory* Model-to-Context communication, where the model discovers capabilities at run time.

### 2.3 Why Wrapping REST APIs as MCP is Problematic

Many early MCP implementations simply create a thin wrapper over existing REST APIs. This creates what's called an **"impedance mismatch"** - a fundamental incompatibility between two design paradigms:

**Problem 1: Lost Semantic Meaning**
```python
# What an agent WANTS to do:
archiveOldBlogPosts(beforeDate="2023-01-01")

# What a REST wrapper forces it to do:
# 1. GET /api/blogs?status=published&beforeDate=2023-01-01
# 2. For each blog ID: PUT /api/blogs/{id} with {"status": "archived"}
```
The agent loses the high-level "archive" action and must orchestrate low-level CRUD operations.

**Problem 2: Transactionality Nightmares**
Consider `transferBlogPostOwnership(blogId, fromAuthorId, toAuthorId)`. This requires:
1. Verifying `fromAuthorId` owns the blog
2. Updating the blog's `authorId`
3. Logging the transfer

With REST, if step 2 fails after step 1, you have an inconsistent state. An RPC can encapsulate this entire logic server-side, ensuring atomicity.
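
The following is a minimal sketch of such a server-side operation, using in-memory stand-ins for the database and audit log. The names `BLOGS` and `TRANSFER_LOG` and the snapshot-based rollback are assumptions made for illustration; a real server would more likely rely on a database transaction.

```python
import copy

# Hypothetical in-memory stand-ins for a real database and audit log.
BLOGS = {"b1": {"title": "On Morphology", "author_id": "alice"}}
TRANSFER_LOG: list[dict] = []


def transfer_blog_post_ownership(blog_id: str, from_author_id: str, to_author_id: str) -> str:
    """Verify, update, and log as one all-or-nothing server-side operation."""
    blog = BLOGS.get(blog_id)
    if blog is None:
        return f"Error: blog '{blog_id}' not found."
    # Step 1: verify ownership before touching any state.
    if blog["author_id"] != from_author_id:
        return f"Error: '{from_author_id}' does not own blog '{blog_id}'."

    snapshot = copy.deepcopy(blog)  # kept so we can roll back on failure
    try:
        # Step 2: update the owner. Step 3: record the transfer.
        blog["author_id"] = to_author_id
        TRANSFER_LOG.append({"blog_id": blog_id, "from": from_author_id, "to": to_author_id})
    except Exception:
        BLOGS[blog_id] = snapshot  # restore the previous state
        raise
    return f"Transferred '{blog_id}' from {from_author_id} to {to_author_id}."
```

Exposed as a single tool, the agent makes one call and gets either a completed transfer or a readable error, never a half-applied update.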

**Problem 3: Inefficient, "Chatty" Interactions**
```python
# What an agent wants: increment like count
incrementLikeCount(blogId)

# What a REST wrapper does:
# 1. GET /api/blogs/123  (fetch entire blog object)
# 2. Modify like_count locally
# 3. PUT /api/blogs/123  (send entire blog object back)
```
This is wasteful compared to a direct RPC.

---

## 3. Technical Architecture: How MCP Works

### 3.1 The Three-Layer Architecture

MCP follows a clean three-layer design:

```
┌─────────────────────────────────────────┐
│   Protocol Layer (Application Logic)    │
│   • MCP Client / MCP Server             │
│   • Capability Negotiation              │
│   • Tool/Resource/Prompt Management     │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│      Session Layer (State Management)   │
│   • Connection Lifecycle                │
│   • Request/Response Correlation        │
│   • Timeout & Error Handling            │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│    Transport Layer (Message Delivery)   │
│   • JSON-RPC Serialization              │
│   • stdio / Streamable HTTP / SSE       │
│   • Message Framing & Delimiters        │
└─────────────────────────────────────────┘
```

### 3.2 JSON-RPC 2.0: The Message Format

At its core, MCP uses **JSON-RPC 2.0** for all communication. This provides three message types:

**1. Request (expects a response):**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "calculate_ttr",
    "arguments": {
      "text": "The quick brown fox jumps over the lazy dog."
    }
  }
}
```

**2. Response (reply to a request):**
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "0.89"
      }
    ]
  }
}
```

**3. Notification (one-way message, no response expected):**
```json
{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {
    "progressToken": "token123",
    "progress": 50,
    "total": 100
  }
}
```

**Why JSON-RPC?**
- **Lightweight**: Minimal overhead compared to SOAP or GraphQL
- **Language-agnostic**: JSON is universally supported
- **Proven**: Used successfully in Language Server Protocol (LSP) - the inspiration for MCP
- **Simple**: Only three message types to understand
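
The three message types can be told apart purely by which fields are present. The sketch below shows one way to do that; `classify_message` is a hypothetical helper for illustration, not part of any MCP SDK.

```python
import json


def classify_message(raw: str) -> str:
    """Return 'request', 'response', or 'notification' for a JSON-RPC 2.0 message."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in msg:
        # A method with an id expects a reply; without an id it is one-way.
        return "request" if "id" in msg else "notification"
    if "id" in msg and ("result" in msg or "error" in msg):
        return "response"
    raise ValueError("unrecognized message shape")


print(classify_message('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
print(classify_message('{"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}'))
print(classify_message('{"jsonrpc": "2.0", "method": "notifications/progress"}'))
```

The `id` field is what lets a client match each response to the request that caused it, even when several requests are in flight.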

### 3.3 Transport Options

MCP is **transport-agnostic**, supporting multiple communication channels:

#### **1. stdio (Standard Input/Output)**
- **Use case**: Local integrations where server runs as a subprocess
- **How it works**: Client launches server as child process, communicates via stdin/stdout pipes
- **Pros**: Simple, low latency, no network overhead
- **Cons**: Only works locally
- **Example**: A linguistic annotation tool running as a local subprocess

#### **2. Server-Sent Events (SSE) + HTTP POST**
- **Use case**: Remote servers that need to push updates to clients
- **How it works**: 
  - Client → Server: HTTP POST requests
  - Server → Client: SSE stream for continuous updates
- **Pros**: Real-time streaming, works through firewalls
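- **Cons**: SSE streams in one direction only (server → client), so client messages need a separate POST endpoint; the MCP specification deprecated this transport in favor of Streamable HTTP (revision 2025-03-26)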
- **Example**: A corpus service that streams new texts as they're added

#### **3. Streamable HTTP**
- **Use case**: Modern web applications
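- **How it works**: A single HTTP endpoint accepts POSTed JSON-RPC messages; the server replies with a plain JSON response or opens an SSE stream when it needs to send several messages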
- **Pros**: Stateful sessions, resumable connections
- **Cons**: More complex implementation
- **Example**: Enterprise-scale NLP services

#### **Message Framing**
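For stdio transport, each JSON-RPC message is serialized on a single line and terminated by a newline; messages must not contain embedded newlines:
```
{"jsonrpc":"2.0","id":1,"method":"tools/call",...}\n
```
The `Content-Length` header framing familiar from LSP is not used by MCP's stdio transport. Because a message is complete only once its newline arrives, a reader can buffer input and parse reliably even when messages are large or arrive in fragments.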
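
Here is a minimal sketch of reading newline-delimited JSON messages, the framing the MCP specification prescribes for stdio, from a stream that may deliver them in fragments. `LineFramer` and `encode` are hypothetical names chosen for this example.

```python
import json


class LineFramer:
    """Reassemble newline-delimited JSON messages from arbitrary byte chunks."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list[dict]:
        """Add a chunk and return every message completed by it."""
        self._buffer += chunk
        # Everything before the last newline is complete; the tail stays buffered.
        *complete, self._buffer = self._buffer.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]


def encode(message: dict) -> bytes:
    """Serialize one message on a single line, terminated by a newline."""
    # json.dumps escapes newlines inside strings, so the payload has no raw "\n".
    return json.dumps(message).encode("utf-8") + b"\n"


framer = LineFramer()
wire = encode({"jsonrpc": "2.0", "id": 1, "method": "ping"})
print(framer.feed(wire[:8]))   # [] because the message is still incomplete
print(framer.feed(wire[8:]))   # the full message, now that the newline arrived
```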

### 3.4 Connection Lifecycle

Every MCP session follows a structured lifecycle:

```
CLIENT                          SERVER
  │                               │
  ├──── initialize ──────────────>│  (1) Negotiate capabilities
  │<───── capabilities ───────────┤      Client: "I support sampling"
  │                               │      Server: "I have tools, resources"
  │                               │
  ├──── initialized ─────────────>│  (2) Confirm ready to operate
  │                               │
  │   ╔═══ ACTIVE SESSION ═══╗   │
  │   ║                       ║   │
  ├──── tools/list ──────────────>│  (3) Discover available tools
  │<───── tool_list ───────────────┤
  │                               │
  ├──── resources/read ──────────>│  (4) Access data
  │<───── resource_content ────────┤
  │                               │
  ├──── tools/call ──────────────>│  (5) Execute operations
  │<───── call_result ─────────────┤
  │   ║                       ║   │
  │   ╚═══════════════════════╝   │
  │                               │
  ├──── [close connection] ──────>│  (6) Clean shutdown
  │                               │
```

**Key phases:**
1. **Initialization**: Exchange version info and capabilities; the client then confirms with an `initialized` notification
2. **Discovery**: Client learns what tools/resources are available
3. **Operation**: Normal tool calls and resource access
4. **Shutdown**: Graceful connection closure
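
One way to picture the lifecycle is as a small state machine that rejects out-of-order messages. The sketch below is a simplification under stated assumptions: the `Phase` and `SessionState` names are invented for this example, and the real specification allows a few exceptions (such as pings during initialization) that are not modeled here.

```python
from enum import Enum, auto


class Phase(Enum):
    NEW = auto()
    INITIALIZING = auto()
    ACTIVE = auto()
    CLOSED = auto()


class SessionState:
    """Track the lifecycle phase of a session and reject out-of-order calls."""

    def __init__(self) -> None:
        self.phase = Phase.NEW

    def on_client_message(self, method: str) -> None:
        if self.phase is Phase.CLOSED:
            raise RuntimeError("session is closed")
        if method == "initialize":
            if self.phase is not Phase.NEW:
                raise RuntimeError("initialize may only be sent once")
            self.phase = Phase.INITIALIZING
        elif method == "notifications/initialized":
            if self.phase is not Phase.INITIALIZING:
                raise RuntimeError("initialized sent before initialize")
            self.phase = Phase.ACTIVE
        elif self.phase is not Phase.ACTIVE:
            # Discovery and operation calls are only valid in an active session.
            raise RuntimeError(f"'{method}' is not allowed before initialization completes")

    def close(self) -> None:
        self.phase = Phase.CLOSED


session = SessionState()
for method in ["initialize", "notifications/initialized", "tools/list", "tools/call"]:
    session.on_client_message(method)
print(session.phase)  # Phase.ACTIVE
```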

---

## 4. MCP's Design Principles for AI Agents

### 4.1 Design for Actions, Not Data Manipulation

When building MCP servers, think about the **goals** an agent wants to achieve, not the data structures.

**❌ Bad (REST-style thinking):**
```python
@mcp.tool()
def get_corpus_entry(entry_id: str) -> dict:
    """Get a corpus entry by ID"""
    return database.get(entry_id)

@mcp.tool()
def update_corpus_entry(entry_id: str, data: dict) -> dict:
    """Update a corpus entry"""
    return database.update(entry_id, data)
```

**✅ Good (Action-oriented thinking):**
```python
@mcp.tool()
def annotate_text_with_pos_tags(text_id: str, tagger: str = "spacy") -> str:
    """Add part-of-speech tags to a text in the corpus.
    
    Args:
        text_id: The ID of the text to annotate
        tagger: The POS tagger to use (spacy, nltk, or stanza)
    
    Returns:
        The annotated text with inline POS tags
    """
    text = corpus.get_text(text_id)
    tagged = pos_tag(text, tagger=tagger)
    corpus.save_annotation(text_id, "pos", tagged)
    return tagged

@mcp.tool()
def find_texts_by_lexical_complexity(
    min_ttr: float, 
    max_ttr: float
) -> list[str]:
    """Find corpus texts within a TTR range.
    
    Returns:
        List of text IDs matching the complexity criteria
    """
    results = []
    for text_id in corpus.list_ids():
        text = corpus.get_text(text_id)
        ttr = calculate_ttr(text)
        if min_ttr <= ttr <= max_ttr:
            results.append(text_id)
    return results
```

Notice how the good examples:
- Express **what the agent wants to accomplish** ("annotate with POS tags", "find by complexity")
- Encapsulate **domain logic** (choosing a tagger, calculating TTR)
- Handle **multi-step operations** in a single server-side call, so the agent does not have to orchestrate the individual steps

### 4.2 Safety and Security in MCP

One of MCP's key advantages over letting models call APIs directly is **controlled access**:

**1. Secret Management**
```python
import os
from fastmcp import FastMCP

mcp = FastMCP("SecureCorpus")

@mcp.tool()
def query_proprietary_corpus(query: str) -> str:
    """Search our proprietary corpus (requires authentication)."""
    # API key is stored server-side - model never sees it
    api_key = os.environ["CORPUS_API_KEY"]
    return corpus_api.search(query, api_key=api_key)
```
The LLM only sees `query_proprietary_corpus("syntax trees")` - it never has access to credentials.

**2. Input Validation**
```python
@mcp.tool()
def delete_annotation(
    text_id: str, 
    annotation_type: str
) -> str:
    """Delete an annotation from a text."""
    # Validate inputs before executing destructive operations
    if annotation_type not in ["pos", "ner", "dependency"]:
        return "Error: Invalid annotation type. Must be pos, ner, or dependency."
    
    if not corpus.text_exists(text_id):
        return f"Error: Text {text_id} not found."
    
    corpus.delete_annotation(text_id, annotation_type)
    return f"Successfully deleted {annotation_type} annotation from {text_id}"
```

**3. Rate Limiting and Auditing**
```python
from datetime import datetime
import logging

logger = logging.getLogger(__name__)

@mcp.tool()
def run_expensive_analysis(text_id: str) -> dict:
    """Run computationally expensive linguistic analysis."""
    # Log all tool invocations for audit trail
    logger.info(f"[{datetime.now()}] Expensive analysis requested for {text_id}")
    
    # Implement rate limiting
    if not rate_limiter.check_quota():
        return {"error": "Rate limit exceeded. Try again in 1 hour."}
    
    return perform_analysis(text_id)
```
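
The `rate_limiter` object in the example above is not defined anywhere in this notebook. One possible implementation is a sliding-window limiter, sketched below; the class name, the injectable `clock` parameter, and the quota of 10 calls per hour are all assumptions for illustration.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` within any window of `window_seconds`."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._calls: deque[float] = deque()

    def check_quota(self) -> bool:
        """Record a call and return True if it fits the quota, else return False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True


# Hypothetical quota: 10 expensive analyses per hour.
rate_limiter = SlidingWindowRateLimiter(max_calls=10, window_seconds=3600)
```

This limiter is process-wide; a server with several users would likely keep one limiter per client or API key.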

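**4. Sandboxing and Permissions**
MCP servers can run with restricted file system access, and should also check that requested paths stay inside the allowed directory:
```python
from pathlib import Path

# Assumed location; the server process should only have read access to it
# and no access to /etc/ or other sensitive directories
CORPUS_DIR = Path("/data/corpus").resolve()

@mcp.resource("corpus://text/{text_id}")
def get_text(text_id: str) -> str:
    # Resolve the path and reject anything that escapes CORPUS_DIR (e.g. "../../etc/passwd")
    safe_path = (CORPUS_DIR / f"{text_id}.txt").resolve()
    if not safe_path.is_relative_to(CORPUS_DIR):  # Python 3.9+
        return "Error: Invalid text ID"
    if not safe_path.exists():
        return "Error: Text not found"
    return safe_path.read_text(encoding="utf-8")
```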

---

## 5. Demo: A "Linguist's Helper" MCP Server

We will use `fastmcp`, a Python library that makes building MCP servers as easy as writing FastAPI applications. 

In this demo, we simulate a server that manages a small linguistic corpus. It provides:
1.  **A Resource** allowing the LLM to read texts.
2.  **A Tool** allowing the LLM to calculate the **Type-Token Ratio (TTR)**, a measure of vocabulary variation.
3.  **A Prompt** helping the user analyze the text.

```python
# First, ensure you have the library installed
# %pip install fastmcp
```

```python
from typing import Any  # used by later cells in this notebook
from fastmcp import FastMCP

# 1. Initialize the Server
# This is analogous to `app = FastAPI()`
mcp = FastMCP("LinguistHelper")

# --- Mock Data Layer ---
# In a real app, this might be a database or a directory of .txt files
CORPUS = {
    "en_sample_01": "The quick brown fox jumps over the lazy dog.",
    "en_sample_02": "To be, or not to be, that is the question.",
    "zh_sample_01": "學而不思則罔,思而不學則殆。"  # "Learning without thought is labor lost..."
}

# --- RESOURCES: Data Retrieval ---

@mcp.resource("corpus://{text_id}")
def get_text(text_id: str) -> str:
    """Retrieve raw text content from the corpus by its ID."""
    return CORPUS.get(text_id, "Error: Text ID not found in corpus.")

@mcp.resource("corpus://list")
def list_texts() -> str:
    """List all available text IDs in the corpus."""
    return "\n".join(CORPUS.keys())

# --- TOOLS: Computational Functions ---

@mcp.tool()
def calculate_ttr(text: str) -> float:
    """
    Calculate the Type-Token Ratio (TTR) of a text string.
    TTR = (Unique Words / Total Words). 
    Higher TTR indicates greater lexical diversity.
    """
    if not text:
        return 0.0
    
    # Simple whitespace tokenization for demo purposes
    # In a real NLP app, we would use spaCy or NLTK here
    # (unsegmented scripts such as the Chinese sample yield a single token)
    tokens = text.lower().split()
    types = set(tokens)
    
    # Whitespace-only input produces no tokens
    if len(tokens) == 0:
        return 0.0
        
    return len(types) / len(tokens)

# --- PROMPTS: Pre-defined workflows ---

@mcp.prompt()
def analyze_text_complexity(text_id: str) -> str:
    """Create a prompt to guide the LLM in analyzing a text's complexity."""
    return f"Please analyze the linguistic complexity of the text located at corpus://{text_id}. First, use the 'calculate_ttr' tool to find its Type-Token Ratio. Then, review the content and explain if the TTR score matches your qualitative assessment of the vocabulary."

# --- Running the Server ---

if __name__ == "__main__":
    # In a real deployment, you would call `mcp.run()` here, which starts a loop
    # listening for connections (it blocks, so we skip it in a notebook).
    # You would typically run this script via terminal: `fastmcp run my_script.py`
    print("MCP Server 'LinguistHelper' defined successfully.")
    # Registered components are best inspected through a connected client
    # (tools/list, resources/list) rather than the server's private attributes,
    # which can change between fastmcp versions.
```


## 6. How It Works Under the Hood

When you connect an LLM client (like Claude Desktop) to this script:

### Step-by-Step Execution Flow

**1. Connection & Initialization**
```json
// Client sends:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {"sampling": {}},
    "clientInfo": {"name": "Claude", "version": "1.0"}
  }
}

// Server responds:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "tools": {},
      "resources": {"subscribe": false}
    },
    "serverInfo": {"name": "LinguistHelper", "version": "1.0"}
  }
}
```

**2. Tool Discovery**
```json
// Client: "What can you do?"
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list"
}

// Server: "Here are my capabilities"
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "calculate_ttr",
        "description": "Calculate the Type-Token Ratio (TTR)...",
        "inputSchema": {
          "type": "object",
          "properties": {
            "text": {"type": "string"}
          },
          "required": ["text"]
        }
      }
    ]
  }
}
```
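
Where does the `inputSchema` come from? Server libraries typically derive it from the function's signature and docstring. The sketch below shows the idea with a deliberately simplified type mapping; `describe_tool` and `_JSON_TYPES` are hypothetical names, and this is not how `fastmcp` is actually implemented.

```python
import inspect
from typing import Any, Callable

# Assumed, simplified mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a tools/list style descriptor from a function's signature."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is mandatory
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def calculate_ttr(text: str) -> float:
    """Calculate the Type-Token Ratio (TTR) of a text string."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


print(describe_tool(calculate_ttr)["inputSchema"])
```

This is why type hints and docstrings matter in MCP servers: they are the only documentation the model sees.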

**3. User Query & Tool Execution**

User asks: *"Is 'en_sample_01' lexically diverse?"*

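A typical flow (the host application usually issues resource reads on the model's behalf, while the model itself chooses tool calls):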
1. "I need to read the text first" → Calls `resources/read`
2. "I need to calculate TTR" → Calls `tools/call` with `calculate_ttr`
3. "Now I can answer" → Generates response with context

```json
// Reading the resource
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "resources/read",
  "params": {"uri": "corpus://en_sample_01"}
}

// Server returns text
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "contents": [
      {
        "uri": "corpus://en_sample_01",
        "mimeType": "text/plain",
        "text": "The quick brown fox jumps over the lazy dog."
      }
    ]
  }
}

// Calling the tool
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "calculate_ttr",
    "arguments": {
      "text": "The quick brown fox jumps over the lazy dog."
    }
  }
}

// Server returns result
{
  "jsonrpc": "2.0",
  "id": 4,
  "result": {
    "content": [
      {"type": "text", "text": "0.89"}
    ]
  }
}
```

This pattern allows you to inject complex Python logic (tokenization, dependency parsing, vector retrieval) into the AI's workflow without writing any client-specific integration code.
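
To see where the `0.89` in the exchange above comes from, the calculation can be reproduced by hand. The helper `type_token_ratio` is a hypothetical name that mirrors the demo tool's whitespace tokenization; note that a real server would probably return the unrounded float as text.

```python
def type_token_ratio(text: str) -> tuple[int, int, float]:
    """Return (number of types, number of tokens, TTR) using whitespace tokens."""
    tokens = text.lower().split()
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio


n_types, n_tokens, ratio = type_token_ratio("The quick brown fox jumps over the lazy dog.")
# "The" and "the" collapse into one type after lowercasing: 8 types, 9 tokens.
print(n_types, n_tokens, round(ratio, 2))  # 8 9 0.89
```

Because punctuation is not stripped, `dog.` counts as its own type here, which is one reason a real pipeline would use a proper tokenizer.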

---

## 7. Advanced MCP Patterns for Linguistics Research

### 7.1 Multi-Tool Workflows

MCP shines when tools build on each other:

```python
from collections import Counter
from typing import Any

from fastmcp import FastMCP

mcp_advanced = FastMCP("AdvancedLinguistics")

# Plain helper functions hold the logic. Depending on the fastmcp version,
# @tool() may return a tool object instead of the original function, so the
# pipeline below calls these helpers rather than the decorated tools.

def _tokenize(text: str, language: str = "en") -> list[str]:
    # In reality, you'd use spaCy or similar
    return text.split()

def _pos_tag(tokens: list[str], language: str = "en") -> list[tuple[str, str]]:
    # Dummy implementation - use spaCy in production
    return [(token, "NOUN") for token in tokens]

def _extract_nouns(pos_tagged: list[tuple[str, str]]) -> list[str]:
    # Simplification: keeps single noun tokens, not full noun phrases
    return [word for word, tag in pos_tagged if tag.startswith("N")]

@mcp_advanced.tool()
def tokenize_text(text: str, language: str = "en") -> list[str]:
    """Tokenize text using language-specific rules."""
    return _tokenize(text, language)

@mcp_advanced.tool()
def pos_tag(tokens: list[str], language: str = "en") -> list[tuple[str, str]]:
    """Apply part-of-speech tagging to tokens."""
    return _pos_tag(tokens, language)

@mcp_advanced.tool()
def extract_noun_phrases(pos_tagged: list[tuple[str, str]]) -> list[str]:
    """Extract noun phrases from POS-tagged tokens."""
    return _extract_nouns(pos_tagged)

@mcp_advanced.tool()
def full_linguistic_pipeline(
    text: str, 
    language: str = "en"
) -> dict[str, Any]:
    """Run complete linguistic analysis pipeline.
    
    This demonstrates how to compose multiple steps into a 
    high-level action that an agent can invoke with a single call.
    """
    tokens = _tokenize(text, language)
    pos_tags = _pos_tag(tokens, language)
    noun_phrases = _extract_nouns(pos_tags)
    
    return {
        "token_count": len(tokens),
        "noun_phrases": noun_phrases,
        "pos_distribution": dict(Counter(tag for _, tag in pos_tags)),
    }
```

**Key insight**: The agent can either call tools individually for fine-grained control, or use `full_linguistic_pipeline()` for a complete analysis. This flexibility is MCP's strength.

### 7.2 Dynamic Resources

Resources don't have to be static files:

```python
# Assumes CORPUS (demo cell) and mcp_advanced (section 7.1) are already defined.

def _ttr(text: str) -> float:
    """Plain TTR helper, so resources need not call a decorated tool directly."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# A path parameter is used here because support for "?query=..." style
# templates differs between MCP libraries and versions.
@mcp_advanced.resource("corpus://search/{query}")
def search_corpus(query: str) -> str:
    """Search the corpus and return matching texts.
    
    This demonstrates a DYNAMIC resource - the URI contains
    a parameter that affects what data is returned.
    """
    results = []
    for text_id, text in CORPUS.items():
        if query.lower() in text.lower():
            results.append(f"[{text_id}] {text}")
    
    if not results:
        return f"No texts found matching '{query}'"
    
    return "\n\n".join(results)

@mcp_advanced.resource("corpus://stats/ttr_distribution")
def get_ttr_distribution() -> str:
    """Get TTR statistics across the entire corpus.
    
    This demonstrates a COMPUTED resource - the data doesn't
    exist as a file but is calculated on-demand.
    """
    ttrs = {text_id: _ttr(text) for text_id, text in CORPUS.items()}
    
    if not ttrs:
        return "Corpus is empty."  # avoid dividing by zero below
    
    avg_ttr = sum(ttrs.values()) / len(ttrs)
    
    report = f"Average TTR: {avg_ttr:.3f}\n\n"
    report += "Per-text breakdown:\n"
    for text_id, ttr in sorted(ttrs.items(), key=lambda x: x[1], reverse=True):
        report += f"  {text_id}: {ttr:.3f}\n"
    
    return report
```

### 7.3 Error Handling and Self-Correction

MCP's text-based error messages allow LLMs to self-correct:

```python
from collections import Counter

@mcp_advanced.tool()
def calculate_ngram_frequency(
    text_id: str, 
    n: int,
    top_k: int = 10
) -> dict[str, int] | str:
    """Calculate n-gram frequencies for a text.
    
    Args:
        text_id: ID of text in corpus
        n: Size of n-grams (1=unigrams, 2=bigrams, etc.)
        top_k: Number of most frequent n-grams to return
    
    Returns:
        Dictionary of n-grams and their counts, or error message
    """
    # Extensive error checking with helpful messages
    if text_id not in CORPUS:
        available = ", ".join(CORPUS.keys())
        return f"Error: Text '{text_id}' not found. Available texts: {available}"
    
    if n < 1 or n > 5:
        return "Error: n must be between 1 and 5. Did you mean n=2 for bigrams?"
    
    if top_k < 1:
        return "Error: top_k must be at least 1"
    
    text = CORPUS[text_id]
    tokens = text.lower().split()
    
    if len(tokens) < n:
        return f"Error: Text has only {len(tokens)} tokens, cannot create {n}-grams"
    
    # Calculate n-grams
    ngrams = []
    for i in range(len(tokens) - n + 1):
        ngrams.append(" ".join(tokens[i:i+n]))
    
    # Count frequencies
    freq = Counter(ngrams)
    
    return dict(freq.most_common(top_k))
```

When the LLM makes a mistake (e.g., passing an invalid `text_id`), it receives a clear error message explaining what went wrong and what the valid options are. This allows it to retry with corrected parameters.

---

## 8. Comparison with Related Technologies

### MCP vs. Function Calling APIs

Many LLM providers (OpenAI, Anthropic, Google) offer "function calling" APIs. How is MCP different?

| Aspect | Function Calling APIs | MCP |
|--------|----------------------|-----|
| **Scope** | Single LLM provider | Universal standard across providers |
| **Server** | Developer implements execution | Standardized MCP server with lifecycle |
| **Discovery** | Functions defined per-request | Server advertises capabilities |
| **State** | Stateless (per request) | Stateful sessions |
| **Transport** | HTTP only | stdio, Streamable HTTP (custom transports possible) |
| **Analogy** | Like inline assembly | Like a library with a well-defined API |

**Function calling** is great for one-off tool use. **MCP** is designed for building complex, stateful integrations.
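
The two approaches are complementary: an MCP client typically takes the descriptors returned by `tools/list` and re-expresses them in whatever function-calling format its LLM provider expects. The sketch below shows that translation for two common shapes, as I understand the OpenAI Chat Completions and Anthropic Messages formats at the time of writing; check the provider documentation before relying on the exact field names. The helper names are hypothetical.

```python
from typing import Any


def to_openai_tool(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Re-express an MCP tool descriptor as an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool["inputSchema"],  # already JSON Schema
        },
    }


def to_anthropic_tool(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Re-express an MCP tool descriptor as an Anthropic-style tool."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "input_schema": mcp_tool["inputSchema"],
    }


ttr_tool = {
    "name": "calculate_ttr",
    "description": "Calculate the Type-Token Ratio (TTR)...",
    "inputSchema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}
print(to_openai_tool(ttr_tool)["function"]["name"])
print(to_anthropic_tool(ttr_tool)["input_schema"]["required"])
```

Because MCP descriptors already carry JSON Schema, the translation is a matter of renaming fields, which is what makes one server usable across providers.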

### MCP vs. LangChain Tools

LangChain provides a `Tool` abstraction. MCP takes this further:

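- **LangChain**: A library-level abstraction, tightly coupled to the LangChain framework and the languages it supports
- **MCP**: Language-agnostic protocol, works with any client
- **LangChain**: Tools are functions running inside the application process
- **MCP**: Tools live in a separate server process, local (stdio) or remote (HTTP), written in any language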

Think of MCP as "LangChain Tools, but as a protocol instead of a library."

---

## 9. Real-World Use Cases in Computational Linguistics

### 9.1 Linguistic Fieldwork Assistant
```python
# MCP server for eliciting grammatical data
@mcp.tool()
def generate_elicitation_prompt(phenomenon: str, language: str) -> str:
    """Generate targeted prompts for linguistic fieldwork."""
    pass

@mcp.tool()
def analyze_response_for_pattern(
    response: str, 
    expected_pattern: str
) -> dict:
    """Check if speaker's response matches expected grammatical pattern."""
    pass
```

### 9.2 Cross-Linguistic Database Access
```python
@mcp.resource("wals://feature/{feature_id}/languages")
def get_wals_languages(feature_id: str) -> str:
    """Access WALS database for typological features."""
    pass

@mcp.tool()
def find_typologically_similar_languages(
    reference_language: str,
    features: list[str]
) -> list[str]:
    """Find languages with similar typological profiles."""
    pass
```

### 9.3 Corpus Annotation Pipeline
```python
@mcp.tool()
def run_annotation_pipeline(
    text_id: str,
    annotations: list[str]  # ["tokenize", "pos", "parse", "ner"]
) -> str:
    """Run multiple annotation steps atomically."""
    pass

@mcp.resource("corpus://{text_id}/annotations/{layer}")
def get_annotation_layer(text_id: str, layer: str) -> str:
    """Retrieve specific annotation layer (pos, parse, ner, etc.)."""
    pass
```

---

## 10. Summary: Key Takeaways

1. **MCP is NOT just another API format** - it's a paradigm shift from resource-oriented (REST) to action-oriented (RPC) design for AI agents.

2. **MCP solves the M×N integration problem** - write one MCP server, use it with any MCP-compatible client (Claude, Cursor, custom tools).

3. **MCP uses JSON-RPC 2.0** for messaging, with support for multiple transports (stdio, Streamable HTTP, and the older HTTP+SSE).

4. **The three-layer architecture** (Protocol → Session → Transport) keeps concerns separated and implementations clean.

5. **Design for actions, not CRUD** - think about what agents want to accomplish, not how to manipulate data structures.

6. **Security is easier to centralize** - credentials stay server-side, input validation happens before execution, and audit logging is straightforward, but the server author still has to implement each of these.

7. **MCP enables composable workflows** - tools can build on each other, resources can be dynamic, and the protocol supports self-correction.

8. **For linguistics research**, MCP opens possibilities for corpus access, annotation pipelines, typological databases, and fieldwork assistance.

### Further Reading
- Official MCP Documentation: https://modelcontextprotocol.io/
- fastmcp Python library: https://github.com/jlowin/fastmcp
- Language Server Protocol (LSP): https://microsoft.github.io/language-server-protocol/
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification

### Exercise Ideas
1. Extend the demo server to work with a real corpus (e.g., Universal Dependencies)
2. Add tools for phonological analysis (syllable counting, stress patterns)
3. Create a prompt that guides the LLM through comparative linguistic analysis
4. Implement resource subscriptions for real-time corpus updates
5. Build an MCP server that checks interlinear glosses against the Leipzig Glossing Rules