<a href="https://colab.research.google.com/github/CrisMcode111/DI_Bootcamp/blob/main/w9_d2_mcp_client_with_llm_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MCP Client with an LLM (Colab)

Step-by-step notebook that connects a Python MCP client to a server over STDIO, hands tool schemas to an LLM, and executes the tool calls it suggests.

## Before you start

- You need a GitHub Models token in `GITHUB_TOKEN` if you want to actually call the LLM (use Colab Secrets instead of plain text).
- Leave `USE_REAL_LLM = False` to dry-run with a simple planner that infers numbers from the prompt.
- The notebook creates a minimal MCP server (`add`, `multiply`, and one greeting resource) so everything runs locally over STDIO.

In [1]:
!pip install -q --upgrade mcp azure-ai-inference azure-core nest_asyncio

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m47.1/47.1 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m124.9/124.9 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m213.3/213.3 kB[0m [31m4.9 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
import os
from google.colab import userdata
# Optional: set your GitHub Models token for Azure AI Inference.
# In Colab, prefer: from google.colab import userdata; os.environ["GITHUB_TOKEN"] = userdata.get("GITHUB_TOKEN")
os.environ["GITHUB_TOKEN"] = userdata.get("GITHUB_TOKEN")


## Create a minimal MCP server (STDIO)

We use FastMCP to expose two tools and one resource. The `mcp` CLI will launch this file when the client opens a STDIO session.

In [4]:
from pathlib import Path

SERVER_FILE = Path("demo_server.py")

#variable qui recoit valeur
SERVER_CODE = '''
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("DemoServer")

@mcp.tool()
def add(a: int, b: int) -> int:
    'Add two integers.'
    return a + b

@mcp.tool()
def multiply(a: int, b: int) -> int:
    'Multiply two integers.'
    return a * b

@mcp.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    'Return a greeting string.'
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()
'''

SERVER_FILE.write_text(SERVER_CODE.strip() + "", encoding="utf-8")
print(f"Saved MCP server to {SERVER_FILE.resolve()}")
print(SERVER_FILE.read_text())


Saved MCP server to /content/demo_server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("DemoServer")

@mcp.tool()
def add(a: int, b: int) -> int:
    'Add two integers.'
    return a + b

@mcp.tool()
def multiply(a: int, b: int) -> int:
    'Multiply two integers.'
    return a * b

@mcp.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    'Return a greeting string.'
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()


In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Client and LLM helpers

- Connects to the server over STDIO and lists resources/tools.
- Converts MCP tool schemas into an LLM function-calling spec.
- Calls the LLM (or a stub) to choose tools, then executes them.

In [7]:
import asyncio
import json
import os
import re
import sys
from pathlib import Path
from typing import Any, Dict, List

import nest_asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

nest_asyncio.apply()

SERVER_FILE = Path("demo_server.py")
USE_REAL_LLM = True  # flip to True when GITHUB_TOKEN is set
AZURE_ENDPOINT = "https://models.inference.ai.azure.com"
MODEL_NAME = "gpt-4o"

def convert_to_llm_tool(tool):
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description or "MCP tool",
            "parameters": {
                "type": "object",
                "properties": tool.inputSchema.get("properties", {}),
                "required": tool.inputSchema.get("required", []),
            },
        },
    }

def stub_tool_calls(prompt: str, functions: List[Dict[str, Any]]):
    text = prompt.lower()
    numbers = [int(n) for n in re.findall(r"-?\d+", prompt)]
    a = numbers[0] if numbers else 2
    b = numbers[1] if len(numbers) > 1 else (numbers[0] if numbers else 20)
    if any(k in text for k in ["multiply", "product", "times"]):
        return [{"name": "multiply", "args": {"a": a, "b": b}}]
    return [{"name": "add", "args": {"a": a, "b": b}}]

def call_llm(prompt: str, functions: List[Dict[str, Any]], use_llm: bool = False):
    if not use_llm:
        print("LLM disabled -> using stubbed plan.")
        return stub_tool_calls(prompt, functions)

    token = os.getenv("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is missing. Set it or call with use_llm=False.")

    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential

    client = ChatCompletionsClient(
        endpoint=AZURE_ENDPOINT,
        credential=AzureKeyCredential(token),
    )

    response = client.complete(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are an MCP planning assistant."},
            {"role": "user", "content": prompt},
        ],
        tools=functions,
        temperature=0,
        max_tokens=400,
    )

    message = response.choices[0].message
    tool_calls = []
    if message.tool_calls:
        for call in message.tool_calls:
            args = call.function.arguments
            args_json = json.loads(args) if isinstance(args, str) else args
            tool_calls.append({"name": call.function.name, "args": args_json})

    return tool_calls

def simplify_content(content):
    rendered = []
    for item in content:
        if hasattr(item, "text"):
            rendered.append(item.text)
        else:
            rendered.append(str(item))
    return rendered if len(rendered) > 1 else (rendered[0] if rendered else None)

async def orchestrate(prompt: str, use_llm: bool = USE_REAL_LLM):
    server_params = StdioServerParameters(
        command="mcp",
        args=["run", str(SERVER_FILE)],
        env=None,
    )

    # Temporarily redirect sys.stderr for the subprocess to avoid 'UnsupportedOperation: fileno'
    temp_stderr_file = None
    try:
        # Open os.devnull in write mode to get a file object that supports fileno()
        temp_stderr_file = open(os.devnull, 'w')

        async with stdio_client(server_params, errlog=temp_stderr_file.fileno()) as (read, write):
            async with ClientSession(read, write) as session:
                print("1) Initialize session")
                await session.initialize()

                print("2) Discover resources and tools")
                resources = await session.list_resources()
                tools_response = await session.list_tools()
                for tool in tools_response.tools:
                    print(f" - {tool.name}: {tool.inputSchema.get('properties', {})}")

                print("3) Convert MCP tools to LLM function spec")
                functions = [convert_to_llm_tool(t) for t in tools_response.tools]

                print("4) Ask the LLM (or stub) what to call")
                tool_calls = call_llm(prompt, functions, use_llm=use_llm)
                print("Proposed tool calls:", tool_calls)

                print("5) Execute tool calls")
                results = []
                for call in tool_calls:
                    result = await session.call_tool(call["name"], arguments=call["args"])
                    simplified = simplify_content(result.content)
                    results.append({"tool": call["name"], "args": call["args"], "content": simplified})
                    print(f" - {call['name']}({call['args']}) -> {simplified}")

                return {
                    "resources": [str(r) for r in resources],
                    "tools": [t.name for t in tools_response.tools],
                    "tool_calls": tool_calls,
                    "results": results,
                }
    finally:
        # Ensure the temporary file is closed
        if temp_stderr_file:
            temp_stderr_file.close()


## Run it (addition example)

Keep `USE_REAL_LLM=False` to see the flow without calling an external model. Flip it to `True` once `GITHUB_TOKEN` is set.

In [9]:
await orchestrate("Add 2 to 20", use_llm=USE_REAL_LLM)

1) Initialize session
2) Discover resources and tools
 - add: {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
 - multiply: {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
3) Convert MCP tools to LLM function spec
4) Ask the LLM (or stub) what to call
Proposed tool calls: [{'name': 'add', 'args': {'a': 2, 'b': 20}}]
5) Execute tool calls
 - add({'a': 2, 'b': 20}) -> 22


{'resources': ["('meta', None)", "('nextCursor', None)", "('resources', [])"],
 'tools': ['add', 'multiply'],
 'tool_calls': [{'name': 'add', 'args': {'a': 2, 'b': 20}}],
 'results': [{'tool': 'add', 'args': {'a': 2, 'b': 20}, 'content': '22'}]}

## Exercise: multiply 7 by 6

Re-run with a different prompt. If you toggle `USE_REAL_LLM=True`, the model should return a `tool_calls` entry for `multiply` with the right arguments.

In [10]:
await orchestrate("Multiply 7 by 6", use_llm=USE_REAL_LLM)

1) Initialize session
2) Discover resources and tools
 - add: {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
 - multiply: {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
3) Convert MCP tools to LLM function spec
4) Ask the LLM (or stub) what to call
Proposed tool calls: [{'name': 'multiply', 'args': {'a': 7, 'b': 6}}]
5) Execute tool calls
 - multiply({'a': 7, 'b': 6}) -> 42


{'resources': ["('meta', None)", "('nextCursor', None)", "('resources', [])"],
 'tools': ['add', 'multiply'],
 'tool_calls': [{'name': 'multiply', 'args': {'a': 7, 'b': 6}}],
 'results': [{'tool': 'multiply', 'args': {'a': 7, 'b': 6}, 'content': '42'}]}

## Best practices recap

- Keep tool schemas tight and mark required fields to avoid retries.
- Only expose the tools you are comfortable running; filter before sending to the LLM.
- Log the chain end to end: prompt -> tool_calls -> tool results for debugging and audits.
- If validation fails, show the schema and ask the model (or user) to fix arguments.
- Use STDIO for local dev; swap to HTTP/SSE for remote hosts without changing tool logic.