Tutorial MCP Testing

MCP Testing Strategies (Chapter 2)

Now that you've built your Learning Journal MCP, let's learn how to test it properly. Good tests are the foundation of trustworthy human-AI collaboration.

This tutorial continues directly from Creating Your First MCP. If you haven't completed that tutorial yet, please do so first.

Introduction

In the previous tutorial, we built a Learning Journal MCP with tools, resources, and prompts. Now we need to test it properly to ensure it's reliable, maintainable, and preserves the 80-20 philosophy.

Testing MCPs is not just about verifying code works - it's about ensuring your tools preserve human agency while delivering AI capabilities. When we test our Learning Journal MCP, we're validating that the 80-20 balance is maintained and human wisdom remains central.

Why Test Our Learning Journal MCP?

Human Safety: Ensure AI assistance doesn't replace human judgment in learning
Data Integrity: Verify journal entries are saved and retrieved correctly
Reliability: The MCP must work consistently across different scenarios
Educational Value: Tests document how our Learning Journal should behave
Community Trust: Well-tested tools earn community adoption

Testing Philosophy

Following the 80-20 approach with our Learning Journal:

80% Automated Testing: Verify functionality, edge cases, and performance
20% Human Validation: Ensure learning wisdom, integrity, and compassion are preserved
100% Responsibility: We own the quality of our Learning Journal MCP

Setting Up Your Test Environment

Install Testing Dependencies

# Install pytest and common testing tools
pip install pytest pytest-asyncio pytest-cov pytest-mock

# For advanced testing (optional)
pip install hypothesis factory-boy freezegun

Test Directory Structure

Following the modular pattern from our community tutorials, organize tests in _mcp/tests/:

your-project/
└── _mcp/
    ├── main.py
    ├── servers/
    │   ├── journal.py
    │   └── analytics.py
    └── tests/                     # Test suite root
        ├── conftest.py           # Shared fixtures
        ├── test_main.py          # Main server tests
        ├── unit/                 # Unit tests by component
        │   ├── __init__.py
        │   ├── test_tools.py
        │   ├── test_resources.py
        │   └── test_prompts.py
        ├── integration/          # Integration tests
        │   ├── __init__.py
        │   ├── test_server_mounting.py
        │   └── test_workflows.py
        ├── e2e/                  # End-to-end tests
        │   ├── __init__.py
        │   ├── test_complete_flows.py
        │   └── test_transport_scenarios.py
        └── fixtures/             # Test data and utilities
            ├── __init__.py
            ├── sample_data.py
            └── mock_responses.py

Configure pytest

Create _mcp/tests/pytest.ini:

[tool:pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = 
    --strict-markers
    --verbose
    --tb=short
    --cov=_mcp
    --cov-report=term-missing
    --cov-report=html
asyncio_mode = auto
markers =
    unit: Unit tests for individual components
    integration: Integration tests for component interactions
    e2e: End-to-end workflow tests
    slow: Tests that take significant time
    philosophy: Tests that validate 80-20 principles

Test Organization: Modular Structure

The Three-Layer Testing Approach

Unit Tests - Test individual components in isolation Integration Tests - Test how components work together End-to-End Tests - Test complete user workflows

Let's build this with the Learning Journal MCP from our Creating Your First MCP tutorial.

Base Test Configuration

Create _mcp/tests/conftest.py:

"""
Shared test fixtures and configuration for MCP testing
"""

import pytest
import tempfile
import shutil
from pathlib import Path
from fastmcp import FastMCP, Client
from servers.journal import journal_server

@pytest.fixture
def temp_journal_dir():
    """Create a temporary directory for journal entries during testing."""
    temp_dir = tempfile.mkdtemp()
    yield Path(temp_dir)
    shutil.rmtree(temp_dir)

@pytest.fixture
def journal_mcp():
    """Create a journal MCP server for testing."""
    mcp = FastMCP("Test Journal MCP")
    mcp.mount(journal_server)
    return mcp

@pytest.fixture
async def journal_client(journal_mcp):
    """Create a client connected to the journal MCP."""
    client = Client(journal_mcp)
    async with client:
        yield client

@pytest.fixture
def sample_learning_entry():
    """Sample learning entry data for tests."""
    return {
        "topic": "FastMCP Testing",
        "key_insight": "Testing MCPs ensures human wisdom is preserved",
        "confidence_level": 8
    }

@pytest.fixture
def sample_journal_content():
    """Sample journal file content."""
    return """
## 2024-01-15 10:30 - Python Testing

**Key Insight**: pytest is the gold standard for Python testing
**Confidence Level**: 9/10

---

## 2024-01-15 14:20 - MCP Architecture

**Key Insight**: Modular design makes MCPs maintainable
**Confidence Level**: 7/10

---
"""

Testing with FastMCP Client

The FastMCP Client's in-memory transport is perfect for testing. It eliminates network complexity and provides direct access to your MCP server.

Basic Client Testing Pattern

Create _mcp/tests/test_main.py:

"""
Test the main MCP server functionality
"""

import pytest
from fastmcp import Client

@pytest.mark.unit
async def test_server_initialization(journal_client):
    """Test that the MCP server initializes correctly."""
    # Test server is accessible
    await journal_client.ping()
    
    # Test server has expected components
    tools = await journal_client.list_tools()
    resources = await journal_client.list_resources()
    prompts = await journal_client.list_prompts()
    
    assert len(tools) > 0, "Server should have tools"
    assert len(resources) > 0, "Server should have resources"
    assert len(prompts) > 0, "Server should have prompts"

@pytest.mark.integration
async def test_server_metadata(journal_client):
    """Test server provides proper metadata."""
    tools = await journal_client.list_tools()
    
    # Find the record_learning tool
    record_tool = next((t for t in tools if t.name == "record_learning"), None)
    assert record_tool is not None, "Should have record_learning tool"
    assert record_tool.description, "Tool should have description"
    assert record_tool.inputSchema, "Tool should have input schema"

Unit Testing Components

Unit tests validate individual components work correctly in isolation.

Testing Tools

Create _mcp/tests/unit/test_tools.py:

"""
Unit tests for MCP tools
"""

import pytest
import tempfile
from pathlib import Path
from unittest.mock import patch, mock_open

@pytest.mark.unit
async def test_record_learning_success(journal_client, sample_learning_entry, temp_journal_dir):
    """Test successful learning entry recording."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir
        mock_path.return_value.mkdir.return_value = None
        
        result = await journal_client.call_tool(
            "record_learning",
            sample_learning_entry
        )
        
        assert "✅ Recorded learning" in result.data
        assert sample_learning_entry["topic"] in result.data

@pytest.mark.unit
async def test_record_learning_validation(journal_client):
    """Test tool validates input parameters."""
    # Test missing required parameters
    with pytest.raises(Exception):  # FastMCP will validate and raise
        await journal_client.call_tool("record_learning", {})
    
    # Test invalid confidence level
    with pytest.raises(Exception):
        await journal_client.call_tool("record_learning", {
            "topic": "Test",
            "key_insight": "Test insight",
            "confidence_level": 15  # Should be 1-10
        })

@pytest.mark.unit
async def test_record_learning_preserves_human_input(journal_client, temp_journal_dir):
    """Test that tool preserves exactly what human provided."""
    human_input = {
        "topic": "Human Wisdom in AI",
        "key_insight": "Humans must remain in control of important decisions",
        "confidence_level": 10
    }
    
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir
        
        # Mock file writing to capture what gets written
        with patch('builtins.open', mock_open()) as mock_file:
            await journal_client.call_tool("record_learning", human_input)
            
            # Verify human input is preserved in the file
            written_content = mock_file().write.call_args[0][0]
            assert human_input["topic"] in written_content
            assert human_input["key_insight"] in written_content
            assert str(human_input["confidence_level"]) in written_content

@pytest.mark.philosophy
async def test_tool_requires_human_decision(journal_client):
    """Test that tools preserve human decision-making."""
    # The record_learning tool should require human input for:
    # - What to learn (topic)
    # - What was learned (insight)  
    # - Confidence assessment (self-evaluation)
    
    entry = {
        "topic": "AI Ethics",
        "key_insight": "AI should enhance human capability, not replace it",
        "confidence_level": 9
    }
    
    result = await journal_client.call_tool("record_learning", entry)
    
    # Verify the tool doesn't modify human judgment
    assert entry["confidence_level"] == 9  # Human's self-assessment preserved
    assert "AI Ethics" in result.data      # Human's topic choice preserved

Testing Resources

Create _mcp/tests/unit/test_resources.py:

"""
Unit tests for MCP resources
"""

import pytest
from pathlib import Path
from unittest.mock import patch, mock_open

@pytest.mark.unit
async def test_get_recent_entries_empty(journal_client):
    """Test resource behavior when no entries exist.""" 
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = False
        
        content = await journal_client.read_resource("journal://recent")
        assert "No journal entries yet" in content

@pytest.mark.unit
async def test_get_recent_entries_with_data(journal_client, sample_journal_content):
    """Test resource returns recent entries correctly."""
    with patch('servers.journal.Path') as mock_path:
        # Mock the path exists and has files
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("2024-01-15.md")]
        
        with patch('builtins.open', mock_open(read_data=sample_journal_content)):
            content = await journal_client.read_resource("journal://recent")
            
            assert "Python Testing" in content
            assert "MCP Architecture" in content
            assert "pytest is the gold standard" in content

@pytest.mark.unit  
async def test_get_learning_stats(journal_client, sample_journal_content):
    """Test statistics resource calculates correctly."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("2024-01-15.md")]
        
        with patch('builtins.open', mock_open(read_data=sample_journal_content)):
            stats = await journal_client.read_resource("journal://stats")
            
            # Should count entries (marked by ## headers)
            assert stats["total_entries"] == 2
            assert stats["days_active"] == 1

@pytest.mark.philosophy
async def test_resources_provide_context_not_decisions(journal_client):
    """Test that resources provide information but don't make decisions."""
    # Resources should give humans data to make informed decisions
    # They should NOT interpret or recommend actions
    
    stats = await journal_client.read_resource("journal://stats") 
    
    # Should provide raw data
    assert "total_entries" in stats
    assert "days_active" in stats
    
    # Should NOT provide interpretations like:
    # - "You're learning too slowly" 
    # - "Focus more on topic X"
    # - "You should be more confident"
    # These are human judgments
    
    assert "should" not in str(stats).lower()
    assert "need to" not in str(stats).lower()
    assert "recommend" not in str(stats).lower()

Testing Prompts

Create _mcp/tests/unit/test_prompts.py:

"""
Unit tests for MCP prompts
"""

import pytest

@pytest.mark.unit
async def test_reflection_prompt_generation(journal_client):
    """Test reflection prompt generates appropriate questions."""
    prompts = await journal_client.list_prompts()
    reflection_prompt = next((p for p in prompts if p.name == "reflection_prompt"), None)
    
    assert reflection_prompt is not None, "Should have reflection_prompt"
    
    # Test prompt with topic
    result = await journal_client.get_prompt("reflection_prompt", {"topic": "Machine Learning"})
    
    assert len(result.messages) == 1
    message_content = result.messages[0].content.text
    
    # Should include the topic
    assert "Machine Learning" in message_content
    
    # Should ask thought-provoking questions
    assert "understand well" in message_content
    assert "still unclear" in message_content
    assert "apply" in message_content

@pytest.mark.philosophy
async def test_prompts_encourage_human_reflection(journal_client):
    """Test that prompts encourage human thinking, not AI answers."""
    result = await journal_client.get_prompt("reflection_prompt", {"topic": "AI Safety"})
    prompt_text = result.messages[0].content.text
    
    # Should ask questions that require human judgment
    questions = [
        "What do I understand",      # Personal understanding
        "What aspects...are still unclear",  # Self-assessment
        "How can I apply",          # Personal application
        "What would I teach",       # Knowledge verification
        "What questions do I still have"  # Curiosity preservation
    ]
    
    for question_phrase in questions:
        assert question_phrase in prompt_text, f"Missing human-centered question: {question_phrase}"
    
    # Should NOT provide answers or shortcuts
    avoid_phrases = [
        "The answer is",
        "You should think",
        "The correct approach",
        "Here's what you need to know"
    ]
    
    for avoid_phrase in avoid_phrases:
        assert avoid_phrase not in prompt_text, f"Prompt shouldn't provide answers: {avoid_phrase}"

@pytest.mark.unit
async def test_prompt_parameter_handling(journal_client):
    """Test prompt handles parameters correctly."""
    # Test with different topics
    topics = ["Python", "Database Design", "Team Leadership"]
    
    for topic in topics:
        result = await journal_client.get_prompt("reflection_prompt", {"topic": topic})
        content = result.messages[0].content.text
        
        assert topic in content, f"Prompt should include topic: {topic}"
        assert len(content) > 100, "Prompt should be substantial"

Integration Testing

Integration tests verify that components work together correctly.

Testing Server Composition

Create _mcp/tests/integration/test_server_mounting.py:

"""
Integration tests for server composition and mounting
"""

import pytest
from fastmcp import FastMCP, Client
from servers.journal import journal_server

@pytest.mark.integration
async def test_server_mounting():
    """Test mounting servers preserves all functionality."""
    # Create main server
    main_server = FastMCP("Integration Test Server")
    
    # Mount journal server
    main_server.mount(journal_server)
    
    # Test via client
    client = Client(main_server)
    async with client:
        # Should have tools from mounted server
        tools = await client.list_tools()
        tool_names = [tool.name for tool in tools]
        assert "record_learning" in tool_names
        
        # Should have resources from mounted server  
        resources = await client.list_resources()
        resource_uris = [resource.uri for resource in resources]
        assert "journal://recent" in resource_uris
        assert "journal://stats" in resource_uris
        
        # Test functionality works
        result = await client.call_tool("record_learning", {
            "topic": "Integration Testing",
            "key_insight": "Server mounting preserves functionality",
            "confidence_level": 8
        })
        assert "✅ Recorded learning" in result.data

@pytest.mark.integration  
async def test_multiple_server_mounting():
    """Test mounting multiple journal-related servers works correctly."""
    # In a real Learning Journal MCP, you might have multiple modules
    # Example: journal_server + review_server
    
    journal_server = FastMCP("Journal Core")
    review_server = FastMCP("Review System")
    
    @journal_server.tool
    def quick_note(note: str) -> str:
        return f"✅ Quick note: {note}"
    
    @review_server.tool  
    def schedule_review(topic: str) -> str:
        return f"📅 Review scheduled for: {topic}"
    
    # Mount both into main Learning Journal MCP
    main_server = FastMCP("Learning Journal MCP")
    main_server.mount(journal_server)
    main_server.mount(review_server)
    
    # Test both modules work together
    client = Client(main_server)
    async with client:
        note_result = await client.call_tool("quick_note", {"note": "MCP testing is important"})
        review_result = await client.call_tool("schedule_review", {"topic": "MCP Testing"})
        
        assert "✅ Quick note" in note_result.data
        assert "📅 Review scheduled" in review_result.data

Testing Component Interactions

Create _mcp/tests/integration/test_workflows.py:

"""
Integration tests for complete workflows
"""

import pytest
from pathlib import Path
from unittest.mock import patch

@pytest.mark.integration
async def test_learning_workflow(journal_client, temp_journal_dir):
    """Test complete learning workflow: record → read → reflect."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir
        
        # Step 1: Record learning
        await journal_client.call_tool("record_learning", {
            "topic": "Workflow Testing", 
            "key_insight": "End-to-end testing validates user journeys",
            "confidence_level": 9
        })
        
        # Step 2: Read back entries (simulate reading from file)
        with patch('builtins.open') as mock_file:
            mock_file.return_value.__enter__.return_value.read.return_value = """
## 2024-01-15 10:30 - Workflow Testing

**Key Insight**: End-to-end testing validates user journeys
**Confidence Level**: 9/10

---
"""
            recent = await journal_client.read_resource("journal://recent")
            assert "Workflow Testing" in recent
            assert "End-to-end testing" in recent
        
        # Step 3: Generate reflection prompt
        reflection = await journal_client.get_prompt("reflection_prompt", {
            "topic": "Workflow Testing"
        })
        
        assert "Workflow Testing" in reflection.messages[0].content.text
        assert "understand well" in reflection.messages[0].content.text

@pytest.mark.integration
async def test_data_consistency(journal_client):
    """Test data consistency across components."""
    # Record multiple entries
    topics = ["Topic A", "Topic B", "Topic C"]
    
    for i, topic in enumerate(topics):
        await journal_client.call_tool("record_learning", {
            "topic": topic,
            "key_insight": f"Insight {i+1}",
            "confidence_level": i + 5
        })
    
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("test.md")]
        
        # Mock file content with all entries
        file_content = "## Entry A\n## Entry B\n## Entry C\n"
        with patch('builtins.open') as mock_file:
            mock_file.return_value.__enter__.return_value.read.return_value = file_content
            
            # Test stats reflect all entries
            stats = await journal_client.read_resource("journal://stats")
            assert stats["total_entries"] == 3

End-to-End Testing

E2E tests validate complete user scenarios across the entire system.

Testing Complete User Journeys

Create _mcp/tests/e2e/test_complete_flows.py:

"""
End-to-end tests for complete user workflows
"""

import pytest
import tempfile
import shutil
from pathlib import Path
from fastmcp import FastMCP, Client

@pytest.mark.e2e
async def test_new_user_learning_journey():
    """Test a new user's complete learning journey."""
    # Create a real temporary directory for this test
    temp_dir = tempfile.mkdtemp()
    journal_dir = Path(temp_dir) / "journal_entries"
    
    try:
        # Create server with real file system
        from servers.journal import journal_server
        main_server = FastMCP("E2E Test Server")
        main_server.mount(journal_server)
        
        # Use real client
        client = Client(main_server)
        
        async with client:
            # New user starts - should have no entries
            stats = await client.read_resource("journal://stats")
            assert stats["total_entries"] == 0
            
            # User records first learning
            result1 = await client.call_tool("record_learning", {
                "topic": "MCP Basics",
                "key_insight": "MCPs connect AI to real-world tools",
                "confidence_level": 6
            })
            assert "✅ Recorded learning" in result1.data
            
            # User records second learning  
            result2 = await client.call_tool("record_learning", {
                "topic": "Testing Strategy",
                "key_insight": "E2E tests validate complete user journeys", 
                "confidence_level": 8
            })
            assert "✅ Recorded learning" in result2.data
            
            # User checks their progress
            new_stats = await client.read_resource("journal://stats")
            assert new_stats["total_entries"] >= 2
            assert new_stats["days_active"] >= 1
            
            # User reviews recent entries
            recent = await client.read_resource("journal://recent")
            assert "MCP Basics" in recent
            assert "Testing Strategy" in recent
            
            # User reflects on learning
            reflection = await client.get_prompt("reflection_prompt", {
                "topic": "MCP Basics"
            })
            
            prompt_text = reflection.messages[0].content.text
            assert "MCP Basics" in prompt_text
            assert "What do I understand well" in prompt_text
            
    finally:
        # Clean up
        shutil.rmtree(temp_dir)

@pytest.mark.e2e
@pytest.mark.slow
async def test_learning_over_time():
    """Test learning patterns over multiple days."""
    temp_dir = tempfile.mkdtemp()
    
    try:
        from servers.journal import journal_server
        server = FastMCP("Time-based Test")
        server.mount(journal_server)
        
        client = Client(server)
        
        async with client:
            # Simulate learning over several days
            learning_data = [
                ("Day 1", "Python Basics", "Variables store data", 5),
                ("Day 1", "Python Functions", "Functions encapsulate logic", 6), 
                ("Day 2", "Testing", "Tests ensure code quality", 7),
                ("Day 3", "MCP Architecture", "MCPs enable AI-human collaboration", 8)
            ]
            
            for day, topic, insight, confidence in learning_data:
                await client.call_tool("record_learning", {
                    "topic": f"{day}: {topic}",
                    "key_insight": insight,
                    "confidence_level": confidence
                })
            
            # Check final statistics
            stats = await client.read_resource("journal://stats")
            assert stats["total_entries"] == 4
            
            # Verify recent entries include latest learning
            recent = await client.read_resource("journal://recent")
            assert "MCP Architecture" in recent
            
    finally:
        shutil.rmtree(temp_dir)

Testing Different Transports

Create _mcp/tests/e2e/test_transport_scenarios.py:

"""
End-to-end tests for different transport scenarios
"""

import pytest
import asyncio
from fastmcp import FastMCP, Client

@pytest.mark.e2e
async def test_stdio_transport():
    """Test MCP works with stdio transport (script execution)."""
    # This test would run the actual script
    # For now, we test the in-memory equivalent
    from main import mcp
    
    client = Client(mcp)
    async with client:
        await client.ping()
        
        tools = await client.list_tools()
        assert len(tools) > 0
        
        result = await client.call_tool("record_learning", {
            "topic": "Transport Testing",
            "key_insight": "Different transports should behave identically",
            "confidence_level": 7
        })
        assert "✅ Recorded learning" in result.data

@pytest.mark.e2e
@pytest.mark.slow
async def test_http_transport_simulation():
    """Test MCP behavior that would occur over HTTP."""
    # Simulate HTTP-like conditions: 
    # - Stateless requests
    # - Network delays
    # - Potential timeouts
    
    server = FastMCP("HTTP Simulation")
    
    @server.tool
    async def slow_operation() -> str:
        """Simulate network delay."""
        await asyncio.sleep(0.1)  # Simulate network latency
        return "Completed after delay"
    
    client = Client(server)
    async with client:
        # Test timeout handling
        result = await client.call_tool("slow_operation", timeout=5.0)
        assert result.data == "Completed after delay"
        
        # Test multiple rapid requests (like HTTP)
        tasks = []
        for i in range(5):
            task = client.call_tool("slow_operation")
            tasks.append(task)
        
        results = await asyncio.gather(*tasks)
        assert len(results) == 5
        assert all(r.data == "Completed after delay" for r in results)

Advanced Testing Patterns

Parameterized Tests

"""
Advanced testing patterns for comprehensive coverage
"""

import pytest

@pytest.mark.parametrize("confidence_level,expected_valid", [
    (1, True),    # Minimum valid
    (5, True),    # Middle valid
    (10, True),   # Maximum valid
    (0, False),   # Too low
    (11, False),  # Too high
    (-1, False),  # Negative
])
@pytest.mark.unit
async def test_confidence_level_validation(journal_client, confidence_level, expected_valid):
    """Test confidence level validation with various inputs."""
    try:
        result = await journal_client.call_tool("record_learning", {
            "topic": "Validation Test",
            "key_insight": "Testing edge cases prevents bugs",
            "confidence_level": confidence_level
        })
        
        # If we get here, the call succeeded
        if expected_valid:
            assert "✅ Recorded learning" in result.data
        else:
            pytest.fail(f"Expected validation error for confidence_level={confidence_level}")
            
    except Exception as e:
        # If we get here, the call failed
        if expected_valid:
            pytest.fail(f"Unexpected error for valid confidence_level={confidence_level}: {e}")
        else:
            # Expected failure - test passes
            pass

@pytest.mark.parametrize("topic,insight", [
    ("Python", "Simple is better than complex"),
    ("Machine Learning", "Data quality matters more than algorithms"),
    ("Team Leadership", "Trust is the foundation of high-performing teams"),
    ("", "Empty topic should be handled gracefully"),  # Edge case
    ("A" * 1000, "Very long topic"),  # Long input
])
@pytest.mark.unit
async def test_various_learning_topics(journal_client, topic, insight):
    """Test recording various types of learning topics."""
    result = await journal_client.call_tool("record_learning", {
        "topic": topic,
        "key_insight": insight,
        "confidence_level": 7
    })
    
    if topic:  # Non-empty topic
        assert "✅ Recorded learning" in result.data
        assert topic in result.data or len(topic) > 100  # Long topics might be truncated
    else:  # Empty topic - should handle gracefully
        # Depending on implementation, might succeed or fail
        # Just ensure it doesn't crash
        assert isinstance(result.data, str)

Testing Async Components

@pytest.mark.asyncio
async def test_concurrent_operations(journal_client):
    """Test MCP handles concurrent operations correctly."""
    # Create multiple concurrent learning recordings
    tasks = []
    
    for i in range(10):
        task = journal_client.call_tool("record_learning", {
            "topic": f"Concurrent Topic {i}",
            "key_insight": f"Insight number {i}",
            "confidence_level": 5 + (i % 5)
        })
        tasks.append(task)
    
    # Run all concurrently
    results = await asyncio.gather(*tasks, return_exceptions=True)
    
    # All should succeed
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            pytest.fail(f"Task {i} failed: {result}")
        else:
            assert "✅ Recorded learning" in result.data

@pytest.mark.asyncio
async def test_resource_caching_behavior(journal_client):
    """Test that resources behave correctly with repeated access."""
    # Read resource multiple times rapidly
    results = []
    
    for _ in range(5):
        stats = await journal_client.read_resource("journal://stats")
        results.append(stats)
    
    # Results should be consistent
    first_result = results[0]
    for result in results[1:]:
        assert result == first_result, "Resource results should be consistent"

Performance Testing

@pytest.mark.slow
@pytest.mark.performance  
async def test_tool_performance(journal_client):
    """Test tool performance under load."""
    import time
    
    start_time = time.time()
    
    # Record many learning entries
    for i in range(100):
        await journal_client.call_tool("record_learning", {
            "topic": f"Performance Test {i}",
            "key_insight": f"Learning insight {i}",
            "confidence_level": 5 + (i % 5)
        })
    
    end_time = time.time()
    duration = end_time - start_time
    
    # Should complete within reasonable time
    assert duration < 10.0, f"Performance test took too long: {duration}s"
    
    # Calculate average time per operation
    avg_time = duration / 100
    assert avg_time < 0.1, f"Average operation time too slow: {avg_time}s"

Testing the 80-20 Philosophy

Validating Human Agency Preservation

@pytest.mark.philosophy
async def test_human_agency_preservation(journal_client):
    """Test that MCP preserves human agency and decision-making."""
    # Record a learning entry
    human_decision = {
        "topic": "AI Ethics",
        "key_insight": "Humans must remain in control of critical decisions",
        "confidence_level": 9  # Human's self-assessment
    }
    
    result = await journal_client.call_tool("record_learning", human_decision)
    
    # Tool should preserve human's exact input without modification
    assert "✅ Recorded learning" in result.data
    
    # Get reflection prompt
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": human_decision["topic"]
    })
    
    prompt_text = reflection.messages[0].content.text
    
    # Prompt should ask questions, not provide answers
    question_indicators = ["What do", "How can", "What would", "What questions"]
    assert any(indicator in prompt_text for indicator in question_indicators)
    
    # Should NOT contain prescriptive statements
    prescriptive_phrases = ["You should", "You must", "The correct answer"]
    assert not any(phrase in prompt_text for phrase in prescriptive_phrases)

@pytest.mark.philosophy
async def test_wisdom_preservation(journal_client):
    """Test that the system preserves and encourages human wisdom."""
    # Test reflection prompt encourages deep thinking
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": "Learning Philosophy"
    })
    
    prompt_text = reflection.messages[0].content.text
    
    # Should encourage multiple perspectives
    wisdom_indicators = [
        "understand well",      # Self-knowledge
        "still unclear",        # Humility
        "apply",               # Practical wisdom
        "teach someone else",   # Knowledge transfer
        "questions do I still have"  # Continuous learning
    ]
    
    for indicator in wisdom_indicators:
        assert indicator in prompt_text, f"Missing wisdom indicator: {indicator}"

@pytest.mark.philosophy 
async def test_educational_value(journal_client):
    """Test that interactions provide educational value."""
    # Record learning with low confidence
    await journal_client.call_tool("record_learning", {
        "topic": "Complex Topic",
        "key_insight": "This is challenging to understand",
        "confidence_level": 3  # Low confidence - needs more learning
    })
    
    # Get reflection prompt for this topic
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": "Complex Topic"
    })
    
    prompt_text = reflection.messages[0].content.text
    
    # Should encourage learning and growth
    educational_elements = [
        "unclear",              # Acknowledges learning gaps
        "apply",               # Encourages practical application
        "questions",           # Promotes curiosity
        "teach",               # Tests understanding
    ]
    
    found_elements = [elem for elem in educational_elements if elem in prompt_text]
    assert len(found_elements) >= 3, f"Should have educational elements: {found_elements}"

Test Fixtures and Utilities

Advanced Fixtures

Create _mcp/tests/fixtures/test_data.py:

"""
Advanced test fixtures and utilities
"""

import pytest
from datetime import datetime, timedelta
from pathlib import Path
import json

class JournalTestData:
    """Factory for creating test journal data."""
    
    @staticmethod
    def learning_entry(
        topic: str = "Test Topic",
        insight: str = "Test insight",
        confidence: int = 5,
        timestamp: datetime = None
    ) -> dict:
        """Create a learning entry for testing."""
        return {
            "topic": topic,
            "key_insight": insight,
            "confidence_level": confidence,
            "timestamp": timestamp or datetime.now()
        }
    
    @staticmethod
    def journal_file_content(entries: list) -> str:
        """Create journal file content from entries."""
        content = ""
        for entry in entries:
            timestamp = entry.get("timestamp", datetime.now())
            content += f"""
## {timestamp.strftime("%Y-%m-%d %H:%M")} - {entry["topic"]}

**Key Insight**: {entry["key_insight"]}
**Confidence Level**: {entry["confidence_level"]}/10

---
"""
        return content

@pytest.fixture
def journal_factory():
    """Factory fixture for creating journal test data."""
    return JournalTestData

@pytest.fixture
def sample_week_of_learning():
    """Sample learning data for a week."""
    base_date = datetime(2024, 1, 15, 10, 0)
    
    return [
        {
            "topic": "Python Basics",
            "key_insight": "Functions make code reusable",
            "confidence_level": 8,
            "timestamp": base_date
        },
        {
            "topic": "Testing",
            "key_insight": "Tests prevent regression bugs",
            "confidence_level": 7,
            "timestamp": base_date + timedelta(days=1)
        },
        {
            "topic": "MCP Architecture", 
            "key_insight": "Modular design improves maintainability",
            "confidence_level": 6,
            "timestamp": base_date + timedelta(days=2)
        }
    ]

@pytest.fixture
def mock_file_system(tmp_path):
    """Create a mock file system for testing."""
    journal_dir = tmp_path / "journal_entries"
    journal_dir.mkdir()
    
    # Create some sample files
    (journal_dir / "2024-01-15.md").write_text("""
## 2024-01-15 10:30 - Python Testing

**Key Insight**: pytest is the gold standard
**Confidence Level**: 9/10

---
""")
    
    return journal_dir

Test Utilities

Create _mcp/tests/fixtures/helpers.py:

"""
Test helper functions and utilities
"""

import asyncio
from typing import Any, Dict, List
from fastmcp import Client

class MCPTestHelper:
    """Helper class for common MCP testing operations."""
    
    def __init__(self, client: Client):
        self.client = client
    
    async def record_multiple_learnings(self, learnings: List[Dict[str, Any]]) -> List[Any]:
        """Record multiple learning entries."""
        results = []
        for learning in learnings:
            result = await self.client.call_tool("record_learning", learning)
            results.append(result)
        return results
    
    async def get_all_components(self) -> Dict[str, List]:
        """Get all tools, resources, and prompts."""
        tools, resources, prompts = await asyncio.gather(
            self.client.list_tools(),
            self.client.list_resources(), 
            self.client.list_prompts()
        )
        
        return {
            "tools": tools,
            "resources": resources,
            "prompts": prompts
        }
    
    async def validate_component_metadata(self, component_type: str) -> bool:
        """Validate that components have proper metadata."""
        components = await getattr(self.client, f"list_{component_type}")()
        
        for component in components:
            if not component.name:
                return False
            if not component.description:
                return False
                
        return True

@pytest.fixture
async def mcp_helper(journal_client):
    """Helper fixture for MCP testing operations."""
    return MCPTestHelper(journal_client)

def assert_valid_learning_entry(result):
    """Assert that a learning entry result is valid."""
    assert result is not None
    assert "✅ Recorded learning" in result.data
    assert result.data != ""

def assert_educational_prompt(prompt_text: str):
    """Assert that a prompt text has educational value."""
    educational_indicators = [
        "understand",
        "learn",  
        "apply",
        "think",
        "reflect"
    ]
    
    found = [indicator for indicator in educational_indicators 
             if indicator in prompt_text.lower()]
    
    assert len(found) >= 2, f"Prompt should be educational. Found indicators: {found}"

def assert_preserves_human_agency(prompt_text: str):
    """Assert that prompt preserves human decision-making."""
    # Should ask questions, not provide answers
    question_words = ["what", "how", "why", "when", "where"]
    has_questions = any(word in prompt_text.lower() for word in question_words)
    
    assert has_questions, "Prompt should ask questions to preserve human agency"
    
    # Should not be prescriptive
    prescriptive_phrases = ["you should", "you must", "the answer is"]
    is_prescriptive = any(phrase in prompt_text.lower() for phrase in prescriptive_phrases)
    
    assert not is_prescriptive, "Prompt should not be prescriptive"

Best Practices

What to Test

DO Test:

Core functionality works correctly
Error handling behaves appropriately
Human agency is preserved in interactions
Educational value is provided
Components integrate properly
Performance meets expectations

DON'T Test:

Implementation details that might change
External dependencies beyond your control
Obvious Python language features
FastMCP framework internals

Test Naming Conventions

# Good test names - describe behavior
def test_record_learning_preserves_human_input()
def test_reflection_prompt_encourages_thinking()
def test_stats_resource_calculates_correctly()

# Bad test names - describe implementation
def test_record_learning_function()
def test_file_writing()
def test_json_serialization()

Documentation in Tests

@pytest.mark.philosophy
async def test_human_wisdom_preservation(journal_client):
    """
    Test that the system preserves human wisdom in the learning process.
    
    This test validates the 80-20 philosophy:
    - AI provides organization and structure (80%)
    - Human provides insight and judgment (20%)
    - Human wisdom is never replaced by AI analysis
    """
    # Test implementation here

Continuous Testing

Pre-commit Hooks

Create .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: pytest-unit
        name: Run unit tests
        entry: pytest _mcp/tests/unit/ -x --tb=short
        language: system
        pass_filenames: false
        
      - id: pytest-integration  
        name: Run integration tests
        entry: pytest _mcp/tests/integration/ -x --tb=short
        language: system
        pass_filenames: false

GitHub Actions

Create .github/workflows/test.yml:

name: Test MCP

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.10, 3.11, 3.12]
    
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
        
    - name: Install dependencies
      run: |
        pip install -e .
        pip install pytest pytest-asyncio pytest-cov
        
    - name: Run unit tests
      run: pytest _mcp/tests/unit/ --cov=_mcp
      
    - name: Run integration tests
      run: pytest _mcp/tests/integration/
      
    - name: Run philosophy tests
      run: pytest -m philosophy --tb=short

Coverage Requirements

Add to _mcp/tests/pytest.ini:

[tool:pytest]
addopts = 
    --cov=_mcp
    --cov-report=term-missing
    --cov-report=html
    --cov-fail-under=80

Summary

Testing MCPs ensures they deliver on the 80-20 promise: AI automation that preserves human wisdom. Your test suite should validate:

✅ Functionality: All components work correctly
✅ Integration: Components work together seamlessly
✅ Philosophy: Human agency and wisdom are preserved
✅ Performance: System meets reliability expectations
✅ Education: Tools provide learning value

Testing Checklist

Unit Tests

All tools validate inputs correctly
All resources return expected data
All prompts generate appropriate content
Error handling works properly

Integration Tests

Server mounting preserves functionality
Components interact correctly
Data flows work end-to-end

E2E Tests

Complete user workflows function
Different transports work identically
Performance meets requirements

Philosophy Tests

Human decision-making is preserved
Educational opportunities are provided
Wisdom is encouraged, not replaced
Human agency is maintained

Resources

Creating Your First MCP - Build the MCP we tested
FastMCP Quickstart Guide - Modular architecture patterns
pytest Documentation - Complete testing framework
FastMCP Client Docs - Client testing patterns

Good tests are documentation that never lies. They preserve not just functionality, but the philosophy behind why we build what we build.

Tutorial MCP Testing

MCP Testing Strategies (Chapter 2)

Table of Contents

Introduction

Why Test Our Learning Journal MCP?

Testing Philosophy

Setting Up Your Test Environment

Install Testing Dependencies

Test Directory Structure

Configure pytest

Test Organization: Modular Structure

The Three-Layer Testing Approach

Base Test Configuration

Testing with FastMCP Client

Basic Client Testing Pattern

Unit Testing Components

Testing Tools

Testing Resources

Testing Prompts

Integration Testing

Testing Server Composition

Testing Component Interactions

End-to-End Testing

Testing Complete User Journeys

Testing Different Transports

Advanced Testing Patterns

Parameterized Tests

Testing Async Components

Performance Testing

Testing the 80-20 Philosophy

Validating Human Agency Preservation

Test Fixtures and Utilities

Advanced Fixtures

Test Utilities

Best Practices

What to Test

Test Naming Conventions

Documentation in Tests

Continuous Testing

Pre-commit Hooks

GitHub Actions

Coverage Requirements

Summary

Testing Checklist

Resources

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

🤝 Community Wiki

🏠 Home

Knowledge

Main Prompt

Agentic Workflows

📚 Tutorials

Getting Started

Building Tools

MCP Tutorial Series

Advanced Topics

🛠️ Tools & Examples

Storm Checker

Django Mercury

📖 Resources

Documentation Standards

Community Resources

🚀 Projects

Active Projects

Coming Soon

💬 Community

Get Involved

Support

Governance

Legend

Clone this wiki locally