Tutorial MCP Testing
Now that you've built your Learning Journal MCP, let's learn how to test it properly. Good tests are the foundation of trustworthy human-AI collaboration.
This tutorial continues directly from Creating Your First MCP. If you haven't completed that tutorial yet, please do so first.
- Introduction
- Setting Up Your Test Environment
- Test Organization: Modular Structure
- Testing with FastMCP Client
- Unit Testing Components
- Integration Testing
- End-to-End Testing
- Advanced Testing Patterns
- Test Fixtures and Utilities
- Testing the 80-20 Philosophy
- Best Practices
- Continuous Testing
In the previous tutorial, we built a Learning Journal MCP with tools, resources, and prompts. Now we need to test it properly to ensure it's reliable, maintainable, and preserves the 80-20 philosophy.
Testing MCPs is not just about verifying code works - it's about ensuring your tools preserve human agency while delivering AI capabilities. When we test our Learning Journal MCP, we're validating that the 80-20 balance is maintained and human wisdom remains central.
- Human Safety: Ensure AI assistance doesn't replace human judgment in learning
- Data Integrity: Verify journal entries are saved and retrieved correctly
- Reliability: The MCP must work consistently across different scenarios
- Educational Value: Tests document how our Learning Journal should behave
- Community Trust: Well-tested tools earn community adoption
Following the 80-20 approach with our Learning Journal:
- 80% Automated Testing: Verify functionality, edge cases, and performance
- 20% Human Validation: Ensure learning wisdom, integrity, and compassion are preserved
- 100% Responsibility: We own the quality of our Learning Journal MCP
# Install pytest and common testing tools
pip install pytest pytest-asyncio pytest-cov pytest-mock
# For advanced testing (optional)
pip install hypothesis factory-boy freezegun

Following the modular pattern from our community tutorials, organize tests in _mcp/tests/:
your-project/
└── _mcp/
    ├── main.py
    ├── servers/
    │   ├── journal.py
    │   └── analytics.py
    └── tests/                          # Test suite root
        ├── conftest.py                 # Shared fixtures
        ├── test_main.py                # Main server tests
        ├── unit/                       # Unit tests by component
        │   ├── __init__.py
        │   ├── test_tools.py
        │   ├── test_resources.py
        │   └── test_prompts.py
        ├── integration/                # Integration tests
        │   ├── __init__.py
        │   ├── test_server_mounting.py
        │   └── test_workflows.py
        ├── e2e/                        # End-to-end tests
        │   ├── __init__.py
        │   ├── test_complete_flows.py
        │   └── test_transport_scenarios.py
        └── fixtures/                   # Test data and utilities
            ├── __init__.py
            ├── sample_data.py
            └── mock_responses.py
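If you would rather generate this layout than create each file by hand, a short script can do it. The following is a minimal sketch, assuming the directory and file names shown in the tree; the `scaffold_tests` helper and the `TEST_LAYOUT` mapping are illustrative names, not part of FastMCP or pytest. It runs against a temporary directory so that nothing in your project is touched.

```python
import tempfile
from pathlib import Path

# Sub-packages of tests/ and the test modules each one holds (taken from the tree above).
TEST_LAYOUT = {
    "unit": ["test_tools.py", "test_resources.py", "test_prompts.py"],
    "integration": ["test_server_mounting.py", "test_workflows.py"],
    "e2e": ["test_complete_flows.py", "test_transport_scenarios.py"],
    "fixtures": ["sample_data.py", "mock_responses.py"],
}


def scaffold_tests(mcp_root: Path) -> list[Path]:
    """Create empty test files under mcp_root/tests and return their paths."""
    tests_root = mcp_root / "tests"
    tests_root.mkdir(parents=True, exist_ok=True)
    created = []

    # Top-level files that sit directly in tests/.
    for name in ("conftest.py", "test_main.py"):
        path = tests_root / name
        path.touch(exist_ok=True)
        created.append(path)

    # One package per test category, each with an __init__.py.
    for package, modules in TEST_LAYOUT.items():
        package_dir = tests_root / package
        package_dir.mkdir(exist_ok=True)
        for name in ["__init__.py", *modules]:
            path = package_dir / name
            path.touch(exist_ok=True)
            created.append(path)

    return created


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_tests(Path(tmp) / "_mcp")
    for path in created:
        print(path.relative_to(tmp).as_posix())
```

To scaffold a real project, pass the path to your own `_mcp` directory instead of the temporary one. Existing files are left as they are because `touch(exist_ok=True)` does not truncate them.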
Create _mcp/tests/pytest.ini:
[pytest]
# testpaths is resolved relative to the rootdir (the directory containing this file)
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
    --strict-markers
    --verbose
    --tb=short
    --cov=_mcp
    --cov-report=term-missing
    --cov-report=html
asyncio_mode = auto
markers =
    unit: Unit tests for individual components
    integration: Integration tests for component interactions
    e2e: End-to-end workflow tests
    slow: Tests that take significant time
    performance: Performance and load tests
    philosophy: Tests that validate 80-20 principles

- Unit Tests: Test individual components in isolation
- Integration Tests: Test how components work together
- End-to-End Tests: Test complete user workflows
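Because `--strict-markers` is enabled, pytest rejects any `@pytest.mark.<name>` whose name is not registered under `markers`. One way to catch a missing registration early is to compare the markers you use against the ones the ini file declares. The sketch below uses only the standard library's `configparser`; the `SAMPLE_INI` text is a shortened stand-in for the real file, and `smoke` is a hypothetical marker used purely to show a mismatch.

```python
import configparser

# Shortened stand-in for pytest.ini (assumption: a [pytest] section with a markers list).
SAMPLE_INI = """\
[pytest]
addopts =
    --strict-markers
markers =
    unit: Unit tests for individual components
    integration: Integration tests for component interactions
    e2e: End-to-end workflow tests
    slow: Tests that take significant time
    philosophy: Tests that validate 80-20 principles
"""


def registered_markers(ini_text: str) -> set[str]:
    """Return the marker names declared in the [pytest] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get("pytest", "markers", fallback="")
    # Each non-empty line looks like "name: description".
    return {line.split(":", 1)[0].strip() for line in raw.splitlines() if line.strip()}


def unregistered(used: set[str], ini_text: str) -> set[str]:
    """Return the markers that are used but not declared."""
    return used - registered_markers(ini_text)


missing = unregistered({"unit", "slow", "smoke"}, SAMPLE_INI)
print(sorted(missing))  # prints ['smoke']
```

In day-to-day use the markers select subsets of the suite from the command line, for example `pytest -m unit` or `pytest -m "not slow"`.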
Let's build this with the Learning Journal MCP from our Creating Your First MCP tutorial.
Create _mcp/tests/conftest.py:
"""
Shared test fixtures and configuration for MCP testing
"""
import pytest
import tempfile
import shutil
from pathlib import Path
from fastmcp import FastMCP, Client
from servers.journal import journal_server


@pytest.fixture
def temp_journal_dir():
    """Create a temporary directory for journal entries during testing."""
    temp_dir = tempfile.mkdtemp()
    yield Path(temp_dir)
    shutil.rmtree(temp_dir)


@pytest.fixture
def journal_mcp():
    """Create a journal MCP server for testing."""
    mcp = FastMCP("Test Journal MCP")
    mcp.mount(journal_server)
    return mcp


@pytest.fixture
async def journal_client(journal_mcp):
    """Create a client connected to the journal MCP."""
    client = Client(journal_mcp)
    async with client:
        yield client


@pytest.fixture
def sample_learning_entry():
    """Sample learning entry data for tests."""
    return {
        "topic": "FastMCP Testing",
        "key_insight": "Testing MCPs ensures human wisdom is preserved",
        "confidence_level": 8
    }


@pytest.fixture
def sample_journal_content():
    """Sample journal file content."""
    return """
## 2024-01-15 10:30 - Python Testing
**Key Insight**: pytest is the gold standard for Python testing
**Confidence Level**: 9/10
---
## 2024-01-15 14:20 - MCP Architecture
**Key Insight**: Modular design makes MCPs maintainable
**Confidence Level**: 7/10
---
"""

The FastMCP Client's in-memory transport is well suited to testing: the client talks to the server object directly, so no network or subprocess is involved.
Create _mcp/tests/test_main.py:
"""
Test the main MCP server functionality
"""
import pytest
from fastmcp import Client


@pytest.mark.unit
async def test_server_initialization(journal_client):
    """Test that the MCP server initializes correctly."""
    # Test server is accessible
    await journal_client.ping()

    # Test server has expected components
    tools = await journal_client.list_tools()
    resources = await journal_client.list_resources()
    prompts = await journal_client.list_prompts()

    assert len(tools) > 0, "Server should have tools"
    assert len(resources) > 0, "Server should have resources"
    assert len(prompts) > 0, "Server should have prompts"


@pytest.mark.integration
async def test_server_metadata(journal_client):
    """Test server provides proper metadata."""
    tools = await journal_client.list_tools()

    # Find the record_learning tool
    record_tool = next((t for t in tools if t.name == "record_learning"), None)
    assert record_tool is not None, "Should have record_learning tool"
    assert record_tool.description, "Tool should have description"
    assert record_tool.inputSchema, "Tool should have input schema"

Unit tests validate that individual components work correctly in isolation.
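Several of the unit tests in this section patch `builtins.open` with `unittest.mock.mock_open` so that nothing touches the real disk. One detail is easy to miss: `write.call_args` holds only the last call, so a tool that writes an entry in several pieces needs all calls joined before asserting on the content. The sketch below shows the pattern in isolation; `append_entry` is a hypothetical stand-in for whatever the journal tool does internally, not the actual implementation.

```python
from unittest.mock import mock_open, patch


def append_entry(path: str, topic: str, insight: str) -> None:
    """Hypothetical stand-in for the journal tool: writes an entry in two calls."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"## {topic}\n")
        handle.write(f"**Key Insight**: {insight}\n")


def capture_written_text(topic: str, insight: str) -> str:
    """Run append_entry against a fake file and return everything it wrote."""
    fake_open = mock_open()
    with patch("builtins.open", fake_open):
        append_entry("journal.md", topic, insight)
    handle = fake_open.return_value
    # Join every write() call, not just the last one.
    return "".join(call.args[0] for call in handle.write.call_args_list)


written = capture_written_text("Human Wisdom in AI", "Humans stay in control")
assert "Human Wisdom in AI" in written
assert "Humans stay in control" in written
```

If the code under test writes through `Path.write_text` or another API rather than `open`, this patch target may not intercept it, so check how your journal server actually writes files before relying on it.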
Create _mcp/tests/unit/test_tools.py:
"""
Unit tests for MCP tools
"""
import pytest
import tempfile
from pathlib import Path
from unittest.mock import patch, mock_open


@pytest.mark.unit
async def test_record_learning_success(journal_client, sample_learning_entry, temp_journal_dir):
    """Test successful learning entry recording."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir
        mock_path.return_value.mkdir.return_value = None

        result = await journal_client.call_tool(
            "record_learning",
            sample_learning_entry
        )

        assert "✅ Recorded learning" in result.data
        assert sample_learning_entry["topic"] in result.data


@pytest.mark.unit
async def test_record_learning_validation(journal_client):
    """Test tool validates input parameters."""
    # Test missing required parameters
    with pytest.raises(Exception):  # FastMCP will validate and raise
        await journal_client.call_tool("record_learning", {})

    # Test invalid confidence level
    with pytest.raises(Exception):
        await journal_client.call_tool("record_learning", {
            "topic": "Test",
            "key_insight": "Test insight",
            "confidence_level": 15  # Should be 1-10
        })


@pytest.mark.unit
async def test_record_learning_preserves_human_input(journal_client, temp_journal_dir):
    """Test that tool preserves exactly what human provided."""
    human_input = {
        "topic": "Human Wisdom in AI",
        "key_insight": "Humans must remain in control of important decisions",
        "confidence_level": 10
    }

    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir

        # Mock file writing to capture what gets written
        with patch('builtins.open', mock_open()) as mock_file:
            await journal_client.call_tool("record_learning", human_input)

            # Join every write() call so multi-part writes are fully captured
            written_content = "".join(
                call.args[0] for call in mock_file().write.call_args_list
            )

            # Verify human input is preserved in the file
            assert human_input["topic"] in written_content
            assert human_input["key_insight"] in written_content
            assert str(human_input["confidence_level"]) in written_content


@pytest.mark.philosophy
async def test_tool_requires_human_decision(journal_client):
    """Test that tools preserve human decision-making."""
    # The record_learning tool should require human input for:
    # - What to learn (topic)
    # - What was learned (insight)
    # - Confidence assessment (self-evaluation)
    entry = {
        "topic": "AI Ethics",
        "key_insight": "AI should enhance human capability, not replace it",
        "confidence_level": 9
    }

    result = await journal_client.call_tool("record_learning", entry)

    # Verify the tool doesn't modify human judgment
    assert entry["confidence_level"] == 9  # Human's self-assessment preserved
    assert "AI Ethics" in result.data  # Human's topic choice preserved

Create _mcp/tests/unit/test_resources.py:
"""
Unit tests for MCP resources
"""
import json
import pytest
from pathlib import Path
from unittest.mock import patch, mock_open

# Client.read_resource() returns a list of content items. Text resources expose
# their payload on .text; these tests assume the stats resource is served as JSON text.


@pytest.mark.unit
async def test_get_recent_entries_empty(journal_client):
    """Test resource behavior when no entries exist."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = False

        contents = await journal_client.read_resource("journal://recent")
        assert "No journal entries yet" in contents[0].text


@pytest.mark.unit
async def test_get_recent_entries_with_data(journal_client, sample_journal_content):
    """Test resource returns recent entries correctly."""
    with patch('servers.journal.Path') as mock_path:
        # Mock the path exists and has files
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("2024-01-15.md")]

        with patch('builtins.open', mock_open(read_data=sample_journal_content)):
            contents = await journal_client.read_resource("journal://recent")
            text = contents[0].text

            assert "Python Testing" in text
            assert "MCP Architecture" in text
            assert "pytest is the gold standard" in text


@pytest.mark.unit
async def test_get_learning_stats(journal_client, sample_journal_content):
    """Test statistics resource calculates correctly."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("2024-01-15.md")]

        with patch('builtins.open', mock_open(read_data=sample_journal_content)):
            contents = await journal_client.read_resource("journal://stats")
            stats = json.loads(contents[0].text)

            # Should count entries (marked by ## headers)
            assert stats["total_entries"] == 2
            assert stats["days_active"] == 1


@pytest.mark.philosophy
async def test_resources_provide_context_not_decisions(journal_client):
    """Test that resources provide information but don't make decisions."""
    # Resources should give humans data to make informed decisions
    # They should NOT interpret or recommend actions
    contents = await journal_client.read_resource("journal://stats")
    stats_text = contents[0].text

    # Should provide raw data
    assert "total_entries" in stats_text
    assert "days_active" in stats_text

    # Should NOT provide interpretations like:
    # - "You're learning too slowly"
    # - "Focus more on topic X"
    # - "You should be more confident"
    # These are human judgments
    assert "should" not in stats_text.lower()
    assert "need to" not in stats_text.lower()
    assert "recommend" not in stats_text.lower()

Create _mcp/tests/unit/test_prompts.py:
"""
Unit tests for MCP prompts
"""
import pytest


@pytest.mark.unit
async def test_reflection_prompt_generation(journal_client):
    """Test reflection prompt generates appropriate questions."""
    prompts = await journal_client.list_prompts()
    reflection_prompt = next((p for p in prompts if p.name == "reflection_prompt"), None)
    assert reflection_prompt is not None, "Should have reflection_prompt"

    # Test prompt with topic
    result = await journal_client.get_prompt("reflection_prompt", {"topic": "Machine Learning"})
    assert len(result.messages) == 1
    message_content = result.messages[0].content.text

    # Should include the topic
    assert "Machine Learning" in message_content

    # Should ask thought-provoking questions
    assert "understand well" in message_content
    assert "still unclear" in message_content
    assert "apply" in message_content


@pytest.mark.philosophy
async def test_prompts_encourage_human_reflection(journal_client):
    """Test that prompts encourage human thinking, not AI answers."""
    result = await journal_client.get_prompt("reflection_prompt", {"topic": "AI Safety"})
    prompt_text = result.messages[0].content.text

    # Should ask questions that require human judgment
    questions = [
        "What do I understand",            # Personal understanding
        "still unclear",                   # Self-assessment (topic text sits mid-sentence)
        "How can I apply",                 # Personal application
        "What would I teach",              # Knowledge verification
        "What questions do I still have"   # Curiosity preservation
    ]
    for question_phrase in questions:
        assert question_phrase in prompt_text, f"Missing human-centered question: {question_phrase}"

    # Should NOT provide answers or shortcuts
    avoid_phrases = [
        "The answer is",
        "You should think",
        "The correct approach",
        "Here's what you need to know"
    ]
    for avoid_phrase in avoid_phrases:
        assert avoid_phrase not in prompt_text, f"Prompt shouldn't provide answers: {avoid_phrase}"


@pytest.mark.unit
async def test_prompt_parameter_handling(journal_client):
    """Test prompt handles parameters correctly."""
    # Test with different topics
    topics = ["Python", "Database Design", "Team Leadership"]
    for topic in topics:
        result = await journal_client.get_prompt("reflection_prompt", {"topic": topic})
        content = result.messages[0].content.text
        assert topic in content, f"Prompt should include topic: {topic}"
        assert len(content) > 100, "Prompt should be substantial"

Integration tests verify that components work together correctly.
Create _mcp/tests/integration/test_server_mounting.py:
"""
Integration tests for server composition and mounting
"""
import pytest
from fastmcp import FastMCP, Client
from servers.journal import journal_server


@pytest.mark.integration
async def test_server_mounting():
    """Test mounting servers preserves all functionality."""
    # Create main server
    main_server = FastMCP("Integration Test Server")

    # Mount journal server
    main_server.mount(journal_server)

    # Test via client
    client = Client(main_server)
    async with client:
        # Should have tools from mounted server
        tools = await client.list_tools()
        tool_names = [tool.name for tool in tools]
        assert "record_learning" in tool_names

        # Should have resources from mounted server
        resources = await client.list_resources()
        # resource.uri is a URL object, so convert it before comparing with strings
        resource_uris = [str(resource.uri) for resource in resources]
        assert "journal://recent" in resource_uris
        assert "journal://stats" in resource_uris

        # Test functionality works
        result = await client.call_tool("record_learning", {
            "topic": "Integration Testing",
            "key_insight": "Server mounting preserves functionality",
            "confidence_level": 8
        })
        assert "✅ Recorded learning" in result.data


@pytest.mark.integration
async def test_multiple_server_mounting():
    """Test mounting multiple journal-related servers works correctly."""
    # In a real Learning Journal MCP, you might have multiple modules
    # Example: journal_server + review_server
    journal_server = FastMCP("Journal Core")
    review_server = FastMCP("Review System")

    @journal_server.tool
    def quick_note(note: str) -> str:
        return f"✅ Quick note: {note}"

    @review_server.tool
    def schedule_review(topic: str) -> str:
        return f"📅 Review scheduled for: {topic}"

    # Mount both into main Learning Journal MCP
    main_server = FastMCP("Learning Journal MCP")
    main_server.mount(journal_server)
    main_server.mount(review_server)

    # Test both modules work together
    client = Client(main_server)
    async with client:
        note_result = await client.call_tool("quick_note", {"note": "MCP testing is important"})
        review_result = await client.call_tool("schedule_review", {"topic": "MCP Testing"})

        assert "✅ Quick note" in note_result.data
        assert "📅 Review scheduled" in review_result.data

Create _mcp/tests/integration/test_workflows.py:
"""
Integration tests for complete workflows
"""
import json
import pytest
from pathlib import Path
from unittest.mock import patch


@pytest.mark.integration
async def test_learning_workflow(journal_client, temp_journal_dir):
    """Test complete learning workflow: record → read → reflect."""
    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value = temp_journal_dir

        # Step 1: Record learning
        await journal_client.call_tool("record_learning", {
            "topic": "Workflow Testing",
            "key_insight": "End-to-end testing validates user journeys",
            "confidence_level": 9
        })

        # Step 2: Read back entries (simulate reading from file)
        with patch('builtins.open') as mock_file:
            mock_file.return_value.__enter__.return_value.read.return_value = """
## 2024-01-15 10:30 - Workflow Testing
**Key Insight**: End-to-end testing validates user journeys
**Confidence Level**: 9/10
---
"""
            # read_resource returns a list of content items; the text lives on .text
            recent = await journal_client.read_resource("journal://recent")
            recent_text = recent[0].text
            assert "Workflow Testing" in recent_text
            assert "End-to-end testing" in recent_text

        # Step 3: Generate reflection prompt
        reflection = await journal_client.get_prompt("reflection_prompt", {
            "topic": "Workflow Testing"
        })
        assert "Workflow Testing" in reflection.messages[0].content.text
        assert "understand well" in reflection.messages[0].content.text


@pytest.mark.integration
async def test_data_consistency(journal_client):
    """Test data consistency across components."""
    # Record multiple entries
    topics = ["Topic A", "Topic B", "Topic C"]
    for i, topic in enumerate(topics):
        await journal_client.call_tool("record_learning", {
            "topic": topic,
            "key_insight": f"Insight {i+1}",
            "confidence_level": i + 5
        })

    with patch('servers.journal.Path') as mock_path:
        mock_path.return_value.exists.return_value = True
        mock_path.return_value.glob.return_value = [Path("test.md")]

        # Mock file content with all entries
        file_content = "## Entry A\n## Entry B\n## Entry C\n"
        with patch('builtins.open') as mock_file:
            mock_file.return_value.__enter__.return_value.read.return_value = file_content

            # Test stats reflect all entries (assumes the stats resource is served as JSON text)
            contents = await journal_client.read_resource("journal://stats")
            stats = json.loads(contents[0].text)
            assert stats["total_entries"] == 3

E2E tests validate complete user scenarios across the entire system.
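What separates an E2E test from the mocked tests above is that it exercises the real file system, so each test needs its own journal directory. The sketch below shows that isolation pattern with the standard library only. `record_entry` and `journal_stats` are hypothetical stand-ins for the journal server's tool and stats resource, written under the assumption that entries are appended to one Markdown file per day and counted by their `## ` headers; your implementation from the previous tutorial may differ.

```python
import tempfile
from datetime import date
from pathlib import Path


def record_entry(journal_dir: Path, day: date, topic: str, insight: str, confidence: int) -> Path:
    """Append one entry to the day's Markdown file (stand-in for record_learning)."""
    journal_dir.mkdir(parents=True, exist_ok=True)
    path = journal_dir / f"{day.isoformat()}.md"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(
            f"\n## {topic}\n"
            f"**Key Insight**: {insight}\n"
            f"**Confidence Level**: {confidence}/10\n"
            "---\n"
        )
    return path


def journal_stats(journal_dir: Path) -> dict:
    """Count entries (lines starting with '## ') and active days (one file per day)."""
    files = sorted(journal_dir.glob("*.md")) if journal_dir.exists() else []
    total = 0
    for path in files:
        lines = path.read_text(encoding="utf-8").splitlines()
        total += sum(1 for line in lines if line.startswith("## "))
    return {"total_entries": total, "days_active": len(files)}


# Each run gets a fresh directory that is removed automatically afterwards.
with tempfile.TemporaryDirectory() as tmp:
    journal_dir = Path(tmp) / "journal_entries"
    day = date(2024, 1, 15)

    before = journal_stats(journal_dir)
    record_entry(journal_dir, day, "MCP Basics", "MCPs connect AI to real-world tools", 6)
    record_entry(journal_dir, day, "Testing Strategy", "E2E tests validate user journeys", 8)
    after = journal_stats(journal_dir)

print(before)  # prints {'total_entries': 0, 'days_active': 0}
print(after)   # prints {'total_entries': 2, 'days_active': 1}
```

The same idea applies to the real server: an assertion such as "a new user has zero entries" only holds if the server writes into the temporary directory rather than a shared one.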
Create _mcp/tests/e2e/test_complete_flows.py:
"""
End-to-end tests for complete user workflows
"""
import json
import pytest
import tempfile
import shutil
from pathlib import Path
from fastmcp import FastMCP, Client


async def read_stats(client) -> dict:
    """Read journal://stats and decode it (assumes the resource is served as JSON text)."""
    contents = await client.read_resource("journal://stats")
    return json.loads(contents[0].text)


async def read_recent(client) -> str:
    """Read journal://recent and return its text."""
    contents = await client.read_resource("journal://recent")
    return contents[0].text


@pytest.mark.e2e
async def test_new_user_learning_journey():
    """Test a new user's complete learning journey."""
    # Create a real temporary directory for this test.
    # NOTE: the journal server must be configured to write under journal_dir for
    # this test to be isolated; how to do that depends on your journal implementation.
    temp_dir = tempfile.mkdtemp()
    journal_dir = Path(temp_dir) / "journal_entries"

    try:
        # Create server with real file system
        from servers.journal import journal_server
        main_server = FastMCP("E2E Test Server")
        main_server.mount(journal_server)

        # Use real client
        client = Client(main_server)
        async with client:
            # New user starts - should have no entries
            stats = await read_stats(client)
            assert stats["total_entries"] == 0

            # User records first learning
            result1 = await client.call_tool("record_learning", {
                "topic": "MCP Basics",
                "key_insight": "MCPs connect AI to real-world tools",
                "confidence_level": 6
            })
            assert "✅ Recorded learning" in result1.data

            # User records second learning
            result2 = await client.call_tool("record_learning", {
                "topic": "Testing Strategy",
                "key_insight": "E2E tests validate complete user journeys",
                "confidence_level": 8
            })
            assert "✅ Recorded learning" in result2.data

            # User checks their progress
            new_stats = await read_stats(client)
            assert new_stats["total_entries"] >= 2
            assert new_stats["days_active"] >= 1

            # User reviews recent entries
            recent = await read_recent(client)
            assert "MCP Basics" in recent
            assert "Testing Strategy" in recent

            # User reflects on learning
            reflection = await client.get_prompt("reflection_prompt", {
                "topic": "MCP Basics"
            })
            prompt_text = reflection.messages[0].content.text
            assert "MCP Basics" in prompt_text
            assert "What do I understand well" in prompt_text
    finally:
        # Clean up
        shutil.rmtree(temp_dir)


@pytest.mark.e2e
@pytest.mark.slow
async def test_learning_over_time():
    """Test learning patterns over multiple days."""
    temp_dir = tempfile.mkdtemp()

    try:
        from servers.journal import journal_server
        server = FastMCP("Time-based Test")
        server.mount(journal_server)

        client = Client(server)
        async with client:
            # Simulate learning over several days
            learning_data = [
                ("Day 1", "Python Basics", "Variables store data", 5),
                ("Day 1", "Python Functions", "Functions encapsulate logic", 6),
                ("Day 2", "Testing", "Tests ensure code quality", 7),
                ("Day 3", "MCP Architecture", "MCPs enable AI-human collaboration", 8)
            ]
            for day, topic, insight, confidence in learning_data:
                await client.call_tool("record_learning", {
                    "topic": f"{day}: {topic}",
                    "key_insight": insight,
                    "confidence_level": confidence
                })

            # Check final statistics (exact count assumes an isolated journal directory)
            stats = await read_stats(client)
            assert stats["total_entries"] == 4

            # Verify recent entries include latest learning
            recent = await read_recent(client)
            assert "MCP Architecture" in recent
    finally:
        shutil.rmtree(temp_dir)

Create _mcp/tests/e2e/test_transport_scenarios.py:
"""
End-to-end tests for different transport scenarios
"""
import pytest
import asyncio
from fastmcp import FastMCP, Client


@pytest.mark.e2e
async def test_stdio_transport():
    """Test MCP works with stdio transport (script execution)."""
    # This test would run the actual script
    # For now, we test the in-memory equivalent
    from main import mcp

    client = Client(mcp)
    async with client:
        await client.ping()
        tools = await client.list_tools()
        assert len(tools) > 0

        result = await client.call_tool("record_learning", {
            "topic": "Transport Testing",
            "key_insight": "Different transports should behave identically",
            "confidence_level": 7
        })
        assert "✅ Recorded learning" in result.data


@pytest.mark.e2e
@pytest.mark.slow
async def test_http_transport_simulation():
    """Test MCP behavior that would occur over HTTP."""
    # Simulate HTTP-like conditions:
    # - Stateless requests
    # - Network delays
    # - Potential timeouts
    server = FastMCP("HTTP Simulation")

    @server.tool
    async def slow_operation() -> str:
        """Simulate network delay."""
        await asyncio.sleep(0.1)  # Simulate network latency
        return "Completed after delay"

    client = Client(server)
    async with client:
        # Test timeout handling
        result = await client.call_tool("slow_operation", timeout=5.0)
        assert result.data == "Completed after delay"

        # Test multiple rapid requests (like HTTP)
        tasks = []
        for i in range(5):
            task = client.call_tool("slow_operation")
            tasks.append(task)

        results = await asyncio.gather(*tasks)
        assert len(results) == 5
        assert all(r.data == "Completed after delay" for r in results)

"""
Advanced testing patterns for comprehensive coverage
"""
import pytest


@pytest.mark.parametrize("confidence_level,expected_valid", [
    (1, True),     # Minimum valid
    (5, True),     # Middle valid
    (10, True),    # Maximum valid
    (0, False),    # Too low
    (11, False),   # Too high
    (-1, False),   # Negative
])
@pytest.mark.unit
async def test_confidence_level_validation(journal_client, confidence_level, expected_valid):
    """Test confidence level validation with various inputs."""
    try:
        result = await journal_client.call_tool("record_learning", {
            "topic": "Validation Test",
            "key_insight": "Testing edge cases prevents bugs",
            "confidence_level": confidence_level
        })
        # If we get here, the call succeeded
        if expected_valid:
            assert "✅ Recorded learning" in result.data
        else:
            pytest.fail(f"Expected validation error for confidence_level={confidence_level}")
    except Exception as e:
        # If we get here, the call failed
        if expected_valid:
            pytest.fail(f"Unexpected error for valid confidence_level={confidence_level}: {e}")
        else:
            # Expected failure - test passes
            pass


@pytest.mark.parametrize("topic,insight", [
    ("Python", "Simple is better than complex"),
    ("Machine Learning", "Data quality matters more than algorithms"),
    ("Team Leadership", "Trust is the foundation of high-performing teams"),
    ("", "Empty topic should be handled gracefully"),  # Edge case
    ("A" * 1000, "Very long topic"),  # Long input
])
@pytest.mark.unit
async def test_various_learning_topics(journal_client, topic, insight):
    """Test recording various types of learning topics."""
    result = await journal_client.call_tool("record_learning", {
        "topic": topic,
        "key_insight": insight,
        "confidence_level": 7
    })

    if topic:  # Non-empty topic
        assert "✅ Recorded learning" in result.data
        assert topic in result.data or len(topic) > 100  # Long topics might be truncated
    else:  # Empty topic - should handle gracefully
        # Depending on implementation, might succeed or fail
        # Just ensure it doesn't crash
        assert isinstance(result.data, str)


import asyncio  # used by the concurrency test that follows


@pytest.mark.asyncio
async def test_concurrent_operations(journal_client):
    """Test MCP handles concurrent operations correctly."""
    # Create multiple concurrent learning recordings
    tasks = []
    for i in range(10):
        task = journal_client.call_tool("record_learning", {
            "topic": f"Concurrent Topic {i}",
            "key_insight": f"Insight number {i}",
            "confidence_level": 5 + (i % 5)
        })
        tasks.append(task)

    # Run all concurrently
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # All should succeed
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            pytest.fail(f"Task {i} failed: {result}")
        else:
            assert "✅ Recorded learning" in result.data


@pytest.mark.asyncio
async def test_resource_caching_behavior(journal_client):
    """Test that resources behave correctly with repeated access."""
    # Read resource multiple times rapidly
    results = []
    for _ in range(5):
        stats = await journal_client.read_resource("journal://stats")
        results.append(stats)

    # Results should be consistent
    first_result = results[0]
    for result in results[1:]:
        assert result == first_result, "Resource results should be consistent"


@pytest.mark.slow
@pytest.mark.performance
async def test_tool_performance(journal_client):
    """Test tool performance under load."""
    import time

    start_time = time.time()

    # Record many learning entries
    for i in range(100):
        await journal_client.call_tool("record_learning", {
            "topic": f"Performance Test {i}",
            "key_insight": f"Learning insight {i}",
            "confidence_level": 5 + (i % 5)
        })

    end_time = time.time()
    duration = end_time - start_time

    # Should complete within reasonable time
    assert duration < 10.0, f"Performance test took too long: {duration}s"

    # Calculate average time per operation
    avg_time = duration / 100
    assert avg_time < 0.1, f"Average operation time too slow: {avg_time}s"


@pytest.mark.philosophy
async def test_human_agency_preservation(journal_client):
    """Test that MCP preserves human agency and decision-making."""
    # Record a learning entry
    human_decision = {
        "topic": "AI Ethics",
        "key_insight": "Humans must remain in control of critical decisions",
        "confidence_level": 9  # Human's self-assessment
    }
    result = await journal_client.call_tool("record_learning", human_decision)

    # Tool should preserve human's exact input without modification
    assert "✅ Recorded learning" in result.data

    # Get reflection prompt
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": human_decision["topic"]
    })
    prompt_text = reflection.messages[0].content.text

    # Prompt should ask questions, not provide answers
    question_indicators = ["What do", "How can", "What would", "What questions"]
    assert any(indicator in prompt_text for indicator in question_indicators)

    # Should NOT contain prescriptive statements
    prescriptive_phrases = ["You should", "You must", "The correct answer"]
    assert not any(phrase in prompt_text for phrase in prescriptive_phrases)


@pytest.mark.philosophy
async def test_wisdom_preservation(journal_client):
    """Test that the system preserves and encourages human wisdom."""
    # Test reflection prompt encourages deep thinking
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": "Learning Philosophy"
    })
    prompt_text = reflection.messages[0].content.text

    # Should encourage multiple perspectives
    wisdom_indicators = [
        "understand well",            # Self-knowledge
        "still unclear",              # Humility
        "apply",                      # Practical wisdom
        "teach someone else",         # Knowledge transfer
        "questions do I still have"   # Continuous learning
    ]
    for indicator in wisdom_indicators:
        assert indicator in prompt_text, f"Missing wisdom indicator: {indicator}"


@pytest.mark.philosophy
async def test_educational_value(journal_client):
    """Test that interactions provide educational value."""
    # Record learning with low confidence
    await journal_client.call_tool("record_learning", {
        "topic": "Complex Topic",
        "key_insight": "This is challenging to understand",
        "confidence_level": 3  # Low confidence - needs more learning
    })

    # Get reflection prompt for this topic
    reflection = await journal_client.get_prompt("reflection_prompt", {
        "topic": "Complex Topic"
    })
    prompt_text = reflection.messages[0].content.text

    # Should encourage learning and growth
    educational_elements = [
        "unclear",     # Acknowledges learning gaps
        "apply",       # Encourages practical application
        "questions",   # Promotes curiosity
        "teach",       # Tests understanding
    ]
    found_elements = [elem for elem in educational_elements if elem in prompt_text]
    assert len(found_elements) >= 3, f"Should have educational elements: {found_elements}"

Create _mcp/tests/fixtures/test_data.py:
"""
Advanced test fixtures and utilities
"""
import pytest
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional
import json


class JournalTestData:
    """Factory for creating test journal data."""

    @staticmethod
    def learning_entry(
        topic: str = "Test Topic",
        insight: str = "Test insight",
        confidence: int = 5,
        timestamp: Optional[datetime] = None
    ) -> dict:
        """Create a learning entry for testing."""
        return {
            "topic": topic,
            "key_insight": insight,
            "confidence_level": confidence,
            "timestamp": timestamp or datetime.now()
        }

    @staticmethod
    def journal_file_content(entries: list) -> str:
        """Create journal file content from entries."""
        content = ""
        for entry in entries:
            timestamp = entry.get("timestamp", datetime.now())
            content += f"""
## {timestamp.strftime("%Y-%m-%d %H:%M")} - {entry["topic"]}
**Key Insight**: {entry["key_insight"]}
**Confidence Level**: {entry["confidence_level"]}/10
---
"""
        return content


@pytest.fixture
def journal_factory():
    """Factory fixture for creating journal test data."""
    return JournalTestData


@pytest.fixture
def sample_week_of_learning():
    """Sample learning data for a week."""
    base_date = datetime(2024, 1, 15, 10, 0)
    return [
        {
            "topic": "Python Basics",
            "key_insight": "Functions make code reusable",
            "confidence_level": 8,
            "timestamp": base_date
        },
        {
            "topic": "Testing",
            "key_insight": "Tests prevent regression bugs",
            "confidence_level": 7,
            "timestamp": base_date + timedelta(days=1)
        },
        {
            "topic": "MCP Architecture",
            "key_insight": "Modular design improves maintainability",
            "confidence_level": 6,
            "timestamp": base_date + timedelta(days=2)
        }
    ]


@pytest.fixture
def mock_file_system(tmp_path):
    """Create a mock file system for testing."""
    journal_dir = tmp_path / "journal_entries"
    journal_dir.mkdir()

    # Create some sample files
    (journal_dir / "2024-01-15.md").write_text("""
## 2024-01-15 10:30 - Python Testing
**Key Insight**: pytest is the gold standard
**Confidence Level**: 9/10
---
""")
    return journal_dir

Create _mcp/tests/fixtures/helpers.py:
"""
Test helper functions and utilities
"""
import asyncio
from typing import Any, Dict, List

import pytest  # needed for the @pytest.fixture decorator below
from fastmcp import Client


class MCPTestHelper:
    """Helper class for common MCP testing operations."""

    def __init__(self, client: Client):
        self.client = client

    async def record_multiple_learnings(self, learnings: List[Dict[str, Any]]) -> List[Any]:
        """Record multiple learning entries, one tool call per entry."""
        results = []
        for learning in learnings:
            result = await self.client.call_tool("record_learning", learning)
            results.append(result)
        return results

    async def get_all_components(self) -> Dict[str, List]:
        """Get all tools, resources, and prompts."""
        tools, resources, prompts = await asyncio.gather(
            self.client.list_tools(),
            self.client.list_resources(),
            self.client.list_prompts()
        )
        return {
            "tools": tools,
            "resources": resources,
            "prompts": prompts
        }

    async def validate_component_metadata(self, component_type: str) -> bool:
        """Validate that components have a name and a description.

        component_type must be "tools", "resources", or "prompts".
        """
        components = await getattr(self.client, f"list_{component_type}")()
        for component in components:
            if not component.name:
                return False
            if not component.description:
                return False
        return True


@pytest.fixture
async def mcp_helper(journal_client):
    """Helper fixture for MCP testing operations."""
    return MCPTestHelper(journal_client)


def assert_valid_learning_entry(result):
    """Assert that a learning entry result is valid."""
    assert result is not None
    assert "✅ Recorded learning" in result.data
    assert result.data != ""


def assert_educational_prompt(prompt_text: str):
    """Assert that a prompt text has educational value."""
    educational_indicators = [
        "understand",
        "learn",
        "apply",
        "think",
        "reflect"
    ]
    found = [indicator for indicator in educational_indicators
             if indicator in prompt_text.lower()]
    assert len(found) >= 2, f"Prompt should be educational. Found indicators: {found}"


def assert_preserves_human_agency(prompt_text: str):
    """Assert that prompt preserves human decision-making."""
    # Should ask questions, not provide answers
    question_words = ["what", "how", "why", "when", "where"]
    has_questions = any(word in prompt_text.lower() for word in question_words)
    assert has_questions, "Prompt should ask questions to preserve human agency"
    # Should not be prescriptive
    prescriptive_phrases = ["you should", "you must", "the answer is"]
    is_prescriptive = any(phrase in prompt_text.lower() for phrase in prescriptive_phrases)
    assert not is_prescriptive, "Prompt should not be prescriptive"

DO Test:
- Core functionality works correctly
- Error handling behaves appropriately
- Human agency is preserved in interactions
- Educational value is provided
- Components integrate properly
- Performance meets expectations
DON'T Test:
- Implementation details that might change
- External dependencies beyond your control
- Obvious Python language features
- FastMCP framework internals
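The "human agency is preserved" item above can only be checked mechanically in a rough, keyword-based way. The following is a minimal sketch of such a check, assuming prompts are plain strings. The names `asks_open_question` and `find_prescriptive_phrases` are hypothetical and not part of FastMCP or the helpers shown earlier. It matches question words as whole words, so that "somewhat" is not counted as "what".

```python
import re

QUESTION_WORDS = ("what", "how", "why", "when", "where")
PRESCRIPTIVE_PHRASES = ("you should", "you must", "the answer is")


def asks_open_question(prompt_text: str) -> bool:
    """Return True if the text contains a question word as a whole word."""
    pattern = r"\b(" + "|".join(QUESTION_WORDS) + r")\b"
    return re.search(pattern, prompt_text, flags=re.IGNORECASE) is not None


def find_prescriptive_phrases(prompt_text: str) -> list[str]:
    """Return every prescriptive phrase that occurs in the text."""
    lowered = prompt_text.lower()
    return [phrase for phrase in PRESCRIPTIVE_PHRASES if phrase in lowered]
```

Returning the list of matched phrases rather than a boolean lets an assertion message show exactly which wording triggered the failure. Keyword checks like these belong to the automated 80%. Whether a prompt genuinely leaves the decision with the learner still needs human review.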
# Good test names - describe behavior
def test_record_learning_preserves_human_input(): ...
def test_reflection_prompt_encourages_thinking(): ...
def test_stats_resource_calculates_correctly(): ...

# Bad test names - describe implementation
def test_record_learning_function(): ...
def test_file_writing(): ...
def test_json_serialization(): ...

@pytest.mark.philosophy
async def test_human_wisdom_preservation(journal_client):
    """
    Test that the system preserves human wisdom in the learning process.

    This test validates the 80-20 philosophy:
    - AI provides organization and structure (80%)
    - Human provides insight and judgment (20%)
    - Human wisdom is never replaced by AI analysis
    """
    # Test implementation here

Create .pre-commit-config.yaml:
repos:
  - repo: local
    hooks:
      - id: pytest-unit
        name: Run unit tests
        entry: pytest _mcp/tests/unit/ -x --tb=short
        language: system
        pass_filenames: false
      - id: pytest-integration
        name: Run integration tests
        entry: pytest _mcp/tests/integration/ -x --tb=short
        language: system
        pass_filenames: false

Create .github/workflows/test.yml:
name: Test MCP
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quote the versions: unquoted 3.10 is parsed by YAML as the number 3.1
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install -e .
          pip install pytest pytest-asyncio pytest-cov
      - name: Run unit tests
        run: pytest _mcp/tests/unit/ --cov=_mcp
      - name: Run integration tests
        run: pytest _mcp/tests/integration/
      - name: Run philosophy tests
        run: pytest -m philosophy --tb=short

Add to _mcp/tests/pytest.ini:
[pytest]
addopts =
    --cov=_mcp
    --cov-report=term-missing
    --cov-report=html
    --cov-fail-under=80

Testing MCPs ensures they deliver on the 80-20 promise: AI automation that preserves human wisdom. Your test suite should validate:
✅ Functionality: All components work correctly
✅ Integration: Components work together seamlessly
✅ Philosophy: Human agency and wisdom are preserved
✅ Performance: System meets reliability expectations
✅ Education: Tools provide learning value
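The philosophy tests in this tutorial are tagged with `@pytest.mark.philosophy` and selected with `pytest -m philosophy`. pytest warns about markers it does not know, and rejects them when `--strict-markers` is enabled, so a custom marker is usually registered once. One way to do that is sketched below, assuming a `conftest.py` in the test directory. The marker description text is only a suggestion.

```python
# conftest.py (sketch): register the custom marker used by `pytest -m philosophy`

# Assumed wording; adjust the description to match your project.
PHILOSOPHY_MARKER = "philosophy: tests that check human agency and the 80-20 balance"


def pytest_configure(config):
    """pytest hook: declare custom markers so they are not reported as unknown."""
    config.addinivalue_line("markers", PHILOSOPHY_MARKER)
```

The same registration can instead be written as a `markers` entry in the pytest configuration file. The hook version keeps the marker definition next to the fixtures that the philosophy tests use.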
Unit Tests
- All tools validate inputs correctly
- All resources return expected data
- All prompts generate appropriate content
- Error handling works properly
Integration Tests
- Server mounting preserves functionality
- Components interact correctly
- Data flows work end-to-end
E2E Tests
- Complete user workflows function
- Different transports work identically
- Performance meets requirements
Philosophy Tests
- Human decision-making is preserved
- Educational opportunities are provided
- Wisdom is encouraged, not replaced
- Human agency is maintained
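As a sketch of the first philosophy check, the test below asserts that a learner's own words are stored unchanged. `format_entry` is a hypothetical stand-in for whatever text the `record_learning` tool writes, and its layout simply mirrors the sample journal file used in the fixtures. In a real suite the assertion would run against the tool's actual output.

```python
from datetime import datetime


def format_entry(topic: str, key_insight: str, confidence_level: int,
                 timestamp: datetime) -> str:
    """Hypothetical stand-in for the text the record_learning tool writes."""
    return (
        f"## {timestamp:%Y-%m-%d %H:%M} - {topic}\n"
        f"**Key Insight**: {key_insight}\n"
        f"**Confidence Level**: {confidence_level}/10\n"
        "---\n"
    )


def test_entry_preserves_human_input_verbatim():
    """The learner's own words must appear unchanged in the stored entry."""
    insight = "Tests prevent regression bugs, mostly the ones I cause myself"
    entry = format_entry("Testing", insight, 7, datetime(2024, 1, 16, 10, 0))
    assert insight in entry   # no rewording or summarising of the insight
    assert "7/10" in entry    # self-assessed confidence is kept as given
```

The test deliberately checks for the exact input string rather than for keywords, because a paraphrased or "improved" insight would be the tool replacing human judgment instead of organizing it.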
- Creating Your First MCP - Build the MCP we tested
- FastMCP Quickstart Guide - Modular architecture patterns
- pytest Documentation - Complete testing framework
- FastMCP Client Docs - Client testing patterns
Good tests are documentation that never lies. They preserve not just functionality, but the philosophy behind why we build what we build.