Merged

Commits (23)
- `4c5a9ea` Enhance JSON Schema parsing with support for format validation, enum … (Agent-Hellboy, Sep 2, 2025)
- `6ba5aad` Add JSON Schema parser and property-based invariants system (Agent-Hellboy, Sep 2, 2025)
- `1b9d98f` Reorganize tests into component-based structure (Agent-Hellboy, Sep 2, 2025)
- `60e2b84` resolve comments (Agent-Hellboy, Sep 2, 2025)
- `ca63d4c` resolve comments (Agent-Hellboy, Sep 2, 2025)
- `29acdd2` resolve comments (Agent-Hellboy, Sep 2, 2025)
- `1a51639` resolve comments (Agent-Hellboy, Sep 2, 2025)
- `195542e` Merge pull request #75 from Agent-Hellboy/schema_parser_for_fuzzer (Agent-Hellboy, Sep 2, 2025)
- `021501b` resolve comments (Agent-Hellboy, Sep 3, 2025)
- `7852052` Reorganize tests into component-based structure (Agent-Hellboy, Sep 2, 2025)
- `4bcc324` resolve comments (Agent-Hellboy, Sep 3, 2025)
- `c11b082` Merge remote changes and convert test_realistic_strategies.py to pyte… (Agent-Hellboy, Sep 3, 2025)
- `26c2bf9` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `6dec921` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `f7b4ff2` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `7059ec1` Add manual run support for full test run (Agent-Hellboy, Sep 4, 2025)
- `3d451fa` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `d17897a` ci(tests): re-enable PR/push triggers, add permissions and concurrency (Agent-Hellboy, Sep 4, 2025)
- `b4ad42a` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `ee07f66` resolve comments (Agent-Hellboy, Sep 4, 2025)
- `49a4161` ci(tests): gate Codecov upload by token; skip forks; require coverage… (Agent-Hellboy, Sep 4, 2025)
- `7904614` fix: resolve all remaining PR #76 comments - component tests workflow… (Agent-Hellboy, Sep 4, 2025)
- `027ce10` fix: resolve critical PR #76 comments - executor deadlock, workflow a… (Agent-Hellboy, Sep 4, 2025)
130 changes: 130 additions & 0 deletions .github/workflows/component-tests.yml
@@ -0,0 +1,130 @@
name: Component Tests

on:
  workflow_dispatch:
    inputs:
      components:
        description: "Comma-separated components to run (auth,cli,client,fuzz_engine,safety[safety_system],transport)"
        required: false
        default: ""

permissions:
  contents: read
  id-token: write
  actions: read

jobs:
  component-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .
          pip install pytest pytest-cov pytest-asyncio

      - name: Determine changed components
        id: changes
        run: |
          # If manual input is provided, use that. Otherwise, run all components.
          INPUTS="${{ github.event.inputs.components }}"
          if [ -n "$INPUTS" ]; then
            AUTH_CHANGES=false
            CLI_CHANGES=false
            CLIENT_CHANGES=false
            FUZZ_ENGINE_CHANGES=false
            SAFETY_CHANGES=false
            TRANSPORT_CHANGES=false
            IFS=',' read -ra TOKENS <<< "$INPUTS"
            for t in "${TOKENS[@]}"; do
              t="${t//[[:space:]]/}"
              case "$t" in
                auth) AUTH_CHANGES=true ;;
                cli) CLI_CHANGES=true ;;
                client) CLIENT_CHANGES=true ;;
                fuzz_engine) FUZZ_ENGINE_CHANGES=true ;;
                safety|safety_system) SAFETY_CHANGES=true ;;
                transport) TRANSPORT_CHANGES=true ;;
              esac
            done
          else
            # Default to running all components on manual trigger
            AUTH_CHANGES=true
            CLI_CHANGES=true
            CLIENT_CHANGES=true
            FUZZ_ENGINE_CHANGES=true
            SAFETY_CHANGES=true
            TRANSPORT_CHANGES=true
          fi

          echo "auth=$AUTH_CHANGES" >> $GITHUB_OUTPUT
          echo "cli=$CLI_CHANGES" >> $GITHUB_OUTPUT
          echo "client=$CLIENT_CHANGES" >> $GITHUB_OUTPUT
          echo "fuzz_engine=$FUZZ_ENGINE_CHANGES" >> $GITHUB_OUTPUT
          echo "safety=$SAFETY_CHANGES" >> $GITHUB_OUTPUT
          echo "transport=$TRANSPORT_CHANGES" >> $GITHUB_OUTPUT

      - name: Run auth tests
        if: steps.changes.outputs.auth == 'true'
        run: pytest -vv tests/unit/auth --cov=mcp_fuzzer.auth --cov-report=xml:coverage.auth.xml

      - name: Run CLI tests
        if: steps.changes.outputs.cli == 'true'
        run: pytest -vv tests/unit/cli --cov=mcp_fuzzer.cli --cov-report=xml:coverage.cli.xml

      - name: Run client tests
        if: steps.changes.outputs.client == 'true'
        run: pytest -vv tests/unit/client --cov=mcp_fuzzer.client --cov-report=xml:coverage.client.xml

      - name: Run fuzz engine tests
        if: steps.changes.outputs.fuzz_engine == 'true'
        run: pytest -vv tests/unit/fuzz_engine --cov=mcp_fuzzer.fuzz_engine --cov-report=xml:coverage.fuzz_engine.xml

      - name: Run safety system tests
        if: steps.changes.outputs.safety == 'true'
        run: pytest -vv tests/unit/safety_system --cov=mcp_fuzzer.safety_system --cov-report=xml:coverage.safety_system.xml

      - name: Run transport tests
        if: steps.changes.outputs.transport == 'true'
        run: pytest -vv tests/unit/transport --cov=mcp_fuzzer.transport --cov-report=xml:coverage.transport.xml

      - name: Run integration tests
        if: ${{ steps.changes.outputs.auth == 'true' || steps.changes.outputs.cli == 'true' || steps.changes.outputs.client == 'true' || steps.changes.outputs.fuzz_engine == 'true' || steps.changes.outputs.safety == 'true' || steps.changes.outputs.transport == 'true' }}
        run: |
          pytest -vv tests/integration --cov=mcp_fuzzer --cov-report=xml:coverage.integration.xml

      - name: Check for coverage files
        id: coverage_check
        run: |
          if ls coverage.*.xml 1> /dev/null 2>&1; then
            echo "has_coverage=true" >> $GITHUB_OUTPUT
          else
            echo "has_coverage=false" >> $GITHUB_OUTPUT
          fi

      - name: Check Codecov token
        id: codecov_token
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          if [ -n "$CODECOV_TOKEN" ]; then
            echo "has_token=true" >> $GITHUB_OUTPUT
          else
            echo "has_token=false" >> $GITHUB_OUTPUT
          fi

      - name: Upload coverage to Codecov
        if: ${{ (steps.changes.outputs.auth == 'true' || steps.changes.outputs.cli == 'true' || steps.changes.outputs.client == 'true' || steps.changes.outputs.fuzz_engine == 'true' || steps.changes.outputs.safety == 'true' || steps.changes.outputs.transport == 'true') && steps.coverage_check.outputs.has_coverage == 'true' && steps.codecov_token.outputs.has_token == 'true' }}
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: coverage.*.xml
          fail_ci_if_error: true
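For a local sanity check of the "Determine changed components" step above, the same token parsing can be exercised in a plain bash shell. The `INPUTS` value below is an example; the variable names mirror the workflow (only two components are shown for brevity):

```shell
#!/usr/bin/env bash
# Mirror of the workflow's input parsing: split a comma-separated component
# list, strip whitespace from each token, and flip per-component flags.
INPUTS="auth, transport"

AUTH_CHANGES=false
TRANSPORT_CHANGES=false

IFS=',' read -ra TOKENS <<< "$INPUTS"
for t in "${TOKENS[@]}"; do
  t="${t//[[:space:]]/}"   # remove all whitespace inside the token
  case "$t" in
    auth) AUTH_CHANGES=true ;;
    transport) TRANSPORT_CHANGES=true ;;
  esac
done

echo "auth=$AUTH_CHANGES transport=$TRANSPORT_CHANGES"
```

Note the `<<<` here-string and `${t//…}` substitution are bashisms, matching the workflow's default bash shell on ubuntu-latest.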
38 changes: 35 additions & 3 deletions .github/workflows/tests.yml
@@ -2,28 +2,60 @@ name: Tests

 on:
   push:
-    branches: [main, master]
+    branches: [ main ]
   pull_request:
-    branches: [main, master]
+    branches: [ main ]
+  workflow_dispatch:
+    inputs:
+      reason:
+        description: "Why are you running the test workflow?"
+        required: false
+        default: "manual run"
+      pytest_args:
+        description: "Optional pytest args (e.g., -m 'unit and fuzz_engine')"
+        required: false
+        default: ""
+
+permissions:
+  contents: read
+  id-token: write
+
+concurrency:
+  group: tests-${{ github.workflow }}-${{ github.head_ref || github.ref }}
+  cancel-in-progress: true

 jobs:
   tests:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: '3.11'
           cache: 'pip'
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
           pip install .
           pip install pytest pytest-cov pytest-asyncio
       - name: Run tests with coverage
         run: |
-          pytest -vv
+          pytest -vv --cov=mcp_fuzzer --cov-report=xml ${{ github.event.inputs.pytest_args }}
+      - name: Check Codecov token
+        id: codecov_token
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
+        run: |
+          if [ -n "$CODECOV_TOKEN" ]; then
+            echo "has_token=true" >> $GITHUB_OUTPUT
+          else
+            echo "has_token=false" >> $GITHUB_OUTPUT
+          fi
       - name: Upload coverage to Codecov
+        if: ${{ steps.codecov_token.outputs.has_token == 'true' && hashFiles('coverage.xml') != '' && (github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false) }}
         uses: codecov/codecov-action@v4
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
36 changes: 34 additions & 2 deletions docs/architecture.md
@@ -244,6 +244,7 @@ The fuzzing engine orchestrates the testing process and manages test execution.

- `tool_fuzzer.py`: Tests individual tools with various argument combinations
- `protocol_fuzzer.py`: Tests MCP protocol types with various message structures
- `invariants.py`: Implements property-based invariants and checks for fuzz testing
- `executor.py`: Provides asynchronous execution framework with concurrency control and retry mechanisms

**Fuzzing Process:**
@@ -252,7 +253,8 @@ The fuzzing engine orchestrates the testing process and manages test execution.
2. **Strategy Selection**: Choose appropriate fuzzing strategy (realistic vs aggressive)
3. **Data Generation**: Generate test data using Hypothesis and custom strategies
4. **Execution**: Execute tests with controlled concurrency via AsyncFuzzExecutor
-5. **Analysis**: Analyze results and generate reports
+5. **Invariant Verification**: Verify responses against property-based invariants
+6. **Analysis**: Analyze results and generate reports

### 4. Strategy System

@@ -263,13 +265,43 @@ The strategy system generates test data using different approaches.
- `realistic/`: Generates valid, realistic data for functionality testing
- `aggressive/`: Generates malicious/malformed data for security testing
- `strategy_manager.py`: Orchestrates strategy selection and execution
- `schema_parser.py`: Parses JSON Schema definitions to generate appropriate test data

**Strategy Types:**

- **Realistic Strategies**: Generate valid Base64, UUIDs, timestamps, semantic versions
- **Aggressive Strategies**: Generate SQL injection, XSS, path traversal, buffer overflow attempts

-### 5. Safety System
+**Schema Parser:**

The schema parser provides comprehensive support for parsing JSON Schema definitions and generating appropriate test data based on schema specifications. It handles:

- Basic types: string, number, integer, boolean, array, object, null
- String constraints: minLength, maxLength, pattern, format
- Number/Integer constraints: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf
- Array constraints: minItems, maxItems, uniqueItems
- Object constraints: required properties, minProperties, maxProperties
- Schema combinations: oneOf, anyOf, allOf
- Enums and constants

The module supports both "realistic" and "aggressive" fuzzing strategies: realistic mode generates valid data conforming to the schema, while aggressive mode intentionally generates edge cases and invalid data to exercise error handling.

### 5. Invariants System

The invariants system provides property-based testing capabilities to verify response validity, error type correctness, and prevention of unintended crashes or unexpected states during fuzzing.

**Key Components:**

- `check_response_validity`: Ensures responses follow JSON-RPC 2.0 specification
- `check_error_type_correctness`: Verifies error responses have correct structure and codes
- `check_response_schema_conformity`: Validates responses against JSON schema definitions
- `verify_response_invariants`: Orchestrates multiple invariant checks on a single response
- `verify_batch_responses`: Applies invariant checks to batches of responses
- `check_state_consistency`: Ensures server state remains consistent during fuzzing

These invariants serve as runtime assertions that validate the behavior of the server being tested, helping to identify potential issues that might not be caught by simple error checking.
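As an illustration only, the response-validity invariant reduces to a few structural assertions on the JSON-RPC 2.0 envelope. The helper below is a simplified sketch, not the project's actual `check_response_validity`:

```python
class InvariantViolation(Exception):
    """Raised when a response breaks a property-based invariant (sketch)."""

def check_response_validity(response):
    """Simplified sketch of a JSON-RPC 2.0 response invariant.

    Not the project's real implementation; it only captures the core rules:
    the response is an object with jsonrpc "2.0", an id, and exactly one of
    result/error.
    """
    if not isinstance(response, dict):
        raise InvariantViolation("response must be a JSON object")
    if response.get("jsonrpc") != "2.0":
        raise InvariantViolation("jsonrpc member must be exactly '2.0'")
    if "id" not in response:
        raise InvariantViolation("response must echo a request id")
    if ("result" in response) == ("error" in response):
        raise InvariantViolation("exactly one of result/error must be present")
    return True

# A well-formed success response passes; result and error together would not.
assert check_response_validity({"jsonrpc": "2.0", "id": 1, "result": "ok"})
```

Running such checks against every fuzzed response turns "the server answered something" into "the server answered something well-formed".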

### 6. Safety System

The safety system provides multiple layers of protection against dangerous operations.

70 changes: 70 additions & 0 deletions docs/reference.md
@@ -113,13 +113,15 @@ mcp_fuzzer/
    protocol_fuzzer.py     # Orchestrates protocol-type fuzzing
    tool_fuzzer.py         # Orchestrates tool fuzzing
    strategy/
      schema_parser.py     # JSON Schema parser for test data generation
      strategy_manager.py  # Selects strategies per phase/type
      realistic/
        tool_strategy.py
        protocol_type_strategy.py
      aggressive/
        tool_strategy.py
        protocol_type_strategy.py
    invariants.py          # Property-based invariants and checks
    runtime/
      manager.py           # Async ProcessManager (start/stop, signals)
      watchdog.py          # ProcessWatchdog (hang detection)
@@ -143,6 +145,74 @@ mcp_fuzzer/
  client.py                # UnifiedMCPFuzzerClient orchestrator
```

## Schema Parser

The schema parser module (`mcp_fuzzer.fuzz_engine.strategy.schema_parser`) provides comprehensive support for parsing JSON Schema definitions and generating appropriate test data based on schema specifications.

### Features

- **Basic Types**: Handles string, number, integer, boolean, array, object, and null types
- **String Constraints**: Supports minLength, maxLength, pattern, and format validations
- **Number/Integer Constraints**: Handles minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf
- **Array Constraints**: Supports minItems, maxItems, uniqueItems
- **Object Constraints**: Handles required properties, minProperties, and additionalProperties (`additionalProperties: false` forbids extra properties)
- **Schema Combinations**: Processes oneOf, anyOf, allOf schema combinations with proper constraint merging
- **Enums and Constants**: Supports enum values and const keyword (both in realistic and aggressive modes)
- **Fuzzing Phases**: Supports both "realistic" (valid) and "aggressive" (edge cases) modes

### Example Usage

```python
from mcp_fuzzer.fuzz_engine.strategy.schema_parser import make_fuzz_strategy_from_jsonschema

# Define a JSON schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 3, "maxLength": 50},
        "age": {"type": "integer", "minimum": 18, "maximum": 120},
        "email": {"type": "string", "format": "email"}
    },
    "required": ["name", "age"]
}

# Generate realistic data
realistic_data = make_fuzz_strategy_from_jsonschema(schema, phase="realistic")

# Generate aggressive data for security testing
aggressive_data = make_fuzz_strategy_from_jsonschema(schema, phase="aggressive")
```

## Invariants System

The invariants module (`mcp_fuzzer.fuzz_engine.invariants`) provides property-based testing capabilities to verify response validity, error type correctness, and prevention of unintended crashes or unexpected states during fuzzing.

### Features

- **Response Validity**: Ensures responses follow JSON-RPC 2.0 specification
- **Error Type Correctness**: Verifies error responses have correct structure and codes
- **Schema Conformity**: Validates responses against JSON schema definitions
- **Batch Verification**: Applies invariant checks to batches of responses
- **State Consistency**: Ensures server state remains consistent during fuzzing

### Example Usage

```python
from mcp_fuzzer.fuzz_engine.invariants import verify_response_invariants, InvariantViolation

# Verify a response against invariants
try:
    verify_response_invariants(
        response={"jsonrpc": "2.0", "id": 1, "result": "success"},
        expected_error_codes=[400, 404, 500],
        schema={"type": "object", "properties": {"result": {"type": "string"}}}
    )
    # Response is valid
except InvariantViolation as e:
    # Invariant violation detected
    print(f"Violation: {e}")
```

- Strategy: Generates inputs for tools and protocol types in two phases:
  - realistic (valid/spec-conformant), aggressive (malformed/attack vectors).
- Fuzzer: Runs strategies, sends envelopes via a transport, and records results.
14 changes: 11 additions & 3 deletions mcp_fuzzer/fuzz_engine/executor.py
@@ -169,9 +169,8 @@ async def execute_batch(
"""

         async def _bounded_execute_and_track(op, args, kwargs):
-            # Acquire semaphore before execution and release after
-            async with self._semaphore:
-                return await self._execute_and_track(op, args, kwargs)
+            # Concurrency is enforced inside execute(); avoid double-acquire deadlock
+            return await self._execute_and_track(op, args, kwargs)

         # Create bounded tasks that respect the semaphore limit
         tasks = []
@@ -242,3 +241,12 @@ async def shutdown(self, timeout: float = 5.0) -> None:
                     "Shutdown timed out with %d tasks still running",
                     len(self._running_tasks),
                 )
+            # Proactively cancel outstanding tasks and wait for them to finish
+            for task in list(self._running_tasks):
+                task.cancel()
+            await asyncio.gather(*self._running_tasks, return_exceptions=True)
+        finally:
Comment on lines +244 to +248
🛠️ Refactor suggestion

**Cancel-then-await can hang indefinitely; bound the second wait.**

If tasks ignore cancellation or are stuck in uninterruptible awaits, the unconditional `gather()` can block shutdown forever. Apply this diff to add a bounded "grace" wait and log if tasks remain:

             for task in list(self._running_tasks):
                 task.cancel()
-            await asyncio.gather(*self._running_tasks, return_exceptions=True)
+            try:
+                await asyncio.wait_for(
+                    asyncio.gather(*self._running_tasks, return_exceptions=True),
+                    timeout=min(2.0, timeout),
+                )
+            except asyncio.TimeoutError:
+                self._logger.error(
+                    "Forced shutdown still timed out; %d tasks may still be running",
+                    sum(1 for t in self._running_tasks if not t.done()),
+                )
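The hang the suggestion guards against is easy to reproduce in isolation. The snippet below is a self-contained illustration (a hypothetical stubborn task, not project code): a task that swallows its first cancellation would block a bare `gather()` indefinitely, while `asyncio.wait_for` gives up after the grace period:

```python
import asyncio

async def stubborn():
    # Hypothetical worst case: the task swallows its first cancellation.
    try:
        await asyncio.sleep(60)
    except asyncio.CancelledError:
        await asyncio.sleep(60)  # keeps running despite cancel()

async def shutdown_demo():
    task = asyncio.create_task(stubborn())
    await asyncio.sleep(0)  # let the task start
    task.cancel()
    try:
        # Bounded grace period instead of an unconditional gather()
        await asyncio.wait_for(
            asyncio.gather(task, return_exceptions=True), timeout=0.1
        )
        return False  # everything wound down in time
    except asyncio.TimeoutError:
        return True   # gave up after the grace period instead of hanging

print(asyncio.run(shutdown_demo()))  # True
```

`wait_for` also cancels the inner `gather` on timeout, so the stubborn task receives a second cancellation rather than being leaked silently.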

+            # Ensure the set is cleaned up
+            self._running_tasks = {
+                t for t in self._running_tasks if not t.cancelled() and not t.done()
+            }
Comment on lines +249 to +252
⚠️ Potential issue

**Rebinding `_running_tasks` breaks done callbacks; mutate in place instead.**

`add_done_callback(self._running_tasks.discard)` captures the old set object. Reassigning `self._running_tasks` creates a new set, so callbacks remove from the stale set, leaking entries in the current set. Apply this diff to preserve the set identity and safely remove completed/cancelled tasks:

-            # Ensure the set is cleaned up
-            self._running_tasks = {
-                t for t in self._running_tasks if not t.cancelled() and not t.done()
-            }
+            # Ensure the set is cleaned up without rebinding (callbacks rely on identity)
+            to_remove = {t for t in tuple(self._running_tasks) if t.done() or t.cancelled()}
+            self._running_tasks.difference_update(to_remove)
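The identity pitfall flagged above can be demonstrated with plain sets, no asyncio required (the names below are illustrative, not the executor's):

```python
running = {"task-a", "task-b"}
discard = running.discard  # stands in for add_done_callback(running.discard)

# Rebinding creates a brand-new set; the saved callback still targets the old one.
running = {"task-a", "task-b"}
discard("task-a")          # removes from the STALE set, not the live one

print("task-a" in running)  # True
```

`difference_update` (or looping with `discard`) mutates the original set object, so every previously registered callback keeps pointing at live state.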
