# Wolfram MCP Server - Stress Tests

This notebook contains stress tests to verify the robustness of the proof-carrying Wolfram MCP server.

## Test Categories

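1. **Timeout Handling** - Long computations should abort gracefully
2. **Memory Pressure** - Large expressions and data structures
3. **Rapid Fire Requests** - Many sequential calls
4. **Error Recovery** - Malformed input, kernel errors
5. **Session State Integrity** - State persists correctly across stress
6. **Numeric Validation Edge Cases** - Complex numbers, branch cuts, singularities
7. **Obligation Engine Load** - Many obligations registered and checked
8. **Test Suite Stress** - Large Math CI suites and repeated runs
9. **Concurrent Sessions** - Multiple independent sessions
10. **Domain and Semantic Analysis** - Domain inference, semantic diff, canonicalization, hashing
11. **Graphics Under Load** - 2D and 3D plots, rapid plotting
12. **Edge Cases in Proof-Carrying Computation** - Unusual assumptions and generated conditions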

---

## Test 1: Timeout Handling

Verify that long-running computations abort gracefully without crashing the server.

```python
# Test 1a: Computation that exceeds timeout
wolfram_eval(
    code="Do[PrimeQ[2^i - 1], {i, 1, 100000}]",  # Very slow
    timeout=2  # 2 second timeout
)
# Expected: Aborted gracefully, returns timeout message

# Test 1b: Infinite loop detection
wolfram_eval(
    code="While[True, x++]",
    timeout=3
)
# Expected: Aborted after 3 seconds

# Test 1c: Very long symbolic computation
wolfram_eval(
    code="Integrate[Exp[-x^2] * BesselJ[100, x], {x, 0, Infinity}]",
    timeout=5
)
# Expected: Either completes or aborts cleanly

# Test 1d: Server still responsive after timeout
wolfram_eval("1 + 1")
# Expected: Returns 2 immediately - server recovered
```
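
The tests above state the expected outcome but not how to assert it. A minimal sketch of a timing check follows, assuming the tool can be invoked as an ordinary Python callable. `call_within` and the stand-in `slow_call` are hypothetical names for illustration, not part of the server.

```python
import time

def call_within(fn, limit_s, slack_s=1.0):
    """Call fn() and report whether it returned within limit_s + slack_s seconds.

    Returns (result, elapsed_seconds, on_time).
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed, elapsed <= limit_s + slack_s

# Hypothetical stand-in for a tool call that the server aborts at its timeout.
def slow_call():
    time.sleep(0.05)
    return "timeout"

result, elapsed, on_time = call_within(slow_call, limit_s=0.05, slack_s=2.0)
```

In a real run, `fn` would wrap the tool call, for example `lambda: wolfram_eval(code="While[True, x++]", timeout=3)` with `limit_s=3`.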

## Test 2: Memory Pressure

Test handling of large data structures.

```python
# Test 2a: Large list creation
wolfram_eval("largeList = Range[10^6]; Length[largeList]")
# Expected: Returns 1000000

# Test 2b: Large matrix operations
wolfram_eval("""
    m = RandomReal[{0, 1}, {500, 500}];
    {Det[m], Tr[m]}
""", timeout=60)
# Expected: Returns determinant and trace

# Test 2c: Deeply nested expression
wolfram_eval("""
    expr = Nest[Function[x, {x, x}], a, 20];
    LeafCount[expr]
""")
# Expected: Returns 2^21 - 1 = 2097151 (LeafCount also counts the List heads)

# Test 2d: Large polynomial expansion
wolfram_eval("Expand[(x + y + z)^30]", timeout=30)
# Expected: Large polynomial (496 terms)

# Test 2e: Clear large objects to free memory
wolfram_eval("Clear[largeList, m, expr]; MemoryInUse[]")
# Expected: Symbols cleared; returns the kernel's current memory use in bytes
```
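
The expected counts for Tests 2c and 2d can be derived independently of the kernel. The sketch below is plain Python arithmetic; the helper names are illustrative only.

```python
from math import comb

def nested_pair_leaf_count(depth):
    """LeafCount of Nest[Function[x, {x, x}], a, depth].

    LeafCount also counts the List head at every level, so
    count(0) = 1 and count(n) = 2 * count(n - 1) + 1.
    """
    count = 1
    for _ in range(depth):
        count = 2 * count + 1
    return count

def expanded_term_count(num_vars, power):
    """Number of distinct monomials in (x1 + ... + xk)^power (stars and bars)."""
    return comb(power + num_vars - 1, num_vars - 1)

leaf_count_2c = nested_pair_leaf_count(20)   # 2^21 - 1
term_count_2d = expanded_term_count(3, 30)   # C(32, 2)
```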

## Test 3: Rapid Fire Requests

Many sequential calls to test session stability.

```python
# Test 3a: 100 rapid evaluations
for i in range(100):
    result = wolfram_eval(f"Prime[{i+1}]")
    # Verify each prime is correct

# Test 3b: Accumulating state
wolfram_eval("counter = 0")
for i in range(50):
    wolfram_eval("counter++")
result = wolfram_eval("counter")
# Expected: Returns 50

# Test 3c: Rapid definition and use
for i in range(20):
    wolfram_define(f"func{i}", f"[x_] := x^{i}")
    wolfram_eval(f"func{i}[2]")
    # Expected: Returns 2^i each time

# Test 3d: Session state after rapid fire
wolfram_session_info()
# Expected: Shows all defined functions, stable state
```
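
Test 3a says to verify each prime but does not show how. One option is a reference list computed locally, sketched below. `first_primes` and `check_prime_result` are hypothetical helpers, and the sketch assumes the tool result can be converted with `int()`.

```python
def first_primes(n):
    """Return the first n primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

expected_primes = first_primes(100)

def check_prime_result(i, value):
    """Compare the value returned for Prime[i + 1] with the reference list."""
    return int(value) == expected_primes[i]
```

Inside the loop of Test 3a, the check would read `assert check_prime_result(i, result)`.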

## Test 4: Error Recovery

Test that the server recovers from various error conditions.

```python
# Test 4a: Syntax error
wolfram_eval("(1 + 2")  # Unbalanced parenthesis ("1 + + 2" would parse as 1 + (+2))
# Expected: Error message, no crash

# Test 4b: Undefined symbol
wolfram_eval("undefinedFunction[x]")
# Expected: Returns unevaluated or error

# Test 4c: Division by zero
wolfram_eval("1/0")
# Expected: Returns ComplexInfinity or error

# Test 4d: Invalid domain
wolfram_eval("Sqrt[-1]")
# Expected: Returns I (imaginary unit)

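# Test 4e: Recursion limit
# The "1 +" keeps each call pending, so $RecursionLimit is reached;
# a bare tail call would hit $IterationLimit instead.
wolfram_eval("""
    badRecursion[n_] := 1 + badRecursion[n + 1];
    badRecursion[0]
""", timeout=5)
# Expected: Stops with a $RecursionLimit message, no crash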

# Test 4f: Server still works after errors
wolfram_eval("2 + 2")
# Expected: Returns 4 - server recovered

# Test 4g: Malformed expression passed to a tool
wolfram_typed_equality(
    lhs="incomplete expression [",
    rhs="1",
    equality_type="exact"
)
# Expected: Graceful error, no crash
```
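
Tests 4a through 4g share one pattern: send a bad input, then confirm a trivial probe still works. A minimal sketch of that loop follows, assuming the evaluator is a Python callable that returns a string. `check_recovery` and `fake_evaluate` are hypothetical names, and the real server's error format is not assumed.

```python
def check_recovery(evaluate, bad_inputs, probe="2 + 2", probe_expected="4"):
    """Send each bad input, then confirm the probe still evaluates correctly.

    Returns the list of bad inputs after which the probe failed.
    """
    unrecovered = []
    for code in bad_inputs:
        try:
            evaluate(code)  # an error reply or an exception is acceptable here
        except Exception:
            pass
        try:
            healthy = str(evaluate(probe)).strip() == probe_expected
        except Exception:
            healthy = False
        if not healthy:
            unrecovered.append(code)
    return unrecovered

# Hypothetical stand-in evaluator: rejects everything except the probe.
def fake_evaluate(code):
    if code == "2 + 2":
        return "4"
    raise ValueError(f"cannot evaluate: {code}")

unrecovered = check_recovery(fake_evaluate, ["(1 + 2", "1/0"])
```

An empty `unrecovered` list means the server answered the probe correctly after every bad input.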

## Test 5: Session State Integrity

Verify state persists correctly under stress.

```python
# Test 5a: Define multiple interdependent functions
wolfram_eval("""
    f[x_] := x^2;
    g[x_] := f[x] + 1;
    h[x_] := g[f[x]];
""")

# Test 5b: Verify all functions work
wolfram_eval("{f[3], g[3], h[3]}")
# Expected: {9, 10, 82}

# Test 5c: Define global variables
wolfram_eval("""
    $myConstant = Pi/4;
    $myList = {1, 2, 3, 4, 5};
""")

# Test 5d: Use globals in computation
wolfram_eval("Total[$myList] * Sin[$myConstant]")
# Expected: 15 * Sin[Pi/4] = 15/Sqrt[2]

# Test 5e: Stress test then verify state
for i in range(30):
    wolfram_eval(f"temp{i} = {i}^2")

# Verify original functions still work
wolfram_eval("{f[5], g[5], h[5]}")
# Expected: {25, 26, 626} - state preserved

# Test 5f: Check session info is accurate
wolfram_session_info()
# Expected: Shows all defined symbols
```
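
The expected values in this section are easy to get wrong by hand, so it can help to mirror the three definitions in plain Python and compute them. This is only a cross-check of the arithmetic and does not involve the server.

```python
import math

def f(x):
    return x ** 2

def g(x):
    return f(x) + 1

def h(x):
    return g(f(x))

expected_5b = [f(3), g(3), h(3)]
expected_5d = 15 * math.sin(math.pi / 4)   # Total[{1, 2, 3, 4, 5}] * Sin[Pi/4]
expected_5e = [f(5), g(5), h(5)]
```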

## Test 6: Numeric Validation Edge Cases

Test numeric validation with tricky mathematical cases.

```python
# Test 6a: Complex numbers
wolfram_numeric_validate(
    expr1="Exp[I*x]",
    expr2="Cos[x] + I*Sin[x]",
    variables="x",
    num_points=20
)
# Expected: All pass (Euler's formula)

# Test 6b: Near singularities
wolfram_numeric_validate(
    expr1="(x^2 - 1)/(x - 1)",
    expr2="x + 1",
    variables="x",
    domain_constraints="x > 1.001 || x < 0.999",  # Avoid x=1
    num_points=20
)
# Expected: All pass

# Test 6c: Branch cuts
wolfram_numeric_validate(
    expr1="Log[x*y]",
    expr2="Log[x] + Log[y]",
    variables="x, y",
    domain_constraints="x > 0 && y > 0",  # Principal branch
    num_points=20
)
# Expected: All pass on positive reals

# Test 6d: Numerical instability region
wolfram_numeric_validate(
    expr1="(1 - Cos[x])/x^2",
    expr2="1/2 - x^2/24",  # Taylor approximation
    variables="x",
    domain_constraints="-0.1 < x < 0.1 && x != 0",
    num_points=10,
    tolerance=0.01  # Looser tolerance for approximation
)
# Expected: Should pass with loose tolerance

# Test 6e: Expressions that are NOT equal
wolfram_numeric_validate(
    expr1="Sin[x]",
    expr2="x",  # Only equal near 0
    variables="x",
    domain_constraints="x > 1",
    num_points=10
)
# Expected: FAILS - detects inequality
```
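
For readers unfamiliar with the technique, numeric validation amounts to comparing two expressions at sampled points inside a domain. The sketch below illustrates the general idea in plain Python. It is not the server's implementation, whose sampling strategy is not described here, and `numeric_validate` with its parameters is a hypothetical stand-in.

```python
import cmath
import random

def numeric_validate(expr1, expr2, sample, num_points=20, tolerance=1e-9, seed=0):
    """Compare two callables at random sample points.

    sample(rng) must return a point inside the intended domain.
    Returns True when every point agrees within the absolute tolerance.
    """
    rng = random.Random(seed)
    for _ in range(num_points):
        x = sample(rng)
        if abs(expr1(x) - expr2(x)) > tolerance:
            return False
    return True

# Mirrors 6a: Euler's formula on real x in [-10, 10]
euler_ok = numeric_validate(
    lambda x: cmath.exp(1j * x),
    lambda x: cmath.cos(x) + 1j * cmath.sin(x),
    sample=lambda rng: rng.uniform(-10, 10),
)

# Mirrors 6e: Sin[x] and x differ everywhere on x > 1
sin_equals_x = numeric_validate(
    cmath.sin,
    lambda x: x,
    sample=lambda rng: rng.uniform(1.0, 10.0),
    num_points=10,
)
```

Passing a sampler instead of a constraint string keeps the sketch short; a constraint such as `x > 1.001 || x < 0.999` would be handled by rejecting samples that violate it.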

## Test 7: Obligation Engine Load

Test the obligation engine under heavy load.

```python
# Test 7a: Register many obligations
for i in range(50):
    wolfram_register_obligation(
        name=f"test_obligation_{i}",
        description=f"Test that x^{i} derivative is correct",
        test_type="identity",
        test_expression=f"D[x^{i}, x]",
        expected=f"{i}*x^{i-1}" if i > 0 else "0"
    )

# Test 7b: Check all obligations at once
wolfram_check_obligations()
# Expected: All 50 pass

# Test 7c: List all obligations
wolfram_list_obligations()
# Expected: Shows 50 obligations with status

# Test 7d: Mixed pass/fail obligations
wolfram_register_obligation(
    name="intentional_fail",
    description="This should fail",
    test_type="identity",
    test_expression="1 + 1",
    expected="3"  # Wrong!
)

wolfram_check_obligations()
# Expected: Shows 50 passed, 1 failed
```
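
Before registering 50 obligations, it can be useful to sanity-check the generated payloads locally, in particular the `i = 0` edge case. The sketch below rebuilds the arguments from Test 7a and spot-checks the power rule with a finite difference. The helper names are hypothetical and nothing here calls the server.

```python
def derivative_obligation(i):
    """Build the keyword arguments used in Test 7a for the power x^i."""
    return {
        "name": f"test_obligation_{i}",
        "description": f"Test that x^{i} derivative is correct",
        "test_type": "identity",
        "test_expression": f"D[x^{i}, x]",
        "expected": f"{i}*x^{i - 1}" if i > 0 else "0",
    }

def power_rule_holds(i, x=1.5, step=1e-6, rel_tol=1e-5):
    """Spot-check d/dx x^i == i * x^(i-1) with a central finite difference."""
    numeric = ((x + step) ** i - (x - step) ** i) / (2 * step)
    exact = i * x ** (i - 1) if i > 0 else 0.0
    return abs(numeric - exact) <= rel_tol * max(1.0, abs(exact))

obligations = [derivative_obligation(i) for i in range(50)]
all_names_unique = len({o["name"] for o in obligations}) == len(obligations)
all_rules_hold = all(power_rule_holds(i) for i in range(50))
```

Each dictionary can then be passed on as `wolfram_register_obligation(**payload)`.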

## Test 8: Test Suite Stress

Stress test the Math CI system.

```python
# Test 8a: Create large test suite
wolfram_create_test_suite(
    name="Derivative Rules",
    description="Comprehensive derivative identity tests"
)

# Add test cases (target: 30; the first nine are shown)
test_cases = [
    ("power_rule", "D[x^n, x]", "n*x^(n-1)"),
    ("exp_rule", "D[Exp[x], x]", "Exp[x]"),
    ("log_rule", "D[Log[x], x]", "1/x"),
    ("sin_rule", "D[Sin[x], x]", "Cos[x]"),
    ("cos_rule", "D[Cos[x], x]", "-Sin[x]"),
    ("tan_rule", "D[Tan[x], x]", "Sec[x]^2"),
    ("chain_rule", "D[Sin[x^2], x]", "2*x*Cos[x^2]"),
    ("product_rule", "D[x*Sin[x], x]", "Sin[x] + x*Cos[x]"),
    ("quotient", "D[Sin[x]/x, x]", "(x*Cos[x] - Sin[x])/x^2"),
    # ... more cases
]

for name, expr, expected in test_cases:
    wolfram_add_test(
        suite_name="Derivative Rules",
        test_name=name,
        category="identity",
        expression=expr,
        expected=expected
    )

# Test 8b: Run full suite
wolfram_run_test_suite("Derivative Rules")
# Expected: All tests pass

# Test 8c: Run suite multiple times (idempotency)
for _ in range(5):
    result = wolfram_run_test_suite("Derivative Rules")
# Expected: Same results each time
```
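
Test 8c collects results but never compares them. A minimal sketch of the comparison follows, with hypothetical stand-in suites in place of `wolfram_run_test_suite`. The result shape (`passed`/`failed` counts) is an assumption made for illustration.

```python
def is_idempotent(run, repeats=5):
    """Call run() several times and report whether every result equals the first."""
    results = [run() for _ in range(repeats)]
    return all(r == results[0] for r in results)

# Hypothetical stand-ins: a stable suite and one whose output drifts between runs.
def stable_suite():
    return {"passed": 9, "failed": 0}

_calls = {"n": 0}

def drifting_suite():
    _calls["n"] += 1
    return {"passed": 9 - _calls["n"] % 2, "failed": _calls["n"] % 2}

stable = is_idempotent(stable_suite)
drifting = is_idempotent(drifting_suite)
```

If real results carry volatile fields such as timestamps or durations, those would need to be stripped before comparing.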

## Test 9: Concurrent Sessions

Test multiple independent sessions.

```python
# Test 9a: Create separate sessions
wolfram_eval("sessionVar = 100", session_id="session_A")
wolfram_eval("sessionVar = 200", session_id="session_B")
wolfram_eval("sessionVar = 300", session_id="session_C")

# Test 9b: Verify session isolation
result_A = wolfram_eval("sessionVar", session_id="session_A")  # -> 100
result_B = wolfram_eval("sessionVar", session_id="session_B")  # -> 200
result_C = wolfram_eval("sessionVar", session_id="session_C")  # -> 300
# Expected: Each session has its own value

# Test 9c: Independent function definitions
wolfram_define("f", "[x_] := x^2", session_id="session_A")
wolfram_define("f", "[x_] := x^3", session_id="session_B")

wolfram_eval("f[2]", session_id="session_A")  # -> 4
wolfram_eval("f[2]", session_id="session_B")  # -> 8
# Expected: Different function definitions per session

# Test 9d: Session info per session
wolfram_session_info(session_id="session_A")
wolfram_session_info(session_id="session_B")
# Expected: Each shows only its own symbols

# Test 9e: Clear one session without affecting others
wolfram_clear_session(session_id="session_A")
wolfram_eval("sessionVar", session_id="session_B")  # -> 200 (unaffected)
```
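
The calls above run one after another, so they check isolation but not true concurrency. One way to issue them in parallel is a thread pool, sketched below against an in-memory stand-in. `fake_set` and `fake_get` are hypothetical placeholders for `wolfram_eval` with a `session_id`, and whether the real client is thread-safe is not known from this document.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical in-memory stand-in for per-session kernel state.
_state = {}
_lock = threading.Lock()

def fake_set(session_id, value):
    with _lock:
        _state.setdefault(session_id, {})["sessionVar"] = value

def fake_get(session_id):
    with _lock:
        return _state[session_id]["sessionVar"]

def set_then_get(item):
    """Assign sessionVar in one session, then read it back from the same session."""
    session_id, value = item
    fake_set(session_id, value)
    return session_id, fake_get(session_id)

assignments = {"session_A": 100, "session_B": 200, "session_C": 300}

# Issue the three set/get pairs from separate worker threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    observed = dict(pool.map(set_then_get, assignments.items()))
```

Isolation holds when `observed` equals `assignments`, that is, when no session reads a value written by another.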

## Test 10: Domain and Semantic Analysis

Stress test domain inference and semantic diff.

```python
# Test 10a: Complex domain inference
wolfram_infer_domain(
    expression="Log[x] + Sqrt[1-x^2] + 1/(x-2)",
    variables="x"
)
# Expected: 0 < x <= 1 (the x != 2 exclusion is redundant because 2 > 1)

# Test 10b: Multi-variable domain
wolfram_infer_domain(
    expression="Sqrt[x] + Sqrt[y] + 1/(x+y)",
    variables="x, y"
)
# Expected: x >= 0, y >= 0, x + y != 0

# Test 10c: Semantic diff with many terms
wolfram_semantic_diff(
    expr1="Expand[(a+b+c+d+e)^4]",
    expr2="a^4 + 4*a^3*b + 6*a^2*b^2 + 4*a*b^3 + b^4 + ...",  # partial
    canonicalization="Expand"
)
# Expected: Detects missing terms

# Test 10d: Canonicalization stress
for method in ["rational", "polynomial", "trig", "full"]:
    wolfram_canonicalize(
        expression="Sin[x]^2 + 2*Sin[x]*Cos[x] + Cos[x]^2",
        method=method
    )
# Expected: All methods work, produce valid canonical forms

# Test 10e: Expression hash collision test
import json  # needed for json.loads below

hashes = set()
for i in range(100):
    result = wolfram_expression_hash(f"x^{i} + {i}")
    hash_val = json.loads(result)["hash"]
    assert hash_val not in hashes  # No collisions
    hashes.add(hash_val)
```
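
The expected domain in Test 10a can be cross-checked by evaluating the expression at a few sample points with real arithmetic. The sketch below is plain Python and independent of `wolfram_infer_domain`; `is_real_and_finite` is a hypothetical helper.

```python
import math

def is_real_and_finite(x):
    """True when Log[x] + Sqrt[1 - x^2] + 1/(x - 2) is a finite real number at x."""
    if x <= 0:            # Log[x] is not real for x <= 0
        return False
    if 1 - x * x < 0:     # Sqrt[1 - x^2] is not real for |x| > 1
        return False
    if x == 2:            # pole of 1/(x - 2)
        return False
    value = math.log(x) + math.sqrt(1 - x * x) + 1 / (x - 2)
    return math.isfinite(value)

samples = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
inside = [x for x in samples if is_real_and_finite(x)]
```

Only the samples in `(0, 1]` survive, and `x = 2` is already rejected by the square-root condition before the pole check is reached.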

## Test 11: Graphics Under Load

Test plotting functionality.

```python
import math  # Pi is not defined in Python; use math.pi for numeric ranges

# Test 11a: Simple plot
wolfram_plot(
    expression="Sin[x]*Exp[-x/10]",
    variable="x",
    range_min=0,
    range_max=20
)
# Expected: Returns base64 image

# Test 11b: Complex plot
wolfram_plot(
    expression="{Sin[x], Cos[x], Sin[2x], Cos[2x]}",
    variable="x",
    range_min=-math.pi,
    range_max=math.pi,
    options="PlotLegends -> Automatic"
)
# Expected: Multi-curve plot

# Test 11c: 3D plot
wolfram_plot3d(
    expression="Sin[x*y]",
    var1="x", range1_min=-3, range1_max=3,
    var2="y", range2_min=-3, range2_max=3,
    options="ColorFunction -> \"Rainbow\""
)
# Expected: Returns 3D surface as base64

# Test 11d: Multiple rapid plots
for i in range(10):
    wolfram_plot(
        expression=f"Sin[{i}*x]",
        variable="x",
        range_min=0,
        range_max=2 * math.pi
    )
# Expected: All plots generated successfully
```
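
"Returns base64 image" can be checked mechanically. The sketch below decodes a payload and looks for the PNG file signature. It assumes the server returns PNG data as a bare base64 string, which this document does not confirm, and `looks_like_base64_png` is a hypothetical helper.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_base64_png(payload):
    """Return True if payload is valid base64 that decodes to PNG-signed bytes."""
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(PNG_SIGNATURE)

# Self-check with synthetic data (not real server output).
good = base64.b64encode(PNG_SIGNATURE + b"rest-of-image").decode("ascii")
bad = base64.b64encode(b"not an image").decode("ascii")
```

If the server wraps the image in JSON or a data URI, the base64 part would have to be extracted first.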

## Test 12: Edge Cases in Proof-Carrying Computation

```python
# Test 12a: Empty assumptions
wolfram_eval_proven(
    code="Integrate[x^2, x]",
    assumptions=None
)
# Expected: Works with no assumptions

# Test 12b: Contradictory assumptions
wolfram_eval_proven(
    code="Simplify[Sqrt[x^2]]",
    assumptions="x > 0 && x < 0"  # Impossible
)
# Expected: Handles gracefully

# Test 12c: Very complex assumptions
wolfram_eval_proven(
    code="Integrate[x^(a-1) * Exp[-b*x], {x, 0, Infinity}]",
    assumptions="a > 0 && b > 0 && Element[a, Reals] && Element[b, Reals]"
)
# Expected: Returns Gamma[a]/b^a with conditions

# Test 12d: Condition extraction edge case
wolfram_eval_proven(
    code="Integrate[1/x^p, {x, 1, Infinity}]",
    generate_conditions=True
)
# Expected: Result with condition Re[p] > 1

# Test 12e: Typed equality with impossible types
wolfram_typed_equality(
    lhs="I",
    rhs="1",
    equality_type="exact"
)
# Expected: verified: false (imaginary != real)
```
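
The condition expected in Test 12d can be illustrated numerically for real `p`. The integral of `x^(-p)` from 1 to `T` has the closed form `(T^(1-p) - 1)/(1 - p)`, which settles at `1/(p - 1)` only when `p > 1`. The sketch below evaluates that closed form in plain Python; it does not call the server.

```python
import math

def truncated_integral(p, upper):
    """Closed form of the integral of x^(-p) from 1 to upper, for real p."""
    if p == 1:
        return math.log(upper)
    return (upper ** (1 - p) - 1) / (1 - p)

# p = 2 satisfies p > 1, so the values approach 1/(p - 1) = 1 as upper grows.
convergent = [truncated_integral(2, 10.0 ** k) for k in (3, 6, 9)]

# p = 0.5 violates the condition, so the values keep growing.
divergent = [truncated_integral(0.5, 10.0 ** k) for k in (3, 6, 9)]
```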

---

## Running the Tests

To run these tests, use Claude with the Wolfram MCP server:

1. Ensure the MCP server is running
2. Execute each test block through Claude
3. Verify expected vs actual results
4. Monitor for crashes, hangs, or memory leaks
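
Step 3 can be partly automated when the tools are callable from a script. A minimal sketch of an expected-versus-actual recorder follows. `run_cases` and the stand-in lambdas are hypothetical, and comparing results as plain strings is an assumption about the output format.

```python
def run_cases(cases):
    """Run (name, callable, expected) triples and record the outcome of each.

    An exception is recorded as a failure instead of stopping the run.
    """
    report = []
    for name, fn, expected in cases:
        try:
            actual = fn()
            ok = actual == expected
        except Exception as exc:
            actual, ok = f"error: {exc}", False
        report.append({"name": name, "expected": expected, "actual": actual, "ok": ok})
    return report

# Hypothetical stand-ins for tool calls.
cases = [
    ("1d_recovery", lambda: "2", "2"),
    ("4f_recovery", lambda: "4", "4"),
    ("broken", lambda: 1 / 0, "ComplexInfinity"),
]
report = run_cases(cases)
failed = [r["name"] for r in report if not r["ok"]]
```

Recording failures instead of raising keeps one bad case from hiding the results of the rest.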

### Success Criteria

| Test | Criteria |
|------|----------|
| Timeout Handling | Aborts cleanly, server recovers |
| Memory Pressure | Handles large data, can clear it afterwards |
| Rapid Fire Requests | 100+ calls without degradation |
| Error Recovery | No crashes, graceful errors |
| Session State Integrity | Variables and definitions persist correctly |
| Numeric Validation | Correct pass/fail detection |
| Obligation Engine | Scales to 50+ obligations |
| Test Suite | All tests pass, repeated runs give the same results |
| Concurrent Sessions | Full isolation maintained |
| Domain and Semantic Analysis | Correct domains, missing terms detected, no hash collisions |
| Graphics | Generates valid base64 images |
| Edge Cases | Graceful handling of unusual input |