# 🔍 Verification Layer Testing

**Goal:** Test the symbolic math verification and factual verification components.

## What We're Testing:
1. **SymPy Integration** - Does symbolic verification work?
2. **Verification Accuracy** - Does it catch errors?
3. **False Positives** - Does it incorrectly flag correct answers?
4. **RAG Adapters** - Factual verification with knowledge bases

In [None]:
from kaelum.core.verification import VerificationEngine
from kaelum import enhance

# Initialize verification engine
verifier = VerificationEngine(use_symbolic=True, use_factual_check=False)

print("✅ Verification engine loaded")

## Test 1: Symbolic Math Verification

**Test if SymPy catches calculation errors**

In [None]:
# Test traces with correct and incorrect math
test_cases = [
    {
        "name": "Correct calculation",
        "trace": [
            "To find 25% of 80, convert to decimal: 0.25",
            "Multiply: 0.25 × 80 = 20",
            "The answer is 20"
        ],
        "expected": "should_pass"
    },
    {
        "name": "Wrong calculation",
        "trace": [
            "To find 25% of 80, convert to decimal: 0.25",
            "Multiply: 0.25 × 80 = 25",  # Wrong!
            "The answer is 25"
        ],
        "expected": "should_fail"
    },
    {
        "name": "Equation solving",
        "trace": [
            "Solve: 2x + 4 = 10",
            "Subtract 4: 2x = 6",
            "Divide by 2: x = 3"
        ],
        "expected": "should_pass"
    },
]

for test in test_cases:
    print(f"\n{'='*60}")
    print(f"Test: {test['name']} ({test['expected']})")
    print(f"{'='*60}")
    
    result = verifier.verify_trace(test['trace'])
    print(f"Verification result: {result}")
    
    # Check if it behaves as expected. A correct trace flagged as invalid is a
    # false positive; an incorrect trace accepted as valid is a false negative.
    if test['expected'] == 'should_pass':
        status = "✅ PASS" if result.get('valid', True) else "❌ FAIL (false positive)"
    else:
        status = "✅ PASS" if not result.get('valid', True) else "❌ FAIL (false negative)"
    
    print(f"Status: {status}")

**📝 Symbolic Verification Results:**
- Catches errors correctly?
- False positives/negatives?
- SymPy limitations found:
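
When reading the results above, it can help to have an independent reference for what "catching a calculation error" means at the level of a single trace step. The sketch below is a standard-library stand-in that uses exact rational arithmetic; it is not SymPy and not Kaelum's implementation, and the names `check_arithmetic_claim` and `_evaluate` are hypothetical. It assumes each step has the form `label: lhs = rhs` and returns `None` for anything it cannot check, including symbolic steps such as `2x = 6`, which a SymPy-based checker could go on to handle.

```python
import ast
import operator
from fractions import Fraction

# Binary operators allowed inside a checked expression.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a purely numeric AST node exactly; raise ValueError otherwise."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        # Going through str() keeps 0.25 as exactly 1/4 instead of a binary float.
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("not plain arithmetic")


def check_arithmetic_claim(step):
    """Hypothetical helper: True/False for a checkable `lhs = rhs` claim, else None."""
    # Drop a leading label such as "Multiply:" and normalize the times sign.
    text = step.split(":", 1)[-1].replace("×", "*")
    if text.count("=") != 1:
        return None
    lhs_text, rhs_text = (part.strip() for part in text.split("="))
    try:
        lhs = _evaluate(ast.parse(lhs_text, mode="eval").body)
        rhs = _evaluate(ast.parse(rhs_text, mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None  # symbolic, malformed, or otherwise uncheckable step
    return lhs == rhs


wrong_trace = [
    "To find 25% of 80, convert to decimal: 0.25",
    "Multiply: 0.25 × 80 = 25",
    "The answer is 25",
]
print([check_arithmetic_claim(step) for step in wrong_trace])  # [None, False, None]
```

Returning `None` rather than `True` for uncheckable steps keeps "not verified" separate from "verified correct", which matters when counting false negatives.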

## Test 2: End-to-End Verification

**Test verification within full pipeline**

In [None]:
# Test queries that should trigger verification
math_queries = [
    "What is 15% of 200?",
    "Solve: 5x - 10 = 25",
    "Calculate: sqrt(144) + sqrt(81)",
]

for query in math_queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}")
    
    result = enhance(query, mode="math", max_iterations=1)
    print(result)
    print("\nüëÜ Check if verification diagnostics are shown")

**📝 Pipeline Verification:**
- Are verification results displayed?
- Does it affect confidence scores?
- Integration issues:

## Test 3: Reflection Triggered by Verification

**Does failed verification trigger reflection?**

In [None]:
# Use higher iterations to enable reflection
complex_query = "A train travels 120 km in 2 hours. What is its speed in km/h?"

print("Testing with reflection enabled...\n")

result = enhance(
    complex_query,
    mode="math",
    max_iterations=2,  # Enable reflection
    temperature=0.5
)

print(result)
print("\nüëÜ Check 'iterations' count - did it reflect?")

**📝 Reflection Results:**
- Did verification trigger reflection?
- Did reflection fix errors?
- Iteration count:

## Test 4: RAG Adapter Testing (Optional)

**Test factual verification with knowledge base**

*(Skip if you don't have ChromaDB/Qdrant set up yet)*

In [None]:
# Uncomment when RAG is ready
# from kaelum.core.rag_adapter import ChromaAdapter
# import chromadb

# # Setup example
# client = chromadb.Client()
# collection = client.create_collection("test_facts")
# collection.add(
#     documents=["Paris is the capital of France", "Water boils at 100°C"],
#     ids=["fact1", "fact2"]
# )

# adapter = ChromaAdapter(collection)
# result = enhance(
#     "What is the capital of France?",
#     rag_adapter=adapter,
#     use_factual_verification=True
# )
# print(result)

print("⏭️  Skipped - RAG testing will be done later")

## 🎯 Verification Summary

| Component | Status | Issues Found |
|-----------|--------|-------------|
| SymPy Integration | ___ | ___ |
| Error Detection | ___ | ___ |
| False Positives | ___ | ___ |
| Reflection Trigger | ___ | ___ |
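
To fill in the Error Detection and False Positives rows with numbers rather than impressions, the per-test outcomes can be tallied. The sketch below is one way to do that, assuming each outcome is recorded as a pair of the test's `expected` label and whether the verifier flagged the trace as invalid; `classify_outcome` and `summarize` are hypothetical helper names, not part of Kaelum. A "positive" here means the verifier reported an error.

```python
from collections import Counter


def classify_outcome(expected, flagged_invalid):
    """Map one test outcome to a confusion-matrix cell."""
    if expected == "should_pass":
        # The trace was correct, so any flag is a false alarm.
        return "false_positive" if flagged_invalid else "true_negative"
    # The trace contained an error, so a flag is a catch.
    return "true_positive" if flagged_invalid else "false_negative"


def summarize(outcomes):
    """Summarize (expected, flagged_invalid) pairs into rates and raw counts."""
    counts = Counter(classify_outcome(exp, flagged) for exp, flagged in outcomes)
    caught, missed = counts["true_positive"], counts["false_negative"]
    alarms, clean = counts["false_positive"], counts["true_negative"]
    return {
        # Share of erroneous traces that were flagged.
        "error_detection_rate": caught / (caught + missed) if caught + missed else None,
        # Share of correct traces that were wrongly flagged.
        "false_positive_rate": alarms / (alarms + clean) if alarms + clean else None,
        "counts": dict(counts),
    }


# Illustrative outcomes only, not measured results.
example = [("should_pass", False), ("should_fail", True), ("should_pass", True)]
print(summarize(example))
```

With only three test cases the rates are coarse, so the raw counts are returned alongside them.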

**Next Steps:**
1. Fix any issues found
2. Add more edge cases
3. Implement RAG adapters
4. Test factual verification