# 🤖 LLM Selection Testing

**Goal:** Choose the best open-source LLM for the KaelumAI project.

## 📊 Decision Matrix (from README)

| Model | Size | Speed | Quality | Use Case |
|-------|------|-------|---------|----------|
| **Qwen 2.5 7B** | 4.7GB | ⚡⚡ Medium | ⭐⭐⭐⭐ Best | Production, complex reasoning |
| **Llama 3.2 3B** | 2.0GB | ⚡⚡⚡ Fast | ⭐⭐⭐ Good | Dev/testing, simple tasks |
| **Mistral 7B** | 4.1GB | ⚡⚡ Medium | ⭐⭐⭐⭐ Best | Code generation, structured output |
| **Phi-3 Mini** | 2.3GB | ⚡⚡⚡ Fast | ⭐⭐ Okay | Edge deployment, low resource |

## 🎯 What We're Testing:
1. **Speed** - Latency for simple queries
2. **Reasoning Quality** - Logical consistency
3. **Math Accuracy** - Calculation correctness
4. **Reflection Rate** - How often it triggers self-correction

In [None]:
from kaelum import enhance
import time

# Models to test (make sure these are pulled in Ollama)
MODELS = [
    "llama3.2:3b",
    "qwen2.5:7b",
    # "mistral:7b",  # Uncomment if you have this
]

TEST_CONFIG = {"temperature": 0.3, "max_tokens": 512, "max_iterations": 1}
print(f"✅ Testing models: {MODELS}")
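
The comment above asks that the models already be pulled in Ollama. A minimal sketch of how that could be checked before running the tests, assuming Ollama is serving on its default local address (`http://localhost:11434`) and using its `/api/tags` endpoint to list local models. The helper names `installed_models` and `missing_models` are hypothetical and not part of `kaelum`.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(url=OLLAMA_TAGS_URL, timeout=5.0):
    """Return the set of model names that the local Ollama server reports."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return {entry["name"] for entry in payload.get("models", [])}


def missing_models(required, installed):
    """Return the required model names that are not installed, in order."""
    return [name for name in required if name not in installed]
```

Calling `missing_models(MODELS, installed_models())` before the first test should return an empty list; any name it returns still needs an `ollama pull`.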

## Test 1: Speed Benchmark

In [None]:
query = "What is 15% of 200?"
results = {}

for model in MODELS:
    print(f"\n{'='*60}\nTesting: {model}\n{'='*60}")
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    result = enhance(query, model=model, **TEST_CONFIG)
    elapsed = time.perf_counter() - start
    results[model] = {"time": elapsed, "result": result}
    print(result)
    print(f"\n⏱️  {elapsed:.2f}s")

print(f"\n{'='*60}\nSPEED SUMMARY\n{'='*60}")
for model, data in sorted(results.items(), key=lambda x: x[1]['time']):
    print(f"{model:20s} → {data['time']:.2f}s")

**📝 Speed Winner:** (write here)
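
A single timed call per model can be skewed by model load time on the first request. One possible refinement, sketched here under the assumption that the call being measured can be wrapped in a zero-argument function, is to discard a warm-up run and report the median of several repeats. `time_call` is a hypothetical helper, not part of `kaelum`.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> dict:
    """Run fn several times and summarize the wall-clock latency in seconds."""
    # Warm-up runs are executed but not measured (e.g. first-time model load).
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "samples": samples,
    }
```

In the loop above this could be used as `time_call(lambda: enhance(query, model=model, **TEST_CONFIG))`, sorting the summary on the `"median"` value instead of a single elapsed time.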

## Test 2: Reasoning Quality

In [None]:
query = "If all birds can fly, and penguins are birds, can penguins fly?"

for model in MODELS:
    print(f"\n{'='*60}\n{model}\n{'='*60}")
    print(enhance(query, mode="logic", model=model, **TEST_CONFIG))

**📝 Best reasoning:** (write here)

## Test 3: Math Accuracy

In [None]:
queries = ["Solve: 3x + 5 = 20", "What is sqrt(144)?", "Calculate: (15 × 8) ÷ 4"]

for query in queries:
    print(f"\n{'='*60}\n{query}\n{'='*60}")
    for model in MODELS:
        print(f"\n{model}:")
        print(enhance(query, mode="math", model=model, **TEST_CONFIG))

**📝 Math accuracy winner:** (write here)
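
The three math queries have known answers (x = 5, 12, and 30), so the outputs can be scored automatically rather than by eye. The sketch below is a rough heuristic, not a definitive grader: it assumes the printed output of `enhance` can be converted to text with `str()` and that the final number in that text is the model's answer. `EXPECTED`, `last_number`, and `is_correct` are hypothetical names.

```python
import re

# Queries from the math test paired with their known answers.
EXPECTED = [
    ("Solve: 3x + 5 = 20", 5.0),
    ("What is sqrt(144)?", 12.0),
    ("Calculate: (15 × 8) ÷ 4", 30.0),
]

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def is_correct(response, expected, tol=1e-6):
    """Heuristic check: treat the final number in the response as the answer."""
    value = last_number(str(response))
    return value is not None and abs(value - expected) <= tol
```

A response that ends with something other than the answer (a confidence score, for instance) will be misjudged, so the outputs are still worth a manual look before naming a winner.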

## 🎯 Final Decision

### Recommended:
- **For Development:**
- **For Production:**
- **Reasoning:**