# Getting Started with Neon SDK

This notebook walks through the basic features of the Neon Python SDK for agent evaluation: tracing, rule-based and custom scorers, and a small end-to-end evaluation.

## Installation

```bash
pip install neon-sdk
```

In [None]:
# Install the SDK if not already installed
# !pip install neon-sdk

## 1. Basic Tracing

The SDK provides context managers for tracing agent operations: `trace` opens a trace, and `span`, `generation`, and `tool` record nested steps inside it.

In [None]:
from neon_sdk.tracing import trace, span, generation, tool

# Simple trace with nested spans
with trace("my-agent", metadata={"version": "1.0"}):
    print("Starting agent...")
    
    with span("preprocessing"):
        data = "Hello, World!"
        print(f"Preprocessed: {data}")
    
    with generation("llm-call", model="gpt-4"):
        # Simulated LLM call
        response = f"Processed: {data}"
        print(f"LLM Response: {response}")
    
    with tool("calculator", tool_name="math"):
        result = 2 + 2
        print(f"Tool Result: {result}")

print("Agent completed!")

## 2. Using the Decorator

For simpler cases, use the `@traced` decorator, which wraps both synchronous and `async` functions.

In [None]:
from neon_sdk.tracing import traced
import asyncio

@traced("sync-function")
def process_sync(x: int) -> int:
    """A traced synchronous function."""
    return x * 2

@traced("async-function")
async def process_async(x: int) -> int:
    """A traced asynchronous function."""
    await asyncio.sleep(0.1)
    return x * 3

# Test sync function
result1 = process_sync(5)
print(f"Sync result: {result1}")

# Test async function (top-level await works in Jupyter/IPython; in a plain script, use asyncio.run(process_async(5)))
result2 = await process_async(5)
print(f"Async result: {result2}")
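
One way a single decorator can serve both sync and async functions is to check the wrapped function with `inspect.iscoroutinefunction` and return a matching wrapper. The sketch below shows that pattern with the standard library only. `demo_traced` and `calls` are hypothetical names, and nothing here is taken from the SDK's source.

```python
import asyncio
import functools
import inspect

calls = []  # hypothetical in-memory log standing in for a trace backend


def demo_traced(name):
    """Decorator factory that logs calls to sync and async functions alike."""
    def decorator(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                result = await fn(*args, **kwargs)
                calls.append((name, "async", result))
                return result
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            calls.append((name, "sync", result))
            return result
        return sync_wrapper
    return decorator


@demo_traced("sync-function")
def double(x: int) -> int:
    return x * 2


@demo_traced("async-function")
async def triple(x: int) -> int:
    await asyncio.sleep(0)
    return x * 3


print(double(5))
print(asyncio.run(triple(5)))  # plain-script form; a notebook would use `await triple(5)`
```

Run this sketch as a script. Inside a notebook an event loop is already running, so `asyncio.run` raises `RuntimeError` and `await triple(5)` is the equivalent call.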

## 3. Rule-Based Scorers

Scorers evaluate agent performance. Let's start with simple rule-based scorers.

In [None]:
from neon_sdk.scorers import contains, exact_match, ContainsConfig, ExactMatchConfig
from neon_sdk.scorers.base import EvalContext

# Contains scorer - check if output contains specific strings
contains_scorer = contains(["hello", "world"])

result = contains_scorer.evaluate(EvalContext(
    output="Hello, World! This is a test.",
    input={"query": "greeting"},
))

print(f"Contains Score: {result.value}")
print(f"Reason: {result.reason}")
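
The scoring rule behind `contains` is not spelled out in this notebook. As a rough mental model only, the sketch below assumes the score is the fraction of expected strings found in the output, matched case-insensitively by default. Both points are assumptions, and `demo_contains` is a hypothetical helper, not an SDK function.

```python
from typing import Sequence


def demo_contains(output: str, expected: Sequence[str],
                  case_sensitive: bool = False) -> float:
    """Return the fraction of expected substrings found in output.

    Assumed semantics for illustration; the SDK may score differently.
    """
    if not expected:
        return 1.0  # nothing was required, so nothing is missing
    haystack = output if case_sensitive else output.lower()
    hits = sum(
        (s if case_sensitive else s.lower()) in haystack
        for s in expected
    )
    return hits / len(expected)


print(demo_contains("Hello, World! This is a test.", ["hello", "world"]))
print(demo_contains("Hello there.", ["hello", "world"]))
```

Under these assumptions the first call finds both strings and the second finds one of two, which is a useful check to compare against the value the SDK actually reports.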

In [None]:
# Exact match scorer with configuration
exact_scorer = exact_match(ExactMatchConfig(
    expected="Hello World",
    case_sensitive=False,
    normalize_whitespace=True,
))

result = exact_scorer.evaluate(EvalContext(
    output="  hello   world  ",
))

print(f"Exact Match Score: {result.value}")
print(f"Reason: {result.reason}")
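
To see why the padded, lower-case output above can still count as a match, here is a small sketch of the two normalization steps the option names suggest. It assumes `normalize_whitespace=True` trims the ends and collapses runs of whitespace, and that `case_sensitive=False` compares case-folded strings. `demo_exact_match` is a hypothetical helper, not the SDK's implementation.

```python
def demo_exact_match(output: str, expected: str,
                     case_sensitive: bool = True,
                     normalize_whitespace: bool = False) -> float:
    """Return 1.0 if output equals expected after optional normalization."""
    if normalize_whitespace:
        # split() with no arguments drops leading/trailing whitespace
        # and treats any run of whitespace as one separator.
        output = " ".join(output.split())
        expected = " ".join(expected.split())
    if not case_sensitive:
        output, expected = output.casefold(), expected.casefold()
    return 1.0 if output == expected else 0.0


print(demo_exact_match("  hello   world  ", "Hello World",
                       case_sensitive=False, normalize_whitespace=True))
print(demo_exact_match("  hello   world  ", "Hello World"))
```

With both options enabled the two strings reduce to `hello world`; with the strict defaults assumed in this sketch they differ.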

## 4. Custom Scorers

Define your own scorers for domain-specific evaluation.

In [None]:
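from neon_sdk.scorers import scorer, ScoreResult
from neon_sdk.scorers.base import EvalContext  # also imported in section 3; repeated so this cell runs on its own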

@scorer("word_count")
def word_count_scorer(context: EvalContext) -> ScoreResult:
    """Score based on word count (normalized to 0-1)."""
    words = context.output.split() if context.output else []
    count = len(words)
    # Normalize: 100+ words = 1.0
    score = min(count / 100, 1.0)
    
    return ScoreResult(
        value=score,
        reason=f"Word count: {count} ({score:.2%} of target)",
    )

# Test the custom scorer
result = word_count_scorer.evaluate(EvalContext(
    output="This is a sample response with some words. " * 5,
))

print(f"Word Count Score: {result.value:.2f}")
print(f"Reason: {result.reason}")
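
The decorator pattern above can be pictured as wrapping a plain function in an object that carries a name and an `evaluate` method. The sketch below shows one way to build that with the standard library and reproduces the word-count arithmetic for the same test input. `DemoResult`, `DemoScorer`, and `demo_scorer` are hypothetical names, and the clamping step is a design choice of this sketch rather than documented SDK behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DemoResult:
    value: float
    reason: str


class DemoScorer:
    """Hypothetical wrapper: gives a scoring function a name and evaluate()."""

    def __init__(self, name: str, fn: Callable[[str], DemoResult]):
        self.name = name
        self._fn = fn

    def evaluate(self, output: str) -> DemoResult:
        result = self._fn(output)
        # Clamp so a buggy scoring function cannot leave the 0-1 range.
        result.value = max(0.0, min(1.0, result.value))
        return result


def demo_scorer(name: str):
    """Decorator factory that turns a scoring function into a DemoScorer."""
    def decorator(fn):
        return DemoScorer(name, fn)
    return decorator


@demo_scorer("word_count")
def word_count(output: str) -> DemoResult:
    count = len(output.split())
    score = min(count / 100, 1.0)  # 100 or more words scores 1.0
    return DemoResult(score, f"Word count: {count} ({score:.2%} of target)")


# Same test input as the cell above: an 8-word sentence repeated 5 times.
result = word_count.evaluate("This is a sample response with some words. " * 5)
print(result.value, result.reason)
```

The test sentence has 8 words, so five repetitions give 40 words and a score of 40 / 100 = 0.4 in this sketch.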

## 5. Putting It Together

Let's create a simple agent and evaluate its performance.

In [None]:
from neon_sdk.tracing import trace, generation
from neon_sdk.scorers import contains
from neon_sdk.scorers.base import EvalContext

# Define test cases
test_cases = [
    {
        "input": {"query": "What is Python?"},
        "expected": ["programming", "language"],
    },
    {
        "input": {"query": "What is machine learning?"},
        "expected": ["algorithms", "data"],
    },
]

# Simulated agent function
def simple_agent(query: str) -> str:
    """A simple agent that returns predefined responses."""
    responses = {
        "What is Python?": "Python is a high-level programming language known for its simplicity.",
        "What is machine learning?": "Machine learning uses algorithms to learn patterns from data.",
    }
    return responses.get(query, "I don't know.")

# Run evaluation
results = []

for case in test_cases:
    query = case["input"]["query"]
    expected = case["expected"]
    
    # Trace the agent
    with trace("simple-agent", input=case["input"]):
        with generation("response", model="simple-v1"):
            response = simple_agent(query)
    
    # Score the response (named case_scorer so it does not shadow the `scorer` decorator imported in section 4)
    case_scorer = contains(expected)
    score = case_scorer.evaluate(EvalContext(
        input=case["input"],
        output=response,
    ))
    
    results.append({
        "query": query,
        "response": response,
        "score": score.value,
        "reason": score.reason,
    })

# Print results
print("Evaluation Results:")
print("-" * 60)
for r in results:
    print(f"Query: {r['query']}")
    print(f"Response: {r['response']}")
    print(f"Score: {r['score']:.2f}")
    print(f"Reason: {r['reason']}")
    print("-" * 60)

avg_score = sum(r["score"] for r in results) / len(results)
print(f"\nAverage Score: {avg_score:.2f}")
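
The average above divides by `len(results)`, which fails if no test cases ran. A slightly more defensive summary, sketched below, handles the empty case and also reports which queries fell below a threshold. `summarize` and its `threshold` parameter are hypothetical, and the sample scores are made-up illustration values, not output from the SDK.

```python
from statistics import mean


def summarize(results, threshold=0.5):
    """Aggregate per-case scores into count, average, pass rate, and failures."""
    scores = [r["score"] for r in results]
    if not scores:
        # Avoid ZeroDivisionError when nothing was evaluated.
        return {"count": 0, "average": None, "pass_rate": None, "failed": []}
    failed = [r["query"] for r in results if r["score"] < threshold]
    return {
        "count": len(scores),
        "average": mean(scores),
        "pass_rate": 1 - len(failed) / len(scores),
        "failed": failed,
    }


# Illustrative records in the same shape as `results` above (scores invented).
sample = [
    {"query": "What is Python?", "score": 1.0},
    {"query": "What is machine learning?", "score": 0.5},
]

print(summarize(sample, threshold=0.75))
print(summarize([]))
```

Listing the failing queries next to the average makes it easier to see whether a low score comes from one bad case or from uniformly weak responses.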

## Next Steps

- [02_advanced_scorers.ipynb](02_advanced_scorers.ipynb) - LLM judges and causal analysis
- [03_clickhouse_analytics.ipynb](03_clickhouse_analytics.ipynb) - Trace storage and analytics
- [04_temporal_workflows.ipynb](04_temporal_workflows.ipynb) - Durable workflow execution

## Resources

- [Documentation](https://neon-sdk.readthedocs.io)
- [GitHub Repository](https://github.com/neon-dev/neon)
- [API Reference](https://neon-sdk.readthedocs.io/en/latest/api/)