# Getting Started with SigmaEval

This notebook demonstrates a minimal, complete example of how to use `SigmaEval` to evaluate a simple chat application.


First, you'll need to provide an API key for the LLM that will judge the application's responses. `SigmaEval` uses [LiteLLM](https://litellm.ai/) to support 100+ LLM providers, so you can use a model from any of them, such as OpenAI, Anthropic, or Google.

This example uses a Gemini model as the judge. To set its API key securely in Colab, click the "🔑" icon in the left-hand sidebar, create a new secret named `GEMINI_API_KEY` with your key as the value, and enable notebook access for it. Then run the cell below to load the key into the environment.


In [None]:
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")
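If you later switch to a different judge, the key you need depends on the provider prefix of the LiteLLM model string (the part before the `/`). The sketch below is a hypothetical helper, `require_judge_key`, which is not part of SigmaEval or LiteLLM. It fails early with a clear message when the expected environment variable is missing. The prefix-to-variable mapping covers only three providers and is an assumption based on LiteLLM's usual naming; check the LiteLLM provider docs for others.

```python
import os

# Hypothetical mapping from LiteLLM model prefixes to the environment variable
# each provider reads its API key from (only a few providers are listed).
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def require_judge_key(judge_model: str) -> str:
    """Return the env var name for judge_model's provider, or raise if it is unset."""
    provider = judge_model.split("/", 1)[0]
    var_name = PROVIDER_KEY_VARS.get(provider)
    if var_name is None:
        raise ValueError(f"No key mapping for provider {provider!r}; check the LiteLLM docs.")
    if not os.environ.get(var_name):
        raise RuntimeError(f"{var_name} is not set; add it as a Colab secret and re-run the key cell.")
    return var_name
```

Once the cell above has run, `require_judge_key("gemini/gemini-2.5-flash")` returns `"GEMINI_API_KEY"`; calling it before evaluation turns a confusing authentication error deep inside the judge call into an immediate, readable one.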


In [None]:
# Install sigmaeval
%pip install sigmaeval-framework


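🔄 **Action Required**: After installing the package, you may need to restart the runtime for the changes to take effect. In the Colab menu, click `Runtime > Restart session` (labeled `Restart runtime` in older Colab versions). Restarting clears environment variables set in the kernel, so re-run the API key cell afterward.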
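To confirm that the restarted session can see the package, you can query its installed version. This sketch uses only the standard library's `importlib.metadata`; the distribution name `sigmaeval-framework` is taken from the `%pip install` line above, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# None means the install has not taken effect in this session.
print(installed_version("sigmaeval-framework"))
```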


In [None]:
from sigmaeval import SigmaEval, ScenarioTest, assertions
import asyncio

# 1. Define the ScenarioTest to describe the desired behavior
scenario = (
    ScenarioTest("Simple Test")
    .given("A user interacting with a chatbot")
    .when("The user greets the bot")
    .expect_behavior(
        "The bot provides a simple and friendly greeting.",
        # We want to be confident that at least 75% of responses will score a 7/10 or higher.
        criteria=assertions.scores.proportion_gte(min_score=7, proportion=0.75)
    )
    .max_turns(1) # Only needed here since we're returning a static greeting
)

# 2. Implement the app_handler to allow SigmaEval to communicate with your app
async def app_handler(messages, state):
    # In a real test, you would pass messages to your app and return the response.
    # For this example, we'll return a static, friendly greeting.
    return "Hello there! Nice to meet you!"

# 3. Initialize SigmaEval and run the evaluation
async def main():
    # You can use any model that LiteLLM supports: https://docs.litellm.ai/docs/providers
    sigma_eval = SigmaEval(
        judge_model="gemini/gemini-2.5-flash",
        sample_size=20,  # The number of times to run the test
        significance_level=0.05  # Corresponds to a 95% confidence level
    )
    result = await sigma_eval.evaluate(scenario, app_handler)
    
    print(result)
    
    # Assert that the test passed for integration with testing frameworks
    assert result.passed

# Notebooks (Jupyter, Colab) support top-level `await`, so you can run main() directly.
# In a plain Python script, use asyncio.run(main()) instead.
await main()
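The `proportion_gte(min_score=7, proportion=0.75)` criterion, combined with `sample_size=20` and `significance_level=0.05`, has a practical consequence worth checking: how many of the 20 judged responses must pass before the result is statistically convincing. The sketch below assumes a one-sided exact binomial test of H0: p ≤ 0.75. SigmaEval's actual procedure may differ, so treat the numbers as illustrative. `min_passes_required` and `binomial_upper_tail` are hypothetical helpers, not part of SigmaEval.

```python
from math import comb
from typing import Optional


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_passes_required(n: int, proportion: float, alpha: float) -> Optional[int]:
    """Smallest number of passing samples k that rejects H0: p <= proportion
    at level alpha (one-sided exact binomial test), or None if even n out of n
    passing samples is not enough."""
    for k in range(n + 1):
        if binomial_upper_tail(n, k, proportion) <= alpha:
            return k
    return None


print(min_passes_required(20, 0.75, 0.05))  # 19
print(min_passes_required(5, 0.75, 0.05))   # None: 0.75**5 is about 0.237 > 0.05
```

Under this assumption, 20 samples leave room for only one response scoring below 7, because P(X ≥ 18) ≈ 0.091 exceeds 0.05 while P(X ≥ 19) ≈ 0.024 does not. Increasing `sample_size` brings the required pass rate closer to the 75% target, at the cost of more judge calls.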
