Rue is a Python testing framework for AI projects. It follows pytest syntax and culture while introducing components essential for testing AI software: metrics, typed datasets, semantic predicates (LLM-as-a-Judge), and OTEL traces.
```shell
uv add rue
```

Follow pytest habits...
- Create `test_*.py` files
- Mark plain Rue tests with `@rue.test`
- Use `rue.resource` instead of `pytest.fixture`
- Put shared suite setup, shared resources, and directory-specific overrides in `conftest.py` or `confrue_*.py` files. Rue loads `conftest.py` first in each directory. See confrue files
- Add `assert` expressions within the functions
- Run `uv run rue tests run`
Rue only collects tests marked through Rue decorators, so pytest and Rue tests can live side by side in the same `test_*.py` module.
...while leveraging Rue APIs.
- Use the `with metrics()` context to turn failed assertions into quality metrics
- Use `has_facts()` and other semantic predicates to assert on natural-language output
- Access OTEL span data and assert on it with the `follows_policy()` predicate
- Parse datasets into clearly typed and validated data objects
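To make the `with metrics()` idea concrete, here is a toy stdlib emulation of converting assertion failures into data points instead of hard test failures. The `Metric` and `metrics` names mimic Rue's API, but this is a sketch of the concept, not Rue's implementation:

```python
from contextlib import contextmanager

class Metric:
    """Toy metric: records 0/1 data points and exposes their mean."""
    def __init__(self):
        self.values = []

    @property
    def mean(self):
        return sum(self.values) / len(self.values)

@contextmanager
def metrics(metric):
    # Catch a failed assert and record it as a 0/1 data point
    # rather than failing the test immediately.
    try:
        yield
        metric.values.append(1.0)
    except AssertionError:
        metric.values.append(0.0)

accuracy = Metric()
for answer in ["supported", "supported", "hallucinated"]:
    with metrics(accuracy):
        assert answer == "supported"

print(accuracy.mean)  # 2 of 3 assertions passed
```

A threshold assertion on `metric.mean` at teardown (as in the `accuracy` resource below) then decides whether the suite passes.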
```python
import rue
from rue import Case, Metric, metrics
from rue.predicates import has_unsupported_facts
from pydantic import BaseModel


@rue.resource.sut
def store_chatbot():
    # call_llm is the application entry point under test (defined elsewhere)
    return rue.SUT(call_llm)


@rue.resource.metric
def accuracy():
    metric = Metric()
    yield metric
    # Runs at teardown, after all iterations have recorded data points
    assert metric.mean > 0.8


class Refs(BaseModel):
    kb: str
    expected_tool: str | None = None


cases = [
    Case(inputs={"prompt": "When are you open?"}, references=Refs(kb="Store hours: 9 AM - 6 PM, Monday-Saturday. Closed Sundays.")),
    Case(inputs={"prompt": "Return policy?"}, references=Refs(kb="30-day returns with receipt.")),
    Case(inputs={"prompt": "How much for the Nike Air Max?"}, references=Refs(kb="Nike Air Max: $129.99", expected_tool="offer_product")),
]


@rue.test.iterate.cases(*cases)
@rue.test.iterate(3)
async def test_chatbot_no_hallucinations(
    case: Case[dict[str, str], Refs],
    store_chatbot,
    accuracy: Metric,
):
    """The agent relies on the knowledge base and calls tools for transactional questions."""
    response = store_chatbot.instance(**case.inputs)

    # Verify the answer doesn't contain any unsupported facts
    with metrics(accuracy):
        assert not await has_unsupported_facts(response, case.references.kb)

    # Verify the expected tool was called
    if expected_tool := case.references.expected_tool:
        root_spans = store_chatbot.root_spans
        tool_names = [
            s.attributes.get("llm.request.functions.0.name")
            for s in store_chatbot.llm_spans
            if s.attributes
        ]
        assert root_spans
        assert expected_tool in tool_names
```

Run it:
```shell
uv run rue tests run
```

Use a custom run UUID when you need stable correlation IDs:
```shell
uv run rue tests run --run-id 3f5f5e9a-1c2d-4b5f-9c2b-7f6d8a9b0c1d
```

Output:
```
Rue Test Runner
=================
Collected 1 test
test_example.py::test_chatbot_responds ✓
==================== 1 passed in 0.08s ====================
```
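The `has_unsupported_facts` predicate used above is LLM-judged. As a shape reference only, here is a naive stand-in that replaces the LLM judge with case-insensitive substring matching; it shows the async call signature, nothing about Rue's actual judging logic:

```python
import asyncio

async def has_unsupported_facts(response: str, kb: str) -> bool:
    """Naive stand-in: flags any sentence not echoed by the KB.

    Rue's real predicate asks an LLM judge; this toy version only
    does substring checks, to illustrate the call shape.
    """
    kb_lower = kb.lower()
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    return any(s.lower() not in kb_lower for s in sentences)

async def main():
    kb = "30-day returns with receipt."
    assert not await has_unsupported_facts("30-day returns with receipt", kb)
    assert await has_unsupported_facts("Returns accepted for 90 days", kb)

asyncio.run(main())
```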
This project is licensed under the MIT License - see the LICENSE file for details.