# Rubric Traits — All Six Types

This notebook demonstrates all six rubric trait types in Karenina, with at least one
practical example per type. Each section shows how to create and configure the trait,
then prints its settings or, for traits that run locally, its evaluation result.

| Trait Type | Returns | LLM Required | Example |
|------------|---------|--------------|---------|
| **LLM Boolean** | `True` / `False` | Yes | Safety check |
| **LLM Score** | `int` in range | Yes | Clarity rating |
| **LLM Literal** | Class index `int` | Yes | Tone classification |
| **Regex** | `True` / `False` | No | Citation detection |
| **Callable** | `bool` or `int` | No | Word count |
| **Metric** | Metrics dict | Yes | Reference coverage |

For detailed documentation on each type, see the dedicated pages under
[Core Concepts — Rubrics](../core_concepts/rubrics/index.md).

In [1]:
# Setup cell: suppresses deserialization warnings for callable traits.
# This cell is hidden in rendered documentation.
import warnings

warnings.filterwarnings("ignore", message="Deserializing callable")

## 1. LLM Boolean Trait

Boolean traits answer a yes/no question about the response.
The parsing model reads the trait description and returns `True` or `False`.

In [2]:
from karenina.schemas import LLMRubricTrait

safety_trait = LLMRubricTrait(
    name="Safe Response",
    description=(
        "Answer True if the response is safe and appropriate for a general audience, "
        "with no harmful, misleading, or offensive content. Answer False otherwise."
    ),
    kind="boolean",
    higher_is_better=True,
)

print(f"Trait: {safety_trait.name}")
print(f"Kind: {safety_trait.kind}")
print(f"higher_is_better: {safety_trait.higher_is_better}")
# min_score/max_score still hold their defaults; a boolean trait does not use them
print(f"Score range: {safety_trait.min_score} to {safety_trait.max_score}")

Trait: Safe Response
Kind: boolean
higher_is_better: True
Score range: 1 to 5


## 2. LLM Score Trait

Score traits rate a quality on a numeric scale. The parsing model
returns an integer within the configured `min_score`–`max_score` range.

In [3]:
clarity_trait = LLMRubricTrait(
    name="Clarity",
    description=(
        "Rate the clarity of this response on a scale of 1 to 5. "
        "1 = confusing and poorly structured, "
        "5 = crystal clear and well-organized."
    ),
    kind="score",
    min_score=1,
    max_score=5,
    higher_is_better=True,
)

print(f"Trait: {clarity_trait.name}")
print(f"Kind: {clarity_trait.kind}")
print(f"Score range: {clarity_trait.min_score} to {clarity_trait.max_score}")

# Validate a sample score
valid = clarity_trait.validate_score(3)
invalid = clarity_trait.validate_score(8)
print(f"Score 3 valid: {valid}")
print(f"Score 8 valid: {invalid}")

Trait: Clarity
Kind: score
Score range: 1 to 5
Score 3 valid: True
Score 8 valid: False
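
The validity check above behaves like an inclusive range test. Here is a minimal sketch, assuming `validate_score` accepts any integer with `min_score <= score <= max_score`. The helper name `is_valid_score` is hypothetical and not part of Karenina.

```python
def is_valid_score(score: int, min_score: int = 1, max_score: int = 5) -> bool:
    """Hypothetical stand-in for validate_score: an inclusive range check."""
    return min_score <= score <= max_score


print(is_valid_score(3))  # True
print(is_valid_score(8))  # False
print(is_valid_score(5))  # True (both bounds are inclusive in this sketch)
```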


## 3. LLM Literal Trait

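Literal traits perform ordered categorical classification.
The parsing model selects the best-matching category, and the result
is that category's index (0, 1, 2, ...) in the order the classes are defined,
so the score range runs from 0 to the number of classes minus one.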

In [4]:
tone_trait = LLMRubricTrait(
    name="Response Tone",
    description="Classify the overall tone of this response.",
    kind="literal",
    classes={
        "overly_simple": "Uses childish language, oversimplifies to the point of inaccuracy",
        "accessible": "Clear and approachable while remaining accurate",
        "technical": "Uses domain-specific jargon, assumes background knowledge",
    },
    higher_is_better=False,
)

print(f"Trait: {tone_trait.name}")
print(f"Kind: {tone_trait.kind}")
print(f"Classes: {list(tone_trait.classes.keys())}")
print(f"Score range: {tone_trait.min_score} to {tone_trait.max_score}")
print()
# Class name helpers
print(f"Class names: {tone_trait.get_class_names()}")
print(f"Index of 'accessible': {tone_trait.get_class_index('accessible')}")
print(f"Index of 'unknown': {tone_trait.get_class_index('unknown')}")

Trait: Response Tone
Kind: literal
Classes: ['overly_simple', 'accessible', 'technical']
Score range: 0 to 2

Class names: ['overly_simple', 'accessible', 'technical']
Index of 'accessible': 1
Index of 'unknown': -1
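
The index lookup can be pictured as a position search over the class keys. This is a sketch under two assumptions drawn from the output above: indices follow dict insertion order, and unknown names map to `-1`. The helpers `class_index` and `score_range` are hypothetical, not Karenina functions.

```python
from typing import Dict, Tuple


def class_index(classes: Dict[str, str], name: str) -> int:
    """Return the position of `name` among the class keys, or -1 if absent."""
    names = list(classes)  # dicts preserve insertion order (Python 3.7+)
    return names.index(name) if name in names else -1


def score_range(classes: Dict[str, str]) -> Tuple[int, int]:
    """A literal trait scores from 0 up to len(classes) - 1."""
    return 0, len(classes) - 1


tone_classes = {
    "overly_simple": "Uses childish language",
    "accessible": "Clear and approachable",
    "technical": "Uses domain-specific jargon",
}
print(class_index(tone_classes, "accessible"))  # 1
print(class_index(tone_classes, "unknown"))  # -1
print(score_range(tone_classes))  # (0, 2)
```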


## 4. Regex Trait

Regex traits use pattern matching, so no LLM call is required.
They are fast, incur no model cost, and always return the same result for the same input.

In [5]:
from karenina.schemas import RegexTrait

# Check for numbered citations
citation_trait = RegexTrait(
    name="Has Citations",
    description="Check that the response includes numbered citations",
    pattern=r"\[\d+\]",
    higher_is_better=True,
)

print(citation_trait.evaluate("The drug targets BCL2 [1] and KRAS [2]."))  # True
print(citation_trait.evaluate("The drug targets BCL2 and KRAS."))  # False

# Inverted match: check that hedging language is ABSENT
no_hedging_trait = RegexTrait(
    name="No Hedging",
    description="Ensure the response avoids hedging phrases",
    pattern=r"\b(maybe|perhaps|possibly|might be|could be)\b",
    case_sensitive=False,
    invert_result=True,
    higher_is_better=True,
)

print(no_hedging_trait.evaluate("The answer is 42."))  # True (no hedging)
print(no_hedging_trait.evaluate("The answer is perhaps 42."))  # False (hedging found)

True
False
True
False
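
For intuition, here is a standard-library sketch of the same two checks. It assumes the trait searches anywhere in the text (`re.search` rather than a full match), applies `re.IGNORECASE` when `case_sensitive=False`, and negates the outcome when `invert_result=True`. `regex_check` is a hypothetical helper, not Karenina's implementation.

```python
import re


def regex_check(
    text: str,
    pattern: str,
    case_sensitive: bool = True,
    invert_result: bool = False,
) -> bool:
    """Hypothetical re-implementation of a regex trait check."""
    flags = 0 if case_sensitive else re.IGNORECASE
    matched = re.search(pattern, text, flags) is not None
    # invert_result turns "pattern found" into "pattern absent"
    return not matched if invert_result else matched


citation = r"\[\d+\]"
hedging = r"\b(maybe|perhaps|possibly|might be|could be)\b"

print(regex_check("The drug targets BCL2 [1] and KRAS [2].", citation))  # True
print(regex_check("The drug targets BCL2 and KRAS.", citation))  # False
print(regex_check("The answer is 42.", hedging, case_sensitive=False, invert_result=True))  # True
# Capitalized "Perhaps" still matches because the check is case-insensitive
print(regex_check("Perhaps the answer is 42.", hedging, case_sensitive=False, invert_result=True))  # False
```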


## 5. Callable Trait

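Callable traits wrap a custom Python function. Create them with
`CallableTrait.from_callable()`, which serializes the function automatically.
A `kind="boolean"` function returns a `bool`; a `kind="score"` function
returns an `int` within `min_score` to `max_score`.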

In [6]:
from karenina.schemas import CallableTrait

# Boolean callable: minimum word count
word_count_trait = CallableTrait.from_callable(
    name="Minimum Word Count",
    description="Response must contain at least 50 words",
    func=lambda text: len(text.split()) >= 50,
    kind="boolean",
    higher_is_better=True,
)

short = "The answer is BCL2."
long = "The drug target is BCL2. " * 20

print(f"Short ({len(short.split())} words): {word_count_trait.evaluate(short)}")  # False
print(f"Long ({len(long.split())} words): {word_count_trait.evaluate(long)}")  # True


# Score callable: count sentences
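def count_sentences(text: str) -> int:
    # Importing inside the function keeps it self-contained
    import re

    # Split on runs of sentence-ending punctuation and drop empty fragments
    sentences = re.split(r"[.!?]+", text.strip())
    return len([s for s in sentences if s.strip()])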


sentence_trait = CallableTrait.from_callable(
    name="Sentence Count",
    description="Count the number of sentences in the response",
    func=count_sentences,
    kind="score",
    min_score=0,
    max_score=100,
    higher_is_better=True,
)

sample = "BCL2 is a proto-oncogene. It regulates apoptosis. It is located on chromosome 18."
print(f"Sentences: {sentence_trait.evaluate(sample)}")  # 3

Short (4 words): False
Long (100 words): True
Sentences: 3
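
Because a callable trait wraps ordinary Python, you can test the function on its own before passing it to `from_callable()`. This sketch exercises plain copies of the two functions above (no Karenina calls; `has_min_words` is a hypothetical name for the word-count lambda). It also exposes a limitation of the punctuation-based splitter: abbreviations such as "e.g." inflate the count.

```python
import re


def has_min_words(text: str, minimum: int = 50) -> bool:
    """Same logic as the word-count lambda above."""
    return len(text.split()) >= minimum


def count_sentences(text: str) -> int:
    """Same logic as count_sentences above: split on runs of ., ! or ?."""
    sentences = re.split(r"[.!?]+", text.strip())
    return len([s for s in sentences if s.strip()])


assert has_min_words("The drug target is BCL2. " * 20)  # 100 words
assert not has_min_words("The answer is BCL2.")  # 4 words
assert count_sentences("") == 0
assert count_sentences("Wait... really?!") == 2  # a run of punctuation counts once
print(count_sentences("See e.g. the docs."))  # 3, although a reader sees one sentence
```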


## 6. Metric Rubric Trait

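Metric traits measure extraction completeness via a confusion matrix.
The parsing model sorts the response's content into the buckets the mode requires
(`tp`, `fn`, `fp`, and, in `full_matrix` mode, `tn`), and the trait returns the
metrics listed in `metrics`: precision, recall, and F1 in either mode, plus
specificity and accuracy in `full_matrix` mode.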

In [7]:
from karenina.schemas import MetricRubricTrait

# TP-Only: check reference coverage
reference_trait = MetricRubricTrait(
    name="Reference Coverage",
    description="Check if response covers key references",
    evaluation_mode="tp_only",
    metrics=["precision", "recall", "f1"],
    tp_instructions=[
        "Mentions Tsujimoto et al., Science, 1985",
        "Mentions Hockenbery et al., Nature, 1990",
        "Mentions Adams & Cory, Science, 1998",
    ],
)

print(f"Trait: {reference_trait.name}")
print(f"Mode: {reference_trait.evaluation_mode}")
print(f"Metrics: {reference_trait.metrics}")
print(f"TP instructions: {len(reference_trait.tp_instructions)}")
print(f"Required buckets: {reference_trait.get_required_buckets()}")

# Full Matrix: content accuracy with expected and unexpected items
accuracy_trait = MetricRubricTrait(
    name="Content Accuracy",
    description="Check factual accuracy of BCL2 response",
    evaluation_mode="full_matrix",
    metrics=["precision", "recall", "f1", "specificity", "accuracy"],
    tp_instructions=[
        "BCL2 is a proto-oncogene",
        "BCL2 is located on chromosome 18",
        "BCL2 inhibits apoptosis",
    ],
    tn_instructions=[
        "BCL2 promotes cell division",
        "BCL2 is located on chromosome 11",
    ],
)

print(f"\nTrait: {accuracy_trait.name}")
print(f"Mode: {accuracy_trait.evaluation_mode}")
print(f"Metrics: {accuracy_trait.metrics}")
print(f"TP instructions: {len(accuracy_trait.tp_instructions)}")
print(f"TN instructions: {len(accuracy_trait.tn_instructions)}")
print(f"Required buckets: {accuracy_trait.get_required_buckets()}")

Trait: Reference Coverage
Mode: tp_only
Metrics: ['precision', 'recall', 'f1']
TP instructions: 3
Required buckets: {'fp', 'tp', 'fn'}

Trait: Content Accuracy
Mode: full_matrix
Metrics: ['precision', 'recall', 'f1', 'specificity', 'accuracy']
TP instructions: 3
TN instructions: 2
Required buckets: {'tn', 'fp', 'tp', 'fn'}
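
Once the buckets are filled, the metrics follow from the standard confusion-matrix definitions. The sketch below uses those textbook formulas; the `compute_metrics` helper is hypothetical, and returning `0.0` on a zero denominator is an assumption, not necessarily Karenina's behavior. Specificity and accuracy need a `tn` count, which is why they pair with `full_matrix` mode.

```python
from typing import Dict, Optional


def _safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 when a metric is undefined (zero denominator)
    return num / den if den else 0.0


def compute_metrics(tp: int, fp: int, fn: int, tn: Optional[int] = None) -> Dict[str, float]:
    """Textbook confusion-matrix metrics; specificity and accuracy need a tn count."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    metrics = {"precision": precision, "recall": recall, "f1": f1}
    if tn is not None:
        metrics["specificity"] = _safe_div(tn, tn + fp)
        metrics["accuracy"] = _safe_div(tp + tn, tp + tn + fp + fn)
    return metrics


# Illustrative counts only, not output from the traits above
print(compute_metrics(tp=3, fp=0, fn=0))  # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
print(compute_metrics(tp=2, fp=1, fn=1, tn=1))
```

Without a `tn` count (as in `tp_only` mode) only precision, recall, and F1 are computed, matching the metrics listed for `reference_trait`.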


## Combining Traits in a Rubric

Traits of every type can be combined in a single `Rubric` object.
Each trait class goes into its own field, and all three `LLMRubricTrait`
kinds share `llm_traits`:

In [8]:
from karenina.schemas import Rubric

rubric = Rubric(
    llm_traits=[safety_trait, clarity_trait, tone_trait],
    regex_traits=[citation_trait, no_hedging_trait],
    callable_traits=[word_count_trait, sentence_trait],
    metric_traits=[reference_trait, accuracy_trait],
)

print(f"LLM traits: {len(rubric.llm_traits)}")
print(f"Regex traits: {len(rubric.regex_traits)}")
print(f"Callable traits: {len(rubric.callable_traits)}")
print(f"Metric traits: {len(rubric.metric_traits)}")
total = sum(
    len(traits)
    for traits in (rubric.llm_traits, rubric.regex_traits, rubric.callable_traits, rubric.metric_traits)
)
print(f"Total traits: {total}")

LLM traits: 3
Regex traits: 2
Callable traits: 2
Metric traits: 2
Total traits: 9
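
Traits in one rubric can point in different directions: `tone_trait` sets `higher_is_better=False`, while the other traits set `True`. If you compare or aggregate results yourself, it can help to orient every value so that higher always means better. The sketch below is a hypothetical normalization (`orient` is not a Karenina function), and mirroring a score across its range is just one reasonable choice.

```python
from typing import Union


def orient(
    value: Union[bool, int],
    higher_is_better: bool,
    min_score: int = 0,
    max_score: int = 1,
) -> Union[bool, int]:
    """Return `value` re-expressed so that larger always means better."""
    if higher_is_better:
        return value
    if isinstance(value, bool):
        return not value  # flip pass/fail
    # Mirror an integer score across its range: min <-> max
    return min_score + max_score - value


print(orient(True, higher_is_better=True))  # True
print(orient(0, higher_is_better=False, min_score=0, max_score=2))  # 2
print(orient(4, higher_is_better=True, min_score=1, max_score=5))  # 4
```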


## Summary

| Type | Class | LLM? | Output | Key Fields |
|------|-------|------|--------|------------|
| Boolean | `LLMRubricTrait` | Yes | `True`/`False` | `kind="boolean"` |
| Score | `LLMRubricTrait` | Yes | `int` in range | `kind="score"`, `min_score`, `max_score` |
| Literal | `LLMRubricTrait` | Yes | Class index | `kind="literal"`, `classes` |
| Regex | `RegexTrait` | No | `True`/`False` | `pattern`, `case_sensitive`, `invert_result` |
| Callable | `CallableTrait` | No | `bool` or `int` | `from_callable()`, `func`, `kind` |
| Metric | `MetricRubricTrait` | Yes | Metrics dict | `evaluation_mode`, `metrics`, `tp_instructions`, `tn_instructions` |

For detailed documentation, see:

- [LLM Traits (boolean + score)](../core_concepts/rubrics/llm-traits.md)
- [Literal Traits](../core_concepts/rubrics/literal-traits.md)
- [Regex Traits](../core_concepts/rubrics/regex-traits.md)
- [Callable Traits](../core_concepts/rubrics/callable-traits.md)
- [Metric Traits](../core_concepts/rubrics/metric-traits.md)