# Loading a Benchmark

Before running verification, you need to load your benchmark from a checkpoint file (JSON-LD) or a database. This page covers loading with `Benchmark.load()`, inspecting questions, templates, and rubrics, and checking the benchmark's status before verification.

For background on the checkpoint format, see [Checkpoints](../04-core-concepts/checkpoints.md).

In [1]:
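# Setup cell (hidden in rendered docs).
# No mocking needed: all examples load from the test checkpoint file.
import os as _os

# "__file__" is a string literal here (notebooks do not define __file__),
# so this resolves to the current working directory.
_os.chdir(_os.path.dirname(_os.path.abspath("__file__")))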

---

## Loading from a File

Use `Benchmark.load()` to load a checkpoint from a JSON-LD file:

In [2]:
from pathlib import Path

from karenina.benchmark import Benchmark

benchmark = Benchmark.load(Path("test_checkpoint.jsonld"))

print(f"Name:        {benchmark.name}")
print(f"Description: {benchmark.description}")
print(f"Version:     {benchmark.version}")
print(f"Creator:     {benchmark.creator}")
print(f"Questions:   {benchmark.question_count}")

Name:        Documentation Test Benchmark
Description: Sample benchmark with 5 questions for use in documentation examples
Version:     1.0.0
Creator:     Karenina Documentation
Questions:   5


`Benchmark.load()` parses the JSON-LD structure, validates it, and rebuilds the internal question/template/rubric caches. The returned `Benchmark` object is ready for inspection and verification.
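
Because JSON-LD is ordinary JSON, a quick pre-flight check with the standard library can catch a missing or malformed file before `Benchmark.load()` is called. The sketch below is not part of karenina: `peek_checkpoint` is a hypothetical helper, and it assumes only that the top level of the file is a JSON object. The keys in the demo file are illustrative and do not reflect the real checkpoint schema.

```python
import json
import tempfile
from pathlib import Path


def peek_checkpoint(path: Path) -> list[str]:
    """Return the sorted top-level keys of a JSON(-LD) file.

    Hypothetical helper, not part of karenina. Raises FileNotFoundError
    if the file is missing and ValueError if it is not a JSON object.
    """
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object at the top level of {path}")
    return sorted(data)


# Demo on a throwaway file; these keys are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.jsonld"
    demo.write_text(json.dumps({"@type": "Example", "name": "Demo"}), encoding="utf-8")
    demo_keys = peek_checkpoint(demo)
    print(demo_keys)
```

A check like this only confirms that the file is readable JSON; structural validation is still done by `Benchmark.load()` itself.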

---

## Loading from a Database

If your benchmark is stored in a database (via `save_to_db()`), load it by name:

In [3]:
# benchmark = Benchmark.load_from_db("My Benchmark", storage="sqlite:///benchmarks.db")

The `storage` parameter is a SQLAlchemy connection string. This is useful when benchmarks are managed through the karenina-server or GUI.
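
SQLAlchemy's SQLite URLs take the form `sqlite:///` followed by the database path, so a relative path such as `benchmarks.db` is resolved against the current working directory. If that is fragile in your setup, one option is to build the URL from an absolute path. The sketch below assumes a file-based SQLite database; `sqlite_url` is a hypothetical helper, not a karenina or SQLAlchemy function.

```python
from pathlib import Path


def sqlite_url(db_path: str) -> str:
    """Build a SQLAlchemy-style SQLite URL from an absolute path.

    Hypothetical helper, not part of karenina.
    """
    absolute = Path(db_path).resolve()
    # as_posix() keeps forward slashes on Windows as well.
    return f"sqlite:///{absolute.as_posix()}"


url = sqlite_url("benchmarks.db")
print(url)
```

On POSIX systems the result has four slashes after `sqlite:` because the absolute path itself starts with `/`. The resulting string can be passed as the `storage` argument shown above.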

---

## Inspecting Questions

### Listing Questions

In [4]:
# Get all question IDs
question_ids = benchmark.get_question_ids()
print(f"Question IDs ({len(question_ids)}):")
for qid in question_ids:
    print(f"  {qid}")

Question IDs (5):
  urn:uuid:question-what-is-the-capital-of-france-cb0b4aaf
  urn:uuid:question-what-is-6-multiplied-by-7-aba99ec8
  urn:uuid:question-what-element-has-the-atomic-number-8-provide-both--8bcf4ce2
  urn:uuid:question-is-17-a-prime-number-7abec4b4
  urn:uuid:question-explain-the-concept-of-machine-learning-in-simple--de73020c


### Getting Question Details

Each question is returned as a dictionary with all its fields:

In [5]:
# Get the first question
qid = benchmark.get_question_ids()[0]
question = benchmark.get_question(qid)

print(f"Question:   {question['question']}")
print(f"Answer:     {question['raw_answer']}")
print(f"Finished:   {question.get('finished', False)}")
print(f"Has rubric: {question.get('question_rubric') is not None}")

Question:   What is the capital of France?
Answer:     Paris
Finished:   True
Has rubric: True


### Getting All Questions

Use `get_all_questions()` for bulk access:

In [6]:
# Full question dictionaries
all_questions = benchmark.get_all_questions()
print(f"Total questions: {len(all_questions)}")
for q in all_questions:
    print(f"  - {q['question'][:60]}...")

# IDs only (lighter)
ids = benchmark.get_all_questions(ids_only=True)
print(f"\nIDs only: {len(ids)} questions")

Total questions: 5
  - What is the capital of France?...
  - What is 6 multiplied by 7?...
  - What element has the atomic number 8? Provide both the name ...
  - Is 17 a prime number?...
  - Explain the concept of machine learning in simple terms....

IDs only: 5 questions


### Question Metadata

For a structured metadata summary including template and rubric status:

In [7]:
qid = benchmark.get_question_ids()[0]
meta = benchmark.get_question_metadata(qid)

print(f"Question:     {meta['question'][:50]}...")
print(f"Has template: {meta['has_template']}")
print(f"Has rubric:   {meta['has_rubric']}")
print(f"Finished:     {meta['finished']}")
print(f"Created:      {meta['date_created']}")

Question:     What is the capital of France?...
Has template: True
Has rubric:   True
Finished:     True
Created:      2026-02-05T23:06:59.237875
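
The `has_template` and `finished` fields are enough to sort questions into work queues. The sketch below uses plain dictionaries as stand-ins for `get_question_metadata()` results so that it runs without a checkpoint. The `triage` helper is hypothetical and the sample values are illustrative.

```python
def triage(metadata: list[dict]) -> dict[str, list[str]]:
    """Split questions into those ready for template verification and the rest."""
    queues: dict[str, list[str]] = {"ready": [], "needs_work": []}
    for meta in metadata:
        ready = meta["finished"] and meta["has_template"]
        queues["ready" if ready else "needs_work"].append(meta["question"])
    return queues


# Illustrative stand-ins for get_question_metadata() results.
sample = [
    {"question": "What is the capital of France?", "has_template": True, "finished": True},
    {"question": "Explain the concept of machine learning in simple terms.", "has_template": False, "finished": False},
]
queues = triage(sample)
for name, questions in queues.items():
    print(f"{name}: {questions}")
```

With a loaded benchmark, the input list could be built as `[benchmark.get_question_metadata(qid) for qid in benchmark.get_question_ids()]`.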


---

## Inspecting Templates

Templates define how answers are parsed and verified. Not every question needs a template: questions without one can still be evaluated in rubric-only mode.

In [8]:
# Check which questions have templates
for qid in benchmark.get_question_ids():
    has_it = benchmark.has_template(qid)
    q = benchmark.get_question(qid)
    print(f"{'[template]' if has_it else '[none]    '} {q['question'][:50]}...")

[template] What is the capital of France?...
[template] What is 6 multiplied by 7?...
[template] What element has the atomic number 8? Provide both...
[template] Is 17 a prime number?...
[none]     Explain the concept of machine learning in simple ...


### Viewing Template Code

In [9]:
# Get the template code for a question that has one
qid = benchmark.get_question_ids()[0]
if benchmark.has_template(qid):
    code = benchmark.get_template(qid)
    print(code)

from pydantic import Field
from karenina.schemas.entities import BaseAnswer

class Answer(BaseAnswer):
    capital: str = Field(description="The capital city mentioned in the response")

    def model_post_init(self, __context):
        self.correct = {"capital": "Paris"}

    def verify(self) -> bool:
        return self.capital.strip().lower() == self.correct["capital"].lower()
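
The `verify()` method in this template trims whitespace from the parsed value and compares it to the expected value case-insensitively. The comparison can be tried in isolation, without pydantic or karenina. `matches_expected` is a hypothetical stand-in that mirrors that one line.

```python
def matches_expected(parsed: str, expected: str) -> bool:
    """Mirror the template's comparison: trim the parsed value, ignore case."""
    return parsed.strip().lower() == expected.lower()


exact = matches_expected("  PARIS ", "Paris")
extra_text = matches_expected("Paris, France", "Paris")
print(exact, extra_text)
```

The second call shows the limit of this kind of check: any extra text in the parsed field makes the comparison fail, so the field description needs to steer the parser toward the bare value.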



### Finished Templates

The `get_finished_templates()` method returns the templates that are ready for verification, that is, those belonging to questions that are marked as finished and have a non-default template:

In [10]:
finished = benchmark.get_finished_templates()
print(f"Templates ready for verification: {len(finished)}")
for ft in finished:
    print(f"  - {ft.question_preview}: {len(ft.template_code)} chars")

Templates ready for verification: 4
  - What is the capital of France?: 383 chars
  - What is 6 multiplied by 7?: 345 chars
  - What element has the atomic number 8? Provide both the name and chemical symbol.: 527 chars
  - Is 17 a prime number?: 247 chars


### Missing Templates

Find questions that still need templates:

In [11]:
missing = benchmark.get_missing_templates()
print(f"Questions without templates: {len(missing)}")
for m in missing:
    print(f"  - {m['question'][:60]}...")

Questions without templates: 1
  - Explain the concept of machine learning in simple terms....


---

## Inspecting Rubrics

Rubrics evaluate response quality through traits. They can be **global** (applied to all questions) or **question-specific**.

### Global Rubric

In [12]:
global_rubric = benchmark.get_global_rubric()
if global_rubric:
    print(f"Global LLM traits:      {len(global_rubric.llm_traits)}")
    print(f"Global regex traits:    {len(global_rubric.regex_traits)}")
    print(f"Global callable traits: {len(global_rubric.callable_traits)}")
    print(f"Global metric traits:   {len(global_rubric.metric_traits)}")
else:
    print("No global rubric defined")

No global rubric defined


### Question-Specific Rubrics

In [13]:
for qid in benchmark.get_question_ids():
    q = benchmark.get_question(qid)
    rubric = q.get("question_rubric")
    if rubric:
        trait_count = sum(
            len(rubric.get(k, [])) for k in ["llm_traits", "regex_traits", "callable_traits", "metric_traits"]
        )
        print(f"{q['question'][:40]}... — {trait_count} trait(s)")
    else:
        print(f"{q['question'][:40]}... — no question rubric")

What is the capital of France?... — 1 trait(s)
What is 6 multiplied by 7?... — 1 trait(s)
What element has the atomic number 8? Pr... — 1 trait(s)
Is 17 a prime number?... — no question rubric
Explain the concept of machine learning ... — no question rubric


---

## Benchmark Status

Use the status properties and the `get_progress()` method to check the overall state of a loaded benchmark:

In [14]:
print(f"Name:          {benchmark.name}")
print(f"Questions:     {benchmark.question_count}")
print(f"Finished:      {benchmark.finished_count}")
print(f"Empty:         {benchmark.is_empty}")
print(f"Complete:      {benchmark.is_complete}")
print(f"Progress:      {benchmark.get_progress():.1f}%")

Name:          Documentation Test Benchmark
Questions:     5
Finished:      4
Empty:         False
Complete:      False
Progress:      80.0%


A benchmark is **complete** when all questions are finished and have templates. The `get_progress()` method returns the share of finished questions as a percentage of all questions (here 4 of 5, or 80.0%).
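
The progress figure can be reproduced by hand from the two counts. This is a minimal sketch of the arithmetic, not karenina's implementation. `progress_percent` is a hypothetical helper, and returning 0.0 for an empty benchmark is an assumption.

```python
def progress_percent(finished: int, total: int) -> float:
    """Finished questions as a percentage of all questions.

    Returns 0.0 for an empty benchmark (an assumption, not confirmed behavior).
    """
    if total == 0:
        return 0.0
    return 100 * finished / total


pct = progress_percent(4, 5)
print(f"Progress: {pct:.1f}%")
```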

---

## Next Steps

- [Configure verification settings](verification-config.md) — set up models, evaluation mode, and feature flags
- [Run verification via Python API](python-api.md) — full end-to-end example
- [Run verification via CLI](cli.md) — command-line workflow
- [Analyze results](../07-analyzing-results/index.md) — inspect and export verification outputs