### Benchmarking (running in direct, non-framework mode)

Demonstrates how to benchmark LLM responses directly against a given semantic expectation. Later functionality will add a framework mode, in which benchmarks are executed automatically with defined fixtures as their parameters.
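The notebook doesn't say how semtest scores a response against the expectation. Because it requires an embedding model (`DEFAULT_EMBEDDING_MODEL`), one common approach for this kind of check is to embed both texts and compare the vectors with cosine similarity. The sketch below assumes that approach. It is an illustration, not semtest's implementation, and the vectors are made-up toy values rather than real embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; a real embedding model returns
# vectors with hundreds or thousands of dimensions.
expectation_vec = [0.9, 0.1, 0.3]
close_response_vec = [0.8, 0.2, 0.3]
unrelated_response_vec = [-0.1, 0.9, -0.2]

print(round(cosine_similarity(expectation_vec, close_response_vec), 3))
print(round(cosine_similarity(expectation_vec, unrelated_response_vec), 3))
```

A response that paraphrases the expectation should land close to 1.0, while an unrelated one scores much lower, which is what makes a similarity threshold usable as a pass/fail signal.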

#### Requires:
- A `.env` file in the project directory that sets `OPENAI_API_KEY`, `BASE_URL`, and `DEFAULT_EMBEDDING_MODEL` (or rely on the defaults)
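semtest's own loading logic and its default values aren't shown in this notebook. As a quick pre-flight check, the stdlib-only sketch below (a simplified stand-in for a library such as python-dotenv) parses plain `KEY=VALUE` lines and reports which of the three variables are missing. The example values are placeholders, and since the notebook allows relying on defaults, a missing key is not necessarily an error.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "BASE_URL", "DEFAULT_EMBEDDING_MODEL")

def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]

# Placeholder contents; DEFAULT_EMBEDDING_MODEL is deliberately left out
example_env = """
# Placeholder values only
OPENAI_API_KEY=sk-placeholder
BASE_URL=https://api.openai.com/v1
"""
print(missing_keys(parse_env_text(example_env)))
```

This parser ignores features such as `export` prefixes, inline comments, and multi-line values; use a dedicated `.env` library for real projects.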

In [None]:
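import os

# Move up to the project root (this notebook presumably lives one directory below it)
# so that `import semtest` and the project's `.env` file resolve correctly
os.chdir('..')

import semtest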

#### Define the semantic expectation

In [None]:
expectation = "A dog is in the background of the photograph"

In [None]:

def mock_llm_response():
    """Stand-in for an LLM: yields one canned response at a time."""
    responses = [
        "There's a dog in the background of the photo",
        "In the background of the photo is a dog",
        "There's an animal in the background of the photo and it's a dog."
    ]
    yield from responses

# Three responses, one for each of the benchmark's three iterations
mock_llm_response_generator = mock_llm_response()
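Each call to the benchmarked function pulls one response from `mock_llm_response_generator` with `next()`, so the three canned responses line up with `iterations=3` on the decorator, assuming the benchmark calls the function once per iteration. A generator is exhausted after one pass: running more iterations than there are responses, or re-running the benchmark cell without re-running the generator cell, makes `next()` raise `StopIteration`. This standalone sketch shows that behavior and one way around it, `itertools.cycle`, which reuses the responses.

```python
from itertools import cycle

responses = ["first", "second", "third"]

# A plain iterator is exhausted after one pass over its items.
one_pass = iter(responses)
pulled = [next(one_pass) for _ in range(len(responses))]
try:
    next(one_pass)
    exhausted = False
except StopIteration:
    exhausted = True

# itertools.cycle repeats the responses indefinitely instead.
repeating = cycle(responses)
cycled = [next(repeating) for _ in range(5)]

print(pulled, exhausted, cycled)
```

Cycling trades a loud failure for silent reuse of responses, so it only makes sense when repeated responses are acceptable for the benchmark.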

#### Decorate the function to act as a benchmark

In [None]:
@semtest.benchmark(
    semantic_expectation=expectation,
    iterations=3
)
def mock_prompt_benchmark():
    # In a real benchmark, call the LLM with the prompt under test here
    mocked_llm_response = next(mock_llm_response_generator)
    print(f"LLM responded with `{mocked_llm_response}`")
    return mocked_llm_response
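The notebook doesn't show what `semtest.benchmark` does internally. Judging from the call in the next cell, which takes no arguments and returns a `semtest.Benchmark`, it appears to run the wrapped function `iterations` times and collect the responses for scoring. The hypothetical `run_iterations` decorator below sketches only that collection step using the standard decorator-factory pattern; scoring and the `Benchmark` result type are deliberately left out.

```python
import functools
from typing import Callable

def run_iterations(iterations: int) -> Callable:
    """Hypothetical decorator: call the wrapped function `iterations` times
    and return the list of its results (not semtest's implementation)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    def decorator(func: Callable[[], str]) -> Callable[[], list[str]]:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper() -> list[str]:
            return [func() for _ in range(iterations)]
        return wrapper

    return decorator

# Hypothetical canned responses standing in for LLM output
canned = iter(["a dog in back", "dog in the background", "an animal, a dog"])

@run_iterations(iterations=3)
def fake_prompt() -> str:
    return next(canned)

results = fake_prompt()
print(results)
```

Taking `iterations` in an outer function and returning the real decorator is what allows the `@decorator(iterations=3)` call syntax used in the cell above.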

In [None]:
res: semtest.Benchmark = mock_prompt_benchmark()

In [None]:
print(res.benchmarks())

In [None]:
print(res.benchmarks_json())
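The output of `res.benchmarks()` and `res.benchmarks_json()` isn't shown here, so their exact fields aren't documented in this notebook. If the result holds one similarity score per iteration, a natural summary is the mean and range of those scores. The hypothetical `summarize_scores` function below sketches that kind of report with the standard `json` and `statistics` modules; its field names are assumptions, not semtest's schema, and the scores are made up.

```python
import json
import statistics

def summarize_scores(expectation: str, scores: list[float]) -> str:
    """Hypothetical summary: aggregate per-response similarity scores as JSON."""
    if not scores:
        raise ValueError("need at least one score")
    summary = {
        "semantic_expectation": expectation,
        "iterations": len(scores),
        "mean_similarity": statistics.mean(scores),
        "min_similarity": min(scores),
        "max_similarity": max(scores),
    }
    return json.dumps(summary, indent=2)

# Made-up scores for illustration only; they are not semtest output.
report = summarize_scores("A dog is in the background of the photograph", [0.9, 0.8, 0.7])
print(report)
```

Reporting the minimum alongside the mean matters for LLM benchmarks: a single off-target response can hide behind a healthy average.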