# Example 1

TODO: Add an introduction describing what will be presented and how the notebooks are structured.

In [2]:
# TODO: Create another test scenario where each type of assertion would be included
# TODO: For this, config should also be updated
# TODO: Talk about subcomponents separately (step1, step2, assertion1, assertion2, etc.)
# TODO: Add optional jinja2 templating section or a remark with a link

## Installation

In [3]:
# %pip install contextcheck
# %pip install devtools

## Imports

In [4]:
from contextcheck import TestScenario
from contextcheck.executors.executor import Executor  # NOTE RB: Maybe Executor should be importable from the top-level package
from devtools import pprint  # Pretty-prints pydantic models
import yaml

## Scenario creation

Note that throughout this notebook we present separate parts of a single scenario. All of these parts are gathered in a single YAML file, which is used once every part of the scenario has been explained.

### Explain config

The `config` section defines the connections to the LLM (or RAG system). Several popular LLM providers are supported out of the box, which lets you be productive from the start.
There are three components used in config:
1. `endpoint_under_test` - defines the tested endpoint
2. `default_request` - defines the defaults for both the `endpoint_under_test` and `eval_endpoint` (TODO: Please someone confirm that)
3. `eval_endpoint` - defines the endpoint which is used for evaluating the responses from `endpoint_under_test`

For more information about the configuration, please go to [TODO - INSERT LINK HERE]

TODO: What's the purpose of `default_request` when the same configuration can be given to `endpoint_under_test` or `eval_endpoint`?

In [5]:
# Define the configuration in YAML (for demonstration purposes it is done inline in the notebook)
yaml_config_1 = """
config:
   endpoint_under_test:
      kind: openai
      model: gpt-4o-mini
      temperature: 0.2
   eval_endpoint:
      kind: openai
      model: gpt-4o
      temperature: 0.0
"""

yaml_from_string = yaml.safe_load(yaml_config_1)
yaml_from_string

{'config': {'endpoint_under_test': {'kind': 'openai',
   'model': 'gpt-4o-mini',
   'temperature': 0.2},
  'eval_endpoint': {'kind': 'openai', 'model': 'gpt-4o', 'temperature': 0.0}}}

#### Extra: Adding custom endpoint

In [1]:
# Logic or a link for creating and using custom endpoint should be added somewhere here

### Explain steps

Each test scenario consists of at least one testing step.

Each step is defined by its `name` (optional), `request` and `asserts` (optional):
- `name` is the name of the test step
- `request` is the message sent to the LLM
- `asserts` is a list of assertions checked against the LLM response

NOTE: By default, each assert is treated as an `eval` assertion.
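In the steps example that follows, `request` appears in two shapes: a plain string, or a mapping with a `message` key. The sketch below shows one way to reduce both shapes to the message text. It works on the plain dictionaries returned by `yaml.safe_load`; the helper `request_message` is hypothetical and not part of contextcheck.

```python
from typing import Any


def request_message(step: dict[str, Any]) -> str:
    """Return the message text of a step's request (hypothetical helper).

    Handles both shapes used in the YAML examples:
      request: 'What is the capital city of Poland?'
      request: {message: 'Where did Mike go? ...'}
    """
    request = step["request"]
    if isinstance(request, str):
        return request
    if isinstance(request, dict) and "message" in request:
        return request["message"]
    raise ValueError(f"Unsupported request shape: {request!r}")


steps = [
    {"name": "Check capital of Poland",
     "request": "What is the capital city of Poland?"},
    {"name": "Test hallucination evaluator (hallucinated)",
     "request": {"message": "Where did Mike go? Choose between the home and the park."}},
]
for step in steps:
    # `name` is optional, so fall back to a placeholder.
    print(step.get("name", "<unnamed>"), "->", request_message(step))
```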

In [11]:
# TODO: Add other type of asserts
yaml_from_string = yaml.safe_load("""
steps:
   - name: Check capital of Poland
     request: 'What is the capital city of Poland?'
     asserts:
        - '"Warsaw" in response.message'
        - 'response.stats.conn_duration < 3'
   - name: Test hallucination evaluator (hallucinated)
     request:
       message: Where did Mike go? Choose between the home and the park.
     asserts:
        - llm_metric: hallucination
          reference: Mike went to the store.
""")
yaml_from_string

{'steps': [{'name': 'Check capital of Poland',
   'request': 'What is the capital city of Poland?',
   'asserts': ['"Warsaw" in response.message',
    'response.stats.conn_duration < 3']},
  {'name': 'Test hallucination evaluator (hallucinated)',
   'request': {'message': 'Where did Mike go? Choose between the home and the park.'},
   'asserts': [{'llm_metric': 'hallucination',
     'reference': 'Mike went to the store.'}]}]}

#### Explain assertions

There are three families of assertions:
1. `eval` assertion - evaluates a string as Python code using (you guessed it) Python's `eval`
2. `llm_metric` assertion - uses another LLM, defined in `eval_endpoint`, to assess the performance of the `endpoint_under_test`
3. `deterministic` assertion - performs string checks such as contains, contains-any, etc.
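To make the distinction concrete, the sketch below sorts raw assert entries (as parsed from the steps YAML) into families: a plain string is an `eval` assertion (the default), and a mapping with an `llm_metric` key is an LLM-based assertion. The YAML key used by `deterministic` assertions is not shown in this notebook, so the sketch reports any other entry as `unknown` rather than guessing. `assertion_family` is a hypothetical helper, not contextcheck's API.

```python
from typing import Any


def assertion_family(entry: Any) -> str:
    """Classify a raw assert entry (hypothetical helper, not contextcheck's API)."""
    if isinstance(entry, str):
        # Plain strings are treated as eval assertions by default.
        return "eval"
    if isinstance(entry, dict) and "llm_metric" in entry:
        return "llm_metric"
    # Deterministic assertions exist, but their YAML key is not shown here.
    return "unknown"


asserts = [
    '"Warsaw" in response.message',
    "response.stats.conn_duration < 3",
    {"llm_metric": "hallucination", "reference": "Mike went to the store."},
]
print([assertion_family(a) for a in asserts])
# ['eval', 'eval', 'llm_metric']
```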

In [None]:
# TODO: Add deterministic assertion combining it with the two previous assertions
# NOTE RB: Metrics should be easily extendable, i.e. if someone wants to add a metric, we should provide a simple way
# to do that which does not break any functionality like result summarization or time statistics.

##### Explain llm assertions

`llm_metric` uses another LLM to assess the response of the `endpoint_under_test`. For this, an `eval_endpoint` should be defined in the `config` section. It can be one of the available endpoints (link here) or one created by the user (link here).
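Because `llm_metric` assertions depend on `eval_endpoint`, it can help to check a scenario before running it. The sketch below is a hypothetical pre-flight check over the parsed scenario dictionary (the same shape that `yaml.safe_load` returns for the config and steps examples in this notebook); contextcheck may validate this differently or not at all.

```python
from typing import Any


def missing_eval_endpoint(scenario: dict[str, Any]) -> bool:
    """Return True if a step uses llm_metric but config lacks eval_endpoint.

    Hypothetical pre-flight check; not part of contextcheck.
    """
    uses_llm_metric = any(
        isinstance(entry, dict) and "llm_metric" in entry
        for step in scenario.get("steps") or []
        for entry in step.get("asserts") or []  # asserts are optional
    )
    has_eval_endpoint = "eval_endpoint" in (scenario.get("config") or {})
    return uses_llm_metric and not has_eval_endpoint


scenario = {
    "config": {"endpoint_under_test": {"kind": "openai", "model": "gpt-4o-mini"}},
    "steps": [
        {
            "request": {"message": "Where did Mike go? Choose between the home and the park."},
            "asserts": [{"llm_metric": "hallucination", "reference": "Mike went to the store."}],
        }
    ],
}
print(missing_eval_endpoint(scenario))  # True: no eval_endpoint configured

scenario["config"]["eval_endpoint"] = {"kind": "openai", "model": "gpt-4o"}
print(missing_eval_endpoint(scenario))  # False
```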

In [2]:
# TODO: Add 1-2 examples here and link other options

##### Explain eval assertions

The `eval` assertion uses Python's built-in `eval` function, which evaluates a string as a Python expression. The expression can refer to the `response` object, which in its base form holds the message returned by the `endpoint_under_test` and its statistics, such as token counts and connection timing (see the `ResponseStats` model).
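As a rough illustration of the idea (not contextcheck's implementation), the sketch below evaluates assertion strings used in this notebook against a stand-in `response` object built with `types.SimpleNamespace`. The stand-in mirrors only fields visible in the outputs of this notebook (`message`, `stats.conn_duration`, `stats.tokens_response`), with values copied from the logged response to "Write success in the response". Making `response` the only name in scope is an assumption about how the expressions are resolved.

```python
from types import SimpleNamespace

# Stand-in for the response object; values copied from the logged response
# to 'Write success in the response'.
response = SimpleNamespace(
    message="Success! How can I assist you further?",
    stats=SimpleNamespace(tokens_response=9, conn_duration=1.0648562419992231),
)


def run_eval_assertion(expression: str, response: object) -> bool:
    """Evaluate an assertion string with `response` as the only visible name."""
    # Emptying __builtins__ limits what the expression can call in this sketch;
    # it is not a security sandbox.
    return bool(eval(expression, {"__builtins__": {}}, {"response": response}))


for expression in [
    '"Warsaw" in response.message',
    "response.stats.conn_duration < 3",
    "response.stats.tokens_response > 5",
]:
    print(f"{expression!r}: {run_eval_assertion(expression, response)}")
```

Since `eval` executes whatever expression it is given, scenario files should only come from trusted sources.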

In [12]:
# TODO: Add 1-2 examples of eval here

##### Explain deterministic assertions

The `deterministic` assertion provides a way to check the content of the response through string comparisons such as contains or contains-any.
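The sketch below illustrates the two string checks named above in plain Python, applied to a made-up response message. The function names and the case-sensitive matching are assumptions for illustration; they are not the names or semantics of contextcheck's deterministic assertions.

```python
def contains(response_message: str, value: str) -> bool:
    """Pass if the response contains `value` (case-sensitive in this sketch)."""
    return value in response_message


def contains_any(response_message: str, values: list[str]) -> bool:
    """Pass if the response contains at least one of `values`."""
    return any(value in response_message for value in values)


# Made-up response message for illustration.
message = "The capital city of Poland is Warsaw."
print(contains(message, "Warsaw"))                  # True
print(contains_any(message, ["Krakow", "Warsaw"]))  # True
print(contains_any(message, ["Berlin", "Paris"]))   # False
```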

In [None]:

# TODO: Show 1-2 examples of that and link to other options

## Final scenario

In [13]:
# When the test scenario is finally ready we can load it
# TODO: Change it to a proper test scenario
test_scenario_file_path = "../tests/scenario_openai.yaml"
test_scenario = TestScenario.from_yaml(file_path=test_scenario_file_path)

In [14]:
# Inspect the structure of test_scenario
pprint(test_scenario)

TestScenario(
    steps=[
        TestStep(
            name='Write success in the response',
            request=RequestBase(
                message='Write success in the response',
            ),
            response=None,
            asserts=[],
            result=None,
        ),
        TestStep(
            name='Check capital of Poland',
            request=RequestBase(
                message='What is the capital city of Poland?',
            ),
            response=None,
            asserts=[
                AssertionEval(
                    result=None,
                    eval='"Warsaw" in response.message',
                ),
                AssertionEval(
                    result=None,
                    eval='response.stats.conn_duration < 3',
                ),
            ],
            result=None,
        ),
        TestStep(
            name='Send hello',
            request=RequestBase(
                message='Hello!',
            ),
            response=None,

In [15]:
# Initiate executor which runs test scenario
executor = Executor(test_scenario=test_scenario)

In [16]:
# Run test scenario
executor.run_all()

2024-09-23 17:47:51.698 | INFO     | contextcheck.executors.executor:run_all:41 - Running scenario
2024-09-23 17:47:51.711 | INFO     | contextcheck.interfaces.interface:__call__:11 - name='Write success in the response' request=RequestBase(message='Write success in the response') response=None asserts=[] result=None
2024-09-23 17:47:51.713 | INFO     | contextcheck.interfaces.interface:__call__:11 - message='Write success in the response'
2024-09-23 17:47:52.783 | INFO     | contextcheck.interfaces.interface:__call__:11 - message='Success! How can I assist you further?' stats=ResponseStats(tokens_request=12, tokens_response=9, tokens_total=21, conn_start_time=26156.204792279, conn_end_time=26157.269648521, conn_duration=1.0648562419992231) id='chatcmpl-AAfZwElri6ZErMaaIxv12EXRAzQ1S' c

False

In [17]:
# NOTE RB: Maybe executor should copy the test scenario
# Inspect updated test_scenario
pprint(test_scenario)

TestScenario(
    steps=[
        TestStep(
            name='Write success in the response',
            request=RequestBase(
                message='Write success in the response',
            ),
            response=ResponseModel(
                message='Success! How can I assist you further?',
                stats=ResponseStats(
                    tokens_request=12,
                    tokens_response=9,
                    tokens_total=21,
                    conn_start_time=26156.204792279,
                    conn_end_time=26157.269648521,
                    conn_duration=1.0648562419992231,
                ),
                id='chatcmpl-AAfZwElri6ZErMaaIxv12EXRAzQ1S',
                choices=[
                    {
                        'finish_reason': 'stop',
                        'index': 0,
                        'logprobs': None,
                        'message': {
                            'content': 'Success! How can I assist you further?',
                

In [18]:
# We can inspect each test step separately and check its results
for step in test_scenario.steps:
    print(f"Step name: {step.name}, Result: {step.result}")

Step name: Write success in the response, Result: True
Step name: Check capital of Poland, Result: True
Step name: Send hello, Result: False


In [22]:
# TODO: We can also inspect each assertion for each step separately
for step in test_scenario.steps:
    print(f"Step name: {step.name}:\n")
    for assertion in step.asserts:
        print(assertion) 
    print("-"*12)

Step name: Write success in the response:

------------
Step name: Check capital of Poland:

result=True eval='"Warsaw" in response.message'
result=True eval='response.stats.conn_duration < 3'
------------
Step name: Send hello:

result=True eval='response.stats.conn_duration < 3'
result=True eval='response.stats.tokens_response > 5'
result=False eval='"Goodbye" in response.message'
------------
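The results above are consistent with a simple aggregation rule: a step passes when all of its assertions pass, and the scenario passes when all steps pass (which matches `run_all` returning `False`). The sketch below reproduces the recorded results under that rule; the rule is inferred from the outputs, not confirmed from contextcheck's source. Note that `all([])` is `True`, which matches the step without assertions being reported as passing.

```python
# Assertion results copied from the output above, keyed by step name.
assertion_results = {
    "Write success in the response": [],
    "Check capital of Poland": [True, True],
    "Send hello": [True, True, False],
}

# Assumed rule: a step passes only if every one of its assertions passes.
step_results = {name: all(results) for name, results in assertion_results.items()}
# Assumed rule: the scenario passes only if every step passes.
scenario_result = all(step_results.values())

for name, result in step_results.items():
    print(f"Step name: {name}, Result: {result}")
print(f"Scenario result: {scenario_result}")
```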
