# Running experiments with custom evaluations

With `prompto`, you can run custom evaluations automatically on each response returned when a prompt is sent to an API. This notebook shows how to run an experiment with custom evaluations, using the Anthropic API.

In [1]:
from prompto.settings import Settings
from prompto.experiment import Experiment
from dotenv import load_dotenv
import os

When using `prompto` to query models from the Anthropic API, lines in our experiment `.jsonl` files must have `"api": "anthropic"` in the prompt dict. 
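As a rough sketch of what such an experiment file could look like, the snippet below writes one prompt dict per line to a `.jsonl` file and reads it back. The keys mirror those visible in the output later in this notebook; writing to a temporary directory is only an assumption made so the snippet is self-contained.

```python
import json
import os
import tempfile

# One prompt dict per line of the .jsonl file. Every dict that should be
# sent to the Anthropic API carries "api": "anthropic".
prompt_dicts = [
    {
        "id": 0,
        "api": "anthropic",
        "model_name": "claude-3-haiku-20240307",
        "prompt": "How does technology impact us?",
        "parameters": {"temperature": 1, "max_tokens": 100},
    },
]

# Assumption: a temporary directory stands in for your real data folder.
experiment_path = os.path.join(tempfile.mkdtemp(), "evaluation-example.jsonl")

with open(experiment_path, "w", encoding="utf-8") as f:
    for prompt_dict in prompt_dicts:
        f.write(json.dumps(prompt_dict) + "\n")

# Read the file back to confirm that each line is a valid JSON object.
with open(experiment_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```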

## Environment variables

For the [Anthropic API](https://alan-turing-institute.github.io/prompto/docs/anthropic/), there is one environment variable that can be set:
- `ANTHROPIC_API_KEY`: the API key for the Anthropic API

As mentioned in the [environment variables docs](https://alan-turing-institute.github.io/prompto/docs/environment_variables/#model-specific-environment-variables), model-specific environment variables can also be used. In particular, when you specify a `model_name` key in a prompt dict, you can also set an `ANTHROPIC_API_KEY_model_name` environment variable to indicate the API key used for that particular model (where "model_name" is replaced with the corresponding value of the `model_name` key). We will see a concrete example of this later.
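To illustrate the naming pattern only, here is a minimal sketch of a model-specific lookup. The helper `lookup_api_key` is hypothetical and is not `prompto`'s implementation; the fallback to the default key, and the use of the model name verbatim as the suffix, are assumptions, so check the linked docs for how `prompto` actually resolves these variables.

```python
def lookup_api_key(model_name, env):
    """Hypothetical helper: prefer the model-specific key, else the default key."""
    # Assumption: the model name is appended verbatim as the suffix.
    specific = env.get(f"ANTHROPIC_API_KEY_{model_name}")
    if specific is not None:
        return specific
    # Assumption: fall back to the general key when no specific key exists.
    return env.get("ANTHROPIC_API_KEY")


# A plain dict stands in for os.environ; the values are placeholders.
example_env = {
    "ANTHROPIC_API_KEY": "default-key",
    "ANTHROPIC_API_KEY_my_model": "model-specific-key",
}

key_for_my_model = lookup_api_key("my_model", example_env)
key_for_other_model = lookup_api_key("other_model", example_env)
```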

To set environment variables, you can put them in a `.env` file as key-value pairs:
```
ANTHROPIC_API_KEY=<YOUR-ANTHROPIC-KEY>
```

If you create this file, you can run the following, which should return `True` if the file is found and at least one variable is loaded from it, or `False` otherwise:

In [2]:
load_dotenv(dotenv_path=".env")

True

Now we read the API key from the environment and raise an error if `ANTHROPIC_API_KEY` has not been set:

In [3]:
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
if ANTHROPIC_API_KEY is None:
    raise ValueError("ANTHROPIC_API_KEY is not set")

If you get any errors or warnings in the two cells above, check that your `.env` file follows the example above so that the variable is set.

## Write some custom evaluation functions

The only rule when writing a custom evaluation is that the function takes a single argument, the prompt dict containing the response from the API, and returns the same dictionary with any additional keys you want to add.

In [4]:
def count_words_in_response(response_dict):
    """
    This function is an example of an evaluation function that can be used to evaluate the response of an experiment.
    It counts the number of words in the response and adds it to the response_dict. It also adds a boolean value to
    the response_dict that is True if the response has more than 10 words and False otherwise.
    """
    # Approximate the word count as the number of spaces in the response plus one
    response_dict["Word Count"] = response_dict["response"].count(" ") + 1
    response_dict["more_than_10_words"] = response_dict["Word Count"] > 10
    return response_dict
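As a further sketch of the same pattern, the function below adds a flag for empty responses. The function name and the `is_empty_response` key are made up for illustration; the only contract taken from the text above is "take the dict, add keys, return the dict".

```python
def flag_empty_response(response_dict):
    """Hypothetical evaluation: mark responses that are missing or blank."""
    response = response_dict.get("response")
    # A response counts as non-empty only if it is a string with visible text.
    has_text = isinstance(response, str) and bool(response.strip())
    response_dict["is_empty_response"] = not has_text
    return response_dict


# Try the function on hand-written dicts before using it in an experiment.
with_text = flag_empty_response({"prompt": "Say hello", "response": "Hello there"})
blank = flag_empty_response({"prompt": "Say hello", "response": "   "})
missing = flag_empty_response({"prompt": "Say hello"})
```

Testing an evaluation function on a small hand-written dict like this is a cheap way to catch key typos before any queries are sent.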

## Run the experiment with the evaluation functions

Now we run the experiment in the same way as normal, but pass the evaluation functions to the `process` method through the `evaluation_funcs` argument.

More than one function can be passed, and they are executed in the order in which they are passed.
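The ordering matters when one evaluation relies on a key added by an earlier one. The sketch below imitates that behaviour with a plain loop; `apply_evaluations` and the two small functions are hypothetical stand-ins, not `prompto`'s internal code.

```python
def add_word_count(response_dict):
    # First evaluation: store the number of whitespace-separated words.
    response_dict["word_count"] = len(response_dict["response"].split())
    return response_dict


def add_is_long(response_dict):
    # Second evaluation: relies on "word_count" set by add_word_count.
    response_dict["is_long"] = response_dict["word_count"] > 10
    return response_dict


def apply_evaluations(response_dict, evaluation_funcs):
    """Hypothetical stand-in: apply each function in the order given."""
    for func in evaluation_funcs:
        response_dict = func(response_dict)
    return response_dict


result = apply_evaluations(
    {"response": "Technology changes daily life"},
    [add_word_count, add_is_long],
)
```

Swapping the two functions in the list would raise a `KeyError`, because `add_is_long` would run before `word_count` exists.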

In [5]:
settings = Settings(data_folder="./data", max_queries=30)
experiment = Experiment(file_name="evaluation-example.jsonl", settings=settings)

In [6]:
experiment.completed_responses

[]

In [7]:
responses, avg_query_processing_time = await experiment.process(
    evaluation_funcs=[count_words_in_response]
)

Sending 2 queries at 30 QPM with RI of 2.0s  (attempt 1/3): 100%|██████████| 2/2 [00:04<00:00,  2.00s/query]
Waiting for responses  (attempt 1/3): 100%|██████████| 2/2 [00:00<00:00,  2.09query/s]


In [8]:
experiment.completed_responses

[{'id': 0,
  'api': 'anthropic',
  'model_name': 'claude-3-haiku-20240307',
  'prompt': 'How does technology impact us?',
  'parameters': {'temperature': 1, 'max_tokens': 100},
  'timestamp_sent': '28-08-2024-18-01-13',
  'response': 'Technology can have a significant impact on individuals and society in both positive and negative ways. Here are some of the key ways technology can impact us:\n\nPositive impacts:\n- Increased productivity and efficiency - Technology like computers, automation, and the internet can help us work faster and more effectively.\n- Access to information and knowledge - The internet provides easy access to vast amounts of information and educational resources.\n- Improved communication and connectivity - Technologies like smartphones, email, and video chat help',
  'Word Count': 80,
  'more_than_10_words': True},
 {'id': 1,
  'api': 'anthropic',
  'model_name': 'claude-3-5-sonnet-20240620',
  'prompt': 'How does technology impact us? Keep the response to less t

We can see the results from the evaluation function, the `Word Count` and `more_than_10_words` keys, in the completed responses.
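Because the evaluation results are ordinary keys on each dict, they can be summarised with plain Python. The sketch below uses a small hand-written list shaped like `experiment.completed_responses`; the values in the second entry are placeholders for illustration, not results from the run above.

```python
# Stand-in for experiment.completed_responses, reduced to the relevant keys.
example_responses = [
    {
        "id": 0,
        "model_name": "claude-3-haiku-20240307",
        "Word Count": 80,
        "more_than_10_words": True,
    },
    {
        "id": 1,
        "model_name": "claude-3-5-sonnet-20240620",
        "Word Count": 9,  # placeholder value, not taken from the run above
        "more_than_10_words": False,  # placeholder value
    },
]

# Collect the ids of responses flagged as longer than 10 words.
long_ids = [r["id"] for r in example_responses if r["more_than_10_words"]]

# Average the word counts across all responses.
average_word_count = sum(r["Word Count"] for r in example_responses) / len(
    example_responses
)
```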