## Custom Grading Criteria

Custom grading criteria are the easiest way to create your own eval.

These evals take the format: 
_"If X, then fail. Otherwise, pass"_

The criteria are wrapped inside our chain-of-thought (CoT) prompt, which enforces a JSON output containing a pass/fail verdict along with a reason.

This approach works best for very simple conditional evals (like the one below).
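
To make the wrapping step concrete, here is a minimal sketch of how a criterion could be embedded in a grading prompt and how a JSON verdict could be parsed. The template text, the `result` and `explanation` keys, and the helper names `build_prompt` and `parse_verdict` are all assumptions made for illustration; the prompt and output schema the library actually uses may differ.

```python
import json

# Hypothetical template: the real chain-of-thought prompt may be worded differently.
PROMPT_TEMPLATE = """You are an expert grader.
Grading criteria: {grading_criteria}
Response to grade: {response}
Think step by step, then answer with JSON only:
{{"result": "Pass" or "Fail", "explanation": "<short reason>"}}"""


def build_prompt(grading_criteria: str, response: str) -> str:
    """Wrap a criterion and a response in the (assumed) grading prompt."""
    return PROMPT_TEMPLATE.format(grading_criteria=grading_criteria, response=response)


def parse_verdict(raw_output: str) -> dict:
    """Parse the grader's JSON output and check that it has the expected shape."""
    verdict = json.loads(raw_output)
    result = str(verdict.get("result", "")).strip().lower()
    if result not in {"pass", "fail"}:
        raise ValueError(f"Unexpected result: {verdict.get('result')!r}")
    return {"passed": result == "pass", "reason": verdict.get("explanation", "")}


prompt = build_prompt(
    "If the response says it cannot answer the query, then fail. Otherwise, pass.",
    "I don't know the answer to that question",
)

# Hand-written stand-in for a model reply, not real model output.
sample_output = '{"result": "Fail", "explanation": "The response declines to answer."}'
verdict = parse_verdict(sample_output)
```

Rejecting any verdict other than pass or fail keeps a malformed model reply from being silently counted as a pass.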

```python
import os
from athina.evals import GradingCriteria
from athina.loaders import ResponseLoader
from athina.keys import OpenAiApiKey, AthinaApiKey
import pandas as pd
from dotenv import load_dotenv

# Read environment variables from a local .env file, if one exists
load_dotenv()

OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
```

### Initialize your dataset

The [`ResponseLoader`](https://github.com/athina-ai/athina-evals/blob/main/athina/loaders/response_loader.py) class is used to load your dataset. 

This loader ensures that the data contains a "response" field and is in the correct format for the `LlmEvaluator` class.

```python
# Create batch dataset from list of dict objects
raw_data = [
    {
        "response": "I'm sorry but I can't help you with that query",
    },
    {
        "response": "I don't know the answer to that question",
    },
]

dataset = ResponseLoader().load_dict(raw_data)
pd.DataFrame(dataset)
```
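
The check described above, that every row carries a "response" field, can be pictured with the following sketch. `load_responses` is a hypothetical stand-in written for illustration only. It is not the implementation of `ResponseLoader`, which may validate or transform rows differently.

```python
from typing import Any


def load_responses(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical loader: accept only rows that have a string 'response' field."""
    loaded = []
    for index, row in enumerate(rows):
        response = row.get("response")
        if not isinstance(response, str):
            raise ValueError(f"Row {index} is missing a string 'response' field")
        # Copy the row so later changes to the input do not affect the dataset.
        loaded.append(dict(row))
    return loaded


sample_rows = [
    {"response": "I'm sorry but I can't help you with that query"},
    {"response": "I don't know the answer to that question"},
]
checked = load_responses(sample_rows)
```

Failing fast on a malformed row surfaces data problems before any grading calls are made.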

### Configure and Run Evaluator

The easiest way to configure a custom evaluator is to use our [`GradingCriteria`](https://github.com/athina-ai/athina-evals/blob/main/athina/evals/llm/grading_criteria/evaluator.py) class.

This evaluator takes grading criteria in the following format:

```
If X, then fail. Otherwise, pass.
```

Optionally, you can also specify which model to use for grading.

```python
# Model used for grading
eval_model = "gpt-3.5-turbo"

# Checks if the LLM response answers the user query sufficiently
grading_criteria = "If the response says it cannot answer the query, then fail. Otherwise, pass."

GradingCriteria(
    model=eval_model,
    grading_criteria=grading_criteria
).run_batch(data=dataset, max_parallel_evals=2).to_df()
```
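
The `max_parallel_evals` argument presumably caps how many rows are graded concurrently. The sketch below illustrates that idea with a thread pool and a keyword-based toy grader standing in for the LLM call. `toy_grade`, `run_batch_sketch`, `REFUSAL_MARKERS`, and the `failed` key are hypothetical names chosen for this example and say nothing about how `run_batch` is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword list; the real evaluator asks an LLM instead.
REFUSAL_MARKERS = ("can't help", "cannot help", "don't know")


def toy_grade(row: dict) -> dict:
    """Keyword-based stand-in for the LLM grader."""
    text = row["response"].lower()
    failed = any(marker in text for marker in REFUSAL_MARKERS)
    return {"response": row["response"], "failed": failed}


def run_batch_sketch(rows: list[dict], max_parallel: int = 2) -> list[dict]:
    """Grade rows with at most `max_parallel` concurrent workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(toy_grade, rows))


rows = [
    {"response": "I'm sorry but I can't help you with that query"},
    {"response": "Paris is the capital of France."},
]
results = run_batch_sketch(rows)

# Share of rows that passed the toy criterion.
pass_rate = sum(not r["failed"] for r in results) / len(results)
```

`pool.map` returns results in input order, so each verdict lines up with its original row even when evaluations finish out of order.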