# Coherence Evaluator

## Objective
This sample demonstrates how to use the Coherence evaluator to assess the quality and readability of AI-generated responses. The evaluator measures the ability of the language model to generate text that reads naturally, flows smoothly, and resembles human-like language.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need a deployed GPT model that supports JSON mode. We recommend `gpt-4o` or `gpt-4.1`.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity openai
```
Set these environment variables with your own values:
1) **AZURE_AI_PROJECT_ENDPOINT** - Your Azure AI project endpoint in format: `https://<account_name>.services.ai.azure.com/api/projects/<project_name>`
2) **AZURE_AI_MODEL_DEPLOYMENT_NAME** - The deployment name of the model that backs this AI-assisted evaluator (e.g., `gpt-4o`)
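
Before running the notebook, it can be useful to confirm that both variables are actually set. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper written for this illustration and is not part of any SDK.

```python
import os

# The two variables this sample expects (names taken from the list above).
REQUIRED_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Checking up front gives a readable message instead of a `KeyError` later, when the notebook reads `os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]`.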

The Coherence evaluator rates how naturally a response reads and how smoothly its ideas connect, which makes it a measure of the readability and user-friendliness of the model's generated responses.

Coherence scores range from 1 to 5:

<pre>
Score 1: Very Poor - The response is completely incoherent with no logical flow or connection between ideas.
Score 2: Poor - The response has significant coherence issues with disjointed ideas and unclear transitions.
Score 3: Fair - The response is somewhat coherent but has noticeable gaps in logical flow or organization.
Score 4: Good - The response is mostly coherent with good flow and logical connections between ideas.
Score 5: Excellent - The response is perfectly coherent with smooth, natural flow and clear logical progression.
</pre>
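
When post-processing results, it can help to translate a numeric score back into its label from the scale above. The sketch below is illustrative only: `COHERENCE_LABELS` and `describe_score` are hypothetical names rather than part of the evaluator's API, and it assumes scores arrive as whole numbers from 1 to 5.

```python
# Labels copied from the 1-5 scale described above.
COHERENCE_LABELS = {
    1: "Very Poor",
    2: "Poor",
    3: "Fair",
    4: "Good",
    5: "Excellent",
}


def describe_score(score):
    """Return the label for a coherence score, rejecting out-of-range values."""
    if score not in COHERENCE_LABELS:
        raise ValueError(f"Coherence score must be a whole number from 1 to 5, got {score!r}")
    return COHERENCE_LABELS[score]


for value in (1, 3, 5):
    print(value, describe_score(value))
```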

The evaluation requires the following input pattern:

**Pattern 1: Query-Response Evaluation**
- Query: The user's question or prompt. (string)
- Response: The AI-generated response to be evaluated for coherence. (string)
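
As a rough illustration of this pattern, each test case reduces to a dictionary with two string fields. The helper below, `make_item`, is a hypothetical name introduced here and not part of the SDK. It builds such a dictionary and rejects empty or non-string values before they reach the evaluator.

```python
def make_item(query, response):
    """Build a query-response item, checking that both fields are non-empty strings."""
    for name, value in (("query", query), ("response", response)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{name}' must be a non-empty string")
    return {"query": query, "response": response}


item = make_item(
    "What is blockchain technology?",
    "Blockchain is a distributed ledger shared across many computers.",
)
print(item)
```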

### Initialize Coherence Evaluator

```python
import os
from openai.types.evals.create_eval_jsonl_run_data_source_param import SourceFileContentContent
from pprint import pprint
from agent_utils import run_evaluator

# Get environment variables
deployment_name = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]

# Data source configuration (defines the schema for evaluation inputs)
data_source_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string"
            },
            "response": {
                "type": "string"
            }
        },
        "required": []
    },
    "include_sample_schema": True
}

# Data mapping (maps evaluation inputs to evaluator parameters)
data_mapping = {
    "query": "{{item.query}}",
    "response": "{{item.response}}"
}

# Initialization parameters for the evaluator
initialization_parameters = {
    "deployment_name": deployment_name
}

# Initialize the evaluation_contents list - we'll append all test cases here
evaluation_contents = []
```
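
The `{{item.query}}` and `{{item.response}}` placeholders in `data_mapping` point at fields of each evaluation item. The substitution itself happens during the evaluation run. The sketch below only imitates it locally to show the idea: `resolve_mapping` and the regular expression are assumptions made for illustration, not the service's implementation.

```python
import re

# Matches placeholders of the form {{item.<field>}}.
PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def resolve_mapping(mapping, item):
    """Replace each {{item.<field>}} placeholder with the matching value from item."""
    def substitute(template):
        return PLACEHOLDER.sub(lambda match: str(item[match.group(1)]), template)

    return {param: substitute(template) for param, template in mapping.items()}


example_mapping = {"query": "{{item.query}}", "response": "{{item.response}}"}
example_item = {"query": "What is blockchain?", "response": "A distributed ledger."}
print(resolve_mapping(example_mapping, example_item))
```

Keeping the mapping separate from the data means the same test items can be reused with an evaluator whose parameter names differ from the item's field names.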

### Samples

#### Query and Response as Strings (str)

```python
# Test Case 1: High coherence response with logical flow and clear connections
query = "Can you explain how machine learning works?"
response = "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It works by analyzing data patterns to make predictions or decisions. First, the system is trained on a dataset, which allows it to identify patterns and relationships. Then, it uses these learned patterns to make predictions on new, unseen data. The accuracy improves over time as the model processes more data and refines its understanding."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": query,
            "response": response
        }
    )
)
```

#### Example of Poor Coherence

```python
# Test Case 2: Poor coherence example with disjointed ideas and unclear transitions
query = "What is blockchain technology?"
response = "Blockchain is decentralized. Bitcoin uses it. Blocks contain data. Cryptography is involved. Banks might use blockchain. It's secure because of the chain. Mining is how new blocks are added. Digital currencies exist. The ledger is distributed."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": query,
            "response": response
        }
    )
)
```

### Run Evaluation on All Test Cases

Now that we've defined all test cases, let's run the evaluation once on all of them.

```python
results = run_evaluator(
    evaluator_name="coherence",
    evaluation_contents=evaluation_contents,
    data_source_config=data_source_config,
    initialization_parameters=initialization_parameters,
    data_mapping=data_mapping
)
```

### Display Results

View the evaluation results for each test case.

```python
pprint(results)
```