# Coherence Evaluator

## Objective
This sample demonstrates how to use the Coherence evaluator to assess the quality and readability of AI-generated responses. The evaluator measures the ability of the language model to generate text that reads naturally, flows smoothly, and resembles human-like language.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

### Prerequisite
```bash
pip install azure-ai-evaluation
```
Set these environment variables with your own values:
1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to use for evaluation.
3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI API key to use for evaluation.
4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to use for evaluation.
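
Before initializing the evaluator, it can help to confirm that all four variables listed above are set. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the SDK.

```python
import os

# Variable names taken from the list above.
REQUIRED_ENV_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this check first gives a readable message instead of a `KeyError` when `os.environ[...]` is accessed later.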

The Coherence evaluator rates how readable and user-friendly the model's responses are: whether the generated text reads naturally, flows smoothly, and resembles human language.

Coherence scores range from 1 to 5:

<pre>
Score 1: Very Poor - The response is completely incoherent with no logical flow or connection between ideas.
Score 2: Poor - The response has significant coherence issues with disjointed ideas and unclear transitions.
Score 3: Fair - The response is somewhat coherent but has noticeable gaps in logical flow or organization.
Score 4: Good - The response is mostly coherent with good flow and logical connections between ideas.
Score 5: Excellent - The response is perfectly coherent with smooth, natural flow and clear logical progression.
</pre>
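
The scale above can be turned into a small lookup for reports or dashboards. This is an illustrative sketch only: `coherence_label` is a hypothetical helper, and rounding fractional scores (for example, an average over several turns) half up to the nearest level is an assumption, not documented SDK behavior.

```python
import math

# Labels copied from the 1-5 scale above.
COHERENCE_LABELS = {
    1: "Very Poor",
    2: "Poor",
    3: "Fair",
    4: "Good",
    5: "Excellent",
}


def coherence_label(score):
    """Map a score in [1, 5] to its label, rounding halves up (assumption)."""
    value = float(score)
    if not 1 <= value <= 5:
        raise ValueError(f"coherence score must be between 1 and 5, got {score!r}")
    return COHERENCE_LABELS[math.floor(value + 0.5)]


print(coherence_label(4.0))  # Good
```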

The evaluation requires one of the following input patterns:

**Pattern 1: Query-Response Evaluation**
- Query: The user's question or prompt. (string)
- Response: The AI-generated response to be evaluated for coherence. (string)

**Pattern 2: Conversation Evaluation**
- Conversation: A multi-turn conversation where each turn is evaluated for coherence. (dict with "messages" key)
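
To make the two patterns concrete, the sketch below checks the basic shape of a conversation dict and shows how a single query-response pair would look in that shape. Both helpers are hypothetical; the SDK performs its own input validation, which may be stricter or differ in detail.

```python
def is_valid_conversation(conversation):
    """Check the basic shape: a dict with a non-empty "messages" list of role/content dicts."""
    if not isinstance(conversation, dict):
        return False
    messages = conversation.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(message, dict)
        and isinstance(message.get("role"), str)
        and "content" in message
        for message in messages
    )


def as_conversation(query, response):
    """Express a single query-response pair (Pattern 1) as a one-turn conversation (Pattern 2)."""
    return {
        "messages": [
            {"role": "user", "content": query},
            {"role": "assistant", "content": response},
        ]
    }


print(is_valid_conversation(as_conversation("Hi", "Hello! How can I help?")))  # True
```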

### Initialize Coherence Evaluator

In [None]:
import os
from azure.ai.evaluation import CoherenceEvaluator, AzureOpenAIModelConfiguration
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)

coherence_evaluator = CoherenceEvaluator(model_config=model_config)

### Samples

#### Query and Response as Strings (str) - Excellent Coherence

In [None]:
# Excellent coherence - smooth flow, clear logical progression
query = "Can you explain how machine learning works?"
response = "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It works by analyzing data patterns to make predictions or decisions. First, the system is trained on a dataset, which allows it to identify patterns and relationships. Then, it uses these learned patterns to make predictions on new, unseen data. The accuracy improves over time as the model processes more data and refines its understanding."

result = coherence_evaluator(query=query, response=response)
pprint(result)
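
The result is a dictionary, so individual fields can be read directly instead of printing the whole structure. The sketch below assumes the score is stored under `coherence` and the explanation under `coherence_reason`; these key names are assumptions that may vary between `azure-ai-evaluation` versions, so compare them against the `pprint` output above. The sample dictionary is made up so the helper can run offline.

```python
def summarize_result(result, metric="coherence"):
    """Build a one-line summary from an evaluator result dict."""
    score = result.get(metric)
    if score is None:
        # Show which keys exist so a renamed field is easy to spot.
        return f"{metric}: no score found (available keys: {sorted(result)})"
    reason = result.get(f"{metric}_reason")
    summary = f"{metric}: {score:g}"
    return f"{summary} ({reason})" if reason else summary


# Hypothetical result used only to exercise the helper; not real evaluator output.
example_result = {"coherence": 4.0, "coherence_reason": "Ideas follow a clear order."}
print(summarize_result(example_result))
```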

#### Query and Response as Strings (str) - Good Coherence

In [None]:
# Good coherence - mostly logical with minor flow issues
query = "What are the benefits of cloud computing?"
response = "Cloud computing offers several advantages. Cost savings is one major benefit because companies don't need to invest in physical servers. Scalability is another advantage. You can also access your data from anywhere with an internet connection. Additionally, cloud providers handle maintenance and updates, which reduces the IT burden on organizations."

result = coherence_evaluator(query=query, response=response)
pprint(result)

#### Query and Response as Strings (str) - Fair Coherence

In [None]:
# Fair coherence - noticeable gaps in logical flow
query = "How does photosynthesis work?"
response = "Plants use sunlight to make food. Chlorophyll is green. The process happens in leaves. Carbon dioxide and water are important. Oxygen is released as a byproduct. This is why plants are important for the environment. The energy from sunlight is converted into chemical energy."

result = coherence_evaluator(query=query, response=response)
pprint(result)

#### Query and Response as Strings (str) - Poor Coherence

In [None]:
# Poor coherence - disjointed ideas and unclear transitions
query = "What is blockchain technology?"
response = "Blockchain is decentralized. Bitcoin uses it. Blocks contain data. Cryptography is involved. Banks might use blockchain. It's secure because of the chain. Mining is how new blocks are added. Digital currencies exist. The ledger is distributed."

result = coherence_evaluator(query=query, response=response)
pprint(result)

#### Query and Response as Strings (str) - Technical Question

In [None]:
# Technical response with excellent coherence
query = "How do I optimize database query performance?"
response = "To optimize database query performance, start by analyzing your query execution plans to identify bottlenecks. Next, ensure you have appropriate indexes on columns used in WHERE clauses and JOIN conditions. Consider denormalizing data in read-heavy scenarios to reduce the number of joins required. Additionally, implement query result caching for frequently accessed data. Finally, regularly update database statistics to help the query optimizer make better decisions about execution plans."

result = coherence_evaluator(query=query, response=response)
pprint(result)
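
The query-response samples above all follow the same call pattern, so several pairs can be scored in one loop. This is a minimal sketch: `evaluate_pairs` is a hypothetical helper, and `stub_evaluator` is a stand-in that only counts words so the example runs without an Azure deployment. In the notebook, `coherence_evaluator` would be passed instead.

```python
def evaluate_pairs(evaluator, pairs):
    """Call evaluator(query=..., response=...) for each pair and collect the results."""
    results = []
    for query, response in pairs:
        result = evaluator(query=query, response=response)
        results.append({"query": query, "result": result})
    return results


def stub_evaluator(*, query, response):
    # Hypothetical stand-in for offline runs; it does NOT score coherence.
    return {"response_words": len(response.split())}


pairs = [
    ("What is blockchain technology?", "Blockchain is decentralized."),
    ("How does photosynthesis work?", "Plants use sunlight to make food."),
]
for row in evaluate_pairs(stub_evaluator, pairs):
    print(row["query"], "->", row["result"])
```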

#### Conversation as Dict with Messages - Multi-turn Evaluation

In [None]:
# Multi-turn conversation - evaluates coherence across turns
conversation = {
    "messages": [
        {
            "role": "user",
            "content": "I'm planning a trip to Japan. What should I know?"
        },
        {
            "role": "assistant",
            "content": "Japan is a fascinating destination with a rich cultural heritage. First, you should know that the best times to visit are spring (March-May) for cherry blossoms and fall (September-November) for autumn colors. The country has an excellent public transportation system, particularly the Shinkansen bullet train network. Additionally, learning a few basic Japanese phrases will be appreciated by locals, though many signs in major cities have English translations."
        },
        {
            "role": "user",
            "content": "What about the food?"
        },
        {
            "role": "assistant",
            "content": "Japanese cuisine is incredibly diverse and delicious. Beyond the well-known sushi and ramen, you should try regional specialties like okonomiyaki in Osaka or motsunabe in Fukuoka. Most restaurants display plastic food models in their windows, which makes ordering easier even with a language barrier. It's also important to note that tipping is not customary in Japan and can sometimes be considered rude."
        },
        {
            "role": "user",
            "content": "Are there any cultural customs I should be aware of?"
        },
        {
            "role": "assistant",
            "content": "Yes, there are several important customs to observe. Always remove your shoes when entering homes, temples, and some traditional restaurants. Bowing is a common greeting, though a slight nod is acceptable for foreigners. When using chopsticks, never stick them upright in rice as this resembles a funeral ritual. Finally, be mindful of noise levels on public transportation, as talking on phones or having loud conversations is considered impolite."
        }
    ]
}

result = coherence_evaluator(conversation=conversation)
pprint(result)
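
For conversations, it is often useful to look at each turn separately rather than only at the overall score. The sketch below assumes per-turn values are reported under an `evaluation_per_turn` key holding a list per metric; that layout is an assumption to verify against the `pprint` output above. The sample dictionary is invented for illustration.

```python
def per_turn_scores(result, metric="coherence"):
    """Return (turn_number, score) pairs, or an empty list if no per-turn data exists."""
    per_turn = result.get("evaluation_per_turn") or {}
    scores = per_turn.get(metric) or []
    return [(turn, float(score)) for turn, score in enumerate(scores, start=1)]


# Hypothetical result shape; not real evaluator output.
example = {"coherence": 3.5, "evaluation_per_turn": {"coherence": [2.0, 5.0]}}
for turn, score in per_turn_scores(example):
    print(f"turn {turn}: {score}")
```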

#### Conversation as Dict with Messages - Agent with Tool Calls

In [None]:
# Conversation with tool calls - evaluates coherence of responses
conversation_with_tools = {
    "messages": [
        {
            "role": "user",
            "content": "Can you check the weather in Seattle and send me an email with the forecast?"
        },
        {
            "role": "assistant",
            "content": "I'll help you with that. First, let me check the current weather conditions in Seattle, and then I'll send you an email with the detailed forecast."
        },
        {
            "role": "tool",
            "content": "Current weather in Seattle: Temperature: 15°C, Condition: Partly cloudy, Humidity: 65%, Wind: 12 mph NW, Forecast: Light rain expected in the evening"
        },
        {
            "role": "assistant",
            "content": "I've retrieved the weather information for Seattle. Currently, it's 15°C with partly cloudy skies. The humidity is at 65% with winds from the northwest at 12 mph. The forecast indicates light rain is expected later this evening. I've sent you an email containing this complete weather report along with recommendations to bring an umbrella if you're planning to go out later today."
        }
    ]
}

result = coherence_evaluator(conversation=conversation_with_tools)
pprint(result)

#### Conversation as Dict with Messages - Mixed Coherence Quality

In [None]:
# Conversation with varying coherence levels
conversation_mixed = {
    "messages": [
        {
            "role": "user",
            "content": "Tell me about renewable energy sources."
        },
        {
            "role": "assistant",
            "content": "Renewable energy comes from natural sources. Solar panels convert sunlight. Wind turbines exist. Hydroelectric power uses water. These are cleaner than fossil fuels. The environment benefits. Many countries are investing in renewables."
        },
        {
            "role": "user",
            "content": "Which one is the most efficient?"
        },
        {
            "role": "assistant",
            "content": "The efficiency of renewable energy sources varies depending on several factors. Hydroelectric power is generally the most efficient, with conversion rates of up to 90% under optimal conditions. Solar panels typically achieve 15-22% efficiency, while wind turbines can reach 35-45% efficiency. However, efficiency alone doesn't determine the best choice, as factors like geographic location, initial investment costs, and environmental impact all play crucial roles in selecting the most appropriate renewable energy source for a specific situation."
        }
    ]
}

result = coherence_evaluator(conversation=conversation_mixed)
pprint(result)

#### Conversation as Dict with Messages - Short Conversation

In [None]:
# Short, coherent conversation
conversation_short = {
    "messages": [
        {
            "role": "user",
            "content": "What's the capital of France?"
        },
        {
            "role": "assistant",
            "content": "The capital of France is Paris. It's located in the north-central part of the country along the Seine River and is known for its iconic landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral."
        }
    ]
}

result = coherence_evaluator(conversation=conversation_short)
pprint(result)