# Fluency Evaluator

## Objective
This sample demonstrates how to use the Fluency evaluator to assess the linguistic quality of AI-generated responses. The evaluator measures how well generated text conforms to grammatical rules, syntactic structures, and appropriate vocabulary usage.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation
```
Set these environment variables with your own values:
1) **AZURE_AI_PROJECT** - The project connection string, as found in the overview page of your Azure AI Foundry project.
2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
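
Before running the cells, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of any Azure SDK.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "AZURE_AI_PROJECT",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
]


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Checking up front avoids a `KeyError` later, when the notebook reads the values through `os.environ[...]`.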

The Fluency evaluator assesses the extent to which generated text conforms to grammatical rules, syntactic structures, and appropriate vocabulary usage, resulting in linguistically correct responses.

Fluency scores range from 1 to 5:

<pre>
Score 1: Very Poor - The response is incomprehensible with severe grammatical errors and improper vocabulary.
Score 2: Poor - The response has frequent grammatical errors and awkward phrasing that hinder understanding.
Score 3: Fair - The response is understandable but contains noticeable grammatical errors or awkward expressions.
Score 4: Good - The response is mostly fluent with minor grammatical issues that don't significantly impact readability.
Score 5: Excellent - The response is perfectly fluent with proper grammar, syntax, and vocabulary usage.
</pre>
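
As an illustration of the scale above, the sketch below maps a numeric score to its label. The labels are copied from this list; the `fluency_label` helper is hypothetical and is not provided by `azure-ai-evaluation`.

```python
# Labels copied from the 1-5 scale described above.
FLUENCY_LABELS = {
    1: "Very Poor",
    2: "Poor",
    3: "Fair",
    4: "Good",
    5: "Excellent",
}


def fluency_label(score):
    """Map a whole-number fluency score (int or float) to its label."""
    level = int(score)
    if level != score or level not in FLUENCY_LABELS:
        raise ValueError(f"Expected a whole score from 1 to 5, got {score!r}")
    return FLUENCY_LABELS[level]


print(fluency_label(4.0))
```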

The evaluation requires one of the following input patterns:

**Pattern 1: Single Response Evaluation**
- Response: The text response to be evaluated for fluency. (string)

**Pattern 2: Conversation Evaluation**
- Conversation: A multi-turn conversation where each assistant turn is evaluated for fluency, passed as a dict whose `messages` key holds the list of turns. (List[dict])
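
To make the conversation shape concrete, the sketch below builds a small conversation dict and pulls out the assistant messages, which are the texts a fluency check is concerned with. The `assistant_turns` helper is hypothetical and only illustrates the input structure; it does not reproduce how the evaluator itself selects turns.

```python
def assistant_turns(conversation):
    """Return the content of every assistant message, in order."""
    return [
        message["content"]
        for message in conversation["messages"]
        if message.get("role") == "assistant"
    ]


# A two-turn conversation in the same shape as the samples in this notebook.
example = {
    "messages": [
        {"role": "user", "content": "What's the status of my order?"},
        {"role": "assistant", "content": "Your order has shipped."},
        {"role": "user", "content": "When will it arrive?"},
        {"role": "assistant", "content": "It should arrive on Friday."},
    ]
}

for turn in assistant_turns(example):
    print(turn)
```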

### Initialize Fluency Evaluator

```python
import os
from pprint import pprint

from azure.ai.evaluation import AzureOpenAIModelConfiguration, FluencyEvaluator

# Connection details for the Azure OpenAI deployment used as the judge model.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)

fluency_evaluator = FluencyEvaluator(model_config=model_config)
```

### Samples

#### Response as String (str) - Excellent Fluency

```python
# Excellent fluency - proper grammar, syntax, and vocabulary
response_excellent = "The weather in Seattle is currently partly cloudy with a temperature of 15°C. The forecast indicates that conditions will remain stable throughout the day, with a gentle breeze from the northwest."

result = fluency_evaluator(response=response_excellent)
pprint(result)
```

```text
{'fluency': 4.0,
 'fluency_completion_tokens': 140,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 948,
 'fluency_reason': 'The response is well-articulated, coherent, and uses '
                   'varied sentence structures and vocabulary appropriate for '
                   'the context, with no significant errors.',
 'fluency_result': 'pass',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": '
                         '\\"The weather in Seattle is currently partly cloudy '
                         'with a temperature of 15\\\\u00b0C. The forecast '
                         'indicates that conditions will remain stable '
                         'throughout the day, with a gentle breeze from the '
                         'northwest.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by step: The response is grammatically correct '
```

#### Response as String (str) - Good Fluency with Minor Issues

```python
# Good fluency - mostly fluent with minor grammatical issues
response_good = "The meeting has been schedule for tomorrow at 2 PM. Please make sure to reviewing the attached documents before attending the session."

result = fluency_evaluator(response=response_good)
pprint(result)
```

```text
{'fluency': 3.0,
 'fluency_completion_tokens': 127,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 937,
 'fluency_reason': 'The response is clear and coherent but contains some '
                   'grammatical mistakes and lacks sentence complexity, '
                   'fitting the definition of Competent Fluency.',
 'fluency_result': 'pass',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": '
                         '\\"The meeting has been schedule for tomorrow at 2 '
                         'PM. Please make sure to reviewing the attached '
                         'documents before attending the session.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by step: The response communicates the main '
                          'idea clearly: a meeting is scheduled and documents '
                          'should be reviewed beforehand.
```

#### Response as String (str) - Fair Fluency

```python
# Fair fluency - understandable but contains noticeable errors
response_fair = "The order you placing yesterday is being process and will shipped soon. Tracking information send to your email when available."

result = fluency_evaluator(response=response_fair)
pprint(result)
```

```text
{'fluency': 2.0,
 'fluency_completion_tokens': 146,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 934,
 'fluency_reason': 'The response has frequent grammatical errors and awkward '
                   'phrasing, but the main idea is still partially clear.',
 'fluency_result': 'fail',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": '
                         '\\"The order you placing yesterday is being process '
                         'and will shipped soon. Tracking information send to '
                         'your email when available.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by step: The response contains several '
                          'grammatical errors (\\"The order you placing '
                          'yesterday,\\" \\"is being process,\\" \\"will '
                          'shipped soon,\\" \\"Tracking infor
```

#### Response as String (str) - Poor Fluency

```python
# Poor fluency - frequent grammatical errors and awkward phrasing
response_poor = "Email draft attach is. You review and giving feedback must. Important very for project success it being."

result = fluency_evaluator(response=response_poor)
pprint(result)
```

```text
{'fluency': 1.0,
 'fluency_completion_tokens': 108,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 932,
 'fluency_reason': 'The response is mostly incoherent and contains pervasive '
                   'grammatical errors, fitting the definition of Emergent '
                   'Fluency.',
 'fluency_result': 'fail',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": '
                         '\\"Email draft attach is. You review and giving '
                         'feedback must. Important very for project success it '
                         'being.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by step: The response contains fragmented '
                          'sentences and awkward phrasing (\\"Email draft '
                          'attach is. You review and giving feedback must. '
                          'Important very for
```

#### Response as String (str) - Technical Content

```python
# Technical response with excellent fluency
response_technical = "To resolve the authentication error, please verify that your API credentials are correctly configured in the environment variables. Additionally, ensure that the service principal has the necessary permissions assigned at the subscription level."

result = fluency_evaluator(response=response_technical)
pprint(result)
```

```text
{'fluency': 4.0,
 'fluency_completion_tokens': 150,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 949,
 'fluency_reason': 'The response is well-articulated, clear, and uses varied '
                   'technical vocabulary with good grammar and logical flow, '
                   'fitting the definition of proficient fluency.',
 'fluency_result': 'pass',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": \\"To '
                         'resolve the authentication error, please verify that '
                         'your API credentials are correctly configured in the '
                         'environment variables. Additionally, ensure that the '
                         'service principal has the necessary permissions '
                         'assigned at the subscription level.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by ste
```
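
In the single-response outputs above, scores of 3.0 and 4.0 are reported with `'fluency_result': 'pass'`, while scores of 1.0 and 2.0 are reported with `'fail'`. That pattern is consistent with a pass cutoff of 3. The sketch below reproduces that rule under this assumption; the cutoff is inferred from these outputs only, and `passes_fluency` is a hypothetical helper rather than a library function.

```python
# Assumption: a cutoff of 3 is inferred from the sample outputs above,
# not taken from the library's documentation.
ASSUMED_THRESHOLD = 3


def passes_fluency(score, threshold=ASSUMED_THRESHOLD):
    """Return 'pass' when the score meets the cutoff, otherwise 'fail'."""
    return "pass" if score >= threshold else "fail"


# Scores observed in the samples above.
for score in (4.0, 3.0, 2.0, 1.0):
    print(score, passes_fluency(score))
```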

#### Conversation as List[dict] - Multi-turn Evaluation

```python
# Multi-turn conversation - evaluates fluency of each assistant turn
conversation = {
    "messages": [
        {
            "role": "user",
            "content": "Can you help me understand the weather forecast for this week?"
        },
        {
            "role": "assistant",
            "content": "Of course! I'd be happy to help you understand this week's weather forecast. The outlook shows a mix of sunny and cloudy conditions."
        },
        {
            "role": "user",
            "content": "What about the temperature?"
        },
        {
            "role": "assistant",
            "content": "Temperatures will ranging from 12°C to 18°C throughout the week, with the warmest days being Wednesday and Thursday."
        },
        {
            "role": "user",
            "content": "Should I bring an umbrella?"
        },
        {
            "role": "assistant",
            "content": "Yes, it would be advisable to carry an umbrella, particularly on Friday when there is a 60% chance of precipitation."
        }
    ]
}

result = fluency_evaluator(conversation=conversation)
pprint(result)
```

```text
{'evaluation_per_turn': {'fluency': [3.0, 3.0, 4.0],
                         'fluency_completion_tokens': [119, 121, 136],
                         'fluency_finish_reason': ['stop', 'stop', 'stop'],
                         'fluency_model': ['gpt-4.1-2025-04-14',
                                           'gpt-4.1-2025-04-14',
                                           'gpt-4.1-2025-04-14'],
                         'fluency_prompt_tokens': [938, 938, 937],
                         'fluency_reason': ['The response is clear, correct, '
                                            'and coherent, but it does not '
                                            'demonstrate complex sentence '
                                            'structure or advanced vocabulary.',
                                            'The response is clear and '
                                            'coherent with only a minor '
                                            'grammatical error, fitting the '
```

#### Conversation as List[dict] - Agent with Tool Calls

```python
# Conversation with tool calls - evaluates fluency of assistant responses
conversation_with_tools = {
    "messages": [
        {
            "role": "user",
            "content": "Can you send me an email with the weather report for Seattle?"
        },
        {
            "role": "assistant",
            "content": "I'll fetch the current weather information for Seattle and send you an email with the details."
        },
        {
            "role": "tool",
            "content": "Temperature: 15°C, Condition: Partly cloudy, Wind: 8 mph NW"
        },
        {
            "role": "assistant",
            "content": "I have successfully sent you an email containing the weather report for Seattle. The current conditions are partly cloudy with a temperature of 15°C and light winds from the northwest."
        }
    ]
}

result = fluency_evaluator(conversation=conversation_with_tools)
pprint(result)
```

```text
{'fluency': 4.0,
 'fluency_completion_tokens': 123,
 'fluency_finish_reason': 'stop',
 'fluency_model': 'gpt-4.1-2025-04-14',
 'fluency_prompt_tokens': 929,
 'fluency_reason': 'The response demonstrates proficient fluency with good '
                   'grammar, clear structure, and logical flow, but does not '
                   'reach the level of exceptional fluency due to its '
                   'simplicity.',
 'fluency_result': 'pass',
 'fluency_sample_input': '[{"role": "user", "content": "{\\"response\\": '
                         '\\"I\'ll fetch the current weather information for '
                         'Seattle and send you an email with the '
                         'details.\\"}"}]',
 'fluency_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think '
                          'step by step: The response is grammatically correct '
                          'and uses appropriate vocabulary. The sentence '
                          'structure is clear and co
```

#### Conversation as List[dict] - Mixed Fluency Quality

```python
# Conversation with varying fluency levels across turns
conversation_mixed = {
    "messages": [
        {
            "role": "user",
            "content": "What's the status of my order?"
        },
        {
            "role": "assistant",
            "content": "Your order is being process currently and will be ship very soon to you."
        },
        {
            "role": "user",
            "content": "When will it arrive?"
        },
        {
            "role": "assistant",
            "content": "The estimated delivery date for your order is March 15, 2025. You will receive tracking information via email once the package has been dispatched."
        }
    ]
}

result = fluency_evaluator(conversation=conversation_mixed)
pprint(result)
```

```text
{'evaluation_per_turn': {'fluency': [2.0, 4.0],
                         'fluency_completion_tokens': [124, 127],
                         'fluency_finish_reason': ['stop', 'stop'],
                         'fluency_model': ['gpt-4.1-2025-04-14',
                                           'gpt-4.1-2025-04-14'],
                         'fluency_prompt_tokens': [927, 942],
                         'fluency_reason': ['The response has frequent '
                                            'grammatical errors and limited '
                                            'vocabulary, but the main idea is '
                                            'partially understandable.',
                                            'The response is well-written, '
                                            'clear, and coherent, with good '
                                            'grammar and appropriate '
                                            'vocabulary, but lacks the '
```
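
When a conversation result carries an `evaluation_per_turn` entry like the one above, the per-turn scores can be summarized to find the weakest turn. The sketch below works on a plain dict that mirrors only the `fluency` list shown in this output; `summarize_turns` is a hypothetical helper, and no assumption is made about any other keys the evaluator may return.

```python
from statistics import mean


def summarize_turns(result):
    """Summarize per-turn fluency scores from a conversation result."""
    scores = result["evaluation_per_turn"]["fluency"]
    # Index of the lowest-scoring turn (the first one, if there is a tie).
    worst_turn = min(range(len(scores)), key=scores.__getitem__)
    return {
        "mean": mean(scores),
        "min": scores[worst_turn],
        "worst_turn": worst_turn,
    }


# Per-turn scores copied from the mixed-fluency output above.
sample_result = {"evaluation_per_turn": {"fluency": [2.0, 4.0]}}
print(summarize_turns(sample_result))
```

Here `worst_turn` is a zero-based index into the evaluated turns, so it points at the first assistant response in the mixed-fluency conversation.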
                      