# Fluency Evaluator

## Objective
This sample demonstrates how to use the Fluency evaluator to assess the linguistic quality of AI-generated responses. The evaluator measures how well generated text conforms to grammatical rules, syntactic structures, and appropriate vocabulary usage.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need a deployed GPT model that supports JSON mode. We recommend `gpt-4o` or `gpt-4.1`.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity openai
```
Set these environment variables to your own values:
1) **AZURE_AI_PROJECT_ENDPOINT** - Your Azure AI project endpoint, in the format `https://<account_name>.services.ai.azure.com/api/projects/<project_name>`
2) **AZURE_AI_MODEL_DEPLOYMENT_NAME** - The deployment name of the model used by this AI-assisted evaluator (e.g., `gpt-4o-mini`)
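
Before running the cells in this notebook, it can help to confirm that both variables are set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of any SDK.

```python
import os

# The two variable names listed above.
REQUIRED_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```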

The Fluency evaluator assesses how closely generated text follows grammatical rules, syntactic structures, and appropriate vocabulary usage, that is, how linguistically correct the response is.

Fluency scores range from 1 to 5:

<pre>
Score 1: Very Poor - The response is incomprehensible with severe grammatical errors and improper vocabulary.
Score 2: Poor - The response has frequent grammatical errors and awkward phrasing that hinder understanding.
Score 3: Fair - The response is understandable but contains noticeable grammatical errors or awkward expressions.
Score 4: Good - The response is mostly fluent with minor grammatical issues that don't significantly impact readability.
Score 5: Excellent - The response is perfectly fluent with proper grammar, syntax, and vocabulary usage.
</pre>
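
When reading results, it can be convenient to translate a numeric score back into its rubric label. The sketch below encodes the table above as a dictionary; the names `FLUENCY_LABELS` and `fluency_label` are hypothetical, and it assumes the score is an integer from 1 to 5.

```python
# Rubric labels copied from the score table above.
FLUENCY_LABELS = {
    1: "Very Poor",
    2: "Poor",
    3: "Fair",
    4: "Good",
    5: "Excellent",
}


def fluency_label(score):
    """Return the rubric label for an integer fluency score from 1 to 5."""
    if score not in FLUENCY_LABELS:
        raise ValueError(f"fluency score must be an integer from 1 to 5, got {score!r}")
    return FLUENCY_LABELS[score]


print(fluency_label(4))
```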

The evaluation requires the following input format:

**Query-Response Evaluation**
- Query: Not required; the samples in this notebook set it to `None`
- Response: The AI's response to be evaluated for fluency (string)
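
As a rough sketch of this input shape, each test case can be thought of as a small dictionary with a `response` string and an optional `query`. The helper name `make_item` is hypothetical and used only for illustration; it is not part of the SDK.

```python
def make_item(response, query=None):
    """Build one evaluation item; `response` must be a non-empty string."""
    if not isinstance(response, str) or not response.strip():
        raise ValueError("response must be a non-empty string")
    # `query` is carried along unchanged and defaults to None.
    return {"query": query, "response": response}


item = make_item("The meeting is scheduled for 3 PM on Friday.")
print(item)
```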

### Initialize Fluency Evaluator

```python
import os
from openai.types.evals.create_eval_jsonl_run_data_source_param import SourceFileContentContent
from pprint import pprint
from agent_utils import run_evaluator

# Get environment variables
deployment_name = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]

# Data source configuration (defines the schema for evaluation inputs)
data_source_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "response": {"type": "string"}},
        "required": [],
    },
    "include_sample_schema": True,
}

# Data mapping (maps evaluation inputs to evaluator parameters)
data_mapping = {
    "query": "{{item.query}}",
    "response": "{{item.response}}"
}

# Initialization parameters for the evaluator
initialization_parameters = {
    "deployment_name": deployment_name
}

# Initialize the evaluation_contents list - we'll append all test cases here
evaluation_contents = []
```

### Samples

#### Response as String (str)

```python
response = "The weather in Seattle is currently partly cloudy with a temperature of 15°C. The forecast indicates that conditions will remain stable throughout the day, with a gentle breeze from the northwest."

evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": None,
            "response": response
        }
    )
)
```

#### Example of Poor Fluency

```python
# Poor fluency example
response = "Email draft attach is. You review and giving feedback must. Important very for project success it being."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": None,
            "response": response
        }
    )
)
```

### Run Evaluation on All Test Cases

Now that we've defined all test cases, let's run the evaluation once on all of them.

```python
results = run_evaluator(
    evaluator_name="fluency",
    evaluation_contents=evaluation_contents,
    data_source_config=data_source_config,
    initialization_parameters=initialization_parameters,
    data_mapping=data_mapping
)
```

### Display Results

View the evaluation results for each test case.

```python
pprint(results)
```