# Classification - Azure OpenAI GPT-4o with Vision

This sample demonstrates how to classify a document using Azure OpenAI's GPT-4o model with vision capabilities.

![Data Classification](../../images/classification-openai.png)

This is achieved by the following process:

- Define a list of classifications, with descriptions and keywords.
- Construct a system prompt that defines the instruction for classifying document pages.
- Construct a user prompt that includes the defined classifications and each document page as a base64-encoded image.
- Use the Azure OpenAI chat completions API with the GPT-4o model to generate a classification for each document page as a structured output.

## Objectives

By the end of this sample, you will have learned how to:

- Convert a document into a set of base64 encoded images for processing by GPT-4o.
- Use prompt engineering techniques to instruct GPT-4o to classify a document's pages into predefined categories.

## Setup

### Import modules

This sample takes advantage of the following Python dependencies:

- **pdf2image** for converting a PDF file into a set of images per page.
- **openai** to interface with the Azure OpenAI chat completions API to generate structured classification outputs using the GPT-4o model.
- **azure-identity** to securely authenticate with deployed Azure Services using Microsoft Entra ID credentials.

The following local modules are also used:

- **modules.app_settings** to access environment variables from the `.env` file.
- **modules.accuracy_evaluator** to evaluate the accuracy of the classification output against the expected output.
- **modules.classification** to define the classifications.
- **modules.comparison** to compare the output of the classification process with expected results.
- **modules.document_processing_result** to store the results of the classification process as a file.
- **modules.openai_confidence** to calculate the confidence of the classification process based on the `logprobs` response from the API request.
- **modules.utils** `Stopwatch` to measure the end-to-end execution time for the classification process.

In [1]:
import sys
sys.path.append('../') # Import local modules

from IPython.display import display
import os
import pandas as pd
from dotenv import dotenv_values
from pdf2image import convert_from_bytes
import base64
import io
import json
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from modules.app_settings import AppSettings
from modules.utils import Stopwatch
from modules.accuracy_evaluator import AccuracyEvaluator
from modules.comparison import get_classification_comparison
from modules.classification import Classifications, Classification
from modules.openai_confidence import evaluate_confidence
from modules.document_processing_result import DataClassificationResult

### Configure the Azure services

To call Azure OpenAI, the SDK creates a client instance from a deployed endpoint and authentication credentials.

For this sample, the credentials of the Azure CLI are used to authenticate with the deployed services.

In [2]:
# Set the working directory to the root of the repo
working_dir = os.path.abspath('../../')
settings = AppSettings(dotenv_values(f"{working_dir}/.env"))

# Configure the default credential for accessing Azure services using Azure CLI credentials
credential = DefaultAzureCredential(
    exclude_workload_identity_credential=True,
    exclude_developer_cli_credential=True,
    exclude_environment_credential=True,
    exclude_managed_identity_credential=True,
    exclude_powershell_credential=True,
    exclude_shared_token_cache_credential=True,
    exclude_interactive_browser_credential=True
)

openai_token_provider = get_bearer_token_provider(credential, 'https://cognitiveservices.azure.com/.default')

openai_client = AzureOpenAI(
    azure_endpoint=settings.openai_endpoint,
    azure_ad_token_provider=openai_token_provider,
    api_version="2024-12-01-preview" # Requires the latest API version for structured outputs.
)

### Establish the expected output

To evaluate the accuracy of the classification process, the expected output is defined in the following code block for each page of a [Vehicle Insurance Policy](../assets/vehicle_insurance/policy_1.pdf).

The expected output has been defined by a human evaluating the document.

> **Note**: Only the `page_number` and `classification` are used in the accuracy evaluation.

In [3]:
path = f"{working_dir}/samples/assets/vehicle_insurance/"
pdf_fname = "policy_1.pdf"
pdf_fpath = f"{path}{pdf_fname}"

expected = Classifications(classifications=[
    Classification(page_number=1, classification="Insurance Policy", similarity=1),
    Classification(page_number=2, classification="Insurance Policy", similarity=1),
    Classification(page_number=3, classification="Insurance Policy", similarity=1),
    Classification(page_number=4, classification="Insurance Policy", similarity=1),
    Classification(page_number=5, classification="Insurance Policy", similarity=1),
    Classification(page_number=6, classification="Insurance Certificate", similarity=1),
    Classification(page_number=7, classification="Terms and Conditions", similarity=1),
    Classification(page_number=8, classification="Terms and Conditions", similarity=1),
    Classification(page_number=9, classification="Terms and Conditions", similarity=1),
    Classification(page_number=10, classification="Terms and Conditions", similarity=1),
    Classification(page_number=11, classification="Terms and Conditions", similarity=1),
    Classification(page_number=12, classification="Terms and Conditions", similarity=1),
    Classification(page_number=13, classification="Terms and Conditions", similarity=1)
])

classification_evaluator = AccuracyEvaluator(match_keys=["page_number"], ignore_keys=["similarity"])

## Define classifications

The following code block defines the classifications for a document. Each classification has a name, description, and keywords that will be used to classify the document's pages.

> **Note**: The classifications are based on the content expected in a specific type of document, in this example a [Vehicle Insurance Policy](../assets/vehicle_insurance/policy_1.pdf).

In [4]:
classifications = [
    {
        "classification": "Insurance Policy",
        "description": "Specific information related to an insurance policy, such as coverage, limits, premiums, and terms, often used for reference or clarification purposes.",
        "keywords": [
            "welcome letter",
            "personal details",
            "vehicle details",
            "insured driver details",
            "policy details",
            "incident/conviction history",
            "schedule of insurance",
            "vehicle damage excesses"
        ]
    },
    {
        "classification": "Insurance Certificate",
        "description": "A document that serves as proof of insurance coverage, often required for legal, regulatory, or contractual purposes.",
        "keywords": [
            "certificate of vehicle insurance",
            "effective date of insurance",
            "entitlement to drive",
            "limitations of use"
        ]
    },
    {
        "classification": "Terms and Conditions",
        "description": "The rules, requirements, or obligations that govern an agreement or contract, often related to insurance policies, financial products, or legal documents.",
        "keywords": [
            "terms and conditions",
            "legal statements",
            "payment instructions",
            "legal obligations",
            "covered for",
            "claim settlement",
            "costs to pay",
            "legal responsibility",
            "personal accident coverage",
            "medical expense coverage",
            "personal liability coverage",
            "windscreen damage coverage",
            "uninsured motorist protection",
            "renewal instructions",
            "cancellation instructions"
        ]
    }
]
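
Because each definition is a plain dictionary, a small check before building the prompt can catch a missing key or a duplicated label early. The sketch below is not part of the sample: `ClassificationDefinition`, `validate_definitions`, and `allowed_labels` are hypothetical names, the two example definitions are shortened stand-ins, and the `Unclassified` label mirrors the fallback named in the system prompt of this sample.

```python
from typing import TypedDict


class ClassificationDefinition(TypedDict):
    """Hypothetical type describing one entry of the `classifications` list."""

    classification: str
    description: str
    keywords: list[str]


REQUIRED_KEYS = {"classification", "description", "keywords"}


def validate_definitions(definitions: list[ClassificationDefinition]) -> None:
    """Raise ValueError if a definition is incomplete or a label is repeated."""
    seen: set[str] = set()
    for definition in definitions:
        missing = REQUIRED_KEYS - set(definition)
        if missing:
            raise ValueError(f"Definition is missing keys: {sorted(missing)}")
        label = definition["classification"]
        if label in seen:
            raise ValueError(f"Duplicate classification label: {label!r}")
        seen.add(label)


def allowed_labels(definitions: list[ClassificationDefinition]) -> set[str]:
    """Return every label the model may emit, including the fallback label."""
    return {d["classification"] for d in definitions} | {"Unclassified"}


# Shortened stand-ins for the definitions above.
example_definitions: list[ClassificationDefinition] = [
    {
        "classification": "Insurance Policy",
        "description": "Information related to an insurance policy.",
        "keywords": ["policy details"],
    },
    {
        "classification": "Insurance Certificate",
        "description": "Proof of insurance coverage.",
        "keywords": ["certificate of vehicle insurance"],
    },
]

validate_definitions(example_definitions)
print(sorted(allowed_labels(example_definitions)))
```

The set returned by `allowed_labels` could also be used after the request to confirm that every predicted label is one the prompt actually offered.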

## Classify the document pages

The following code block runs the classification process using Azure OpenAI's GPT-4o model with vision capabilities.

It performs the following steps:

1. Get the document bytes from the provided file path. _Note: This example processes a local document; however, you can use any document storage location of your choice, such as Azure Blob Storage._
2. Use pdf2image to convert the document to a list of images per page as base64 strings.
3. Use Azure OpenAI's GPT-4o model and the classification definitions to provide a classification for each page of the document.
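
Step 2 can be sketched in isolation. The helper below is an illustration rather than part of the sample: `to_image_content` is a hypothetical name, and the placeholder bytes (the PNG file signature only) stand in for a real page image rendered by pdf2image. It shows how raw image bytes become an `image_url` content part carrying a base64 data URL.

```python
import base64


def to_image_content(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an `image_url` content part with a data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes: the 8-byte PNG signature, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
part = to_image_content(png_signature)
print(part["image_url"]["url"])
```

Keeping the encoding in one function makes it straightforward to swap the image format or to source the bytes from another storage location.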

In [5]:
# Prepare the user content for the OpenAI API including the classifications and the document page images.
user_content = []
user_content.append({
    "type": "text",
    "text": f"""Classifications:
    
    {json.dumps(classifications)}
    """
})

In [6]:
with Stopwatch() as image_stopwatch:
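    # Read the PDF with a context manager so the file handle is always closed.
    with open(pdf_fpath, "rb") as pdf_file:
        document_bytes = pdf_file.read()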
    pages = convert_from_bytes(document_bytes)
    for page in pages:
        byteIO = io.BytesIO()
        page.save(byteIO, format='PNG')
        base64_data = base64.b64encode(byteIO.getvalue()).decode('utf-8')
        
        user_content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:image/png;base64,{base64_data}"
            }
        })

In [7]:
with Stopwatch() as oai_stopwatch:
    completion = openai_client.beta.chat.completions.parse(
        model=settings.gpt4o_model_deployment_name,
        messages=[
            {
                "role": "system",
                "content": f"""Using the classifications provided, classify each page of the following document into one of the classifications. 
                - If a page contains multiple classifications, choose the most relevant one. 
                - If a page does not fit any of the classifications, use the classification 'Unclassified'.""",
            },
            {
                "role": "user",
                "content": user_content
            }
        ],
        response_format=Classifications,
        max_tokens=4096,
        temperature=0.1,
        top_p=0.1,
        logprobs=True # Enabled to determine the confidence of the response.
    )
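
A structured output request can still come back without a usable object, for example when the model refuses the request. The sketch below shows one way to guard against that before reading the result. `require_parsed` is a hypothetical helper; it assumes the response message exposes `parsed` and `refusal` attributes, as the `parse` helper of the openai package does, and the `SimpleNamespace` objects only simulate such messages so the example runs without a deployed model.

```python
from types import SimpleNamespace
from typing import Any


def require_parsed(message: Any) -> Any:
    """Return `message.parsed`, raising if the model refused or nothing was parsed."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        raise RuntimeError(f"The model refused the request: {refusal}")
    parsed = getattr(message, "parsed", None)
    if parsed is None:
        raise RuntimeError("The response did not contain a parsed object.")
    return parsed


# Simulated messages standing in for `completion.choices[0].message`.
ok_message = SimpleNamespace(parsed={"classifications": []}, refusal=None)
refused_message = SimpleNamespace(parsed=None, refusal="I can't help with that.")

print(require_parsed(ok_message))
try:
    require_parsed(refused_message)
except RuntimeError as error:
    print(error)
```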

### Understanding the Structured Outputs JSON schema

Using [Pydantic's JSON schema feature](https://docs.pydantic.dev/latest/concepts/json_schema/), the [Classification](../modules/classification.py) data model is automatically converted to a JSON schema when applied to the `response_format` parameter of the OpenAI chat completions request.

The JSON schema is used to instruct the GPT-4o model to generate a strict output that adheres to the structure defined. The approach using Pydantic makes it easier for developers to manage the data structure in code, with helpful descriptions and examples that will be included in the final JSON schema.
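
A smaller model makes the conversion easier to follow. The sketch below assumes Pydantic v2 is installed; `PageLabel` is a hypothetical model written for illustration and is not the `Classification` class from the local module. The field descriptions passed to `Field` end up in the generated schema.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class PageLabel(BaseModel):
    """Hypothetical model: a label assigned to a single page."""

    page_number: Optional[int] = Field(description="The page number being labelled.")
    label: Optional[str] = Field(description="The label assigned to the page.")


# Generate the JSON schema that describes the model's structure.
schema = PageLabel.model_json_schema()
print(json.dumps(schema, indent=2))
```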

The following code block prints the JSON schema generated from the `Classifications` data model, as it is sent with the OpenAI request:

In [8]:
# Highlight the schema sent to the OpenAI model
print(json.dumps(Classifications.model_json_schema(), indent=2))

{
  "$defs": {
    "Classification": {
      "description": "A class representing a classification of a page.\n\nAttributes:\n    page_number: The page number of the classification.\n    classification: The classification of the page.\n    similarity: The similarity of the classification from 0 to 100.",
      "properties": {
        "page_number": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "description": "The page number of the classification.",
          "title": "Page Number"
        },
        "classification": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "description": "The classification of the page.",
          "title": "Classification"
        },
        "similarity": {
          "anyOf": [
            {
              "type": "number"
        

## Calculate the accuracy

The following code block calculates the accuracy of the classification process by comparing the expected classifications with the predicted classifications.

In [9]:
# Gets the parsed Classifications object from the completion response.
document_classifications = completion.choices[0].message.parsed

expected_dict = expected.to_dict()
classifications_dict = document_classifications.to_dict()

accuracy = classification_evaluator.evaluate(expected=expected_dict, actual=classifications_dict)
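
The internals of `AccuracyEvaluator` are not shown in this sample. As a rough mental model only, page-level accuracy can be thought of as the share of expected pages whose predicted label matches. The sketch below uses a hypothetical `page_accuracy` function and plain dictionaries with made-up predictions; the real evaluator may compute its scores differently.

```python
def page_accuracy(expected: list[dict], actual: list[dict]) -> float:
    """Fraction of expected pages whose predicted classification matches."""
    if not expected:
        return 0.0
    # Index predictions by page number so the order of results does not matter.
    predicted = {item["page_number"]: item["classification"] for item in actual}
    matches = sum(
        1
        for item in expected
        if predicted.get(item["page_number"]) == item["classification"]
    )
    return matches / len(expected)


expected_pages = [
    {"page_number": 1, "classification": "Insurance Policy"},
    {"page_number": 2, "classification": "Insurance Certificate"},
    {"page_number": 3, "classification": "Terms and Conditions"},
    {"page_number": 4, "classification": "Terms and Conditions"},
]
# Made-up predictions with one deliberate mismatch on page 2.
actual_pages = [
    {"page_number": 1, "classification": "Insurance Policy"},
    {"page_number": 2, "classification": "Insurance Policy"},
    {"page_number": 3, "classification": "Terms and Conditions"},
    {"page_number": 4, "classification": "Terms and Conditions"},
]

print(f"{page_accuracy(expected_pages, actual_pages):.2%}")
```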

## Visualize the outputs

To provide context for the execution of the code, the following code blocks visualize the outputs of the classification process.

This includes:

- The accuracy of the classification process comparing the expected output with the output generated by Azure OpenAI's GPT-4o model.
- The confidence score of the classification process based on the log probability of the predicted classification.
- The execution time of the end-to-end process.
- The total number of tokens consumed by the GPT-4o model.
- The classification results for each page of the document.

### Understanding Accuracy vs Confidence

When using AI to classify data, both confidence and accuracy are essential for different but complementary reasons.

- **Accuracy** measures how close the AI model's output is to a ground truth or expected output. It reflects how well the model's predictions align with reality.
  - Accuracy ensures consistency in the classification process, which is crucial for downstream tasks using the data.
- **Confidence** represents the AI model's internal assessment of how certain it is about its predictions.
  - Confidence indicates how certain the model is about its predictions; a low score is a useful signal for human reviewers to step in for manual verification.

High accuracy and high confidence are ideal, but in practice, there is often a trade-off between the two. While accuracy cannot always be self-assessed, confidence scores can and should be used to prioritize manual verification of low-confidence predictions.
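
One way to act on this is to route pages whose confidence falls below a threshold to a reviewer. The sketch below is illustrative: `needs_review` and the 0.95 threshold are assumptions, and the scores are made-up values; a suitable threshold depends on the workload.

```python
def needs_review(page_confidence: dict[int, float], threshold: float = 0.95) -> list[int]:
    """Return page numbers whose confidence is below the threshold, lowest first."""
    flagged = [page for page, score in page_confidence.items() if score < threshold]
    return sorted(flagged, key=lambda page: page_confidence[page])


# Made-up confidence scores keyed by page number.
scores = {1: 0.999, 2: 0.62, 3: 0.97, 4: 0.91}
print(needs_review(scores))
```

Sorting by score puts the least certain pages first, so limited review time is spent where it is most likely to matter.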

In [10]:
# Determines the confidence of the classifications using the log probabilities of the completion response.
confidence = evaluate_confidence(classifications_dict, completion.choices[0])
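
The implementation of `evaluate_confidence` lives in the local module and is not reproduced here. The underlying idea can be sketched as follows: each token's log probability converts to a probability with `exp`, and the probabilities of the tokens that make up a value can be combined into a single score. `tokens_to_confidence` is a hypothetical function, averaging is only one possible way to combine the values, and the log probabilities are invented for the example.

```python
import math


def tokens_to_confidence(token_logprobs: list[float]) -> float:
    """Average the per-token probabilities obtained from log probabilities."""
    if not token_logprobs:
        return 0.0
    probabilities = [math.exp(logprob) for logprob in token_logprobs]
    return sum(probabilities) / len(probabilities)


# Invented log probabilities for the tokens of one classification value.
example_logprobs = [-0.0001, -0.002, -0.05]
print(round(tokens_to_confidence(example_logprobs), 4))
```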

In [11]:
# Gets the total execution time of the classification process.
total_elapsed = image_stopwatch.elapsed + oai_stopwatch.elapsed

# Gets the prompt tokens and completion tokens from the completion response.
prompt_tokens = completion.usage.prompt_tokens
completion_tokens = completion.usage.completion_tokens

In [12]:
# Save the output of the data classification result.
classification_result = DataClassificationResult(
    classification=document_classifications.to_dict(),
    accuracy=accuracy,
    execution_time=total_elapsed
)

with open(f"{working_dir}/samples/classification/openai.{pdf_fname}.json", "w") as f:
    f.write(classification_result.to_json(indent=4))

In [13]:
# Display the outputs of the classification process.
df = pd.DataFrame([
    {
        "Accuracy": f"{accuracy['overall'] * 100:.2f}%",
        "Confidence": f"{confidence['_overall'] * 100:.2f}%",
        "Execution Time": f"{total_elapsed:.2f} seconds",
        "Image Pre-processing Execution Time": f"{image_stopwatch.elapsed:.2f} seconds",
        "OpenAI Execution Time": f"{oai_stopwatch.elapsed:.2f} seconds",
        "Prompt Tokens": prompt_tokens,
        "Completion Tokens": completion_tokens
    }
])

display(df)
display(get_classification_comparison(expected, document_classifications, confidence))

| Accuracy | Confidence | Execution Time | Image Pre-processing Execution Time | OpenAI Execution Time | Prompt Tokens | Completion Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 100.00% | 99.63% | 24.59 seconds | 4.50 seconds | 20.10 seconds | 8723 | 208 |


| Page | Expected | Extracted | Similarity | Confidence |
| --- | --- | --- | --- | --- |
| 1 | Insurance Policy | Insurance Policy | 95 | 0.999868 |
| 2 | Insurance Policy | Insurance Policy | 95 | 0.99998 |
| 3 | Insurance Policy | Insurance Policy | 95 | 0.99993 |
| 4 | Insurance Policy | Insurance Policy | 95 | 0.99941 |
| 5 | Insurance Policy | Insurance Policy | 95 | 0.99991 |
| 6 | Insurance Certificate | Insurance Certificate | 90 | 0.999056 |
| 7 | Terms and Conditions | Terms and Conditions | 90 | 0.997413 |
| 8 | Terms and Conditions | Terms and Conditions | 90 | 0.999412 |
| 9 | Terms and Conditions | Terms and Conditions | 90 | 0.999935 |
| 10 | Terms and Conditions | Terms and Conditions | 90 | 0.999247 |
