# Classification - Azure OpenAI GPT

This sample demonstrates how to use the vision capabilities of Azure OpenAI's GPT-4o model to analyze each page of a document and classify it into one of a defined set of categories.

## Objectives

By the end of this sample, you will have learned how to:

- Use GPT-4o's vision capabilities to categorize a document into a set of predefined categories using prompt-based classification.

## Setup

In [1]:
import sys
sys.path.append('../')

from IPython.display import display, Markdown

import os
from dotenv import dotenv_values
from pdf2image import convert_from_bytes
import base64
import io
import json
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from modules.app_settings import AppSettings
from modules.classification import Classifications
from modules.stopwatch import Stopwatch

In [2]:
# Set the working directory to the root of the repo
working_dir = os.path.abspath('../../')
settings = AppSettings(dotenv_values(f"{working_dir}/.env"))

# Configure the default credential for accessing Azure services using Azure CLI credentials
credential = DefaultAzureCredential(
    exclude_workload_identity_credential=True,
    exclude_developer_cli_credential=True,
    exclude_environment_credential=True,
    exclude_managed_identity_credential=True,
    exclude_powershell_credential=True,
    exclude_shared_token_cache_credential=True,
    exclude_interactive_browser_credential=True
)

openai_token_provider = get_bearer_token_provider(credential, 'https://cognitiveservices.azure.com/.default')

openai_client = AzureOpenAI(
    azure_endpoint=settings.openai_endpoint,
    azure_ad_token_provider=openai_token_provider,
    api_version="2024-08-01-preview"
)

## Establish the classifications

The following code block contains the classification definitions for a document. Each definition pairs a category name with a description and a list of keywords, based on the content expected in a specific type of document. This example uses insurance documents.

In [3]:
pdf_path = f"{working_dir}/samples/assets/"
pdf_file_name = "VehicleInsurancePolicy.pdf"

classifications = [
    {
        "classification": "Correspondence",
        "description": "A communication exchanged between individuals, organizations, or parties, typically in written or electronic form, often used for record-keeping or official purposes.",
        "keywords": [
            "letter",
            "communication",
            "email",
            "fax",
            "letterhead",
            "memorandum",
        ]
    },
    {
        "classification": "Contact Information",
        "description": "Personal or organizational details that can be used to contact or identify individuals or entities, often used for communication or reference purposes.",
        "keywords": [
            "policyholder",
            "email address",
            "phone number",
            "address",
            "contact person",
            "emergency contact",
        ]
    },
    {
        "classification": "Policy Details",
        "description": "Specific terms, conditions, or provisions of an agreement or contract, often related to insurance policies, financial products, or legal documents.",
        "keywords": [
            "cover type",
            "policy number",
            "history",
            "schedule",
            "effective date",
            "excess",
        ]
    },
    {
        "classification": "Insurance Certificate",
        "description": "A document that serves as proof of insurance coverage, often required for legal, regulatory, or contractual purposes.",
        "keywords": [
            "certificate",
            "proof",
            "coverage",
            "liability",
            "endorsement",
            "declaration",
        ]
    },
    {
        "classification": "Terms and Conditions",
        "description": "The rules, requirements, or obligations that govern an agreement or contract, often related to insurance policies, financial products, or legal documents.",
        "keywords": [
            "terms",
            "conditions",
            "rules",
            "requirements",
            "obligations",
            "agreement",
            "responsibilities",
            "payment",
            "renewal",
            "cancellation",
        ]
    }
]
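
Because the definitions are serialized straight into the prompt, a malformed entry would only show up as poor model output. The following is a minimal sketch of a sanity check that could run before the request is built. `validate_classifications` and `REQUIRED_KEYS` are hypothetical names, not part of this sample's modules, and `example_definitions` is a shortened stand-in for the full list above.

```python
# Hypothetical helper (not part of the sample's modules): checks that each
# definition has the three keys used above and that names are unique.
REQUIRED_KEYS = {"classification", "description", "keywords"}


def validate_classifications(definitions):
    """Return the classification names in order, raising ValueError on a malformed entry."""
    names = []
    for index, definition in enumerate(definitions):
        missing = REQUIRED_KEYS - definition.keys()
        if missing:
            raise ValueError(f"Definition {index} is missing keys: {sorted(missing)}")
        if not definition["keywords"]:
            raise ValueError(f"Definition {index} has no keywords")
        name = definition["classification"]
        if name in names:
            raise ValueError(f"Duplicate classification name: {name}")
        names.append(name)
    return names


# Shortened stand-in for the full list defined above.
example_definitions = [
    {
        "classification": "Correspondence",
        "description": "A communication exchanged between parties.",
        "keywords": ["letter", "email"],
    },
    {
        "classification": "Policy Details",
        "description": "Specific terms or provisions of an agreement.",
        "keywords": ["policy number", "excess"],
    },
]

names = validate_classifications(example_definitions)
print(names)
```

In the notebook, the same call could be made with the `classifications` list itself before it is passed to `json.dumps`.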

## Classify document pages

The following code block runs the classification process using the vision capabilities of Azure OpenAI's GPT-4o model.

It performs the following steps:

1. Get the document bytes from the provided file path. _Note: This example processes a local document, but you can use any document storage location of your choice, such as Azure Blob Storage._
2. Use pdf2image to convert the document to a list of images, one per page, and encode each image as a base64 string.
3. Use Azure OpenAI's GPT-4o model and the classification definitions to provide a classification for each page of the document.
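
The encoding in step 2 can be looked at in isolation. The sketch below wraps raw image bytes in the `image_url` content item shape that the code cell below appends for each page. `to_image_content` is a hypothetical helper name, and the input is only the 8-byte PNG signature rather than a real rendered page.

```python
import base64


def to_image_content(image_bytes, mime_type="image/png"):
    """Wrap raw image bytes as an image_url content item carrying a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# The 8-byte PNG file signature stands in for a full page image here.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_item = to_image_content(png_signature)
print(sample_item["image_url"]["url"])
```

Keeping the MIME type in the data URL consistent with the format passed to `page.save` matters if the page images are ever saved in a format other than PNG.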

In [4]:
fname = f"{pdf_path}{pdf_file_name}"

stopwatch = Stopwatch()
stopwatch.start()

user_content = []
user_content.append({
    "type": "text",
    "text": f"""Classifications:
    
    {json.dumps(classifications)}
    """
})

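# Read the PDF with a context manager so the file handle is closed promptly.
with open(fname, "rb") as pdf_file:
    document_bytes = pdf_file.read()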

pages = convert_from_bytes(document_bytes)
for page in pages:
    byteIO = io.BytesIO()
    page.save(byteIO, format='PNG')
    base64_data = base64.b64encode(byteIO.getvalue()).decode('utf-8')
    
    user_content.append({
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{base64_data}"
        }
    })
    
completion = openai_client.beta.chat.completions.parse(
    model=settings.gpt4o_model_deployment_name,
    messages=[
        {
            "role": "system",
            "content": f"""Using the classifications provided, classify each page of the following document into one of the classifications. 
            - If a page contains multiple classifications, choose the most relevant one. 
            - If a page does not fit any of the classifications, use the classification 'Unclassified'.""",
        },
        {
            "role": "user",
            "content": user_content
        }
    ],
    response_format=Classifications,
    max_tokens=4096,
    temperature=0.1,
    top_p=0.1
)

stopwatch.stop()

## Visualize the outputs

To provide context for the run, the following code block displays the outputs of the classification process.

This includes:

- The execution time of the end-to-end process.
- The number of prompt tokens and completion tokens consumed by the GPT-4o model.
- The classification results for each page of the document.

In [None]:
# Gets the parsed Classifications object from the completion response.
document_classifications = completion.choices[0].message.parsed

# Gets the prompt tokens and completion tokens from the completion response.
prompt_tokens = completion.usage.prompt_tokens
completion_tokens = completion.usage.completion_tokens

# Display the outputs of the classification process.
print(f"Execution time: {stopwatch.elapsed:.2f} seconds")
print(f"Prompt tokens: {prompt_tokens}")
print(f"Completion tokens: {completion_tokens}")

display(Markdown("### Document Classifications:"))
for page in document_classifications.classifications:
    display(Markdown(f"#### Page {page.page_number}"))
    display(Markdown(f"**Classification:** {page.classification}"))
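
Beyond displaying each page, the results can be grouped by category and checked against the labels the prompt allows. The sketch below is one possible way to do that. `PageResult` is a stand-in dataclass that only mirrors the `page_number` and `classification` attributes read above (it is not the sample's `Classifications` model), `group_pages` is a hypothetical helper, and the sample results are invented for illustration rather than real model output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageResult:
    """Stand-in for one parsed page entry (page_number and classification only)."""
    page_number: int
    classification: str


def group_pages(results, allowed_labels):
    """Group page numbers by classification and collect entries with unexpected labels."""
    grouped = defaultdict(list)
    unexpected = []
    for result in results:
        if result.classification in allowed_labels:
            grouped[result.classification].append(result.page_number)
        else:
            unexpected.append(result)
    return dict(grouped), unexpected


# Allowed labels: defined category names plus the 'Unclassified' fallback from the prompt.
allowed = {"Correspondence", "Policy Details", "Unclassified"}

# Invented results for illustration only.
sample_results = [
    PageResult(1, "Correspondence"),
    PageResult(2, "Policy Details"),
    PageResult(3, "Policy Details"),
    PageResult(4, "Invoice"),
]

grouped, unexpected = group_pages(sample_results, allowed)
print(grouped)
print([result.page_number for result in unexpected])
```

In the notebook, `document_classifications.classifications` could be passed in place of `sample_results`, with the allowed set built from the `classifications` list plus `'Unclassified'`.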