# Document Classification with Azure OpenAI's GPT-4o Vision Capabilities

This sample demonstrates how to classify a document using Azure OpenAI's GPT-4o model with vision capabilities.

![Data Classification](../../../images/classification-openai.png)

This is achieved by the following process:

- Define a list of classifications, with descriptions and keywords.
- Construct a system prompt that defines the instruction for classifying document pages.
- Construct a user prompt that includes the defined classifications and each document page as a Base64-encoded image.
- Use the Azure OpenAI chat completions API with the GPT-4o model to generate a classification for each document page as a structured output.

## Objectives

By the end of this sample, you will have learned how to:

- Convert a document into a set of Base64-encoded images for processing by GPT-4o.
- Use prompt engineering techniques to instruct GPT-4o to classify a document's pages into predefined categories.

## Useful Tips

- Combine this technique with a [page extraction](../extraction/README.md) approach to ensure that you extract the most relevant data from a document's pages.

## Setup

### Import modules and reference NuGet packages

This sample uses several .NET libraries including:

- **Azure.Identity** for authentication with Azure services.
- **Azure.AI.OpenAI** to call the Azure OpenAI chat completions API.
- A PDF processing library (not yet selected) for converting PDF pages to images.

The notebook also defines local helper classes for the classification models and a stub accuracy evaluator. Confidence and timing values are stubbed in the evaluation cell.

In [None]:
using System;
using System.IO;
using System.Collections.Generic;
using System.Text;
using Azure.Identity;
using Azure.AI.OpenAI;
using System.Threading.Tasks;
using System.Text.Json;

// TODO: add using for any PDF conversion library and helper methods (for example, a custom PDF2Image method).

// Local classes that mirror the models used in the Python version of this sample
public class Classification
{
    public int? PageNumber { get; set; }
    public string ClassificationName { get; set; }
    public double? Similarity { get; set; }
}

public class Classifications
{
    public List<Classification> Items { get; set; } = new List<Classification>();
    
    // Serialize to JSON
    public string ToJson() => JsonSerializer.Serialize(this, new JsonSerializerOptions { WriteIndented = true });
}

// A stub for evaluating accuracy
public class AccuracyEvaluator
{
    public double Evaluate(Classifications expected, Classifications actual)
    {
        // TODO: implement comparison logic
        return 1.0; // stub example returning 100% accuracy
    }
}
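
The `// TODO: implement comparison logic` placeholder could be filled in along the following lines. This is a minimal sketch, assuming accuracy means the fraction of expected pages whose predicted classification name matches (case-insensitively). `ComputeAccuracy` and the page-to-label dictionaries are hypothetical names introduced for illustration and are not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the original sample): returns the fraction of
// expected pages whose predicted label matches, ignoring case.
double ComputeAccuracy(
    IReadOnlyDictionary<int, string> expectedByPage,
    IReadOnlyDictionary<int, string> actualByPage)
{
    if (expectedByPage.Count == 0) return 0.0;

    // A page counts as correct only if it was predicted and the labels agree.
    int correct = expectedByPage.Count(kv =>
        actualByPage.TryGetValue(kv.Key, out var label) &&
        string.Equals(label, kv.Value, StringComparison.OrdinalIgnoreCase));

    return (double)correct / expectedByPage.Count;
}

// Example: one of two pages is classified correctly.
var expectedLabels = new Dictionary<int, string> { [1] = "Insurance Policy", [2] = "Insurance Certificate" };
var actualLabels = new Dictionary<int, string> { [1] = "Insurance Policy", [2] = "Terms and Conditions" };
double sampleAccuracy = ComputeAccuracy(expectedLabels, actualLabels);
Console.WriteLine($"Matched fraction: {sampleAccuracy}");
```

To use this inside `AccuracyEvaluator.Evaluate`, one option is to project each `Classifications.Items` list into a dictionary keyed by `PageNumber` before calling the helper.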


### Configure Azure Services and Read Environment Settings

This cell resolves the working directory (three levels above the current directory) and configures credentials for accessing Azure OpenAI using Azure.Identity. Adjust the settings as needed.

In [None]:
string workingDir = Path.GetFullPath(Path.Combine(Environment.CurrentDirectory, "..", "..", ".."));

// Read settings from .env or configuration file (you can use a NuGet package like DotNetEnv)
// For demonstration, we will use hardcoded settings.
var settings = new {
    OpenAIEndpoint = "https://your-resource-name.openai.azure.com/",
    Gpt4oModelDeploymentName = "gpt-4o-deployment-name"
};

// Configure credentials
var credential = new DefaultAzureCredential(new DefaultAzureCredentialOptions {
    ExcludeInteractiveBrowserCredential = true
    // Other exclusions can be added if needed
});

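// Initialize the client with the credential created above.
// OpenAIClient is the client type in the Azure.AI.OpenAI 1.x beta packages; the 2.x packages use AzureOpenAIClient instead.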
var openAiClient = new OpenAIClient(new Uri(settings.OpenAIEndpoint), credential);

Console.WriteLine($"Working directory: {workingDir}");
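
Rather than hardcoding the settings, they could be read from environment variables. The following is a minimal sketch using only `Environment.GetEnvironmentVariable`; the helper `ReadSetting` and the variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_GPT4O_DEPLOYMENT` are assumptions made for illustration and are not defined by the sample.

```csharp
using System;

// Hypothetical helper: returns the environment variable's value,
// or the fallback when the variable is unset or blank.
string ReadSetting(string name, string fallback)
{
    var value = Environment.GetEnvironmentVariable(name);
    return string.IsNullOrWhiteSpace(value) ? fallback : value;
}

// The variable names below are assumptions; use whatever your environment defines.
string endpoint = ReadSetting("AZURE_OPENAI_ENDPOINT", "https://your-resource-name.openai.azure.com/");
string deployment = ReadSetting("AZURE_OPENAI_GPT4O_DEPLOYMENT", "gpt-4o-deployment-name");

Console.WriteLine($"Endpoint: {endpoint}");
Console.WriteLine($"Deployment: {deployment}");
```

Falling back to the placeholder values keeps the notebook runnable end to end while making it obvious in the output when real settings have not been supplied.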

## Establish the expected output

Define the expected classifications (e.g. for a Vehicle Insurance Policy). Only `PageNumber` and `ClassificationName` are used for accuracy evaluation.

In [None]:
var expected = new Classifications();

for (int i = 1; i <= 13; i++)
{
    // For pages 1-5: Insurance Policy; page 6: Insurance Certificate; pages 7-13: Terms and Conditions
    if (i <= 5) 
        expected.Items.Add(new Classification { PageNumber = i, ClassificationName = "Insurance Policy", Similarity = 1 });
    else if (i == 6)
        expected.Items.Add(new Classification { PageNumber = i, ClassificationName = "Insurance Certificate", Similarity = 1 });
    else
        expected.Items.Add(new Classification { PageNumber = i, ClassificationName = "Terms and Conditions", Similarity = 1 });
}

var evaluator = new AccuracyEvaluator();

## Define Classifications

Define a list of classifications with descriptions and keywords. This list will be sent to the model as part of the prompt.

In [None]:
var classifications = new List<dynamic> {
    new {
        classification = "Insurance Policy",
        description = "Information such as coverage, limits, premiums, and terms.",
        keywords = new[] { "welcome letter", "personal details", "vehicle details", "policy details" }
    },
    new {
        classification = "Insurance Certificate",
        description = "Proof of insurance coverage.",
        keywords = new[] { "certificate of vehicle insurance", "effective date", "entitlement" }
    },
    new {
        classification = "Terms and Conditions",
        description = "Rules and obligations in a contract.",
        keywords = new[] { "terms and conditions", "legal statements", "payment instructions" }
    }
};

// Serialize classifications to JSON string to include in the prompt
string classificationsJson = JsonSerializer.Serialize(classifications, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine("Classifications JSON:\n" + classificationsJson);

## Classify the Document Pages

The following code block outlines the process:

1. Read the PDF file and convert each page into a Base64 encoded PNG image (implementation required).
2. Build a system prompt and user prompt (including the classifications and images).
3. Call the Azure OpenAI chat completions API with the GPT-4o model to obtain classifications.

In [None]:
// Define the system prompt
string systemPrompt = @"Using the classifications provided, classify each page of the following document into one of the classifications.

- If a page contains multiple classifications, choose the most relevant one.
- If a page does not fit any of the classifications, use the classification 'Unclassified'.";

// Build the user prompt by including the classifications JSON
string userTextPrompt = $"Classifications:\n\n{classificationsJson}";

// Placeholder: Read PDF file bytes (adjust the path as needed)
string pdfPath = Path.Combine(workingDir, "samples", "assets", "vehicle_insurance", "policy_1.pdf");
byte[] pdfBytes = File.ReadAllBytes(pdfPath);

// TODO: Implement a method ConvertPdfToImages(pdfBytes) that returns a List<byte[]>, where each byte[] is one page rendered as a PNG image
// Until that exists, pageImages stays empty and the user prompt contains only the classifications text
List<byte[]> pageImages = new List<byte[]>();

// Encode each page image as a Base64 data URL and add it to the user content
var userContent = new List<dynamic>();

userContent.Add(new { type = "text", text = userTextPrompt });

foreach (var imageBytes in pageImages)
{
    string base64Data = Convert.ToBase64String(imageBytes);
    userContent.Add(new {
        type = "image_url",
        image_url = new { url = $"data:image/png;base64,{base64Data}" }
    });
}

// Display prompt information for debugging
Console.WriteLine("System Prompt:\n" + systemPrompt);
Console.WriteLine("User Content:\n" + JsonSerializer.Serialize(userContent, new JsonSerializerOptions { WriteIndented = true }));
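
Because the PDF-to-image conversion is left unimplemented, it can help to verify that each buffer really is a PNG before it is labeled `image/png` in the data URL. The sketch below checks the fixed 8-byte PNG file signature and then builds the same `data:image/png;base64,...` URL used in the cell above. `LooksLikePng` and `ToPngDataUrl` are hypothetical helper names, not part of the sample.

```csharp
using System;
using System.Linq;

// Every PNG file starts with this fixed 8-byte signature.
byte[] pngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

// Hypothetical helper: true when the buffer begins with the PNG signature.
bool LooksLikePng(byte[] bytes) =>
    bytes.Length >= pngSignature.Length &&
    bytes.Take(pngSignature.Length).SequenceEqual(pngSignature);

// Hypothetical helper: builds the data URL for an image_url content part,
// rejecting buffers that are not PNG-encoded.
string ToPngDataUrl(byte[] bytes)
{
    if (!LooksLikePng(bytes))
        throw new ArgumentException("Expected PNG-encoded bytes.", nameof(bytes));
    return $"data:image/png;base64,{Convert.ToBase64String(bytes)}";
}

// Example: the signature alone passes the check.
string sampleUrl = ToPngDataUrl(pngSignature);
Console.WriteLine(sampleUrl); // prints "data:image/png;base64,iVBORw0KGgo="
```

A signature check only confirms the file type; it does not validate that the image data is complete or renderable.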

### Call Azure OpenAI API to Get Classifications

This cell calls the GPT-4o model using the constructed prompts. In practice, you will need to implement structured response handling, for example by deserializing the model's JSON output into a typed .NET class (the counterpart of a Pydantic model in Python).

In [None]:
// NOTE: This is a stub implementation. You will need to build the request payload and parse the response accordingly.

async Task<dynamic> ClassifyDocumentAsync()
{
    // Build messages
    var messages = new List<dynamic> {
        new { role = "system", content = systemPrompt },
        new { role = "user", content = userContent }
    };

    // Call the OpenAI chat completions API
    // Replace with the actual method invocation (passing `messages`) and parsing logic
    // For demonstration, we return a stub object shaped like a chat completions response
    await Task.Delay(1000); // simulate network call
    return new {
        choices = new[] {
            new {
                message = new { parsed = new Classifications() },
                logprobs = (object)null
            }
        },
        // Token usage is reported once per response, not per choice
        usage = new { prompt_tokens = 8000, completion_tokens = 200 }
    };
}

var classificationResponse = await ClassifyDocumentAsync();

// Retrieve the classifications from the response (stub)
var documentClassifications = (classificationResponse.choices[0].message.parsed as Classifications) ?? new Classifications();

Console.WriteLine("Document Classifications:\n" + documentClassifications.ToJson());
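
Once a real call replaces the stub, the model's JSON output has to be turned into page-level labels. The following is a minimal sketch using `System.Text.Json`, assuming the model returns an object with a `classifications` array whose items carry `page_number` and `classification` properties. That response shape and the helper name `ParseClassifications` are assumptions for illustration; adjust them to whatever schema you request from the model.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper. Assumed JSON shape (not defined by the sample):
// { "classifications": [ { "page_number": 1, "classification": "..." }, ... ] }
Dictionary<int, string> ParseClassifications(string json)
{
    var labelsByPage = new Dictionary<int, string>();
    using var document = JsonDocument.Parse(json);

    foreach (var item in document.RootElement.GetProperty("classifications").EnumerateArray())
    {
        int page = item.GetProperty("page_number").GetInt32();
        // Fall back to the prompt's catch-all label when the value is JSON null.
        string label = item.GetProperty("classification").GetString() ?? "Unclassified";
        labelsByPage[page] = label;
    }

    return labelsByPage;
}

string sampleJson = @"{""classifications"":[
  {""page_number"":1,""classification"":""Insurance Policy""},
  {""page_number"":6,""classification"":""Insurance Certificate""}]}";

var parsedLabels = ParseClassifications(sampleJson);
Console.WriteLine($"Page 6: {parsedLabels[6]}"); // prints "Page 6: Insurance Certificate"
```

`GetProperty` throws a `KeyNotFoundException` if a property is missing, which surfaces schema drift early; use `TryGetProperty` instead if you prefer to skip malformed items.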

## Evaluate the Results

Calculate the accuracy using the evaluator and (optionally) evaluate the confidence. Also record timing statistics.

In [None]:
// For demonstration, we use stub elapsed times.
double imageProcessingTime = 1.8; // in seconds
double openAiTime = 24.9; // in seconds
double totalTime = imageProcessingTime + openAiTime;

// Evaluate accuracy (stub - always returns 100%)
double accuracy = evaluator.Evaluate(expected, documentClassifications);

// TODO: Implement evaluation of confidence based on the API's logprobs if available
double confidence = 0.9963; // stub confidence value

Console.WriteLine($"Accuracy: {accuracy * 100:0.00}%");
Console.WriteLine($"Confidence: {confidence * 100:0.00}%");
Console.WriteLine($"Total Execution Time: {totalTime:0.00} seconds");
Console.WriteLine($"Image Processing Time: {imageProcessingTime:0.00} seconds");
Console.WriteLine($"OpenAI Execution Time: {openAiTime:0.00} seconds");
Console.WriteLine("Prompt Tokens: 8000"); // stub value matching the stubbed response
Console.WriteLine("Completion Tokens: 200"); // stub value matching the stubbed response
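
For the confidence TODO, one common heuristic is to take the geometric mean of the output token probabilities, which equals the exponential of the mean token log probability. The sketch below assumes you have already collected the per-token log probabilities from the response into a list. The heuristic itself and the helper name `ConfidenceFromLogProbs` are assumptions made here, not something the sample prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: geometric mean of token probabilities,
// computed as exp(mean(logprob)). Returns 0 when no log probabilities are available.
double ConfidenceFromLogProbs(IReadOnlyList<double> tokenLogProbs)
{
    if (tokenLogProbs.Count == 0) return 0.0;
    return Math.Exp(tokenLogProbs.Average());
}

// Example values only; real log probabilities come from the API response.
var sampleLogProbs = new List<double> { -0.001, -0.01, -0.002 };
double sampleConfidence = ConfidenceFromLogProbs(sampleLogProbs);
Console.WriteLine($"Confidence: {sampleConfidence * 100:0.00}%");
```

Averaging in log space avoids numeric underflow on long outputs and penalizes a single very unlikely token more than an arithmetic mean of probabilities would.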

## Visualize the Outputs

You can enhance this notebook by adding charts or tables using .NET libraries if desired.
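
As a starting point that needs no extra libraries, a plain-text table comparing expected and actual classifications per page can be printed to the console. This is a minimal sketch; `FormatComparisonRows` and the page-to-label dictionaries are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds a header row plus one row per expected page,
// flagging whether the actual label matches the expected one.
List<string> FormatComparisonRows(
    IReadOnlyDictionary<int, string> expectedByPage,
    IReadOnlyDictionary<int, string> actualByPage)
{
    var rows = new List<string>
    {
        "Page".PadRight(6) + "Expected".PadRight(24) + "Actual".PadRight(24) + "Match"
    };

    foreach (var page in expectedByPage.Keys.OrderBy(p => p))
    {
        string expectedLabel = expectedByPage[page];
        string actualLabel = actualByPage.TryGetValue(page, out var label) ? label : "(missing)";
        bool match = string.Equals(actualLabel, expectedLabel, StringComparison.OrdinalIgnoreCase);

        rows.Add(page.ToString().PadRight(6) + expectedLabel.PadRight(24) +
                 actualLabel.PadRight(24) + (match ? "yes" : "no"));
    }

    return rows;
}

// Example data only.
var sampleExpected = new Dictionary<int, string> { [1] = "Insurance Policy", [2] = "Insurance Certificate" };
var sampleActual = new Dictionary<int, string> { [1] = "Insurance Policy", [2] = "Terms and Conditions" };

var sampleRows = FormatComparisonRows(sampleExpected, sampleActual);
foreach (var row in sampleRows) Console.WriteLine(row);
```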