## 📸 True or False Recognition Demo

This demo is a minimal test of a GPT-4 workflow for document recognition tasks. The code in this notebook does the following:

- 🖼️ Submit an image of a document to the GPT-4 model.
- 🧪 True/False classification: Receive a response indicating whether the document appears to be a certain type or contains certain content.
- 📝 Details: Get additional details or insights about the document based on the model's analysis.

Let's get started. 



In [1]:
import os
import base64
import requests
from mimetypes import guess_type
import json
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

## Image Preprocessing

Encode a local image as base64 and generate a data URL for it.

In [2]:
# Function to read an image file and return its contents as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Function to turn a local image into a data URL
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    base64_encoded_data = encode_image(image_path)

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

In [3]:
# Path to your image
image_path = "images/dummy-receipt1.png"

# Build the data URL for the image
data_url = local_image_to_data_url(image_path)
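One way to sanity-check the encoding step is to split a data URL back into its MIME type and raw bytes. The sketch below is self-contained: it restates a condensed version of the helper above, and `parse_data_url`, the placeholder bytes, and the temporary file are hypothetical names introduced only for illustration.

```python
import base64
import os
import tempfile
from mimetypes import guess_type


def local_image_to_data_url(image_path):
    # Same logic as the helper above, condensed so this sketch runs on its own
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def parse_data_url(data_url):
    # Hypothetical helper: split "data:<mime>;base64,<payload>" into its parts
    header, encoded = data_url.split(",", 1)
    mime_type = header[len("data:"):].split(";", 1)[0]
    return mime_type, base64.b64decode(encoded)


# Write a few placeholder bytes to a temporary .png file (not a real image)
sample_bytes = b"\x89PNG placeholder bytes"
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(sample_bytes)
    sample_path = tmp.name

sample_url = local_image_to_data_url(sample_path)
sample_mime, sample_decoded = parse_data_url(sample_url)
os.remove(sample_path)

print(sample_mime)
print(sample_decoded == sample_bytes)
```

The MIME type is guessed from the file extension only, so a mislabeled file would still produce a data URL with the wrong type.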

## Model Input Payload

The payload below sends the image and a True/False question to the GPT-4 model in a single message with the `user` role.

In [4]:
payload = {
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Is this a receipt for a healthcare service? Please start your response with True or False, then offer your reasoning in 1-2 sentences."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": data_url
          }
        }
      ]
    }
  ],
  "max_tokens": 300 # Remember to set a "max_tokens" value, or the return output will be cut off.
}
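If the same question format is reused for other document types, the payload can be produced by a small function. This is a minimal sketch, not part of the original notebook: `build_true_false_payload` and its parameters are hypothetical names, and the defaults simply mirror the values used above.

```python
def build_true_false_payload(question, data_url, model="gpt-4-turbo", max_tokens=300):
    # Hypothetical helper: wrap a yes/no question and an image data URL
    # in the same message structure as the payload above.
    prompt = (
        f"{question} Please start your response with True or False, "
        "then offer your reasoning in 1-2 sentences."
    )
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


# Example with a placeholder data URL (not a real image)
example_payload = build_true_false_payload(
    "Is this an invoice?", "data:image/png;base64,AAAA"
)
print(example_payload["messages"][0]["content"][0]["text"])
```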

## Model Request Execution
Send a POST request to the model endpoint with the specified headers and payload.

In [5]:
# Option 1: Use the OpenAI API endpoint when AZURE_ENDPOINT is not specified
post_url = os.getenv("AZURE_ENDPOINT", "https://api.openai.com/v1/chat/completions")

# Option 2: Set AZURE_ENDPOINT in .env to a URL of the following form
# post_url = f"https://{RESOURCE_NAME}.openai.azure.com/openai/deployments/{DEPLOYMENT_NAME}/extensions/chat/completions?api-version=2023-12-01-preview"
# RESOURCE_NAME is the name of your Azure OpenAI resource
# DEPLOYMENT_NAME is the name of your GPT-4 Turbo with Vision model deployment

In [22]:
headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {api_key}"
}

response = requests.post(post_url, headers=headers, json=payload)
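The headers above use the `Authorization: Bearer` scheme of the OpenAI API. Azure OpenAI deployments, as far as I know, expect key-based authentication in an `api-key` header instead, so a request to an Azure endpoint may need different headers. The sketch below picks the header from the URL's host; `build_headers` is a hypothetical helper, the host check is an assumption, and the example URLs and keys are placeholders.

```python
from urllib.parse import urlparse


def build_headers(post_url, api_key):
    # Hypothetical helper: choose the auth header based on the endpoint host.
    headers = {"Content-Type": "application/json"}
    host = urlparse(post_url).hostname or ""
    if host.endswith(".openai.azure.com"):
        # Assumption: Azure OpenAI key-based auth uses the "api-key" header
        headers["api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# Placeholder URLs and keys, for illustration only
openai_headers = build_headers("https://api.openai.com/v1/chat/completions", "sk-placeholder")
azure_headers = build_headers(
    "https://my-resource.openai.azure.com/openai/deployments/my-deployment/chat/completions",
    "azure-placeholder",
)
print(sorted(openai_headers))
print(sorted(azure_headers))
```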


## Extract and Print Information
Extract relevant information from the model response and print it for analysis.

In [23]:
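# Fail early on HTTP errors (e.g. an invalid API key) instead of hitting a KeyError below
response.raise_for_status()
response_data = response.json()

# Format the full response
full_response = json.dumps(response_data, indent=4)

# The answer text returned by the model
content = response_data['choices'][0]['message']['content'].strip()

# Extracting true or false from the response (case-insensitive)
is_healthcare_receipt = content.lower().startswith('true')

# Extracting details from the response: everything after the leading "True." / "False."
content_split = content.split('. ', 1)
details = content_split[1] if len(content_split) > 1 else ""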


# Printing the extracted information
print("Healthcare service receipt:", is_healthcare_receipt)
print("Details:", details)
# print("Full response:", full_response)

Healthcare service receipt: False
Details: This is not a receipt of healthcare service; it appears to be a receipt for automotive parts and service, with items such as "Front and rear brake cables" and "New set of pedal arms" listed, which are typical in vehicle maintenance.
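The model is only asked to start with True or False, so it may occasionally answer in another form. A slightly more defensive parser could return `None` when neither word leads the response, rather than silently treating it as False. This is a minimal sketch; `parse_true_false` is a hypothetical helper, not part of the original notebook.

```python
import re


def parse_true_false(content):
    # Hypothetical helper: return (verdict, details).
    # verdict is True/False, or None if the response does not start with either word.
    match = re.match(
        r"\s*(true|false)\b[\s.,:;-]*(.*)",
        content,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return None, content.strip()
    verdict = match.group(1).lower() == "true"
    return verdict, match.group(2).strip()


print(parse_true_false("False. This is not a receipt of healthcare service."))
print(parse_true_false("I cannot tell from this image."))
```

Distinguishing `None` from `False` lets the caller retry or flag the document for manual review when the model ignores the requested format.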
