# Chat with PDF page images

**If you're looking for the web application, check the src/ folder.**

This notebook demonstrates how to convert PDF pages to images and send them to a vision model for inference.

## Authenticate to OpenAI

The following code connects to an OpenAI-compatible API through one of three hosts: GitHub Models, a local OpenAI-compatible server such as Ollama, or an Azure OpenAI account. See the README for instructions on configuring the `.env` file.

In [1]:
import os

import azure.identity
import openai
from dotenv import load_dotenv

load_dotenv(".env", override=True)

openai_host = os.getenv("OPENAI_HOST", "github")

if openai_host == "github":
    print("Using GitHub Models with GITHUB_TOKEN as key")
    openai_client = openai.OpenAI(
        api_key=os.environ["GITHUB_TOKEN"],
        base_url="https://models.github.ai/inference",
    )
    model_name = os.getenv("OPENAI_MODEL", "openai/gpt-4o")
elif openai_host == "local":
    print("Using local OpenAI-compatible API with no key")
    openai_client = openai.OpenAI(api_key="no-key-required", base_url=os.environ["LOCAL_OPENAI_ENDPOINT"])
    model_name = os.getenv("OPENAI_MODEL", "gpt-4o")
elif openai_host == "azure" and os.getenv("AZURE_OPENAI_KEY_FOR_CHATVISION"):
    # Authenticate using an Azure OpenAI API key
    # This is generally discouraged, but is provided as a convenience
    print("Using Azure OpenAI with key")
    openai_client = openai.OpenAI(
        base_url=os.environ["AZURE_OPENAI_ENDPOINT"] + "/openai/v1/",
        api_key=os.environ["AZURE_OPENAI_KEY_FOR_CHATVISION"],
    )
    # This is actually the deployment name, not the model name
    model_name = os.getenv("OPENAI_MODEL", "gpt-4o")
elif openai_host == "azure" and os.getenv("AZURE_OPENAI_ENDPOINT"):
    tenant_id = os.environ["AZURE_TENANT_ID"]
    print("Using Azure OpenAI with Azure Developer CLI credential for tenant id", tenant_id)
    default_credential = azure.identity.AzureDeveloperCliCredential(tenant_id=tenant_id)
    token_provider = azure.identity.get_bearer_token_provider(
        default_credential, "https://cognitiveservices.azure.com/.default"
    )
    openai_client = openai.OpenAI(
        base_url=os.environ["AZURE_OPENAI_ENDPOINT"] + "/openai/v1/",
        api_key=token_provider,
    )
    # This is actually the deployment name, not the model name
    model_name = os.getenv("OPENAI_MODEL", "gpt-4o")
else:
    # Fail early instead of hitting a NameError on model_name below
    raise ValueError(f"Unsupported or incomplete configuration for OPENAI_HOST={openai_host!r}")

print(f"Using model {model_name}")

Using GitHub Models with GITHUB_TOKEN as key
Using model openai/gpt-4o
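
Before creating the client, it can help to check that the variables the selected host needs are actually set. The sketch below is not part of the notebook: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the variable lists are read off the authentication cell above (the README may name further settings).

```python
import os

# Hypothetical helper, not part of the original notebook.
# Variables that each OPENAI_HOST branch reads via os.environ[...].
# The "azure" entry describes the keyless (Azure Developer CLI) branch.
REQUIRED_VARS = {
    "github": ["GITHUB_TOKEN"],
    "local": ["LOCAL_OPENAI_ENDPOINT"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_TENANT_ID"],
}


def missing_env_vars(host, env=None):
    """Return the required variables for `host` that are unset or empty."""
    env = os.environ if env is None else env
    required = list(REQUIRED_VARS.get(host, []))
    # With an API key, the Azure branch does not read the tenant ID.
    if host == "azure" and env.get("AZURE_OPENAI_KEY_FOR_CHATVISION"):
        required.remove("AZURE_TENANT_ID")
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(os.getenv("OPENAI_HOST", "github"))
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.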


## Convert PDFs to images

In [2]:
%pip install Pillow PyMuPDF

Defaulting to user installation because normal site-packages is not writeable
Collecting Pillow
  Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB)
Collecting PyMuPDF
  Downloading pymupdf-1.26.5-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)
Downloading pillow-11.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB)
Downloading pymupdf-1.26.5-cp39-abi3-manylinux_2_28_x86_64.whl (24.1 MB)
Installing collected packages: PyMuPDF, Pillow
Successfully installed Pillow-11.3.0 PyMuPDF-1.26.5

[notice] A new release of pip is available: 24.0 -> 25.2

In [3]:
import pymupdf
from PIL import Image

filename = "plants.pdf"
# Open the document once, then render each page to its own PNG file
doc = pymupdf.open(filename)
for i in range(doc.page_count):
    page = doc.load_page(i)
    pix = page.get_pixmap()
    # get_pixmap() returns RGB samples without an alpha channel by default
    original_img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    original_img.save(f"page_{i}.png")
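
Called with no arguments, `get_pixmap()` renders at 72 DPI, which may leave small text hard for a vision model to read. The following is a minimal sketch of rendering at a higher resolution, assuming a PyMuPDF version whose `get_pixmap` accepts the `dpi` argument. The names `render_pages` and `expected_pixel_size` and the 150 DPI default are illustrative choices, not part of the notebook.

```python
def expected_pixel_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page measured in PDF points (1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def render_pages(pdf_path, dpi=150, max_pages=None):
    """Save pages as page_<i>.png at the given DPI and return the file names."""
    import pymupdf  # imported here so the size helper works without PyMuPDF

    saved = []
    with pymupdf.open(pdf_path) as doc:
        count = doc.page_count if max_pages is None else min(max_pages, doc.page_count)
        for i in range(count):
            pix = doc.load_page(i).get_pixmap(dpi=dpi)
            name = f"page_{i}.png"
            pix.save(name)  # a Pixmap can write PNG files directly
            saved.append(name)
    return saved


# A US Letter page is 612 x 792 points; estimate its size at 144 DPI.
print(expected_pixel_size(612, 792, 144))
```

Larger images also mean larger requests, so the DPI is a trade-off between legibility and the size of the payload sent to the model.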

## Send images to vision model

In [4]:
import base64


def open_image_as_base64(filename):
    with open(filename, "rb") as image_file:
        image_data = image_file.read()
    image_base64 = base64.b64encode(image_data).decode("utf-8")
    return f"data:image/png;base64,{image_base64}"
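
`open_image_as_base64` always labels the data URL as `image/png`, which is correct for the files saved above but would be wrong for pages stored in another format. A hedged variant that guesses the MIME type from the file extension with the standard-library `mimetypes` module is sketched here; `open_image_as_data_url` is a hypothetical name, not a function from the notebook.

```python
import base64
import mimetypes


def open_image_as_data_url(filename):
    """Encode an image file as a data URL, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {filename!r}")
    with open(filename, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{image_base64}"
```

The function only inspects the file name, so it trusts the extension rather than verifying the file contents.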

In [5]:
user_content = [{"text": "What plants are listed on these pages?", "type": "text"}]
# Process just the first few pages, as processing all doc.page_count pages is slow
for i in range(min(3, doc.page_count)):
    user_content.append({"image_url": {"url": open_image_as_base64(f"page_{i}.png")}, "type": "image_url"})

response = openai_client.chat.completions.create(model=model_name, messages=[{"role": "user", "content": user_content}])

print(response.choices[0].message.content)

The following plants are listed in the document:

### Annuals
1. **Centromadia pungens** - Common tarweed
2. **Epilobium densiflorum** - Dense Spike-primrose
3. **Eschscholzia caespitosa** - Tufted Poppy
4. **Eschscholzia californica** - California poppy
5. **Eschscholzia californica 'Purple Gleam'** - Purple Gleam Poppy
6. **Eschscholzia californica var. maritima** - Coastal California Poppy
7. **Madia elegans** - Tarweed
8. **Mentzelia lindleyi** - Lindley’s Blazing Star
9. **Symphyotrichum subulatum** - Slim marsh aster
10. **Trichostema lanceolatum** - Vinegar weed

### Bulbs
11. **Brodiaea californica** - California brodiaea
12. **Chlorogalum pomeridianum** - Soap plant
13. **Epipactis gigantea** - Stream orchid
14. **Wyethia angustifolia** - Narrowleaf mule ears
15. **Wyethia mollis** - Woolly Mule’s Ear’s

### Grasses
16. **Agrostis pallens** - Thingrass
17. **Anthoxanthum occidentale** - Vanilla grass
18. **Bouteloua gracilis** - Blue grama

### Perennials
19. **Achillea millef