# Using `attachments` with the OpenAI API

This tutorial shows how to use the `attachments` library to load local or remote files
and prepare their content for the OpenAI API, both for vision-capable models
such as GPT-4o and for text-only analysis.

## 1. Setup and Imports

First, ensure you have the `attachments` and `openai` libraries installed.

```bash
uv pip install attachments openai python-dotenv
```

Inside a notebook, you can run the same install from a cell, then import the necessary modules.

In [4]:
!uv pip install attachments openai python-dotenv

Using Python 3.11.11 environment at: /home/maxime/Projects/attachments/.venv
Resolved 76 packages in 338ms
Installed 13 packages in 85ms
 + annotated-types==0.7.0
 + anyio==4.9.0
 + distro==1.9.0
 + h11==0.16.0
 + httpcore==1.0.9
 + httpx==0.28.1
 + jiter==0.10.0
 + openai==1.79.0
 + pydantic==2.11.4
 + pydantic-core==2.33.2
 + sniffio==1.3.1
 + tqdm==4.67.1
 + typing-inspection==0.4.0


In [5]:
# Ensure you have an .env file with your OPENAI_API_KEY or set it as an environment variable
import os
from attachments import Attachments
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv() # Load environment variables from .env file

True
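
The `True` above reports that `load_dotenv()` found values to load; it does not by itself confirm that `OPENAI_API_KEY` is among them. A minimal sketch of a fail-fast check using only the standard library follows. The helper names `require_env` and `mask_secret` are hypothetical and not part of `attachments`, `openai`, or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the value is safer to print."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Report the outcome without ever printing the full key.
try:
    key = require_env("OPENAI_API_KEY")
    print("Key loaded:", mask_secret(key))
except RuntimeError as err:
    print(err)
```

Failing early here gives a clearer message than an authentication error raised later by the API call.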

## 2. Initialize Attachments

We'll create an `Attachments` object. You can use URLs or local file paths.
For this example, let's use a publicly available PDF and an image.

In [6]:
# Example using online resources
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/BremenBotanikaZen.jpg/1280px-BremenBotanikaZen.jpg"

# You can also use local paths, e.g.:
# pdf_local_path = "path/to/your/document.pdf"
# image_local_path = "path/to/your/image.jpg"
# attachments_obj = Attachments(pdf_local_path, image_local_path, verbose=True)

attachments_obj = Attachments(pdf_url, image_url, verbose=True)

Attempting to download content from URL: https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
URL https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf has Content-Type: application/pdf; qs=0.001
Successfully downloaded URL https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf to temporary file: /tmp/tmpz5ruyfo5.pdf
Cleaned up temporary file: /tmp/tmpz5ruyfo5.pdf
Attempting to download content from URL: https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/BremenBotanikaZen.jpg/1280px-BremenBotanikaZen.jpg
URL https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/BremenBotanikaZen.jpg/1280px-BremenBotanikaZen.jpg has Content-Type: image/jpeg
Successfully downloaded URL https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/BremenBotanikaZen.jpg/1280px-BremenBotanikaZen.jpg to temporary file: /tmp/tmpsvd5ghgy.jpg
Cleaned up temporary file: /tmp/tmpsvd5ghgy.jpg
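
Since `Attachments` accepts both URLs and local paths, it can help to check each source before constructing the object, so a mistyped path is caught early. The sketch below uses only the standard library; `classify_source` is a hypothetical helper, and it makes no claim about how `attachments` itself tells URLs and paths apart.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'url' for http(s) links, 'file' for existing local files, else 'missing'."""
    if urlparse(source).scheme in {"http", "https"}:
        return "url"
    return "file" if Path(source).expanduser().is_file() else "missing"


pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
url_kind = classify_source(pdf_url)

# A temporary file stands in for a real local document.
with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
    local_kind = classify_source(tmp.name)

# A path that is assumed not to exist on this machine.
missing_kind = classify_source("no/such/dir/document.pdf")

print(url_kind, local_kind, missing_kind)
```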


## 3. Inspecting Attachments

The `Attachments` object processes the files when it is created. Its string representation
(`str(attachments_obj)`) is an XML-like document that can be embedded directly in an LLM prompt.
For vision models, the `.images` property provides the images as base64-encoded data URLs.

In [7]:
# Get the string representation for text-based analysis or context
llm_context_string = str(attachments_obj)
print("--- LLM Context String (sample) ---")
# Print a sample, as it can be very long
print((llm_context_string[:500] + "...") if len(llm_context_string) > 500 else llm_context_string)

--- LLM Context String (sample) ---
<?xml version="1.0" ?>
<attachments>
  <attachment id="contact_sheet1" type="jpeg" original_path="[auto-generated contact sheet for https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf]">
    <content/>
  </attachment>
  <attachment id="pdf1" type="pdf" original_path="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf">
    <content>Dummy PDF file

</content>
  </attachment>
  <attachment id="jpeg2" type="jpeg" original_path="https://upload.wikimedia.org/w...
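
Because the context string is XML-like, it can also be inspected programmatically, for example to list which attachments were picked up and how much text each one contributed. The sketch below parses a hand-written sample that mirrors the shape of the output above. It assumes the full string is well-formed XML, which the truncated sample suggests but does not guarantee. `summarize_attachments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modelled on the printed output; not produced by the library here.
sample_context = """<?xml version="1.0" ?>
<attachments>
  <attachment id="contact_sheet1" type="jpeg" original_path="[auto-generated contact sheet]">
    <content/>
  </attachment>
  <attachment id="pdf1" type="pdf" original_path="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf">
    <content>Dummy PDF file</content>
  </attachment>
</attachments>"""


def summarize_attachments(xml_text: str) -> list[dict]:
    """Return one dict per <attachment> with its id, type, and extracted text length."""
    root = ET.fromstring(xml_text)
    summary = []
    for node in root.findall("attachment"):
        content = node.findtext("content", default="") or ""
        summary.append(
            {
                "id": node.get("id"),
                "type": node.get("type"),
                "chars": len(content.strip()),
            }
        )
    return summary


for row in summarize_attachments(sample_context):
    print(row)
```

If parsing the real string raises `xml.etree.ElementTree.ParseError`, treat the representation as plain text instead.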


In [None]:
# Access image data for vision models
# .images will contain a list of data URLs (e.g., "data:image/jpeg;base64,...")
if attachments_obj.images:
    print(f"\n--- Found {len(attachments_obj.images)} image(s) ---")
    # print("First image data URL (sample):", attachments_obj.images[0][:100] + "...") # Print a sample of the data URL
else:
    print("\n--- No images found or processed ---")
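
To check what an entry in `.images` contains without printing the whole string, a data URL can be split into its MIME type and decoded bytes. This is a sketch assuming each entry has the `data:<mime>;base64,<payload>` form described in the comment above. `parse_data_url` is a hypothetical helper, and the demo payload is a tiny stand-in rather than a real image.

```python
import base64


def parse_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a 'data:<mime>;base64,<payload>' string")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload, validate=True)


# Tiny stand-in payload; a real entry from .images would be far longer.
demo_bytes = b"\xff\xd8\xff\xe0demo"
demo_url = "data:image/jpeg;base64," + base64.b64encode(demo_bytes).decode("ascii")

mime, raw = parse_data_url(demo_url)
print(mime, len(raw), "bytes")
```

The decoded length is a quick way to see how large each image is before it is sent to the API.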

## 4. Preparing Content for OpenAI API

Let's construct a message for the OpenAI API. We'll demonstrate a multimodal example
using GPT-4o (or another vision-capable model).

In [8]:
client = OpenAI() # Assumes OPENAI_API_KEY is set in your environment via .env or system variable

### 4.1. Multimodal Prompt (Text and Images)

We'll combine the textual context from `str(attachments_obj)` with any images found.

In [9]:
# Prepare the content list for the OpenAI API
openai_messages_content = []

# Add text part: a general instruction and the context from attachments_obj
prompt_text = f'''
Analyze the following documents and images. Provide a brief summary of the PDF content
and describe the image.

Document context:
{llm_context_string}
'''
openai_messages_content.append({"type": "text", "text": prompt_text})

# Add image parts
for image_data_url in attachments_obj.images:
    # For base64-encoded images, the API expects a data URL such as "data:image/jpeg;base64,..."
    openai_messages_content.append({
        "type": "image_url",
        "image_url": {
            "url": image_data_url,
            "detail": "low" # Use "high" for more detail, "low" for faster processing
        }
    })
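
The same steps can be wrapped in a small function so the content list is easy to rebuild with a different prompt or detail level. This is a sketch that reproduces the structure built above. `build_multimodal_content` is a hypothetical helper, not part of `attachments` or `openai`, and the demo data URL is a placeholder.

```python
def build_multimodal_content(prompt_text, image_data_urls, detail="low"):
    """Return one text part followed by one image_url part per data URL."""
    if detail not in {"low", "high", "auto"}:
        raise ValueError(f"unsupported detail level: {detail!r}")
    parts = [{"type": "text", "text": prompt_text}]
    for url in image_data_urls:
        parts.append(
            {"type": "image_url", "image_url": {"url": url, "detail": detail}}
        )
    return parts


# Placeholder data URL; in the notebook you would pass attachments_obj.images.
demo_parts = build_multimodal_content(
    "Describe the image.",
    ["data:image/jpeg;base64,AAAA"],
    detail="high",
)
print(len(demo_parts), [part["type"] for part in demo_parts])
```

In the notebook, `build_multimodal_content(prompt_text, attachments_obj.images)` would yield the same list as the loop above.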

### 4.2. Making the API Call (Example)

Now, let's construct the full message and show how you would make the API call.
**Note:** Running this cell will make an API call to OpenAI if your API key is configured.

In [10]:
if not os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY not found in environment variables. Skipping API call.")
    print("Please create a .env file with OPENAI_API_KEY='your_key_here' or set it as an environment variable.")
else:
    print("Attempting to call OpenAI API (multimodal)...")
    try:
        response = client.chat.completions.create(
            model="gpt-4o", # Or your preferred vision-capable model like "gpt-4-turbo"
            messages=[
                {
                    "role": "user",
                    "content": openai_messages_content
                }
            ],
            max_tokens=500
        )
        print("\n--- OpenAI API Response (Multimodal) ---")
        print(response.choices[0].message.content)
    except Exception as e:
        print(f"Error calling OpenAI API (multimodal): {e}")

Attempting to call OpenAI API (multimodal)...

--- OpenAI API Response (Multimodal) ---
## Summary of PDF Content

The PDF document labeled as a "Dummy PDF file" appears to be a placeholder or sample PDF with minimal content. It is likely used for testing purposes rather than containing substantial information or data.

## Description of the Image

The image depicts a Zen garden, which features neatly raked gravel or sand creating a pattern of parallel lines. This style is typical of traditional Japanese dry landscape gardens, emphasizing simplicity and minimalism. It evokes a sense of tranquility and meditation.


## 5. Text-Only Analysis

With a text-only model (e.g., `gpt-3.5-turbo`), include only `llm_context_string` in the prompt and pass `content` as a plain string rather than a list of parts; no image parts are sent.

In [11]:
# Example for a text-only model
text_only_prompt = f'''
Based on the following document content, please answer specific questions or perform tasks.
For example, what is the main subject of the PDF?

Document context:
{llm_context_string}
'''

if not os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY not found. Skipping text-only API call.")
else:
    print("\nAttempting text-only OpenAI API call...")
    try:
        response_text_only = client.chat.completions.create(
            model="gpt-3.5-turbo", # Or your preferred text model
            messages=[
                {
                    "role": "user",
                    "content": text_only_prompt
                }
            ],
            max_tokens=300
        )
        print("\n--- OpenAI Text-Only API Response ---")
        print(response_text_only.choices[0].message.content)
    except Exception as e:
        print(f"Error calling OpenAI API (text-only): {e}")


Attempting text-only OpenAI API call...

--- OpenAI Text-Only API Response ---
The main subject of the PDF is a "Dummy PDF file" as mentioned in the content of the attachment with id "pdf1".
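
The multimodal and text-only cells follow the same pattern: one user message, a `max_tokens` limit, and a broad `try`/`except`. A sketch of a shared helper is shown below. It takes the client as a parameter so it can be exercised with a stand-in object and no network access. `run_chat`, `_FakeCompletions`, and `_FailingCompletions` are hypothetical names; only `client.chat.completions.create` and the response attributes come from the cells above.

```python
from types import SimpleNamespace


def run_chat(client, model, content, max_tokens=300):
    """Send one user message and return the reply text, or None if the call fails."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": content}],
            max_tokens=max_tokens,
        )
    except Exception as exc:  # broad on purpose, matching the cells above
        print(f"Error calling OpenAI API: {exc}")
        return None
    return response.choices[0].message.content


class _FakeCompletions:
    """Stand-in that mimics the response shape without calling the network."""

    def create(self, model, messages, max_tokens):
        reply = SimpleNamespace(
            content=f"[dry run] {model} received {len(messages)} message(s)"
        )
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


class _FailingCompletions:
    """Stand-in that always raises, to exercise the error path."""

    def create(self, model, messages, max_tokens):
        raise RuntimeError("simulated failure")


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
failing_client = SimpleNamespace(chat=SimpleNamespace(completions=_FailingCompletions()))

dry_run_reply = run_chat(fake_client, "gpt-4o", "hello")
failed_reply = run_chat(failing_client, "gpt-4o", "hello")
print(dry_run_reply)
print(failed_reply)
```

With a real `OpenAI()` client, `content` can be either the list of parts from section 4.1 or the plain string from section 5.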


## Conclusion

This tutorial showed how to use the `attachments` library to load local files and URLs,
extract their content into formats suitable for LLMs, and construct prompts
for the OpenAI API for both multimodal and text-only analysis.

Handle your API key securely (e.g., with a `.env` file loaded by `python-dotenv` and kept out of
version control), and keep API costs in check, for example by setting `max_tokens` and using
`"detail": "low"` for images as in the cells above.