<table align="left"><td><a target="_blank" href="https://beam.apache.org/documentation/io/built-in/webapis/"><img src="https://beam.apache.org/images/logos/full-color/name-bottom/beam-logo-full-color-name-bottom-100.png" width="32" height="32" />View the docs</a></td></table>

`ImageRequest` is the custom request we pass to the `HttpImageClient` to invoke the HTTP call
that acquires the image.

`ImageResponse` is the custom response we return from the `HttpImageClient` that contains the image data
as a result of calling the remote server with the image URL.


In [None]:
from collections import namedtuple

class ImageRequest:
    image_url_to_mime_type = {
        "jpg": "image/jpeg",
        "jpeg": "image/jpeg",
        "png": "image/png",
    }

    def __init__(self, image_url):
        self.image_url = image_url
        self.mime_type = self.image_url_to_mime_type.get(image_url.split(".")[-1])

ImageResponse = namedtuple("ImageResponse", ["mime_type", "data"])    
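
The extension lookup in `ImageRequest` splits the whole URL on `.`, so a URL that carries a query string or an upper-case extension maps to `None`. The sketch below shows one possible alternative that parses the URL path first. `mime_type_for` and `EXTENSION_TO_MIME_TYPE` are hypothetical names introduced here for illustration; they are not part of Beam or of the example above.

```python
import posixpath
from urllib.parse import urlparse

# Same extension-to-MIME mapping as ImageRequest above.
EXTENSION_TO_MIME_TYPE = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
}


def mime_type_for(image_url):
    """Hypothetical helper: look up the MIME type from the URL path only."""
    path = urlparse(image_url).path          # drops "?query" and "#fragment"
    extension = posixpath.splitext(path)[1]  # e.g. ".JPG", or "" when absent
    return EXTENSION_TO_MIME_TYPE.get(extension.lstrip(".").lower())
```

With this helper, `mime_type_for("https://example.com/images/cake.JPG?size=large")` resolves to `"image/jpeg"`, while unknown or missing extensions still yield `None`.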

#### Define Caller

We implement the `Caller`, the `HttpImageClient`, that receives an `ImageRequest` and returns an `ImageResponse`.

_For demo purposes, the example passes a `(url, request)` tuple through the `Caller` so that the raw URL is preserved alongside the returned `ImageResponse`._

`RequestResponseIO` retries a failed call only when the `Caller` raises specific errors.
For those errors, the transform retries using a prescribed exponential backoff before it finally raises an exception.
Your `Caller` must therefore raise one of the errors listed below to signal that the call should be retried with backoff.

`RequestResponseIO` will attempt a retry with backoff when the `Caller` raises:
* `UserCodeQuotaException`
* `UserCodeTimeoutException`

After a threshold number of retries, the error is re-raised.
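
The retry behavior described above can be sketched in plain Python. This is a minimal illustration of retry with exponential backoff, not Beam's actual implementation: `call_with_backoff`, the stand-in exception classes, the attempt limit, and the delay values are all assumptions made for this example.

```python
import time


class StandInQuotaError(Exception):
    """Hypothetical stand-in for UserCodeQuotaException."""


class StandInTimeoutError(Exception):
    """Hypothetical stand-in for UserCodeTimeoutException."""


def call_with_backoff(caller, request, max_attempts=4, initial_delay=0.5,
                      sleep=time.sleep):
    """Call `caller(request)`, retrying only the two retryable error types."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return caller(request)
        except (StandInQuotaError, StandInTimeoutError):
            if attempt == max_attempts:
                raise  # threshold reached: re-raise the last error
            sleep(delay)
            delay *= 2  # double the wait before the next attempt


# Usage: a caller that fails twice with a retryable error, then succeeds.
attempts = []
waits = []


def flaky_caller(request):
    attempts.append(request)
    if len(attempts) < 3:
        raise StandInQuotaError()
    return f"ok:{request}"


# Pass waits.append as `sleep` to record the delays instead of pausing.
result = call_with_backoff(flaky_caller, "img", sleep=waits.append)
```

Any other exception type propagates on the first attempt without a retry, which mirrors why the `Caller` has to raise the specific retryable errors.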




In [None]:
import requests
from apache_beam.io.requestresponse import (
    Caller,
    UserCodeExecutionException,
    UserCodeQuotaException,
    UserCodeTimeoutException,
)


class HttpImageClient(Caller):
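    STATUS_TOO_MANY_REQUESTS = 429
    STATUS_TIMEOUT = 408
    # Illustrative value; without a timeout, requests.get can block indefinitely.
    REQUEST_TIMEOUT_SECONDS = 30

    def __call__(self, kv):
        url, request = kv
        try:
            response = requests.get(
                request.image_url, timeout=self.REQUEST_TIMEOUT_SECONDS
            )
        except requests.exceptions.Timeout as e:
            raise UserCodeTimeoutException() from e
        except requests.exceptions.RequestException as e:
            # Covers connection failures and other request errors (not retried).
            raise UserCodeExecutionException() from e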

        if response.status_code >= 500:
            raise UserCodeExecutionException()

        if response.status_code >= 400:
            match response.status_code:
                case self.STATUS_TOO_MANY_REQUESTS:
                    raise UserCodeQuotaException()
                case self.STATUS_TIMEOUT:
                    raise UserCodeTimeoutException()
                case _:
                    raise UserCodeExecutionException()

        return url, ImageResponse(request.mime_type, response.content)

In [None]:
images = [
    "https://storage.googleapis.com/generativeai-downloads/images/cake.jpg",
    "https://storage.googleapis.com/generativeai-downloads/images/chocolate.png",
    "https://storage.googleapis.com/generativeai-downloads/images/croissant.jpg",
    "https://storage.googleapis.com/generativeai-downloads/images/dog_form.jpg",
    "https://storage.googleapis.com/generativeai-downloads/images/factory.png",
    "https://storage.googleapis.com/generativeai-downloads/images/scones.jpg",
]

In [None]:
import apache_beam as beam
from apache_beam.io.requestresponse import (
    RequestResponseIO,
)
from apache_beam.options.pipeline_options import PipelineOptions


def build_image_request(image_url):
    return image_url, ImageRequest(image_url)

with beam.Pipeline(options=PipelineOptions(pickle_library="cloudpickle")) as pipeline:
    _ = (
        pipeline
        | "Create data" >> beam.Create(images)
        | "Map to ImageRequest" >> beam.Map(build_image_request)
        | "Download image" >> RequestResponseIO(HttpImageClient())
        | "Print results"
        >> beam.MapTuple(
            lambda url, response: print(
                f"{url}, mimeType={response.mime_type}, size={len(response.data)}"
            )
        )
    )


The previous example invoked HTTP requests directly. However, some API services provide
client code that should be used within the `Caller` implementation. Using client code within Beam presents
unique challenges, namely serialization. Additionally, some client code requires explicit
setup and teardown.

`RequestResponseIO` can handle such setup and teardown scenarios when the `Caller` overrides the context manager dunder methods
`__enter__` and `__exit__`.
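
The setup and teardown pattern can be sketched without Beam. In the example below, `FakeClient` and `ClientCaller` are hypothetical names, and the `with` statement only stands in for whatever lifecycle the transform drives. The point is that the object holds nothing but plain configuration until `__enter__` creates the client, and `__exit__` releases it.

```python
class FakeClient:
    """Hypothetical stand-in for a third-party API client."""

    def __init__(self):
        self.closed = False

    def caption(self, data):
        return f"caption for {len(data)} bytes"

    def close(self):
        self.closed = True


class ClientCaller:
    """Holds only plain configuration until __enter__ runs."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.client = None  # created lazily, so it is never serialized

    def __enter__(self):
        self.client = FakeClient()  # setup: build the client
        return self

    def __call__(self, data):
        return self.client.caption(data)

    def __exit__(self, exc_type, exc_value, traceback):
        self.client.close()  # teardown: release the client
        self.client = None
        return None  # do not suppress exceptions


caller = ClientCaller("hypothetical-key")
with caller as active:
    client = active.client
    result = active(b"abc")
```

After the `with` block, `client.closed` is `True` and `caller.client` is `None` again, so the instance is back to holding only its configuration.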


In [None]:
import os

from google import genai
from google.genai import types
from google.genai.errors import APIError

API_KEY = "<your api key>"

class GeminiAIClient(Caller):
    MODEL_GEMINI_FLASH_LITE = "gemini-2.0-flash-lite"

    def __init__(self, api_key):
        self.api_key = api_key

    def __enter__(self):
        self.client = genai.Client(api_key=self.api_key)
        return self

    def __call__(self, kv):
        url, request = kv
        try:
            response = self.client.models.generate_content(
                model=self.MODEL_GEMINI_FLASH_LITE,
                contents=[
                    types.Part.from_bytes(
                        data=request.data,
                        mime_type=request.mime_type,
                    ),
                    "Caption this image.",
                ],
            )
        except APIError as e:
            raise UserCodeExecutionException() from e

        return url, response

In [None]:
with beam.Pipeline(options=PipelineOptions(pickle_library="cloudpickle")) as pipeline:
    _ = (
        pipeline
        | "Create data" >> beam.Create(images)
        | "Map to ImageRequest" >> beam.Map(build_image_request)
        | "Download image" >> RequestResponseIO(HttpImageClient())
        | "Gemini AI" >> RequestResponseIO(GeminiAIClient(API_KEY))
        | "Print results"
        >> beam.MapTuple(lambda url, response: print(url, response.text))
    )
