In [2]:
!pip3 install -q -U transformers
!pip3 install -q -U torch
!pip3 install -q -U huggingface_hub
!pip3 install -q -U python-dotenv

In [3]:
from huggingface_hub import login
import os
from dotenv import load_dotenv

load_dotenv()  # looks for .env in current dir
hf_token = os.getenv("HF_TOKEN")
login(token=hf_token)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
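The note above indicates that `huggingface_hub` already treats the `HF_TOKEN` environment variable as the active token. If `.env` is missing or the variable is empty, `os.getenv` returns `None`, and `login(token=None)` may fall back to an interactive prompt instead of failing clearly. A minimal fail-fast sketch, assuming the token is stored under `HF_TOKEN`; `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`:

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from `env` (default: os.environ) or raise a clear error.

    Hypothetical helper: it only reads the variable, it does not log in.
    """
    source = os.environ if env is None else env
    # Treat a missing variable and an empty/whitespace value the same way.
    token = (source.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add HF_TOKEN=... to .env or export it before starting Jupyter."
        )
    return token
```

In the cell above it would slot in as `login(token=resolve_hf_token())` right after `load_dotenv()`, so a missing token stops the notebook at the login cell rather than at model download.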


In [9]:
import torch
from transformers import pipeline, AutoProcessor

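device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
dtype  = torch.bfloat16  # bf16 support on MPS varies by PyTorch/macOS version; switch to torch.float32 if you see errors or NaN outputs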

model_id = "google/gemma-3-4b-it"

processor = AutoProcessor.from_pretrained(model_id, use_fast=True)
pipe = pipeline(
    task="image-text-to-text",
    model=model_id,
    processor=processor,          # reuse the fast processor loaded above (avoids the slow-processor warning)
    torch_dtype=dtype,
    device=device                 # "cpu", "mps" or "cuda" are all accepted by recent Transformers releases
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 10.04it/s]
Device set to use mps
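The cell above uses bfloat16 for whichever device is selected. One way to make the device/dtype choice explicit and easy to change is a small pure function. The fallback policy here (bfloat16 on CUDA or MPS, float32 on CPU) is an assumption chosen as a conservative default, not a Transformers recommendation, and `pick_device_and_dtype` is a hypothetical helper. It returns the dtype as a string so the logic can be checked without importing torch:

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool, mps_available: bool) -> Tuple[str, str]:
    """Choose a device string and a torch dtype *name* from availability flags.

    Assumed policy: bfloat16 on an accelerator, float32 on CPU.
    """
    if cuda_available:
        return "cuda", "bfloat16"
    if mps_available:
        # If bf16 misbehaves on your MPS setup, returning "float32" here is the simple fallback.
        return "mps", "bfloat16"
    return "cpu", "float32"
```

In the notebook, `device, dtype_name = pick_device_and_dtype(torch.cuda.is_available(), torch.backends.mps.is_available())` followed by `dtype = getattr(torch, dtype_name)` yields values that can be passed to `pipeline(...)` unchanged.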


In [10]:
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": (
                "You are a potential customer looking at Google review images. "
                # "Always respond in two clearly labeled sections:\n"
                # "Answer: <short, factual yes/no/N/A>\n"
                # "Rationale: <explain based on visible features>"
            )}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://lh5.googleusercontent.com/p/AF1QipMBzN4BJV9YCObcw_ifNzFPm-u38hO3oimOA8Fb=w150-h150-k-no-p"},
            {"type": "text", "text": (
                "Describe the image in detail."
            )}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])

Okay, here's a detailed description of the image I'm seeing:

**Overall Impression:** The photo appears to be taken inside a modern, somewhat minimalist business space – likely a clinic, office, or perhaps a boutique. It has a clean and professional feel.

**Key Elements:**

*   **Ceiling:** The dominant feature is the ceiling. It’s a light gray color with a slightly textured, possibly acoustic, finish. There’s a noticeable slope or angle to it, giving the room a modern, architectural feel. There are recessed lighting fixtures evenly spaced across the ceiling.
*   **Pendant
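The reply stops mid-word, which is consistent with generation hitting `max_new_tokens=128`. The commented-out lines in the system prompt point to a shorter, structured format instead. Below is a sketch of that round trip, assuming the model actually follows the `Answer:`/`Rationale:` labels (it is not guaranteed to). `build_review_messages` rebuilds the same message structure as the cell above with the format instruction switched on, and `parse_labeled_response` splits the reply, tolerating markdown bold around the labels. Both names are hypothetical:

```python
import re
from typing import Any, Optional

# Format instruction taken from the commented-out lines of the system prompt.
FORMAT_INSTRUCTION = (
    "Always respond in two clearly labeled sections:\n"
    "Answer: <short, factual yes/no/N/A>\n"
    "Rationale: <explain based on visible features>"
)


def build_review_messages(image_url: str, question: str, structured: bool = True) -> list[dict[str, Any]]:
    """Build the chat messages for pipe(text=...) with one image and one question."""
    system_text = "You are a potential customer looking at Google review images. "
    if structured:
        system_text += FORMAT_INSTRUCTION
    return [
        {"role": "system", "content": [{"type": "text", "text": system_text}]},
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        },
    ]


# Optional "**" around each label so "**Answer:** Yes" parses like "Answer: Yes".
_SECTION_RE = re.compile(
    r"\**\s*Answer\s*:\s*\**\s*(?P<answer>.*?)\s*\**\s*Rationale\s*:\s*\**\s*(?P<rationale>.*)",
    re.IGNORECASE | re.DOTALL,
)


def parse_labeled_response(text: str) -> Optional[dict[str, str]]:
    """Split a reply into Answer/Rationale; return None if the labels are missing."""
    match = _SECTION_RE.search(text)
    if match is None:
        return None
    return {
        "answer": match.group("answer").strip(),
        "rationale": match.group("rationale").strip(),
    }
```

With the notebook's `pipe`, a call might look like `reply = pipe(text=build_review_messages(url, question), max_new_tokens=128)[0]["generated_text"][-1]["content"]`, followed by `parse_labeled_response(reply)`. A `None` result signals that the model ignored the format and the raw text needs manual review.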
