## 1. Import Necessary Libraries

In [1]:
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch
import os
from tqdm.auto import tqdm
import pandas as pd

## 2. Load the Model and Processor

This block handles the setup of the core components: the pre-trained model and its associated processor.

* **`model_id`**: We specify the identifier for the model we want to use from the Hugging Face Hub, which in this case is `"google/gemma-3n-e2b-it"`.
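* **Model Loading**: The `Gemma3nForConditionalGeneration.from_pretrained()` method downloads the model on first use and initializes it. The chained `.cuda()` call moves the weights to the GPU, and `.eval()` switches the model to evaluation mode.
    * `torch_dtype=torch.bfloat16`: This loads the model's weights in `bfloat16` precision, which uses 16 bits per value instead of the 32 bits of `float32`. That halves the memory needed for the weights and can speed up computation on GPUs that support the format.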
* **Processor Loading**: `AutoProcessor.from_pretrained()` automatically fetches the correct processor for the specified model. The processor is responsible for converting our raw inputs (like text and images) into the numerical tensors the model requires.
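
To make the memory effect of `bfloat16` concrete, here is a minimal back-of-the-envelope sketch. The helper `weight_memory_gib` and the parameter count are illustrative assumptions, not values taken from the model card, and the estimate covers weights only (no activations or cache).

```python
# Bytes used to store one value in each floating-point format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Rough memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Hypothetical round parameter count, used only for illustration.
example_params = 4 * 2**30

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(example_params, dtype):.1f} GiB")
```

Whatever the true parameter count is, the ratio between the two formats stays at 2:1.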

In [2]:
# Define the identifier for the pre-trained model on the Hugging Face Hub.
model_id = "google/gemma-3n-e2b-it"
# Load the pre-trained model in bfloat16, move it to the GPU, and switch to evaluation mode.
model = Gemma3nForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda().eval()
# Load the appropriate processor associated with the model.
processor = AutoProcessor.from_pretrained(model_id)



## 3. Automated Image Processing and Inference

This part automates the process of running inference on a directory of images. It iterates through each image and asks a series of predefined questions, then stores the model's responses.

### 3.1. Initialization and Setup
First, we initialize the necessary variables:
* **`results_df` & `data`**: A pandas DataFrame and a list are created to store the final results. We append each result to the `data` list first, because appending to a list is much cheaper than growing a DataFrame row by row, and convert the list to a DataFrame later.
* **File Paths**: The script defines the path to the folder containing the images and lists all files within it.
* **Prompt Generation**: A list of `prompts` is dynamically created. It starts with one general question ("What do you see on the table?") and then programmatically adds specific questions for each item in the `objects` list to test the model's object recognition capabilities.
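
Because `os.listdir` returns every entry in the folder, a stray non-image file would make `Image.open` fail partway through the run. A minimal sketch of one way to guard against that is shown below; the helper `list_image_files` and the extension set are assumptions for illustration, not part of the original notebook.

```python
import os

# Assumed set of extensions; adjust it to match the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_image_files(folder_path):
    """Return the sorted names of regular files that look like images."""
    return sorted(
        name
        for name in os.listdir(folder_path)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        and os.path.isfile(os.path.join(folder_path, name))
    )
```

Sorting also makes the processing order reproducible, since `os.listdir` does not guarantee any particular order.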

### 3.2. Main Inference Loop
The code uses a nested loop structure:
1.  **Outer Loop**: Iterates through every `file` in the image folder, wrapped in `tqdm` to show a progress bar.
2.  **Inner Loop**: For each image, it iterates through every `prompt` in our generated list.
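
The nested loop visits every image-prompt combination. As a minimal sketch, the same pairs can be produced in the same order with `itertools.product`, which would let a single progress bar track all generations. The helper name `image_prompt_pairs` is hypothetical and the file and prompt values are placeholders.

```python
from itertools import product


def image_prompt_pairs(files, prompts):
    """All (file, prompt) pairs, files varying slowest, like the nested loop."""
    return list(product(files, prompts))


# Placeholder inputs for illustration only.
pairs = image_prompt_pairs(["a.png", "b.png"], ["q1", "q2", "q3"])
for file, prompt in pairs:
    print(file, prompt)
```

With the 22 files and 13 prompts used in this notebook, the loop amounts to 286 calls to `model.generate`.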

### 3.3. Preparing Model Inputs
Inside the loop, for each image-prompt pair, we construct the input for the model in a specific conversational format:
* **`messages`**: A list of dictionaries that defines the conversation history.
    * **`role: "system"`**: This sets the model's persona and gives it high-level instructions, like ignoring the mechanical hand in the images.
    * **`role: "user"`**: This contains the user's input, which is multi-modal (containing both an image and text). The image is opened with PIL, and the corresponding text prompt is included.
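
The message structure described above can be sketched as a small helper. The function name `build_messages` and its parameters are illustrative assumptions; the notebook itself builds the list inline inside the loop.

```python
def build_messages(image, prompt, system_text):
    """Build a two-turn chat payload: a system instruction, then image + text."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_text}],
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},  # e.g. a PIL image
                {"type": "text", "text": prompt},
            ],
        },
    ]
```

Keeping the payload in one function makes it easy to change the system instruction for all prompts at once.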

### 3.4. Generating and Decoding the Response
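* **`processor.apply_chat_template`**: This method formats the `messages` list with the model's chat template, tokenizes the text, preprocesses the image, and returns PyTorch tensors (`return_tensors="pt"`). Setting `add_generation_prompt=True` appends the marker that tells the model its own turn starts next.
* **`model.generate`**: This is the core inference step, where the model generates a response to the input. We use `do_sample=False`, which selects greedy decoding and makes the output deterministic, and `max_new_tokens=256`, which caps the length of each response. The call runs inside `torch.inference_mode()` so that no gradients are tracked.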
* **Decoding**: The generated output (which is a sequence of token IDs) is sliced to isolate only the new tokens and then decoded back into a human-readable string using `processor.decode`.
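
The slicing step can be illustrated with plain Python lists. This is a toy sketch: the token IDs are made up, and the real code slices a PyTorch tensor in the same way.

```python
def new_tokens_only(full_sequence, input_len):
    """Drop the prompt tokens and keep only what the model generated."""
    return full_sequence[input_len:]


# Made-up token IDs for illustration only.
prompt_ids = [2, 105, 9731, 107]
# The generated sequence starts with the prompt, followed by the new tokens.
generated = prompt_ids + [3553, 236761]

reply_ids = new_tokens_only(generated, len(prompt_ids))
print(reply_ids)  # [3553, 236761]
```

Without the slice, the decoded string would repeat the prompt text before the answer.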

### 3.5. Storing Results
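The file name (without its extension), the `prompt`, the model's `response`, and an empty `Ground Truth` field are stored in a dictionary and appended to the `data` list for later analysis. The `Ground Truth` field is left blank so that the correct answers can be filled in afterwards.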
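
For later analysis, the free-text responses to the Yes/No prompts need to be normalized, because the model sometimes adds a sentence after the answer (for example `Yes. There is a blue tube of what appears to be glue on the desk.`). The following is a minimal sketch of such a normalizer; the function `parse_yes_no` is a hypothetical helper, not part of the notebook.

```python
import re

# Match a leading "yes" or "no" as a whole word, ignoring case and whitespace.
_YES_NO = re.compile(r"\s*(yes|no)\b", re.IGNORECASE)


def parse_yes_no(response):
    """Return True for a leading 'Yes', False for a leading 'No', else None."""
    match = _YES_NO.match(response)
    if match is None:
        return None
    return match.group(1).lower() == "yes"
```

Returning `None` for anything else keeps unexpected answers visible instead of silently counting them as "No".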

In [3]:
# Create a DataFrame to store the results
results_df = pd.DataFrame(columns=["filename", "prompt", "response", "Ground Truth"])
# Initialize an empty list to temporarily hold the results before adding them to the DataFrame.
data = []
# Define the path to the folder containing the images to be processed.
folder_path = "Images"
files = os.listdir(folder_path)

# Define a list of objects to specifically ask the model about.
objects = ["cube", "controller", "tape", "camera", "keyboard", "scissors", "glue", "screwdriver", "pencil", "tablet bottle", "Caliper", "glasses"]

# Initialize a list of prompts with a general question.
prompts = ["What do you see on the table?"]
# Dynamically create and add specific "Yes/No" questions for each object.
for obj in objects:
    prompts.append(f"I can’t find {obj} do you see it? Answer with Yes or No.")

# Outer loop: one pass per image file; the inner loop asks every prompt about that image.
for file in tqdm(files):
    for prompt in prompts:
        print(f"Processing {file} with prompt: {prompt}")

        # Construct the message payload for the model in a conversational format.
        messages = [
            {
                "role": "system",
                "content": [{"type": "text", "text": "You are a helpful assistant for a user operating a mechanical hand. Answer questions clearly and provide relevant information based on user needs. If a mechanical hand is visible in the image, ignore it and focus on describing or analyzing other parts of the image as requested."}]
            },
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": Image.open(folder_path+"/"+file).convert("RGB")},
                    {"type": "text", "text": prompt}
                ]
            }
        ]
        # Use the processor to apply the chat template, tokenize, and convert to PyTorch tensors.
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True, 
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device, dtype=torch.bfloat16)

        # Get the length of the input tokens to later separate it from the generated output.
        input_len = inputs["input_ids"].shape[-1]

        with torch.inference_mode():
            # Generate a response from the model.
            generation = model.generate(**inputs, max_new_tokens=256, do_sample=False)
            # Slice the output tensor to get only the newly generated tokens.
            generation = generation[0][input_len:]
        # Decode the generated token IDs back into a human-readable string.
        decoded = processor.decode(generation, skip_special_tokens=True)

        # Append the results of this run to our data list.
        data.append({"filename": os.path.splitext(file)[0], "prompt": prompt, "response": decoded, "Ground Truth": ""})
        print(decoded)

  0%|          | 0/22 [00:00<?, ?it/s]

Processing 00.png with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a light-colored wooden surface with a blue and white checkered pattern underneath. There is a small, green cube on the table, and a black rectangular container with some contents inside it.
Processing 00.png with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing 00.png with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing 00.png with prompt: I can’t find glasses do you see it? Answer with Yes or No.
No.
Processing IMG_20250827_122347.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a few items:

* **A white wooden table:** It has a light wood grain.
* **A black computer monitor:** It's positioned towards the top center of the table.
* **A black keyboard:** Located to the left of the monitor.
* **A white box:** It appears to contain a black object, possibly a controller or a device. It's placed towards the left side of the table.
* **A white roll of tape:** Situated near the box.
* **A small silver device:** This looks like a depth camera, possibly an Intel RealSense camera, placed on a piece of white paper with QR codes.
* **Several pieces of white paper:** These papers have black QR codes printed on them, some with numbers like "0" and "3" visible.
* **A black gamepad controller:** It's lying on the table to the left of the depth camera.
* **A white transparent film:** This film is placed over some of the QR codes.
* **A portion of a white robotic arm:** The arm is visible on the right side of the table, with a silver and white end effector.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122347.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122347.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122347.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122347.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes. I see a roll of white tape on the left side of the desk.
Processing IMG_20250827_122347.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122347.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a light-colored wooden surface with several items arranged on it. 

There's a black Xbox controller plugged into a cable. 

Scattered across the table are several white pieces of paper, each with a black QR code printed on it. The QR codes appear to be labeled with numbers from 0 to 8. 

In the center of the table, there's a small, silver device with a camera lens. 

Towards the top left, there's a white roll of tape. 

In the top center, there's a partially visible white box with the Xbox logo on it. 

To the right of the QR codes, there's a black computer monitor and a white cable running from it. 

Finally, on the far right, there's a black mouse.
Processing IMG_20250827_122355.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122355.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122355.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122355.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122355.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


I see a wooden table with various items on it. Here's a breakdown of what's visible:

* **A black laptop:** It's positioned towards the top center of the table.
* **A black keyboard:** It's placed to the right of the laptop.
* **A black gaming controller:** It's located to the left of the keyboard.
* **A small silver device:** It's situated on a piece of white paper in the center of the table.
* **A blue tube of adhesive:** It's near the keyboard.
* **A roll of white tape:** It's on the left side of the table.
* **A pair of red and black scissors:** They are positioned between the gaming controller and the silver device.
* **Several pieces of white paper with QR codes:** These are scattered across the table.
* **A clear plastic sheet:** It's placed near the scissors and a QR code on a piece of paper.
* **A white and silver robotic arm with "Intel RealSense" branding:** It's in the foreground, with its end effector holding a QR code.
* **A black mouse:** It's partially visible on the ri

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122437.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122437.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122437.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122437.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes. There is a blue tube of what appears to be glue on the desk.
Processing IMG_20250827_122437.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122437.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122437.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


I see a wooden table with several items on it. Here's a breakdown:

* **A black keyboard:** It's a Logitech brand keyboard with a yellow accent.
* **A black trackball mouse:** Located to the left of the keyboard.
* **A white and black 3D scanner:** Positioned on a piece of white paper with QR codes.
* **A small blue tube:** Likely containing adhesive or a similar substance.
* **A pair of red and black scissors:** Placed near the keyboard.
* **A clear plastic stand:** Holding a white piece of paper with a QR code and the number "18" printed on it.
* **Several white papers with QR codes:** These are scattered across the table, some with numbers like "18" and "8" printed on them.
* **A black monitor:** Partially visible in the background.
* **A white box:** Possibly containing accessories for the 3D scanner.
* **A roll of white tape:** Located near the box.
* **A black object:** Possibly a cable or another accessory, partially visible on the right side of the table.
* **A white cable:** R

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122445.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122445.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122454.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


I see a wooden table with several items on it. Here's a breakdown:

* **A black Logitech keyboard and trackpad:** This is positioned diagonally across the table.
* **A black Xbox controller:** It's on the left side of the table, connected by a cable.
* **A small, silver and black device:** This appears to be a small camera or sensor, placed on a piece of white paper.
* **A red and black tool:** It's lying on the white paper next to the silver and black device.
* **A blue tube of adhesive:** It's near the silver and black device and the red and black tool.
* **Several pieces of white paper with black QR codes:** These are scattered around the table.
* **A pair of red and black scissors:** They are on the left side of the table, near the keyboard.
* **A roll of white tape:** It's on the far left side of the table.
* **A black mousepad:** It's on the far right side of the table.
* **A black laptop:** It's in the background, slightly towards the center.
* **Various cables:** Several cables

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122454.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes. There is a blue tube of what appears to be glue on the desk.
Processing IMG_20250827_122454.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_122454.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122454.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122506.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a variety of items including:

* **A white wooden desk** with a light wood grain.
* **A black keyboard** positioned in the center.
* **A black mouse** to the right of the keyboard.
* **A white box** with a black image of a hand on the top left.
* **A black controller** (likely a gaming controller) on the left side.
* **A pair of red and black scissors** in the middle.
* **A small white box** with a blue cap and a red nozzle, possibly a glue stick or similar.
* **Several pieces of paper** with black and white QR codes printed on them. Some are scattered, while others are placed on the table.
* **A white tape roll** on the far left.
* **A black object** (possibly a small device or tool) near the keyboard.
* **A black laptop** partially visible in the upper left corner.
* **Various cables** running across the table.

The overall scene suggests a workspace with some technical or experimental elements.
Processing IMG_20250827_122506.jpg with prompt: I can’t find cube do 

No.
Processing IMG_20250827_122506.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122506.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122506.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122506.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122526.jpg with prompt: What do you see on the table?


I see a wooden table with various items on it. Here's a breakdown of what's visible:

* **A white robotic arm:** A portion of a robotic arm is prominently featured on the right side of the table. It has a white forearm and a gripper mechanism.
* **A black keyboard:** A compact, black keyboard is placed towards the center of the table.
* **A black game controller:** A black game controller with colorful buttons is situated on the left side.
* **A white box:** A white box with black text and a picture of a robotic hand is near the top left.
* **A white circular object:** A white circular object with a black outline is placed near the top center.
* **A small black device:** A small, black device with a screen is located near the center.
* **Scissors:** A pair of red-handled scissors is positioned near the black device.
* **A red screwdriver:** A small red screwdriver is placed near the scissors.
* **A blue tube:** A blue tube with white text is near the right side of the table.
* **QR cod

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122526.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes. There is a roll of blue tape on the right side of the desk.
Processing IMG_20250827_122526.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122526.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122526.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122526.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122533.jpg with prompt: What do you see on the table?


On the table, I see a variety of items including:

* **A white wooden desk.**
* **A white robotic arm** with a gripper holding a small white card with a QR code.
* **A black gamepad controller** with buttons and joysticks.
* **A small black and white camera** with a white cable.
* **A black laptop** with a keyboard.
* **A red screwdriver.**
* **A pair of red and black scissors.**
* **Several white cards** with black QR codes. One card has the number "0" printed on it.
* **A white box** with a black and white image on it, likely containing a product.
* **Some papers** scattered on the table.
* **A black cable** running from the gamepad controller.

The overall scene appears to be a workspace with some technology and tools present.
Processing IMG_20250827_122533.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122533.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes. There is a roll of tape on the desk, near the keyboard.
Processing IMG_20250827_122533.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122533.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122533.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122533.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122533.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122536.jpg with prompt: What do you see on the table?


I see a wooden table with various items on it. Here's a breakdown:

* **A black keyboard:** It appears to be a wireless keyboard.
* **A black controller:** This looks like a gaming controller, possibly for a console or PC.
* **A small white device with a camera:** This could be a depth camera or a similar sensor.
* **Scissors:** A pair of red-handled scissors.
* **A red screwdriver:** A small screwdriver.
* **A blue tube of adhesive:** Likely double-sided tape.
* **A white box:** It has a black object inside, possibly a small device or accessory.
* **Several QR codes:** There are multiple printed QR codes on pieces of paper and attached to clear plastic stands. Some have the number "18" and others have the number "8" printed on them.
* **A black object with a white cable:** This is likely a cable connected to the white device with the camera.
* **A black circular object:** This could be a mousepad or a similar accessory.
* **A laptop:** Partially visible in the background.
* **Wires:**

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_122536.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes. There is a roll of tape on the table, near the keyboard.
Processing IMG_20250827_122536.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


Yes. There is a blue tube of what appears to be adhesive or glue on the desk.
Processing IMG_20250827_122536.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_122536.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_122536.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: What do you see on the table?


On the table, I see a few items:

* **A PlayStation controller:** It's black and lying on a piece of white paper.
* **A piece of white paper:** It has the word "FLURESH" written on it in red marker. There are also some white lines drawn on the paper.
* **A small cardboard block:** It's brown and has a simple, blocky shape.
* **A pencil:** It's lying horizontally on the table, near the cardboard block.
* **A blue and red circuit board:** It's sitting on the table, near the PlayStation controller and the cardboard block.
* **A robotic arm:** It's positioned above the table, holding a black square with a white "Q" on it. It appears to be part of a robotic system.
* **A grey surface:** The items are placed on a grey table.
* **A blue container:** A portion of a blue container is visible in the upper left corner.
* **A metal object:** A part of a metal object, possibly a machine or tool, is visible in the upper right corner.
Processing IMG_20250827_123431.jpg with prompt: I can’t find cube 

Yes.
Processing IMG_20250827_123431.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123431.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes. There is tape visible in the image. It is a white piece of tape with a black rectangle on it, placed on the paper with the word "FLURESH" written on it.
Processing IMG_20250827_123431.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123431.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123431.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123431.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: What do you see on the table?


On the table, I see a few items:

* **A black PlayStation controller:** It's lying on a piece of white paper with red lettering that appears to say "SUREST".
* **A piece of white paper:** This paper has the red lettering "SUREST" printed on it.
* **A pencil:** A wooden pencil is lying on the table near the controller and the paper.
* **A cardboard box:** A small brown cardboard box is partially visible to the left of the controller.
* **A camera:** A black camera with a lens is positioned above the controller and the box.
* **A gray table:** All these items are placed on a gray table.

In the background, there are some blurred objects that look like chairs, tables, and possibly some equipment.
Processing IMG_20250827_123453.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123453.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes. There is tape visible in the image. It is a white tape with black markings on the table.
Processing IMG_20250827_123453.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123453.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123453.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123453.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123453.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: What do you see on the table?


On the table, I see a few items:

* **A black PlayStation controller:** It's lying on its side, angled towards the upper right corner of the image.
* **A stick of white chalk:** It's positioned horizontally across the center of the table, slightly to the left of the controller.
* **A piece of cardboard:** It's a brown, rectangular piece of cardboard with cut-out shapes, possibly for a model or craft project. It's located in the lower left corner of the image.
* **A piece of white paper:** It's spread out on the table, partially covered by the cardboard and the controller. There's some red writing on the paper that appears to say "JURESH". 

The table itself is a light gray color.
Processing IMG_20250827_123500.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123500.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123500.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123500.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123500.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123500.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123521.jpg with prompt: What do you see on the table?


On the table, I see the following items:

* **A black PlayStation controller:** It's positioned towards the left side of the table.
* **A roll of blue and white electrical tape:** It's located near the center of the table.
* **A small white bottle of Band-Aid:** It's situated to the right of the tape.
* **A pair of scissors:** They are placed near the tape and the controller.
* **A pencil:** It lies horizontally on the table, near the scissors.
* **A piece of paper with the name "SURESH" written on it in red:** This paper is partially visible in the bottom left corner.
* **A gray table surface.**
* **A white surface with some markings or text on it** (partially visible in the background).
* **A black object with a screen** (partially visible in the top left corner).
* **A few chairs and a table in the blurred background.**
Processing IMG_20250827_123521.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123521.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123521.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123521.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123521.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


Yes.
Processing IMG_20250827_123521.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


No.
Processing IMG_20250827_123536.jpg with prompt: What do you see on the table?


On the table, I see the following items:

* **A black PlayStation controller:** It's positioned towards the bottom left of the image, partially overlapping a piece of white paper.
* **Scissors:** Located in the upper left quadrant, they are silver with black handles.
* **A roll of tape:** Situated in the upper right quadrant, it has a white base with blue and yellow tape.
* **A white bottle with a white cap:** Found in the upper right corner, it appears to contain a white substance.
* **A pen:** Lies horizontally in the center of the image, with a light-colored body and a darker tip.
* **A piece of white paper:** Partially visible beneath the PlayStation controller. 

The table itself is a light gray color.
Processing IMG_20250827_123536.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123536.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123536.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123536.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123536.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123536.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123536.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a black PlayStation controller, a roll of blue and tan electrical tape, a pair of black and silver scissors, and a piece of paper with the word "MESH" written in red marker. There's also a pencil with blue and red markings on it, and a partially visible cardboard box. The table surface appears to be gray.
Processing IMG_20250827_123612.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123612.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123612.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123612.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123612.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123612.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see the following items:

* **A black video game controller:** It's positioned in the center of the image.
* **A roll of colorful masking tape:** It's located to the right of the controller.
* **A piece of paper with the word "SURESH" written on it in red:** This paper is placed on a white sheet of paper.
* **A cardboard structure:** It's partially visible on the left side of the image.
* **A pencil:** It's lying on the white sheet of paper near the cardboard structure.
* **Several pieces of white paper:** These are scattered on the table, likely used for crafting or design. 

The table itself appears to be a light gray or beige color.
Processing IMG_20250827_123633.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123633.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123633.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123633.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123633.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123633.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a pair of clear safety glasses with black temple arms. They are resting on a light gray surface. To the left of the glasses, there is a piece of white paper with the name "SURESH" printed in red letters. There are also some pieces of brown cardboard and a wooden stick near the paper. In the background, there is a black robotic arm and some chairs.
Processing IMG_20250827_123740.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123740.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123740.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123740.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123751.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a few items:

* **A piece of white paper** with the name "SURESH" printed in red.
* **A pair of clear safety glasses** with black frames. They are positioned in front of the paper.
* **A small piece of white tape** is visible on the table, partially covering the safety glasses.
* **A portion of a red circuit board** is visible on the left side of the image.
* **A small brown cardboard box** is in the upper left corner.
* **A gray surface** that appears to be the table itself.
Processing IMG_20250827_123751.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123751.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes. The controller is visible on the bottom left of the image. It's a red circuit board with various components on it.
Processing IMG_20250827_123751.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123751.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123751.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a few items:

* **A light blue control panel:** This panel has a touchscreen display and a red emergency stop button. It's labeled "UNIVERSAL ROBOTS".
* **A small wooden block:** This block is sitting on the table, slightly to the right of the control panel.
* **A piece of paper:** This paper is partially visible in the foreground, underneath the wooden block. It appears to have some markings or text on it.
* **A measuring tool:** A silver measuring tool is visible on the left side of the image, partially out of focus. 

The table itself is a light gray color.
Processing IMG_20250827_123844.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123844.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123844.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123844.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123844.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123844.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123844.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a few items:

* **A piece of white paper** with the word "SURESH" written in red marker.
* **A small wooden cube**.
* **Some pieces of white tape** arranged in a somewhat rectangular shape.
* **A small piece of brown cardboard** in the upper left corner.
* **A gray surface** which appears to be the table itself. 

There are also some blurry dark shapes in the upper part of the image, which might be part of the mechanical hand or other objects out of focus.
Processing IMG_20250827_123853.jpg with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123853.jpg with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123853.jpg with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing IMG_20250827_123853.jpg with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing IMG_20250827_123853.jpg with prompt: I can’t find glasses do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: What do you see on the table?


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


On the table, I see a rectangular metal container with a slightly open top. To the right of the container, there is a small, black and white checkered cube. Above the table, a white robotic arm with a gripper is positioned, seemingly looking down at the cube.
Processing output_image.png with prompt: I can’t find cube do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Yes.
Processing output_image.png with prompt: I can’t find controller do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find tape do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find camera do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find keyboard do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find scissors do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find glue do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find screwdriver do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find pencil do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find tablet bottle do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find Caliper do you see it? Answer with Yes or No.


The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


No.
Processing output_image.png with prompt: I can’t find glasses do you see it? Answer with Yes or No.
No.


In [None]:
# Convert the list of results into a DataFrame and save it to an Excel file.
data_df = pd.DataFrame(data)
data_df.to_excel("output.xlsx", index=False)

## 4. Calculating and Summarizing Model Accuracy

This final part analyzes the model's performance once the results have been manually reviewed and annotated.

### 4.1. Manual Annotation Prerequisite
First, the data generated by the previous code is saved to an Excel file. A human then manually reviews each response from the model. Based on the image and the prompt, the **`Ground Truth`** column is filled in:
* **1** is entered if the model's response is correct.
* **0** is entered if the model's response is incorrect.

This manual step is crucial because evaluating the correctness of a generative model often requires human judgment.
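Because the annotations are typed by hand, it can help to check them before computing any metrics. The following is a minimal sketch, not part of the original notebook: the helper name `check_annotations` and the toy rows are hypothetical, and it only assumes that the sheet has a `Ground Truth` column meant to hold `0` or `1`.

```python
import pandas as pd

def check_annotations(df: pd.DataFrame, column: str = "Ground Truth") -> pd.DataFrame:
    """Return the rows whose annotation is missing or is not 0/1 (hypothetical helper)."""
    valid = df[column].isin([0, 1])  # NaN (blank cells) and other values are flagged as invalid
    return df[~valid]

# Toy stand-in for the annotated sheet; the values are illustrative only.
annotated = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "b.jpg"],
    "Ground Truth": [1, None, 2],
})

problems = check_annotations(annotated)
print(problems)  # rows that still need attention from the annotator
```

Running a check like this on the real sheet before the analysis step would surface blank or mistyped cells that would otherwise be silently dropped or miscounted.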

### 4.2. Performance Analysis
The code then reads this annotated Excel file to calculate the accuracy for each image:
1.  **Group Data**: It groups the data by **`filename`**.
2.  **Count Occurrences**: For each image, it counts the number of correct (`1`) and incorrect (`0`) answers using `value_counts()`.
3.  **Restructure Data**: The `.unstack()` method pivots the table, creating separate columns for the counts of `0`s and `1`s, making it easier to work with.
4.  **Calculate Accuracy**: A new **`Accuracy`** column is calculated using the formula: `(Correct Answers / Total Answers) * 100`.
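The four steps above can be traced on a small toy table. This is only an illustrative sketch: the filenames and annotations are made up, and the `reindex` call is an added safeguard (an assumption, not something the steps above describe) so that both the `0` and `1` columns exist even when every annotation has the same value.

```python
import pandas as pd

# Toy annotations: a.jpg has 2 correct and 1 incorrect answer, b.jpg has 1 of each.
toy = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "Ground Truth": [1, 1, 0, 1, 0],
})

# Steps 1-3: group by image, count 0s and 1s, and pivot the counts into columns.
counts = toy.groupby("filename")["Ground Truth"].value_counts().unstack(fill_value=0)
counts = counts.reindex(columns=[0, 1], fill_value=0)  # safeguard for a missing column

# Step 4: accuracy in percent, rounded to two decimals.
accuracy = (counts[1] / (counts[0] + counts[1]) * 100).round(2)
print(accuracy)
```

Here `a.jpg` comes out at 66.67 (two of three answers correct) and `b.jpg` at 50.0.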

In [None]:
import pandas as pd

df = pd.read_excel('output.xlsx')

In [None]:
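# Count incorrect (0) and correct (1) annotations for each image.
result = df.groupby('filename')['Ground Truth'].value_counts().unstack(fill_value=0)
# Ensure both columns exist even when every annotation shares one value;
# otherwise the two-name column assignment below raises a length mismatch.
result = result.reindex(columns=[0, 1], fill_value=0)

result.columns = ['Count_0', 'Count_1']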
result['Accuracy'] = (result['Count_1'] / (result['Count_0'] + result['Count_1']) * 100).round(2)
result = result.reset_index()

print(result)
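
If a single headline number is also wanted, the per-image counts can be pooled into an overall accuracy. The sketch below is an optional addition under stated assumptions: it uses a toy `summary` frame with the same `Count_0` and `Count_1` columns as `result`, and the numbers are illustrative rather than taken from the run above.

```python
import pandas as pd

# Toy per-image summary with the same column names as `result`.
summary = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg"],
    "Count_0": [1, 1],
    "Count_1": [2, 1],
})

# Pool the counts across images, so images with more prompts weigh more.
total_correct = summary["Count_1"].sum()
total_answers = (summary["Count_0"] + summary["Count_1"]).sum()
overall = round(total_correct / total_answers * 100, 2)
print(overall)
```

Pooling the counts gives every answer equal weight, whereas averaging the per-image `Accuracy` column would give every image equal weight; which one is appropriate depends on how the results will be reported.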