# Set up

The aim of this notebook is to get you set up in the Kaggle environment. We'll read and write some data, run a function, and create a plot so you are good to go for the coming weeks. Below is the chunk that loads on a fresh Kaggle notebook; it includes some details on the resources available. You can explore more by selecting View (at top right of page) -> Show sidebar. The sidebar (at the right of the page) has clickable settings for the notebook, including the compute resources used, the location of persistent storage and more.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
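
As a quick check of the write location described in the comments above, here is a minimal sketch of a write-then-read round trip. The file name `hello.csv` and its contents are placeholders made up for illustration, and the fallback to a temporary directory is an assumption so the cell also runs outside Kaggle.

```python
import csv
import tempfile
from pathlib import Path

# Use /kaggle/working when it exists; otherwise fall back to a temporary
# directory (assumption: lets the sketch run outside Kaggle too).
working_dir = Path("/kaggle/working")
if not working_dir.is_dir():
    working_dir = Path(tempfile.mkdtemp())

# Placeholder rows, invented purely for this example.
rows = [{"name": "a", "value": "1"}, {"name": "b", "value": "2"}]
out_path = working_dir / "hello.csv"  # hypothetical file name

# Write the rows to disk.
with open(out_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "value"])
    writer.writeheader()
    writer.writerows(rows)

# Read them back and confirm nothing was lost.
with open(out_path, newline="") as f:
    rows_back = list(csv.DictReader(f))

print(f"Wrote and read back {len(rows_back)} rows at {out_path}")
```

Values come back as strings because CSV has no type information; `pd.read_csv` would infer numeric types instead.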

## Test read, write 

First let's grab some images using the following package: https://pypi.org/project/bing-image-downloader/

In [None]:
%pip install bing-image-downloader


In [None]:
import os
import shutil
from bing_image_downloader import downloader
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [None]:
downloader.download("child hungry", limit=5, output_dir="images", adult_filter_off=False)

In [None]:
%ls /kaggle/working/images/'child hungry'/

In [None]:
import os
from PIL import Image
import matplotlib.pyplot as plt

def load_and_display_images(image_dir):
    """
    Load images from a directory, store them in a list, 
    and display their thumbnails with names.

    Parameters:
    image_dir (str): Directory containing image files.
    
    Returns:
    images (list): List of PIL image objects.
    image_names (list): List of image filenames.
    """
    # 1. Load all images and store them in a list
    images = []
    image_names = []
    
    for filename in os.listdir(image_dir):
        if filename.lower().endswith(('jpg', 'jpeg')):
            image_path = os.path.join(image_dir, filename)
            images.append(Image.open(image_path))
            image_names.append(filename)

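    # 2. Plot thumbnails of the images with their names
    if not images:
        # plt.subplots cannot create a grid with zero columns
        print(f"No JPEG images found in {image_dir}")
        return images, image_names

    fig, axes = plt.subplots(1, len(images), figsize=(15, 5))
    if len(images) == 1:
        axes = [axes]  # Ensure axes is iterable when there's only 1 image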

    for ax, img, name in zip(axes, images, image_names):
        ax.imshow(img.resize((128, 128)))  # Resize for thumbnail
        ax.set_title(name, fontsize=8)
        ax.axis('off')

    plt.tight_layout()
    plt.show()

    return images, image_names


In [None]:
# Example usage
image_dir = '/kaggle/working/images/child hungry/'
images, image_names = load_and_display_images(image_dir)

Let's grab some other, clearly different, images.

In [None]:
downloader.download("child happy", limit=5, output_dir="images", adult_filter_off=False)

Let's put these in the same directory to mix things up a bit.

In [None]:
source_dir = '/kaggle/working/images/child happy/'
destination_dir = '/kaggle/working/images/child hungry/'

os.makedirs(destination_dir, exist_ok=True)

for index, file in enumerate(os.listdir(source_dir)):
    src = os.path.join(source_dir, file)
    if os.path.isfile(src):
        shutil.move(src, os.path.join(destination_dir, f"image{index}{os.path.splitext(file)[1]}"))

%ls /kaggle/working/images/'child happy'/
!rm -r /kaggle/working/images/'child happy'
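
The cell above names files by position, so on Linux a move could silently overwrite a file that already has the same name in the destination. A minimal sketch of a collision-safe variant follows. `move_with_prefix` and its `prefix` parameter are hypothetical names introduced here, and the demonstration uses throwaway temporary directories with empty files rather than the downloaded images.

```python
import os
import shutil
import tempfile

def move_with_prefix(source_dir, destination_dir, prefix="moved"):
    """Move files into destination_dir under names that are not already taken."""
    os.makedirs(destination_dir, exist_ok=True)
    moved = []
    counter = 0
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # ignore sub-directories
        ext = os.path.splitext(name)[1]
        # Advance the counter until the target name is unused.
        while True:
            target = os.path.join(destination_dir, f"{prefix}{counter}{ext}")
            counter += 1
            if not os.path.exists(target):
                break
        shutil.move(src, target)
        moved.append(os.path.basename(target))
    return moved

# Demonstration on temporary directories with empty placeholder files.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg"):
    open(os.path.join(src_dir, name), "w").close()
open(os.path.join(dst_dir, "moved0.jpg"), "w").close()  # pre-existing file

moved_names = move_with_prefix(src_dir, dst_dir)
print(moved_names)
```

Sorting the listing first makes the new names repeatable, since `os.listdir` does not guarantee an order.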


Now our directory has happy and hungry images mixed together.

In [None]:
%ls /kaggle/working/images/'child hungry'/

images, image_names = load_and_display_images('/kaggle/working/images/child hungry/')

## Going in blind

Suppose we didn't obtain these images from the internet, and imagine we had millions of pictures to evaluate, not just these. How might we assess what we have? One option is to try some off-the-shelf models to extract text-based meaning from the images, and then see if we can enumerate the contents from there. We'll be using the Hugging Face APIs for this: https://huggingface.co/tasks/image-text-to-text
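
To make the enumeration step concrete: once each image has a text description, one simple option is to count which words recur across descriptions. The sketch below assumes the descriptions are already available as strings. The three example sentences and the stop-word list are placeholders written by hand for illustration, not model output, and `count_keywords` is a hypothetical helper.

```python
import re
from collections import Counter

# Placeholder descriptions, written by hand for illustration only.
descriptions = [
    "A child sitting at a table with an empty bowl.",
    "A smiling child playing in a park.",
    "A child holding a bowl of food.",
]

# A tiny hand-picked stop-word list (assumption; a real list would be longer).
STOP_WORDS = {"a", "an", "at", "in", "of", "the", "with"}

def count_keywords(texts):
    """Count how many descriptions mention each non-stop-word."""
    counts = Counter()
    for text in texts:
        # A set ensures each word is counted once per description.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts

keyword_counts = count_keywords(descriptions)
print(keyword_counts.most_common(3))
```

Counting per description rather than per occurrence keeps one long, repetitive caption from dominating the tally.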

In [None]:
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

Now enable the GPU in the right-hand panel and check that it is working. If `cpu` is returned, try again. Note that you only have 30 GPU hours per week, so switch the GPU on only for the computation that needs it.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(f"Using device: {device}")

Below is code that takes a random selection of images plus a text prompt and returns some text for each image. It uses a small model (~1.7GB, 0.5B params), so it is not massively accurate; even so, the individual descriptions are interesting. Don't worry too much about the technical details right now: this is just a demo set-up notebook, and in future sessions we'll introduce the conceptual and theoretical background before any notebook work. If you are interested, the background paper is at https://arxiv.org/pdf/2407.07895.

In [None]:
import os
import random
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Load the model and processor once to avoid reloading them for each image
model_id = "llava-hf/llava-interleave-qwen-0.5b-hf"
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # Fall back to CPU if no GPU is enabled

# Load model and processor
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    low_cpu_mem_usage=True
).to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Function to process and generate response for a given image and prompt
def generate_response(image_path, prompt):
    # Load image from the path
    image = Image.open(image_path).convert("RGB")

    # Define a chat history and use `apply_chat_template` to get the correctly formatted prompt
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image"},
            ],
        },
    ]

    # Apply chat template to the conversation
    formatted_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

    # Process the inputs (text and image)
    # Keyword arguments avoid depending on the processor's positional order
    inputs = processor(images=image, text=formatted_prompt, return_tensors="pt").to(device)

    # Generate the output from the model
    output = model.generate(**inputs, max_new_tokens=100)

    # Decode and return the response
    return processor.decode(output[0], skip_special_tokens=True), image

# Function to randomly select a specified number of images from a directory and process them with a given prompt
def process_multiple_random_images_from_directory(directory, prompt, num_images=5):
    # List all image files in the directory (assuming image files are .jpg, .jpeg, or .png)
    image_files = [f for f in os.listdir(directory) if f.lower().endswith(('jpg', 'jpeg', 'png'))]
    
    # If no image files are found, return
    if not image_files:
        print("No image files found in the directory.")
        return

    # If the requested number of images exceeds the available ones, adjust
    num_images = min(num_images, len(image_files))
    
    # Randomly select the specified number of images
    selected_image_files = random.sample(image_files, num_images)
    
    # Process each randomly selected image
    for selected_image_file in selected_image_files:
        image_path = os.path.join(directory, selected_image_file)
        print(f"Processing randomly selected image: {selected_image_file}")
        response, image = generate_response(image_path, prompt)
        
        # Display the image and the response
        display_image_with_response(image, response, selected_image_file)

# Function to display an image and its corresponding model response
def display_image_with_response(image, response, image_file):
    # Create a figure and axes
    plt.figure(figsize=(8, 8))
    
    # Display the image
    plt.imshow(image)
    plt.axis('off')  # Hide axes
    
    # Display the response as a title
    plt.title(f"Response for {image_file}:\n{response}", fontsize=12)
    
    # Show the image with response
    plt.show()
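
The string returned by `generate_response` typically contains the prompt as well as the generated reply, because the model's output sequence starts with the input tokens. A minimal sketch for keeping only the reply is shown below. It assumes the chat template labels the reply with the word `assistant`, which should be checked against real output. `extract_answer` is a hypothetical helper and the example string is a hand-written placeholder.

```python
def extract_answer(decoded_text, marker="assistant"):
    """Return the text after the first occurrence of marker, or all text if absent."""
    before, found, after = decoded_text.partition(marker)
    if not found:
        # Marker missing: return the whole string rather than guessing.
        return decoded_text.strip()
    return after.strip()

# Hand-written placeholder, not real model output.
example_decoded = (
    "user\nDescribe the contents of this image.\nassistant\nA child sitting at a table."
)
answer = extract_answer(example_decoded)
print(answer)
```

Splitting on the first occurrence is only safe while the prompt itself does not contain the marker word.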


In [None]:
directory = "/kaggle/working/images/child hungry/"
prompt = "Describe the contents of this image."

# Example call to process 2 random images from a given directory
process_multiple_random_images_from_directory(directory, prompt, num_images=2)
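
Because the selection relies on `random.sample`, each run may describe different files. If repeatable selections are wanted, one option is a seeded generator over a sorted file list. This is a sketch under that assumption: `pick_images` and its `seed` parameter are hypothetical names, and the file names are placeholders.

```python
import random

def pick_images(image_files, num_images, seed=0):
    """Return a repeatable random selection of file names."""
    rng = random.Random(seed)         # private generator, leaves global state alone
    candidates = sorted(image_files)  # directory listing order is not guaranteed
    k = min(num_images, len(candidates))
    return rng.sample(candidates, k)

# Placeholder file names for illustration.
files = ["image0.jpg", "image1.jpg", "Image_1.jpg", "Image_2.jpg"]
first = pick_images(files, 2, seed=42)
second = pick_images(list(reversed(files)), 2, seed=42)
print(first == second)
```

The same seed gives the same selection regardless of the order in which the files were listed.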


Below we check out the ability of the model to capture sentiment. Not so good! It clearly needs some fine-tuning.

In [None]:
directory = "/kaggle/working/images/child hungry/"
prompt = "Is this image a happy scene or not?"

# Example call to process 1 random image from a given directory
process_multiple_random_images_from_directory(directory, prompt, num_images=1)


One interesting thing about the model above is that it can apparently be used to query multiple images at once. This might be useful for assessing the contents across the set of images we have and may be worth exploring later. That said, if it is unable to do sentiment on single images, I would not trust it on multiple comparisons!

## Clean up

If you happen to want to clean up your working directory you can uncomment the following.

In [None]:
#!rm -r /kaggle/working/*
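
A narrower alternative is to delete only one subfolder, such as the downloaded images, and leave everything else in place. The sketch below assumes that is the goal. `remove_subdir` is a hypothetical helper, and the demonstration runs in a temporary directory standing in for `/kaggle/working`.

```python
import shutil
import tempfile
from pathlib import Path

def remove_subdir(base_dir, name):
    """Delete base_dir/name if it exists; return True when something was removed."""
    target = Path(base_dir) / name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False

# Demonstration in a temporary directory standing in for /kaggle/working.
base = Path(tempfile.mkdtemp())
(base / "images" / "child hungry").mkdir(parents=True)
(base / "keep.txt").write_text("kept")

removed = remove_subdir(base, "images")
print(removed, sorted(p.name for p in base.iterdir()))
```

On Kaggle the equivalent call would be `remove_subdir("/kaggle/working", "images")`.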