
# Submit Your Early Assignment

Complete the assignment by following the steps below. When finished, save and submit your work as an `.ipynb` file. You are expected to complete this assignment in under one hour.

## Step 0: Download the Image from the Given URL and Set Up the Notebook
Retrieve the image from this link: [The Mysterious Painting](https://upload.wikimedia.org/wikipedia/en/7/74/PicassoGuernica.jpg)
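The notebook below asks you to upload the image by hand. You can instead fetch it directly with only the standard library. The sketch below sends a `User-Agent` header because Wikimedia's servers may reject requests that lack a descriptive one. The header value and the `data/` destination are placeholder assumptions, not part of the assignment.

```python
import shutil
import urllib.request
from pathlib import Path

# Direct link to the painting given in Step 0
IMAGE_URL = "https://upload.wikimedia.org/wikipedia/en/7/74/PicassoGuernica.jpg"

# Placeholder User-Agent (assumption): replace it with one that identifies you
USER_AGENT = "CV-A0-notebook/0.1 (contact: your-email@example.com)"


def download_image(url: str, dest: str) -> Path:
    """Download `url` to `dest`, creating parent folders, and return the local path."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # Stream the response body straight to disk
    with urllib.request.urlopen(request, timeout=30) as response, open(dest_path, "wb") as f:
        shutil.copyfileobj(response, f)
    return dest_path
```

In the notebook you could then call `image_path = download_image(IMAGE_URL, "data/PicassoGuernica.jpg")` and open the file with `Image.open(image_path).convert("RGB")` in place of the `files.upload()` call.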

Next, work through each of the three provided notebooks. Combine them into a single new notebook and configure the environment as required by the tasks below.

## Step 1: Artist Recognition with the SigLIP Model
Use the SigLIP model (a CLIP variant trained with a sigmoid loss instead of CLIP's softmax-based contrastive loss) to identify the artist of the painting from the list of artists below. Display the predicted probability for each description. The expected output is the artist's name, denoted as [ARTIST].

Use the following candidate artist descriptions for zero-shot classification:
```python
descriptions = [
  "a painting by Leonardo da Vinci",
  "a painting by Michelangelo",
  "a painting by Vincent van Gogh",
  "a painting by Pablo Picasso",
  "a painting by Rembrandt",
  "a painting by Claude Monet"
]
```
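SigLIP applies a sigmoid to each image-text logit independently, whereas CLIP applies a softmax across all candidates. As a result, the percentages SigLIP reports for the six descriptions are not a probability distribution and need not sum to 100%. Both functions are monotonic, so the arg-max (the predicted artist) is the same either way. The sketch below contrasts the two on made-up logits. The numbers are illustrative only and are not real SigLIP output.

```python
import math

# Illustrative logits for the six descriptions (made-up numbers, not real SigLIP output)
logits = [-12.0, -11.5, -9.0, 3.0, -10.0, -8.5]


def sigmoid(x: float) -> float:
    """Independent probability for one image-text pair, as SigLIP scores it."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    """CLIP-style normalization across all candidates (sums to 1)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

# Both scorings pick the same top description...
best_sigmoid = max(range(len(logits)), key=lambda i: sigmoid_probs[i])
best_softmax = max(range(len(logits)), key=lambda i: softmax_probs[i])
print("top index:", best_sigmoid, best_softmax)
# ...but only the softmax scores are guaranteed to sum to 1
print("sigmoid sum:", round(sum(sigmoid_probs), 4))
print("softmax sum:", round(sum(softmax_probs), 4))
```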

## Step 2: Style-Based Object or Scene Generation
Once you have identified [ARTIST], use it in a prompt for the Stable Diffusion model. Generate an object or scene of your choice in that artist's style. The output of this step is an image, labeled [GEN_IMAGE].

## Step 3: Image Segmentation with SAM Model
Take the generated image [GEN_IMAGE] from the previous step and apply the SAM model to segment it. Present the segmentation masks. The result of this task is a segmented image, denoted as [SEGMENT]. If you run into problems such as CUDA running out of memory during the SAM step, try resizing the image to a smaller scale before passing it to SAM.
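A simple way to resize is to cap the longer side of the image while preserving its aspect ratio. The sketch below computes the target size. The helper name `fit_within` and the example cap of 768 pixels are assumptions, not values from the assignment.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return (new_width, new_height) with the longer side at most `max_side`,
    preserving the aspect ratio. Images that are already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, but never let a side collapse to zero
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Apply it in the SAM cell before the BGR-to-RGB conversion, so the annotated overlay matches the resized image:

- `new_w, new_h = fit_within(image_bgr.shape[1], image_bgr.shape[0], 768)`
- `image_bgr = cv2.resize(image_bgr, (new_w, new_h), interpolation=cv2.INTER_AREA)`

Note that `cv2.resize` takes its size as `(width, height)`. Lowering `points_per_batch` on `SamAutomaticMaskGenerator` (default 64) also reduces peak GPU memory.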


# Step 0: Environment Setup and Installation


In [None]:
# Check for GPU
!nvidia-smi

# Set up HOME directory
import os
HOME = os.getcwd()
print("HOME:", HOME)

# Install all required dependencies from the three notebooks
!pip install 'git+https://github.com/facebookresearch/segment-anything.git'
!pip install transformers scipy ftfy accelerate
!pip install diffusers==0.30.2
!pip install supervision
!pip install opencv-python
!pip install --upgrade -q git+https://github.com/huggingface/transformers sentencepiece

# Create necessary directories
%cd {HOME}
!mkdir -p {HOME}/weights
!mkdir -p {HOME}/data
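The setup cell mixes a pinned package (`diffusers==0.30.2`) with unpinned ones, including `transformers` installed from its Git repository. Recording the versions that were actually installed makes a run easier to reproduce. Here is a minimal sketch using the standard library's `importlib.metadata`. The package list is an assumption based on the install commands above; adjust the names if your install differs.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as pip knows them (assumed from the install commands above)
PACKAGES = ["torch", "transformers", "diffusers", "accelerate", "supervision",
            "segment_anything", "opencv-python", "sentencepiece"]


def installed_versions(packages: list[str]) -> dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:>18}: {ver}")
```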

In [None]:
# Imports
# General imports
import os
import torch
from PIL import Image
import requests
from IPython.display import display, HTML
import cv2
import numpy as np

# SigLIP imports
from transformers import AutoProcessor, AutoModel

# SAM imports
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
import supervision as sv

# Stable Diffusion imports
from diffusers import StableDiffusionXLPipeline

# Step 1: Artist Recognition with SigLIP

In [None]:
# Load the SigLIP model and processor
processor = AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")
model = AutoModel.from_pretrained("google/siglip-so400m-patch14-384")

# Define the possible artist descriptions
descriptions = [
  "a painting by Leonardo da Vinci",
  "a painting by Michelangelo",
  "a painting by Vincent van Gogh",
  "a painting by Pablo Picasso",
  "a painting by Rembrandt",
  "a painting by Claude Monet"
]

# --- Upload the image in Colab ---
# Running this cell opens a file picker (Colab only). Upload PicassoGuernica.jpg from Step 0;
# SigLIP should classify it as "a painting by Pablo Picasso", as it did in previous runs.
from google.colab import files
uploaded = files.upload()

# Get the filename (assuming you upload one file)
image_path = list(uploaded.keys())[0]
image = Image.open(image_path).convert("RGB")
display(image)  # Show the uploaded image

# Prepare inputs for SigLIP (the model was trained with text padded to max_length)
inputs = processor(text=descriptions, images=image, padding="max_length", return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
# SigLIP scores each description independently with a sigmoid,
# so the probabilities below need not sum to 100%
probs = torch.sigmoid(logits_per_image)

# Process and display results
text_probs = probs[0].cpu().numpy() * 100
max_index = text_probs.argmax()

print("Classification Results:")
for i, (desc, prob) in enumerate(zip(descriptions, text_probs)):
    if i == max_index:
        display(HTML(f"<span style='color: red; font-weight: bold;'>{desc}: {prob:.2f}%</span>"))
    else:
        print(f"{desc}: {prob:.2f}%")

# Extract the identified artist
identified_artist = descriptions[max_index].split("by ")[1]
print(f"\nIdentified Artist: {identified_artist}")
ARTIST = identified_artist

# Step 2: Style-Based Generation with Stable Diffusion XL

This cell generates a new image in the style of the identified artist.

In [None]:
# Load Stable Diffusion XL pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe = pipe.to("cuda")

# Create a prompt using the identified artist
prompt = f"A futuristic cityscape in the style of {ARTIST}"
print(f"Generation Prompt: {prompt}")

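# Generate the image (stored as gen_image so the uploaded painting in `image` is not overwritten)
gen_image = pipe(prompt).images[0]

# Save and display the generated image
gen_image_path = os.path.join(HOME, "generated_image.png")
gen_image.save(gen_image_path)
display(gen_image)
GEN_IMAGE = gen_image_path

# Release the SDXL pipeline's GPU memory before loading SAM (helps avoid CUDA out-of-memory errors)
import gc
del pipe
gc.collect()
torch.cuda.empty_cache()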

# Step 3: Image Segmentation with SAM

This cell segments the generated image using the Segment Anything Model.

In [None]:
# Download SAM model weights
%cd {HOME}/weights
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
%cd {HOME}

# Set up device
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
MODEL_TYPE = "vit_h"
CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")

# Load SAM model
sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)
sam.to(device=DEVICE)

# Create mask generator
mask_generator = SamAutomaticMaskGenerator(sam)

# Load the generated image
image_bgr = cv2.imread(GEN_IMAGE)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Generate masks
sam_result = mask_generator.generate(image_rgb)

# Visualize results
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Display segmentation results
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['generated image', 'segmented image'],
    size=(16, 8)
)

# Save segmented image
seg_image_path = os.path.join(HOME, "segmented_image.png")
cv2.imwrite(seg_image_path, annotated_image)
SEGMENT = seg_image_path
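`SamAutomaticMaskGenerator.generate` returns a list of dicts, one per mask. Each dict has keys such as `segmentation`, `area`, `bbox` (in XYWH format), `predicted_iou`, and `stability_score`. To present the masks as text as well as visually, a small helper can list the largest ones. This is a sketch; `summarize_masks` and its default `top_k=5` are hypothetical choices, not part of SAM.

```python
def summarize_masks(sam_result: list[dict], top_k: int = 5) -> list[dict]:
    """Return the `top_k` largest SAM masks with their area, bbox and quality scores."""
    largest = sorted(sam_result, key=lambda m: m["area"], reverse=True)[:top_k]
    return [
        {
            "area": m["area"],              # mask size in pixels
            "bbox_xywh": m["bbox"],         # SAM reports boxes as [x, y, w, h]
            "predicted_iou": round(m["predicted_iou"], 3),
            "stability_score": round(m["stability_score"], 3),
        }
        for m in largest
    ]
```

After the SAM cell, you could run `print(len(sam_result), "masks")` followed by `for row in summarize_masks(sam_result): print(row)`.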

# Summary

The last cell prints a summary of all outputs.

This notebook combines the three provided `.ipynb` files and should produce the following outputs:

Step 1: After you upload PicassoGuernica.jpg, SigLIP should identify Pablo Picasso as the artist.

Step 2: Stable Diffusion XL should generate a futuristic cityscape in Picasso's style. The exact image differs on every run, but each version shares basic features that make it recognizable to the naked eye as a Pablo Picasso-style painting.

Step 3: SAM segments the generated Picasso-style image into masks covering its colored blocks, polygons, and other geometric fragments. I tested this in Google Colab, and it worked in previous runs.

In [None]:
print("WORKFLOW COMPLETE!")
print(f"1. Identified Artist: {ARTIST}")
print(f"2. Generated Image: {GEN_IMAGE}")
print(f"3. Segmented Image: {SEGMENT}")

# Display all outputs
print("\n--- Final Outputs ---")
display(HTML(f"<h3>Identified Artist: {ARTIST}</h3>"))

gen_img = Image.open(GEN_IMAGE)
display(gen_img)

seg_img = cv2.imread(SEGMENT)
seg_img_rgb = cv2.cvtColor(seg_img, cv2.COLOR_BGR2RGB)
display(Image.fromarray(seg_img_rgb))