<a href="https://colab.research.google.com/github/AnaBelenCarbajal/Thesis/blob/main/SIAMESE_Pre_trainig_Animal_shapes_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Dataset animal shapes**

1) Installing packages

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install lightning

Collecting lightning
  Downloading lightning-2.3.0-py3-none-any.whl (2.0 MB)
Collecting lightning-utilities<2.0,>=0.8.0 (from lightning)
  Downloading lightning_utilities-0.11.2-py3-none-any.whl (26 kB)
Collecting torchmetrics<3.0,>=0.7.0 (from lightning)
  Downloading torchmetrics-1.4.0.post0-py3-none-any.whl (868 kB)
Collecting pytorch-lightning (from lightning)
  Downloading pytorch_lightning-2.3.0-py3-none-any.whl (812 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch<4.0,>=2.0.0->lightning)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12=

In [3]:
# importing required packages

import numpy as np
from matplotlib import pyplot as plt
import torch
import os
from random import choice
import pandas as pd
import lightning as L

from torchvision.datasets import ImageFolder
from torchvision.io import read_image, ImageReadMode
from torch.utils.data import Dataset
import torchvision.transforms.functional as transform
from torchvision import transforms
from torchvision.transforms import v2
from torchvision.transforms import Pad
from torch.utils.data import DataLoader
import cv2

from PIL import Image

In [72]:
class SiameseNetwork(L.LightningModule):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained ResNet-50 used as a frozen feature extractor
        resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
        for param in resnet.parameters():
            param.requires_grad = False

        # trainable head replacing the classifier: 2048 -> 1024 -> 512 embedding
        embedder = torch.nn.Sequential(
            torch.nn.Linear(2048, 1024),
            torch.nn.Linear(1024, 512)
        )
        resnet.fc = embedder
        self.model = resnet
        self.cos = torch.nn.CosineSimilarity()

    def predict_distance(self, img_left, img_right):
        # despite the name, this returns cosine similarity (higher = more alike)
        embed_left = self.model(img_left)
        embed_right = self.model(img_right)
        return self.cos(embed_left, embed_right)

    def forward(self, img):
        return self.model(img)

    def _pair_loss(self, batch):
        # shared by the training, validation and test steps
        img_left, img_right, gt = batch
        # cosine_embedding_loss expects targets 1 (same) / -1 (different);
        # remap 0 -> -1 without modifying the batch in place
        target = torch.where(gt == 0, -torch.ones_like(gt), gt)

        embed_left = self.model(img_left)
        embed_right = self.model(img_right)
        return torch.nn.functional.cosine_embedding_loss(embed_left, embed_right, target)

    def training_step(self, batch, batch_idx):
        loss = self._pair_loss(batch)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        self.log("val_loss", self._pair_loss(batch))

    def test_step(self, batch, batch_idx):
        self.log("test_loss", self._pair_loss(batch))

    def configure_optimizers(self):
        # http://karpathy.github.io/2019/04/25/recipe/ why 3e-4
        optimizer = torch.optim.Adam(self.parameters(), lr=3e-4)
        return optimizer
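
The training, validation and test steps all use the same objective. As a rough illustration (a plain-Python sketch of the formula, not the PyTorch implementation), the cosine embedding loss is 1 - cos(a, b) for a pair with target 1 and max(0, cos(a, b) - margin) for a pair with target -1, with a default margin of 0. The helper names below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_embedding_loss(a, b, target, margin=0.0):
    """Loss for one pair; target is 1 (same category) or -1 (different)."""
    cos = cosine_similarity(a, b)
    if target == 1:
        return 1.0 - cos
    return max(0.0, cos - margin)

# same direction, labelled "same": no loss
same_direction = cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], target=1)
# orthogonal, labelled "different": no loss
orthogonal = cosine_embedding_loss([1.0, 0.0], [0.0, 3.0], target=-1)
# same direction, labelled "different": maximal loss
aligned_but_different = cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], target=-1)
print(same_direction, orthogonal, aligned_but_different)
```

With a margin of 0, a different-category pair is only penalised while its similarity is positive, which is why the 0 labels are remapped to -1 before the loss is computed.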



2) Unzip images180.zip (uploaded to this Colab session beforehand)

In [7]:
# unzip folder (previously uploaded to Colab)
!unzip images180.zip -d my_data

Streaming output truncated to the last 5000 lines.
  inflating: my_data/images180/images224/horse/88.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._88.png  
  inflating: my_data/images180/images224/horse/77.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._77.png  
  inflating: my_data/images180/images224/horse/63.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._63.png  
  inflating: my_data/images180/images224/horse/62.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._62.png  
  inflating: my_data/images180/images224/horse/76.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._76.png  
  inflating: my_data/images180/images224/horse/89.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._89.png  
  inflating: my_data/images180/images224/horse/60.png  
  inflating: my_data/__MACOSX/images180/images224/horse/._60.png  
  inflating: my_data/images180/images224/horse/74.png  
  inflating: my_da

3) Root: path to the folder containing the animal categories

In [8]:
# root directory to data
root = "my_data/images180/images180/"

4) Labels: categories

In [9]:
# get category labels
labels = os.listdir(root)

# drop hidden entries such as ".DS_Store" (no error if none are present)
labels = [label for label in labels if not label.startswith(".")]

5) Retrieve label and image information in dictionary for every image -> list of dictionaries

In [10]:
# get list of dictionaries with respective label and image number for all images
data = []

for label in labels:
  folder_path = root + label
  shapes = os.listdir(folder_path)
  for shape in shapes:
    category_dict = {'label': label, 'image': shape}
    data.append(category_dict)

6) Retrieve image directory (full path of each image)

In [11]:
# function to get image directory
def image_direct(root, category_dict_item):
  return root + category_dict_item['label'] + "/" + category_dict_item['image']

# get image directories
img_dir = []
for image in data:
  direct = image_direct(root, image)
  img_dir.append(direct)
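
String concatenation works here because root ends with a slash. A slightly more defensive sketch of steps 4 to 6, assuming the same folder-per-category layout, builds the records with os.path.join and skips hidden files such as .DS_Store. The function name list_images and the temporary folder tree are made up for the demonstration.

```python
import os
import tempfile

def list_images(root):
    """Return one {'label', 'image', 'img_dir'} dict per image under root."""
    records = []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        # skip hidden files (e.g. .DS_Store) and anything that is not a folder
        if label.startswith(".") or not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            if name.startswith("."):
                continue
            records.append({"label": label, "image": name,
                            "img_dir": os.path.join(folder, name)})
    return records

# tiny throwaway directory tree standing in for the real dataset
with tempfile.TemporaryDirectory() as tmp:
    for label, names in {"horse": ["1.png", "2.png"], "cat": ["1.png"]}.items():
        os.makedirs(os.path.join(tmp, label))
        for name in names:
            open(os.path.join(tmp, label, name), "w").close()
    open(os.path.join(tmp, ".DS_Store"), "w").close()
    demo_records = list_images(tmp)

print([(r["label"], r["image"]) for r in demo_records])
```

Sorting the listings makes the record order reproducible across runs, which os.listdir alone does not guarantee.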

8) Append image directory to dictionary

In [12]:
# add img_dir to data (img_dir was built in the same order as data)
for img, path in zip(data, img_dir):
  img['img_dir'] = path

** Compute random positions: padding combinations that shift the image

In [13]:
# horizontal and vertical shifts, each in [-22, 22]
combinations = []
for left_and_right in range(-22, 23):
    for up_and_down in range(-22, 23):
        combinations.append((left_and_right, up_and_down))

# base margins; each axis always receives 44 px of padding in total
left = 22
right = 22
top = 22
bottom = 22

# padding for every shift, in the (left, top, right, bottom) order that Pad expects
positions = []
for position in combinations:
    left_new = left - position[0]
    right_new = right + position[0]
    top_new = top + position[1]
    bottom_new = bottom - position[1]
    positions.append([left_new, top_new, right_new, bottom_new])
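
A quick sanity check of the padding scheme, rebuilt here in plain Python so it runs on its own: every entry should add 44 pixels per axis, so the output size stays constant while the content shifts. The variable names in this sketch are illustrative.

```python
# rebuild the (left, top, right, bottom) paddings for shifts in [-22, 22]
margin = 22
paddings = [
    [margin - dx, margin + dy, margin + dx, margin - dy]
    for dx in range(-margin, margin + 1)
    for dy in range(-margin, margin + 1)
]

# every padding adds the same total per axis, so the output size is constant
horizontal_totals = {p[0] + p[2] for p in paddings}
vertical_totals = {p[1] + p[3] for p in paddings}
smallest = min(min(p) for p in paddings)
largest = max(max(p) for p in paddings)

print(len(paddings), horizontal_totals, vertical_totals, smallest, largest)
```

Under this scheme a 180x180 input would come out as 224x224, and no padding value is ever negative.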

** Segment foreground and background to apply the ColorJitter transformation separately

In [14]:
class ImageProcessor:
    def __init__(self, brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1):
        self.color_jitter = transforms.ColorJitter(
            brightness=brightness,
            contrast=contrast,
            saturation=saturation,
            hue=hue
        )

    def separate_foreground_background(self, image_np):
        gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
        _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        mask_inv = cv2.bitwise_not(mask)
        mask_3c = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
        mask_inv_3c = cv2.cvtColor(mask_inv, cv2.COLOR_GRAY2BGR)
        foreground = cv2.bitwise_and(image_np, mask_3c)
        background = cv2.bitwise_and(image_np, mask_inv_3c)
        return foreground, background, mask, mask_inv

    def apply_transform(self, image_np):
        image_pil = Image.fromarray(image_np)
        transformed_image = self.color_jitter(image_pil)
        return np.array(transformed_image)

    def combine_images(self, foreground, background, mask, mask_inv):
        # Use the mask to combine the transformed foreground and background
        combined_image = cv2.bitwise_and(foreground, foreground, mask=mask)
        combined_image += cv2.bitwise_and(background, background, mask=mask_inv)
        return combined_image

    def process_image(self, image_tensor):
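        image_np = image_tensor.permute(1, 2, 0).cpu().numpy()  # Convert to numpy array in HWC format
        # Assumes values in [0, 1]; clip so out-of-range values cannot wrap around when cast to uint8
        image_np = np.clip(image_np * 255, 0, 255).astype(np.uint8)  # Convert to uint8 range [0, 255]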

        foreground, background, mask, mask_inv = self.separate_foreground_background(image_np)
        transformed_foreground = self.apply_transform(foreground)
        transformed_background = self.apply_transform(background)
        combined_image = self.combine_images(transformed_foreground, transformed_background, mask, mask_inv)
        combined_image = combined_image.astype(np.float32) / 255.0  # Normalize back to [0, 1] range
        combined_image = torch.from_numpy(combined_image).permute(2, 0, 1)  # Convert back to tensor in CHW format
        return combined_image
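
The idea behind ImageProcessor can be shown without OpenCV on a tiny grayscale grid: threshold to get a mask, transform the two regions independently, then merge them back using the mask. This is a simplified sketch; the brighten and darken functions are hypothetical stand-ins for ColorJitter.

```python
def split_by_threshold(image, threshold=127):
    """Return a boolean mask: True where a pixel is brighter than threshold."""
    return [[pixel > threshold for pixel in row] for row in image]

def brighten(pixel):
    # stand-in for a jitter applied to the bright region
    return min(255, pixel + 30)

def darken(pixel):
    # stand-in for a jitter applied to the dark region
    return max(0, pixel - 30)

def jitter_regions(image, threshold=127):
    """Apply a different transform to each side of the threshold mask."""
    mask = split_by_threshold(image, threshold)
    return [
        [brighten(p) if m else darken(p) for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

tiny_image = [
    [128, 128, 128],
    [128, 20, 128],
    [128, 128, 240],
]
result = jitter_regions(tiny_image)
print(result)
```

Because the mask and its inverse never overlap, each output pixel comes from exactly one of the two transformed regions.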

9) Class ShapeImageDataset

Requires:
- data (list of dictionaries)
- img_dir (list of image directories)
- labels (list of labels)
- transform (preprocessing transformations)
- image_processor (object that applies ColorJitter to foreground and background)

Output:
This class returns 2 images; 50% of the time both come from the same category and 50% of the time from different categories
- img1 = tensor img1
- img2 = tensor img2
- gt = ground truth (same category 1 or different 0)
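
The pairing rule can be sketched independently of image loading. The snippet below uses made-up records and a hypothetical sample_pair function; it mirrors the 50/50 rule described above but is not the class itself.

```python
import random

records = [
    {"label": "horse", "image": "1.png"},
    {"label": "horse", "image": "2.png"},
    {"label": "cat", "image": "1.png"},
    {"label": "dog", "image": "1.png"},
]
labels = sorted({r["label"] for r in records})

def sample_pair(idx, rng):
    """Pick a partner for records[idx]; gt is 1 for same category, else 0."""
    first = records[idx]
    if rng.random() < 0.5:
        partner_label = first["label"]
    else:
        partner_label = rng.choice([lab for lab in labels if lab != first["label"]])
    candidates = [r for r in records if r["label"] == partner_label]
    second = rng.choice(candidates)
    gt = 1 if first["label"] == second["label"] else 0
    return first, second, gt

rng = random.Random(0)  # fixed seed so the demo is repeatable
pairs = [sample_pair(0, rng) for _ in range(1000)]
same_fraction = sum(gt for _, _, gt in pairs) / len(pairs)
print(round(same_fraction, 2))
```

Note that a same-category partner can be the first image itself, since candidates are drawn from the whole category.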


In [15]:
class ShapeImageDataset (Dataset):
  # return two images, with 50% chance in same or different category and ground truth

  # __init__
  def __init__(self, data, img_dir, labels, transform=None, image_processor=None):
    self.img_labels = data
    self.img_dir = img_dir
    self.labels = labels
    self.transform = transform
    self.image_processor = image_processor
    self.same_category = [1, 0]

  # __len__
  def __len__(self):
    return len(self.img_labels)

  ### functions for __getitem__ ###

  # we start by picking a category other than that of the current image
  def pick_other_category(self, label):
    other_categories = []

    for cat in self.labels:
      if cat != label:
        other_categories.append(cat)

    return choice(other_categories)

  # we select an image from the other category
  def select_random_image(self, label):
    result = []
    for item in self.img_labels:
      if item["label"] == label:
        result.append(item)
    return choice(result)

  # function to read both images
  def read_image_from_directory(self, category_dict_item):
    return read_image(image_direct(root, category_dict_item), ImageReadMode.RGB)

  # __getitem__
  def __getitem__(self, idx):
    # image 1
    img1_data = self.img_labels[idx]
    img1_label = img1_data["label"]

    # 50-50 choose image 2 from same category
    same = choice(self.same_category)
    if same == 1:
      img2_label = img1_label
    else:
      img2_label = self.pick_other_category(img1_label)

    # image 2
    img2_data = self.select_random_image(img2_label)

    #gt
    if img1_label == img2_label:
      gt = 1
    else:
      gt = 0

    img1 = self.read_image_from_directory(img1_data)
    img2 = self.read_image_from_directory(img2_data)

    # apply padding
    padding_img1 = Pad(padding = choice(positions), fill=128)
    padding_img2 = Pad(padding = choice(positions), fill=128)
    img1 = padding_img1(img1)
    img2 = padding_img2(img2)

    if self.transform:
      img1 = self.transform(img1)
      img2 = self.transform(img2)

    if self.image_processor:
      img1 = self.image_processor.process_image(img1)
      img2 = self.image_processor.process_image(img2)

    return img1, img2, gt

In [16]:
preprocess = v2.Compose(
    [   v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

image_processor = ImageProcessor(brightness=(0.8,4), contrast=(0.6,0.8), saturation=0.2, hue=0.5)
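
For reference, v2.Normalize computes (x - mean) / std per channel. A plain-Python sketch of that arithmetic with the ImageNet statistics used above shows that the result is no longer confined to [0, 1], which matters for any later step that assumes that range. The function name is illustrative.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]

black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
gray_fill = normalize_pixel([128 / 255] * 3)  # 128 is the padding fill value in this notebook

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
print([round(v, 3) for v in gray_fill])
```

Black pixels map to negative values and white pixels to values above 2, so a normalized tensor should not be treated as an image in [0, 1] without undoing the normalization first.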

11) Creating dataset (len=2000)

In [17]:
pretraining_dataset = ShapeImageDataset(data, img_dir, labels, preprocess, image_processor)
len(pretraining_dataset)

2000

Plotting

In [37]:
dic = '/content/drive/MyDrive/Thesis/Neural_network/Pre-training on Animals shapes/'

In [38]:
def tensor_to_pil(image_tensor):
    return transform.to_pil_image(image_tensor)

In [None]:
# Example usage to display 4 pairs of images
fig, axs = plt.subplots(4, 2, figsize=(5, 10))

for i in range(4):
    idx = np.random.randint(0, len(pretraining_dataset))  # upper bound is exclusive
    img1, img2, gt = pretraining_dataset[idx]  # Get image pair and ground truth
    img1_pil = tensor_to_pil(img1)
    img2_pil = tensor_to_pil(img2)

    axs[i, 0].imshow(img1_pil)
    axs[i, 0].set_title('Image 1')
    axs[i, 0].axis('off')

    axs[i, 1].imshow(img2_pil)
    axs[i, 1].set_title(f'Image 2 - GT: {gt}')
    axs[i, 1].axis('off')

plt.tight_layout()
#plt.savefig(dic + 'pre-training10.pdf')
plt.show()

10) Network

In [64]:
train_dataset, val_dataset = torch.utils.data.random_split(pretraining_dataset, [0.7, 0.3])
train_loader = DataLoader(train_dataset, 64)
val_loader = DataLoader(val_dataset, 64)
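
To make the resulting sizes explicit: with 2000 pairs, a 70/30 split and a batch size of 64, the arithmetic below gives the number of samples and batches per loader. It is a sketch of the size computation only (fractions floored, leftover items handed out one by one), with hypothetical helper names, and it assumes the loaders keep their last partial batch.

```python
import math

def split_sizes(total, fractions):
    """Floor each fraction of total, then hand out any leftover items in order."""
    sizes = [math.floor(total * f) for f in fractions]
    leftover = total - sum(sizes)
    for i in range(leftover):
        sizes[i % len(sizes)] += 1
    return sizes

def num_batches(n_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

train_size, val_size = split_sizes(2000, [0.7, 0.3])
train_batches = num_batches(train_size, 64)
val_batches = num_batches(val_size, 64)
print(train_size, val_size, train_batches, val_batches)
```

Since each dataset item draws its partner and padding at random, the pairs seen in validation differ from call to call even though the split itself is fixed.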


In [73]:
net = SiameseNetwork()

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0
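
The size of the trainable head can be checked by hand. A Linear layer with n inputs and m outputs has n*m weights plus m biases, so the two layers that replace resnet.fc contribute the following (plain arithmetic, helper name made up):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

head_params = linear_params(2048, 1024) + linear_params(1024, 512)
print(head_params)  # 2622976
```

That is about 2.6 M, consistent with the trainable parameter count Lightning reports during training. Because there is no activation between the two layers, they compose into a single linear map; inserting a nonlinearity such as ReLU would be one way to give the head more capacity.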


In [76]:
trainer = L.Trainer(max_epochs=10, log_every_n_steps=5)
trainer.fit(model=net, train_dataloaders=train_loader, val_dataloaders=val_loader)

INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name  | Type             | Params | Mode 
---------------------------------------------------
0 | model | ResNet           | 26.1 M | train
1 | cos   | CosineSimilarity | 0      | train
---------------------------------------------------
2.6 M     Trainable params
23.5 M    Non-trainable params
26.1 M    Total params
104.524   Total estimated model params size (MB)

INFO: `Trainer.fit` stopped: `max_epochs=10` reached.


In [78]:
# note: evaluated on train_loader, so test_loss is measured on training data
trainer.test(net, train_loader)

INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[{'test_loss': 0.316600501537323}]

In [79]:
img_left, img_right, gt = pretraining_dataset[1000]

In [25]:
img_right.shape

torch.Size([3, 224, 224])

In [83]:
with torch.no_grad():
  res = net.predict_distance(img_left.unsqueeze(0), img_right.unsqueeze(0))

res

tensor([0.9828])

In [84]:
gt

0
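
A similarity of 0.98 for a pair labelled 0 means this particular pair would be judged "same" under almost any cut-off. To turn predict_distance (which returns cosine similarity, so higher means more alike) into a same/different decision, one option is a fixed threshold. The value 0.5 below is an arbitrary assumption, not something tuned in this notebook, and both function names are hypothetical.

```python
def predict_same(similarity, threshold=0.5):
    """Return 1 if the pair is judged same-category, else 0."""
    return 1 if similarity >= threshold else 0

def accuracy(similarities, ground_truths, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the label."""
    hits = sum(
        predict_same(s, threshold) == gt
        for s, gt in zip(similarities, ground_truths)
    )
    return hits / len(ground_truths)

# the pair evaluated above: similarity 0.9828, ground truth 0
prediction = predict_same(0.9828)
print(prediction)
```

Running accuracy over many validation pairs, rather than inspecting a single pair, would give a clearer picture of how well the embeddings separate categories.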