[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# AI for System Engineers and Project Managers

## Deep Learning - Multi Modal - Contrastive Language Image Pre Training (CLIP)

Demonstrates the use of a _Zero Shot Model_ for _Image Classification_.

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 07/03/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0037FeaturesTransform.ipynb)

In [1]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Deep Learning
from torch.utils.data import Dataset

# Image Processing
import skimage as ski

# Miscellaneous
import math
import os
import pickle
from platform import python_version
import random
import onedrivedownloader #<! https://github.com/loribonna/onedrivedownloader

# Typing
from typing import Callable, Dict, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply Seaborn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

PROJECT_NAME      = 'FixelCourses'
DATA_FOLDER_PATH  = 'DataSets'
MODEL_FOLDER_PATH = 'Models'

BASE_FOLDER      = os.getcwd()[:len(os.getcwd()) - (os.getcwd()[::-1].lower().find(PROJECT_NAME.lower()[::-1]))]

L_IMG_EXT = ['.png', '.jpeg', '.jpg']

In [None]:
# Courses Packages



In [2]:
# General Auxiliary Functions

class CLIPDataset(Dataset):
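    def __init__( self, dataFolderPath: str, dataFileName: str, hDataTrans: Optional[Callable] = None ) -> None:
        """
        Dataset of image / caption pairs.
        Reads the captions file (CSV with the columns `image` and `caption`) from the data folder.
        Each row is a single sample: an image file name (relative to `dataFolderPath`) and its caption.
        `hDataTrans` is an optional transform applied to the loaded image.
        """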

        dfDataCaptions = pd.read_csv(os.path.join(dataFolderPath, dataFileName))

        self.dataFolderPath = dataFolderPath
        self.dataFileName   = dataFileName
        self.hDataTrans     = hDataTrans
        self.dfDataCaptions = dfDataCaptions
        self.numSamples     = len(dfDataCaptions)

    def __getitem__(self, idx):
        
        imgFileName = str(self.dfDataCaptions['image'][idx])
        captionTxt  = str(self.dfDataCaptions['caption'][idx])

        mI = ski.io.imread(os.path.join(self.dataFolderPath, imgFileName))
        mI = ski.util.img_as_float32(mI)

        if self.hDataTrans is not None:
            mI = self.hDataTrans(mI)

        return mI, captionTxt

    def __len__(self):
        return self.numSamples
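
The dataset class above treats every row of the captions file as one sample, so an image with several captions is returned several times and `__len__()` counts rows rather than unique images. The sketch below illustrates this with a made up captions table (the file names and captions are invented for the example) using only the standard library; the notebook itself reads the file with `pd.read_csv()`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical captions file content with the `image` and `caption` columns
captionsTxt = """image,caption
img_001.jpg,A dog runs on the grass.
img_001.jpg,A brown dog is running outside.
img_002.jpg,A child climbs a ladder.
"""

# Group the captions per image file
dCaptions = defaultdict(list)
for dRow in csv.DictReader(io.StringIO(captionsTxt)):
    dCaptions[dRow['image']].append(dRow['caption'])

numSamples = sum(len(lCaption) for lCaption in dCaptions.values()) #<! Number of rows (Samples)
numImages  = len(dCaptions) #<! Number of unique images

print(f'Number of samples: {numSamples}, Number of images: {numImages}')
```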



## Contrastive Learning

Contrastive Learning is a _self supervised_ learning technique. It learns an embedding in which similar samples are mapped close to each other and dissimilar samples are mapped far apart, based on knowledge of which samples are similar in some sense.

![](https://i.imgur.com/wH4Yc5c.png)
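
To make the idea concrete, the following is a minimal sketch of a classic pairwise (margin based) contrastive loss in NumPy. The function name `PairwiseContrastiveLoss()`, the `margin` parameter and the toy vectors are assumptions made for illustration; they are not taken from the notebook, and this is not the loss used by CLIP, which works on a whole batch at once.

```python
import numpy as np

def PairwiseContrastiveLoss( vX1: np.ndarray, vX2: np.ndarray, isSimilar: bool, margin: float = 1.0 ) -> float:
    # Euclidean distance between the two embeddings
    distVal = np.linalg.norm(vX1 - vX2)
    if isSimilar:
        # Similar pair: any distance is penalized
        return float(distVal ** 2)
    # Dissimilar pair: penalized only when closer than the margin
    return float(max(0.0, margin - distVal) ** 2)

# Toy embeddings (Hypothetical values)
vA = np.array([1.0, 0.0])
vB = np.array([0.9, 0.1])  #<! Close to `vA`
vC = np.array([-1.0, 0.0]) #<! Far from `vA`

lossSimClose = PairwiseContrastiveLoss(vA, vB, True)  #<! Small: similar and close
lossSimFar   = PairwiseContrastiveLoss(vA, vC, True)  #<! Large: similar yet far
lossDisClose = PairwiseContrastiveLoss(vA, vB, False) #<! Positive: dissimilar yet close
lossDisFar   = PairwiseContrastiveLoss(vA, vC, False) #<! Zero: dissimilar and beyond the margin
```

Minimizing such a loss pulls similar pairs together and pushes dissimilar pairs apart until they are at least `margin` away.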
<!-- ![](https://i.postimg.cc/9M6SymRV/Picture1.png) -->

### OpenAI CLIP Model

The `CLIP` model learns to match _Text_ and _Image_.  
During training it learns to:
 - Embed text.
 - Embed images.
 - Match the text embedding with the image embedding.
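
The matching step can be sketched as a symmetric cross entropy over the image / text similarity matrix of a batch. This is a simplified NumPy illustration under stated assumptions, not OpenAI's training code: the names `LogSoftmax()` and `ClipStyleLoss()`, the fixed temperature `tempVal = 0.07` and the random embeddings standing in for encoder outputs are all assumptions of the example.

```python
import numpy as np

def LogSoftmax( mX: np.ndarray, axis: int ) -> np.ndarray:
    # Numerically stable log of the softmax along the given axis
    mX = mX - np.max(mX, axis = axis, keepdims = True)
    return mX - np.log(np.sum(np.exp(mX), axis = axis, keepdims = True))

def ClipStyleLoss( mImgEmb: np.ndarray, mTxtEmb: np.ndarray, tempVal: float = 0.07 ) -> float:
    # Normalize the rows so the dot product is the cosine similarity
    mImgEmb = mImgEmb / np.linalg.norm(mImgEmb, axis = 1, keepdims = True)
    mTxtEmb = mTxtEmb / np.linalg.norm(mTxtEmb, axis = 1, keepdims = True)
    mLogits = (mImgEmb @ mTxtEmb.T) / tempVal #<! (numSamples, numSamples)
    vIdx    = np.arange(mLogits.shape[0])     #<! The i-th image matches the i-th text
    lossImg = -np.mean(LogSoftmax(mLogits, axis = 1)[vIdx, vIdx]) #<! Image -> Text
    lossTxt = -np.mean(LogSoftmax(mLogits, axis = 0)[vIdx, vIdx]) #<! Text -> Image
    return float(0.5 * (lossImg + lossTxt))

# Random stand-ins for the encoders outputs (Hypothetical data)
oRng    = np.random.default_rng(0)
mImgEmb = oRng.standard_normal((4, 8))
mTxtEmb = mImgEmb + 0.01 * oRng.standard_normal((4, 8)) #<! Row i matches row i

lossMatched  = ClipStyleLoss(mImgEmb, mTxtEmb)
lossShuffled = ClipStyleLoss(mImgEmb, mTxtEmb[::-1]) #<! Pairs broken by reversing the order
print(f'Matched pairs loss: {lossMatched:0.4f}, Shuffled pairs loss: {lossShuffled:0.4f}')
```

The loss is low when each image is most similar to its own caption and high when the pairing is broken.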

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/Contrastive_Language-Image_Pretraining.png/800px-Contrastive_Language-Image_Pretraining.png)

Applications:

 - _Zero Shot Classification_.
 - Retrieval Systems - Extract images from a DB given text.
 - Pre Processor - Text embedding as a conditioning input for image generation, or image embedding as features for text related tasks.
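
Once text and images share an embedding space, the first two applications reduce to ranking by cosine similarity. The sketch below assumes the embeddings were already computed by some text and image encoders; the function names, the scale factor `scaleVal` and the toy vectors are hypothetical and only illustrate the scoring step.

```python
import numpy as np

def CosineSimilarity( mA: np.ndarray, mB: np.ndarray ) -> np.ndarray:
    # Cosine similarity between each row of `mA` and each row of `mB`
    mA = mA / np.linalg.norm(mA, axis = 1, keepdims = True)
    mB = mB / np.linalg.norm(mB, axis = 1, keepdims = True)
    return mA @ mB.T

def ZeroShotClassify( vImgEmb: np.ndarray, mTxtEmb: np.ndarray, scaleVal: float = 100.0 ) -> np.ndarray:
    # Probability per class prompt given a single image embedding
    vLogit = scaleVal * CosineSimilarity(vImgEmb[None, :], mTxtEmb)[0]
    vLogit = vLogit - np.max(vLogit) #<! Numerical stability
    return np.exp(vLogit) / np.sum(np.exp(vLogit))

def RetrieveImages( vTxtEmb: np.ndarray, mImgEmb: np.ndarray, topK: int = 2 ) -> np.ndarray:
    # Indices of the `topK` images most similar to the text query
    vSim = CosineSimilarity(vTxtEmb[None, :], mImgEmb)[0]
    return np.argsort(-vSim)[:topK]

# Toy embeddings (Hypothetical stand-ins for the encoders outputs)
lClass  = ['dog', 'cat', 'car']
mTxtEmb = np.eye(3) #<! One row per prompt, e.g. 'a photo of a dog'
vImgEmb = np.array([0.9, 0.3, 0.1])

vProb     = ZeroShotClassify(vImgEmb, mTxtEmb)
predClass = lClass[int(np.argmax(vProb))]

mImgEmb = np.array([[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9], [0.9, 0.1, 0.0]])
vTopIdx = RetrieveImages(mTxtEmb[0], mImgEmb, topK = 2) #<! Query: 'dog'
```

No classifier is trained here: the set of classes is defined only by the text prompts, which is what makes the classification _zero shot_.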


* <font color='brown'>(**#**)</font> [OpenAI CLIP](https://github.com/openai/CLIP) ([Wikipedia - CLIP](https://en.wikipedia.org/wiki/Contrastive_Language-Image_Pre-training), [OpenAI CLIP Page](https://openai.com/index/clip)).
* <font color='brown'>(**#**)</font> [OpenCLIP](https://github.com/mlfoundations/open_clip/) is an open source implementation of CLIP which includes pre trained models and scripts for training and _Fine Tuning_.
* <font color='brown'>(**#**)</font> [The Stanford AI Lab Blog - Understanding Deep Learning Algorithms that Leverage Unlabeled Data, Part 2: Contrastive Learning](https://ai.stanford.edu/blog/understanding-contrastive-learning).
* <font color='brown'>(**#**)</font> [Ankesh Anand - Contrastive Self-Supervised Learning](https://ankeshanand.com/blog/2020/01/26/contrative-self-supervised-learning.html).
* <font color='brown'>(**#**)</font> [Szymon Palucha - Understanding OpenAI’s CLIP Model](https://scribe.rip/6b52bade3fa3).
* <font color='brown'>(**#**)</font> [Kerry Halupka - Getting started with OpenAI’s CLIP](https://scribe.rip/a3b8f5277867).
* <font color='brown'>(**#**)</font> [Simple Implementation of OpenAI CLIP Model: A Tutorial](https://scribe.rip/ace6ff01d9f2).

In [None]:
# Parameters

# Data
datasetName = 'Flickr8K'
datasetUrl  = 'https://technionmail-my.sharepoint.com/:u:/g/personal/royia_technion_ac_il/EZxtZtYu1s9AgopNp5YSXYAB4tRzJWmoQuvItw8gd3GKcA?e=kPqVOM'

# Pre Processing

# Model

# Points

# Data Visualization


## Generate / Load Data

The image shows running dogs.



In [None]:
# Verify Data is Available

dataSetPath = os.path.join(BASE_FOLDER, DATA_FOLDER_PATH, datasetName)

if not os.path.isdir(dataSetPath):
    # Download, unzip and remove ZIP file
    onedrivedownloader.download(datasetUrl, os.path.join(BASE_FOLDER, DATA_FOLDER_PATH, datasetName + '.zip'), unzip = True, clean = True)

In [None]:
# Load / Generate Data 

mI = ski.io.imread(imgUrl) #<! `imgUrl` must be set beforehand to the path / URL of the image

### Plot Data

In [None]:
# Plot the Data

hF, hA = plt.subplots(1, 1, figsize = (12, 12))
hA.imshow(mI)

* <font color='brown'>(**#**)</font> Some of the images are not well annotated.

## Load Model

The model is run using [ONNX Runtime](https://github.com/microsoft/onnxruntime) with a wrapping class.

* <font color='brown'>(**#**)</font> ONNX Runtime is a general purpose run time, though it includes optimizations specific to several hardware platforms.
* <font color='brown'>(**#**)</font> For NVIDIA based hardware the most optimized Run Time is [TensorRT](https://github.com/NVIDIA/TensorRT).

In [None]:
# Model

oSam = SAM2Image(os.path.join(modelsPath, modelEncFileName), os.path.join(modelsPath, modelDecFileName))

## Inference

In [None]:
# Set the Image -> Generate Embeddings

oSam.set_image(mI) #<! Input should be UINT8
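
Since the input is expected to be `UINT8`, an image held as floating point must be converted first. The following is a minimal sketch, assuming the float image is scaled to the range `[0, 1]` (as produced by `ski.util.img_as_float32()`); the helper name `ToUint8()` is hypothetical. scikit-image offers `ski.util.img_as_ubyte()` for a similar conversion.

```python
import numpy as np

def ToUint8( mI: np.ndarray ) -> np.ndarray:
    # Converts an image in the range [0, 1] to UINT8 in the range [0, 255]
    if mI.dtype == np.uint8:
        return mI #<! Already UINT8
    mI = np.clip(mI, 0.0, 1.0) #<! Guard against out of range values
    return np.round(255.0 * mI).astype(np.uint8)

mF = np.array([[0.0, 0.2, 1.0], [-0.2, 1.5, 1.0]], dtype = np.float32) #<! Toy float image
mU = ToUint8(mF)
```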


In [None]:
# Add Annotations
lMask = []
for lblId, (vPtCoord, lblMode) in enumerate(zip(lPtCoord, lLblMode)):
    for ii in range(lblMode.shape[0]):
        oSam.add_point((vPtCoord[ii][0], vPtCoord[ii][1]), lblMode[ii], lblId)

dMasks = oSam.get_masks()

In [None]:
# Display a Single Mask
plt.imshow(dMasks[0])

In [None]:
# Plot the image with masks
hF, hA = plt.subplots(1, 1, figsize=(12, 12))
hA.imshow(mI)

lClrCmaps = ['Blues', 'Greens', 'Oranges', 'Purples', 'Reds']

# Overlay masks
for lblId, mM in dMasks.items():
    for jj, vPt in enumerate(lPtCoord[lblId]):
        hA.scatter(vPt[0], vPt[1], c = lMatPltLibclr[lblId], s = 125, label = f'{lblId}')
    # Work on masks per annotation point
    hA.imshow(mM, alpha = 0.5 * mM, cmap = lClrCmaps[lblId])

hA.legend()
plt.show()



### Larger Model Result

![](https://github.com/ibaiGorordo/ONNX-SAM2-Segment-Anything/raw/main/doc/img/sam2_masked_img.jpg)