<a href="https://colab.research.google.com/github/Alirezamirbagheri/Machine-Learning-Transfer-learning-for-configurations-control/blob/main/Transfer_Learning_for_Configuration_Control.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Libraries**

In [3]:
# Install the extra packages first, so the imports below cannot fail
!pip install torchmetrics
!pip install torchsummary

import os
import pathlib

import numpy as np
import PIL.Image as Image
import matplotlib.pylab as plt
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import Dataset
from torchvision.transforms import transforms
from torchmetrics import ConfusionMatrix
from torchsummary import summary
from tqdm import tqdm

Collecting torchmetrics
  Downloading torchmetrics-1.8.2-py3-none-any.whl.metadata (22 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.15.2-py3-none-any.whl.metadata (5.7 kB)
Downloading torchmetrics-1.8.2-py3-none-any.whl (983 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 983.2/983.2 kB 16.7 MB/s eta 0:00:00
Downloading lightning_utilities-0.15.2-py3-none-any.whl (29 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.15.2 torchmetrics-1.8.2


In [4]:
print("Numpy version: " + np.__version__)
print("PIL.Image version: " + Image.__version__)
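# Note: matplotlib.pylab star-imports NumPy, so plt.__version__ is NumPy's version string.
# matplotlib.__version__ (after "import matplotlib") gives the Matplotlib version instead.
print("Matplotlib.pylab version: " + plt.__version__)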

Numpy version: 2.0.2
PIL.Image version: 11.3.0
Matplotlib.pylab version: 2.0.2


# **2. Data preparation and import**
This notebook reads the data set from Google Drive, so the drive is mounted first.

In [5]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## 2.1 **Loading the training data**
The data set is already split into **training**, **validation** and **test** sets. The class names are derived from the sub-folder names.

For a CNN, all images in a batch must have the same dimensions (height × width × channels), because:
- Convolution layers slide fixed-size filters over the image, so the size of the resulting feature maps depends on the image size.
- If images had different sizes, their feature maps would have different shapes and could not be stacked into one batch tensor.
- Dense (fully connected) layers at the end require a fixed input size.
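
The batching argument can be checked with a small sketch. The two random tensors below are hypothetical stand-ins for images of different resolution, and `F.interpolate` is used here only as one possible way to resize them; it is not the resizing method used in this notebook.

```python
import torch
import torch.nn.functional as F

IMAGE_SHAPE = (224, 224)  # same target size as used in this notebook

# Two hypothetical RGB images with different resolutions (channels, height, width)
images = [torch.rand(3, 300, 400), torch.rand(3, 180, 240)]

# Stacking images of different sizes into one batch tensor fails
try:
    torch.stack(images)
    can_stack_raw = True
except RuntimeError:
    can_stack_raw = False

# After resizing every image to IMAGE_SHAPE, stacking works
resized = [
    F.interpolate(img.unsqueeze(0), size=IMAGE_SHAPE, mode="bilinear", align_corners=False).squeeze(0)
    for img in images
]
batch = torch.stack(resized)

print("Raw images stackable:", can_stack_raw)
print("Batch shape after resizing:", tuple(batch.shape))
```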

Here we define a variable called **IMAGE_SHAPE**.

In [6]:
import tensorflow as tf

IMAGE_SHAPE = (224, 224)

# ImageDataGenerator is a utility that loads images from disk, preprocesses them
# (rescale and, optionally, rotate, flip, shift, ...) and yields them in batches.
# rescale=1/255 normalizes pixel values from [0, 255] to [0, 1], which helps the network train more stably.
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
training_data = "./drive/MyDrive/ML/Transfer_Learning/dataset_truck/training"

# flow_from_directory only creates an iterator; the images are read lazily, batch by batch.
# target_size resizes every image to IMAGE_SHAPE (ImageNet-pretrained models such as ResNet,
# MobileNet and VGG are typically used with 224×224 inputs).
# Labels are assigned automatically from the sub-folder names inside training_data.
# No batch_size is passed, so the Keras default of 32 images per batch applies.
training_image_data = image_generator.flow_from_directory(training_data, target_size=IMAGE_SHAPE)

Found 60 images belonging to 6 classes.


In [7]:
# Hyperparameters
BATCH = 32
EPOCH = 5

## 2.2 Loading the validation data

In [8]:
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
validation_data="./drive/MyDrive/ML/Transfer_Learning/dataset_truck/validation"
validation_data_image_data  = image_generator.flow_from_directory(validation_data,target_size=IMAGE_SHAPE)

Found 150 images belonging to 6 classes.


# **3. The pre-trained base model**
A pre-trained model is used for transfer learning. It can be used in two ways:


* Feature extractor: reuse the features the model has already learned and train only a new classifier on top of them.
* Fine-tuning: re-train the base model (or part of it) on your own data.

In the following, we use the base model as a feature extractor and add our own classification layer for our application.

## 3.1 Structure and use of the base model - MobileNetV2
Various pre-trained network architectures are available for download. Many of them are pre-trained on the [ImageNet dataset](http://www.image-net.org/).

Here we use MobileNetV2 as the pre-trained base model. The torchsummary.summary() command prints the architecture of the model to give an overview. It takes the network and the shape of one input image as arguments.

To use a model as a feature extractor, we need to adapt it to our own classification task: the classification layer of the pre-trained model must be replaced by one of our own.

We load MobileNetV2 together with its ImageNet weights from torch.hub. This version still contains the original 1000-class ImageNet classification layer, which is replaced further below.

In [9]:
MobileNetV2 = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
# With pretrained=False only the architecture is loaded, with randomly initialized weights and no prior knowledge.
# With pretrained=True the weights learned on ImageNet are loaded, so the model is ready for transfer learning
# and only the final classification layer needs to be changed.
summary(MobileNetV2,(3, 224, 224))
# Each layer has parameters (weights) that are updated during the backward pass. They are counted as follows:
# - Each filter produces one feature map: the filter spans all input channels (e.g. the three colour channels)
#   and reduces them to a single output channel. The spatial size of the map depends on stride and padding.
# - Every filter element has its own weight, like a lens that has to be adjusted. A 3*3 filter over a single
#   input channel therefore has 9 parameters; over 3 input channels it has 3*3*3 = 27.
# - One filter is not enough to extract features well, so a layer uses many filters, e.g. 32. The layer then
#   outputs 32 feature maps, i.e. the number of output channels equals the number of filters.
# - Parameters per layer = parameters per filter * number of filters. First layer: 27 * 32 = 864.
#   The depthwise 3*3 convolution (Conv2d-4) filters each channel separately: 9 * 32 = 288.
# - The image size is not a parameter. It only increases the amount of computation (FLOPs), because the
#   same filter weights are slid over the whole image.
# - Deeper layers have more output channels (e.g. 64, then 128), because they must represent more varied
#   combinations of basic features (edges, colours, textures) as objects and shapes.

Downloading: "https://github.com/pytorch/vision/zipball/v0.10.0" to /root/.cache/torch/hub/v0.10.0.zip




Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth


100%|██████████| 13.6M/13.6M [00:00<00:00, 54.5MB/s]


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 112, 112]             864
       BatchNorm2d-2         [-1, 32, 112, 112]              64
             ReLU6-3         [-1, 32, 112, 112]               0
            Conv2d-4         [-1, 32, 112, 112]             288
       BatchNorm2d-5         [-1, 32, 112, 112]              64
             ReLU6-6         [-1, 32, 112, 112]               0
            Conv2d-7         [-1, 16, 112, 112]             512
       BatchNorm2d-8         [-1, 16, 112, 112]              32
  InvertedResidual-9         [-1, 16, 112, 112]               0
           Conv2d-10         [-1, 96, 112, 112]           1,536
      BatchNorm2d-11         [-1, 96, 112, 112]             192
            ReLU6-12         [-1, 96, 112, 112]               0
           Conv2d-13           [-1, 96, 56, 56]             864
      BatchNorm2d-14           [-1, 96,
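
The parameter counts in the first rows of the summary can be reproduced by hand. A convolution without bias has `kernel_h * kernel_w * (in_channels / groups) * out_channels` weights. The sketch below rebuilds the first layers with the settings shown in the printed architecture (3×3 stride-2 convolution, depthwise 3×3 convolution with `groups=32`, 1×1 pointwise convolution); the variable names and the helper `n_params` are chosen here for illustration only.

```python
import torch.nn as nn


def n_params(layer):
    """Count all learnable parameters of a layer."""
    return sum(p.numel() for p in layer.parameters())


first_conv = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False)   # 3*3*3*32
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False)  # 3*3*1*32
pointwise = nn.Conv2d(32, 16, kernel_size=1, bias=False)                        # 1*1*32*16
batch_norm = nn.BatchNorm2d(32)                                                 # 32 scales + 32 shifts

counts = {
    "Conv2d-1": n_params(first_conv),
    "BatchNorm2d-2": n_params(batch_norm),
    "Conv2d-4": n_params(depthwise),
    "Conv2d-7": n_params(pointwise),
}
print(counts)
```

The values match the `Param #` column above, and none of them depends on the 224×224 input size.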

In [12]:
print(MobileNetV2)
# Prints the architecture without needing an input size (unlike summary() in the previous cell)

MobileNetV2(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=

In [11]:
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Use the first GPU (cuda:0) if one is available, otherwise fall back to the CPU.
class_names = list(training_image_data.class_indices.keys())
# The list of class names determines the number of outputs of the new classification layer.
print(class_names)

['container_big_tractor_b', 'container_big_tractor_y', 'container_small_b', 'container_small_gy', 'loader_tractor_r', 'loader_tractor_y']


In [14]:
MobileNetV2.classifier[1] = torch.nn.Linear(in_features=MobileNetV2.classifier[1].in_features,
                                            out_features=len(class_names))
# classifier[1] is the final Linear layer (classifier[0] is a Dropout layer).
# in_features, the size of its input, is kept; out_features is set to the number of our classes.
MobileNetV2 = MobileNetV2.to(DEVICE)
summary(MobileNetV2,(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 112, 112]             864
       BatchNorm2d-2         [-1, 32, 112, 112]              64
             ReLU6-3         [-1, 32, 112, 112]               0
            Conv2d-4         [-1, 32, 112, 112]             288
       BatchNorm2d-5         [-1, 32, 112, 112]              64
             ReLU6-6         [-1, 32, 112, 112]               0
            Conv2d-7         [-1, 16, 112, 112]             512
       BatchNorm2d-8         [-1, 16, 112, 112]              32
  InvertedResidual-9         [-1, 16, 112, 112]               0
           Conv2d-10         [-1, 96, 112, 112]           1,536
      BatchNorm2d-11         [-1, 96, 112, 112]             192
            ReLU6-12         [-1, 96, 112, 112]               0
           Conv2d-13           [-1, 96, 56, 56]             864
      BatchNorm2d-14           [-1, 96,

In [21]:
from torch.utils.data import DataLoader

# Define DataLoader
training_data_loader = DataLoader(
    training_image_data,      # your dataset
    batch_size=32,         # choose a batch size
    shuffle=True           # shuffle for training
)

for image_batch, label_batch in training_data_loader:
  print("Image batch shape: ", image_batch.shape)
  print("Label batch shape: ", label_batch.shape)
  break

# This block checks whether the data was loaded correctly: it loads one batch and prints the image and label shapes.
# We break out of the loop because checking the first batch is enough; there is no need to iterate over the whole data set.
# The shape tells us how many images are in the batch, their dimensions, and the number of colour channels.

RuntimeError: stack expects each tensor to be equal size, but got [32, 224, 224, 3] at entry 0 and [28, 224, 224, 3] at entry 1
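
The shapes in the error message suggest its cause: the Keras iterator already yields whole batches (32 + 28 = 60 training images, in height × width × channels layout), and `DataLoader` then tries to stack these two unequal batches into one. The sketch below reproduces this with a hypothetical stand-in dataset (`PreBatchedStandIn`, filled with zeros) and shows one possible workaround: disabling automatic batching with `batch_size=None`, then converting the layout and the one-hot labels. Whether the real Keras iterator behaves the same way inside a `DataLoader` is an assumption.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class PreBatchedStandIn(Dataset):
    """Hypothetical stand-in for the Keras iterator: every item is already a batch."""

    def __init__(self, n_images=60, keras_batch=32, n_classes=6):
        self.sizes = [min(keras_batch, n_images - i) for i in range(0, n_images, keras_batch)]
        self.n_classes = n_classes

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, idx):
        n = self.sizes[idx]
        images = np.zeros((n, 224, 224, 3), dtype=np.float32)  # NHWC layout, as in Keras
        labels = np.eye(self.n_classes, dtype=np.float32)[np.arange(n) % self.n_classes]  # one-hot
        return images, labels


dataset = PreBatchedStandIn()

# 1) Reproduce the failure: automatic batching stacks a 32-image and a 28-image batch
try:
    next(iter(DataLoader(dataset, batch_size=32)))
    stack_error = None
except RuntimeError as err:
    stack_error = str(err)
print("Error:", stack_error)

# 2) Workaround: batch_size=None passes each pre-built batch through unchanged
loader = DataLoader(dataset, batch_size=None, shuffle=True)
batch_shapes = []
for images, labels in loader:
    images = images.permute(0, 3, 1, 2)  # NHWC -> NCHW, the layout PyTorch models expect
    targets = labels.argmax(dim=1)       # one-hot -> class indices
    batch_shapes.append((tuple(images.shape), tuple(targets.shape)))
print(batch_shapes)
```

A PyTorch-native alternative would be `torchvision.datasets.ImageFolder` with `Resize` and `ToTensor` transforms, which yields one image per item so that `DataLoader` can build the batches itself.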