In [1]:
import os
import pickle as pkl

import numpy as np
import tensorflow as tf
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.layers import GlobalMaxPooling2D
from tensorflow.keras.preprocessing import image

In [2]:
# Collect the path of every file in the "images" directory.
filename = []
for file in os.listdir("images"):
    filename.append(os.path.join("images", file))

In [3]:
len(filename)

44441
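
os.listdir returns every entry in the directory, in arbitrary order, including any non-image files. A minimal sketch of a stricter listing is given here; the helper name list_image_paths and the extension set are assumptions made for illustration, not part of the original notebook.

```python
import os

# Assumed set of extensions; adjust to the formats actually present.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def list_image_paths(directory, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of regular files in `directory` with an allowed extension."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and ext in extensions:
            paths.append(path)
    return paths
```

Sorting makes the order reproducible between runs, which matters once the list is pickled and later used to map neighbor indices back to files.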

In [4]:
# Load ResNet50 pretrained on ImageNet, without its classification head.
model = ResNet50(weights="imagenet", include_top=False, input_shape=(224,224,3))
model.trainable = False  # freeze the pretrained weights; the model is used for inference only

'''
1-> include_top=False
Removes the fully connected (FC) classification head from ResNet50.
Keeps only the convolutional layers (the feature-extraction part).
This lets us attach our own layers on top, or use the features directly as this notebook does.

ResNet50 is originally trained on ImageNet (1,000 classes). For a different task (e.g., classifying only 10 classes)
we remove the top classification layers and add our own.


2-> input_shape=(224,224,3) sets the input size for the model:
224x224 pixels (height & width).
3 channels (RGB images).
Why?
ResNet50 was trained on ImageNet with 224x224 RGB images, so we use the same size the pretrained weights were trained on.

'''

model = tf.keras.models.Sequential([
    model,GlobalMaxPooling2D()
])
model.summary()

'''
What is this part doing?
model = tf.keras.models.Sequential([
    model,
    GlobalMaxPooling2D()
])
Wraps ResNet50 inside a Sequential model.

Adds GlobalMaxPooling2D() after ResNet50.

🔹 What is GlobalMaxPooling2D()?
Converts feature maps into a 1D vector by taking the maximum value from each feature map.

Reduces dimensions while keeping important features.

Helps prepare the output for fully connected layers.

Why Do We Need a 1D Vector Before Adding Classification Layers in ResNet50?
🔹 1. ResNet50 Outputs High-Dimensional Feature Maps
When you pass an image through ResNet50 (without the fully connected layers), it extracts deep features and outputs a high-dimensional tensor (e.g., (7,7,2048)).

💡 Example Output from ResNet50 (Feature Maps):

(None, 7, 7, 2048)  # Batch size, Height, Width, Channels
This is a 3D tensor (except for batch size).

🔹 2. Classification Layers Expect a 1D Input
In a classification model, the final layers usually consist of fully connected (Dense) layers, which are normally fed one flat feature vector per image rather than a 3D feature map.

💡 Example of a Dense Layer:

Dense(512, activation='relu')  # Intended input shape: (None, 2048), not (None, 7, 7, 2048)
✅ With a 1D vector per image (e.g., (None, 2048)), Dense returns one output vector per image.
❌ Applied directly to a 3D feature map (e.g., (None, 7, 7, 2048)), Keras Dense acts only on the last axis and returns (None, 7, 7, 512), which is still not a single vector per image.

Summary
When using a convolutional neural network (CNN) like ResNet50 without its top layers, the output feature maps have a 3D shape (e.g., (7,7,2048)).
However, most classification or regression heads take a 1D feature vector per image as input to their fully connected (Dense) layers.
Whichever CNN backbone you choose (ResNet50, VGG16, MobileNet, EfficientNet, etc.), you still need a pooling or flattening step before adding custom Dense layers.

✅ ResNet50 outputs 3D feature maps ((7,7,2048)).
✅ Fully connected (Dense) layers are normally fed a 1D vector per image.
✅ GlobalMaxPooling2D() or GlobalAveragePooling2D() converts 3D to 1D.
✅ After conversion, we can add Dense layers for classification.
✅ include_top=False → Removes fully connected layers, allowing custom classification layers.
✅ input_shape=(224,224,3) → Ensures the input size matches pretrained weights.

Why Do We Use GlobalMaxPooling2D()?
CNN models like ResNet50 output high-dimensional feature maps (e.g., (7,7,2048)). However, classification layers (Dense layers) require a 1D vector as input.

GlobalMaxPooling2D() helps us achieve this by:

Reducing dimensions from (height, width, channels) → (channels).

Keeping the most important features by taking the maximum value from each feature map.

Improving computational efficiency: the pooling layer has no weights of its own, and it gives any following Dense layer far fewer inputs (and therefore far fewer parameters) than Flatten() would.

When Do We Use GlobalMaxPooling2D()?
We use GlobalMaxPooling2D() in scenarios like:

1-> After Feature Extraction in Pretrained CNNs

When using models like ResNet50, EfficientNet, or VGG without fully connected layers (include_top=False).
To convert high-dimensional feature maps into 1D vectors before classification.
Example:
base_model = ResNet50(weights="imagenet", include_top=False, input_shape=(224,224,3))
model = tf.keras.Sequential([
    base_model,
    GlobalMaxPooling2D(),  # Converts (7,7,2048) → (2048)
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')  # 10-class classification
])
2-> To Reduce Model Complexity Compared to Flatten()

Flatten() converts (7,7,2048) → (100352), so a Dense layer placed after it needs a very large weight matrix.

GlobalMaxPooling2D() converts (7,7,2048) → (2048), which keeps the following layers small.

3-> In Transfer Learning

When reusing CNNs for new tasks and needing to extract compact feature vectors.

4-> For Embedding Generation

Used in face recognition, image search, and object detection to generate compact embeddings. This notebook uses it for image search.

'''

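
The shape arithmetic described in the notes above can be checked without loading the model. The following is a small NumPy sketch, assuming a random array as a stand-in for a real ResNet50 feature map; it compares global max pooling with flattening and counts the parameters a Dense(512) layer would need after each.

```python
import numpy as np

# Stand-in for a ResNet50 feature map: (batch, height, width, channels).
rng = np.random.default_rng(0)
feature_map = rng.random((1, 7, 7, 2048)).astype(np.float32)

# Global max pooling: maximum over the spatial axes, one value per channel.
pooled = feature_map.max(axis=(1, 2))

# Flatten: keep every spatial position.
flattened = feature_map.reshape(feature_map.shape[0], -1)

print(pooled.shape)     # (1, 2048)
print(flattened.shape)  # (1, 100352)

# Weights + biases of a Dense(512) layer placed after each option.
units = 512
params_after_pool = pooled.shape[1] * units + units
params_after_flatten = flattened.shape[1] * units + units
print(params_after_pool)     # 1049088
print(params_after_flatten)  # 51380736
```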

In [5]:
import os

print(os.path.exists("sample/canvas.avif"))


True


In [6]:
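from PIL import Image
from numpy.linalg import norm
import pillow_avif  # registers AVIF support with Pillow

# Extract a feature vector for one sample image.
img = image.load_img("sample/watch.webp", target_size=(224,224))
img_array = image.img_to_array(img)             # (224, 224, 3)
img_expand = np.expand_dims(img_array,axis=0)   # add batch axis: (1, 224, 224, 3)
img_pre = preprocess_input(img_expand)          # ResNet50-specific preprocessing
result = model.predict(img_pre).flatten()       # (2048,) feature vector
norm_result = result/norm(result)               # scale to unit L2 length
norm_result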

1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step


array([0.00179623, 0.00358526, 0.00648926, ..., 0.000904  , 0.02057721,
       0.01859489], dtype=float32)

In [7]:
result

array([0.6424686, 1.2823577, 2.3210473, ..., 0.3233397, 7.3599606,
       6.650934 ], dtype=float32)
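
The two outputs above differ only by a constant factor: dividing by the L2 norm rescales the raw features to unit length. A minimal sketch of that step follows, assuming a hypothetical helper named l2_normalize with an added guard for an all-zero vector (the notebook itself divides directly).

```python
import numpy as np
from numpy.linalg import norm

def l2_normalize(vector, eps=1e-12):
    """Scale `vector` to unit L2 length; `eps` guards against division by zero."""
    vector = np.asarray(vector, dtype=np.float32)
    return vector / max(norm(vector), eps)

v = np.array([3.0, 4.0], dtype=np.float32)
unit = l2_normalize(v)
print(unit)        # [0.6 0.8]
print(norm(unit))  # close to 1.0
```

With every vector on the unit sphere, distances between images depend on the direction of the feature vector rather than its magnitude.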

In [8]:
def extract_images(image_path, model):
    """Return the L2-normalized feature vector of the image at image_path."""
    img = image.load_img(image_path, target_size=(224,224))
    img_array = image.img_to_array(img)
    img_expand = np.expand_dims(img_array,axis=0)   # add the batch axis
    img_pre = preprocess_input(img_expand)
    result = model.predict(img_pre).flatten()
    norm_result = result/norm(result)
    return norm_result
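
The function above calls the model once per image, which becomes slow over tens of thousands of files. One possible alternative is to predict on a whole batch and normalize each row. The sketch below assumes a hypothetical helper extract_batch and takes the prediction function as a parameter (in the notebook that would be model.predict); a stand-in function is used so the sketch runs without TensorFlow.

```python
import numpy as np

def extract_batch(batch, predict_fn):
    """Run `predict_fn` once on a whole batch and L2-normalize each row.

    batch: array of shape (n, 224, 224, 3), already preprocessed.
    predict_fn: callable returning one feature row per image, e.g. model.predict.
    """
    features = np.asarray(predict_fn(batch), dtype=np.float32)
    features = features.reshape(len(batch), -1)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)

# Stand-in for the model (an assumption for this demo only):
# it averages each image over its spatial axes.
def fake_predict(batch):
    return batch.mean(axis=(1, 2))

demo = extract_batch(np.ones((4, 224, 224, 3), dtype=np.float32), fake_predict)
print(demo.shape)  # (4, 3)
```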

In [9]:
extract_images(filename[0], model)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 201ms/step


array([0.        , 0.01761617, 0.00171604, ..., 0.01247241, 0.02726403,
       0.06899223], dtype=float32)

In [10]:
# Extract features for the first 20 images only.
image_features = []
for file in filename[0:20]:
    image_features.append(extract_images(file, model))
image_features

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 252ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 206ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 189ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 180ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 205ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 201ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 186ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 169ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 271ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 266ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 312ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 262ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 290ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 

[array([0.        , 0.01761617, 0.00171604, ..., 0.01247241, 0.02726403,
        0.06899223], dtype=float32),
 array([0.        , 0.03648945, 0.        , ..., 0.00997915, 0.02375535,
        0.04649904], dtype=float32),
 array([0.        , 0.03642143, 0.00710436, ..., 0.00140772, 0.        ,
        0.05435037], dtype=float32),
 array([0.00232165, 0.05030549, 0.00747744, ..., 0.00346687, 0.03391024,
        0.04565736], dtype=float32),
 array([0.00306834, 0.06240455, 0.        , ..., 0.00170625, 0.02032888,
        0.0583326 ], dtype=float32),
 array([0.        , 0.10469121, 0.00198092, ..., 0.        , 0.03033769,
        0.02712846], dtype=float32),
 array([0.        , 0.12438459, 0.01465612, ..., 0.00289705, 0.04055161,
        0.0653459 ], dtype=float32),
 array([0.        , 0.09169197, 0.01569913, ..., 0.        , 0.00503582,
        0.0456004 ], dtype=float32),
 array([0.        , 0.09545271, 0.01153319, ..., 0.00073009, 0.04513267,
        0.07661071], dtype=float32),
 array([0.

In [11]:
# pkl.dump returns None, so its result is not assigned; the context manager closes the file.
with open("image_sav.pkl", "wb") as f:
    pkl.dump(image_features, f)

In [14]:
# Do not assign the result: pkl.dump returns None and would overwrite the filename list.
with open("filename.pkl", "wb") as f:
    pkl.dump(filename, f)

In [15]:
with open("image_sav.pkl", "rb") as f:
    image_sav = pkl.load(f)

In [16]:
with open("filename.pkl", "rb") as f:
    filename = pkl.load(f)

In [17]:
np.array(image_sav).shape

(20, 2048)
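
The feature list here has 20 rows while the saved filename list covers every file, so the two pickles only line up by position. One way to keep them from drifting apart is to store both in a single file and check their lengths first. This is a sketch under assumptions: save_index, load_index, and the dictionary keys are hypothetical names, and the data is made up.

```python
import os
import pickle as pkl
import tempfile

import numpy as np

def save_index(path, filenames, features):
    """Pickle paths and features together after checking they line up."""
    features = np.asarray(features, dtype=np.float32)
    if len(filenames) != len(features):
        raise ValueError("need exactly one feature vector per filename")
    with open(path, "wb") as f:
        pkl.dump({"filenames": list(filenames), "features": features}, f)

def load_index(path):
    """Return (filenames, features) as written by save_index."""
    with open(path, "rb") as f:
        data = pkl.load(f)
    return data["filenames"], data["features"]

# Round trip with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "index.pkl")
    save_index(index_path, ["images/a.jpg", "images/b.jpg"], np.zeros((2, 2048)))
    names, feats = load_index(index_path)

print(names)        # ['images/a.jpg', 'images/b.jpg']
print(feats.shape)  # (2, 2048)
```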

In [19]:
neighbors = NearestNeighbors(n_neighbors=6, algorithm="brute", metric="euclidean")

'''
NearestNeighbors:->> Initializes the nearest neighbors model.
n_neighbors=6:->> Finds the 6 closest neighbors for a given data point. When the query is itself one of the fitted images, the closest match is typically the query itself, leaving 5 other images.
algorithm="brute":->> Uses the brute-force method, comparing the query against every stored vector (good for small datasets).
metric="euclidean":->> Uses Euclidean distance to measure similarity between points.

'''

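
Because the feature vectors were scaled to unit length earlier, ranking by Euclidean distance gives the same order as ranking by cosine similarity. The short check below illustrates the identity behind that, using random vectors as stand-ins for real image features.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(2048)
b = rng.random(2048)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_distance = np.sum((a - b) ** 2)
cosine_similarity = np.dot(a, b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.isclose(squared_distance, 2 - 2 * cosine_similarity))  # True
```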

In [20]:
neighbors.fit(image_sav)
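
Once fitted, neighbors.kneighbors takes a 2D array of query vectors and returns the distances and indices of the closest stored vectors. What a brute-force Euclidean search computes can be sketched in plain NumPy. The helper brute_force_neighbors is a hypothetical name, and random unit vectors stand in for the 20 stored image features.

```python
import numpy as np

def brute_force_neighbors(features, query, k=6):
    """Return (distances, indices) of the k stored vectors closest to `query`."""
    features = np.asarray(features, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    distances = np.linalg.norm(features - query, axis=1)
    order = np.argsort(distances)[:k]
    return distances[order], order

rng = np.random.default_rng(0)
stored = rng.random((20, 2048))
stored /= np.linalg.norm(stored, axis=1, keepdims=True)

dist, idx = brute_force_neighbors(stored, stored[3], k=6)
print(idx[0])   # 3: the query image is its own nearest neighbor
print(dist[0])  # 0.0
```

The returned indices are positions in the fitted array, so they can be mapped back to image paths through the saved filename list as long as both use the same order.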