# **Feature Extraction**

### **Abstract**  

- **Objective:** Extract meaningful feature representations from images using a pre-trained deep learning model.  
- **Model Used:** **ResNet50** (pre-trained on ImageNet, with Global Max Pooling for feature extraction).  
- **Feature Extraction Process:**  
  - Load and resize images to **224×224** pixels.  
  - Convert images to arrays and preprocess them using **ResNet50's preprocess_input** function.  
  - Extract features with the model and apply **L2 normalization**, so every vector has unit length and the dot product of two vectors equals their cosine similarity.  
- **Dataset Processing:**  
  - Iterate through image files in a specified directory.  
  - Extract features and store them in a structured feature list.  
- **Applications:**  
  - Image-based recommendation systems.  
  - Content-based image retrieval.  
  - Similarity detection and clustering.  
- **Outcome:** A structured feature representation that enables efficient image retrieval and comparison.
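
The retrieval use case listed above can be sketched with NumPy alone. The sketch assumes the stored features are already L2-normalized, so a dot product gives the cosine similarity directly. The random `features` matrix and the helper `top_k_similar` are hypothetical stand-ins for illustration and are not part of this notebook.

```python
import numpy as np

def top_k_similar(query, features, k=3):
    """Return (indices, scores) of the k rows of `features` closest to `query`.

    Assumes `query` and every row of `features` have unit L2 norm,
    so the dot product is the cosine similarity.
    """
    scores = features @ query
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]

# Synthetic stand-in for real embeddings: 100 unit-length 2048-d vectors
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2048)).astype(np.float32)
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Querying with a stored vector should return that vector first
idx, scores = top_k_similar(features[7], features, k=3)
```

A full sort is fine at this scale; for much larger collections a partial sort or an approximate nearest-neighbour index may be a better fit.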

In [1]:
import tensorflow
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import GlobalMaxPooling2D
from tensorflow.keras.applications.resnet50 import ResNet50,preprocess_input
import numpy as np
from numpy.linalg import norm
import os
from tqdm import tqdm
import pickle


## Model building

- This code uses **ResNet50**, pre-trained on ImageNet, as a fixed feature extractor. Setting `include_top=False` drops the final classification layers, `trainable=False` freezes the weights, and a **GlobalMaxPooling2D** layer reduces the last convolutional feature maps to a single 2048-dimensional vector per image. These vectors serve as numeric representations for tasks like **finding similar images** or **recommendations**.
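
To make the pooling step concrete, here is a minimal NumPy sketch of what global max pooling does to a feature map. It assumes the backbone emits a `7×7×2048` map for a `224×224×3` input; `global_max_pool` and the random `fake_maps` array are illustrative names, not Keras objects.

```python
import numpy as np

def global_max_pool(feature_maps):
    """Collapse (batch, height, width, channels) to (batch, channels)
    by taking the maximum over the two spatial axes."""
    return feature_maps.max(axis=(1, 2))

# Random stand-in for the backbone output of a single image
rng = np.random.default_rng(0)
fake_maps = rng.random((1, 7, 7, 2048), dtype=np.float32)

pooled = global_max_pool(fake_maps)
print(pooled.shape)  # (1, 2048)
```

Each of the 2048 output values is the strongest activation of one channel anywhere in the image, which is why the result no longer depends on where a pattern appears.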

In [2]:
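# ResNet50 backbone without the ImageNet classification head
model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
model.trainable = False  # freeze the pre-trained weights
# Pool each feature map to one value, giving a 2048-dimensional vector per image
model = tensorflow.keras.Sequential([model, GlobalMaxPooling2D()])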

In [3]:
# summary() prints the table itself and returns None, so it should not be wrapped in print()
model.summary()


## Feature extraction function (image preprocessing)

- This function turns one image into a feature vector for comparison. It resizes the image to **224×224**, converts it to a NumPy array, adds a batch dimension, and applies ResNet50's `preprocess_input`. The model output of shape `(1, 2048)` is flattened to a 2048-element vector and **L2-normalized**, so similarity between images can be computed with a dot product. This enables **image search, recommendations, and retrieval**.


In [4]:
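def extract_features(img_path, model):
    # Load the image and resize it to the 224×224 input the model was built for
    img = image.load_img(img_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    exp_img = np.expand_dims(img_array, axis=0)
    prepro_img = preprocess_input(exp_img)
    # Run the model and flatten the (1, 2048) output to a 2048-element vector
    result = model.predict(prepro_img, verbose=0).flatten()
    # L2-normalize so the vector has unit length
    norm_result = result / norm(result)

    return norm_result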
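
The two numeric steps in this function can be illustrated without TensorFlow. The sketch below assumes that ResNet50's `preprocess_input` follows the Keras "caffe" convention (RGB reordered to BGR, then ImageNet channel means subtracted, no scaling); `caffe_style_preprocess` and `l2_normalize` are hypothetical helpers, and the `eps` guard against an all-zero vector is an addition that the notebook's function does not have.

```python
import numpy as np

# ImageNet channel means in BGR order (assumed to match Keras "caffe" mode)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Reorder channels RGB -> BGR and subtract the per-channel mean."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

def l2_normalize(vec, eps=1e-12):
    """Scale `vec` to unit length; `eps` avoids dividing by zero."""
    return vec / max(np.linalg.norm(vec), eps)

# A uniform grey batch in place of a real image
batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
prepped = caffe_style_preprocess(batch)

unit = l2_normalize(np.array([3.0, 4.0]))
print(unit)  # [0.6 0.8]
```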

## Extracting features from the dataset


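This code builds a list of paths, relative to the working directory, for every file in the "images" folder, making them ready for processing. No training takes place here: the frozen model is only used for inference.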

In [5]:
filename = []
# Sort the listing so the order is reproducible across runs and platforms
for file in sorted(os.listdir('images')):
    filename.append(os.path.join('images', file))

filename[0:5]

['images\\10000.jpg',
 'images\\10001.jpg',
 'images\\10002.jpg',
 'images\\10003.jpg',
 'images\\10004.jpg']
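
`os.listdir` returns every directory entry, so a stray non-image file would make `load_img` fail partway through a long run. A hedged sketch of a filtered listing follows; the helper `list_image_paths` and the extension set are assumptions for illustration, and the temporary directory only stands in for the real `images` folder.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set; adjust to the dataset

def list_image_paths(folder):
    """Return sorted paths of regular files in `folder` with an image extension."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(full) and ext in IMAGE_EXTENSIONS:
            paths.append(full)
    return paths

# Demonstrate on a throwaway folder with two images and two unrelated files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["10001.jpg", "10000.JPG", "notes.txt", ".DS_Store"]:
        open(os.path.join(tmp, name), "wb").close()
    found = [os.path.basename(p) for p in list_image_paths(tmp)]

print(found)  # ['10000.JPG', '10001.jpg']
```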

In [6]:

feature_list=[]
for file in tqdm(filename, desc="Processing Images", unit="img"):
    feature_list.append(extract_features(file,model))

print(np.array(feature_list).shape)

Processing Images: 100%|██████████| 44441/44441 [1:31:11<00:00,  8.12img/s]


(44441, 2048)


In [7]:
np.array(feature_list).shape

(44441, 2048)
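
The shape above also gives a quick memory estimate. The arithmetic below assumes the features are stored as `float32`; the small `matrix` is a synthetic example of stacking a list of vectors into one 2-D array, which is the form most similarity searches expect.

```python
import numpy as np

n_images, n_dims = 44441, 2048

# Size of the full feature matrix when stored as float32 (4 bytes per value)
bytes_float32 = n_images * n_dims * np.dtype(np.float32).itemsize
print(bytes_float32)                    # 364060672
print(round(bytes_float32 / 2**20, 1))  # 347.2 (MiB)

# Stacking a list of equal-length vectors yields a (count, n_dims) array
small = [np.ones(n_dims, dtype=np.float32) for _ in range(4)]
matrix = np.stack(small)
print(matrix.shape)  # (4, 2048)
```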

## Saving the file names, embeddings, and model

In [None]:
with open('filename.pkl', 'wb') as file:
    pickle.dump(filename, file)

In [None]:
with open('embeddings.pkl', 'wb') as file:
    pickle.dump(feature_list, file)

In [None]:
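# Pickling a Keras model only works on some TensorFlow/Keras versions;
# model.save() is Keras's native way to persist a model
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)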
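
Once saved, the file names and embeddings can be read back and checked for alignment before they are used for search. The round trip below is a minimal sketch on synthetic data written to a temporary directory; the two paths and the constant vectors are placeholders, not the notebook's real outputs.

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-ins for the notebook's `filename` and `feature_list`
filenames = ["images/10000.jpg", "images/10001.jpg"]
features = [np.full(2048, 1 / np.sqrt(2048), dtype=np.float32) for _ in filenames]

with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, "filename.pkl")
    emb_path = os.path.join(tmp, "embeddings.pkl")

    with open(names_path, "wb") as f:
        pickle.dump(filenames, f)
    with open(emb_path, "wb") as f:
        pickle.dump(features, f)

    # Load both files back, as a downstream search script would
    with open(names_path, "rb") as f:
        loaded_names = pickle.load(f)
    with open(emb_path, "rb") as f:
        loaded_features = np.array(pickle.load(f))

# Row i of the matrix must describe loaded_names[i]
assert len(loaded_names) == loaded_features.shape[0]
print(loaded_features.shape)  # (2, 2048)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.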