<a href="https://colab.research.google.com/github/Davron030901/PyTorch/blob/main/22_Keras_Using_CNN_s_as_a_Feature_Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Keras Cats vs Dogs - Feature Extraction**

---

In this lesson, we learn how to use a pretrained network as a feature extractor. We'll then use those features as the input for our Logistic Regression classifier.
1. Download and Explore our data
2. Load our pretrained VGG16 Model
3. Extract our Features using VGG16
4. Train a LR Classifier using those features
5. Test some inferences

### **You will need to use High-RAM and GPU (for speed increase).**

![](https://github.com/rajeevratan84/ModernComputerVision/raw/main/Screenshot%202021-05-17%20at%207.55.52%20pm.png)

![](https://github.com/rajeevratan84/ModernComputerVision/raw/main/Screenshot%202021-05-17%20at%207.57.25%20pm.png)

## **1. Download and Explore our data**

In [None]:
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
import random
import time
import gc  # for manual garbage collection
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.applications import VGG16, imagenet_utils
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [None]:
!gdown --id 1Dvw0UpvItjig0JbnzbTgYKB-ibMrXdxk
!unzip -q dogs-vs-cats.zip
!unzip -q train.zip
!unzip -q test1.zip

### **Loading our data and its labels**

There are many ways to do this; the approach below is relatively simple to follow.

In [None]:
# Collect the file names and their categories (1 = dog, 0 = cat)
filenames = os.listdir("./train")
categories = []

for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)
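
The loop above relies on file names of the form `<class>.<id>.jpg`, so the label is the text before the first dot. A minimal sketch of the same mapping as a list comprehension, using a made-up list of file names for illustration (`sample_filenames` and `sample_categories` are hypothetical names, not part of the notebook):

```python
# Hypothetical file names following the "<class>.<id>.jpg" pattern the loop relies on.
sample_filenames = ["cat.0.jpg", "dog.0.jpg", "dog.17.jpg", "cat.42.jpg"]

# 1 for dog, 0 for anything else (cats, in this dataset).
sample_categories = [1 if name.split(".")[0] == "dog" else 0 for name in sample_filenames]

print(sample_categories)  # [0, 1, 1, 0]
```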

In [None]:
# Build the full path for each image
image_paths = ["./train/" + filename for filename in filenames]
labels = categories
print(f"Total images: {len(image_paths)}")

Total images: 25000


## **2. Load our pretrained VGG16 Model**

In [None]:
# Load VGG16 with ImageNet weights and without the fully connected top layers
model = VGG16(weights="imagenet", include_top=False)
print("VGG16 model loaded")

VGG16 model loaded


In [None]:
model.summary()

## **What exactly are we doing?**

We're taking the output of the last CONV-POOL block (see below).

For a 224 x 224 input, the output shape at this layer is **7 x 7 x 512**, which flattens to 25,088 features per image.

![feat_extraction](https://appliedmachinelearning.files.wordpress.com/2021/05/ef54e-vgg16.png?w=612&zoom=2)
Image referenced from [here](https://appliedmachinelearning.blog/2019/07/29/transfer-learning-using-feature-extraction-from-trained-models-food-images-classification/)
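
To see where 7 x 7 x 512 comes from: VGG16 has five 2 x 2 max-pooling stages, each halving the height and width, and its last convolutional block has 512 channels. A quick check of the arithmetic, assuming the 224 x 224 input size used later in this notebook:

```python
input_size = 224      # height and width passed to load_img later on
num_pool_layers = 5   # VGG16 has five 2x2 max-pooling layers
channels = 512        # filters in the last convolutional block

spatial = input_size // (2 ** num_pool_layers)   # 224 / 32
flat_features = spatial * spatial * channels

print(spatial, flat_features)  # 7 25088
```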

In [None]:
# Memory-saving parameters
batch_size = 64  # a smaller batch size keeps memory use low
save_interval = 10  # write to disk after this many batches

In [None]:
# Compute the number of batches (round up for a final partial batch)
total_batches = len(image_paths) // batch_size
if len(image_paths) % batch_size != 0:
    total_batches += 1

print(f"Total batches: {total_batches}")
print(f"Saving to disk after every {save_interval} batches")

Total batches: 391
Saving to disk after every 10 batches
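
The same batch count can be obtained with a ceiling division. A small sketch using the numbers from this notebook (25,000 images, batch size 64), which also shows how many images end up in the final, partial batch; `num_images` and `last_batch_size` are names introduced here for illustration:

```python
import math

num_images = 25000
batch_size = 64

total_batches = math.ceil(num_images / batch_size)
last_batch_size = num_images - (total_batches - 1) * batch_size

print(total_batches, last_batch_size)  # 391 40
```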


### **Store our Image Paths and Label names**

In [None]:
# File names of the saved feature and label arrays
feature_files = []
label_files = []

# Start the timer for feature extraction and saving to disk
start_time = time.time()

## **3. Extract our Features using VGG16**

In [None]:
for batch_idx in range(total_batches):
    batch_start = time.time()

    # Work out the index range of the current batch
    start_idx = batch_idx * batch_size
    end_idx = min((batch_idx + 1) * batch_size, len(image_paths))

    # Select the paths and labels for this batch
    batch_paths = image_paths[start_idx:end_idx]
    batch_labels = labels[start_idx:end_idx]
    batch_images = []
    valid_indices = []

    print(f"\nProcessing batch {batch_idx+1}/{total_batches}, {len(batch_paths)} images")

    # Process each image
    for i, image_path in enumerate(batch_paths):
        try:
            # Load the image and resize it
            image = load_img(image_path, target_size=(224, 224))
            image = img_to_array(image)

            # Prepare the image for VGG16
            image = np.expand_dims(image, axis=0)
            image = preprocess_input(image)

            # Add the image to the batch
            batch_images.append(image)
            valid_indices.append(i)
        except Exception as e:
            print(f"Error: {image_path}: {e}")

    if not batch_images:
        print(f"Batch {batch_idx+1} was empty")
        continue

    # Stack the images into a single batch array
    batch_images = np.vstack(batch_images)

    # Extract the features
    features = model.predict(batch_images, batch_size=len(batch_images), verbose=0)

    # Flatten each 7 x 7 x 512 feature map into one row
    features = features.reshape(features.shape[0], -1)

    # Keep only the labels of images that loaded successfully
    valid_batch_labels = [batch_labels[idx] for idx in valid_indices]

    # Report how long the batch took
    batch_end = time.time()
    batch_time = batch_end - batch_start
    elapsed = batch_end - start_time
    remaining = (batch_time * (total_batches - (batch_idx + 1)))

    print(f"Batch {batch_idx+1} done. Time: {batch_time:.2f}s")
    print(f"Elapsed: {elapsed/60:.2f} min. Estimated remaining: {remaining/60:.2f} min")

    # Free memory
    batch_images = None
    gc.collect()

    # Save to disk at every save_interval (and after the last batch)
    # NOTE: only the current batch's features are written here; batches between
    # save points are not kept, so 39 x 64 + 40 = 2,536 rows are loaded later
    # rather than all 25,000.
    if (batch_idx + 1) % save_interval == 0 or batch_idx == total_batches - 1:
        feature_file = f"features_batch_{batch_idx//save_interval}.npy"
        label_file = f"labels_batch_{batch_idx//save_interval}.npy"

        np.save(feature_file, features)
        np.save(label_file, np.array(valid_batch_labels))

        feature_files.append(feature_file)
        label_files.append(label_file)

        print(f"Features and labels saved: {feature_file}, {label_file}")

    # Free memory
    features = None
    valid_batch_labels = None
    gc.collect()


Processing batch 1/391, 64 images
Batch 1 done. Time: 20.90s
Elapsed: 0.42 min. Estimated remaining: 135.88 min

Processing batch 2/391, 64 images
Batch 2 done. Time: 0.60s
Elapsed: 1.34 min. Estimated remaining: 3.91 min

Processing batch 3/391, 64 images
Batch 3 done. Time: 0.54s
Elapsed: 1.36 min. Estimated remaining: 3.48 min

Processing batch 4/391, 64 images
Batch 4 done. Time: 0.87s
Elapsed: 1.38 min. Estimated remaining: 5.60 min

Processing batch 5/391, 64 images
Batch 5 done. Time: 0.96s
Elapsed: 1.40 min. Estimated remaining: 6.18 min

Processing batch 6/391, 64 images
Batch 6 done. Time: 0.57s
Elapsed: 1.42 min. Estimated remaining: 3.68 min

Processing batch 7/391, 64 images
Batch 7 done. Time: 0.55s
Elapsed: 1.44 min. Estimated remaining: 3.53 min

Processing batch 8/391, 64 images
Batch 8 done. Time: 0.86s
Elapsed: 1.46 min. Estimated remaining: 5.47 m
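
The loop above writes only the most recent batch at each save point. To keep every image when writing to disk at intervals, each write has to cover all batches processed since the previous write. A minimal sketch of that buffering pattern, using random arrays in place of VGG16 outputs; `fake_extract`, `feature_buffer`, `restored`, and the file names are hypothetical and not part of the notebook:

```python
import os
import tempfile
import numpy as np

def fake_extract(n_images, dim=8):
    """Stand-in for model.predict(...).reshape(n, -1); returns random features."""
    return np.random.rand(n_images, dim).astype("float32")

batch_sizes = [4, 4, 4, 4, 2]   # five batches, the last one partial
save_interval = 2               # write to disk every 2 batches
out_dir = tempfile.mkdtemp()

feature_buffer, saved_files = [], []
for batch_idx, n in enumerate(batch_sizes):
    feature_buffer.append(fake_extract(n))
    is_last = batch_idx == len(batch_sizes) - 1
    if (batch_idx + 1) % save_interval == 0 or is_last:
        path = os.path.join(out_dir, f"features_chunk_{len(saved_files)}.npy")
        # Save everything accumulated since the previous write, then reset the buffer.
        np.save(path, np.vstack(feature_buffer))
        saved_files.append(path)
        feature_buffer = []

# Reload the chunks and stack them back into one matrix.
restored = np.vstack([np.load(p) for p in saved_files])
print(restored.shape)  # (18, 8)
```

The labels would be buffered and saved the same way, so that row `i` of the features still matches label `i`.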

In [None]:
print(f"\nAll features extracted and saved! Time: {(time.time() - start_time)/60:.2f} min")

# Read the saved files back to build the training data
print("\nPreparing train/test data from the saved features...")

# Collect all features and labels
all_features = []
all_labels = []

for feat_file, label_file in zip(feature_files, label_files):
    # Use separate names so the `labels` list from step 1 is not overwritten
    chunk_features = np.load(feat_file)
    chunk_labels = np.load(label_file)

    # Keep each chunk for the train/test split
    all_features.append(chunk_features)
    all_labels.append(chunk_labels)


All features extracted and saved! Time: 9.13 min

Preparing train/test data from the saved features...


In [None]:
# Stack the chunks into single arrays for the train/test split
X = np.vstack(all_features)
y = np.concatenate(all_labels)

print(f"Total features: {X.shape}, Total labels: {y.shape}")

# Free memory
all_features = None
all_labels = None
gc.collect()

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# Free memory
X = None
gc.collect()

Total features: (2536, 25088), Total labels: (2536,)


0
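
`train_test_split` shuffles the rows and holds out a fraction of them for testing. A rough NumPy sketch of the idea on toy data; this illustrates the concept and is not scikit-learn's implementation, and `toy_X`, `toy_y`, and the rounding of the test size are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(7)          # fixed seed so the split is repeatable
toy_X = np.arange(20).reshape(10, 2)    # 10 samples, 2 features each
toy_y = np.arange(10) % 2               # alternating 0/1 labels

test_size = 0.2
n_test = int(np.ceil(test_size * len(toy_X)))

# Shuffle the row indices once, then slice them into test and train parts.
indices = rng.permutation(len(toy_X))
test_idx, train_idx = indices[:n_test], indices[n_test:]

toy_X_train, toy_X_test = toy_X[train_idx], toy_X[test_idx]
toy_y_train, toy_y_test = toy_y[train_idx], toy_y[test_idx]

print(toy_X_train.shape, toy_X_test.shape)  # (8, 2) (2, 2)
```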

## **4. Train a LR Classifier using those features**

The extracted features are already NumPy arrays, a format that can be used directly by sklearn, so we can fit the classifier on them as they are.

In [None]:
# Train the Logistic Regression model
print("Training the Logistic Regression model...")
glm = LogisticRegression(C=0.1, max_iter=1000)
glm.fit(X_train, y_train)

Training the Logistic Regression model...
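
Once fitted, a binary logistic regression predicts by passing a weighted sum of the features through a sigmoid and thresholding the result at 0.5. A minimal sketch with made-up weights; `w`, `b`, and the feature rows are hypothetical, not values learned in this notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0, 0.5])   # hypothetical learned coefficients
b = -0.25                        # hypothetical intercept

toy_features = np.array([
    [1.0, 0.0, 1.0],   # z = 2.0 + 0.5 - 0.25 = 2.25
    [0.0, 2.0, 0.0],   # z = -2.0 - 0.25 = -2.25
])

probs = sigmoid(toy_features @ w + b)    # probability of class 1 ("dog")
preds = (probs >= 0.5).astype(int)

print(preds)  # [1 0]
```

In scikit-learn, `C` is the inverse of the regularization strength, so `C=0.1` regularizes more strongly than the default of 1.0, which is a reasonable choice when each sample has 25,088 features.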


## **5. Test some inferences**

In [None]:
# Evaluate on the held-out split
accuracy = glm.score(X_test, y_test)
print(f'Logistic Regression accuracy: {accuracy*100:.2f}%')

# Collect the paths of the unlabeled test images
print("Running on test images...")
image_names_test = os.listdir("./test1")
image_paths_test = ["./test1/" + x for x in image_names_test]

Logistic Regression accuracy: 98.62%
Running on test images...
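
For a classifier, `score` returns the mean accuracy: the fraction of predictions that match the true labels. A tiny sketch with hypothetical labels (`y_true_toy` and `y_pred_toy` are made up for illustration):

```python
import numpy as np

y_true_toy = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_toy = np.array([1, 0, 1, 0, 0, 0, 1, 1])   # two mistakes out of eight

toy_accuracy = np.mean(y_true_toy == y_pred_toy)
print(f"{toy_accuracy * 100:.2f}%")  # 75.00%
```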


In [None]:
# Pick a small sample of test images (to save memory)
test_sample = random.sample(image_paths_test, min(12, len(image_paths_test)))

# Predict a label for each sampled test image
predictions = []

for test_path in test_sample:
    try:
        # Load and preprocess the image
        image = load_img(test_path, target_size=(224, 224))
        image = img_to_array(image)
        image = np.expand_dims(image, axis=0)
        image = preprocess_input(image)

        # Extract the features
        features = model.predict(image, verbose=0)
        features = features.reshape(1, -1)

        # Predict the class
        result = glm.predict(features)[0]
        label = 'dog' if result == 1 else 'cat'
        predictions.append(label)

        # Free memory
        features = None
        image = None
        gc.collect()
    except Exception as e:
        print(f"Test image error: {test_path}: {e}")
        predictions.append("error")
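
For VGG16, `preprocess_input` converts the image from RGB to BGR and subtracts the ImageNet per-channel means, without scaling pixel values to [0, 1]. A NumPy sketch of that transformation on a dummy image, written out by hand for illustration; `vgg16_style_preprocess` is a hypothetical helper, not a Keras function:

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Caffe-style preprocessing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg16_style_preprocess(batch_rgb):
    """Mimic the RGB->BGR flip and mean subtraction on a (N, H, W, 3) float array."""
    batch_bgr = batch_rgb[..., ::-1]          # reverse the channel axis
    return batch_bgr - IMAGENET_BGR_MEANS     # broadcast over the last axis

dummy = np.full((1, 224, 224, 3), 128.0, dtype="float32")  # uniform gray image
processed = vgg16_style_preprocess(dummy)

print(processed.shape)  # (1, 224, 224, 3)
```

Because the classifier was trained on features from images preprocessed this way, test images must go through the same step before feature extraction.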


In [None]:
# Show the predictions
for i, (path, pred) in enumerate(zip(test_sample, predictions)):
    print(f"{i+1}. {path} - Prediction: {pred}")

# Clean up the temporary files (optional)
for file in feature_files + label_files:
    if os.path.exists(file):
        os.remove(file)

print("\nDone!")

1. ./test1/10652.jpg - Prediction: cat
2. ./test1/11894.jpg - Prediction: dog
3. ./test1/238.jpg - Prediction: dog
4. ./test1/10844.jpg - Prediction: dog
5. ./test1/8890.jpg - Prediction: cat
6. ./test1/7353.jpg - Prediction: dog
7. ./test1/9444.jpg - Prediction: cat
8. ./test1/9938.jpg - Prediction: dog
9. ./test1/558.jpg - Prediction: dog
10. ./test1/565.jpg - Prediction: dog
11. ./test1/4691.jpg - Prediction: dog
12. ./test1/9284.jpg - Prediction: dog

Done!


## **How do we compare to Kaggle's top 10?**
https://www.kaggle.com/c/dogs-vs-cats/leaderboard

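We just got 98.62% accuracy on our held-out split, which would place near the top of that leaderboard. Not too shabby :) Keep in mind that this score is measured on 20% of the features we extracted, not on Kaggle's test set.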

![](https://github.com/rajeevratan84/ModernComputerVision/raw/main/Screenshot%202021-05-17%20at%208.09.25%20pm.png)