# **PRODIGY Task 3**

# **Dataset:**
[https://www.kaggle.com/c/dogs-vs-cats/data](https://www.kaggle.com/c/dogs-vs-cats/data)

# **Task: Classifying Cats and Dogs Using SVM**

The aim is to classify images of cats and dogs using Support Vector Machines (SVM). Several approaches are implemented, including direct SVM classification, Principal Component Analysis (PCA) for dimensionality reduction, and Histogram of Oriented Gradients (HOG) for feature extraction. The classification accuracy for each approach is evaluated and compared to determine the most effective technique.



### **Importing libraries**

In [None]:
import zipfile
import os
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from skimage.feature import hog
from skimage import color

### **Extracting the Zip File Containing the Images**
The zip file is extracted to a specified directory to access the images for further processing.


In [None]:
# Path to the zip file
zip_file_path = '/content/drive/MyDrive/Colab Notebooks/internship/train.zip'

# Path where you want to extract the images
extract_path = '/content/drive/MyDrive/Colab Notebooks/internship/extracted image'

# Extracting zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

print("Extraction completed!")

Extraction completed!
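
Re-running the cell above extracts the whole archive again. A minimal sketch of a guard that skips extraction when the target directory already has content; the helper name `extract_once` is hypothetical and not part of the original notebook:

```python
import os
import zipfile


def extract_once(zip_path, target_dir):
    """Extract zip_path into target_dir unless target_dir already has content.

    Returns True if extraction ran, False if it was skipped.
    """
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False  # Directory is not empty; assume an earlier run extracted it
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    return True
```

It could be called as `extract_once(zip_file_path, extract_path)`. Note that it only checks for a non-empty directory, so a partially extracted archive would also be skipped.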


### **Loading and Preprocessing Images**
Images are resized to a fixed size of 64x64 and stored in arrays. The labels (0 for cat, 1 for dog) are assigned based on the filenames.

In [None]:
image_dir = '/content/drive/MyDrive/Colab Notebooks/internship/extracted image/train'

img_size = (64, 64)

images = []
labels = []

for img_name in os.listdir(image_dir):
    if img_name.endswith(".jpg"):
        # Extract label from file name ('cat' or 'dog'); skip anything else
        # so that images and labels always stay the same length
        if 'cat' in img_name:
            label = 0  # Label '0' for cat
        elif 'dog' in img_name:
            label = 1  # Label '1' for dog
        else:
            continue

        img_path = os.path.join(image_dir, img_name)

        # Force 3 channels so every array has shape (64, 64, 3)
        img = Image.open(img_path).convert('RGB').resize(img_size)

        images.append(np.array(img))
        labels.append(label)

# Convert lists to numpy arrays for use in the SVM
images = np.array(images)
labels = np.array(labels)

print("Image data and labels prepared!")

Image data and labels prepared!
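
The labels come entirely from the file names. The sketch below isolates that rule in a small helper so it can be checked on its own. It assumes names of the form `cat.<id>.jpg` and `dog.<id>.jpg`; the helper name `label_from_filename` is illustrative and not part of the notebook:

```python
import os


def label_from_filename(img_name):
    """Return 0 for cat, 1 for dog, or None if the name matches neither.

    Assumes names of the form 'cat.<id>.jpg' / 'dog.<id>.jpg'.
    """
    prefix = os.path.basename(img_name).split(".")[0].lower()
    return {"cat": 0, "dog": 1}.get(prefix)


names = ["cat.0.jpg", "dog.12.jpg", "notes.txt"]
print([label_from_filename(n) for n in names])  # [0, 1, None]
```

Matching on the prefix rather than on a substring avoids mislabeling a file whose name merely contains "cat" or "dog" somewhere else.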


### **Normalizing and Flattening the Image Data**
The pixel values of the images are normalized to a range of [0, 1]. The images are then flattened to convert each image into a 1D array.

In [None]:
images = images / 255.0
# Flattening the image arrays
n_samples = images.shape[0]
n_features = images.shape[1] * images.shape[2] * images.shape[3]

# Reshape the images to a 2D array where each row is a flattened image
X = images.reshape(n_samples, n_features)

print(f"Shape of flattened images: {X.shape}")
print(f"Shape of labels: {labels.shape}")

Shape of flattened images: (25000, 12288)
Shape of labels: (25000,)
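
The same two steps can be tried on a tiny synthetic batch. Dividing a `uint8` array by `255.0` produces `float64`, which for 25000 x 12288 values is roughly 2.5 GB; casting to `float32` first halves that. The cast is a suggestion rather than something the notebook does, and the array names are illustrative:

```python
import numpy as np

# Four synthetic 64x64 RGB images stand in for the real dataset
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)

# Cast to float32 before scaling so the result stays float32
toy_scaled = toy_images.astype(np.float32) / 255.0

# -1 lets NumPy infer the feature count (64 * 64 * 3 = 12288)
toy_flat = toy_scaled.reshape(len(toy_scaled), -1)

print(toy_flat.shape, toy_flat.dtype)  # (4, 12288) float32
```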


### **Splitting the Dataset into Training and Testing Sets**
The dataset is split into 80% training data and 20% testing data.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
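
A plain random split does not guarantee that both classes appear in the same proportion in the training and test sets. A hedged sketch of the `stratify` option on toy data follows; using it in the notebook would produce a different split from the one reported here, so it is shown only as an alternative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, exactly half labelled 0 and half labelled 1
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 50 + [1] * 50)

# stratify=y_toy keeps the 50/50 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)  # (80, 2) (20, 2)
print(np.bincount(y_te))       # [10 10]
```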

### **Training a Support Vector Machine (SVM) Classifier with RBF Kernel**
An SVM with an RBF kernel is trained on the flattened image data, and predictions are made on the test set.

In [None]:
svm = SVC(kernel='rbf', gamma='scale', random_state=0)

svm.fit(X_train, y_train)



### **Predicting Labels for the Test Set**

In [None]:
y_pred = svm.predict(X_test)

### **Evaluating the SVM Classifier**
The accuracy, classification report, and confusion matrix are displayed to evaluate the performance of the SVM.

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the SVM with RBF kernel: {accuracy:.2f}")

Accuracy of the SVM with RBF kernel: 0.69


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep = classification_report(y_test, y_pred, target_names=target_names)
print("Classification Report:\n", classification_rep)

# Save classification report to a file
classification_file_path = 'classification_report.txt'
with open(classification_file_path, 'w') as file:
    file.write(classification_rep)

print(f"Classification report saved to: {classification_file_path}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.69      0.70      0.69      2515
         Dog       0.69      0.69      0.69      2485

    accuracy                           0.69      5000
   macro avg       0.69      0.69      0.69      5000
weighted avg       0.69      0.69      0.69      5000

Classification report saved to: classification_report.txt


In [None]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1749  766]
 [ 778 1707]]
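
The figures in the classification report can be recomputed from this confusion matrix. The sketch below uses scikit-learn's convention that rows are true labels and columns are predicted labels, with cat = 0 and dog = 1:

```python
import numpy as np

# Confusion matrix reported above: rows are true labels, columns are predictions
cm = np.array([[1749, 766],
               [778, 1707]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column sums)
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row sums)

print(f"accuracy:  {accuracy:.4f}")       # accuracy:  0.6912
print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
```

Rounded to two decimals, these agree with the precision and recall columns of the report above.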


### **Feature Reduction Using Principal Component Analysis (PCA)**
PCA is applied to reduce the dimensionality of the image data to 1000 features.

In [None]:
# Reduce features to 1000 using PCA
pca = PCA(n_components=1000)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
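
The notebook does not report how much variance the 1000 components retain. A minimal sketch on synthetic data shows how that could be checked with `explained_variance_ratio_`, and how PCA can instead choose the number of components for a target variance. The data and the 95% target are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the flattened images: 200 samples, 50 features
# with decreasing scale so that a few directions dominate the variance
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 50)) * np.linspace(5.0, 0.1, 50)

# Fixed number of components, as in the notebook (10 here instead of 1000)
pca_fixed = PCA(n_components=10).fit(X_toy)
retained = pca_fixed.explained_variance_ratio_.sum()
print(f"Variance retained by 10 components: {retained:.3f}")

# Alternative: keep the smallest number of components that reaches 95% variance
pca_auto = PCA(n_components=0.95, svd_solver="full").fit(X_toy)
print("Components needed for 95% variance:", pca_auto.n_components_)
```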

### **Training and Evaluating SVM After PCA**
The SVM classifier is trained again on the reduced data, and the results are evaluated.

In [None]:
svm_2 = SVC(kernel='rbf', gamma='scale', random_state=0)

svm_2.fit(X_train_pca, y_train)

### **Predicting Test Set Labels After Applying PCA**

In [None]:
y_pred_2 = svm_2.predict(X_test_pca)

### **Evaluating the SVM Classifier after applying PCA**

In [None]:
accuracy_2 = accuracy_score(y_test, y_pred_2)
print(f"Accuracy of the SVM with RBF kernel: {accuracy_2:.2f}")

Accuracy of the SVM with RBF kernel: 0.69


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep_2 = classification_report(y_test, y_pred_2, target_names=target_names)
print("Classification Report:\n", classification_rep_2)

# Save classification report to a file
classification_file_path_2 = 'classification_report_2.txt'
with open(classification_file_path_2, 'w') as file:
    file.write(classification_rep_2)

print(f"Classification report saved to: {classification_file_path_2}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.69      0.70      0.69      2515
         Dog       0.69      0.68      0.69      2485

    accuracy                           0.69      5000
   macro avg       0.69      0.69      0.69      5000
weighted avg       0.69      0.69      0.69      5000

Classification report saved to: classification_report_2.txt


In [None]:
cm_2 = confusion_matrix(y_test, y_pred_2)
print(cm_2)

[[1748  767]
 [ 789 1696]]


### **Hyperparameter Tuning Using GridSearchCV**
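GridSearchCV is used to search for the best SVM hyperparameters on the PCA-reduced data, here the regularization strength `C` and the kernel type. Because `refit=True`, the best combination is retrained on the whole training set and exposed as `best_estimator_`.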

In [None]:
param_grid = {
    'C': [0.1, 0.5, 0.9,1],
    'kernel':['rbf', 'poly', 'sigmoid']
}

# Perform GridSearchCV to find best parameters
# (cv must be at least 2; cv=2 matches the "2 folds" reported in the log)
grid_search = GridSearchCV(svm_2, param_grid, cv=2, refit=True, verbose=2)
grid_search.fit(X_train_pca, y_train)

# Best parameters found by GridSearch
print("Best parameters found:", grid_search.best_params_)

Fitting 2 folds for each of 12 candidates, totalling 24 fits
[CV] END ..................................C=0.1, kernel=rbf; total time= 2.8min
[CV] END ..................................C=0.1, kernel=rbf; total time= 3.7min
[CV] END .................................C=0.1, kernel=poly; total time= 2.5min
[CV] END .................................C=0.1, kernel=poly; total time= 2.1min
[CV] END ..............................C=0.1, kernel=sigmoid; total time= 1.8min
[CV] END ..............................C=0.1, kernel=sigmoid; total time= 1.8min
[CV] END ..................................C=0.5, kernel=rbf; total time= 2.5min
[CV] END ..................................C=0.5, kernel=rbf; total time= 2.4min
[CV] END .................................C=0.5, kernel=poly; total time= 2.0min
[CV] END .................................C=0.5, kernel=poly; total time= 2.0min
[CV] END ..............................C=0.5, kernel=sigmoid; total time= 1.6min
[CV] END ..............................C=0.5, ke

In [None]:
# Use the best estimator to make predictions
grid_y_pred = grid_search.best_estimator_.predict(X_test_pca)
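
The search above fits PCA once on the full training set and varies only `C` and the kernel. A hedged sketch of an alternative setup wraps PCA and the SVM in a `Pipeline`, so that PCA is refit inside each fold, and also varies `gamma`. The synthetic data and grid values are illustrative assumptions, not the notebook's settings:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small synthetic problem so the search finishes in seconds
X_toy, y_toy = make_classification(n_samples=200, n_features=20, random_state=0)

pipe = Pipeline([
    ("pca", PCA(n_components=5)),
    ("svc", SVC(random_state=0)),
])

# The "svc__" prefix routes each parameter to the SVC step of the pipeline
toy_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["rbf", "poly"],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipe, toy_grid, cv=3)
search.fit(X_toy, y_toy)
print(search.best_params_, round(search.best_score_, 3))
```

On the full dataset each fit takes minutes, as the log above shows, so a wider grid would need to be sized with that cost in mind.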

### **Evaluating the SVM Classifier for the best parameters**

In [None]:
accuracy_2_best_estemate = accuracy_score(y_test, grid_y_pred)
print(f"Accuracy of the SVM with RBF kernel: {accuracy_2_best_estemate:.3f}")

Accuracy of the SVM with RBF kernel: 0.689


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep_2_best_estemate = classification_report(y_test, grid_y_pred, target_names=target_names)
print("Classification Report:\n", classification_rep_2_best_estemate)

# Save classification report to a file
classification_file_path_2_best_estemate = 'classification_report_2_best_estemate.txt'
with open(classification_file_path_2_best_estemate, 'w') as file:
    file.write(classification_rep_2_best_estemate)

print(f"Classification report saved to: {classification_file_path_2_best_estemate}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.69      0.70      0.69      2515
         Dog       0.69      0.68      0.69      2485

    accuracy                           0.69      5000
   macro avg       0.69      0.69      0.69      5000
weighted avg       0.69      0.69      0.69      5000

Classification report saved to: classification_report_2_best_estemate.txt


In [None]:
cm_2_best_estemate = confusion_matrix(y_test, grid_y_pred)
print(cm_2_best_estemate)

[[1748  767]
 [ 789 1696]]


### **Extracting Histogram of Oriented Gradients (HOG) Features**
HOG features are extracted from grayscale versions of the images. Three pixels-per-cell settings are compared, starting here with 32x32.

In [None]:
image_dir = '/content/drive/MyDrive/Colab Notebooks/internship/extracted image/train'


img_size = (64, 64)

images_hog = []
labels = []

for img_name in os.listdir(image_dir):
    if img_name.endswith(".jpg"):

        img_path = os.path.join(image_dir, img_name)

        # Open image and resize it to img_size
        img = Image.open(img_path).resize(img_size)

        img_array = np.array(img)

        # Convert image to grayscale for HOG
        img_gray = color.rgb2gray(img_array)

        # Extract HOG features
        hog_features = hog(img_gray, pixels_per_cell=(32, 32),
                           cells_per_block=(2, 2), block_norm='L2-Hys', visualize=False)

        # Append the HOG features instead of raw image data
        images_hog.append(hog_features)

        # Extract label from file name ('cat' or 'dog')
        if 'cat' in img_name:
            labels.append(0)  # Label '0' for cat
        elif 'dog' in img_name:
            labels.append(1)  # Label '1' for dog

# Convert lists to numpy arrays for use in the SVM
images_hog = np.array(images_hog)
labels = np.array(labels)

print("HOG features and labels prepared!")

HOG features and labels prepared!
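
The cell size determines how long each HOG descriptor is. The sketch below computes that length for the three settings used in this notebook, assuming scikit-image's default of 9 orientation bins and blocks that slide one cell at a time. The helper `hog_feature_length` is illustrative and not a library function:

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block=(2, 2), orientations=9):
    """Length of the flattened HOG descriptor, assuming blocks slide one cell at a time."""
    # Whole cells only: pixels that do not fill a complete cell are ignored
    n_cells_row = image_shape[0] // pixels_per_cell[0]
    n_cells_col = image_shape[1] // pixels_per_cell[1]
    n_blocks_row = n_cells_row - cells_per_block[0] + 1
    n_blocks_col = n_cells_col - cells_per_block[1] + 1
    return (n_blocks_row * n_blocks_col
            * cells_per_block[0] * cells_per_block[1] * orientations)


for cell in [(32, 32), (16, 18), (8, 8)]:
    print(cell, hog_feature_length((64, 64), cell))
# (32, 32) 36
# (16, 18) 216
# (8, 8) 1764
```

Under these assumptions the 32x32 setting describes each image with only 36 numbers, while 8x8 uses 1764. With 18-pixel-wide cells, 64 is not a multiple of 18, so only three full cells fit along that axis.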


In [None]:
X_train_hog, X_test_hog, y_train, y_test = train_test_split(images_hog, labels, test_size=0.2, random_state=42)

### **Training and Evaluating SVM with HOG Features (Pixels Per Cell = 32x32)**

In [None]:
svm_hog = SVC(kernel='rbf', gamma='scale', random_state=0)

# Train the classifier
svm_hog.fit(X_train_hog, y_train)

In [None]:
# Make predictions on the test set
y_pred_hog = svm_hog.predict(X_test_hog)

In [None]:
accuracy_hog = accuracy_score(y_test, y_pred_hog)
print(f"Accuracy of the SVM with HOG features: {accuracy_hog:.2f}")

Accuracy of the SVM with HOG features: 0.70


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep_hog = classification_report(y_test, y_pred_hog, target_names=target_names)
print("Classification Report:\n", classification_rep_hog)

# Save classification report to a file
classification_file_path_hog = 'classification_report_hog.txt'
with open(classification_file_path_hog, 'w') as file:
    file.write(classification_rep_hog)

print(f"Classification report saved to: {classification_file_path_hog}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.72      0.67      0.69      2514
         Dog       0.69      0.74      0.71      2486

    accuracy                           0.70      5000
   macro avg       0.71      0.70      0.70      5000
weighted avg       0.71      0.70      0.70      5000

Classification report saved to: classification_report_hog.txt


In [None]:
cm_hog = confusion_matrix(y_test, y_pred_hog)
print(cm_hog)

[[1684  830]
 [ 650 1836]]


### **Extracting Histogram of Oriented Gradients (HOG) Features (Pixels Per Cell = 16x18)**

In [None]:
image_dir = '/content/drive/MyDrive/Colab Notebooks/internship/extracted image/train'

img_size = (64, 64)

images_hog_2 = []
labels = []

for img_name in os.listdir(image_dir):
    if img_name.endswith(".jpg"):

        img_path = os.path.join(image_dir, img_name)

        img = Image.open(img_path).resize(img_size)

        img_array = np.array(img)

        # Convert image to grayscale for HOG
        img_gray = color.rgb2gray(img_array)

        # Extract HOG features
        hog_features_2 = hog(img_gray, pixels_per_cell=(16, 18),
                           cells_per_block=(2, 2), block_norm='L2-Hys', visualize=False)

        # Append the HOG features instead of raw image data
        images_hog_2.append(hog_features_2)

        # Extract label from file name ('cat' or 'dog')
        if 'cat' in img_name:
            labels.append(0)  # Label '0' for cat
        elif 'dog' in img_name:
            labels.append(1)  # Label '1' for dog

# Convert lists to numpy arrays for use in the SVM
images_hog_2 = np.array(images_hog_2)
labels = np.array(labels)

print("HOG features and labels prepared!")

HOG features and labels prepared!
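
The extraction loop is repeated for each cell size with only `pixels_per_cell` changing. A minimal sketch of a reusable helper, run here on random arrays rather than the real images; the name `extract_hog` is hypothetical:

```python
import numpy as np
from skimage.feature import hog


def extract_hog(gray_images, pixels_per_cell):
    """Return an array with one HOG descriptor per grayscale image."""
    return np.array([
        hog(img, pixels_per_cell=pixels_per_cell, cells_per_block=(2, 2),
            block_norm="L2-Hys", visualize=False)
        for img in gray_images
    ])


# Three random 64x64 grayscale images stand in for the real data
rng = np.random.default_rng(0)
toy_gray = rng.random((3, 64, 64))

for cell in [(32, 32), (16, 18), (8, 8)]:
    print(cell, extract_hog(toy_gray, cell).shape)
```

With a helper like this, the images would be loaded and converted to grayscale once, and the labels would not need to be rebuilt for every setting.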


In [None]:
X_train_hog_2, X_test_hog_2, y_train, y_test = train_test_split(images_hog_2, labels, test_size=0.2, random_state=42)

### **Training and Evaluating SVM with HOG Features (Pixels Per Cell = 16x18)**

In [None]:
svm_hog_2 = SVC(kernel='rbf', gamma='scale', random_state=0)


svm_hog_2.fit(X_train_hog_2, y_train)

In [None]:
y_pred_hog_2 = svm_hog_2.predict(X_test_hog_2)

In [None]:
accuracy_hog_2 = accuracy_score(y_test, y_pred_hog_2)
print(f"Accuracy of the SVM with HOG features: {accuracy_hog_2:.2f}")

Accuracy of the SVM with HOG features: 0.77


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep_hog_2 = classification_report(y_test, y_pred_hog_2, target_names=target_names)
print("Classification Report:\n", classification_rep_hog_2)

# Save classification report to a file
classification_file_path_hog_2 = 'classification_report_hog_2.txt'
with open(classification_file_path_hog_2, 'w') as file:
    file.write(classification_rep_hog_2)

print(f"Classification report saved to: {classification_file_path_hog_2}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.78      0.75      0.76      2514
         Dog       0.76      0.78      0.77      2486

    accuracy                           0.77      5000
   macro avg       0.77      0.77      0.77      5000
weighted avg       0.77      0.77      0.77      5000

Classification report saved to: classification_report_hog_2.txt


In [None]:
cm_hog_2 = confusion_matrix(y_test, y_pred_hog_2)
print(cm_hog_2)

[[1887  627]
 [ 546 1940]]


### **Extracting Histogram of Oriented Gradients (HOG) Features (Pixels Per Cell = 8x8)**

In [None]:
image_dir = '/content/drive/MyDrive/Colab Notebooks/internship/extracted image/train'

img_size = (64, 64)

images_hog_3 = []
labels = []

for img_name in os.listdir(image_dir):
    if img_name.endswith(".jpg"):

        img_path = os.path.join(image_dir, img_name)

        img = Image.open(img_path).resize(img_size)

        img_array = np.array(img)

        img_gray = color.rgb2gray(img_array)

        # Extract HOG features
        hog_features_3 = hog(img_gray, pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2), block_norm='L2-Hys', visualize=False)

        # Append the HOG features instead of raw image data
        images_hog_3.append(hog_features_3)

        # Extract label from file name ('cat' or 'dog')
        if 'cat' in img_name:
            labels.append(0)  # Label '0' for cat
        elif 'dog' in img_name:
            labels.append(1)  # Label '1' for dog

# Convert lists to numpy arrays for use in the SVM
images_hog_3 = np.array(images_hog_3)
labels = np.array(labels)

print("HOG features and labels prepared!")

HOG features and labels prepared!


In [None]:
X_train_hog_3, X_test_hog_3, y_train, y_test = train_test_split(images_hog_3, labels, test_size=0.2, random_state=42)

### **Training and Evaluating SVM with HOG Features (Pixels Per Cell = 8x8)**

In [None]:
svm_hog_3 = SVC(kernel='rbf', gamma='scale', random_state=0)

svm_hog_3.fit(X_train_hog_3, y_train)

In [None]:
y_pred_hog_3 = svm_hog_3.predict(X_test_hog_3)

In [None]:
accuracy_hog_3 = accuracy_score(y_test, y_pred_hog_3)
print(f"Accuracy of the SVM with HOG features: {accuracy_hog_3:.2f}")

Accuracy of the SVM with HOG features: 0.78


In [None]:
# Classification report for 'Cat' and 'Dog'
target_names = ['Cat', 'Dog']
classification_rep_hog_3 = classification_report(y_test, y_pred_hog_3, target_names=target_names)
print("Classification Report:\n", classification_rep_hog_3)

# Save classification report to a file
classification_file_path_hog_3 = 'classification_report_hog_3.txt'
with open(classification_file_path_hog_3, 'w') as file:
    file.write(classification_rep_hog_3)

print(f"Classification report saved to: {classification_file_path_hog_3}")

Classification Report:
               precision    recall  f1-score   support

         Cat       0.79      0.77      0.78      2514
         Dog       0.78      0.79      0.78      2486

    accuracy                           0.78      5000
   macro avg       0.78      0.78      0.78      5000
weighted avg       0.78      0.78      0.78      5000

Classification report saved to: classification_report_hog_3.txt


In [None]:
cm_hog_3 = confusion_matrix(y_test, y_pred_hog_3)
print(cm_hog_3)

[[1947  567]
 [ 526 1960]]


# **Conclusion**
The classification results show that HOG features improve the performance of the SVM compared to raw pixel values, and that the gain grows as the pixels-per-cell parameter is reduced: accuracy is 0.70 with 32x32 cells, 0.77 with 16x18 cells, and 0.78 with 8x8 cells. The raw-pixel SVM and PCA + SVM both reach about 0.69, and the grid search over `C` and the kernel on the PCA features did not improve on that (0.689). HOG with 8x8 cells achieves the best accuracy (0.78), which suggests that finer gradient-orientation features, capturing edges and shape, are more effective than raw pixel intensities for distinguishing between cats and dogs.
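
The comparison can be reproduced from the confusion matrices printed in the sections above. This is a small sketch that only re-derives accuracy from those reported counts; no model is rerun:

```python
# Confusion matrices as printed above: rows are true labels (cat, dog),
# columns are predicted labels (cat, dog)
results = {
    "Raw pixels": [[1749, 766], [778, 1707]],
    "PCA + SVM": [[1748, 767], [789, 1696]],
    "HOG 32x32": [[1684, 830], [650, 1836]],
    "HOG 16x18": [[1887, 627], [546, 1940]],
    "HOG 8x8": [[1947, 567], [526, 1960]],
}

accuracies = {}
for name, cm in results.items():
    correct = cm[0][0] + cm[1][1]           # diagonal = correct predictions
    total = sum(sum(row) for row in cm)     # all test samples
    accuracies[name] = correct / total
    print(f"{name:<12} {accuracies[name]:.4f}")
# Raw pixels   0.6912
# PCA + SVM    0.6888
# HOG 32x32    0.7040
# HOG 16x18    0.7654
# HOG 8x8      0.7814
```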