# Implement a support vector machine (SVM) to classify images of cats and dogs from the Kaggle dataset.

### Importing necessary libraries

In [82]:
import os

import cv2
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC




### 🔹 UPDATE this path to match where you extracted the dataset

In [None]:

YOUR_DATASET_PATH = r"C:\Users\kingzuzu\Downloads\dog_vs_cat_dataset"  # Example for Windows

### Define train path

In [95]:

train_path = os.path.join(YOUR_DATASET_PATH, "train")

In [85]:
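# Build the class folder paths and check that the directories exist
cat_folder = os.path.join(train_path, "Cat")
dog_folder = os.path.join(train_path, "Dog")
for folder in (cat_folder, dog_folder):
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Directory not found: {folder}")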

### Image properties

In [96]:

IMG_SIZE = (64, 64)

### Function to load and preprocess images

In [97]:

def load_images_from_folder(folder, label):
    """Load readable images from `folder` as flattened grayscale vectors labelled `label`."""
    images, labels = [], []
    for filename in os.listdir(folder):
        img_path = os.path.join(folder, filename)
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        if img is not None:  # cv2.imread returns None for unreadable or non-image files
            img = cv2.resize(img, IMG_SIZE)
            images.append(img.flatten())  # Flatten image to 1D array
            labels.append(label)
    return images, labels

### Load images

In [98]:

cat_images, cat_labels = load_images_from_folder(cat_folder, 0)
dog_images, dog_labels = load_images_from_folder(dog_folder, 1)

### Combine dataset

In [99]:

X = np.array(cat_images + dog_images)
y = np.array(cat_labels + dog_labels)

### Split data

In [100]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Train SVM classifier

In [101]:

svm_model = SVC(kernel="linear")
svm_model.fit(X_train, y_train)

### Predict and evaluate

In [102]:

y_pred = svm_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"SVM Classification Accuracy: {accuracy:.2f}")

SVM Classification Accuracy: 0.45


### Insights

Poor Model Performance: The SVM classifier reached only 45% accuracy, which is below the roughly 50% expected from random guessing on a balanced binary task.

Inadequate Feature Representation: Using flattened grayscale pixels directly as features is likely the main cause of the poor performance.

High Dimensionality Issue: The SVM is classifying in a 4,096-dimensional space (64×64 pixels), which is challenging due to the "curse of dimensionality."

Basic Implementation: The implementation uses a linear kernel with otherwise default parameters and minimal preprocessing, which limits potential performance.
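
To put the 45% figure in context, it can help to compare it against a trivial baseline. The sketch below is a minimal illustration and not part of the original notebook: it uses scikit-learn's `DummyClassifier` on synthetic stand-in data (the names `X_demo` and `y_demo` and the random pixel values are assumptions), so the number it prints says nothing about the Kaggle images themselves.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 random "images" of 64*64 pixels
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 256, size=(200, 64 * 64)).astype(np.float32)
y_demo = np.array([0, 1] * 100)  # perfectly balanced labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# "most_frequent" always predicts the majority class seen in the training labels
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_tr, y_tr)
baseline_accuracy = baseline.score(X_te, y_te)

print(f"Feature dimensionality: {X_demo.shape[1]}")
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

On the real data, fitting the same baseline on `X_train`/`y_train` and scoring it on `X_test`/`y_test` would show the accuracy a model has to beat before it can be said to have learned anything.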

### Recommendations

Feature Engineering:

- Use image descriptors like HOG (Histogram of Oriented Gradients) or SIFT instead of raw pixels
- Consider using a pre-trained CNN as a feature extractor, then apply SVM on those features
- Maintain color information rather than converting to grayscale


Preprocessing Improvements:

- Add normalization to scale pixel values (e.g., divide by 255)
- Apply image augmentation (rotation, flipping, etc.) to enlarge the training set
- Consider larger image sizes to preserve more detail
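
The first two preprocessing points can be sketched with NumPy alone. The helper names `normalize_pixels` and `add_horizontal_flips` are hypothetical, the data is random stand-in input, and horizontal flipping is only one of many possible augmentations.

```python
import numpy as np

IMG_SIZE = (64, 64)  # (width, height), as in the notebook


def normalize_pixels(X):
    """Scale uint8 pixel values from [0, 255] to [0.0, 1.0]."""
    return X.astype(np.float32) / 255.0


def add_horizontal_flips(X, y):
    """Append a left-right mirrored copy of every flattened image."""
    images = X.reshape(-1, IMG_SIZE[1], IMG_SIZE[0])  # back to (n, height, width)
    flipped = images[:, :, ::-1].reshape(len(X), -1)  # mirror along the width axis
    return np.vstack([X, flipped]), np.concatenate([y, y])


# Random stand-in data (assumption): 6 flattened 64x64 grayscale images
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(6, 64 * 64), dtype=np.uint8)
y_demo = np.array([0, 0, 0, 1, 1, 1])

X_aug, y_aug = add_horizontal_flips(normalize_pixels(X_demo), y_demo)
print(X_aug.shape, y_aug.shape)
```

Augmentation of this kind is normally applied to the training split only, so that the test set still measures performance on unmodified images.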


SVM Optimization:

- Try different kernels, such as RBF, which can model non-linear class boundaries that a linear kernel cannot
- Implement grid search for hyperparameter tuning (C, gamma values)
- Add class weight balancing if your dataset is imbalanced
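
A minimal sketch of these three points together, using scikit-learn's `GridSearchCV` around an RBF-kernel `SVC` with `class_weight="balanced"`. The data is a synthetic stand-in from `make_classification`, and the grid values are illustrative assumptions rather than tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scale features first; RBF kernels are sensitive to feature scale
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))

# Illustrative grid values, not tuned recommendations
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.2f}")
print(f"Held-out accuracy: {search.score(X_te, y_te):.2f}")
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking statistics from the validation fold into training.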


Dimensionality Reduction:

- Apply PCA to reduce dimensions before training the SVM
- Try feature selection methods to keep only relevant pixel information
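
The PCA suggestion could look roughly like the sketch below, which chains scaling, PCA, and an SVM in one scikit-learn pipeline. The choice of 50 components is arbitrary and purely illustrative, and the input is random stand-in data with the same 4,096-feature shape as the flattened images.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data (assumption): 120 flattened 64x64 images
rng = np.random.default_rng(0)
X_demo = rng.random((120, 64 * 64))
y_demo = np.array([0, 1] * 60)

# 50 components is an arbitrary illustrative choice
pca_svm = make_pipeline(
    StandardScaler(),
    PCA(n_components=50, random_state=0),
    SVC(kernel="rbf"),
)
pca_svm.fit(X_demo, y_demo)

# Apply only the scaling and PCA steps to inspect the reduced representation
reduced = pca_svm[:-1].transform(X_demo)
print(reduced.shape)
```

In practice the number of components is often chosen from the cumulative `explained_variance_ratio_` of the fitted PCA step, or treated as one more hyperparameter in a grid search.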


Alternative Approaches:

- Consider CNN-based approaches, which are state-of-the-art for image classification
- If sticking with SVM, use it on top of extracted features rather than raw pixels
- Try ensemble methods by combining multiple SVMs with different configurations
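
One way to read the ensemble suggestion is majority voting over SVMs with different kernels. The sketch below uses scikit-learn's `VotingClassifier` for this; the three kernel configurations are assumptions picked for illustration, and the data is again a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Three SVMs with different kernels; each gets its own scaler
ensemble = VotingClassifier(
    estimators=[
        ("linear", make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))),
        ("rbf", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))),
        ("poly", make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))),
    ],
    voting="hard",  # majority vote over the predicted class labels
)
ensemble.fit(X_tr, y_tr)
ensemble_accuracy = ensemble.score(X_te, y_te)

print(f"Ensemble accuracy: {ensemble_accuracy:.2f}")
```

Hard voting is used here because `SVC` does not produce probabilities unless it is created with `probability=True`, which soft voting would require.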


Code Structure:

- Add cross-validation to get more reliable performance metrics
- Implement confusion matrix analysis to understand which class is poorly predicted
- Add visualization of decision boundaries (for example, after projecting the features to two dimensions) to debug the model
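
The first two points can be sketched with scikit-learn's `cross_val_score`, `cross_val_predict`, and `confusion_matrix`. The data is a synthetic stand-in, and the fold count of 5 is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)

model = SVC(kernel="linear")

# Accuracy on each of 5 folds gives a spread, not just a single number
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")

# Out-of-fold predictions for every sample, used to build the confusion matrix
y_cv_pred = cross_val_predict(model, X_demo, y_demo, cv=5)
cm = confusion_matrix(y_demo, y_cv_pred, labels=[0, 1])

print(f"Accuracy per fold: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
print("Confusion matrix (rows = true class, columns = predicted class):")
print(cm)
```

With the notebook's labels (0 = cat, 1 = dog), the off-diagonal entries of the matrix would show whether cats are more often mistaken for dogs or the reverse.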



These improvements are likely to raise classification accuracy above the current 45% level, although the size of the gain will depend on the features and tuning used.