# Cat vs. Dog Image Classification with SVM

This notebook demonstrates a simple image classification task using a Support Vector Machine (SVM) classifier, trained and tested on the TensorFlow Datasets "cats_vs_dogs" dataset. To limit memory use, only the first 20% of the dataset is loaded, and each image is resized to 32×32 pixels and flattened into a feature vector. Hyperparameters are tuned with randomized search and 3-fold cross-validation. The SVM is then trained with the best parameters found, and its accuracy is evaluated on a held-out test set (20% of the loaded data).

In [6]:
# Import necessary libraries
import tensorflow as tf
import tensorflow_datasets as tfds
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import numpy as np
import cv2
from scipy.stats import uniform, randint

In [2]:
# Load the TensorFlow "cats_vs_dogs" dataset
dataset, info = tfds.load('cats_vs_dogs', split='train[:20%]', with_info=True)

Downloading and preparing dataset 786.67 MiB (download: 786.67 MiB, generated: 1.04 GiB, total: 1.81 GiB) to /root/tensorflow_datasets/cats_vs_dogs/4.0.1...



Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.1. Subsequent calls will reuse this data.


In [4]:
# Extract images and labels in a single pass over the dataset
data, labels = [], []
for example in tfds.as_numpy(dataset):
    data.append(example['image'])
    labels.append(example['label'])

In [5]:
# Resize each image to 32x32, then flatten it into a 1-D feature vector
new_size = (32, 32)
data_resized = [cv2.resize(img, new_size).flatten() for img in data]
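
The cell above keeps every full-resolution image in `data` until resizing is done. Below is a minimal sketch of a streaming variant that resizes and flattens each image as it is read, so only the small feature vectors stay in memory. The helper `resize_nearest` is a hypothetical NumPy stand-in for `cv2.resize` (nearest-neighbor only) so the sketch runs without OpenCV, and the random arrays are assumptions standing in for `tfds.as_numpy(dataset)`.

```python
import numpy as np

def resize_nearest(img, size):
    """Hypothetical stand-in for cv2.resize: nearest-neighbor sampling to (width, height)."""
    width, height = size
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows[:, None], cols]

def iter_features(examples, size=(32, 32)):
    """Yield (flattened image, label) pairs one example at a time."""
    for example in examples:
        yield resize_nearest(example['image'], size).flatten(), example['label']

# Synthetic examples of varying shape stand in for tfds.as_numpy(dataset)
rng = np.random.default_rng(0)
fake_examples = [
    {'image': rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8), 'label': i % 2}
    for i, (h, w) in enumerate([(120, 160), (300, 200), (64, 64)])
]

features, labels_demo = zip(*iter_features(fake_examples))
X_demo = np.stack(features)       # one row of 32 * 32 * 3 values per image
y_demo = np.array(labels_demo)
print(X_demo.shape, y_demo)
```

In the notebook itself, `cv2.resize(example['image'], new_size)` could take the place of `resize_nearest` inside the generator.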

In [7]:
# Labels are already 0 (cat) or 1 (dog); convert them to a NumPy integer array
labels_binary = np.array(labels, dtype=int)

In [8]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    np.array(data_resized), labels_binary, test_size=0.2, random_state=42
)

In [9]:
# Define parameter distribution for random search
param_dist = {
    'C': uniform(0.1, 10),  # uniform(loc, scale): samples C from [0.1, 10.1]
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto'],
    'degree': randint(2, 5),  # For polynomial kernel, vary degree from 2 to 4
}
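
The two SciPy distributions use different conventions for their bounds, which is easy to misread. The sketch below checks the ranges directly: `uniform(loc, scale)` covers `[loc, loc + scale]`, while `randint(low, high)` excludes `high`. The variable names are illustrative only.

```python
from scipy.stats import randint, uniform

# uniform(loc, scale) covers [loc, loc + scale], not [loc, scale]
c_dist = uniform(0.1, 10)
c_low, c_high = c_dist.ppf([0.0, 1.0])
print(f"C range: [{c_low:.1f}, {c_high:.1f}]")

# If the range [0.1, 10] is intended, the scale would be 9.9 instead
c_alt_low, c_alt_high = uniform(0.1, 9.9).ppf([0.0, 1.0])
print(f"Alternative C range: [{c_alt_low:.1f}, {c_alt_high:.1f}]")

# randint(low, high) draws integers from low up to high - 1
degree_samples = randint(2, 5).rvs(size=1000, random_state=0)
print("Degrees sampled:", sorted(set(degree_samples.tolist())))
```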

In [10]:
# Create an SVM classifier
svm_classifier = SVC()

In [11]:
# Perform random search with cross-validation
random_search = RandomizedSearchCV(svm_classifier, param_distributions=param_dist, n_iter=10, cv=3, n_jobs=-1)
random_search.fit(X_train, y_train)
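
In the search above, `degree` is sampled for every candidate even though only the polynomial kernel uses it, and `gamma` is ignored by the linear kernel. One possible refinement, sketched here under assumptions, is to pass `RandomizedSearchCV` a list of dictionaries so that each kernel is paired only with the parameters it uses. The synthetic data from `make_classification`, the `n_iter=6` budget, and the fixed `random_state` are illustrative choices, not part of the original notebook.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Small synthetic problem standing in for the flattened image features
X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=0)

# One dictionary per kernel, each listing only the parameters that kernel uses
param_dist_by_kernel = [
    {'kernel': ['linear'], 'C': uniform(0.1, 10)},
    {'kernel': ['rbf'], 'C': uniform(0.1, 10), 'gamma': ['scale', 'auto']},
    {'kernel': ['poly'], 'C': uniform(0.1, 10), 'gamma': ['scale', 'auto'],
     'degree': randint(2, 5)},
]

search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_dist_by_kernel,
    n_iter=6,
    cv=3,
    random_state=42,  # fixes which candidates are sampled, so runs are repeatable
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
```

With this layout, none of the limited `n_iter` budget is spent on candidates that differ only in a parameter the chosen kernel ignores.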

In [12]:
# Get the best parameters from the random search
best_params = random_search.best_params_
print("Best Hyperparameters:", best_params)

Best Hyperparameters: {'C': 5.68218057819329, 'degree': 3, 'gamma': 'auto', 'kernel': 'poly'}


In [13]:
# Train the classifier with the best parameters
# (equivalent to random_search.best_estimator_, which is refit on X_train by default)
best_svm_classifier = SVC(**best_params)
best_svm_classifier.fit(X_train, y_train)

In [14]:
# Make predictions on the test set
predictions = best_svm_classifier.predict(X_test)

In [15]:
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.6143931256713212
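
Accuracy is the fraction of test examples whose predicted label matches the true label. To judge a score like the one above, it helps to compare it with a majority-class baseline, meaning the accuracy obtained by always predicting the most frequent class. The sketch below works through both numbers on small hypothetical arrays; the values are made up for illustration and are not taken from the notebook's test set.

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

# Accuracy: share of positions where prediction and label agree (6 of 8 here)
accuracy_demo = np.mean(y_true == y_pred)

# Majority-class baseline: always predict the most frequent label (class 1, 5 of 8)
majority_class = np.bincount(y_true).argmax()
baseline_demo = np.mean(y_true == majority_class)

print("Accuracy:", accuracy_demo)
print("Majority-class baseline:", baseline_demo)
```

The same two lines applied to `y_test` and `predictions` would show how far the trained SVM sits above the trivial predictor.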
