<a href="https://colab.research.google.com/github/DurgaPittala/prodigy_ml_03/blob/main/task3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [10]:
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler

# --- 1. Configuration ---
DATADIR = "/content/dataset/Cat vs Dog/train"  # Path to the extracted Kaggle training images
CATEGORIES = ["Cat", "Dog"]  # Must match the folder names exactly (case-sensitive)
IMG_SIZE = 64  # 64x64 grayscale = 4,096 features per image; keeps SVM training tractable

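def create_training_data():
    """Load every image as grayscale, resize it, and return [flattened_pixels, label] pairs."""
    training_data = []
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)  # 0 for Cat, 1 for Dog

        for img in os.listdir(path):
            # cv2.imread does not raise on unreadable or non-image files; it returns None
            img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
            if img_array is None:
                continue  # skip this file explicitly instead of swallowing every exception
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training_data.append([new_array.flatten(), class_num])
    return training_data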

# --- 2. Data Preparation ---
data = create_training_data()
X = np.array([i[0] for i in data])
y = np.array([i[1] for i in data])

# Split into 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- 3. Feature Scaling ---
# Critical for SVM because it relies on distance calculations
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# --- 4. Training the SVM ---
print("Training SVM... This may take a few minutes depending on dataset size.")
model = SVC(kernel='rbf', C=1.0, gamma='auto')
model.fit(X_train, y_train)

# --- 5. Predictions and Evaluation ---
y_pred = model.predict(X_test)

print("\n--- Model Evaluation ---")
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=CATEGORIES))

Training SVM... This may take a few minutes depending on dataset size.

--- Model Evaluation ---
Accuracy: 57.95%

Classification Report:
              precision    recall  f1-score   support

         Cat       0.59      0.60      0.60       228
         Dog       0.56      0.56      0.56       212

    accuracy                           0.58       440
   macro avg       0.58      0.58      0.58       440
weighted avg       0.58      0.58      0.58       440



## Final Task

### Subtask:
Confirm that the `FileNotFoundError` has been resolved and the model has successfully trained and evaluated.


## Summary:

### Q&A
The `FileNotFoundError` was resolved by changing the `CATEGORIES` list from `['cat', 'dog']` to `['Cat', 'Dog']` so that it matches the dataset's folder names. With that fix, the model trained and evaluated without errors.
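
A mismatch like this can be caught before any image is loaded. The following is a minimal sketch, not part of the original notebook: `missing_category_dirs` is a hypothetical helper name, and the demo uses a throwaway temporary directory rather than the real dataset path.

```python
import os
import tempfile


def missing_category_dirs(datadir, categories):
    """Return the category names that have no exactly matching folder in datadir."""
    # Compare against the listed names so the check is case-sensitive
    # even on filesystems that are not.
    present = set(os.listdir(datadir))
    return [
        name for name in categories
        if name not in present or not os.path.isdir(os.path.join(datadir, name))
    ]


# Demo on a temporary directory that mimics the dataset layout (hypothetical data).
with tempfile.TemporaryDirectory() as root:
    for name in ("Cat", "Dog"):
        os.mkdir(os.path.join(root, name))
    print(missing_category_dirs(root, ["cat", "dog"]))  # ['cat', 'dog']
    print(missing_category_dirs(root, ["Cat", "Dog"]))  # []
```

In the notebook, calling `missing_category_dirs(DATADIR, CATEGORIES)` before `create_training_data()` and stopping if the result is non-empty would surface the problem with a clearer message than the `FileNotFoundError` raised by `os.listdir`.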

### Data Analysis Key Findings
*   The SVM model achieved an overall accuracy of 57.95% on the 440-image test set.
*   For the 'Cat' category, the model demonstrated a precision of 0.59, a recall of 0.60, and an f1-score of 0.60.
*   For the 'Dog' category, the model showed a precision of 0.56, a recall of 0.56, and an f1-score of 0.56.
*   The macro average and weighted average f1-scores for the model were both 0.58.
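
The two averages can be re-derived from the per-class rows of the classification report. The sketch below uses the rounded F1 values and supports printed in the report, so the results are approximate; the majority-class baseline is an added reference point, not something the notebook computes.

```python
# Per-class F1 and support copied from the classification report (rounded values).
report = {
    "Cat": {"f1": 0.60, "support": 228},
    "Dog": {"f1": 0.56, "support": 212},
}

total = sum(row["support"] for row in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(row["f1"] * row["support"] for row in report.values()) / total

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(row["support"] for row in report.values()) / total

print(f"macro F1:          {macro_f1:.2f}")           # 0.58
print(f"weighted F1:       {weighted_f1:.2f}")        # 0.58
print(f"majority baseline: {majority_baseline:.3f}")  # 0.518
```

Because the two classes are nearly balanced (228 vs. 212), the macro and weighted averages agree to two decimals.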

### Insights or Next Steps
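*   The model's accuracy of 57.95% is only about six percentage points above the 51.8% obtained by always predicting the majority class (228 of the 440 test images are cats), which suggests that raw flattened pixels give the SVM little usable signal. Better feature extraction (e.g., gradient-based descriptors instead of flattened pixels) or a different model family (e.g., Convolutional Neural Networks) would likely be needed for a substantial improvement.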
*   A closer error analysis of the misclassified images would help identify common failure patterns and guide future model enhancements. `confusion_matrix` is already imported in the code cell but never called; printing it would be a natural first step.
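
As one illustration of features beyond flattened pixels, the sketch below builds per-cell histograms of gradient orientation with NumPy only. It is a simplified, HOG-style stand-in written for this example (no block normalisation or bin interpolation), not the notebook's method; `hog_like_features` and its parameters are hypothetical names, and the input is a random image rather than real data.

```python
import numpy as np


def hog_like_features(gray, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    # Clamp guards against an angle that rounds to exactly 180 degrees.
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    features = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            features[r, c] = np.bincount(
                bin_idx[window].ravel(),
                weights=magnitude[window].ravel(),
                minlength=bins,
            )

    # L2-normalise each cell so overall contrast matters less.
    norms = np.linalg.norm(features, axis=2, keepdims=True)
    return (features / (norms + 1e-6)).ravel()


# Demo on a random 64x64 grayscale image (hypothetical input).
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
print(hog_like_features(demo_image).shape)  # (576,)
```

For a 64x64 image this yields 8 x 8 cells x 9 bins = 576 features instead of 4,096 raw pixels. In the notebook it could replace `new_array.flatten()` inside `create_training_data()`; whether it actually improves accuracy on this dataset is untested here. Library implementations such as `skimage.feature.hog` offer the full descriptor.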
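
A starting point for the error analysis could look like the following sketch. It uses NumPy only, `error_breakdown` is a hypothetical helper name, and the labels are toy values chosen for illustration, not results from the notebook.

```python
import numpy as np


def error_breakdown(y_true, y_pred, n_classes):
    """Return a confusion-count matrix and the indices of misclassified samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # counts[i, j] = number of samples whose true class is i and predicted class is j
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (y_true, y_pred), 1)
    wrong_idx = np.flatnonzero(y_true != y_pred)
    return counts, wrong_idx


# Toy labels (0 = Cat, 1 = Dog), purely illustrative.
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 0, 1, 0]
counts, wrong_idx = error_breakdown(toy_true, toy_pred, n_classes=2)
print(counts)     # [[2 1]
                  #  [1 2]]
print(wrong_idx)  # [1 3]
```

In the notebook this would be called as `error_breakdown(y_test, y_pred, len(CATEGORIES))`. To view the misclassified images, the scaled rows would first need to be mapped back to pixel values, for example with `scaler.inverse_transform(X_test[wrong_idx])` followed by a reshape to `(IMG_SIZE, IMG_SIZE)` per image.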
