# One-Class SVM for Animal Detection (When Only Animal Images are Available)

This notebook demonstrates how to detect whether an image contains an animal **without** any non-animal training images. We use a **One-Class SVM**, which learns what “normal” (in this case, *animal*) looks like and flags anything else as an outlier.

## Key Points
1. **No non-animal data**: We cannot do a standard supervised (binary) classification. We only have *animal* images.
2. **One-Class SVM**: This approach allows us to treat animal images as our "normal" class. At prediction time, if a new image deviates significantly from what the model learned, it's flagged as "Non-Animal."
3. **Hyperparameter Tuning**: We do a simple grid search over parameters like `nu` and `gamma` to find a good fit for our animal data.
4. **Same Function Names**: The required functions are `load_images`, `train_animal_classifier`, and `predict_image`.


## Setup & Imports
We will use:
- `PIL` (via `Pillow`) to handle image loading and resizing.
- `numpy` for numerical arrays.
- `sklearn.svm.OneClassSVM` for our one-class model.
- `train_test_split` to create a validation set from our animal images.


In [None]:
import os
import numpy as np
from PIL import Image

from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split

def load_images(image_dir, label, image_size=(64, 64)):
    """
    Loads images from a directory, converts them to grayscale, resizes, and flattens.
    Returns (X, y) where:
        - X is a NumPy array of shape (N, width*height)
        - y is a 1D array of the same label (for compatibility, though only 'animal' images here).
    
    :param image_dir: Directory with images (all are the same 'label').
    :param label: Integer label (e.g., 1 for animal). Not used for one-class training,
                  but included for consistent function signature.
    :param image_size: Tuple (width, height) for resizing.
    """
    data = []
    labels = []
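    # Sort the listing: os.listdir order is arbitrary, and a stable order keeps
    # the later train/validation split reproducible.
    for fname in sorted(os.listdir(image_dir)):
        fpath = os.path.join(image_dir, fname)
        # Attempt to open and process the image
        try:
            # The context manager closes the file handle once the pixels are read
            with Image.open(fpath) as img:
                gray = img.convert("L").resize(image_size)
            data.append(np.array(gray).flatten())
            labels.append(label)
        except Exception:
            # Skip files that cannot be opened/processed (e.g., non-image files)
            continue
    return np.array(data), np.array(labels)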

def train_animal_classifier(animal_path, non_animal_path):
    """
    Trains a One-Class SVM on the provided animal images. Ignores non_animal_path
    because no negative data is available. We perform:
      1) Loading and splitting the animal data into train/validation sets
      2) A simple hyperparameter grid search for OneClassSVM
      3) Return the best-performing model (i.e., minimal outlier rate on validation)

    :param animal_path: Directory containing images of animals (our 'normal' data).
    :param non_animal_path: Directory for non-animal images (ignored in one-class scenario).
    :return: A trained OneClassSVM model.
    """
    # -- Load animal data --
    X, _ = load_images(animal_path, label=1, image_size=(64, 64))
    if len(X) == 0:
        raise ValueError(f"No images found in '{animal_path}'. Cannot train the model.")

    # -- Split: train (80%) and validation (20%) --
    X_train, X_val = train_test_split(X, test_size=0.2, random_state=42)

    # -- Define a small grid of hyperparameters to try --
    param_grid = {
        "nu":    [0.001, 0.01, 0.1],  # upper bound on the fraction of training outliers
        "gamma": ["scale", 1e-3, 1e-4],  # kernel coefficient for rbf
    }

    best_model = None
    best_outlier_rate = float("inf")
    best_params = None

    # -- Grid search manually over OneClassSVM parameters --
    for nu_val in param_grid["nu"]:
        for gamma_val in param_grid["gamma"]:
            model = OneClassSVM(kernel="rbf", nu=nu_val, gamma=gamma_val)
            model.fit(X_train)

            # Predict on validation set: +1 (inlier), -1 (outlier)
            val_preds = model.predict(X_val)
            # Count how many of our own animal images were flagged as outliers
            outlier_count = np.sum(val_preds == -1)
            outlier_rate = outlier_count / len(X_val)

            if outlier_rate < best_outlier_rate:
                best_outlier_rate = outlier_rate
                best_model = model
                best_params = (nu_val, gamma_val)

    print(f"Best One-Class SVM params: nu={best_params[0]}, gamma={best_params[1]}")
    print(f"Validation outlier rate on animal data: {best_outlier_rate:.2%}")

    # best_model is the OneClassSVM that yielded the minimal outlier rate on validation
    return best_model

def predict_image(file_path, model, image_size=(64, 64)):
    """
    Predict if an image is 'Animal' or 'Non-Animal' based on a trained OneClassSVM.

    :param file_path: Path to the image file
    :param model: Trained OneClassSVM
    :param image_size: Tuple (width, height) for resizing the input image
    :return: 'Animal' if the model predicts +1, else 'Non-Animal'
    """
    # Load & preprocess
    img = Image.open(file_path).convert("L").resize(image_size)
    arr = np.array(img).flatten().reshape(1, -1)

    # Prediction (+1 = inlier, -1 = outlier)
    prediction = model.predict(arr)[0]
    return "Animal" if prediction == 1 else "Non-Animal"


## Example Usage
Assume you have a folder structure like this:
```
data/
  animals/
    animal1.jpg
    animal2.jpg
    ...
  non_animals/  (ignored in one-class, but needed in the function signature)
    empty1.jpg
    empty2.jpg
    ...
```

Uncomment and modify the paths to run the following code.

In [None]:
# Example usage (uncomment and update the paths below):
#
# animal_folder = "./data/animals"  # Folder with your animal images
# non_animal_folder = "./data/non_animals"  # Non-animal folder (ignored here)
# model = train_animal_classifier(animal_folder, non_animal_folder)
#
# test_image_path = "./data/some_test_image.jpg"
# result = predict_image(test_image_path, model)
# print(f"Prediction for {test_image_path}: {result}")
#
# If the image looks similar to your training set, you'll likely get "Animal".
# Otherwise, you'll get "Non-Animal" (outlier).

## How It Works
1. **Load & Preprocess**: Converts images to grayscale, resizes them to 64×64, and flattens them into 1D arrays.
2. **Train/Validation Split**: Splits the available *animal* images into train (80%) and validation (20%).
3. **One-Class SVM**: The model learns a decision boundary around your *animal* data. Anything that deviates significantly is labeled as "-1" (outlier).
4. **Hyperparameter Tuning**: We do a simple loop over a small grid of `(nu, gamma)` values:
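   - `nu` is an upper bound on the fraction of training points treated as outliers (and a lower bound on the fraction of support vectors).
   - `gamma` is the kernel coefficient for the RBF kernel; larger values give a tighter, more local boundary around the training data.
   - We pick the combination that yields the **fewest** outliers on the validation set. Note that this criterion alone favors permissive models, because nothing in it penalizes accepting non-animal images.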
5. **Prediction**: `predict_image` loads a new image and runs the trained One-Class SVM. The result is either +1 ("Animal") or -1 ("Non-Animal").
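
To make the role of `nu` concrete, here is a minimal sketch that fits a One-Class SVM for several `nu` values and reports how many training points each model flags. It assumes synthetic 2-D Gaussian points as a stand-in for flattened images (not real data), and `training_outlier_fraction` is a hypothetical helper, not part of the notebook's required functions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for "animal" feature vectors (assumption: not real image data)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))

def training_outlier_fraction(nu, X):
    """Fit a One-Class SVM with the given nu and return the fraction of X predicted as -1."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(X)
    return float(np.mean(model.predict(X) == -1))

# Larger nu lets the model reject a larger share of its own training data
fractions = {nu: training_outlier_fraction(nu, X_train) for nu in (0.01, 0.1, 0.3)}
for nu, frac in fractions.items():
    print(f"nu={nu}: {frac:.1%} of training points flagged as outliers")
```

The flagged fraction should roughly track `nu`, which is why choosing by the lowest validation outlier rate tends to pick the smallest `nu` in the grid.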

> **Important**: Because we have no real non-animal data for training or testing, we cannot measure the false positive rate (i.e., how often it labels a non-animal as "Animal"). If you gather non-animal images later, you can check how well the model rejects them.
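
If non-animal images do become available, the check could look roughly like the sketch below. `false_positive_rate` is a hypothetical helper, and the synthetic arrays stand in for features that would in practice come from calling `load_images` on the animal and non-animal folders.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def false_positive_rate(model, X_non_animal):
    """Fraction of non-animal samples that the model accepts as inliers (+1)."""
    preds = model.predict(X_non_animal)
    return float(np.mean(preds == 1))

# Synthetic stand-ins (assumption): two well-separated clusters of feature vectors
rng = np.random.default_rng(0)
X_animal = rng.normal(loc=0.0, scale=1.0, size=(300, 16))
X_non_animal = rng.normal(loc=6.0, scale=1.0, size=(100, 16))

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_animal)
fpr = false_positive_rate(model, X_non_animal)
print(f"False positive rate on non-animal samples: {fpr:.2%}")
```

The synthetic clusters here are deliberately easy to separate; real images will usually overlap far more, so treat the number from real data as the one that matters.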

## Next Steps
- **Collect Non-Animal Data**: If you ever get actual negative (non-animal) images, you can:
  1. Check how often they're classified incorrectly.
  2. Possibly switch to a **binary classifier** (like an SVM or deep net) that trains on both animal and non-animal examples.
- **Feature Extraction**: Instead of raw pixels, you could consider using more advanced features (e.g., HOG, color histograms, or even deep features) if your images vary widely in lighting, angles, etc.
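- **Tune Further**: If you have many images, you might expand the hyperparameter search or use a tool like `GridSearchCV` (though it typically expects a labeled dataset, and `OneClassSVM` has no default `score` method, so you would need to supply a custom `scoring` callable).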
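
As one illustration of the feature-extraction idea, the sketch below replaces raw pixels with a normalized intensity histogram (the grayscale analogue of a color histogram). `intensity_histogram` is a hypothetical helper, the bin count of 32 is an arbitrary choice, and the random arrays only mimic the shape of `load_images` output.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def intensity_histogram(flat_images, bins=32):
    """Map each flattened 8-bit grayscale image to a normalized intensity histogram."""
    flat_images = np.atleast_2d(flat_images)
    feats = np.empty((flat_images.shape[0], bins), dtype=float)
    for i, row in enumerate(flat_images):
        # range=(0, 256) covers every possible 8-bit pixel value
        hist, _ = np.histogram(row, bins=bins, range=(0, 256))
        feats[i] = hist / hist.sum()  # normalize so each row sums to 1
    return feats

# Random stand-in for flattened 64x64 grayscale images (assumption: not real data)
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(50, 64 * 64), dtype=np.uint8)

X_feat = intensity_histogram(X_raw)
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_feat)
print(X_raw.shape, "->", X_feat.shape)
```

If you adopt something like this, the same transform has to be applied inside `predict_image` before calling `model.predict`, otherwise the model sees features of a different shape than it was trained on.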