##### This script is a machine learning application that trains a linear Support Vector Machine (SVM) classifier (scikit-learn's LinearSVC) to distinguish between positive and negative image samples. It performs data preprocessing, data augmentation, HOG feature extraction, hyperparameter tuning, classifier training, evaluation, and model saving. The goal is a more robust classifier, obtained by augmenting the dataset, describing each image with HOG features, and retraining with hard negative mining. The script consists of the following steps, which I'll explain in detail:

**1. Import Libraries:** 

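The script starts by importing the libraries it needs: scikit-learn for splitting the data, tuning and training the SVM, and reporting metrics; joblib for saving the model; and all helper functions from the custom utility module utils.py, which use OpenCV for image processing. NumPy (`np`) is also needed, for stacking the feature arrays in step 14.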

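```python
from utils import *  # custom helpers: image loading, augmentation, HOG features, hard negative mining
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
import joblib
import numpy as np  # used explicitly when stacking hard-negative features (step 14)
```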

**2. Define Image Directories:**

* positive_images_folder: Path to the directory containing positive image samples.
* negative_images_folder: Path to the directory containing negative image samples.

**3. Load Images:**

* load_positive_images() and load_negative_images(): Functions that load the positive and negative image samples from the specified folders.

```python
positive_images_folder = "pos_dataset"
negative_images_folder = "neg_dataset"

positive_images = load_positive_images(positive_images_folder)
negative_images = load_negative_images(negative_images_folder)
```

**4. Patch Extraction and Augmentation:**

* num_patches_per_image: Number of patches to extract from each negative image.
* patch_size: Size of the patches to extract.
* augmented_negative_images: A list to hold augmented negative image patches.
* Loop through each negative image:
  * Extract random patches from the image using the extract_random_patches() function.
  * Extend the augmented_negative_images list with the extracted patches.

```python
num_patches_per_image = 3
patch_size = (64, 64)
augmented_negative_images = []
for image in negative_images:
    patches = extract_random_patches(image, num_patches_per_image, patch_size)
    augmented_negative_images.extend(patches)
```
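To make the patch-extraction step concrete, here is a minimal NumPy sketch of what a function like `extract_random_patches()` might do. It is a hypothetical re-implementation, not the code in utils.py: it assumes `patch_size` is `(width, height)`, that images are NumPy arrays (as OpenCV returns them), and that images smaller than one patch are simply skipped.

```python
import numpy as np


def extract_random_patches(image, num_patches, patch_size, rng=None):
    """Hypothetical sketch: crop `num_patches` random windows of
    `patch_size` = (width, height) from an H x W (x C) image array."""
    rng = np.random.default_rng() if rng is None else rng
    patch_w, patch_h = patch_size
    img_h, img_w = image.shape[:2]
    if img_h < patch_h or img_w < patch_w:
        return []  # image too small to hold a single patch
    patches = []
    for _ in range(num_patches):
        # top-left corner drawn uniformly so the whole patch stays inside the image
        y = int(rng.integers(0, img_h - patch_h + 1))
        x = int(rng.integers(0, img_w - patch_w + 1))
        patches.append(image[y:y + patch_h, x:x + patch_w].copy())
    return patches


demo = np.zeros((120, 200, 3), dtype=np.uint8)
crops = extract_random_patches(demo, 3, (64, 64), rng=np.random.default_rng(0))
print(len(crops), crops[0].shape)  # 3 (64, 64, 3)
```

Because the crops are random, the same negative image contributes several different background samples, which is the point of this augmentation.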

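**5. Simulated Distance Variation (Augmentation):**

* scale_factors: List of scaling factors used to simulate variations in the subject's distance from the camera. In the code below the list contains only 1.0, so (assuming simulate_distance_variation() simply rescales the image) each image keeps its original size and this step adds no new samples; additional factors would be needed for real scale augmentation.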
* augmented_positive_images: A list to hold augmented positive images.
* Loop through each positive image:
  * Simulate distance variations by applying scaling to the image using the simulate_distance_variation() function.
  * Extend the augmented_positive_images list with the simulated images.

```python
scale_factors = [1.0]
augmented_positive_images = []
for image in positive_images:
    simulated_images = simulate_distance_variation(image, scale_factors)
    augmented_positive_images.extend(simulated_images)
```
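A minimal sketch of distance simulation by rescaling, assuming the helper returns one resized copy per scale factor. The real utils.py version most likely uses `cv2.resize`; to keep the example dependency-free, the hypothetical `resize_nearest()` below does a nearest-neighbour resize with NumPy indexing instead.

```python
import numpy as np


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize with NumPy only (stand-in for cv2.resize)."""
    h, w = image.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)  # source row for each output row
    cols = (np.arange(new_w) * w / new_w).astype(int)  # source column for each output column
    return image[rows[:, None], cols]  # any trailing channel axis is kept


def simulate_distance_variation(image, scale_factors):
    """Hypothetical sketch: one rescaled copy of `image` per scale factor."""
    h, w = image.shape[:2]
    return [resize_nearest(image, max(1, round(h * s)), max(1, round(w * s)))
            for s in scale_factors]


img = np.arange(128 * 64).reshape(128, 64)
print([o.shape for o in simulate_distance_variation(img, [1.0, 0.5, 1.5])])
# [(128, 64), (64, 32), (192, 96)]
```

Since every image is later resized to 64x64 before HOG extraction, a real implementation would probably also crop or pad the rescaled image back to a common frame so that the subject actually appears larger or smaller; this sketch only shows the rescaling itself.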

**6. Mirror Augmentation:**

* add_mirror_images(): Creates a mirrored copy of each augmented positive image.
* mirror_augmented_positive_images: The returned list, holding both the original augmented positive images and their mirror images, which doubles the number of positive samples.

```python
mirror_augmented_positive_images = add_mirror_images(augmented_positive_images)
```
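A minimal sketch of mirror augmentation, assuming "mirror" means a left-right flip and that the helper returns the originals followed by their mirrored copies (the ordering is an assumption; the notebook only needs both to be present). `np.fliplr` reverses the width axis and works for both grayscale and colour arrays.

```python
import numpy as np


def add_mirror_images(images):
    """Hypothetical sketch: originals followed by their left-right mirrors."""
    mirrored = [np.fliplr(img) for img in images]
    return list(images) + mirrored


out = add_mirror_images([np.array([[1, 2, 3], [4, 5, 6]])])
print(len(out), out[1].tolist())  # 2 [[3, 2, 1], [6, 5, 4]]
```

A horizontal flip suits subjects that are roughly left-right symmetric; a vertical flip would usually produce unrealistic samples.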

**7. Creating the Dataset and Labels:**

* dataset: Combined list of augmented positive images and augmented negative image patches.
* labels: List of labels corresponding to the dataset, where 1 represents positive samples and 0 represents negative samples.

```python
dataset = mirror_augmented_positive_images + augmented_negative_images
labels = [1] * len(mirror_augmented_positive_images) + [0] * len(augmented_negative_images)
```

**8. Feature Extraction (HOG):**

HOG (Histogram of Oriented Gradients) is a feature descriptor used in computer vision and image processing. It divides the image into small cells, builds a histogram of gradient orientations in each cell, and normalizes these histograms over overlapping blocks of cells, capturing local shape and texture information.
* features_list: A list to store the HOG feature vectors for each image in the dataset.
* Loop through each image in the dataset:
  * Extract HOG features using the get_hog_features() function. Internally, this function first resizes the image with a second helper function (see utils.py). Its default arguments are: 
    * _resize_width=64_
    * _resize_height=64_ 
    * _orient = 9_ 
    * _pix_per_cell = 8_
    * _cell_per_block = 2_
  * Append the features to the features_list.
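Assuming get_hog_features() follows the common layout of `skimage.feature.hog` (blocks of cell_per_block × cell_per_block cells that overlap with a stride of one cell), the default arguments fix the length of every feature vector, which is why all images are resized to the same size first:

$$
\begin{aligned}
\text{cells per side} &= 64 / 8 = 8\\
\text{blocks per side} &= 8 - 2 + 1 = 7\\
\text{feature length} &= \underbrace{7 \cdot 7}_{\text{blocks}} \cdot \underbrace{2 \cdot 2}_{\text{cells per block}} \cdot \underbrace{9}_{\text{orientations}} = 1764
\end{aligned}
$$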

```python
features_list = []
for image in dataset:
    features = get_hog_features(image)
    features_list.append(features)
```
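To show what the descriptor computes, here is a simplified HOG sketch in NumPy for a grayscale image that has already been resized. It is an illustration, not the get_hog_features() in utils.py: votes go to a single orientation bin and each block is plain L2-normalized, whereas fuller implementations may interpolate votes between bins and use L2-Hys clipping (the default `block_norm` of `skimage.feature.hog`). With the default arguments it produces the same 1764-dimensional layout.

```python
import numpy as np


def simple_hog(gray, orient=9, pix_per_cell=8, cell_per_block=2, eps=1e-6):
    """Simplified HOG sketch: hard orientation binning, plain L2 block norm."""
    gray = gray.astype(np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]  # centred horizontal differences
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]  # centred vertical differences
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation in [0, 180)
    bins = np.minimum((angle / (180 / orient)).astype(int), orient - 1)

    n_cy = gray.shape[0] // pix_per_cell
    n_cx = gray.shape[1] // pix_per_cell

    # 1) magnitude-weighted orientation histogram for every cell
    hist = np.zeros((n_cy, n_cx, orient))
    for cy in range(n_cy):
        for cx in range(n_cx):
            cell = (slice(cy * pix_per_cell, (cy + 1) * pix_per_cell),
                    slice(cx * pix_per_cell, (cx + 1) * pix_per_cell))
            hist[cy, cx] = np.bincount(bins[cell].ravel(),
                                       weights=magnitude[cell].ravel(),
                                       minlength=orient)

    # 2) overlapping blocks of cell_per_block x cell_per_block cells, L2-normalized
    features = []
    for by in range(n_cy - cell_per_block + 1):
        for bx in range(n_cx - cell_per_block + 1):
            block = hist[by:by + cell_per_block, bx:bx + cell_per_block].ravel()
            features.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(features)


feats = simple_hog(np.random.default_rng(0).random((64, 64)))
print(feats.shape)  # (1764,)
```

Block normalization is what makes HOG fairly robust to illumination changes: scaling all pixel intensities scales every gradient, and the normalization largely cancels that factor out.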

**9. Train-Test Split:**

* Split the dataset and labels into training and testing sets using train_test_split().

**10. Hyperparameter Tuning:**

* param_grid: A dictionary containing the hyperparameters to be tuned (C and loss) for the SVM classifier.
* Create an instance of LinearSVC and perform hyperparameter tuning using GridSearchCV.

**11. Training the SVM Classifier:**

* Train a new LinearSVC classifier with the best hyperparameters obtained from the grid search.
* Fit the classifier using the training data.

**12. Evaluation and Printing Results:**

* Make predictions on the test data using the trained classifier.
* Print a classification report that includes precision, recall, F1-score, and support for both classes.
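For reference, these are the per-class quantities in the report. For each class $c$, that class is treated as the positive one, $TP_c$, $FP_c$ and $FN_c$ are counted on the test set, and the support $n_c$ is the number of test samples whose true label is $c$:

$$
\text{precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\text{recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_{1,c} = \frac{2\,\text{precision}_c\,\text{recall}_c}{\text{precision}_c + \text{recall}_c}
$$

With $K$ classes and a per-class metric $m_c$, the two summary rows are

$$
\text{macro avg} = \frac{1}{K}\sum_{c=1}^{K} m_c, \qquad
\text{weighted avg} = \frac{\sum_{c=1}^{K} n_c\, m_c}{\sum_{c=1}^{K} n_c}
$$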

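```python
# 80/20 split, stratified so both sets keep the same positive/negative ratio
X_train, X_test, y_train, y_test = train_test_split(
    features_list, labels, test_size=0.2, random_state=42, stratify=labels)

# loss='hinge' is not an option here: LinearSVC does not support it
# together with penalty='l2' (the default) when dual=False
param_grid = {
    'C': [0.01, 0.1, 1.0],
    'loss': ['squared_hinge'],
}

# 5-fold cross-validated grid search, using all CPU cores
svm = LinearSVC(dual=False, random_state=42)
grid_search = GridSearchCV(svm, param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_

# Train a fresh classifier on the whole training set with the best hyperparameters
svm = LinearSVC(dual=False, random_state=42, C=best_params['C'], loss=best_params['loss'])
svm.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = svm.predict(X_test)
print(classification_report(y_test, y_pred))
```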

```
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1428
           1       1.00      0.99      1.00      2022

    accuracy                           0.99      3450
   macro avg       0.99      0.99      0.99      3450
weighted avg       0.99      0.99      0.99      3450
```
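The manual retraining step can usually be skipped: with its default `refit=True`, `GridSearchCV` already refits the best configuration on the whole training set and exposes it as `best_estimator_`. The sketch below checks this on synthetic data from `make_classification` (an assumption standing in for the HOG features) using the same LinearSVC settings as the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

# Synthetic two-class data standing in for the HOG feature matrix
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

grid_search = GridSearchCV(LinearSVC(dual=False, random_state=42),
                           {"C": [0.01, 0.1, 1.0]}, cv=5)
grid_search.fit(X_train, y_train)  # refit=True by default

# The winner, already refitted on all of X_train by GridSearchCV ...
best_svm = grid_search.best_estimator_
# ... and a manual re-fit with the same hyperparameters, as in the notebook
manual_svm = LinearSVC(dual=False, random_state=42, **grid_search.best_params_)
manual_svm.fit(X_train, y_train)

print(grid_search.best_params_)
print(np.array_equal(best_svm.predict(X_test), manual_svm.predict(X_test)))
```

Both models are trained with the same deterministic solver on the same data, so their predictions agree; using `best_estimator_` just avoids training twice.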



**13. Hard Negative Mining:** 

* negative_images_augmented: Despite its name, simply the first 500 original (non-augmented) negative images, used as the pool for hard negative mining.
* stepSize: The step size, in pixels, of the window that slides over each image.
* hard_negative_mining(): Identifies challenging negative samples (see utils.py). It slides a window (default windowSize = (64, 64)) across each image, computes the HOG features of each window with get_hog_features(), and classifies them with the trained SVM. Every window predicted as positive (1) is a false positive and is collected as a hard negative.

```python
negative_images_augmented = negative_images[:500]
stepSize = 20

hard_negatives = hard_negative_mining(svm, negative_images_augmented, stepSize)
```
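The mining loop described above can be sketched as follows. This is a hypothetical version, not the utils.py code: unlike the notebook's `hard_negative_mining(svm, images, stepSize)`, it takes the feature function as an extra `feature_fn` argument so that it runs without utils.py, and it assumes `window_size` is `(width, height)` and that only windows fully inside the image are tested.

```python
import numpy as np


def sliding_window(image, step_size, window_size=(64, 64)):
    """Yield (x, y, window) for every full window, left-to-right, top-to-bottom."""
    win_w, win_h = window_size
    for y in range(0, image.shape[0] - win_h + 1, step_size):
        for x in range(0, image.shape[1] - win_w + 1, step_size):
            yield x, y, image[y:y + win_h, x:x + win_w]


def hard_negative_mining(classifier, negative_images, step_size, feature_fn,
                         window_size=(64, 64)):
    """Hypothetical sketch: collect every window of a negative image that the
    classifier wrongly predicts as positive (a false positive)."""
    hard_negatives = []
    for image in negative_images:
        for _, _, window in sliding_window(image, step_size, window_size):
            features = np.asarray(feature_fn(window)).reshape(1, -1)  # one sample
            if classifier.predict(features)[0] == 1:
                hard_negatives.append(window)
    return hard_negatives


# A 100x100 image with a 64x64 window and step 20 gives 2 x 2 = 4 windows
print(len(list(sliding_window(np.zeros((100, 100)), 20))))  # 4
```

In the notebook's terms this would be called roughly as `hard_negative_mining(svm, negative_images[:500], 20, get_hog_features)`. A smaller step finds more false positives but multiplies the number of windows to classify.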

**14. Updating Training Data with Hard Negatives:**

* Append the HOG features of the hard negative samples to the training features.
* Update the labels accordingly.

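```python
# HOG features of the mined false positives; all of them are labelled 0 (negative)
hard_negative_features = [get_hog_features(img) for img in hard_negatives]
X_train_hard_negatives = np.vstack(list(X_train) + hard_negative_features)  # also works if none were found
y_train_hard_negatives = np.concatenate([y_train, np.zeros(len(hard_negative_features), dtype=int)])
```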

**15. Retraining SVM with Hard Negatives:**

* Create a new LinearSVC classifier instance and train it using the updated training data.

```python
svm_with_hard_negatives = LinearSVC(dual=False, random_state=42, C=best_params['C'], loss=best_params['loss'])
svm_with_hard_negatives.fit(X_train_hard_negatives, y_train_hard_negatives)
```

**16. Saving the Trained Model:**

* Save the final trained SVM classifier (with hard negatives) using the joblib.dump() function.

```python
joblib.dump(svm_with_hard_negatives, 'svm_with_hard_negatives')
```

```
['svm_with_hard_negatives']
```
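To use the saved model later, load it back with `joblib.load()`. The sketch below does a save/load round trip on a toy LinearSVC (an assumption standing in for `svm_with_hard_negatives`) in a temporary directory and checks that the restored model predicts exactly like the original. The `.joblib` extension is only a convention; `joblib.dump()` returns the list of file names it wrote, which is the output shown above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import LinearSVC

# Toy classifier standing in for svm_with_hard_negatives
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)
model = LinearSVC(dual=False, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_with_hard_negatives.joblib")
    written = joblib.dump(model, path)  # list of written file names
    restored = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

joblib files are pickle-based, so load only files you trust, and preferably with the same scikit-learn version that saved them.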