# CPSC-5616: SVM and RF

The following code trains and evaluates a Support Vector Machine (SVM) classifier and a Random Forest (RF) classifier on the Iris dataset. It then reports performance metrics (accuracy and a per-class classification report) for both classifiers so that their efficacy can be compared.

**Support Vector Machine (SVM) Classifier:**

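An SVM classifier with a linear kernel (`kernel='linear'`) and regularization parameter `C=1` is initialized and trained on the standardized training data.
Hyperparameters: `C`, `kernel`, `gamma`, `degree` (`gamma` is used only by the `'rbf'`, `'poly'`, and `'sigmoid'` kernels, and `degree` only by `'poly'`)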
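
The hyperparameters listed above do not all apply to every kernel. The sketch below is a minimal illustration that compares a few `SVC` configurations, assuming 5-fold cross-validation on the full Iris data rather than the single 80/20 split used in the main cell. The dictionary name `configs` and the specific values tried are illustrative choices, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative, untuned settings. 'gamma' is only used by the 'rbf', 'poly'
# and 'sigmoid' kernels; 'degree' is only used by 'poly'. 'linear' ignores both.
configs = {
    "linear, C=1": SVC(kernel="linear", C=1),
    "rbf, C=1, gamma='scale'": SVC(kernel="rbf", C=1, gamma="scale"),
    "rbf, C=10, gamma=0.1": SVC(kernel="rbf", C=10, gamma=0.1),
    "poly, C=1, degree=3": SVC(kernel="poly", C=1, degree=3),
}

svm_cv_scores = {}
for name, svc in configs.items():
    # Putting the scaler inside the pipeline refits it on each training fold,
    # so the held-out fold never influences the scaling statistics.
    pipe = make_pipeline(StandardScaler(), svc)
    scores = cross_val_score(pipe, X, y, cv=5)  # stratified 5-fold for classifiers
    svm_cv_scores[name] = scores.mean()
    print(f"{name:26s} mean CV accuracy = {scores.mean():.3f}")
```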

**Random Forest Classifier:**

A Random Forest classifier, an ensemble of decision trees, is initialized with 100 trees (`n_estimators=100`) and a fixed `random_state=42`, then trained on the standardized training data.
Hyperparameters: `n_estimators`, `max_features`, `max_depth`, `min_samples_split`, `min_samples_leaf`
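
A similar sketch for the Random Forest hyperparameters, again assuming 5-fold cross-validation on the full dataset; the configurations in the hypothetical `rf_configs` dictionary are arbitrary examples. Scaling is omitted here because decision-tree splits depend only on the ordering of each feature's values, so standardization does not change which splits a tree can make.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Illustrative, untuned configurations; every other parameter keeps its default.
rf_configs = {
    "100 trees (defaults)": dict(n_estimators=100),
    "max_depth=2": dict(n_estimators=100, max_depth=2),
    "min_samples_split=10, min_samples_leaf=5": dict(
        n_estimators=100, min_samples_split=10, min_samples_leaf=5
    ),
    "max_features=None (all features)": dict(n_estimators=100, max_features=None),
}

rf_cv_scores = {}
for name, params in rf_configs.items():
    # A fixed random_state makes bootstrap sampling and feature subsampling reproducible.
    rf = RandomForestClassifier(random_state=42, **params)
    scores = cross_val_score(rf, X, y, cv=5)
    rf_cv_scores[name] = scores.mean()
    print(f"{name:42s} mean CV accuracy = {scores.mean():.3f}")
```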

`TODO: Refer to the scikit-learn documentation for SVC and RandomForestClassifier and try various hyperparameter values`
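
As one possible starting point for the hyperparameter exploration, the sketch below uses scikit-learn's `GridSearchCV` to search small, hand-picked grids for both models on the same 80/20 split as the main cell. The grid values are assumptions chosen to keep the run short, not recommended settings. The `svc__` prefixes route each parameter to the pipeline step named `svc`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Same split as the main cell; the test set is only touched after tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A list of grids avoids pairing 'gamma'/'degree' with kernels that ignore them.
svm_search = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("svc", SVC())]),
    param_grid=[
        {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
        {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1, 1]},
        {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10], "svc__degree": [2, 3]},
    ],
    cv=5,
)

# Trees do not need scaling, so the forest is searched directly.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={
        "n_estimators": [50, 100],
        "max_features": ["sqrt", None],
        "max_depth": [None, 3],
        "min_samples_leaf": [1, 3],
    },
    cv=5,
)

for name, search in [("SVM", svm_search), ("Random Forest", rf_search)]:
    search.fit(X_train, y_train)  # cross-validated search on the training split only
    test_acc = search.score(X_test, y_test)  # refit best estimator, scored once on the test split
    print(f"{name}: best params = {search.best_params_}")
    print(f"{name}: best CV accuracy = {search.best_score_:.3f}, test accuracy = {test_acc:.3f}")
```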

# Import necessary libraries and modules
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset - a popular dataset for classification, consisting of 3 classes of iris plants
iris = datasets.load_iris()
X = iris.data          # Features of the dataset (sepal and petal measurements)
y = iris.target        # Target labels (species of iris)

# Split the dataset into training and testing sets.
# 80% of data will be used for training and 20% will be used for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (standardization): This is done to bring all features to a similar scale.
# StandardScaler standardizes features by removing the mean and scaling to unit variance.
scaler = StandardScaler()

# Fit the scaler on training data and transform it.
X_train = scaler.fit_transform(X_train)

# Transform the test data using the same scaler. It's important not to fit again to avoid data leakage.
X_test = scaler.transform(X_test)

# SVM Classifier: Support Vector Machine with a linear kernel.
svm_model = SVC(kernel='linear', C=1)   # Initialize the SVM model with a linear kernel and C (regularization) value of 1.
svm_model.fit(X_train, y_train)          # Train the SVM model on the training data.
svm_predictions = svm_model.predict(X_test)  # Use the trained model to predict the labels of the test data.

# Calculate the accuracy of the SVM model's predictions.
svm_accuracy = accuracy_score(y_test, svm_predictions)

# Generate a detailed classification report showing performance metrics for the SVM.
svm_classification_report = classification_report(y_test, svm_predictions)

# Display the SVM's accuracy and classification report.
print("SVM Accuracy:", svm_accuracy)
print("SVM Classification Report:\n", svm_classification_report)


# Random Forest Classifier: An ensemble of decision trees.
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)  # Initialize the Random Forest with 100 trees and a fixed random state for reproducibility.
rf_model.fit(X_train, y_train)  # Train the Random Forest model on the training data.
rf_predictions = rf_model.predict(X_test)  # Use the trained model to predict the labels of the test data.

# Calculate the accuracy of the Random Forest model's predictions.
rf_accuracy = accuracy_score(y_test, rf_predictions)

# Generate a detailed classification report showing performance metrics for the Random Forest.
rf_classification_report = classification_report(y_test, rf_predictions)

# Display the Random Forest's accuracy and classification report.
print("Random Forest Accuracy:", rf_accuracy)
print("Random Forest Classification Report:\n", rf_classification_report)


SVM Accuracy: 0.9666666666666667
SVM Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Random Forest Accuracy: 1.0
Random Forest Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
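
Because the test set holds only 30 samples, a single misclassification moves accuracy by about 3.3 percentage points; the gap between the SVM (0.967, i.e. 29/30) and the Random Forest (1.0) above corresponds to exactly one test sample. A comparison that depends less on one particular split can be sketched with repeated stratified k-fold cross-validation, keeping the same model settings as the main cell. The choice of 5 folds repeated 5 times is an assumption, and the printed numbers will differ from the single-split results above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5 stratified folds, repeated 5 times with different shuffles = 25 train/test splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=42)

# Same model settings as the main cell; scaling sits inside each pipeline.
models = {
    "SVM (linear, C=1)": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1)),
    "Random Forest (100 trees)": make_pipeline(
        StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=42)
    ),
}

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)  # one accuracy value per split
    cv_results[name] = scores
    print(f"{name:26s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f} ({len(scores)} splits)")
```

If the two accuracy ranges overlap substantially, the single-split difference above is probably not, on its own, strong evidence that one model is better on this dataset.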

