# Model Training Notebook
This notebook trains five classification models on a downsampled dataset. It loads the preprocessed training data, fits each classifier, and saves the fitted models to disk.

## Data Loading Section

In [2]:
import joblib
# Load the training data
X_train_downsampled = joblib.load('../Data/X_train_downsampled.joblib')
y_train_downsampled = joblib.load('../Data/y_train_downsampled.joblib')
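
Before training, it can help to confirm that the loaded features and labels have matching lengths and that the classes are balanced after downsampling. The sketch below is a minimal illustration, not part of the original pipeline: it uses small synthetic arrays written to a temporary directory as a stand-in for the real files (whose shapes and labels are not shown here), and `summarize` is a hypothetical helper.

```python
import os
import tempfile
from collections import Counter

import joblib
import numpy as np


def summarize(X, y):
    """Return (n_samples, n_features, class_counts) for features X and labels y."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y have different numbers of samples")
    return X.shape[0], X.shape[1], dict(Counter(y.tolist()))


# Synthetic stand-in data (assumption: the real data is a 2-D feature
# matrix plus a 1-D label vector)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = np.array([0] * 10 + [1] * 10)

with tempfile.TemporaryDirectory() as tmp:
    X_path = os.path.join(tmp, "X_demo.joblib")
    y_path = os.path.join(tmp, "y_demo.joblib")
    joblib.dump(X_demo, X_path)
    joblib.dump(y_demo, y_path)
    # Round-trip through joblib, then summarize what was loaded
    n_samples, n_features, class_counts = summarize(
        joblib.load(X_path), joblib.load(y_path)
    )

print(n_samples, n_features, class_counts)
```

With the real data, the same helper could be called as `summarize(X_train_downsampled, y_train_downsampled)`.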


## Model Overview

The code initializes five classification models:

### Random Forest Classifier

An ensemble method that combines the predictions of many decision trees. It handles non-linear relationships well.
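
To illustrate the point about non-linear relationships, the following sketch compares a random forest with a linear model on a synthetic dataset of two concentric rings, which no straight line can separate. The dataset and the comparison are assumptions chosen for illustration and are unrelated to the notebook's own data.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two concentric rings: the class boundary is a circle, not a line
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
linear = LogisticRegression(random_state=42).fit(X_tr, y_tr)

# Held-out accuracy for each model
forest_acc = forest.score(X_te, y_te)
linear_acc = linear.score(X_te, y_te)
print(f"RandomForest: {forest_acc:.2f}, LogisticRegression: {linear_acc:.2f}")
```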


### Support Vector Machine (SVM)

Effective in high-dimensional feature spaces. It works well when a clear margin of separation exists between the classes.


### K-Nearest Neighbors (KNN)

An instance-based method that classifies each sample by the labels of its nearest training points. It is simple but effective for many classification tasks.
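
Because KNN classifies by distance to stored training points, features measured on larger scales can dominate the neighbor search. A common precaution, which this notebook does not apply and which may be unnecessary if the data was already scaled during preprocessing, is to standardize features inside a pipeline. The sketch below uses scikit-learn's bundled wine dataset purely as an assumed example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The wine features span very different numeric ranges
X, y = load_wine(return_X_y=True)

raw_knn = KNeighborsClassifier()
# The scaler is fit on each training fold only, avoiding leakage
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier())

raw_acc = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_knn, X, y, cv=5).mean()
print(f"KNN without scaling: {raw_acc:.2f}, with scaling: {scaled_acc:.2f}")
```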


### Logistic Regression

A linear classifier that outputs class probabilities. It is a good baseline for binary classification.
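
The probabilistic output can be inspected with `predict_proba`, which returns one probability per class for each sample. This is a minimal sketch on synthetic data generated with `make_classification`; the data is an assumption and is not drawn from the notebook's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
clf = LogisticRegression(random_state=42).fit(X, y)

proba = clf.predict_proba(X[:5])  # shape (5, 2): one column per class
labels = clf.predict(X[:5])       # the class with the higher probability
print(np.round(proba, 3))
print(labels)
```

Each row of `proba` sums to 1, and the columns follow the order of `clf.classes_`.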


### Gradient Boosting Classifier

An ensemble method that builds trees sequentially, each one correcting the errors of the previous ones. It often achieves high accuracy but may require hyperparameter tuning.
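
One way to approach the tuning mentioned above is a small cross-validated grid search. The sketch below is illustrative only: the parameter grid, the synthetic data, and the accuracy metric are all assumptions, and the notebook itself trains the model with default settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Hypothetical grid; useful ranges depend on the dataset
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on all of the data with the best parameter combination found.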

In [3]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
import joblib

# Initialize models
models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "KNN": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42)
}

# Train each model on the downsampled training data and save them to the Models directory
for model_name, model in models.items():
    # Fit the model
    model.fit(X_train_downsampled, y_train_downsampled)
    
    # Save the model using joblib
    joblib.dump(model, f'../Models/{model_name}_model.joblib')
    print(f"{model_name} model trained and saved.")


RandomForest model trained and saved.
SVM model trained and saved.
KNN model trained and saved.
LogisticRegression model trained and saved.
GradientBoosting model trained and saved.
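
A saved model can later be reloaded with `joblib.load` and used for prediction without retraining. The sketch below demonstrates the round trip with a stand-in model and synthetic data in a temporary directory, so it does not touch the files in `../Models/`; the data and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=42)
model = LogisticRegression(random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "LogisticRegression_model.joblib")
    joblib.dump(model, path)       # persist the fitted model
    restored = joblib.load(path)   # read it back into memory

# The restored model should reproduce the original predictions
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.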
