# **MODEL DEVELOPMENT NOTEBOOK**

## Objectives

* Train several classification models on the prepared train set, compare them on the test set, and save the best-performing model.

## Inputs

* The prepared data splits in `outputs/data`: `X_train.csv`, `X_test.csv`, `y_train.csv` and `y_test.csv`

## Outputs

* `models/best_model.pkl`: the selected model, serialised with `joblib`

## Additional Comments

* The model saved at the end defaults to the Random Forest (`rf`); change `best_model` in the Save Files section if another model is chosen.


---

# Set Project Root Directory

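Centralise the base path in `project_root` so that all file paths are built relative to it, whether the notebook is run from the project root or from the `jupyter_notebooks` folder.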

In [None]:
import os
from pathlib import Path

# Resolve the project root
project_root = Path.cwd()
if project_root.name == "jupyter_notebooks":
    project_root = project_root.parent
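
The folder-name check above works when the notebook sits directly in `jupyter_notebooks`. As a rough alternative, the root could be located by walking up the directory tree until a marker file is found. This is only a sketch: the function name `find_project_root` and the default marker names (`README.md`, `.git`) are assumptions for illustration, not part of this project.

```python
from pathlib import Path


def find_project_root(start, markers=("README.md", ".git")):
    """Walk upwards from `start` and return the first directory that
    contains one of `markers` (hypothetical marker names)."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    # No marker found anywhere: fall back to the starting directory
    return start


project_root = find_project_root(Path.cwd())
print(project_root)
```

If no marker is found, the function returns the starting directory, which matches the behaviour of the cell above when run from the project root.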

# Import Libraries

In this section, the libraries used throughout the notebook are imported and the plotting settings are applied.

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib

# Settings
%matplotlib inline
sns.set(style="whitegrid")

# ML libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

---

# Load Train & Test Sets

In this section, the prepared train and test sets are loaded from `outputs/data`.

In [None]:
data_path = project_root / "outputs" / "data"

X_train = pd.read_csv(data_path / "X_train.csv")
X_test = pd.read_csv(data_path / "X_test.csv")
y_train = pd.read_csv(data_path / "y_train.csv").values.ravel()
y_test = pd.read_csv(data_path / "y_test.csv").values.ravel()
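
Before training, it can help to confirm that the loaded splits are consistent with each other. The check below is a minimal sketch: `check_split` is a hypothetical helper, and it only assumes that the feature tables support `len()` and expose a `columns` attribute, as pandas DataFrames do.

```python
def check_split(X_train, X_test, y_train, y_test):
    """Raise ValueError if features and targets do not line up.

    Hypothetical helper; returns True when all checks pass.
    """
    if len(X_train) != len(y_train):
        raise ValueError(
            f"Train rows differ: {len(X_train)} features vs {len(y_train)} targets"
        )
    if len(X_test) != len(y_test):
        raise ValueError(
            f"Test rows differ: {len(X_test)} features vs {len(y_test)} targets"
        )
    if list(X_train.columns) != list(X_test.columns):
        raise ValueError("Train and test sets have different feature columns")
    return True


# In this notebook it could be called right after loading:
#   check_split(X_train, X_test, y_train, y_test)
```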

---

# Model Development

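In this section, a helper function is defined and used to train several classifiers and compare them on the test set.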

## Helper Function for Training Models

In [None]:
# Train and quickly evaluate any given model
def train_and_evaluate(model, model_name):
    """Fit the model on X_train/y_train, print test metrics, return (model, accuracy)."""
    print(f"\n🧪 {model_name}")
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    acc = accuracy_score(y_test, predictions)
    print("✅ Accuracy:", round(acc, 4))
    print("📉 Confusion Matrix:\n", confusion_matrix(y_test, predictions))
    print("📋 Classification Report:\n", classification_report(y_test, predictions))

    return model, acc


## Train Multiple Models

To start, several models using different algorithms are trained with mostly default settings to find the best-performing one for this data.

In [None]:
# Logistic Regression
logreg = LogisticRegression(max_iter=2000, random_state=42)
train_and_evaluate(logreg, "Logistic Regression")

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
train_and_evaluate(rf, "Random Forest")

# Linear Support Vector Classifier
linear_svc = LinearSVC(max_iter=10000, random_state=42)
train_and_evaluate(linear_svc, "Linear Support Vector Classifier")

# XGBoost (use_label_encoder is deprecated in recent XGBoost releases, so it is omitted)
xgb = XGBClassifier(eval_metric='mlogloss', random_state=42)
train_and_evaluate(xgb, "XGBoost")
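
The calls above print each model's metrics, but the returned accuracies are not kept. One way to compare the models side by side is to collect the accuracies in a dictionary and sort them. The snippet below is a sketch only: `rank_models` is a hypothetical helper, and the scores are placeholder values for illustration, not results from this notebook.

```python
def rank_models(scores):
    """Return (name, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder accuracies for illustration only. In the notebook they could be
# filled from the second return value of train_and_evaluate, for example:
#   _, scores["Random Forest"] = train_and_evaluate(rf, "Random Forest")
scores = {"model_a": 0.81, "model_b": 0.86, "model_c": 0.79}

ranking = rank_models(scores)
best_name, best_acc = ranking[0]

for name, acc in ranking:
    print(f"{name}: {acc:.4f}")
print("Best model:", best_name)
```

Accuracy is the only criterion here; the confusion matrix and classification report printed by the helper may still change which model is preferred.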


# Save Files

## Save Best Performing Default Model

In [None]:
# Replace rf with chosen model
best_model = rf

models_path = project_root / "models"
os.makedirs(models_path, exist_ok=True)
joblib.dump(best_model, models_path / "best_model.pkl")

print(f"✅ Saved best model to {models_path / 'best_model.pkl'}")
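
After saving, the file can be loaded back to confirm that it is usable. The following is a minimal sketch, assuming the model was written with `joblib.dump` as above; `load_saved_model` is a hypothetical helper name.

```python
from pathlib import Path

import joblib


def load_saved_model(model_file):
    """Load a joblib-serialised object, failing early if the file is missing."""
    model_file = Path(model_file)
    if not model_file.is_file():
        raise FileNotFoundError(f"No saved model at {model_file}")
    return joblib.load(model_file)


# In this notebook it could be used as:
#   reloaded = load_saved_model(models_path / "best_model.pkl")
#   assert (reloaded.predict(X_test) == best_model.predict(X_test)).all()
```

Comparing the reloaded model's predictions with those of `best_model` is a quick way to check that the round trip preserved the model.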

---

# Conclusion and Next Steps

* 