# Notebook 2: Baseline Model Training (Isolation Forest)

### Objective
The purpose of this notebook is to train, evaluate, and save a baseline anomaly detection model. We will use the `IsolationForest` algorithm from scikit-learn as our first model.

### Why Isolation Forest?
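Isolation Forest is a good choice for a baseline model because it is computationally efficient (each tree is built on a small random subsample of the data) and handles the "obvious spike" anomalies in this dataset well. It is an unsupervised algorithm. It builds an ensemble of random trees, and each tree recursively splits the data on a randomly chosen feature at a random threshold between that feature's minimum and maximum. Anomalies are few and lie far from the bulk of the data, so they tend to be isolated after fewer splits. A short average path length from the root to a point's leaf therefore marks that point as anomalous.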
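To make the path-length idea concrete, here is a minimal sketch on synthetic data rather than the NAB series. It uses one feature centred around 50 plus three injected spikes, and all values are made up for illustration. In scikit-learn, `fit_predict` returns `-1` for anomalies and `1` for normal points, and `score_samples` is lower for points that are easier to isolate. The `contamination=0.01` setting is an illustrative choice, not a value taken from this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic "CPU utilization" values (illustrative only): mostly around 50, plus three spikes.
rng = np.random.default_rng(0)
normal = rng.normal(loc=50.0, scale=2.0, size=(500, 1))
spikes = np.array([[95.0], [98.0], [100.0]])
X = np.vstack([normal, spikes])

# contamination is the expected fraction of anomalies; it sets the decision threshold.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

# Lower score_samples values mean shorter average path lengths, i.e. easier to isolate.
scores = model.score_samples(X)
spike_idx = np.arange(len(normal), len(X))

print("Points flagged as anomalies:", int((labels == -1).sum()))
print("Labels assigned to the spikes:", labels[spike_idx])
print("Mean score, normal points:", round(float(scores[: len(normal)].mean()), 3))
print("Mean score, spikes:       ", round(float(scores[spike_idx].mean()), 3))
```

Because `contamination` fixes the fraction of points labeled `-1`, a few normal points at the edges of the distribution may be flagged alongside the spikes; comparing the mean scores shows the gap between the two groups.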

### Key Steps
1.  **Setup**: Import libraries and configure paths.
2.  **Data Loading**: Load the prepared dataset.
3.  **Feature Selection**: Select the feature(s) to be used for training.
4.  **Model Training**: Train the `IsolationForest` model.
5.  **Evaluation**: Evaluate the model's performance against our known labels.
6.  **Model Serialization**: Save the trained model to a file for later use in our API.
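Step 5 involves one mapping detail that is easy to miss: `IsolationForest.predict` returns `-1` for anomalies and `1` for normal points, while ground-truth labels are usually encoded as `1` = anomaly and `0` = normal. The sketch below shows that conversion followed by the metrics imported in the setup cell. The helper name `to_binary_labels` and the eight-step arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def to_binary_labels(iforest_pred: np.ndarray) -> np.ndarray:
    """Map IsolationForest output (-1 = anomaly, 1 = normal) to 1 = anomaly, 0 = normal."""
    return (iforest_pred == -1).astype(int)


# Hypothetical ground truth and model output for eight time steps.
y_true = np.array([0, 0, 1, 0, 0, 1, 0, 0])
iforest_pred = np.array([1, 1, -1, -1, 1, 1, 1, 1])

y_pred = to_binary_labels(iforest_pred)

# Rows are true classes [normal, anomaly]; columns are predicted classes in the same order.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_true, y_pred, target_names=["normal", "anomaly"], zero_division=0))
```

Skipping this mapping would make the metrics treat every normal prediction (`1`) as an anomaly, so it is worth applying before any evaluation call.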

### Expected Outcome
A trained `IsolationForest` model saved as a `.joblib` file, plus a performance report (classification report and confusion matrix) measuring how well the model identifies the labeled anomalies in our dataset.
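For the serialization step, `joblib.dump` writes the fitted estimator to disk and `joblib.load` restores it. Below is a minimal round-trip sketch. It assumes synthetic training data and a hypothetical file name, `isolation_forest_baseline.joblib`, written to a temporary directory instead of `MODELS_DIR`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic single-feature training data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(50.0, 2.0, size=(200, 1))

model = IsolationForest(random_state=0).fit(X)

with tempfile.TemporaryDirectory() as models_dir:
    # Hypothetical file name; the notebook itself would save into MODELS_DIR.
    model_path = os.path.join(models_dir, "isolation_forest_baseline.joblib")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)
    # The reloaded estimator should reproduce the original predictions exactly.
    same_predictions = np.array_equal(model.predict(X), restored.predict(X))

print("Predictions identical after reload:", same_predictions)
```

scikit-learn's model persistence documentation advises loading a saved model with the same scikit-learn version that produced it, so it is worth recording that version alongside the `.joblib` file used by the API.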

In [None]:
import pandas as pd
import numpy as np
import json
import os
import joblib

from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (18, 6)

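# --- Path Configuration ---
# BASE_DIR points one level above the notebook's working directory (the project root).
BASE_DIR = ".."
DATA_ROOT_DIR = os.path.join(BASE_DIR, "data", "raw", "NAB-master")
MODELS_DIR = os.path.join(BASE_DIR, "models")
os.makedirs(MODELS_DIR, exist_ok=True)  # output folder for the trained model

# --- Dataset Selection (consistent with Notebook 1) ---
# DATASET_NAME is also the key for this series in NAB's combined_labels.json,
# so it keeps forward slashes rather than OS-specific separators.
DATASET_NAME = "realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv"
LABELS_FILE = os.path.join("labels", "combined_labels.json")

data_path = os.path.join(DATA_ROOT_DIR, "data", DATASET_NAME)
label_path = os.path.join(DATA_ROOT_DIR, LABELS_FILE)

# Fail early if the NAB files are not where we expect them.
for path in (data_path, label_path):
    if not os.path.exists(path):
        raise FileNotFoundError(f"Expected file not found: {path}")

print("Configuration completed")
print(f"Data:   {data_path}")
print(f"Labels: {label_path}")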
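The configured `label_path` points to NAB's `combined_labels.json`. Assuming the standard NAB layout, that file maps each dataset path (such as the value of `DATASET_NAME`) to a list of anomaly timestamp strings. The sketch below turns those timestamps into a 0/1 column for evaluation. `attach_nab_labels` is a hypothetical helper, and the three-row frame and label dict are made-up stand-ins for the real files.

```python
import pandas as pd


def attach_nab_labels(df: pd.DataFrame, labels: dict, dataset_key: str) -> pd.DataFrame:
    """Return a copy of df with a 0/1 'is_anomaly' column built from NAB-style labels."""
    # Missing keys yield an empty list, i.e. no labeled anomalies for that series.
    anomaly_times = pd.to_datetime(labels.get(dataset_key, []))
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out["is_anomaly"] = out["timestamp"].isin(anomaly_times).astype(int)
    return out


# In-memory stand-ins for the CSV at data_path and the JSON at label_path (illustrative values).
key = "realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv"
toy_df = pd.DataFrame({
    "timestamp": ["2014-01-01 00:00:00", "2014-01-01 00:05:00", "2014-01-01 00:10:00"],
    "value": [0.13, 0.14, 2.50],
})
toy_labels = {key: ["2014-01-01 00:10:00"]}

labeled = attach_nab_labels(toy_df, toy_labels, key)
print(labeled)
```

With the real files, `df = pd.read_csv(data_path)` and `labels = json.load(f)` (inside `with open(label_path) as f:`) would replace the stand-ins, assuming the CSV uses NAB's `timestamp` and `value` columns.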