# 3 Model Training and Prediction

This notebook trains and evaluates a binary classifier for the `tsunami` target, using the earthquake feature matrices with depth-2 interaction terms in both their imputed and raw variants. It covers model selection, training, prediction, evaluation, and export of results.

## Contents

- **3.1 Load Transformed Dataset**
- **3.2 Define Target and Features**
- **3.3 Train-Test Split**
- **3.4 Model Training**
- **3.5 Prediction and Evaluation**
- **3.6 Export Predictions**
- **3.7 Save Model Artifact**

Import pandas, which handles data loading and manipulation in the cells below.

In [1]:
# Import required libraries
import pandas as pd

## 3.1 Load Transformed Dataset

Load the imputed and raw feature matrices with depth-2 interactions from the export stage, and list the feature names associated with each.

In [2]:
# Load the imputed dataset and the feature names associated with it
df_imputed = pd.read_csv('../data/interaction/earthquake_imputed_2way.csv')
features_imputed = ['dmin', 'Year', 'cdi', 'dmin:Year']

# Load the raw dataset and the feature names associated with it
df_raw = pd.read_csv('../data/interaction/earthquake_raw_2way.csv')
features_raw = ['Year', 'nst', 'sig', 'magnitude', 'Year:magnitude', 'depth']
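
Before moving on, it can help to confirm that the feature names listed above exist as columns in the loaded frames, since interaction columns such as `dmin:Year` depend on how the export stage named them. The sketch below is an illustration only: `missing_columns` is a hypothetical helper, and `demo` is a small stand-in frame with made-up values rather than the real CSV files.

```python
import pandas as pd


def missing_columns(df, expected):
    """Return the names in `expected` that are not columns of `df`."""
    return [name for name in expected if name not in df.columns]


# Stand-in frame (hypothetical values) mimicking the imputed dataset's layout
demo = pd.DataFrame({
    "dmin": [0.5, 1.2],
    "Year": [2010, 2015],
    "cdi": [3, 7],
    "dmin:Year": [1005.0, 2418.0],
    "tsunami": [0, 1],
})

features_demo = ["dmin", "Year", "cdi", "dmin:Year"]

# An empty list means every listed feature is present in the frame
absent = missing_columns(demo, features_demo)
print(absent)
```

On the real data, the equivalent calls would be `missing_columns(df_imputed, features_imputed)` and `missing_columns(df_raw, features_raw)`.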

## 3.2 Define Target and Features

Specify the target variable for prediction and construct the feature matrix. This step isolates the outcome column (`tsunami`) from the rest of the dataset, preparing inputs for model training.

- Target variable: `tsunami` (binary classification)
- Feature matrix: all other columns of each transformed dataset (imputed and raw)
- No feature pruning or filtering is applied at this stage
- Class distribution is printed for diagnostic clarity

In [3]:
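# Define target column (binary tsunami indicator)
target = 'tsunami'

# Separate features and target for the imputed dataset
X_imputed = df_imputed.drop(columns=[target])
y_imputed = df_imputed[target]

# Separate features and target for the raw dataset
X_raw = df_raw.drop(columns=[target])
y_raw = df_raw[target]

# Print class distribution for diagnostics
print(y_imputed.value_counts())
print(y_raw.value_counts())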
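
A quick sanity check on the separation can catch target leakage or misaligned rows before any model is trained. The sketch below is not part of the original pipeline: `check_split` is a hypothetical helper, the stand-in frame uses made-up values, and the check assumes the target is encoded as 0/1, which the source does not state explicitly.

```python
import pandas as pd


def check_split(X, y, target):
    """Raise AssertionError if the feature/target separation looks wrong."""
    assert target not in X.columns, "target column leaked into the features"
    assert len(X) == len(y), "features and target have different row counts"
    assert X.index.equals(y.index), "features and target are misaligned"
    # Assumption: the binary target is encoded as 0/1
    assert set(y.dropna().unique()) <= {0, 1}, "target is not binary 0/1"


# Stand-in frame (hypothetical values), not the real earthquake data
demo = pd.DataFrame({
    "Year": [2010, 2015, 2020],
    "depth": [10.0, 35.5, 600.0],
    "tsunami": [0, 1, 0],
})

X_demo = demo.drop(columns=["tsunami"])
y_demo = demo["tsunami"]

# Passes silently when the separation is consistent
check_split(X_demo, y_demo, "tsunami")
print(X_demo.shape, y_demo.shape)
```

With the notebook's own variables, the calls would be `check_split(X_imputed, y_imputed, target)` and `check_split(X_raw, y_raw, target)`.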