# ErgoPose Risk Classifier: Model Training

This notebook is the **third stage** of the *ErgoPose Risk Classifier* project.  
It defines, trains, and evaluates the **Artificial Neural Network (ANN)** for posture classification based on the preprocessed dataset.

### Objectives
- Load the cleaned dataset from `data/processed/`.
- Encode categorical posture labels.
- Split the data into training and testing sets.
- Define and train an ANN model for posture classification (binary in this notebook: `output_neurons = 2`).
- Save the trained model and scaler to the `models/` directory.

### Input and Output
- **Input:** `data/processed/clean_postural_risk_dataset.csv`  
- **Outputs:**  
  - `models/neural_network.pkl`  
  - `models/scaler.pkl`
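
As a rough illustration of how the label encoding, scaling, and persistence steps listed above fit together, here is a minimal sketch on toy data. The array values and the label names `"risky"` and `"safe"` are hypothetical stand-ins, not taken from the real dataset, and the scaler is written to a temporary directory rather than to `models/`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy stand-ins for the real dataset (hypothetical values and label names).
X_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_toy = np.array(["risky", "safe", "safe", "risky"])

# Map string labels to integers; classes_ keeps the sorted original names.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_toy)

# Fit the scaler on the toy features (zero mean, unit variance per column).
scaler = StandardScaler().fit(X_toy)

# Persist and reload the scaler with joblib, here in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_file = Path(tmp_dir) / "scaler.pkl"
    joblib.dump(scaler, scaler_file)
    restored_scaler = joblib.load(scaler_file)

# The reloaded scaler applies the same transformation as the original.
X_scaled = restored_scaler.transform(X_toy)
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
```

Saving the scaler alongside the model matters because new samples must be transformed with the same statistics that were used during training.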

In [2]:
"""
Imports the necessary libraries for model definition, training, and evaluation.
"""

# [1] Imports
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
import joblib
from math import sqrt


In [3]:
"""
Defines paths for processed data and model output directories.
"""

# [2] Paths configuration
DATA_PATH = Path("../data/processed/clean_postural_risk_dataset.csv")
MODELS_PATH = Path("../models")
MODELS_PATH.mkdir(parents=True, exist_ok=True)  # create the directory (and any missing parents)

print(f"Dataset path: {DATA_PATH}")
print(f"Models directory: {MODELS_PATH}")


Dataset path: ..\data\processed\clean_postural_risk_dataset.csv
Models directory: ..\models
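
The backslashes in the output above come from running the notebook on Windows: `pathlib` accepts forward slashes in the source and renders the path with the platform's separator. A small sketch using the pure path classes (which do not touch the filesystem) shows the same relative path under both conventions; the variable names are illustrative only.

```python
from pathlib import PurePosixPath, PureWindowsPath

relative = "../data/processed/clean_postural_risk_dataset.csv"

# The same path string, interpreted under each platform's conventions.
windows_view = PureWindowsPath(relative)
posix_view = PurePosixPath(relative)

print(windows_view)             # rendered with backslashes
print(posix_view)               # rendered with forward slashes
print(windows_view.as_posix())  # forward slashes regardless of platform
```

Using `as_posix()` when printing or logging paths keeps the output identical across operating systems.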


In [4]:
"""
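Loads the dataset, separates the features from the target, and estimates a
search range for the hidden-layer size with the 'Geometric Pyramid Rule':
sqrt(inputs * outputs), bracketed from half to double that value.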
"""

data = pd.read_csv(DATA_PATH)

X = data.drop(columns=['upperbody_label'])
y = data['upperbody_label']

input_neurons = X.shape[1]
output_neurons = 2  # Binary classification

# Geometric mean of the input and output layer sizes, truncated to an integer
input_output_neurons = int(sqrt(input_neurons * output_neurons))

print(f"The number of neurons in hidden layers will be: {int(input_output_neurons*0.5)} <= N <= {int(input_output_neurons*2)}")

The number of neurons in hidden layers will be: 5 <= N <= 20
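
To make the arithmetic behind this range explicit, the sketch below wraps the same calculation in a helper. The function name `hidden_neuron_range` is hypothetical. The 51 input features match the column count reported by the train/test split in this notebook, and the 0.5x and 2x bounds mirror the cell above rather than any fixed standard.

```python
from math import sqrt


def hidden_neuron_range(n_inputs: int, n_outputs: int) -> tuple:
    """Return (lower, center, upper) for the hidden-layer size search."""
    # Geometric mean of the layer sizes, truncated as in the notebook cell.
    center = int(sqrt(n_inputs * n_outputs))
    return int(center * 0.5), center, int(center * 2)


# 51 features and 2 output classes: sqrt(102) is about 10.1, truncated to 10.
lower, center, upper = hidden_neuron_range(51, 2)
print(f"{lower} <= N <= {upper} (center: {center})")
```

The rule is a heuristic starting point; the final layer size is still something to validate empirically.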


In [5]:
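"""
Splits the data into training (70%) and testing (30%) sets, stratified by label
so that both sets keep the same class proportions.
"""

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)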

print(f"Training data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")

Training data shape: (3355, 51)
Testing data shape: (1439, 51)
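
The reported shapes can be checked by hand. The sketch below assumes that `train_test_split` rounds a fractional test share up to the next whole row and assigns the remainder to training, which is consistent with the numbers printed above.

```python
from math import ceil

n_train, n_test = 3355, 1439   # row counts printed above
n_total = n_train + n_test     # rows in the full dataset
test_size = 0.3

# Assumption: the test share is rounded up, the rest goes to training.
expected_test = ceil(test_size * n_total)
expected_train = n_total - expected_test

print(f"total={n_total}, train={expected_train}, test={expected_test}")
print(f"actual test fraction: {n_test / n_total:.4f}")
```

Because 30% of the rows is not a whole number, the actual test fraction ends up marginally above 0.3.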
