## NGT Sign Language Recognition - Model Training Pipeline

Trains a Random Forest classifier to recognize NGT (Sign Language of the Netherlands) letter signs, with hyperparameters tuned by Optuna.

Features:

- Optuna hyperparameter optimization (100 trials)

- Stratified 5-fold cross-validation during tuning

- Saves best model with joblib

Output:

`models/random_forest.joblib` - Trained model

In [9]:
import os
import pandas as pd
import numpy as np
from datetime import datetime

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import joblib
import optuna
from optuna.samplers import TPESampler

### Configuration

In [10]:
# Paths
DATA_PATH = "data/samples.csv"
MODEL_DIR = "models"
MODEL_PATH = os.path.join(MODEL_DIR, "random_forest.joblib")
REPORT_PATH = os.path.join(MODEL_DIR, "training_report.txt")

# Training settings
TEST_SIZE = 0.2
RANDOM_STATE = 42
CV_FOLDS = 5
OPTUNA_TRIALS = 100

# Optuna hyperparameter search space
PARAM_SPACE = {
    'n_estimators': (20, 300),
    'max_depth': (5, 50),
    'min_samples_split': (2, 20),
    'min_samples_leaf': (1, 10),
    'max_features': ['sqrt', 'log2', None]
}

print("Configuration loaded!")
print(f"  Data path: {DATA_PATH}")
print(f"  Model will be saved to: {MODEL_PATH}")

Configuration loaded!
  Data path: data/samples.csv
  Model will be saved to: models\random_forest.joblib
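For a sense of scale, the search space defined above can be counted directly. The sketch below repeats `PARAM_SPACE` and `OPTUNA_TRIALS` from the configuration cell and multiplies the number of choices per hyperparameter. It assumes the integer ranges are inclusive on both ends, which is how `trial.suggest_int` treats them. The helper `count_choices` is hypothetical and not part of the notebook.

```python
from math import prod

# Copied from the configuration cell
PARAM_SPACE = {
    'n_estimators': (20, 300),
    'max_depth': (5, 50),
    'min_samples_split': (2, 20),
    'min_samples_leaf': (1, 10),
    'max_features': ['sqrt', 'log2', None],
}
OPTUNA_TRIALS = 100


def count_choices(spec):
    """Number of distinct values for one hyperparameter (hypothetical helper)."""
    if isinstance(spec, tuple):   # inclusive integer range (low, high)
        low, high = spec
        return high - low + 1
    return len(spec)              # list of categorical options


grid_size = prod(count_choices(spec) for spec in PARAM_SPACE.values())
coverage = OPTUNA_TRIALS / grid_size

print(f"Distinct configurations: {grid_size:,}")
print(f"Share visited by {OPTUNA_TRIALS} trials: {coverage:.4%}")
```

With several million combinations and only 100 trials, an exhaustive grid search would be impractical, which is the usual motivation for a sampler-based search such as TPE.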


### Data Loading

In [11]:
# Load dataset
df = pd.read_csv(DATA_PATH)

# Separate features and labels
X = df.drop('label', axis=1)
y = df['label']

print("Dataset loaded!")  # plain string: no placeholders, so no f-prefix needed
print(f"  Total samples: {len(df)}")
print(f"  Features: {X.shape[1]}")
print(f"  Classes: {y.nunique()} letters")

# Show samples per class
print("Samples per letter:")
sample_counts = y.value_counts().sort_index()
for label, count in sample_counts.items():
    bar = "█" * min(count // 2, 20)
    print(f"  {label}: {count:3d} {bar}")

Dataset loaded!
  Total samples: 690
  Features: 42
  Classes: 23 letters
Samples per letter:
  A:  30 ███████████████
  B:  30 ███████████████
  C:  30 ███████████████
  D:  30 ███████████████
  E:  30 ███████████████
  F:  30 ███████████████
  G:  30 ███████████████
  I:  30 ███████████████
  K:  30 ███████████████
  L:  30 ███████████████
  M:  30 ███████████████
  N:  30 ███████████████
  O:  30 ███████████████
  P:  30 ███████████████
  Q:  30 ███████████████
  R:  30 ███████████████
  S:  30 ███████████████
  T:  30 ███████████████
  U:  30 ███████████████
  V:  30 ███████████████
  W:  30 ███████████████
  X:  30 ███████████████
  Y:  30 ███████████████
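The listing shows 23 classes rather than 26. A quick check, using only the labels printed above, shows which letters of the alphabet have no samples. The variable names here are illustrative. In the notebook itself the same labels could be taken from `y.unique()`.

```python
from string import ascii_uppercase

# Labels exactly as printed in the per-letter listing above
labels_in_dataset = list("ABCDEFGIKLMNOPQRSTUVWXY")

# Letters of the Latin alphabet that never appear as a class label
missing = sorted(set(ascii_uppercase) - set(labels_in_dataset))

print(f"Classes present: {len(labels_in_dataset)}")
print(f"Letters without samples: {missing}")
```

The notebook does not say why these letters are absent, so the classifier should simply be treated as covering the 23 listed letters.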


In [12]:
# Stratified 80% / 20% train/test split (class proportions preserved in both parts)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=TEST_SIZE,
    stratify=y,
    random_state=RANDOM_STATE
)
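The notebook does not print the resulting split sizes. As a back-of-the-envelope check, the sketch below derives them from the dataset summary, assuming the stratified split keeps the perfectly balanced class counts shown earlier. The constants are copied from the output above and the variable names are illustrative.

```python
TOTAL_SAMPLES = 690       # from the dataset summary
SAMPLES_PER_CLASS = 30    # every letter has 30 samples
TEST_SIZE = 0.2           # from the configuration cell

# Overall split sizes
n_test = round(TOTAL_SAMPLES * TEST_SIZE)
n_train = TOTAL_SAMPLES - n_test

# Per-letter split sizes, assuming stratification keeps classes balanced
test_per_class = round(SAMPLES_PER_CLASS * TEST_SIZE)
train_per_class = SAMPLES_PER_CLASS - test_per_class

print(f"Train: {n_train} samples ({train_per_class} per letter)")
print(f"Test:  {n_test} samples ({test_per_class} per letter)")
```

With only a handful of test samples per letter, a single misclassification moves that letter's recall by a large step, so per-class test metrics should be read with that in mind.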

### Optuna Optimization

In [13]:
def objective(trial):
    """Return the mean stratified k-fold CV accuracy for one sampled hyperparameter set."""
    
    params = {
        'n_estimators': trial.suggest_int('n_estimators', *PARAM_SPACE['n_estimators']),
        'max_depth': trial.suggest_int('max_depth', *PARAM_SPACE['max_depth']),
        'min_samples_split': trial.suggest_int('min_samples_split', *PARAM_SPACE['min_samples_split']),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', *PARAM_SPACE['min_samples_leaf']),
        'max_features': trial.suggest_categorical('max_features', PARAM_SPACE['max_features']),
        'random_state': RANDOM_STATE,
        'n_jobs': -1
    }
    
    model = RandomForestClassifier(**params)
    
    cv = StratifiedKFold(n_splits=CV_FOLDS, shuffle=True, random_state=RANDOM_STATE)
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy')
    
    return scores.mean()
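Because the objective is a mean of five fold accuracies on a small training set, it can only take a limited number of distinct values. The sketch below illustrates that granularity. It assumes 552 training samples (80% of 690) dealt as evenly as possible across the five folds; the actual fold sizes are not printed in the notebook, and `mean_cv_accuracy` is a hypothetical helper.

```python
from fractions import Fraction

N_TRAIN = 552   # assumed: 80% of the 690 samples
CV_FOLDS = 5

# Assumed fold sizes: samples dealt as evenly as possible across folds
base, extra = divmod(N_TRAIN, CV_FOLDS)
fold_sizes = [base + 1 if i < extra else base for i in range(CV_FOLDS)]


def mean_cv_accuracy(errors_per_fold):
    """Mean of per-fold accuracies, mirroring scores.mean() in the objective."""
    accuracies = [
        Fraction(size - errors, size)
        for size, errors in zip(fold_sizes, errors_per_fold)
    ]
    return sum(accuracies) / len(accuracies)


# One misclassified validation sample in every fold
one_error_each = float(mean_cv_accuracy([1, 1, 1, 1, 1]))
# Effect of a single misclassified sample on the mean
step = 1 - float(mean_cv_accuracy([1, 0, 0, 0, 0]))

print(f"Fold sizes: {fold_sizes}")
print(f"Mean accuracy with one error per fold: {one_error_each:.6f}")
print(f"Change in the mean from a single error: {step:.6f}")
```

Under these assumptions each misclassified sample shifts the objective by roughly 0.0018, which would explain why many trials in the log report identical scores. This is an inference from the arithmetic, not something the notebook states.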

In [14]:
sampler = TPESampler(seed=RANDOM_STATE)
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=OPTUNA_TRIALS, show_progress_bar=True)

[I 2026-01-24 12:05:26,202] A new study created in memory with name: no-name-58dbabfb-5dd4-4878-97a8-5950bdd7bf2f
Best trial: 0. Best value: 0.978247:   1%|          | 1/100 [00:01<02:39,  1.62s/it]

[I 2026-01-24 12:05:27,816] Trial 0 finished with value: 0.9782473382473382 and parameters: {'n_estimators': 125, 'max_depth': 48, 'min_samples_split': 15, 'min_samples_leaf': 6, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.9782473382473382.


Best trial: 1. Best value: 0.985487:   2%|▏         | 2/100 [00:04<04:09,  2.55s/it]

[I 2026-01-24 12:05:31,021] Trial 1 finished with value: 0.9854873054873055 and parameters: {'n_estimators': 263, 'max_depth': 32, 'min_samples_split': 15, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 1 with value: 0.9854873054873055.


Best trial: 2. Best value: 0.985504:   3%|▎         | 3/100 [00:09<05:55,  3.67s/it]

[I 2026-01-24 12:05:36,005] Trial 2 finished with value: 0.9855036855036854 and parameters: {'n_estimators': 71, 'max_depth': 13, 'min_samples_split': 7, 'min_samples_leaf': 6, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   4%|▍         | 4/100 [00:11<04:49,  3.02s/it]

[I 2026-01-24 12:05:38,024] Trial 3 finished with value: 0.9818673218673218 and parameters: {'n_estimators': 59, 'max_depth': 18, 'min_samples_split': 8, 'min_samples_leaf': 5, 'max_features': 'sqrt'}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   5%|▌         | 5/100 [00:22<08:59,  5.68s/it]

[I 2026-01-24 12:05:48,417] Trial 4 finished with value: 0.9728255528255529 and parameters: {'n_estimators': 186, 'max_depth': 7, 'min_samples_split': 13, 'min_samples_leaf': 2, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   6%|▌         | 6/100 [00:26<08:20,  5.32s/it]

[I 2026-01-24 12:05:53,055] Trial 5 finished with value: 0.9782473382473382 and parameters: {'n_estimators': 247, 'max_depth': 19, 'min_samples_split': 3, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   7%|▋         | 7/100 [00:27<06:00,  3.87s/it]

[I 2026-01-24 12:05:53,956] Trial 6 finished with value: 0.9782637182637183 and parameters: {'n_estimators': 29, 'max_depth': 46, 'min_samples_split': 6, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   8%|▊         | 8/100 [00:29<04:49,  3.14s/it]

[I 2026-01-24 12:05:55,522] Trial 7 finished with value: 0.970974610974611 and parameters: {'n_estimators': 71, 'max_depth': 49, 'min_samples_split': 16, 'min_samples_leaf': 10, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:   9%|▉         | 9/100 [00:30<03:48,  2.51s/it]

[I 2026-01-24 12:05:56,656] Trial 8 finished with value: 0.9855036855036854 and parameters: {'n_estimators': 44, 'max_depth': 14, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  10%|█         | 10/100 [00:32<03:46,  2.51s/it]

[I 2026-01-24 12:05:59,176] Trial 9 finished with value: 0.9800819000819001 and parameters: {'n_estimators': 120, 'max_depth': 17, 'min_samples_split': 12, 'min_samples_leaf': 2, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  11%|█         | 11/100 [00:35<03:53,  2.63s/it]

[I 2026-01-24 12:06:02,065] Trial 10 finished with value: 0.9727927927927927 and parameters: {'n_estimators': 164, 'max_depth': 30, 'min_samples_split': 20, 'min_samples_leaf': 10, 'max_features': 'log2'}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  12%|█▏        | 12/100 [00:37<03:25,  2.33s/it]

[I 2026-01-24 12:06:03,714] Trial 11 finished with value: 0.9782309582309582 and parameters: {'n_estimators': 87, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': 'log2'}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  13%|█▎        | 13/100 [00:38<02:42,  1.87s/it]

[I 2026-01-24 12:06:04,529] Trial 12 finished with value: 0.9764619164619166 and parameters: {'n_estimators': 23, 'max_depth': 11, 'min_samples_split': 7, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  14%|█▍        | 14/100 [00:40<02:48,  1.96s/it]

[I 2026-01-24 12:06:06,696] Trial 13 finished with value: 0.9746109746109746 and parameters: {'n_estimators': 109, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 8, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  15%|█▌        | 15/100 [00:44<03:32,  2.50s/it]

[I 2026-01-24 12:06:10,452] Trial 14 finished with value: 0.9837018837018837 and parameters: {'n_estimators': 203, 'max_depth': 14, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  16%|█▌        | 16/100 [00:45<02:55,  2.09s/it]

[I 2026-01-24 12:06:11,579] Trial 15 finished with value: 0.9800491400491399 and parameters: {'n_estimators': 52, 'max_depth': 39, 'min_samples_split': 4, 'min_samples_leaf': 5, 'max_features': 'log2'}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  17%|█▋        | 17/100 [00:50<04:18,  3.11s/it]

[I 2026-01-24 12:06:17,074] Trial 16 finished with value: 0.9818837018837019 and parameters: {'n_estimators': 300, 'max_depth': 24, 'min_samples_split': 9, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  18%|█▊        | 18/100 [00:52<03:47,  2.78s/it]

[I 2026-01-24 12:06:19,066] Trial 17 finished with value: 0.9746109746109746 and parameters: {'n_estimators': 96, 'max_depth': 11, 'min_samples_split': 2, 'min_samples_leaf': 8, 'max_features': None}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 2. Best value: 0.985504:  19%|█▉        | 19/100 [00:55<03:41,  2.73s/it]

[I 2026-01-24 12:06:21,694] Trial 18 finished with value: 0.9818673218673218 and parameters: {'n_estimators': 148, 'max_depth': 23, 'min_samples_split': 5, 'min_samples_leaf': 6, 'max_features': 'sqrt'}. Best is trial 2 with value: 0.9855036855036854.


Best trial: 19. Best value: 0.985504:  20%|██        | 20/100 [00:56<02:58,  2.24s/it]

[I 2026-01-24 12:06:22,778] Trial 19 finished with value: 0.9855036855036856 and parameters: {'n_estimators': 49, 'max_depth': 35, 'min_samples_split': 7, 'min_samples_leaf': 3, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  21%|██        | 21/100 [00:58<02:40,  2.03s/it]

[I 2026-01-24 12:06:24,330] Trial 20 finished with value: 0.9818673218673218 and parameters: {'n_estimators': 76, 'max_depth': 36, 'min_samples_split': 11, 'min_samples_leaf': 1, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  22%|██▏       | 22/100 [00:59<02:15,  1.73s/it]

[I 2026-01-24 12:06:25,365] Trial 21 finished with value: 0.9836855036855037 and parameters: {'n_estimators': 42, 'max_depth': 41, 'min_samples_split': 7, 'min_samples_leaf': 3, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  23%|██▎       | 23/100 [00:59<01:50,  1.44s/it]

[I 2026-01-24 12:06:26,124] Trial 22 finished with value: 0.9782309582309583 and parameters: {'n_estimators': 20, 'max_depth': 32, 'min_samples_split': 9, 'min_samples_leaf': 3, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  24%|██▍       | 24/100 [01:01<01:41,  1.34s/it]

[I 2026-01-24 12:06:27,239] Trial 23 finished with value: 0.9800491400491401 and parameters: {'n_estimators': 50, 'max_depth': 10, 'min_samples_split': 4, 'min_samples_leaf': 5, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  25%|██▌       | 25/100 [01:02<01:52,  1.50s/it]

[I 2026-01-24 12:06:29,110] Trial 24 finished with value: 0.9837018837018837 and parameters: {'n_estimators': 90, 'max_depth': 21, 'min_samples_split': 6, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  26%|██▌       | 26/100 [01:05<02:15,  1.83s/it]

[I 2026-01-24 12:06:31,697] Trial 25 finished with value: 0.9836855036855038 and parameters: {'n_estimators': 141, 'max_depth': 27, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  27%|██▋       | 27/100 [01:07<02:06,  1.73s/it]

[I 2026-01-24 12:06:33,212] Trial 26 finished with value: 0.9818673218673218 and parameters: {'n_estimators': 68, 'max_depth': 14, 'min_samples_split': 2, 'min_samples_leaf': 7, 'max_features': None}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  28%|██▊       | 28/100 [01:07<01:46,  1.49s/it]

[I 2026-01-24 12:06:34,120] Trial 27 finished with value: 0.972792792792793 and parameters: {'n_estimators': 40, 'max_depth': 36, 'min_samples_split': 5, 'min_samples_leaf': 6, 'max_features': 'log2'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  29%|██▉       | 29/100 [01:09<01:51,  1.57s/it]

[I 2026-01-24 12:06:35,894] Trial 28 finished with value: 0.9745945945945946 and parameters: {'n_estimators': 106, 'max_depth': 28, 'min_samples_split': 20, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  30%|███       | 30/100 [01:12<02:06,  1.81s/it]

[I 2026-01-24 12:06:38,238] Trial 29 finished with value: 0.9709909909909911 and parameters: {'n_estimators': 131, 'max_depth': 44, 'min_samples_split': 18, 'min_samples_leaf': 6, 'max_features': None}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  31%|███       | 31/100 [01:13<01:55,  1.68s/it]

[I 2026-01-24 12:06:39,630] Trial 30 finished with value: 0.9836855036855037 and parameters: {'n_estimators': 82, 'max_depth': 15, 'min_samples_split': 13, 'min_samples_leaf': 5, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  32%|███▏      | 32/100 [01:16<02:27,  2.16s/it]

[I 2026-01-24 12:06:42,917] Trial 31 finished with value: 0.9836855036855037 and parameters: {'n_estimators': 236, 'max_depth': 34, 'min_samples_split': 15, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  33%|███▎      | 33/100 [01:20<02:53,  2.58s/it]

[I 2026-01-24 12:06:46,483] Trial 32 finished with value: 0.9855036855036854 and parameters: {'n_estimators': 300, 'max_depth': 40, 'min_samples_split': 17, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  34%|███▍      | 34/100 [01:23<03:10,  2.89s/it]

[I 2026-01-24 12:06:50,092] Trial 33 finished with value: 0.9855036855036854 and parameters: {'n_estimators': 300, 'max_depth': 42, 'min_samples_split': 17, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 19. Best value: 0.985504:  35%|███▌      | 35/100 [01:26<03:03,  2.82s/it]

[I 2026-01-24 12:06:52,737] Trial 34 finished with value: 0.9854873054873055 and parameters: {'n_estimators': 211, 'max_depth': 40, 'min_samples_split': 14, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 19 with value: 0.9855036855036856.


Best trial: 35. Best value: 0.989124:  36%|███▌      | 36/100 [01:31<03:33,  3.33s/it]

[I 2026-01-24 12:06:57,280] Trial 35 finished with value: 0.989123669123669 and parameters: {'n_estimators': 278, 'max_depth': 34, 'min_samples_split': 11, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 35 with value: 0.989123669123669.


Best trial: 35. Best value: 0.989124:  37%|███▋      | 37/100 [01:35<03:43,  3.55s/it]

[I 2026-01-24 12:07:01,318] Trial 36 finished with value: 0.9854873054873055 and parameters: {'n_estimators': 272, 'max_depth': 37, 'min_samples_split': 11, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 35 with value: 0.989123669123669.


Best trial: 35. Best value: 0.989124:  38%|███▊      | 38/100 [01:37<03:25,  3.32s/it]

[I 2026-01-24 12:07:04,110] Trial 37 finished with value: 0.9818837018837018 and parameters: {'n_estimators': 174, 'max_depth': 8, 'min_samples_split': 9, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 35 with value: 0.989123669123669.


Best trial: 38. Best value: 0.989124:  39%|███▉      | 39/100 [01:39<02:48,  2.76s/it]

[I 2026-01-24 12:07:05,579] Trial 38 finished with value: 0.9891236691236692 and parameters: {'n_estimators': 58, 'max_depth': 31, 'min_samples_split': 7, 'min_samples_leaf': 2, 'max_features': None}. Best is trial 38 with value: 0.9891236691236692.


Best trial: 39. Best value: 0.990942:  40%|████      | 40/100 [01:40<02:19,  2.33s/it]

[I 2026-01-24 12:07:06,895] Trial 39 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 67, 'max_depth': 32, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  41%|████      | 41/100 [01:41<01:57,  1.98s/it]

[I 2026-01-24 12:07:08,069] Trial 40 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 62, 'max_depth': 33, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  42%|████▏     | 42/100 [01:42<01:36,  1.67s/it]

[I 2026-01-24 12:07:08,999] Trial 41 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 62, 'max_depth': 33, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  43%|████▎     | 43/100 [01:44<01:31,  1.60s/it]

[I 2026-01-24 12:07:10,442] Trial 42 finished with value: 0.989140049140049 and parameters: {'n_estimators': 106, 'max_depth': 32, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  44%|████▍     | 44/100 [01:45<01:19,  1.42s/it]

[I 2026-01-24 12:07:11,450] Trial 43 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 67, 'max_depth': 30, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  45%|████▌     | 45/100 [01:46<01:10,  1.28s/it]

[I 2026-01-24 12:07:12,402] Trial 44 finished with value: 0.9855200655200654 and parameters: {'n_estimators': 65, 'max_depth': 27, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  46%|████▌     | 46/100 [01:47<01:11,  1.33s/it]

[I 2026-01-24 12:07:13,845] Trial 45 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 108, 'max_depth': 29, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  47%|████▋     | 47/100 [01:48<00:58,  1.11s/it]

[I 2026-01-24 12:07:14,451] Trial 46 finished with value: 0.9873382473382474 and parameters: {'n_estimators': 31, 'max_depth': 28, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  48%|████▊     | 48/100 [01:50<01:12,  1.39s/it]

[I 2026-01-24 12:07:16,485] Trial 47 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 118, 'max_depth': 30, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  49%|████▉     | 49/100 [01:52<01:17,  1.51s/it]

[I 2026-01-24 12:07:18,284] Trial 48 finished with value: 0.989123669123669 and parameters: {'n_estimators': 96, 'max_depth': 38, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  50%|█████     | 50/100 [01:53<01:15,  1.52s/it]

[I 2026-01-24 12:07:19,814] Trial 49 finished with value: 0.9854873054873055 and parameters: {'n_estimators': 77, 'max_depth': 26, 'min_samples_split': 12, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  51%|█████     | 51/100 [01:54<01:11,  1.46s/it]

[I 2026-01-24 12:07:21,147] Trial 50 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 63, 'max_depth': 33, 'min_samples_split': 9, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  52%|█████▏    | 52/100 [01:56<01:14,  1.56s/it]

[I 2026-01-24 12:07:22,931] Trial 51 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 119, 'max_depth': 30, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  53%|█████▎    | 53/100 [01:58<01:17,  1.65s/it]

[I 2026-01-24 12:07:24,790] Trial 52 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 130, 'max_depth': 30, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  54%|█████▍    | 54/100 [01:59<01:05,  1.42s/it]

[I 2026-01-24 12:07:25,689] Trial 53 finished with value: 0.9873382473382474 and parameters: {'n_estimators': 35, 'max_depth': 29, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  55%|█████▌    | 55/100 [02:01<01:09,  1.54s/it]

[I 2026-01-24 12:07:27,518] Trial 54 finished with value: 0.989123669123669 and parameters: {'n_estimators': 117, 'max_depth': 25, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  56%|█████▌    | 56/100 [02:02<01:04,  1.46s/it]

[I 2026-01-24 12:07:28,781] Trial 55 finished with value: 0.9873054873054873 and parameters: {'n_estimators': 84, 'max_depth': 22, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  57%|█████▋    | 57/100 [02:04<01:02,  1.46s/it]

[I 2026-01-24 12:07:30,230] Trial 56 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 97, 'max_depth': 33, 'min_samples_split': 7, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  58%|█████▊    | 58/100 [02:04<00:54,  1.30s/it]

[I 2026-01-24 12:07:31,170] Trial 57 finished with value: 0.9891236691236692 and parameters: {'n_estimators': 59, 'max_depth': 31, 'min_samples_split': 8, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  59%|█████▉    | 59/100 [02:07<01:10,  1.71s/it]

[I 2026-01-24 12:07:33,823] Trial 58 finished with value: 0.9873218673218673 and parameters: {'n_estimators': 155, 'max_depth': 19, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  60%|██████    | 60/100 [02:08<01:01,  1.55s/it]

[I 2026-01-24 12:07:35,007] Trial 59 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 73, 'max_depth': 35, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  61%|██████    | 61/100 [02:10<00:57,  1.48s/it]

[I 2026-01-24 12:07:36,333] Trial 60 finished with value: 0.989123669123669 and parameters: {'n_estimators': 89, 'max_depth': 25, 'min_samples_split': 9, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  62%|██████▏   | 62/100 [02:11<00:50,  1.33s/it]

[I 2026-01-24 12:07:37,306] Trial 61 finished with value: 0.989123669123669 and parameters: {'n_estimators': 65, 'max_depth': 33, 'min_samples_split': 9, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  63%|██████▎   | 63/100 [02:12<00:47,  1.29s/it]

[I 2026-01-24 12:07:38,508] Trial 62 finished with value: 0.989123669123669 and parameters: {'n_estimators': 54, 'max_depth': 31, 'min_samples_split': 9, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  64%|██████▍   | 64/100 [02:13<00:43,  1.20s/it]

[I 2026-01-24 12:07:39,507] Trial 63 finished with value: 0.9691891891891892 and parameters: {'n_estimators': 44, 'max_depth': 29, 'min_samples_split': 7, 'min_samples_leaf': 9, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  65%|██████▌   | 65/100 [02:15<00:47,  1.37s/it]

[I 2026-01-24 12:07:41,273] Trial 64 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 110, 'max_depth': 33, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  66%|██████▌   | 66/100 [02:17<00:57,  1.68s/it]

[I 2026-01-24 12:07:43,668] Trial 65 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 142, 'max_depth': 36, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  67%|██████▋   | 67/100 [02:18<00:52,  1.61s/it]

[I 2026-01-24 12:07:45,102] Trial 66 finished with value: 0.9837018837018837 and parameters: {'n_estimators': 68, 'max_depth': 38, 'min_samples_split': 10, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  68%|██████▊   | 68/100 [02:20<00:49,  1.55s/it]

[I 2026-01-24 12:07:46,529] Trial 67 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 98, 'max_depth': 29, 'min_samples_split': 7, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  69%|██████▉   | 69/100 [02:20<00:39,  1.26s/it]

[I 2026-01-24 12:07:47,113] Trial 68 finished with value: 0.9837182637182638 and parameters: {'n_estimators': 25, 'max_depth': 34, 'min_samples_split': 9, 'min_samples_leaf': 4, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  70%|███████   | 70/100 [02:22<00:37,  1.24s/it]

[I 2026-01-24 12:07:48,286] Trial 69 finished with value: 0.9873218673218673 and parameters: {'n_estimators': 77, 'max_depth': 27, 'min_samples_split': 3, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  71%|███████   | 71/100 [02:23<00:34,  1.19s/it]

[I 2026-01-24 12:07:49,379] Trial 70 finished with value: 0.9836855036855038 and parameters: {'n_estimators': 61, 'max_depth': 32, 'min_samples_split': 12, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  72%|███████▏  | 72/100 [02:24<00:37,  1.35s/it]

[I 2026-01-24 12:07:51,103] Trial 71 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 122, 'max_depth': 30, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  73%|███████▎  | 73/100 [02:26<00:38,  1.42s/it]

[I 2026-01-24 12:07:52,683] Trial 72 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 115, 'max_depth': 35, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  74%|███████▍  | 74/100 [02:28<00:40,  1.55s/it]

[I 2026-01-24 12:07:54,520] Trial 73 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 133, 'max_depth': 30, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  75%|███████▌  | 75/100 [02:29<00:33,  1.33s/it]

[I 2026-01-24 12:07:55,332] Trial 74 finished with value: 0.9873218673218673 and parameters: {'n_estimators': 47, 'max_depth': 32, 'min_samples_split': 7, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  76%|███████▌  | 76/100 [02:31<00:37,  1.58s/it]

[I 2026-01-24 12:07:57,503] Trial 75 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 167, 'max_depth': 26, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  77%|███████▋  | 77/100 [02:32<00:34,  1.50s/it]

[I 2026-01-24 12:07:58,818] Trial 76 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 87, 'max_depth': 37, 'min_samples_split': 7, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  78%|███████▊  | 78/100 [02:34<00:33,  1.53s/it]

[I 2026-01-24 12:08:00,415] Trial 77 finished with value: 0.989123669123669 and parameters: {'n_estimators': 80, 'max_depth': 50, 'min_samples_split': 9, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  79%|███████▉  | 79/100 [02:34<00:26,  1.28s/it]

[I 2026-01-24 12:08:01,129] Trial 78 finished with value: 0.98006552006552 and parameters: {'n_estimators': 36, 'max_depth': 28, 'min_samples_split': 11, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  80%|████████  | 80/100 [02:37<00:31,  1.58s/it]

[I 2026-01-24 12:08:03,400] Trial 79 finished with value: 0.9836855036855038 and parameters: {'n_estimators': 182, 'max_depth': 34, 'min_samples_split': 6, 'min_samples_leaf': 2, 'max_features': 'log2'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  81%|████████  | 81/100 [02:38<00:28,  1.50s/it]

[I 2026-01-24 12:08:04,712] Trial 80 finished with value: 0.9727764127764127 and parameters: {'n_estimators': 100, 'max_depth': 23, 'min_samples_split': 8, 'min_samples_leaf': 8, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  82%|████████▏ | 82/100 [02:39<00:26,  1.46s/it]

[I 2026-01-24 12:08:06,066] Trial 81 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 103, 'max_depth': 33, 'min_samples_split': 7, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  83%|████████▎ | 83/100 [02:40<00:21,  1.27s/it]

[I 2026-01-24 12:08:06,886] Trial 82 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 53, 'max_depth': 31, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  84%|████████▍ | 84/100 [02:41<00:20,  1.25s/it]

[I 2026-01-24 12:08:08,107] Trial 83 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 90, 'max_depth': 29, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  85%|████████▌ | 85/100 [02:43<00:19,  1.32s/it]

[I 2026-01-24 12:08:09,603] Trial 84 finished with value: 0.989123669123669 and parameters: {'n_estimators': 111, 'max_depth': 33, 'min_samples_split': 7, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  86%|████████▌ | 86/100 [02:45<00:19,  1.42s/it]

[I 2026-01-24 12:08:11,232] Trial 85 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 127, 'max_depth': 35, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  87%|████████▋ | 87/100 [02:46<00:16,  1.30s/it]

[I 2026-01-24 12:08:12,253] Trial 86 finished with value: 0.9873218673218673 and parameters: {'n_estimators': 72, 'max_depth': 31, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  88%|████████▊ | 88/100 [02:47<00:17,  1.43s/it]

[I 2026-01-24 12:08:13,996] Trial 87 finished with value: 0.9873054873054873 and parameters: {'n_estimators': 142, 'max_depth': 32, 'min_samples_split': 9, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  89%|████████▉ | 89/100 [02:49<00:15,  1.39s/it]

[I 2026-01-24 12:08:15,290] Trial 88 finished with value: 0.989123669123669 and parameters: {'n_estimators': 92, 'max_depth': 37, 'min_samples_split': 8, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  90%|█████████ | 90/100 [02:49<00:12,  1.24s/it]

[I 2026-01-24 12:08:16,190] Trial 89 finished with value: 0.9855036855036856 and parameters: {'n_estimators': 61, 'max_depth': 28, 'min_samples_split': 7, 'min_samples_leaf': 2, 'max_features': 'log2'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  91%|█████████ | 91/100 [02:51<00:10,  1.21s/it]

[I 2026-01-24 12:08:17,337] Trial 90 finished with value: 0.9909418509418509 and parameters: {'n_estimators': 83, 'max_depth': 26, 'min_samples_split': 6, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 39. Best value: 0.990942:  92%|█████████▏| 92/100 [02:52<00:09,  1.16s/it]

[I 2026-01-24 12:08:18,370] Trial 91 finished with value: 0.9891400491400493 and parameters: {'n_estimators': 72, 'max_depth': 35, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': 'sqrt'}. Best is trial 39 with value: 0.9909418509418509.


Best trial: 92. Best value: 0.990958:  93%|█████████▎| 93/100 [02:53<00:07,  1.11s/it]

[I 2026-01-24 12:08:19,373] Trial 92 finished with value: 0.990958230958231 and parameters: {'n_estimators': 71, 'max_depth': 30, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 92 with value: 0.990958230958231.


Best trial: 93. Best value: 0.992776:  94%|█████████▍| 94/100 [02:54<00:06,  1.03s/it]

[I 2026-01-24 12:08:20,211] Trial 93 finished with value: 0.9927764127764128 and parameters: {'n_estimators': 57, 'max_depth': 30, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776:  95%|█████████▌| 95/100 [02:54<00:04,  1.08it/s]

[I 2026-01-24 12:08:20,900] Trial 94 finished with value: 0.9855200655200657 and parameters: {'n_estimators': 40, 'max_depth': 29, 'min_samples_split': 3, 'min_samples_leaf': 3, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776:  96%|█████████▌| 96/100 [02:55<00:03,  1.12it/s]

[I 2026-01-24 12:08:21,719] Trial 95 finished with value: 0.9927764127764128 and parameters: {'n_estimators': 55, 'max_depth': 30, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776:  97%|█████████▋| 97/100 [02:56<00:02,  1.14it/s]

[I 2026-01-24 12:08:22,566] Trial 96 finished with value: 0.9927764127764128 and parameters: {'n_estimators': 56, 'max_depth': 27, 'min_samples_split': 4, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776:  98%|█████████▊| 98/100 [02:57<00:01,  1.07it/s]

[I 2026-01-24 12:08:23,622] Trial 97 finished with value: 0.9855036855036854 and parameters: {'n_estimators': 54, 'max_depth': 28, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': None}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776:  99%|█████████▉| 99/100 [02:58<00:00,  1.13it/s]

[I 2026-01-24 12:08:24,398] Trial 98 finished with value: 0.990958230958231 and parameters: {'n_estimators': 49, 'max_depth': 25, 'min_samples_split': 3, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.


Best trial: 93. Best value: 0.992776: 100%|██████████| 100/100 [02:58<00:00,  1.79s/it]

[I 2026-01-24 12:08:24,990] Trial 99 finished with value: 0.990958230958231 and parameters: {'n_estimators': 31, 'max_depth': 25, 'min_samples_split': 2, 'min_samples_leaf': 2, 'max_features': 'sqrt'}. Best is trial 93 with value: 0.9927764127764128.





In [15]:
# Best hyperparameters and the cross-validated accuracy they achieved during tuning
best_params = study.best_params
best_cv_accuracy = study.best_value
print(f"Best CV accuracy: {best_cv_accuracy:.4f}")

Best CV accuracy: 0.9928


### Train Final Model

In [16]:
# Refit a fresh classifier on the training split using the best hyperparameters
model = RandomForestClassifier(
    **best_params,
    random_state=RANDOM_STATE,
    n_jobs=-1
)

model.fit(X_train, y_train)


| Parameter | Value |
| --- | --- |
| n_estimators | 57 |
| criterion | 'gini' |
| max_depth | 30 |
| min_samples_split | 4 |
| min_samples_leaf | 2 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


### Evaluate the Model

In [17]:
# Predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Accuracy
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.4f}")

# Average confidence (highest predicted class probability) over correctly classified samples
correct_mask = y_pred == y_test
correct_confidences = y_proba.max(axis=1)[correct_mask]
avg_confidence = correct_confidences.mean()
print(f"Average confidence (correct predictions): {avg_confidence:.4f}")


Test accuracy: 0.9928
Average confidence (correct predictions): 0.9029
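
The confidence figure above is the mean of each sample's highest class probability, restricted to the rows the model classified correctly. The following is a minimal sketch of that masking step on made-up toy arrays; the labels and probabilities are illustrative only and are not taken from the dataset. It also computes the same statistic for misclassified rows, which the original cell does not do.

```python
import numpy as np

# Toy stand-ins for model.classes_, y_test, and model.predict_proba(X_test).
classes = np.array(["A", "B", "C"])
y_true = np.array(["A", "B", "C", "A"])
proba = np.array([
    [0.90, 0.05, 0.05],  # highest probability on A (correct)
    [0.20, 0.70, 0.10],  # highest probability on B (correct)
    [0.50, 0.10, 0.40],  # highest probability on A (wrong, true label is C)
    [0.60, 0.30, 0.10],  # highest probability on A (correct)
])

# Predicted label: the class with the highest probability in each row.
y_hat = classes[proba.argmax(axis=1)]

# Confidence: that highest probability itself.
confidence = proba.max(axis=1)
correct = y_hat == y_true

mean_conf_correct = confidence[correct].mean()
mean_conf_wrong = confidence[~correct].mean()

print(f"Average confidence (correct):   {mean_conf_correct:.4f}")
print(f"Average confidence (incorrect): {mean_conf_wrong:.4f}")
```

Comparing the two averages can hint at whether a confidence threshold would help reject uncertain predictions, although with very few misclassified test samples the second number is only a rough indication.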


### Save the Model

In [18]:
os.makedirs(MODEL_DIR, exist_ok=True)
joblib.dump(model, MODEL_PATH)
print(f"Model saved to: {MODEL_PATH}")

Model saved to: models\random_forest.joblib
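
A saved model is only useful if it can be loaded again and produces the same predictions. The sketch below shows a round trip with `joblib.dump` and `joblib.load` on a small synthetic dataset written to a temporary directory. The data, the `clf` and `restored` names, and the reduced `n_estimators` are assumptions made for the demonstration and are not part of the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 60 samples, 42 features, 3 classes.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 42))
y_demo = np.repeat(["A", "B", "C"], 20)

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(X_demo, y_demo)

# Save to a temporary file, then load it back into a new object.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "random_forest.joblib")
    joblib.dump(clf, demo_path)
    restored = joblib.load(demo_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(f"Predictions match after reload: {same_predictions}")
```

Files written by joblib should be loaded only from trusted sources, and preferably with the same scikit-learn version that produced them.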


### Summary

In [19]:
print(f"  Best CV accuracy:   {best_cv_accuracy:.4f}")
print(f"  Test accuracy:      {test_accuracy:.4f}")
print(f"  Model saved to:     {MODEL_PATH}")

  Best CV accuracy:   0.9928
  Test accuracy:      0.9928
  Model saved to:     models\random_forest.joblib
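
The configuration defines `REPORT_PATH` for a text report. Here is a minimal sketch of how the summary values could be written to such a file. The helper `write_training_report` and its report layout are hypothetical, and the demo writes to a temporary directory instead of `models/`.

```python
import os
import tempfile
from datetime import datetime


def write_training_report(path, best_params, cv_accuracy, test_accuracy):
    """Write a plain-text summary of a training run (hypothetical helper)."""
    lines = [
        f"Generated: {datetime.now():%Y-%m-%d %H:%M:%S}",
        f"Best CV accuracy: {cv_accuracy:.4f}",
        f"Test accuracy:    {test_accuracy:.4f}",
        "Best hyperparameters:",
    ]
    lines += [f"  {name}: {value}" for name, value in sorted(best_params.items())]

    # Create the target directory if the path has one.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines


# Demo using the values reported above.
demo_params = {
    "n_estimators": 57,
    "max_depth": 30,
    "min_samples_split": 4,
    "min_samples_leaf": 2,
    "max_features": "sqrt",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_report = os.path.join(tmp_dir, "models", "training_report.txt")
    written = write_training_report(demo_report, demo_params, 0.9928, 0.9928)
    with open(demo_report, encoding="utf-8") as fh:
        report_text = fh.read()

print(report_text)
```

In the notebook itself, the call would presumably be `write_training_report(REPORT_PATH, best_params, best_cv_accuracy, test_accuracy)`.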
