Training and validation of Logistic Regression models, varying the training and evaluation data across every combination of whether the training set is scaled, augmented, both, or used as-is, and whether the validation set is scaled or not.

Each model will be trained 100 times in order to average the metric values returned by these evaluations.

Finally, a report will be generated with the metrics obtained and a conclusion about which combination works best for this model and this type of augmentation.
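
The repeated-run averaging described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: `average_metrics` and `fake_run` are hypothetical names, and the only assumption about the training functions defined later is that each returns a dictionary mapping metric names to floats.

```python
from statistics import fmean
from typing import Callable, Dict, List


def average_metrics(run_fn: Callable[[], Dict[str, float]], n_runs: int = 100) -> Dict[str, float]:
    """Call run_fn n_runs times and return the mean of each metric."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    runs: List[Dict[str, float]] = [run_fn() for _ in range(n_runs)]
    # Average each metric over all runs; metric names come from the first run.
    return {name: fmean(run[name] for run in runs) for name in runs[0]}


# Hypothetical stand-in for one of the notebook's train-and-validate functions.
def fake_run() -> Dict[str, float]:
    return {"accuracy": 0.5, "f1": 0.25}


averaged = average_metrics(fake_run, n_runs=4)
print(averaged)
```

Averaging is only informative if something actually varies between runs. With a fixed `random_state` and a deterministic solver, repeated runs would be expected to return the same values each time.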

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Import project modules
from utils.data_loader import load_split, prepare_features_target
from augmentation.smote_augment import apply_smote
from training.logistic_regression import train_logistic_regression
from models.manage_models import save_model
from validation.validate_model import validate_model

print("✓ All imports successful!")


✓ All imports successful!


In [16]:
# Load training data
train_df = load_split('train', data_dir='dataset/splits')
X_train, y_train = prepare_features_target(train_df, target_col='Fault')

print(f"Training data shape: {X_train.shape}")
print(f"Target distribution:\n{y_train.value_counts()}")
print(f"\nOriginal class distribution:\n{y_train.value_counts(normalize=True)}")


Training data shape: (941, 9)
Target distribution:
Fault
0    652
1    289
Name: count, dtype: int64

Original class distribution:
Fault
0    0.69288
1    0.30712
Name: proportion, dtype: float64


In [17]:
# Apply SMOTE augmentation to training data
# This augmented data will be used for the general model training
X_train_aug, y_train_aug = apply_smote(
    X_train,
    y_train,
    sampling_strategy='auto',
    random_state=42
)

print(f"Augmented training data shape: {X_train_aug.shape}")
print(f"Augmented target distribution:\n{pd.Series(y_train_aug).value_counts()}")
print(f"\nAugmented class distribution:\n{pd.Series(y_train_aug).value_counts(normalize=True)}")


Augmented training data shape: (1304, 9)
Augmented target distribution:
0    652
1    652
Name: count, dtype: int64

Augmented class distribution:
0    0.5
1    0.5
Name: proportion, dtype: float64


In [18]:
# Scale the augmented training data
# StandardScaler will be fitted on augmented training data
scaler = StandardScaler()
X_train_aug_scaled = scaler.fit_transform(X_train_aug)
X_train_scaled = scaler.transform(X_train)

print(f"Scaled augmented training data shape: {X_train_aug_scaled.shape}")
print(f"Mean of scaled features: {np.mean(X_train_aug_scaled, axis=0)[:5]}")  # Show first 5
print(f"Std of scaled features: {np.std(X_train_aug_scaled, axis=0)[:5]}")   # Show first 5


Scaled augmented training data shape: (1304, 9)
Mean of scaled features: [-1.68031914e-15 -7.36323068e-15  6.91577845e-15  5.08727348e-15
 -1.31311117e-15]
Std of scaled features: [1. 1. 1. 1. 1.]




In [19]:
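# Scale the non-augmented training data
# Note: this refits `scaler` on X_train, replacing the statistics fitted on the
# augmented data, and overwrites the X_train_scaled computed in the previous cell
X_train_scaled = scaler.fit_transform(X_train)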

print(f"Scaled training data shape: {X_train_scaled.shape}")
print(f"Mean of scaled features: {np.mean(X_train_scaled, axis=0)[:5]}")  # Show first 5
print(f"Std of scaled features: {np.std(X_train_scaled, axis=0)[:5]}")   # Show first 5

Scaled training data shape: (941, 9)
Mean of scaled features: [-1.68952112e-15 -1.21381238e-15  7.13563109e-16 -6.72032981e-16
  7.49901971e-16]
Std of scaled features: [1. 1. 1. 1. 1.]


In [20]:
# Load validation data
val_df = load_split('validation', data_dir='dataset/splits')
X_val, y_val = prepare_features_target(val_df, target_col='Fault')

# Scale validation data with `scaler`. Given the cell order, `scaler` was last
# fitted on the non-augmented training data, not on the augmented data
X_val_scaled = scaler.transform(X_val)
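
Because a single `scaler` object is refitted across cells, it can help to keep the fitted statistics explicit for each training variant. The sketch below is an illustration only, written in plain Python so it runs without the project modules: `fit_standardizer` and `standardize` are hypothetical helpers, and the data is made up. It uses the population standard deviation (ddof=0), which is what `StandardScaler` uses.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]


def fit_standardizer(rows: Matrix) -> Tuple[List[float], List[float]]:
    """Return per-column mean and population standard deviation (ddof=0)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns get a scale of 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows: Matrix, means: List[float], stds: List[float]) -> Matrix:
    """Apply previously fitted statistics to any data split."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up data: two training rows and one validation row.
train = [[1.0, 10.0], [3.0, 30.0]]
val = [[2.0, 40.0]]

means, stds = fit_standardizer(train)        # statistics come from train only
train_scaled = standardize(train, means, stds)
val_scaled = standardize(val, means, stds)   # validation reuses train statistics
print(train_scaled, val_scaled)
```

With `StandardScaler`, the equivalent would be one instance per training variant (for example, one fitted on the augmented data and another on the original data), so that each validation transform visibly matches the training data it is paired with.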

### Regular training and regular validation

In [21]:
def regular_model():
    # Train the model
    model = train_logistic_regression(
        X_train,
        y_train,
        model_name="logistic_regression_normal",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics = validate_model(
        model=model,
        X=X_val,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    save_paths = save_model(
        model=model,
        model_name="logistic_regression_normal",
        save_path='models/',
        metadata={
            'training_samples': len(y_train),
            'augmented': False,
            'scaler_applied': False,
            'original_training_samples': len(y_train),
            'augmentation_method': None,
            'hyperparameters': model.get_params()
        }
    )
    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics

# print(regular_model())

### Regular training and scaled validation

In [22]:
def reg_train_scale_val():
    # Train the model
    model = train_logistic_regression(
        X_train,
        y_train,
        model_name="logistic_regression_normal",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_scale_val = validate_model(
        model=model,
        X=X_val_scaled,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_scale_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_scale_val

# print(reg_train_scale_val())

### Augmented training and regular validation

In [23]:
def aug_train_reg_val():
    # Train the model
    model = train_logistic_regression(
        X_train_aug,
        y_train_aug,
        model_name="logistic_regression_aug_smote",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_aug_train_reg_val = validate_model(
        model=model,
        X=X_val,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    save_paths = save_model(
        model=model,
        model_name="logistic_regression_aug_smote",
        save_path='models/',
        metadata={
            'training_samples': len(y_train_aug),
            'augmented': True,
            'scaler_applied': False,
            'original_training_samples': len(y_train),
            'augmentation_method': 'SMOTE',
            'hyperparameters': model.get_params()
        }
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_aug_train_reg_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_aug_train_reg_val

# print(aug_train_reg_val())

### Augmented training and scaled validation

In [24]:
def aug_train_scaled_val():
    # Train the model
    model = train_logistic_regression(
        X_train_aug,
        y_train_aug,
        model_name="logistic_regression_aug_smote",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_aug_train_scale_val = validate_model(
        model=model,
        X=X_val_scaled,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_aug_train_scale_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_aug_train_scale_val

# print(aug_train_scaled_val())

### Scaled training and regular validation

In [25]:
def scale_train_reg_val():
    # Train the model
    model = train_logistic_regression(
        X_train_scaled,
        y_train,
        model_name="logistic_regression_scaled",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_scaled_train_val = validate_model(
        model=model,
        X=X_val,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    save_paths = save_model(
        model=model,
        model_name="logistic_regression_scaled",
        save_path='models/',
        metadata={
            'training_samples': len(y_train),  # this model is trained on non-augmented data
            'augmented': False,
            'scaler_applied': True,
            'original_training_samples': len(y_train),
            'augmentation_method': None,
            'hyperparameters': model.get_params()
        }
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_scaled_train_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_scaled_train_val

# print(scale_train_reg_val())

### Scaled training and scaled validation

In [26]:
def scale_train_val():
    # Train the model
    model = train_logistic_regression(
        X_train_scaled,
        y_train,
        model_name="logistic_regression_scaled",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_scaled_train_val = validate_model(
        model=model,
        X=X_val_scaled,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_scaled_train_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_scaled_train_val

# print(scale_train_val())

### Augmented and scaled training and regular validation

In [27]:
def scale_aug_train_reg_val():
    # Train the model
    model = train_logistic_regression(
        X_train_aug_scaled,
        y_train_aug,
        model_name="logistic_regression_aug_smote_scaled",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_scaled_aug_train_reg_val = validate_model(
        model=model,
        X=X_val,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    save_paths = save_model(
        model=model,
        model_name="logistic_regression_aug_smote_scaled",
        save_path='models/',
        metadata={
            'training_samples': len(y_train_aug),
            'augmented': True,
            'scaler_applied': True,
            'original_training_samples': len(y_train),
            'augmentation_method': 'SMOTE',
            'hyperparameters': model.get_params()
        }
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_scaled_aug_train_reg_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_scaled_aug_train_reg_val

# print(scale_aug_train_reg_val())

### Augmented and scaled training and scaled validation

In [28]:
def scale_aug_train_scale_val():
    # Train the model
    model = train_logistic_regression(
        X_train_aug_scaled,
        y_train_aug,
        model_name="logistic_regression_aug_smote_scaled",
        save_path='models/',
        C=1.0,
        max_iter=1000,
        solver='lbfgs',
        random_state=42
    )

    print("✓ Model trained successfully!")
    print(f"Model parameters: {model.get_params()}")

    # Validate the model
    val_metrics_scaled_aug_train_scale_val = validate_model(
        model=model,
        X=X_val_scaled,
        y=y_val,
        metrics=['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
    )

    # print("=" * 60)
    # print("VALIDATION SET RESULTS")
    # print("=" * 60)
    # for metric, value in val_metrics_scaled_aug_train_scale_val.items():
    #     print(f"{metric}: {value:.4f}")

    return val_metrics_scaled_aug_train_scale_val

# print(scale_aug_train_scale_val())
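
The eight functions above differ only in which training and validation arrays they use, so the same experiment grid could be expressed as a table of configurations. The following is a rough sketch under that assumption, not a drop-in replacement: `CONFIGS`, `run_all`, and the dictionary keys are hypothetical, and `train_fn` / `validate_fn` stand in for thin wrappers around the project's `train_logistic_regression` and `validate_model`.

```python
from typing import Any, Callable, Dict, List, Tuple

# (report label, training-data key, validation-data key) for the eight combinations.
CONFIGS: List[Tuple[str, str, str]] = [
    ("Regular (No Aug, No Scale)", "raw", "raw"),
    ("Regular Train, Scaled Val", "raw", "scaled"),
    ("Augmented SMOTE Train, Regular Val", "aug", "raw"),
    ("Augmented SMOTE Train, Scaled Val", "aug", "scaled"),
    ("Scaled Train, Regular Val", "scaled", "raw"),
    ("Scaled Train, Scaled Val", "scaled", "scaled"),
    ("Aug SMOTE & Scaled Train, Regular Val", "aug_scaled", "raw"),
    ("Aug SMOTE & Scaled Train, Scaled Val", "aug_scaled", "scaled"),
]


def run_all(
    train_sets: Dict[str, Tuple[Any, Any]],
    val_sets: Dict[str, Tuple[Any, Any]],
    train_fn: Callable[[Any, Any], Any],
    validate_fn: Callable[[Any, Any, Any], Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Train and validate once per configuration; return metrics keyed by label."""
    results: Dict[str, Dict[str, float]] = {}
    for label, train_key, val_key in CONFIGS:
        X_tr, y_tr = train_sets[train_key]
        X_va, y_va = val_sets[val_key]
        model = train_fn(X_tr, y_tr)
        results[label] = validate_fn(model, X_va, y_va)
    return results


# Stub data and functions, only to show the call shape.
train_sets = {k: ([[0.0]], [0]) for k in ("raw", "aug", "scaled", "aug_scaled")}
val_sets = {k: ([[0.0]], [0]) for k in ("raw", "scaled")}
results = run_all(
    train_sets,
    val_sets,
    train_fn=lambda X, y: "stub-model",
    validate_fn=lambda model, X, y: {"accuracy": 1.0},
)
print(len(results))
```

In the notebook, `train_sets["aug_scaled"]` would hold `(X_train_aug_scaled, y_train_aug)` and so on. The labels and metrics in `results` could then be passed to the reporting step as two parallel lists.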

## Execute All Models and Generate Comparison Report

In [29]:
# Import the comparison and reporting functions
from results.compare_and_report import compare_and_report, print_comparison_summary

print("✓ Comparison and reporting functions imported!")


✓ Comparison and reporting functions imported!


In [30]:
# Execute all model training and validation functions
# Store results in a list with corresponding model names

print("=" * 70)
print("EXECUTING ALL MODEL CONFIGURATIONS")
print("=" * 70)

# List to store all metrics
all_metrics = []
model_names = []

# 1. Regular model (no augmentation, no scaling)
print("\n[1/8] Training: Regular Model (No Augmentation, No Scaling)")
metrics_1 = regular_model()
all_metrics.append(metrics_1)
model_names.append("Regular (No Aug, No Scale)")
print(f"   ✓ Completed - Accuracy: {metrics_1['accuracy']:.4f}")

# 2. Regular training, scaled validation
print("\n[2/8] Training: Regular Model, Scaled Validation")
metrics_2 = reg_train_scale_val()
all_metrics.append(metrics_2)
model_names.append("Regular Train, Scaled Val")
print(f"   ✓ Completed - Accuracy: {metrics_2['accuracy']:.4f}")

# 3. Augmented training, regular validation
print("\n[3/8] Training: Augmented (SMOTE), Regular Validation")
metrics_3 = aug_train_reg_val()
all_metrics.append(metrics_3)
model_names.append("Augmented SMOTE Train, Regular Val")
print(f"   ✓ Completed - Accuracy: {metrics_3['accuracy']:.4f}")

# 4. Augmented training, scaled validation
print("\n[4/8] Training: Augmented (SMOTE), Scaled Validation")
metrics_4 = aug_train_scaled_val()
all_metrics.append(metrics_4)
model_names.append("Augmented SMOTE Train, Scaled Val")
print(f"   ✓ Completed - Accuracy: {metrics_4['accuracy']:.4f}")

# 5. Scaled training, regular validation
print("\n[5/8] Training: Scaled Training, Regular Validation")
metrics_5 = scale_train_reg_val()
all_metrics.append(metrics_5)
model_names.append("Scaled Train, Regular Val")
print(f"   ✓ Completed - Accuracy: {metrics_5['accuracy']:.4f}")

# 6. Scaled training, scaled validation
print("\n[6/8] Training: Scaled Training, Scaled Validation")
metrics_6 = scale_train_val()
all_metrics.append(metrics_6)
model_names.append("Scaled Train, Scaled Val")
print(f"   ✓ Completed - Accuracy: {metrics_6['accuracy']:.4f}")

# 7. Augmented and scaled training, regular validation
print("\n[7/8] Training: Augmented & Scaled Training, Regular Validation")
metrics_7 = scale_aug_train_reg_val()
all_metrics.append(metrics_7)
model_names.append("Aug SMOTE & Scaled Train, Regular Val")
print(f"   ✓ Completed - Accuracy: {metrics_7['accuracy']:.4f}")

# 8. Augmented and scaled training, scaled validation
print("\n[8/8] Training: Augmented & Scaled Training, Scaled Validation")
metrics_8 = scale_aug_train_scale_val()
all_metrics.append(metrics_8)
model_names.append("Aug SMOTE & Scaled Train, Scaled Val")
print(f"   ✓ Completed - Accuracy: {metrics_8['accuracy']:.4f}")

print("\n" + "=" * 70)
print("✓ All models trained and validated!")
print("=" * 70)


EXECUTING ALL MODEL CONFIGURATIONS

[1/8] Training: Regular Model (No Augmentation, No Scaling)


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


✓ Model trained successfully!
Model parameters: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 1000, 'multi_class': 'deprecated', 'n_jobs': None, 'penalty': 'l2', 'random_state': 42, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
   ✓ Completed - Accuracy: 0.6946

[2/8] Training: Regular Model, Scaled Validation


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


✓ Model trained successfully!
Model parameters: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 1000, 'multi_class': 'deprecated', 'n_jobs': None, 'penalty': 'l2', 'random_state': 42, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
   ✓ Completed - Accuracy: 0.4790

[3/8] Training: Augmented (SMOTE), Regular Validation


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


✓ Model trained successfully!
Model parameters: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 1000, 'multi_class': 'deprecated', 'n_jobs': None, 'penalty': 'l2', 'random_state': 42, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
   ✓ Completed - Accuracy: 0.5090

[4/8] Training: Augmented (SMOTE), Scaled Validation
✓ Model trained successfully!
Model parameters: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 1000, 'multi_class': 'deprecated', 'n_jobs': None, 'penalty': 'l2', 'random_state': 42, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
   ✓ Completed - Accuracy: 0.4731

[5/8] Training: Scaled Training, Regular Validation
✓ Model trained successfully!
Model parameters: {'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': 

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [31]:
# Compare all models and generate report
print("\n" + "=" * 70)
print("GENERATING COMPARISON REPORT")
print("=" * 70)

comparison_results, report_path = compare_and_report(
    metrics_list=all_metrics,
    model_names=model_names,
    output_path=None,  # Will auto-generate with timestamp
    title="Logistic Regression Model Comparison Report"
)

print("\n✓ Report generated successfully!")
print(f"📄 Report saved to: {report_path}")

# Print summary to console
print_comparison_summary(comparison_results)



GENERATING COMPARISON REPORT

✓ Report generated successfully!
📄 Report saved to: /home/ari/Collage/04-Forth_Year/Preimer_Semestre/AM/Final_Proj/Machine-Learning-Project/src/results/model_comparison_report_20251119_010822.md
MODEL COMPARISON SUMMARY

📊 Best Model by Metric:
----------------------------------------------------------------------
  Accuracy       : Regular (No Aug, No Scale)     (0.6946)
  Precision      : Aug SMOTE & Scaled Train, Scaled Val (0.5871)
  Recall         : Regular (No Aug, No Scale)     (0.6946)
  F1             : Regular (No Aug, No Scale)     (0.5694)
  Roc_auc        : Aug SMOTE & Scaled Train, Scaled Val (0.5127)

🏆 Overall Ranking:
----------------------------------------------------------------------
  🥇 Rank 1: Scaled Train, Scaled Val       (Score: 0.9610)
  🥈 Rank 2: Scaled Train, Regular Val      (Score: 0.9594)
  🥉 Rank 3: Aug SMOTE & Scaled Train, Regular Val (Score: 0.9594)
     Rank 4: Regular (No Aug, No Scale)     (Score: