<h1 style="text-align: center; color: darkblue;">Inference</h1>

### 📑 <font color='blue'> Table of Contents </font>
1. [Introduction](#introduction)
2. [Setup](#setup)
3. [Load model and related components](#load)
4. [Get new data](#new_data)
5. [Preprocess new data](#preprocessing)

<a name="introduction"></a>
## <font color="darkred"> 1. Introduction </font>

In this notebook, we demonstrate how to use our trained model to make predictions on new data.

The workflow is as follows:
    
- Load model and related components (scaler, encoder, etc.)

- Get new data

- Preprocess the data (scaling, encoding, feature selection, etc.)

- Make predictions

- Interpret the predictions

This order ensures that new data is prepared exactly as it was during training before predictions are generated, so that the results can be interpreted meaningfully.

<a name="setup"></a>
## <font color="darkred"> 2. Setup </font>

In [19]:
import os
import joblib
import pandas as pd
import numpy as np
import tensorflow as tf

from tensorflow import keras
from joblib import load

This notebook uses only the most recent experiment.

In [5]:
# get last experiment

base_path = "../outputs/saved_models"

# list all experiment directories
experiments = [d for d in os.listdir(base_path) if os.path.isdir(os.path.join(base_path, d))]

# sort by the trailing YYYYMMDD_HHMMSS timestamp, not by the full name
experiments.sort(key=lambda name: name.rsplit("_", 2)[-2:])

# last one (most recent)
latest_experiment = experiments[-1]
latest_path = os.path.join(base_path, latest_experiment)

print("All experiments:", experiments)
print("Latest experiment:", latest_experiment)
print("Path to latest:", latest_path)


All experiments: ['experiment_baseline_standardize_20250905_192125']
Latest experiment: experiment_baseline_standardize_20250905_192125
Path to latest: ../outputs/saved_models/experiment_baseline_standardize_20250905_192125
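
To see why the timestamp, rather than the full directory name, should drive the ordering, here is a minimal sketch. It assumes every experiment name ends in a `YYYYMMDD_HHMMSS` timestamp, as in the listing above. The helper names and the second directory name are hypothetical and only serve the illustration.

```python
from datetime import datetime

def experiment_timestamp(name: str) -> datetime:
    """Parse the trailing YYYYMMDD_HHMMSS timestamp of an experiment name."""
    date_part, time_part = name.rsplit("_", 2)[-2:]
    # strptime raises ValueError if the name does not end in a valid timestamp
    return datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")

def latest_by_timestamp(names):
    """Return the name with the most recent trailing timestamp."""
    return max(names, key=experiment_timestamp)

names = [
    "experiment_baseline_standardize_20250905_192125",
    "experiment_tuned_minmax_20250901_080000",  # hypothetical, older run
]

# Lexicographic order compares the prefix first ("tuned" > "baseline"),
# so it would select the older run here.
print(sorted(names)[-1])
print(latest_by_timestamp(names))
```

Parsing with `strptime` also acts as a format check: a directory without a valid timestamp fails loudly instead of being silently mis-sorted.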


In [8]:
experiment_path = latest_path
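
Before loading anything, it can help to confirm that the experiment directory holds every file the rest of the notebook reads. The sketch below is one possible check. The file names are the ones used in this notebook's loading cells, while the helper `missing_artifacts` is a hypothetical name introduced here.

```python
import tempfile
from pathlib import Path

# File names taken from the loading cells of this notebook.
REQUIRED_ARTIFACTS = ("model.h5", "scaler.pkl", "encoder.pkl", "columns.json")

def missing_artifacts(experiment_dir, required=REQUIRED_ARTIFACTS):
    """Return the required file names that are absent from experiment_dir."""
    root = Path(experiment_dir)
    return [name for name in required if not (root / name).is_file()]

# Demonstration on a temporary directory that holds only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.h5").touch()
    (Path(tmp) / "scaler.pkl").touch()
    print(missing_artifacts(tmp))  # ['encoder.pkl', 'columns.json']
```

In the notebook itself, the call would be `missing_artifacts(experiment_path)`, raising an error if the returned list is not empty.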

<a name="load"></a>
## <font color="darkred"> 3. Load model and related components </font>

**Load model**

In [10]:
model_path = os.path.join(experiment_path, "model.h5")
model = keras.models.load_model(model_path)

2025-09-05 20:38:27.152858: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


**Load scaler and encoder**

In [11]:
scaler_path = os.path.join(experiment_path, "scaler.pkl")
encoder_path = os.path.join(experiment_path, "encoder.pkl")

In [16]:
# scaler
scaler = load(scaler_path)

# See type of scaler
print(type(scaler))

# Main learned attributes
print("mean:", getattr(scaler, "mean_", None)) # mean for every feature
print("var:", getattr(scaler, "var_", None)) # variance for every feature


<class 'sklearn.preprocessing._data.StandardScaler'>
mean: [1.41559238e+01 1.93511328e+01 9.21518750e+01 6.58153516e+02
 9.61988672e-02 1.03554531e-01 8.85161713e-02 4.88897402e-02
 1.81255273e-01 6.27087305e-02 4.09529102e-01 1.21794902e+00
 2.90134512e+00 4.10547617e+01 6.94725781e-03 2.51113359e-02
 3.16497336e-02 1.17416348e-02 2.04345078e-02 3.75897129e-03
 1.63169453e+01 2.57480273e+01 1.07621934e+02 8.86556445e+02
 1.32138906e-01 2.53280762e-01 2.71695561e-01 1.14682229e-01
 2.90017188e-01 8.38891016e-02]
var: [1.25668844e+01 1.85788022e+01 5.99698209e+02 1.26879396e+05
 2.01562007e-04 2.81310339e-03 6.48463278e-03 1.53749022e-03
 7.52674738e-04 4.73514193e-05 8.17568545e-02 3.12598201e-01
 4.38201521e+00 2.22211910e+03 8.32386721e-06 2.98804298e-04
 9.40867147e-04 3.93270921e-05 6.84521929e-05 6.59908845e-06
 2.36568213e+01 3.76441772e+01 1.15197767e+03 3.32406115e+05
 5.35499243e-04 2.45697564e-02 4.37180255e-02 4.36030274e-03
 3.76101920e-03 3.24277262e-04]


In [17]:
encoder = joblib.load(encoder_path)
encoder

{'M': 1, 'B': 0}
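
The encoder maps labels to integers, while inference needs the opposite direction (integer back to label). A minimal sketch of that inversion follows. The function name `invert_mapping` is hypothetical, and the dictionary is copied from the output above.

```python
encoder = {"M": 1, "B": 0}  # mapping shown in the cell output above

def invert_mapping(mapping):
    """Swap keys and values, refusing mappings that are not one-to-one."""
    inverse = {code: label for label, code in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one; cannot invert")
    return inverse

decoder = invert_mapping(encoder)
print(decoder[1], decoder[0])  # M B
```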

<a name="new_data"></a>
## <font color="darkred"> 4. Get new data </font>

In this example, we will simulate some new data.

In [65]:
# Reference sample that the simulated observation is derived from
original = np.array([17.99, 10.38, 122.8, 1001.0, 0.1184, 0.2776, 0.3001,
                     0.14, 0.2419, 0.07871, 1.095, 0.9053, 8.589, 153.4,
                     0.006399, 0.04904, 0.053, 0.01587, 0.03003, 0.006193,
                     25.38, 17.33, 184.6, 2019.0, 0.1622, 0.32, 0.7119,
                     0.2654, 0.4601, 0.1789])

# Perturb every feature by a random factor in [0.3, 1.7):
# similar scale, but clearly different from the reference sample
np.random.seed(42)  # for reproducibility
noise_factor = 0.7  # larger factor → more difference
new_data = original * (1 + noise_factor * (2 * np.random.rand(*original.shape) - 1))

print(new_data)

data = new_data

[1.48301674e+01 1.69297803e+01 1.62684398e+02 1.13926000e+03
 6.13816498e-02 1.43905710e-01 1.14433249e-01 2.11770525e-01
 2.76143610e-01 1.01638350e-01 3.60056030e-01 1.50087314e+00
 1.25864898e+01 9.16219474e+01 3.54859715e-03 2.73038200e-02
 3.84747744e-02 1.64200384e-02 2.71688325e-02 4.38291489e-03
 2.93543571e+01 8.58340005e+00 1.30881863e+02 1.64125839e+03
 1.52224372e-01 4.47758831e-01 4.12576872e-01 2.70688948e-01
 5.19627920e-01 6.53039704e-02]
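
The simulation rule can be written out explicitly. The symbols $x_i$, $x'_i$, $u_i$, and $\alpha$ are introduced here only for illustration: $x_i$ is a feature of the reference sample, $x'_i$ the simulated value, $u_i$ a uniform random draw, and $\alpha$ the `noise_factor`.

$$
x'_i = x_i \,\bigl(1 + \alpha\,(2u_i - 1)\bigr), \qquad u_i \sim \mathcal{U}[0, 1), \qquad \alpha = 0.7
$$

Since $2u_i - 1 \in [-1, 1)$, the multiplicative factor lies in $[1 - \alpha,\ 1 + \alpha) = [0.3,\ 1.7)$, so every simulated feature keeps its sign and stays within 70% of the reference value. For the first feature, the printed output gives $14.8301674 / 17.99 \approx 0.824$, which is inside that interval.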


<a name="preprocessing"></a>
## <font color="darkred"> 5. Preprocess new data </font>

In [66]:
# Apply the scaler fitted during training (the only preprocessing step needed here)
scaled_data = scaler.transform(data.reshape(1,-1))

print(scaled_data)

[[ 0.19019672 -0.56175787  2.88020265  1.35065882 -2.45239105  0.76078753
   0.32184246  4.15397301  3.45867029  5.65735791 -0.17302418  0.50603055
   4.6266769   1.07271689 -1.17799984  0.12683614  0.22250568  0.74602223
   0.81395495  0.24288649  2.6804838  -2.79759906  0.68530888  1.30900397
   0.86796553  1.24070979  0.67378776  2.36257162  3.74403077 -1.03206579]]
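
The scaled values can be checked by hand: a `StandardScaler` subtracts the stored mean and divides by the square root of the stored variance, feature by feature. The sketch below redoes that for the first feature only, using numbers copied from the printed outputs above (so they carry the rounding of those printouts).

```python
import math

# Values copied from the printed outputs above (first feature).
x = 14.8301674     # first entry of new_data
mean = 14.1559238  # first entry of scaler.mean_
var = 12.5668844   # first entry of scaler.var_

# Standardization: z = (x - mean) / sqrt(var)
z = (x - mean) / math.sqrt(var)
print(round(z, 4))  # 0.1902
```

This agrees with the first entry of `scaled_data` shown above.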




In [55]:
y_pred_prob = model.predict(scaled_data)
y_pred_prob

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step


array([[0.9999996]], dtype=float32)

In [56]:
y_pred = (y_pred_prob > 0.5).astype(int)
y_pred

array([[1]])

In [None]:
# Predicted class: M (malignant), with probability ≈ 0.9999996
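
The interpretation step can be sketched as a small function. It assumes that the model's single output is the probability of class `1` (which the encoder maps to `M`); the notebook does not state this explicitly. The function name `interpret` and the notion of "confidence" used here are hypothetical.

```python
def interpret(prob_positive, decoder, threshold=0.5):
    """Turn P(class 1) into a (label, confidence) pair.

    Assumes prob_positive is the probability of the class encoded as 1.
    """
    code = int(prob_positive > threshold)  # strict '>' as in the cell above
    confidence = prob_positive if code == 1 else 1.0 - prob_positive
    return decoder[code], confidence

decoder = {1: "M", 0: "B"}  # inverse of the encoder {'M': 1, 'B': 0}
label, confidence = interpret(0.9999996, decoder)
print(label)  # M
```

Keeping the threshold as a parameter makes it easy to trade off false positives against false negatives later without touching the rest of the code.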

In [67]:
# Same preprocessing, this time using the saved column names

import json

# Load the column names saved with the experiment
with open(os.path.join(experiment_path, "columns.json"), "r") as f:
    columns = json.load(f)

# Convert to a DataFrame with the proper column names
new_data_df = pd.DataFrame([new_data], columns=columns)

print(new_data_df.head())

scaled_data = scaler.transform(new_data_df)



   radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  \
0    14.830167      16.92978      162.684398    1139.26         0.061382   

   compactness_mean  concavity_mean  concave points_mean  symmetry_mean  \
0          0.143906        0.114433             0.211771       0.276144   

   fractal_dimension_mean  ...  radius_worst  texture_worst  perimeter_worst  \
0                0.101638  ...     29.354357         8.5834       130.881863   

    area_worst  smoothness_worst  compactness_worst  concavity_worst  \
0  1641.258386          0.152224           0.447759         0.412577   

   concave points_worst  symmetry_worst  fractal_dimension_worst  
0              0.270689        0.519628                 0.065304  

[1 rows x 30 columns]
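
Column names matter because, on a plain array, the scaler applies its statistics by position: a vector with features in a different order would be standardized with the wrong means and variances without any error. One way to guard against this is to accept new samples as name-to-value mappings and order them explicitly. The helper `order_features` is a hypothetical name, and the column list is shortened to three entries for illustration.

```python
def order_features(sample, columns):
    """Return the sample's values in training column order.

    Raises ValueError if a feature is missing or an unexpected one is present.
    """
    missing = [c for c in columns if c not in sample]
    unexpected = [k for k in sample if k not in columns]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [sample[c] for c in columns]

# Shortened column list, for illustration only.
columns = ["radius_mean", "texture_mean", "perimeter_mean"]
sample = {"perimeter_mean": 162.68, "radius_mean": 14.83, "texture_mean": 16.93}

print(order_features(sample, columns))  # [14.83, 16.93, 162.68]
```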


In [68]:
scaled_data

array([[ 0.19019672, -0.56175787,  2.88020265,  1.35065882, -2.45239105,
         0.76078753,  0.32184246,  4.15397301,  3.45867029,  5.65735791,
        -0.17302418,  0.50603055,  4.6266769 ,  1.07271689, -1.17799984,
         0.12683614,  0.22250568,  0.74602223,  0.81395495,  0.24288649,
         2.6804838 , -2.79759906,  0.68530888,  1.30900397,  0.86796553,
         1.24070979,  0.67378776,  2.36257162,  3.74403077, -1.03206579]])

In [69]:
y_pred_prob = model.predict(scaled_data)
y_pred_prob

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step


array([[0.9999996]], dtype=float32)