
# **Actionable Insights and Recommendations**


## 📊 Business Insights and Recommendations (Post-Tuning + SHAP)

### ✅ What We Did — and Why It Matters
- 🧠 We trained a model to predict where things might go wrong based on hidden fields; these may be late payments, missed renewals, or customer churn.
- 📈 We fine-tuned it using Keras Tuner to make sure it catches risks earlier and more accurately.
- 🔍 We added SHAP — a transparency tool that explains **why** the model flags something.
- ✅ This combination builds trust and helps teams act with confidence.

### 🔍 What Changed with Tuning and SHAP
- 🎯 Tuning helped the model become more sensitive to real risk — especially ones we used to miss (false negatives).
- 💬 SHAP shows exactly which factors (e.g., contract type, payment behavior) triggered the prediction.
- 📉 Together, they reduce both surprises and unnecessary escalations.

### 💡 What We Learned (Insights)
1. A small number of features drive most of the model’s decisions.
2. SHAP allows us to **explain individual predictions** — critical for trust and action.
3. These explanations highlight **what matters most to improve**: data quality, customer behavior, or internal processes.
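
To make the idea of explaining an individual prediction concrete, the sketch below computes exact Shapley values for a tiny linear scoring function by enumerating every feature coalition. The weights, baseline, and instance are hypothetical and chosen only for illustration; the `shap` library estimates the same kind of quantity for real models using approximations that are not reproduced here.

```python
from itertools import combinations
from math import factorial

import numpy as np

# Hypothetical linear model and data, for illustration only.
weights = np.array([2.0, -1.0, 0.5])
bias = 0.1
baseline = np.array([1.0, 1.0, 1.0])   # reference values used when a feature is "absent"
instance = np.array([3.0, 0.0, 1.0])   # the observation being explained


def predict(x):
    """Score of the toy linear model."""
    return float(weights @ x + bias)


def coalition_value(present):
    """Model output when only the features in `present` take the instance's values."""
    x = baseline.copy()
    for j in present:
        x[j] = instance[j]
    return predict(x)


def shapley_values(n_features):
    """Exact Shapley values: weighted average of each feature's marginal contribution."""
    values = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            for subset in combinations(others, size):
                weight = (
                    factorial(size)
                    * factorial(n_features - size - 1)
                    / factorial(n_features)
                )
                gain = coalition_value(subset + (i,)) - coalition_value(subset)
                values[i] += weight * gain
    return values


phi = shapley_values(len(weights))
print("Per-feature contributions:", phi)
print("Baseline prediction + contributions:", predict(baseline) + phi.sum())
print("Prediction for the instance:", predict(instance))
```

The contributions add up to the difference between the instance's prediction and the baseline prediction, which is the property that lets a flagged case be broken down feature by feature.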

### 🧭 What We Recommend (Business Actions)
| Recommendation | Who Benefits | Why It Matters |
|----------------|--------------|----------------|
| Use model + SHAP to flag accounts needing manual review | Ops, Risk Teams | Focus on the riskiest customers or contracts |
| Prioritize follow-ups using SHAP feature impact | Customer Success | Act before issues escalate |
| Share SHAP findings in weekly triage or dashboard | Data/IT + CX | Helps cross-teams align on what drives risk |
| Rerun tuning and SHAP review every quarter | ML/IT Governance | Keeps model accurate over time |


# **Problem Statement**

## Business Context

Renewable energy sources play an increasingly important role in the global energy mix, as the effort to reduce the environmental impact of energy production increases.

Out of all the renewable energy alternatives, wind energy is one of the most developed technologies worldwide. The U.S. Department of Energy has put together a guide to achieving operational efficiency using predictive maintenance practices.

Predictive maintenance uses sensor information and analysis methods to measure and predict degradation and future component capability. The idea behind predictive maintenance is that failure patterns are predictable and if component failure can be predicted accurately and the component is replaced before it fails, the costs of operation and maintenance will be much lower.

The sensors fitted across different machines involved in the process of energy generation collect data related to various environmental factors (temperature, humidity, wind speed, etc.) and additional features related to various parts of the wind turbine (gearbox, tower, blades, brake, etc.).

## Objective

“ReneWind” is a company working on improving the machinery/processes involved in the production of wind energy using machine learning, and it has collected sensor data on generator failures of wind turbines. They have shared a ciphered version of the data, as the data collected through sensors is confidential (the type of data collected varies with companies). The data has 40 predictors, with 20,000 observations in the training set and 5,000 in the test set.

The objective is to build various classification models, tune them, and find the best one that will help identify failures so that the generators could be repaired before failing/breaking to reduce the overall maintenance cost.
The nature of predictions made by the classification model will translate as follows:

- True positives (TP) are failures correctly predicted by the model. These will result in repairing costs.
- False negatives (FN) are real failures where there is no detection by the model. These will result in replacement costs.
- False positives (FP) are detections where there is no failure. These will result in inspection costs.

It is given that the cost of repairing a generator is much less than the cost of replacing it, and the cost of inspection is less than the cost of repair.

“1” in the target variable should be considered as “failure” and “0” represents “No failure”.

## Data Description

The data provided is a transformed version of the original data which was collected using sensors.

- Train.csv - To be used for training and tuning of models.
- Test.csv - To be used only for testing the performance of the final best model.

Both the datasets consist of 40 predictor variables and 1 target variable.

# **Please read the instructions carefully before starting the project.**
This is a commented Jupyter IPython Notebook file in which all the instructions and tasks to be performed are mentioned.
* Blanks '_______' are provided in the notebook that need to be filled with appropriate code to get the correct result. Each '_______' blank comes with a comment that briefly describes what needs to be filled in.
* Identify the task to be performed correctly, and only then proceed to write the required code.
* Fill in the code wherever asked by commented lines like "# write your code here" or "# complete the code". Running incomplete code may throw an error.
* Please run the codes in a sequential manner from the beginning to avoid any unnecessary errors.
* Add the results/observations (wherever mentioned) derived from the analysis in the presentation and submit the same.

# **Installing and Importing the necessary libraries**

In [None]:
# Installing the libraries with the specified version
!pip install --no-deps tensorflow==2.18.0 scikit-learn==1.3.2 matplotlib==3.8.3 seaborn==0.13.2 numpy==1.26.4 pandas==2.2.2 -q --no-warn-script-location

In [None]:
!pip install shap -q

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [None]:
# Library for data manipulation and analysis.
import pandas as pd
# Fundamental package for scientific computing.
import numpy as np
# Splitting datasets into training and validation sets.
from sklearn.model_selection import train_test_split
# Tools for data preprocessing: label encoding, one-hot encoding, and standard scaling.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
# Class for imputing missing values in datasets.
from sklearn.impute import SimpleImputer
# Matplotlib for creating visualizations.
import matplotlib.pyplot as plt
# Seaborn for statistical data visualization.
import seaborn as sns
# Time related functions.
import time
# Functions for evaluating the performance of machine learning models.
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, recall_score, precision_score, classification_report
# The scikit-learn metrics module, for any additional metrics.
from sklearn import metrics

# TensorFlow, Keras, and the layers used in the models.
import tensorflow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Dropout, BatchNormalization
from tensorflow.keras import backend

# to suppress unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

In [None]:
import shap

# **Loading the Data**

In [None]:
# Mount Google Drive (run this cell only on Google Colab)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load the training and test data from Google Drive
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/ReneWind_Maintenance/Train.csv')
df_test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/ReneWind_Maintenance/Test.csv')

# **Data Overview**

The initial steps to get an overview of any dataset are to:
- observe the first few rows of the dataset, to check whether the dataset has been loaded properly or not
- get information about the number of rows and columns in the dataset
- find out the data types of the columns to ensure that data is stored in the preferred format and the value of each property is as expected.
- check the statistical summary of the dataset to get an overview of the numerical columns of the data

## Checking the shape of the dataset

In [None]:
# Checking the number of rows and columns in the training data
df.shape # Complete the code to print the shape of the train data

In [None]:
# Checking the number of rows and columns in the test data
df_test.shape # Complete the code to print the shape of the test data

In [None]:
# let's create a copy of the training data
data = df.copy()

In [None]:
# let's create a copy of the testing  data
data_test = df_test.copy()

## Displaying the first few rows of the dataset

In [None]:
# let's view the first 5 rows of the data
data.head() # Complete the code to view the first five rows of the train data

In [None]:
#viewing first 5 rows of the test data
data_test.head() # Complete the code to view the first five rows of the test data

## Checking the data types of the columns in the dataset

In [None]:
# let's check the data types of the columns in the dataset
data.info() # Complete the code to view the data types of the columns in the train data

- Converting Target column to float

In [None]:
data['Target'] = data['Target'].astype(float)

Now checking for test data

In [None]:
data_test.info() # Complete the code to view the data types of the columns in the test data

Converting Target to float

In [None]:
data_test['Target'] = data_test['Target'].astype(float)

## Checking for duplicate values

In [None]:
# let's check for duplicate values in the data
data.duplicated().sum() # Count the duplicate rows in the train data

## Checking for missing values

In [None]:
# let's check for missing values in the data
data.isnull().sum()

In [None]:
data.dropna(inplace=True)

In [None]:
# let's check for missing values in the test data
data_test.isnull().sum()

In [None]:
data_test.dropna(inplace=True)

## Statistical summary of the dataset

In [None]:
# let's view the statistical summary of the numerical columns in the data
data.describe() # Complete the code to view the statistical summary of the train data

# **Exploratory Data Analysis**

## Univariate analysis

In [None]:
# function to plot a boxplot and a histogram along the same scale.


def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle marker will indicate the mean value of the column
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram

### Variables V1 to V40

In [None]:
for feature in data.columns:
    histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None)

plt.show()

In [None]:
# For train data
df["Target"].value_counts(normalize=True)

The output shows the distribution of the Target variable in the training data.

- Target 0: the majority class, accounting for approximately 94.45% of the observations. This corresponds to "No failure".
- Target 1: the minority class, accounting for approximately 5.55% of the observations. This corresponds to "failure".

The dataset is therefore imbalanced, with far fewer "failure" instances than "No failure" instances. This matters for model building, because imbalanced datasets can hurt the performance of classification models, particularly when predicting the minority class.
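
One way to see why this imbalance matters is to score a trivial baseline that always predicts "No failure". The sketch below uses synthetic labels built to match the class proportions reported above (the counts are derived from those percentages and are not read from the actual files).

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with roughly the class balance reported above (illustrative only).
n_total = 20000
n_failures = 1110  # about 5.55% of 20000
y_true = np.zeros(n_total, dtype=int)
y_true[:n_failures] = 1

# A "model" that always predicts the majority class (no failure).
y_pred = np.zeros(n_total, dtype=int)

baseline_accuracy = accuracy_score(y_true, y_pred)
baseline_recall = recall_score(y_true, y_pred)  # recall on the failure class
print(f"Accuracy: {baseline_accuracy:.4f}")
print(f"Recall on the failure class: {baseline_recall:.4f}")
```

Such a baseline reaches high accuracy while detecting no failures at all, which is why accuracy alone is a poor guide on this kind of data.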

### Checking the distribution of Target variable

In [None]:
# For test data
data_test["Target"].value_counts(normalize=True) # Complete the code to display the proportion of the target variable in the test data

## Bivariate Analysis

### Correlation Check

In [None]:
cols_list = df.select_dtypes(include=np.number).columns.tolist()
cols_list.remove("Target")

plt.figure(figsize=(20, 20))
sns.heatmap(
    df[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral"
)
plt.show()

In [None]:
# Calculate the correlation matrix
correlation_matrix = df[cols_list].corr()

# Stack the correlation matrix to easily filter pairs
stacked_corr = correlation_matrix.stack()

# Filter for correlations above 0.7 or below -0.7 (excluding self-correlation)
high_corr = stacked_corr[
    (abs(stacked_corr) >= 0.7) & (stacked_corr != 1.0)
]

# Print the pairs of highly correlated variables
print("Pairs of variables with correlation >= 0.7 or <= -0.7:")
print(high_corr)
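
Because a correlation matrix is symmetric, stacking it lists every pair twice, once as (A, B) and once as (B, A). A possible refinement, sketched here on a small synthetic frame rather than the sensor data, is to keep only the strict upper triangle before filtering. The column names and the 0.7 cutoff below are placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the sensor columns (the real data is not reproduced here).
rng = np.random.default_rng(0)
base = rng.normal(size=500)
demo = pd.DataFrame({
    "A": base,
    "B": base + rng.normal(scale=0.1, size=500),   # strongly correlated with A
    "C": -base + rng.normal(scale=0.1, size=500),  # strongly anti-correlated with A and B
    "D": rng.normal(size=500),                     # independent noise
})

corr = demo.corr()

# Keep only the strict upper triangle so each pair appears once and the diagonal is dropped.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
unique_pairs = corr.where(upper_mask).stack()

# Filter on absolute correlation and sort the strongest pairs first.
high_corr_pairs = unique_pairs[unique_pairs.abs() >= 0.7].sort_values(
    key=np.abs, ascending=False
)
print(high_corr_pairs)
```

Masking the diagonal also removes the need to exclude values equal to 1.0, so a pair of genuinely identical columns would still be reported.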

# **Data Preprocessing**

## Data Preparation for Modeling

In [None]:
# Dividing train data into X and y
X = data.drop(columns=["Target"]) # Complete the code to remove the column named 'Target'
y = data["Target"] # Complete the code to select the column named 'Target'

**Since we already have a separate test set, we only need to split the training data into train and validation sets.**


In [None]:
# Splitting data into training and validation set:

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y # Complete the code to define the test size
)
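
A quick sanity check after a stratified split is to compare the positive rate in each part. The sketch below does this on synthetic labels with a similar imbalance; the array names are placeholders, not the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels (5.5% positives), used only to illustrate the check.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(2000, 3))
y_demo = np.zeros(2000, dtype=int)
y_demo[:110] = 1

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# With stratify=y_demo the positive rate should be nearly identical in both parts.
print("Positive rate overall:   ", y_demo.mean())
print("Positive rate train:     ", y_tr.mean())
print("Positive rate validation:", y_va.mean())
```

In this notebook the equivalent check would compare `y_train.mean()` with `y_val.mean()`.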

In [None]:
# Checking the number of rows and columns in the X_train data
X_train.shape

In [None]:
# Checking the number of rows and columns in the X_val data
X_val.shape

In [None]:
# Dividing test data into X_test and y_test
X_test = data_test.drop(columns=["Target"]) # Complete the code to remove the target column
y_test = data_test["Target"] # Complete the code to select the target column

In [None]:
# Checking the number of rows and columns in the X_test data
X_test.shape

## Missing Value Imputation


- There were a few missing values in V1 and V2; we will impute them using the median.
- To avoid data leakage, we will impute missing values after splitting the train data into train and validation sets.




In [None]:
imputer = SimpleImputer(strategy="median")

In [None]:
# Fit and transform the train data
X_train = pd.DataFrame(imputer.fit_transform(X_train), columns=X_train.columns)

# Transform the validation data
X_val = pd.DataFrame(imputer.transform(X_val), columns=X_train.columns)    # Complete the code to impute missing values in the validation set while accounting for data leakage

# Transform the test data
X_test = pd.DataFrame(imputer.transform(X_test), columns=X_train.columns)    # Complete the code to impute missing values in the test set while accounting for data leakage

In [None]:
# Checking that no column has missing values in the train, validation, or test sets
print(X_train.isna().sum())
print("-" * 30)
print(X_val.isna().sum())
print("-" * 30)
print(X_test.isna().sum())

In [None]:
# y_train = y_train.to_numpy()
# y_val = y_val.to_numpy()
# y_test = y_test.to_numpy()

# **Model Building**

## Model Evaluation Criterion

A Precision-Recall vs. Threshold curve shows how a model's performance changes as the decision threshold is adjusted. Here is how to read it.

The axes:

- X-axis: decision threshold (0.0 to 1.0)
- Y-axis: performance scores (0.0 to 1.0)
- Blue line: precision at each threshold
- Orange line: recall at each threshold

Key patterns as the threshold increases (moving right):

- Precision generally increases (blue line goes up)
- Recall generally decreases (orange line goes down)

What this means:

- Low thresholds (left side, ~0.0-0.3):
  - The model predicts "positive" for almost everything
  - High recall (~1.0): it catches nearly all positive cases
  - Low precision (~0.1): lots of false positives
- High thresholds (right side, ~0.7-1.0):
  - The model is very conservative and only predicts "positive" when very confident
  - High precision (~1.0): few false positives
  - Low recall (~0.2): it misses many positive cases

How to use this:

- Find the crossover point (~0.75 threshold) where precision and recall are roughly equal
- Choose based on your needs:
  - Need high recall? Choose a lower threshold (~0.1-0.3)
  - Need high precision? Choose a higher threshold (~0.8-0.9)
  - Balanced approach? Choose around the crossover point



This curve helps you pick the optimal threshold based on whether false positives or false negatives are more costly in your specific use case.
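
As a rough sketch of how such a curve can be computed and used to pick a threshold, the example below relies on scikit-learn's `precision_recall_curve`. The labels and scores are synthetic stand-ins for validation labels and predicted probabilities, and the 0.90 recall target is an assumed value, not one taken from this project.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic labels and scores standing in for validation labels and predicted probabilities.
rng = np.random.default_rng(42)
y_true = (rng.random(5000) < 0.055).astype(int)
# Positives tend to receive higher scores than negatives.
y_score = np.clip(rng.normal(loc=0.2 + 0.5 * y_true, scale=0.15), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# precision and recall have one more element than thresholds; drop the final point to align them.
precision, recall = precision[:-1], recall[:-1]

# Example rule (an assumption): the highest threshold that still reaches the target recall.
target_recall = 0.90
chosen_idx = np.where(recall >= target_recall)[0][-1]

print(f"Chosen threshold: {thresholds[chosen_idx]:.3f}")
print(f"Recall at that threshold: {recall[chosen_idx]:.3f}")
print(f"Precision at that threshold: {precision[chosen_idx]:.3f}")
```

Plotting `precision` and `recall` against `thresholds` gives the two lines described above; the selection rule can be swapped for the crossover point or any other criterion that matches the cost trade-off.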

Metric of Choice: Recall

Rationale:

The primary objective of this predictive maintenance system is to identify wind turbine generator failures before they occur, in order to minimize operational costs. The business context establishes a cost hierarchy that makes Recall the most appropriate evaluation metric.

According to the problem statement:

"It is given that the cost of repairing a generator is much less than the cost of replacing it, and the cost of inspection is less than the cost of repair."

This creates the following cost structure:

Inspection costs (False Positives) < Repair costs (True Positives) << Replacement costs (False Negatives)

The problem statement further explains the business impact of the different prediction outcomes:

- "True positives (TP) are failures correctly predicted by the model. These will result in repairing costs."
- "False negatives (FN) are real failures where there is no detection by the model. These will result in replacement costs."
- "False positives (FP) are detections where there is no failure. These will result in inspection costs."

Since repair costs are "much less" than replacement costs, minimizing False Negatives is critical. Missing a real failure (False Negative) leads to complete generator replacement, which is the most expensive outcome. In contrast, False Positives only result in unnecessary inspections, which are the least costly.

Optimizing for Recall maximizes the detection of actual failures, so that genuine failure cases are not missed and the expensive replacement scenario is avoided. While this may increase False Positives (unnecessary inspections), the cost trade-off strongly favors this approach given the substantial difference between inspection and replacement costs.
Therefore, Recall is the optimal metric for this predictive maintenance classification problem.
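
The cost argument can be turned into a small calculation. The sketch below assigns hypothetical unit costs that respect the stated ordering (inspection < repair << replacement) and compares two sets of made-up predictions; the actual costs are not given in the problem statement, so the numbers are assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical relative costs that respect inspection < repair << replacement.
COST_INSPECTION = 1.0    # false positive
COST_REPAIR = 3.0        # true positive
COST_REPLACEMENT = 10.0  # false negative


def maintenance_cost(y_true, y_pred):
    """Total cost implied by a set of predictions under the assumed unit costs."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp * COST_INSPECTION + tp * COST_REPAIR + fn * COST_REPLACEMENT


y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
cautious = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # catches every failure, two false alarms
strict = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])    # no false alarms, two missed failures

print("High-recall predictions cost:   ", maintenance_cost(y_true, cautious))
print("High-precision predictions cost:", maintenance_cost(y_true, strict))
```

Under these assumed costs the high-recall predictions are cheaper even though they trigger unnecessary inspections, which is the trade-off the rationale describes.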


**We are now done with pre-processing and evaluation criterion, so let's start building the model.**

## Utility Functions

In [None]:
def plot(history, name):
    """
    Function to plot a training metric and its validation counterpart per epoch

    history: an object which stores the metrics and losses.
    name: key in history.history to plot, e.g. 'loss'
    """
    fig, ax = plt.subplots() #Creating a subplot with figure and axes.
    plt.plot(history.history[name]) #Plotting the train accuracy or train loss
    plt.plot(history.history['val_'+name]) #Plotting the validation accuracy or validation loss

    plt.title('Model ' + name.capitalize()) #Defining the title of the plot.
    plt.ylabel(name.capitalize()) #Capitalizing the first letter.
    plt.xlabel('Epoch') #Defining the label for the x-axis.
    fig.legend(['Train', 'Validation'], loc="outside right upper") #Defining the legend, loc controls the position of the legend.

In [None]:
# defining a function to compute different metrics to check performance of a classification model built using Keras
def model_performance_classification(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred = model.predict(predictors) > threshold
    # pred_temp = model.predict(predictors) > threshold
    # # rounding off the above values to get classes
    # pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred, average='macro')  # to compute macro-averaged Recall
    precision = precision_score(target, pred, average='macro')  # to compute macro-averaged Precision
    f1 = f1_score(target, pred, average='macro')  # to compute macro-averaged F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1 Score": f1,}, index = [0]
    )

    return df_perf
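
Note that `average='macro'` averages the per-class scores, so the reported Recall is not the same number as the recall on the failure class alone. The toy labels below are made up to illustrate the difference.

```python
import numpy as np
from sklearn.metrics import recall_score

# Toy labels: eight non-failures and two failures, one of which is missed.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

# Macro recall averages the recall of class 0 and class 1.
macro_recall = recall_score(y_true, y_pred, average="macro")
# Binary recall (the default) reports only the positive class, here "failure".
failure_recall = recall_score(y_true, y_pred, pos_label=1)

print("Macro-averaged recall:", macro_recall)
print("Recall on the failure class:", failure_recall)
```

When missed failures are the main concern, the per-class rows of the classification report are the place to read the failure-class recall directly.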

## Initial Model Building (Model 0)

- Let's start with a neural network consisting of
  - just one hidden layer of 7 neurons
  - ReLU as the activation function
  - SGD as the optimizer

In [None]:
# defining the batch size and # epochs upfront as we'll be using the same values for all models
epochs = 10    # Complete the code to enter the number of epochs to be used in all models
batch_size = 32    # Complete the code to enter the batch size to be used in all models

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
model_0 = Sequential()
model_0.add(Dense( 7 ,activation="relu",input_dim=X_train.shape[1])) # Complete the code to define the number of neurons and the activation function
model_0.add(Dense( 1 ,activation="sigmoid")) # Complete the code to define the number of neurons in the output layer

In [None]:
model_0.summary()

In [None]:
optimizer = tf.keras.optimizers.SGD()   # defining SGD as the optimizer to be used
#model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
#model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
#model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_0.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

In [None]:
model_0_train_perf = model_performance_classification(model_0, X_train, y_train)
model_0_train_perf

In [None]:
model_0_val_perf = model_performance_classification(model_0,X_val,y_val)
model_0_val_perf

Let's check the classification reports.

In [None]:
y_train_pred_0 = model_0.predict(X_train)
y_val_pred_0 = model_0.predict(X_val)

In [None]:
print("Classification Report - Train data Model_0",end="\n\n")
cr_train_model_0 = classification_report(y_train,y_train_pred_0>0.5)
print(cr_train_model_0)

In [None]:
print("Classification Report - Validation data Model_0",end="\n\n")
cr_val_model_0 = classification_report(y_val,y_val_pred_0>0.5)
print(cr_val_model_0)

# **Model Performance Improvement**

## Model 1

- Let's try adding another layer to see if we can improve our model's performance.

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
model_1 = Sequential()
model_1.add(Dense( 64 ,activation="relu",input_dim=X_train.shape[1])) # Complete the code to define the number of neurons and activation function
model_1.add(Dense( 32 ,activation="relu")) # Complete the code to define the number of neurons and activation function
model_1.add(Dense(1,activation="sigmoid")) # Complete the code to define the number of neurons in the output layer

In [None]:
model_1.summary()

In [None]:
optimizer = tf.keras.optimizers.SGD()   # defining SGD as the optimizer to be used
# model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_1.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

In [None]:
model_1_train_perf = model_performance_classification(model_1,X_train,y_train)
model_1_train_perf

In [None]:
model_1_val_perf = model_performance_classification(model_1,X_val,y_val)
model_1_val_perf

In [None]:
y_train_pred_1 = model_1.predict(X_train)
y_val_pred_1 = model_1.predict(X_val)

In [None]:
print("Classification Report - Train data Model_1", end="\n\n")
cr_train_model_1 = classification_report(y_train,y_train_pred_1 > 0.5)
print(cr_train_model_1)

In [None]:
print("Classification Report - Validation data Model_1", end="\n\n")
cr_val_model_1 = classification_report(y_val,y_val_pred_1 > 0.5)
print(cr_val_model_1)

## Model 2

To introduce regularization, let's add a dropout layer with a rate of 50% after the first hidden layer, along with a third hidden layer of 16 neurons. During training, dropout randomly zeroes 50% of the first hidden layer's outputs before they reach the next layer, which helps reduce overfitting.
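
The mechanics of dropout can be sketched in a few lines of NumPy. This is an illustrative implementation of "inverted" dropout, not the code Keras runs internally: the `dropout` function and its arguments are hypothetical names.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest during training."""
    if not training or rate == 0.0:
        return activations  # at inference time the layer passes values through unchanged
    keep_mask = rng.random(activations.shape) >= rate
    # Rescaling by 1 / (1 - rate) keeps the expected activation the same as without dropout.
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
hidden = np.ones((1000, 64))  # stand-in for the outputs of a 64-unit hidden layer

train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
infer_out = dropout(hidden, rate=0.5, training=False, rng=rng)

print("Fraction of units zeroed in training:", (train_out == 0).mean())
print("Mean activation in training:", train_out.mean())
print("Unchanged at inference:", np.array_equal(infer_out, hidden))
```

Because a different random mask is drawn on every training step, the network cannot rely on any single unit, which is the regularizing effect.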

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
model_2 = Sequential()
model_2.add(Dense(64,activation="relu",input_dim=X_train.shape[1]))  # Complete the code to define the number of neurons and activation function
model_2.add(Dropout(0.5)) # Complete the code to define the dropout rate
model_2.add(Dense(32,activation = "relu")) # Complete the code to define the number of neurons and activation function
model_2.add(Dense(16,activation = "relu")) # Complete the code to define the number of neurons and activation function
model_2.add(Dense(1,activation="sigmoid")) # Complete the code to define the number of neurons in the output layer

In [None]:
model_2.summary()

In [None]:
optimizer = tf.keras.optimizers.SGD()   # defining SGD as the optimizer to be used
# model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_2.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

Let's check the model performance of model_2 on training and validation data respectively.

In [None]:
model_2_train_perf = model_performance_classification(model_2,X_train,y_train)
model_2_train_perf

In [None]:
model_2_val_perf = model_performance_classification(model_2,X_val,y_val)
model_2_val_perf

In [None]:
y_train_pred_2 = model_2.predict(X_train)
y_val_pred_2 = model_2.predict(X_val)

Let's check the classification report of model_2 on training and validation data respectively.

In [None]:
print("Classification Report - Train data Model_2", end="\n\n")
cr_train_model_2 = classification_report(y_train,y_train_pred_2 > 0.5)
print(cr_train_model_2)

In [None]:
print("Classification Report - Validation data Model_2", end="\n\n")
cr_val_model_2 = classification_report(y_val , y_val_pred_2 > 0.5)
print(cr_val_model_2)

## Model 3

As we are dealing with an imbalance in class distribution, we should also use class weights so that the model gives proportionally more importance to the minority class.

In [None]:
# Calculate class weights for imbalanced dataset
cw = (y_train.shape[0]) / np.bincount(y_train.astype(int)) # Convert y_train to integers

# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[i] = cw[i]

cw_dict
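
For comparison, scikit-learn's `compute_class_weight` with `class_weight="balanced"` produces weights with the same ratio between classes, divided by the number of classes. The sketch below uses made-up labels with a similar imbalance rather than `y_train`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels with a similar imbalance (illustrative only).
y_demo = np.array([0] * 945 + [1] * 55)

# Same formula as in the cell above: total samples divided by the count of each class.
manual = len(y_demo) / np.bincount(y_demo)

# scikit-learn's "balanced" heuristic: n_samples / (n_classes * count of each class).
balanced = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_demo
)

print("Manual weights:  ", dict(enumerate(manual)))
print("Balanced weights:", dict(enumerate(balanced)))
print("Minority/majority ratio:", manual[1] / manual[0])
```

Both schemes weight the minority class by the same relative amount; they differ only by a constant factor, which scales the overall loss magnitude and therefore, with plain SGD, the effective step size.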

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
model_3 = Sequential()
model_3.add(Dense(64,activation="relu",input_dim=X_train.shape[1])) # Complete the code to define the number of neurons and activation function
model_3.add(Dropout(0.5)) # Complete the code to define the dropout rate
model_3.add(Dense(32,activation="relu")) # Complete the code to define the number of neurons and activation function
model_3.add(Dense(16, activation = "relu")) # Complete the code to define the number of neurons and activation function
model_3.add(Dense(1,activation="sigmoid")) # Complete the code to define the number of neurons in the output layer

In [None]:
model_3.summary()

In [None]:
optimizer = tf.keras.optimizers.SGD()   # defining SGD as the optimizer to be used
# model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
# Train the model, passing the class weights computed above
start = time.time()
history = model_3.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

Let's check the performance of model_3 on the training and validation data.

In [None]:
model_3_train_perf = model_performance_classification(model_3,X_train,y_train)
model_3_train_perf

In [None]:
model_3_val_perf = model_performance_classification(model_3,X_val,y_val)
model_3_val_perf

In [None]:
y_train_pred_3 = model_3.predict(X_train)
y_val_pred_3 = model_3.predict(X_val)

Let's check the classification report of model_3 on the training and validation data.

In [None]:
print("Classification Report - Train data Model_3", end="\n\n")
cr_train_model_3 = classification_report(y_train,y_train_pred_3 > 0.5)
print(cr_train_model_3)

In [None]:
print("Classification Report - Validation data Model_3", end="\n\n")
cr_val_model_3 = classification_report(y_val,y_val_pred_3 > 0.5)
print(cr_val_model_3)

## Model 4

Since we have used only the SGD optimizer so far, let's try a different optimizer and observe its impact on model performance.
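
To make the difference between the two optimizers concrete, here is a minimal NumPy sketch of a single SGD step and a single Adam step on a toy quadratic loss. The function names, the loss, and the hyperparameter values are assumptions chosen for illustration; this is not how Keras implements its optimizers internally.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Plain SGD: move against the gradient with a fixed step size."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update: running mean (m) and running squared mean (v), both bias-corrected."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy loss f(w) = sum(scale * w**2) with very differently scaled coordinates
scale = np.array([100.0, 0.01])
w0 = np.array([1.0, 1.0])
grad = 2 * scale * w0  # gradient at w0 -> [200.0, 0.02]

w_sgd = sgd_step(w0, grad, lr=0.001)
w_adam, m, v = adam_step(w0, grad, m=np.zeros(2), v=np.zeros(2), t=1, lr=0.001)

print("SGD :", w_sgd)   # step size follows the raw gradient magnitude
print("Adam:", w_adam)  # first step is roughly lr in every coordinate
```

In this sketch SGD barely moves the flat coordinate, while Adam takes a similar-sized step in both directions. That per-parameter scaling is one reason Adam often converges faster without learning-rate tuning.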

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
model_4 = Sequential()
model_4.add(Dense(64,activation="relu",input_dim=X_train.shape[1])) # first hidden layer: 64 neurons, ReLU
model_4.add(Dense(32,activation="relu")) # second hidden layer: 32 neurons, ReLU
model_4.add(Dense(1,activation="sigmoid")) # output layer: one sigmoid unit for the binary target

In [None]:
model_4.summary()

In [None]:
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
# model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_4.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

Let's check the performance of model_4 on the training and validation data.

In [None]:
model_4_train_perf = model_performance_classification(model_4,X_train,y_train)
model_4_train_perf

In [None]:
model_4_val_perf = model_performance_classification(model_4,X_val,y_val)
model_4_val_perf

In [None]:
y_train_pred_4 = model_4.predict(X_train)
y_val_pred_4 = model_4.predict(X_val)

Let's check the classification report of model_4 on the training and validation data.

In [None]:
print("Classification Report - Train data Model_4", end="\n\n")
cr_train_model_4 = classification_report(y_train,y_train_pred_4 > 0.5)
print(cr_train_model_4)

Comparison and Observations:

- Recall: The Recall for the minority class (1, failure) is slightly lower on the validation set (0.88) compared to the training set (0.91). This suggests a small drop in the model's ability to identify actual failures on unseen data.
- Precision: The Precision for the minority class is also slightly lower on the validation set (0.97) compared to the training set (0.98).
- Overall Performance: The overall Accuracy is the same (0.99) for both sets. The macro and weighted averages for Precision and F1-score are very close between training and validation.

Conclusion:

Model 4, using the Adam optimizer, shows high performance on both training and validation sets, achieving a high Recall on the training set (0.91). Similar to Model 1 and Model 3, there is a slight drop in Recall on the validation set, indicating a small degree of overfitting. However, the overall performance remains strong. Compared with Model 3 (which used SGD with class weights), Model 4 achieved a slightly higher Recall on the training set (0.91 vs 0.90) and a similar Recall on the validation set (0.88 vs 0.87). The choice between these models might depend on the tolerance for a slight drop in Recall on unseen data versus the potential benefits of using class weights or a different optimizer.

In [None]:
print("Classification Report - Validation data Model_4", end="\n\n")
cr_val_model_4 = classification_report(y_val,y_val_pred_4 > 0.5)
print(cr_val_model_4)

## Model 5

This time we will add more layers and dropout while using a different optimizer.
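
As a reminder of what a dropout layer does, the following is a minimal NumPy sketch of "inverted" dropout. The function name and the toy activations are assumptions for illustration; it mirrors the general idea behind a dropout layer, not the Keras source code.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is the identity
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where the unit is kept
    return x * mask / keep_prob             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))  # toy activations, all equal to 1

train_out = dropout_forward(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout_forward(activations, rate=0.5, training=False, rng=rng)

print("values seen in training:", np.unique(train_out))   # only 0.0 and 2.0
print("mean activation in training:", train_out.mean())   # close to 1.0
```

With a rate of 0.5, roughly half of the units are silenced on each training step, which discourages the network from relying on any single neuron.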

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
from tensorflow.keras.layers import Dropout
model_5 = Sequential()
model_5.add(Dense(64,activation="relu",input_dim=X_train.shape[1])) # first hidden layer: 64 neurons, ReLU
model_5.add(Dropout(0.5)) # drop 50% of the units during training
model_5.add(Dense(32,activation="relu")) # second hidden layer: 32 neurons, ReLU
model_5.add(Dense(16, activation = "relu")) # third hidden layer: 16 neurons, ReLU
model_5.add(Dense(1,activation="sigmoid")) # output layer: one sigmoid unit for the binary target

In [None]:
model_5.summary()

In [None]:
optimizer = tf.keras.optimizers.Adam()    # defining Adam as the optimizer to be used
# model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_5.fit(X_train, y_train, validation_data=(X_val,y_val) , batch_size=batch_size, epochs=epochs)
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

Let's check the performance of model_5 on the training and validation data.

In [None]:
model_5_train_perf = model_performance_classification(model_5,X_train,y_train)
model_5_train_perf

In [None]:
model_5_val_perf = model_performance_classification(model_5,X_val,y_val)
model_5_val_perf

In [None]:
y_train_pred_5 = model_5.predict(X_train)
y_val_pred_5 = model_5.predict(X_val)

Let's check the classification report of model_5 on the training and validation data.

In [None]:
print("Classification Report - Train data Model_5", end="\n\n")
cr_train_model_5 = classification_report(y_train,y_train_pred_5 > 0.5)
print(cr_train_model_5)

In [None]:
print("Classification Report - Validation data Model_5", end="\n\n")
cr_val_model_5 = classification_report(y_val,y_val_pred_5 > 0.5)
print(cr_val_model_5)

## Model 6

Let's see how the model performance changes when the model gives higher importance to the minority class.

In [None]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

In [None]:
#Initializing the neural network
from tensorflow.keras.layers import Dropout
model_6 = Sequential()
model_6.add(Dense(64,activation="relu",input_dim=X_train.shape[1])) # first hidden layer: 64 neurons, ReLU
model_6.add(Dropout(0.5)) # drop 50% of the units during training
model_6.add(Dense(32,activation="relu")) # second hidden layer: 32 neurons, ReLU
model_6.add(Dense(16, activation = "relu")) # third hidden layer: 16 neurons, ReLU
model_6.add(Dense(1,activation="sigmoid")) # output layer: one sigmoid unit for the binary target

model_6.summary()

In [None]:
optimizer = tf.keras.optimizers.SGD()
# model_6.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['accuracy']) ## Uncomment this line in case the metric of choice is Accuracy
# model_6.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Precision']) ## Uncomment this line in case the metric of choice is Precision
model_6.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['Recall']) ## Uncomment this line in case the metric of choice is Recall
# model_6.compile(loss='binary_crossentropy', optimizer=optimizer, metrics = ['f1_score']) ## Uncomment this line in case the metric of choice is F1 Score

In [None]:
start = time.time()
history = model_6.fit(X_train, y_train, validation_data=(X_val,y_val), batch_size=batch_size, epochs=epochs, class_weight=cw_dict) # train model_6 (not model_3) with class weights so the minority class gets more importance
end=time.time()

In [None]:
print("Time taken in seconds ",end-start)

In [None]:
plot(history,'loss')

Let's check the performance of model_6 on the training and validation data.

In [None]:
model_6_train_perf = model_performance_classification(model_6,X_train,y_train)
model_6_train_perf

In [None]:
model_6_val_perf = model_performance_classification(model_6,X_val,y_val)
model_6_val_perf

In [None]:
y_train_pred_6 = model_6.predict(X_train)
y_val_pred_6 = model_6.predict(X_val)

Let's check the classification report of model_6 on the training and validation data.

In [None]:
print("Classification Report - Train data Model_6", end="\n\n")
cr_train_model_6 = classification_report(y_train,y_train_pred_6 > 0.5)
print(cr_train_model_6)

In [None]:
print("Classification Report - Validation data Model_6", end="\n\n")
cr_val_model_6 = classification_report(y_val,y_val_pred_6 > 0.5)
print(cr_val_model_6)

# **Model Performance Comparison and Final Model Selection**

To select the final model, we compare the performance of all the models on the training and validation sets, and then evaluate the chosen model on the test set.

**Training Performance Comparison**

In [None]:
# training performance comparison

models_train_comp_df = pd.concat(
    [
        model_0_train_perf.T,
        model_1_train_perf.T,
        model_2_train_perf.T,
        model_3_train_perf.T,
        model_4_train_perf.T,
        model_5_train_perf.T,
        model_6_train_perf.T

    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Model 0",
    "Model 1",
    "Model 2",
    "Model 3",
    "Model 4",
    "Model 5",
    "Model 6"
]
print("Training set performance comparison:")
models_train_comp_df

**Validation Performance Comparison**

In [None]:
# Validation performance comparison

models_val_comp_df = pd.concat(
    [
        model_0_val_perf.T,
        model_1_val_perf.T,
        model_2_val_perf.T,
        model_3_val_perf.T,
        model_4_val_perf.T,
        model_5_val_perf.T,
        model_6_val_perf.T

    ],
    axis=1,
)
models_val_comp_df.columns = [
    "Model 0",
    "Model 1",
    "Model 2",
    "Model 3",
    "Model 4",
    "Model 5",
    "Model 6"
]
print("Validation set performance comparison:")
models_val_comp_df

**Checking the performance of the best model on the test set**

In [None]:
# best_model = model_0 ## Uncomment this line in case the best model is model_0
# best_model = model_1 ## Uncomment this line in case the best model is model_1
# best_model = model_2 ## Uncomment this line in case the best model is model_2
# best_model = model_3 ## Uncomment this line in case the best model is model_3
best_model = model_4 ## Uncomment this line in case the best model is model_4
# best_model = model_5 ## Uncomment this line in case the best model is model_5
# best_model = model_6 ## Uncomment this line in case the best model is model_6

In [None]:
# Test set performance for the best model
best_model_test_perf = model_performance_classification(best_model,X_test,y_test)
best_model_test_perf

Key reasons Model 4 is optimal:

- Highest validation recall (0.938261) - This is what matters most for real-world performance
- Minimal overfitting - Small gap between training (0.952835) and validation (0.938261) recall, indicating good generalization
- Well-balanced performance - Strong across all metrics, not just recall

Why not the others:

- Model 3: Slightly lower validation recall (0.932429)
- Model 5: Much lower validation recall (0.916269) despite high training recall
- Models 0, 1, 2: Lower validation recall scores
- Model 6: Dramatically poor performance across all metrics
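
The selection rule described above (pick the model with the highest validation recall, then check the train/validation gap) can be sketched in a few lines. The dictionary below holds only the recall values quoted in the text; the variable names are illustrative and not part of the notebook.

```python
# Validation recall values quoted in the text above (only the models with stated numbers)
val_recall = {"Model 3": 0.932429, "Model 4": 0.938261, "Model 5": 0.916269}

# Training recall quoted for Model 4
train_recall_model_4 = 0.952835

# Pick the model with the highest validation recall
best_name = max(val_recall, key=val_recall.get)

# Rank the models from best to worst validation recall
ranking = sorted(val_recall, key=val_recall.get, reverse=True)

# Train/validation gap as a rough overfitting check for Model 4
gap_model_4 = train_recall_model_4 - val_recall["Model 4"]

print("Best model by validation recall:", best_name)
print("Ranking:", ranking)
print(f"Model 4 train/validation recall gap: {gap_model_4:.6f}")
```

The same idea applies directly to `models_val_comp_df` by taking the column with the maximum value in its recall row, assuming that row is labelled `"Recall"`.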

In [None]:
y_test_pred_best = best_model.predict(X_test)

cr_test_best_model = classification_report(y_test, y_test_pred_best>0.5) # Check the classification report of best model on test data.
print(cr_test_best_model)

In [None]:
import shap
import numpy as np
import pandas as pd

# Scale data (if not already done)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to DataFrame to preserve feature names
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X.columns)
X_test_scaled_df = pd.DataFrame(X_test_scaled, columns=X.columns)

# Use a small background sample (required by KernelExplainer)
background = X_train_scaled_df.sample(100, random_state=42)
X_explain = X_test_scaled_df.sample(50, random_state=42)

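# Create SHAP KernelExplainer
# Wrap predict so it returns a 1-D array of probabilities; this keeps shap_values a 2-D (samples x features) array
predict_fn = lambda data: best_model.predict(data, verbose=0).ravel()
explainer = shap.KernelExplainer(predict_fn, background)

# Compute SHAP values (KernelExplainer is model-agnostic but slow, hence the small samples above)
shap_values = explainer.shap_values(X_explain)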

# Plot summary and bar charts
shap.summary_plot(shap_values, X_explain, feature_names=X.columns.tolist())
shap.summary_plot(shap_values, X_explain, feature_names=X.columns.tolist(), plot_type="bar")


**How is SHAP making the results better?**

SHAP (SHapley Additive exPlanations) enhances your model results by making them **interpretable** and **actionable**. While your classification model (here Model 4, the selected best model) tells you *whether* a generator is likely to fail, SHAP tells you *why*.

Here's how that makes the results better in the context of this predictive maintenance problem:

1.  **Understanding Feature Importance:** SHAP values quantify the contribution of each feature to a prediction. Instead of just knowing *that* a failure is predicted, you can see *which* variables (V1, V2, etc.) had the biggest impact on that specific prediction. This helps you understand which sensor readings or derived features are most indicative of a potential failure.

2.  **Explaining Individual Predictions:** This is a major benefit. For any single turbine, if the model predicts a failure, SHAP can show you exactly which feature values are pushing the prediction towards 'failure' and which are pushing it towards 'no failure'. This is invaluable for maintenance teams – they can see *why* a specific turbine is flagged, allowing them to investigate the relevant components or conditions.

3.  **Identifying Global Patterns:** SHAP summary plots can show you the overall impact and direction of influence for each feature across the entire dataset or a subset. You can see, for example, if high values of V1 generally increase the probability of failure, while low values decrease it. This provides broader insights into the underlying factors contributing to generator failures.

4.  **Building Trust:** A "black box" model that just gives a prediction can be hard for domain experts to trust. By providing explanations through SHAP, you build confidence in the model's predictions, as stakeholders can see the reasoning behind them and validate it with their domain knowledge.

5.  **Actionable Insights:** Knowing *why* a failure is predicted directly leads to actionable steps. If a specific set of sensor readings (features) consistently indicates failure according to SHAP, maintenance can focus on checking the components related to those sensors. This moves from simply predicting failure to enabling targeted, cost-effective preventative maintenance.

In summary, SHAP transforms model predictions from just outputs into understandable diagnoses, helping you move beyond simply predicting failures to understanding their root causes and taking precise, cost-effective actions.
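
To show what a SHAP value is numerically, here is a small pure-Python sketch that computes exact Shapley values for a toy three-feature scoring function. The function, the input, and the all-zero baseline are hypothetical. `KernelExplainer` approximates the same quantity by sampling feature coalitions against a background dataset rather than enumerating every ordering.

```python
from itertools import permutations

def model(x):
    """Toy scoring function (hypothetical): two linear terms plus one interaction."""
    v1, v2, v3 = x
    return 2.0 * v1 - 1.0 * v2 + 0.5 * v1 * v3

def shapley_values(f, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)   # start from the baseline input
        prev = f(current)
        for i in order:
            current[i] = x[i]      # switch feature i to its actual value
            new = f(current)
            phi[i] += new - prev   # marginal contribution of feature i in this ordering
            prev = new
    return [p / len(orderings) for p in phi]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)

print("Shapley values:", phi)
print("Sum of values:", sum(phi), "| f(x) - f(baseline):", model(x) - model(baseline))
```

The contributions add up to the difference between the prediction and the baseline output, which is the additivity property that makes per-turbine explanations easy to read. The interaction term is split evenly between the two features involved.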

In [None]:
# ================================================
# Confusion Matrix at Custom Threshold
# ================================================
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Set custom threshold (example: 0.3 for higher recall)
custom_threshold = 0.3
y_pred_custom = (y_test_pred_best > custom_threshold).astype(int)

# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred_custom)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap='Blues')
plt.title(f'Confusion Matrix @ Threshold = {custom_threshold}')
plt.show()
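
To see why lowering the threshold raises recall, the sketch below counts confusion-matrix entries at two thresholds on a handful of made-up scores. The toy labels and scores are assumptions for illustration only and are unrelated to the test set above.

```python
import numpy as np

def confusion_counts(y_true, scores, threshold):
    """Return (tn, fp, fn, tp) when predicting positive for scores above `threshold`."""
    y_pred = (scores > threshold).astype(int)
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    return tn, fp, fn, tp

# Made-up labels and predicted probabilities (6 negatives, 4 positives)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.25, 0.45, 0.70, 0.90])

for threshold in (0.5, 0.3):
    tn, fp, fn, tp = confusion_counts(y_true, scores, threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    print(f"threshold={threshold}: tn={tn} fp={fp} fn={fn} tp={tp} "
          f"recall={recall:.2f} precision={precision:.2f}")
```

Moving the threshold from 0.5 to 0.3 converts one missed failure into a caught one, at the price of two extra false alarms.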

In [None]:
from sklearn.metrics import precision_recall_curve

# Calculate precision, recall, and thresholds (np.ravel flattens the (n, 1) Keras output to 1-D)
precision, recall, thresholds = precision_recall_curve(y_test, np.ravel(y_test_pred_best))

# Note: 'thresholds' is sorted in increasing order, so its first element corresponds to the highest recall and (typically) the lowest precision.
# The 'precision' and 'recall' arrays have one more element than 'thresholds'.
# The last element of 'precision' is 1.0 and the last element of 'recall' is 0.0; this final point has no corresponding threshold.
# For t = thresholds[i], precision[i] and recall[i] are computed with a sample predicted as positive if its score is >= t.

In [None]:
# Set custom threshold to 0.1 to prioritize Recall
custom_threshold_recall = 0.1
y_pred_recall = (y_test_pred_best > custom_threshold_recall).astype(int)

# Display Classification Report at this threshold
from sklearn.metrics import classification_report
print(f"Classification Report @ Threshold = {custom_threshold_recall}", end="\n\n")
print(classification_report(y_test, y_pred_recall))

In [None]:
# Analyze Precision and Recall at different thresholds from the previously computed curve
# The variables precision, recall, and thresholds were computed in the precision_recall_curve cell above

# Find the threshold that gives the highest recall for a minimum precision (e.g., 0.5 or higher)
# Or, find a threshold where the gain in recall is significant for a small drop in precision

# Let's find the threshold where Recall is maximized while maintaining Precision >= 0.5
min_precision = 0.5
optimal_threshold = 0.5 # Start with default
max_recall = 0

for i in range(len(thresholds)):
    if precision[i] >= min_precision:
        if recall[i] > max_recall:
            max_recall = recall[i]
            optimal_threshold = thresholds[i]

print(f"Threshold to maximize Recall while maintaining Precision >= {min_precision}: {optimal_threshold:.4f}")
print(f"Corresponding Recall: {max_recall:.4f}")
# Find corresponding precision at the optimal_threshold
# Use np.where to find the index of the optimal_threshold in the thresholds array
# Need to handle the case where optimal_threshold might not be exactly in thresholds array (due to floating point)
# Find the index of the threshold closest to optimal_threshold if exact match not found
closest_threshold_index = np.argmin(np.abs(thresholds - optimal_threshold))
print(f"Corresponding Precision: {precision[closest_threshold_index]:.4f}")


# Alternatively, you could look for the point where lowering the threshold further
# buys little extra Recall but costs a lot of Precision (diminishing returns), for example
# where the difference between Recall and Precision is smallest,
# or where Recall is high and Precision is still reasonable.

# Another approach: Find the threshold that maximizes the F1-score (harmonic mean of Precision and Recall)
# This balances both metrics, which might be useful if both false positives and false negatives have costs.
# precision and recall have one more element than thresholds; drop the final (1.0, 0.0) point so indices line up
denom = precision[:-1] + recall[:-1]
# Divide only where the denominator is positive; F1 stays 0 where precision and recall are both 0
f1_scores = np.divide(2 * precision[:-1] * recall[:-1], denom, out=np.zeros_like(denom), where=denom > 0)
# Find the index of the maximum F1-score
optimal_threshold_f1_index = np.argmax(f1_scores)
optimal_threshold_f1 = thresholds[optimal_threshold_f1_index]
max_f1 = f1_scores[optimal_threshold_f1_index] # F1 score at the selected threshold


print(f"\nThreshold that maximizes F1-score: {optimal_threshold_f1:.4f}")
print(f"Corresponding F1-score: {max_f1:.4f}")
# Find corresponding precision and recall at this F1-max threshold
print(f"Corresponding Recall at F1-max threshold: {recall[optimal_threshold_f1_index]:.4f}")
print(f"Corresponding Precision at F1-max threshold: {precision[optimal_threshold_f1_index]:.4f}")


# Based on the business objective (replacement costs are far higher than inspection costs),
# prioritizing Recall is key. A threshold where Recall is high
# but Precision hasn't plummeted is a good operating point.
# Looking at the precision/recall values, a threshold around 0.3-0.4 might offer a good balance.
# Let's calculate metrics at threshold 0.3 as an example:
example_threshold = 0.3
y_pred_example = (y_test_pred_best > example_threshold).astype(int)
report_example = classification_report(y_test, y_pred_example, output_dict=True)

print(f"\nClassification Report Metrics @ Threshold = {example_threshold}")
# Safely access metrics using .get()
print(f"Recall (Class 1): {report_example.get('1', {}).get('recall', 0.0):.4f}")
print(f"Precision (Class 1): {report_example.get('1', {}).get('precision', 0.0):.4f}")
print(f"F1-score (Class 1): {report_example.get('1', {}).get('f1-score', 0.0):.4f}")
print(f"Accuracy: {report_example.get('accuracy', 0.0):.4f}")
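
Since the argument for a lower threshold rests on missed failures being more expensive than unnecessary inspections, one possible way to make that explicit is to choose the threshold that minimizes total cost. The sketch below does this on made-up scores. The cost figures, candidate thresholds, and function names are all hypothetical placeholders, not values from this project.

```python
import numpy as np

def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Cost of missed failures (FN) plus unnecessary inspections (FP) at a threshold."""
    y_pred = scores > threshold
    fn = np.sum(~y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    return float(fn * cost_fn + fp * cost_fp)

def cheapest_threshold(y_true, scores, candidates, cost_fn, cost_fp):
    """Return (threshold, cost) with the lowest total cost; ties go to the lowest threshold."""
    costs = [total_cost(y_true, scores, t, cost_fn, cost_fp) for t in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Made-up labels and predicted probabilities (6 negatives, 4 positives)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.25, 0.45, 0.70, 0.90])
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

# Hypothetical costs: a missed failure is 10x as expensive as an unneeded inspection
asym = cheapest_threshold(y_true, scores, candidates, cost_fn=10.0, cost_fp=1.0)
# Equal costs, for comparison
sym = cheapest_threshold(y_true, scores, candidates, cost_fn=1.0, cost_fp=1.0)

print("Asymmetric costs -> (threshold, cost):", asym)
print("Symmetric costs  -> (threshold, cost):", sym)
```

On this toy data the cost-optimal threshold drops from 0.4 to 0.2 once missed failures are priced ten times higher. With real replacement and inspection costs, the same calculation could replace the eyeballed 0.3 threshold.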