# <center> Customer Churn Prediction Using LSTM 👨🏻‍💻</center>

## <center>[Msc. Diego Hurtado](https://www.linkedin.com/in/diegohurtadoo/)</center>

<img src="https://media.tenor.com/b2kdbWrrNZQAAAAC/going-in-the-portal-tom-holland.gif" width="700" height="500">

# 1 <a id='1'>Introduction</a>
[Table of contents](#0.1)

The telecommunications industry is fast-growing and highly competitive, which makes customer retention more important than ever. Predicting customer churn, i.e., the likelihood that a customer leaves the company, has therefore become a crucial task for telecom companies.

The Telco Customer Churn dataset, provided by IBM, contains a sample of customer data with attributes covering subscribed services, account information, and demographics. It also includes a binary label indicating whether the customer has churned. The goal is to predict whether a customer is likely to churn based on their profile and subscribed services.

By analyzing the customer data and developing retention strategies, telecom companies can not only retain their customers but also acquire new customers by attracting customers from their competitors. In this regard, machine learning models can be used to predict churn and identify the most important features that contribute to customer churn. This can help telecom companies develop focused customer retention programs and improve their business performance.

customerID : Customer ID

gender : The customer's gender (Male, Female)

SeniorCitizen : Whether the customer is a senior citizen or not (1, 0)

Partner : Whether the customer has a partner or not (Yes, No)

Dependents : Whether the customer has dependents or not (Yes, No)

tenure : Number of months the customer has stayed with the company

PhoneService : Whether the customer has a phone service or not (Yes, No)

MultipleLines : Whether the customer has multiple lines or not (Yes, No, No phone service)

InternetService : The customer's type of internet service (DSL, Fiber optic, No)

OnlineSecurity : Whether the customer has online security or not (Yes, No, No internet service)

OnlineBackup : Whether the customer has online backup or not (Yes, No, No internet service)

DeviceProtection : Whether the customer has device protection or not (Yes, No, No internet service)

TechSupport : Whether the customer has tech support or not (Yes, No, No internet service)

StreamingTV : Whether the customer has streaming TV or not (Yes, No, No internet service)

StreamingMovies : Whether the customer has streaming movies or not (Yes, No, No internet service)

Contract : The contract term of the customer (Month-to-month, One year, Two year)

PaperlessBilling : Whether the customer has paperless billing or not (Yes, No)

PaymentMethod : The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))

MonthlyCharges : The amount charged to the customer monthly

TotalCharges : The total amount charged to the customer

Churn : Whether the customer churned or not (Yes or No)

# Table of Contents Diego 👨🏻‍💻<a id='0.1'></a>

* [Introduction](#1)
* [Data Modeling ](#2)
    * [Import Packages](#2.1)
    * [Custom Classes](#2.2)
    * [Data Preprocessing](#2.3)
        * [Data Reading](#2.3.1)
        * [Data Cleaning](#2.3.2)
        * [Feature Engineering](#2.3.3)
        * [Basic EDA](#2.3.4)
    * [Model](#2.4)
        * [LSTMClassifier](#2.4.1)
        * [LSTM Training](#2.4.2)
        * [Evaluate](#2.4.3)
        * [LSTM Prediction](#2.4.4)
        * [Feature importance](#2.4.5)
        * [Save LSTM Model](#2.4.6)

# 2 <a id='2'> Data Modeling 📚</a>
[Table of contents](#0.1)

# 2.1 <a id='2.1'>Import Packages📚</a>
[Table of contents](#0.1)

In [None]:
# import os
# os.environ["SM_FRAMEWORK"] = "tf.keras"

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import numpy as np
import pandas as pd
from collections import Counter
from tqdm import tqdm

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import math
from math import sqrt
import sys

from scipy.stats import pearsonr

import category_encoders as ce
# Evaluation Metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_auc_score, roc_curve, f1_score, recall_score


from sklearn.tree import DecisionTreeClassifier, export_graphviz
from six import StringIO 
import IPython, graphviz
from IPython.display import Image  
import pydotplus
import re

# Note: this wrapper module is only present in older TensorFlow releases
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, KFold, cross_val_score
import json

plt.style.use('seaborn-dark-palette')
plt.rcParams["figure.figsize"] = [16, 8]

width = 1000
height = 750

bg_color = '#FFFFFF'
paper_bg = '#FFFFFF'

plt.rcParams.update({'font.size': 18})
color_charts = '#2baae2'
plt.style.use('ggplot')
color = '#16171f'
plt.rcParams['text.color'] = color
plt.rcParams['axes.labelcolor'] = color
plt.rcParams['xtick.color'] = color
plt.rcParams['ytick.color'] = color

plt.rcParams.update({'text.color' : color,
                             'axes.labelcolor' : color})

plt.rcParams.update({'font.size': 17})
plt.rc('font', size=17)

from sklearn.metrics import precision_recall_curve, auc
from sklearn.metrics import ConfusionMatrixDisplay, balanced_accuracy_score

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
# load_model and model_from_json are used by LSTMClassifier.load_model / json_to_model
from tensorflow.keras.models import load_model, model_from_json
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from ydata_profiling import ProfileReport

from sklearn.inspection import permutation_importance

In [None]:
import plotly.graph_objs as go
import plotly.tools as tls
from plotly.offline import iplot, init_notebook_mode

# 2.2 <a id='2.2'>Custom Classes🔍</a>
[Table of contents](#0.1)

In [None]:
class pipeline_churn_prediction():
    def __init__(self):
        self.project = 'churn prediction '
        
    def get_percent_missing(self, df):
        # Percentage of missing values in each column
        percent_missing = df.isnull().sum() * 100 / len(df)
        
        print('Percentage of Missing Values: ')

        return percent_missing

In [None]:
def label_encode_features(df):
    """
    Encodes every non-numeric column of the given dataframe using Label Encoder.
    
    Parameters:
    df (pandas.DataFrame): The input dataframe.
    
    Returns:
    encoded_df (pandas.DataFrame): The encoded dataframe with the same columns as the input dataframe.
    """
    le = LabelEncoder()
    encoded_df = df.copy()
    
    features = [i for i in list(df.columns) if i not in list(df.describe().columns)]
    
    print('Label Encoder Transformation')
    for feature in features:
        encoded_df[feature] = le.fit_transform(encoded_df[feature])
        print(f'{feature}: {len(encoded_df[feature].unique())} unique value(s)')
        print(f'Unique values: {list(encoded_df[feature].unique())}\n')
        
    return encoded_df

In [None]:
def unique_counts(df):
    """
    Returns the number of unique values and unique values for each feature in the given dataframe.

    Parameters:
    df (pandas.DataFrame): The input dataframe.

    Returns:
    unique_counts_df (pandas.DataFrame): A dataframe containing the feature names, the number of unique values, the unique values, and the data type of each feature.
    """
    unique_counts = df.nunique()
    unique_values = [df[column].unique() for column in df.columns]
    data_types = [str(df[column].dtype) for column in df.columns]
    unique_counts_df = pd.DataFrame({'feature': df.columns, 'unique_count': unique_counts, 'unique_values': unique_values, 'data_type': data_types})
    return unique_counts_df

# 2.3 <a id='2.3'> Data Preprocessing 🔍</a>
[Table of contents](#0.1)

# 2.3.1 <a id='2.3.1'> Data Reading </a>
[Table of contents](#0.1)

In [None]:
plt.rcParams["figure.figsize"] = [16, 8]
pipeline_churn_diego = pipeline_churn_prediction()

df_train = pd.read_csv('data/WA_Fn-UseC_-Telco-Customer-Churn.csv')
id_sub = df_train.customerID
df_train = df_train.drop(df_train.columns[0],axis=1)

print(df_train.shape[0])
print(len(df_train.columns.tolist()))
df_train.head(1)

In [None]:
# Checking the missing values 
df_train.info()

In [None]:
pipeline_churn_diego.get_percent_missing(df_train)

In [None]:
df_train[['TotalCharges', 'MonthlyCharges']].head(5)

In [None]:
df_train.describe().T

In [None]:
# Describe the string data
df_train.describe(include='O')

In [None]:
df_train.select_dtypes(include=['object'])

# 2.3.2 <a id='2.3.2'> Data Cleaning </a>
[Table of contents](#0.1)

In [None]:
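# Blank strings in TotalCharges mark missing values: convert them to NaN, cast to float,
# then fill every remaining NaN in the frame with the sentinel -99
df_train.loc[df_train['TotalCharges'] == ' ', 'TotalCharges'] = np.nan
df_train['TotalCharges'] = df_train['TotalCharges'].astype(float)
df_train.fillna(-99, inplace=True)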
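
The blank-string check above works for this file. A slightly more defensive variant, sketched here on a toy frame, uses `pd.to_numeric` with `errors='coerce'` so that any entry that cannot be parsed becomes NaN before filling. The toy values and the `clean_total_charges` helper are illustrative assumptions and not part of the notebook.

```python
import pandas as pd

def clean_total_charges(df, fill_value=-99):
    """Hypothetical helper: coerce TotalCharges to float and fill the gaps."""
    out = df.copy()
    # Any entry that cannot be parsed as a number (e.g. ' ') becomes NaN
    out['TotalCharges'] = pd.to_numeric(out['TotalCharges'], errors='coerce')
    # Replace the NaNs with the same sentinel the notebook uses
    out['TotalCharges'] = out['TotalCharges'].fillna(fill_value)
    return out

# Toy data: the middle row mimics the blank entries found in the CSV
toy = pd.DataFrame({'TotalCharges': ['29.85', ' ', '1889.5']})
cleaned = clean_total_charges(toy)
print(cleaned['TotalCharges'].tolist())
```

Whether a sentinel such as -99 or a value like 0 is the better fill is a modeling choice; the sketch only mirrors what the cell above does.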

In [None]:
unique_counts(df_train.select_dtypes(include=['object']))

In [None]:
# Display the head of the data
df_train.head()

# 2.3.3 <a id='2.3.3'> Feature Engineering </a>
[Table of contents](#0.1)

In [None]:
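# Flag customers with any internet subscription (DSL or Fiber optic)
df_train['internet']= np.where(df_train.InternetService != 'No', 'Yes', 'No')

# Number of subscribed services, counting the internet subscription itself
df_train['num_services'] = (df_train[['PhoneService', 'OnlineSecurity',
                                      'OnlineBackup', 'DeviceProtection', 
                                      'TechSupport', 'StreamingTV', 
                                      'StreamingMovies', 'internet']] == 'Yes').sum(axis=1)

# 1 = contract longer than month-to-month
df_train['Engaged'] = (df_train['Contract'] != 'Month-to-month').astype(int)
# 1 = non-senior customer on a month-to-month contract
df_train['YandNotE'] = ((df_train['SeniorCitizen'] == 0) & (df_train['Engaged'] == 0)).astype(int)
# 1 = pays by electronic check on a month-to-month contract
df_train['ElectCheck'] = ((df_train['PaymentMethod'].eq('Electronic check')) & (df_train['Engaged'] == 0)).astype(int)
# Note the direction: 1 = customer does NOT have fiber optic
df_train['fiberopt'] = (df_train['InternetService'] != 'Fiber optic').astype(int)
# 1 = customer has internet service (StreamingTV is not 'No internet service')
df_train['StreamNoInt'] = (df_train['StreamingTV'] != 'No internet service').astype(int)
# 1 = at least one of backup / device protection / tech support is not 'No'
df_train['NoProt'] = ((df_train['OnlineBackup'] != 'No') | (df_train['DeviceProtection'] != 'No') | (df_train['TechSupport'] != 'No')).astype(int)

# Count of 'Yes' answers; InternetService never equals 'Yes', so it adds nothing to this total
services = ['PhoneService', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies']
df_train['TotalServices'] = df_train[services].eq('Yes').sum(axis=1)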
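
The two service counts in the cell above differ in how they treat the internet subscription. The toy example below, with made-up rows and only four of the columns, is a small sketch of that difference: comparing against `'Yes'` skips `InternetService`, while an explicit has-internet flag counts it.

```python
import pandas as pd

# Two made-up customers: one with fiber and two add-ons, one with no services at all
toy = pd.DataFrame({
    'PhoneService':    ['Yes', 'No'],
    'InternetService': ['Fiber optic', 'No'],
    'OnlineSecurity':  ['Yes', 'No internet service'],
    'StreamingTV':     ['No', 'No internet service'],
})

# Comparing against 'Yes' ignores InternetService (its values are DSL / Fiber optic / No)
yes_only = toy.eq('Yes').sum(axis=1)

# An explicit has-internet flag counts the internet subscription as a service too
has_internet = (toy['InternetService'] != 'No').astype(int)
with_internet = yes_only + has_internet

print(yes_only.tolist())
print(with_internet.tolist())
```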

In [None]:
df_train.head(1)

In [None]:
df_train['Churn'] = df_train['Churn'].replace('Yes', 1)
df_train['Churn'] = df_train['Churn'].replace('No', 0)

In [None]:
# define the colors and fonts
colors = ['#E71D36', '#1F5673', '#F4D35E', '#EE964B']
font = dict(family='Arial', size=10)

#Label encoding Binary columns
le = LabelEncoder()
# Work on copies so the label encoding below does not write into slices of df_train
tmp_churn = df_train[df_train['Churn'] == 1].copy()
tmp_no_churn = df_train[df_train['Churn'] == 0].copy()
bi_cs = df_train.nunique()[df_train.nunique() == 2].keys()
dat_rad = df_train[bi_cs]

for cols in bi_cs :
    tmp_churn[cols] = le.fit_transform(tmp_churn[cols])

data_frame_x = tmp_churn[bi_cs].sum().reset_index()
data_frame_x.columns  = ["feature","yes"]
data_frame_x["no"]    = tmp_churn.shape[0]  - data_frame_x["yes"]
data_frame_x  = data_frame_x[data_frame_x["feature"] != "Churn"]

#count of 1's(yes)
trace1 = go.Scatterpolar(r = data_frame_x["yes"].values.tolist(), 
                         theta = data_frame_x["feature"].tolist(),
                         fill  = "toself",name = "Churn 1's",
                         mode = "markers+lines", visible=True,
                         line=dict(color=colors[0], width=2),
                         marker = dict(size = 8, color=colors[0])
                        )

#count of 0's(No)
trace2 = go.Scatterpolar(r = data_frame_x["no"].values.tolist(),
                         theta = data_frame_x["feature"].tolist(),
                         fill  = "toself",name = "Churn 0's",
                         mode = "markers+lines", visible=True,
                         line=dict(color=colors[1], width=2),
                         marker = dict(size = 8, color=colors[1])
                        ) 

for cols in bi_cs :
    tmp_no_churn[cols] = le.fit_transform(tmp_no_churn[cols])

data_frame_x = tmp_no_churn[bi_cs].sum().reset_index()
data_frame_x.columns  = ["feature","yes"]
data_frame_x["no"]    = tmp_no_churn.shape[0]  - data_frame_x["yes"]
data_frame_x  = data_frame_x[data_frame_x["feature"] != "Churn"]

#count of 1's(yes)
trace3 = go.Scatterpolar(r = data_frame_x["yes"].values.tolist(),
                         theta = data_frame_x["feature"].tolist(),
                         fill  = "toself",name = "NoChurn 1's",
                         mode = "markers+lines", visible=False,
                         line=dict(color=colors[2], width=2),
                         marker = dict(size = 8, color=colors[2])
                        )

#count of 0's(No)
trace4 = go.Scatterpolar(r = data_frame_x["no"].values.tolist(),
                         theta = data_frame_x["feature"].tolist(),
                         fill  = "toself",name = "NoChurn 0's",
                         mode = "markers+lines", visible=False,
                         line=dict(color=colors[3], width=2),
                         marker = dict(size = 8, color=colors[3])
                        ) 

data = [trace1, trace2, trace3, trace4]

updatemenus = list([
    dict(active=0,
         x=-0.15,
         buttons=list([  
            dict(
                label = 'Churn Dist',
                 method = 'update',
                 args = [{'visible': [True, True, False, False]}, 
                     {'title': 'Customer Churn Binary Counting Distribution'}]),
             
             dict(
                  label = 'No-Churn Dist',
                 method = 'update',
                 args = [{'visible': [False, False, True, True]},
                     {'title': 'No Customer Churn Binary Counting Distribution'}]),

        ]),
    )
])

# update the layout
layout = dict(
    title='ScatterPolar Distribution of Churn and Non-Churn Customers (Select from Dropdown)', 
    showlegend=False,
    updatemenus=updatemenus,
    font=font,
    polar=dict(
        radialaxis=dict(
            visible=True,
            showticklabels=True,
            tickangle=0,
            tickfont=dict(family='Arial', size=10),
            ticksuffix=' customers',
            ticklen=10,
            range=[0, 1800]
        ),
        angularaxis=dict(
            tickfont=dict(family='Arial', size=10),
            ticks='outside',
            tickcolor='#DDD',
            ticklen=10
        ),
    ),
    paper_bgcolor='rgb(240,240,240)',
    plot_bgcolor='rgb(240,240,240)',
)

# create the figure and apply the layout
fig = dict(data=data, layout=layout)

# show the chart
iplot(fig)

In [None]:
df_train = label_encode_features(df_train)

In [None]:
col = list(df_train.columns)
categorical_features = []
numerical_features = []
for i in col:
    if len(df_train[i].unique()) > 6:
        numerical_features.append(i)
    else:
        categorical_features.append(i)

print('Categorical Features :',*categorical_features)

In [None]:
print('Numerical Features :',*numerical_features)

In [None]:
from sklearn.preprocessing import MinMaxScaler,StandardScaler

mms = MinMaxScaler() # Normalization
ss = StandardScaler() # Standardization

df_train['MonthlyCharges_Group'] = [int(i / 5) for i in df_train['MonthlyCharges']]
df_train['TotalCharges_Group'] = [int(i / 500) for i in df_train['TotalCharges']]

df_train.drop(columns = ['MonthlyCharges_Group','TotalCharges_Group'], inplace = True)

df_train['tenure'] = mms.fit_transform(df_train[['tenure']])
df_train['MonthlyCharges'] = mms.fit_transform(df_train[['MonthlyCharges']])
df_train['TotalCharges'] = mms.fit_transform(df_train[['TotalCharges']])
df_train.head()
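
As a quick check of what the `MinMaxScaler` calls above compute, the sketch below compares the scaler's output with the formula (x - min) / (max - min) on three made-up tenure values. The numbers are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up tenure values in months, shaped (samples, 1) as the scaler expects
tenure = np.array([[1.0], [12.0], [72.0]])

scaled = MinMaxScaler().fit_transform(tenure)

# Same result by hand: (x - min) / (max - min)
by_hand = (tenure - tenure.min()) / (tenure.max() - tenure.min())

print(scaled.ravel())
print(np.allclose(scaled, by_hand))
```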

In [None]:
unique_counts(df_train)

In [None]:
df_train.head(1)

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
# Split the dataset into X and y
X = df_train.drop('Churn', axis=1)
y = df_train['Churn']

# Fit a Random Forest Classifier model
model = RandomForestClassifier()
model.fit(X, y)

# Get the feature importance scores
importance_scores = model.feature_importances_

# Create a list of feature names and their importance scores
feature_importance = list(zip(X.columns, importance_scores))

# Sort the features by importance score in descending order
feature_importance.sort(key=lambda x: x[1], reverse=True)

df_feature_importance = pd.DataFrame(feature_importance, columns=['feature', 'importance_score'])

df_feature_importance

In [None]:
# Create a bar chart to show the feature importance
fig, ax = plt.subplots(figsize=(10, 6))

# Set the x and y values
x_values = [x[0] for x in feature_importance]
y_values = [x[1] for x in feature_importance]

# Plot the bar chart
sns.barplot(x=x_values, y=y_values, ax=ax, color='b')

# Set the x-axis labels to be rotated
plt.xticks(rotation=90)

# Set the title and labels
ax.set_title('Feature Importance Scores')
ax.set_xlabel('Features')
ax.set_ylabel('Importance Score')
plt.show()

In [None]:
# Keep only the top 15 most important features
top_features = [feature for feature, score in feature_importance[:15]]

# Create a new dataset with only the top features
df_train = df_train[top_features]

In [None]:
# Threshold for removing correlated variables
threshold = 0.90

# Absolute value correlation matrix
corr_matrix = df_train.corr().abs()

# Getting the upper triangle of correlations
# (the np.bool alias was removed in recent NumPy releases; the built-in bool works everywhere)
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

# Select columns with correlations above threshold
to_drop = [column for column in upper.columns if any(upper[column] > threshold)]

print('There are %d columns to remove.' % (len(to_drop)))
print(list(to_drop))
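
The upper-triangle filter above can be tried on synthetic data to see which column it flags. In this sketch the `find_correlated_columns` helper and the three toy columns are assumptions made for illustration; `a_copy` is built to be almost perfectly correlated with `a`.

```python
import numpy as np
import pandas as pd

def find_correlated_columns(df, threshold=0.90):
    """Hypothetical helper: columns whose |correlation| with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the entries above the diagonal so each pair is inspected once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'a_copy': a * 2.0 + 0.01 * rng.normal(size=200),  # near-duplicate of 'a'
    'b': rng.normal(size=200),                        # independent noise
})

print(find_correlated_columns(toy))
```

Because only the upper triangle is inspected, the later column of a correlated pair is the one reported, so the column order decides which of the two is dropped.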

In [None]:
df_train = df_train.drop(columns = to_drop)
print('Training shape: ', df_train.shape)

In [None]:
df_train['churn'] = y

In [None]:
df_train.head(5)

# 2.3.4 <a id='2.3.4'> Super Basic EDA </a>
[Table of contents](#0.1)

In [None]:
profile = ProfileReport(df_train)
profile

In [None]:
df_train.head(1)

In [None]:
# calculate correlation and filter for high correlation
corr = df_train.corr()
high_corr = corr[abs(corr) > 0.6]

# create heatmap with improved style
sns.set(font_scale=1.2)
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(high_corr, annot=True, annot_kws={"size": 8}, cmap='coolwarm', linewidths=.5, cbar=True, square=True, ax=ax)
ax.set_title('High Correlation Features', fontsize=14)
ax.set_xlabel('Features')
ax.set_ylabel('Features')
ax.tick_params(axis='x', labelrotation=45)
plt.show()

In [None]:
print(pearsonr(df_train['MonthlyCharges'], df_train['TechSupport'])[0])

# 2.4 <a id='2.4'> Model 🔍</a>
[Table of contents](#0.1)

# 2.4.1 <a id='2.4.1'> LSTMClassifier </a>
[Table of contents](#0.1)

In [None]:
# Define the function to create the LSTM model outside the class
def create_lstm_model(input_shape):
    model = Sequential()
    model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(32))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Define the LSTMClassifier class
class LSTMClassifier:
    
    """
    
    Class Created by: Diego Gustavo Hurtado Olivares
    
    LSTMClassifier is a class that provides an easy-to-use interface for training, evaluating, 
    and tuning Long Short-Term Memory (LSTM) models for binary classification tasks.
    
    This class includes methods for preprocessing data, building and training the LSTM model,
    evaluating the model using various metrics, plotting the training history and ROC curve,
    predicting new instances, saving and loading the model, and obtaining feature importances 
    using permutation importance. It also provides support for hyperparameter tuning, K-fold
    cross-validation, and early stopping.
    
    Example usage:
    --------------
    
    # Initialize the LSTMClassifier with your data
    lstm_classifier = LSTMClassifier(data)
    
    # Preprocess the data
    lstm_classifier.preprocess_data()
    
    # Build the LSTM model
    lstm_classifier.build_model()
    
    # Train the model
    history = lstm_classifier.train_model(epochs=50, batch_size=32)
    
    # Evaluate the model
    lstm_classifier.evaluate_model()
    
    # Plot the training history and ROC curve
    lstm_classifier.plot_training_history(history)
    lstm_classifier.plot_roc_curve()
    
    # Predict new instances
    y_pred = lstm_classifier.predict(X_new)
    
    """
    
    def __init__(self, data, data_path=None):
        # Store the data, an optional CSV path for load_data(), and placeholders for the splits, scaler, and model
        self.data_path = data_path
        self.data = data
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None
        self.X_val = None
        self.y_val = None
        self.scaler = None
        self.model = None
        
    def load_data(self):
        # Load the data from the path given at construction time
        self.data = pd.read_csv(self.data_path)
        
    def preprocess_data(self):
        # Split the data into features (X) and target (y)
        X = self.data.drop(["churn"], axis=1)
        y = self.data["churn"]
        
        # Split the data into train and test sets
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Standardize the features
        self.scaler = StandardScaler()
        self.X_train = self.scaler.fit_transform(self.X_train)
        self.X_test = self.scaler.transform(self.X_test)
        
        # Reshape the features to be 3D arrays suitable for input into an LSTM model
        self.X_train = np.reshape(self.X_train, (self.X_train.shape[0], 1, self.X_train.shape[1]))
        self.X_test = np.reshape(self.X_test, (self.X_test.shape[0], 1, self.X_test.shape[1]))
        
    def build_model(self):
        # Create a sequential model with two LSTM layers, two dropout layers, and a dense output layer
        self.model = Sequential()
        self.model.add(LSTM(64, input_shape=(1, self.X_train.shape[2]), return_sequences=True))
        self.model.add(Dropout(0.2))
        self.model.add(LSTM(32))
        self.model.add(Dropout(0.2))
        self.model.add(Dense(1, activation='sigmoid'))
        # Compile the model with the binary crossentropy loss function, the Adam optimizer, and accuracy metrics
        self.model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        
    def early_stopping(self, patience=10, restore_best_weights=True):
        return EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=restore_best_weights)
    
    def custom_metric(self, y_true, y_pred):
        # Example: Implement the balanced accuracy metric
        balanced_accuracy = balanced_accuracy_score(y_true, y_pred)
        return balanced_accuracy
    
        
    def evaluate_model(self, use_custom_metric=True):
        y_pred_prob = self.model.predict(self.X_test)
        y_pred_rounded = np.round(y_pred_prob)
        y_pred = y_pred_rounded.astype(int).ravel()

        # Evaluation metrics
        confusion_mat = confusion_matrix(self.y_test, y_pred)
        classification_re = classification_report(self.y_test, y_pred)
        accuracy = accuracy_score(self.y_test, y_pred)
        f1 = f1_score(self.y_test, y_pred)
        recall = recall_score(self.y_test, y_pred)

        # Print the evaluation metrics
        print("Confusion Matrix:")
        print(confusion_mat)
        print("Classification Report:")
        print(classification_re)
        print("Accuracy:", accuracy)
        print("F1 Score:", f1)
        print("Recall:", recall)

        if use_custom_metric:
            custom_metric_value = self.custom_metric(self.y_test, y_pred)
            print("Custom Metric Value (Balanced Accuracy):", custom_metric_value)
        
    def plot_training_history(self, history):
        # Plot training & validation accuracy values
        plt.figure(figsize=(12, 5))
        plt.plot(history.history['accuracy'])
        plt.plot(history.history['val_accuracy'])
        plt.title('Model accuracy')
        plt.ylabel('Accuracy')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='upper left')
        plt.show()

        # Plot training & validation loss values
        plt.figure(figsize=(12, 5))
        plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
        plt.title('Model loss')
        plt.ylabel('Loss')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Validation'], loc='upper left')
        plt.show()

    def plot_roc_curve(self):
        y_pred_prob = self.model.predict(self.X_test).ravel()
        fpr, tpr, thresholds = roc_curve(self.y_test, y_pred_prob)
        auc_score = auc(fpr, tpr)

        plt.figure(figsize=(10, 6))
        plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % auc_score)
        plt.plot([0, 1], [0, 1], 'k--')
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.05])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('Receiver Operating Characteristic (ROC)')
        plt.legend(loc="lower right")
        plt.show()
        
    def predict(self, X):
        # Preprocess input
        X_scaled = self.scaler.transform(X)
        X_reshaped = np.reshape(X_scaled, (X_scaled.shape[0], 1, X_scaled.shape[1]))
        
        # Make predictions
        y_pred_prob = self.model.predict(X_reshaped)
        y_pred_rounded = np.round(y_pred_prob)
        y_pred = y_pred_rounded.astype(int).ravel()
        return y_pred

    def save_model(self, model_path):
        self.model.save(model_path)

    def load_model(self, model_path):
        self.model = load_model(model_path)

    def get_model_summary(self):
        self.model.summary()
        
    def tune_hyperparameters(self, param_grid, cv=5, search_type='grid', n_iter=None, random_state=42):
        input_shape = (1, self.X_train.shape[2])

        # Wrap the model for use with scikit-learn
        model = KerasClassifier(build_fn=lambda: create_lstm_model(input_shape), verbose=0)

        if search_type == 'grid':
            search = GridSearchCV(estimator=model, param_grid=param_grid, cv=cv)
        elif search_type == 'random':
            if n_iter is None:
                raise ValueError("n_iter must be specified for random search.")
            search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, cv=cv,
                                        n_iter=n_iter, random_state=random_state)

        search_result = search.fit(self.X_train, self.y_train)

        # Print the best score and best parameters
        print("Best score: %f using %s" % (search_result.best_score_, search_result.best_params_))
        return search_result

    def k_fold_cross_validation(self, n_splits=5, epochs=50, batch_size=32):
        input_shape = (1, self.X_train.shape[2])

        # Define a function to create the model with the proper input shape
        def create_model():
            return create_lstm_model(input_shape)

        # Wrap the model for use with scikit-learn
        model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=batch_size, verbose=0)

        # Perform k-fold cross-validation
        kfold = KFold(n_splits=n_splits, shuffle=True, random_state=42)
        results = cross_val_score(model, self.X_train, self.y_train, cv=kfold)

        # Print the mean and standard deviation of the cross-validation scores
        print("Cross-Validation Accuracy: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))

    def model_to_json(self):
        model_json = self.model.to_json()
        return model_json

    def json_to_model(self, model_json):
        self.model = model_from_json(model_json)
        
    def predict_proba(self, X):
        X_scaled = self.scaler.transform(X)
        X_reshaped = np.reshape(X_scaled, (X_scaled.shape[0], 1, X_scaled.shape[1]))
        y_pred_prob = self.model.predict(X_reshaped)
        return y_pred_prob

    def train_val_split(self, val_size=0.1, random_state=42):
        self.X_train, self.X_val, self.y_train, self.y_val = train_test_split(
            self.X_train, self.y_train, test_size=val_size, random_state=random_state)

        # Reshape X_val using the same number of features as in X_train and X_test
        self.X_val = np.reshape(self.X_val, (self.X_val.shape[0], 1, self.X_train.shape[2]))


    def learning_rate_reduction(self, factor=0.1, patience=10, min_lr=1e-5):
        return ReduceLROnPlateau(monitor='val_loss', factor=factor, patience=patience, min_lr=min_lr)

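    def train_model(self, epochs, batch_size, use_early_stopping=True, use_lr_reduction=True):
        callbacks = []
        if use_early_stopping:
            callbacks.append(self.early_stopping())
        if use_lr_reduction:
            callbacks.append(self.learning_rate_reduction())

        # Monitor the validation split when train_val_split() has been called;
        # otherwise fall back to the test set
        if getattr(self, 'X_val', None) is not None:
            validation_data = (self.X_val, self.y_val)
        else:
            validation_data = (self.X_test, self.y_test)

        history = self.model.fit(self.X_train, self.y_train, epochs=epochs, batch_size=batch_size,
                                 validation_data=validation_data, callbacks=callbacks)
        return history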
    
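    def get_feature_importance(self, X, y, n_repeats=10, random_state=42):
        # sklearn's permutation_importance expects an estimator object that implements fit,
        # so a bare function cannot be passed to it; the columns are permuted by hand instead.
        # X is a 2D, unscaled feature table (DataFrame or array) in the training column order.
        rng = np.random.default_rng(random_state)
        X_values = np.asarray(X, dtype=float)
        y_values = np.asarray(y)
        baseline = accuracy_score(y_values, self.predict(X_values))

        # Importance of a feature = drop in accuracy after shuffling that column
        drops = np.zeros((X_values.shape[1], n_repeats))
        for col in range(X_values.shape[1]):
            for rep in range(n_repeats):
                X_perm = X_values.copy()
                X_perm[:, col] = rng.permutation(X_perm[:, col])
                drops[col, rep] = baseline - accuracy_score(y_values, self.predict(X_perm))

        # Combine the feature importances and their names into a dataframe and sort by importance
        feature_importance = pd.DataFrame({'feature': self.data.drop(["churn"], axis=1).columns,
                                           'importance': drops.mean(axis=1),
                                           'std': drops.std(axis=1)})

        feature_importance = feature_importance.sort_values(by='importance', ascending=False)

        return feature_importance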
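
`preprocess_data` reshapes each scaled row into the 3D layout an LSTM layer expects, `(samples, timesteps, features)`, with a single timestep. The sketch below shows only that reshape on a small made-up array; the `to_lstm_input` helper is a hypothetical name used for illustration.

```python
import numpy as np

def to_lstm_input(X_2d):
    """Hypothetical helper: (samples, features) -> (samples, 1, features)."""
    X_2d = np.asarray(X_2d, dtype=float)
    return X_2d.reshape(X_2d.shape[0], 1, X_2d.shape[1])

# 4 made-up customers with 3 features each
X = np.arange(12).reshape(4, 3)
X_seq = to_lstm_input(X)

print(X_seq.shape)
print(X_seq[0])
```

With one timestep per customer, the LSTM treats every row as a sequence of length one, so no ordering across time is modeled.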

# 2.4.2 <a id='2.4.2'> LSTM Training </a>
[Table of contents](#0.1)

In [None]:
# Instantiate the LSTMClassifier
lstm_classifier = LSTMClassifier(df_train)

# Preprocess the data
lstm_classifier.preprocess_data()

# Split the training data into training and validation sets
lstm_classifier.train_val_split()

# Build the model
lstm_classifier.build_model()

# Train the model with early stopping and learning rate reduction
history = lstm_classifier.train_model(epochs=1000, batch_size=128)

In [None]:
# Tune hyperparameters
param_grid = {
    'epochs': [25, 50],
    'batch_size': [32, 64]
}

# grid_result = lstm_classifier.tune_hyperparameters(param_grid)

# Perform k-fold cross-validation
# lstm_classifier.k_fold_cross_validation(n_splits=5, epochs=50, batch_size=32)
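
The commented-out cross-validation call relies on the Keras scikit-learn wrapper. A wrapper-free loop is one possible alternative, sketched below under stated assumptions: `cross_validate` is a hypothetical helper, the data is synthetic, `StratifiedKFold` is used instead of the plain `KFold` in the class, and `LogisticRegression` is only a lightweight stand-in so the sketch runs without TensorFlow. Any `build_model` callable that returns an object with `fit` and `predict` returning class labels could be substituted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_model, X, y, n_splits=5, random_state=42):
    """Hypothetical helper: fit a fresh model per fold and return the fold accuracies."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = []
    for train_idx, val_idx in folds.split(X, y):
        model = build_model()                      # new, untrained model for every fold
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[val_idx])
        scores.append(accuracy_score(y[val_idx], y_pred))
    return np.array(scores)

# Synthetic binary problem: the label depends mostly on the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

scores = cross_validate(lambda: LogisticRegression(), X, y)
print("Cross-Validation Accuracy: %.2f%% (%.2f%%)" % (scores.mean() * 100, scores.std() * 100))
```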

# 2.4.3 <a id='2.4.3'> Evaluate </a>
[Table of contents](#0.1)

In [None]:
# Plot the training history
lstm_classifier.plot_training_history(history)

In [None]:
# Evaluate the model
lstm_classifier.evaluate_model()

In [None]:
# Plot the ROC curve
lstm_classifier.plot_roc_curve()
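
The balanced accuracy printed by `evaluate_model` is the mean of the recall on each class, which makes it less sensitive to the churn / no-churn imbalance than plain accuracy. The sketch below recomputes it from a confusion matrix on made-up labels and probabilities; the explicit 0.5 threshold is an assumption of this example.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Made-up ground truth (6 stayers, 4 churners) and predicted churn probabilities
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.2, 0.1, 0.9, 0.4, 0.7, 0.8])
y_pred = (y_prob >= 0.5).astype(int)

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall_churn = tp / (tp + fn)   # share of churners that were caught
recall_stay = tn / (tn + fp)    # share of stayers that were left alone
balanced = (recall_churn + recall_stay) / 2

print(balanced)
print(balanced_accuracy_score(y_true, y_pred))
```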

# 2.4.4 <a id='2.4.4'> LSTM Prediction </a>
[Table of contents](#0.1)

In [None]:
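# No separate test file is loaded in this notebook, so the full feature table is scored here
df_test = df_train.drop(columns=['churn'])
y_pred_prob = lstm_classifier.predict_proba(df_test)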

In [None]:
predicted = lstm_classifier.predict(df_test)

In [None]:
submission = pd.DataFrame({'id':id_sub, 'churn':predicted})
submission.head(5)

In [None]:
submission['churn'] = submission['churn'].replace({0: 'no', 1: 'yes'})

In [None]:
submission.to_csv('submission.csv',index=False)

# 2.4.5 <a id='2.4.5'> Feature importance </a>
[Table of contents](#0.1)

In [None]:
# Get feature importances
# feature_importances = lstm_classifier.get_feature_importance(X_test, y_test)

# Print the feature importances
# print("Feature importances:")
# print(feature_importances)

# 2.4.6 <a id='2.4.6'> Save LSTM Model </a>
[Table of contents](#0.1)

In [None]:
# Save the model to disk
# lstm_classifier.save_model('lstm_model.h5')

# Load the model from disk
# lstm_classifier.load_model('lstm_model.h5')

# Save the model architecture as a JSON string
# model_json = lstm_classifier.model_to_json()

# Load the model architecture from a JSON string
# lstm_classifier.json_to_model(model_json)

# Print the model architecture summary
# lstm_classifier.get_model_summary()

# Thanks 

<img src="https://lh5.googleusercontent.com/9ROjm25aJ9h7n9dPSco1C0OOnEOdYXxO1omW_gAj6SUasnKVE3bqKMcLKzj0ZzLUUvBzrVHrnY2tYGLdJECV2X5_09Q1JAHv_zS3EvNGRNf6IoX9nEQkpPOa67hhBk6yQS53C1Hf" width="400" height="400">

<img src="https://www.vectorlogo.zone/logos/linkedin/linkedin-tile.svg" align='left' alt="LinkedIn" width="60" height="60"/>

## [Msc. Diego Hurtado](https://www.linkedin.com/in/diegohurtadoo/)

<img src="https://www.vectorlogo.zone/logos/github/github-tile.svg" align='left' alt="GitHub" width="60" height="60"/>

## [Msc. Diego Hurtado](https://github.com/DiegoHurtad0)

<img src="https://www.vectorlogo.zone/logos/medium/medium-tile.svg" align='left' alt="Medium" width="60" height="60"/>

## [Msc. Diego O’HURTADO](https://medium.com/@diego.hurtado.olivares)

<img src="https://raw.githubusercontent.com/DiegoHurtad0/Covid-19-Dataset-Mexico/master/wave.svg" width="900" height="600">

## “When you are asked if you can do a job, tell ’em, ‘Certainly I can!’ Then get busy and find out how to do it.” — Theodore Roosevelt.