<img src="https://drive.google.com/file/d/1pRABOqky6x-IHvrw9c1Ai4bvNNLe8k9w/view" alt="Churn Image" style="width: 100%; border-radius: 15px;">


 **From Data_Science in 6_months by Dr.Aammar**
1. Subject of this Jupyter notebook: **`Portfolio Project_ Binary Classification with a Bank Churn Dataset`**
2. Authored by: Engr.Hurrirah
3. Where to contact: engr.mht21@gmail.com, [linkedIn](www.linkedin.com/in/hurairah-tahir/), [Github](https://github.com/EngrHurrirah)
4. Date: 01/22/2024

5. `Purpose:` To perform:
    + Detailed EDA
    + Apply Neural network model
    + Increase the accuracy

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
About DataSet
</div>


## **Description:**
The dataset for this competition (both train and test) was generated from a deep learning model trained on the Bank Customer Churn Prediction dataset. Feature distributions are close to, but not exactly the same, as the original. 

#### Our dataset contains:

| Column Name      | Description |
|------------------|-------------|
| Surname          | Customer's surname (not typically useful for prediction). |
| CreditScore      | Customer's credit score. |
| Geography        | Customer's country of residence. |
| Gender           | Customer's gender. |
| Age              | Customer's age. |
| Tenure           | Number of years the customer has been with the bank. |
| Balance          | Customer's bank balance. |
| NumOfProducts    | Number of products the customer has with the bank. |
| HasCrCard        | Indicates whether the customer has a credit card (1) or not (0). |
| IsActiveMember   | Indicates whether the customer is an active member (1) or not (0). |
| EstimatedSalary  | Customer's estimated salary. |
| Exited           | Whether the customer has exited (1) or not (0); this is the target variable. |

---
## **🚀 `Decoding Customer Churn: Data Odyssey` 🚀**


Dive into the heart of customer retention mysteries with this notebook. Uncover the tapestry of demographics, account intricacies, and the burning question – why do customers say goodbye?

<h3>    
<center> Unmasking the Exit: Behind Customer Churn </center>
</h3>

🔍 Beyond the Basics: Not Just Poor Service 🔍

Explore the nuanced reasons for churn – from the desire for personalized services to the battlefield of pricing wars. It's a journey into the unexpected, where data reveals the true influencers on customer loyalty.

🌐 Personalization Magic: Crafting Unique Experiences 🌐

Discover the impact of tailoring services to individual needs. It's not just about the numbers; it's about creating experiences that resonate on a personal level, keeping customers hooked.

💸 Pricing Wars: The Strategic Dance 💸

Delve into the delicate balance of fees and rates, understanding how they sway customer decisions. It's a battlefield where perceptions of value shape the loyalty landscape.

🔄 Exit Strategies: Navigating Account Closure 🔄

In the banking world, the ease of account closure is a silent game-changer. We explore the pathways, deciphering how simplicity or complexity shapes the loyalty trajectory.

`Join us on this unique journey where data becomes insight, and insight becomes strategy. The pursuit of understanding customer churn goes beyond the ordinary, offering a narrative that's not just informative but captivatingly exceptional.`

## **Acknowledgements**

@misc{playground-series-s4e1,\
author = {Walter Reade, Ashley Chow},\
title = {Binary Classification with a Bank Churn Dataset},\
publisher = {Kaggle},\
year = {2024},\
url = {https://kaggle.com/competitions/playground-series-s4e1}\
}

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Importing Libraries
</div>



In [None]:
# import libraries

# 1. to handle the data
import pandas as pd
import numpy as np

# to visualize the dataset
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from tabulate import tabulate

# Library to overcome Warnings.
import warnings
warnings.filterwarnings('ignore')

# Library to perform Statistical Analysis.
from scipy import stats
from scipy.stats import chi2
from scipy.stats import chi2_contingency

# Show all columns when displaying a DataFrame.
pd.set_option("display.max_columns", None)

# To preprocess the data
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
from sklearn.impute import SimpleImputer, KNNImputer
# import iterative imputer
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Loading DataSet
</div>


In [None]:
df_train = pd.read_csv("train.csv")
df_test = pd.read_csv("test.csv")
submission = pd.read_csv("sample_submission.csv")

+ Viewing the first 5 rows of `train dataset`

In [None]:
df_train.head()

+ Viewing the first 5 rows of `test dataset`

In [None]:
df_test.head()

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Data Pre-Processing
</div>


1. Dimension of Data_Set

In [None]:
print("Train dataset shape: ",df_train.shape)
print("Test dataset shape: ",df_test.shape)

<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

* There are a total of **165,034 records** and **14 columns** available in the dataset for training.

</div>


2. Statistical results from the data

In [None]:
df_train.info()

In [None]:
# Decoding the Data: Unveiling Column Data Types
column_data_types = df_train.dtypes

# Delving into the Diversity: Counting Numerical and Categorical Columns
numerical_count = 0
categorical_count = 0

for column_name, data_type in column_data_types.items():
    if np.issubdtype(data_type, np.number):
        numerical_count += 1
    else:
        categorical_count += 1

# Revealing the Composition
print(f"Exploring the Dataset: {numerical_count} Numerical Columns found in the Training Data")
print(f"Understanding Diversity: {categorical_count} Categorical Columns identified in the Training Data")


3. Listing the numeric columns of the dataset in a table

In [None]:
numeric_columns = df_train.select_dtypes(include=[np.number]).columns

# Create a DataFrame with the numeric columns
numeric_columns_df = pd.DataFrame({"Numeric Columns": numeric_columns})

# Display the numeric columns in a table
print(tabulate(numeric_columns_df, headers="keys", tablefmt="pretty"))

4. Listing the categorical columns of the dataset in a table

In [None]:
categorical_columns = df_train.select_dtypes(include=["object"]).columns

# Create a DataFrame with the categorical columns
categorical_columns_df = pd.DataFrame({"Categorical Columns": categorical_columns})

# Display the categorical columns in a table
print(tabulate(categorical_columns_df, headers="keys", tablefmt="pretty"))

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Duplicates in Data
</div>


5. Checking for duplicates in the train and test datasets

In [None]:
print("Duplicates in Train Dataset: ",df_train.duplicated().sum())

In [None]:
# Count duplicated rows in the test set (the original cell checked df_train a second time)
print("Duplicates in Test Dataset: ",df_test.duplicated().sum())

<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**🎉 Great News! No duplicate records were found in either the Train or the Test Dataset. 🎉**

</div>
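
A caveat worth keeping in mind: `duplicated()` compares entire rows, so a unique identifier column makes every row look distinct even when all feature values repeat. The sketch below illustrates the difference on a toy frame. The column name `id` and the sample values are assumptions made for illustration only, not taken from the competition files.

```python
import pandas as pd

# Toy frame: rows 0 and 1 share the same feature values but have different ids
toy = pd.DataFrame({
    "id": [0, 1, 2],                 # assumed unique row identifier
    "CreditScore": [650, 650, 700],
    "Age": [40, 40, 35],
})

# Full-row check: the unique id makes every row distinct
full_row_duplicates = int(toy.duplicated().sum())

# Feature-only check: drop the identifier before comparing rows
feature_duplicates = int(toy.drop(columns=["id"]).duplicated().sum())

print("Full-row duplicates:", full_row_duplicates)
print("Feature-only duplicates:", feature_duplicates)
```

Whether feature-only duplicates should be removed is a modelling decision; in a synthetic dataset, repeated feature combinations may be legitimate.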


<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Missing values in Data
</div>


6. Checking for missing values and their percentage

In [None]:
print("Checking for missing values and their percentage in the Train Dataset")
missing_data = df_train.isnull().sum().to_frame().rename(columns={0:"Total No. of Missing Values"})
missing_data["% of Missing Values"] = round((missing_data["Total No. of Missing Values"]/len(df_train))*100,2)
missing_data

In [None]:
print("Checking for missing values and their percentage in the Test Dataset")
missing_data = df_test.isnull().sum().to_frame().rename(columns={0:"Total No. of Missing Values"})
missing_data["% of Missing Values"] = round((missing_data["Total No. of Missing Values"]/len(df_test))*100,2)
missing_data



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**🚫✨ A Glorious Clean Slate: No Missing Values Detected! ✨🚫**

</div>


<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Outliers in Data 
</div>


7.1. Checking for outliers in the `training data` and handling them

In [None]:
training_num_columns = df_train.select_dtypes(include=['float64', 'int64']).columns.tolist()
print(training_num_columns)

In [None]:
# Create box plots for each numerical column
plt.figure(figsize=(18, 10))

# Set the color palette
custom_palette = sns.dark_palette("#1a1a1a", n_colors=len(training_num_columns))
sns.set_palette(custom_palette)
sns.set(style="whitegrid")

# Adjust the number of rows and columns in the subplot
num_rows = 5
num_cols = 3

for i, col in enumerate(training_num_columns, 1):
    plt.subplot(num_rows, num_cols, i)
    sns.boxplot(x=df_train[col], color='#004080', width=0.5)    
    plt.title(col)
    plt.xlabel("")

plt.suptitle("Box Plots of Numerical Features (Training Data)", y=1.02, fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95]) 
plt.show()




<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**In the training dataset, `CreditScore` and `Age` have the most extreme values**

</div>


In [None]:
iqr_columns = ['CreditScore', 'Age']

The method we are using here is known as the **Tukey's Fences method** for handling outliers. This method identifies outliers based on the interquartile range (IQR), which is the range between the first quartile (Q1) and the third quartile (Q3).

In the code below:

- `median_value` is calculated for each column listed in `iqr_columns`.
- `lower_bound` and `upper_bound` are defined using Tukey's Fences formula (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR); values outside this range are treated as potential outliers.
- The lambda function is applied to each element in the column, replacing values outside the defined bounds with the median value.

This approach is a robust way to handle outliers by replacing them with a less extreme value (the median in this case) rather than outright removing them. It helps to mitigate the impact of extreme values on the analysis.
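
As a small worked illustration of the rule described above, the sketch below applies the same fences to a toy series. The helper name `replace_outliers_with_median` and the sample ages are assumptions made for this example; the notebook's own cell uses an element-wise `apply` instead of the vectorized `Series.mask` shown here.

```python
import pandas as pd

def replace_outliers_with_median(series, k=1.5):
    """Replace values outside Tukey's fences with the median of the series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    is_outlier = (series < lower_bound) | (series > upper_bound)
    # Series.mask swaps in the median wherever the condition is True
    return series.mask(is_outlier, series.median())

# Toy ages: 95 lies far above the rest of the values
ages = pd.Series([30, 32, 35, 36, 38, 40, 41, 95])
cleaned = replace_outliers_with_median(ages)

# Here Q1 = 34.25 and Q3 = 40.25, so the fences are 25.25 and 49.25;
# only 95 falls outside them and is replaced by the median (37)
print(cleaned.tolist())
```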

In [None]:
for col in iqr_columns:
    median_value = df_train[col].median()
    lower_bound = df_train[col].quantile(0.25) - 1.5 * (df_train[col].quantile(0.75) - df_train[col].quantile(0.25))
    upper_bound = df_train[col].quantile(0.75) + 1.5 * (df_train[col].quantile(0.75) - df_train[col].quantile(0.25))
    df_train[col] = df_train[col].apply(lambda x: median_value if x < lower_bound or x > upper_bound else x)
    

In [None]:
# Create box plots for each numerical column
plt.figure(figsize=(18, 10))

# Set the color palette
custom_palette = sns.dark_palette("#1a1a1a", n_colors=len(training_num_columns))
sns.set_palette(custom_palette)
sns.set(style="whitegrid")

# Adjust the number of rows and columns in the subplot
num_rows = 5
num_cols = 3

for i, col in enumerate(iqr_columns, 1):
    plt.subplot(num_rows, num_cols, i)
    sns.boxplot(x=df_train[col], color='#004080', width=0.5)    
    plt.title(col)
    plt.xlabel("")

plt.suptitle("Box Plots of CreditScore and Age After Outlier Handling (Training Data)", y=1.02, fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95]) 
plt.show()




<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**The most extreme values of `CreditScore` and `Age` are now largely under control**

</div>


7.2. Checking for outliers in the `testing data` and handling them

In [None]:
testing_num_columns = df_test.select_dtypes(include=['float64', 'int64']).columns.tolist()
print(testing_num_columns)

In [None]:
# Create box plots for each numerical column
plt.figure(figsize=(18, 10))

# Set the color palette
custom_palette = sns.dark_palette("#1a1a1a", n_colors=len(testing_num_columns))
sns.set_palette(custom_palette)
sns.set(style="whitegrid")

# Adjust the number of rows and columns in the subplot
num_rows = 5
num_cols = 3

# Iterate over the test-set columns: the training list includes 'Exited', which df_test does not have
for i, col in enumerate(testing_num_columns, 1):
    plt.subplot(num_rows, num_cols, i)
    sns.boxplot(x=df_test[col], color='#004080', width=0.5)
    plt.title(col)
    plt.xlabel("")

plt.suptitle("Box Plots of Numerical Features (Test Data)", y=1.02, fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95]) 
plt.show()




<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**Same problem as in the training dataset: `CreditScore` and `Age` have the most extreme values**

</div>


+ Using the same approach as for the training dataset to deal with them

In [None]:
iqr_columns = ['CreditScore', 'Age']

In [None]:
for col in iqr_columns:
    median_value = df_test[col].median()
    lower_bound = df_test[col].quantile(0.25) - 1.5 * (df_test[col].quantile(0.75) - df_test[col].quantile(0.25))
    upper_bound = df_test[col].quantile(0.75) + 1.5 * (df_test[col].quantile(0.75) - df_test[col].quantile(0.25))
    df_test[col] = df_test[col].apply(lambda x: median_value if x < lower_bound or x > upper_bound else x)
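
The cell above derives the fences and the median from the test data itself. A possible alternative, sketched here as an assumption rather than as what this notebook does, is to learn the thresholds once on the training data and reuse them on the test data, so that both sets are transformed with identical limits. The helper names `fit_fences` and `apply_fences` and the toy frames are hypothetical.

```python
import pandas as pd

def fit_fences(train, columns, k=1.5):
    """Learn (lower, upper, median) per column from the training data only."""
    fences = {}
    for col in columns:
        q1, q3 = train[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        fences[col] = (q1 - k * iqr, q3 + k * iqr, train[col].median())
    return fences

def apply_fences(df, fences):
    """Return a copy of df with out-of-fence values replaced by the stored median."""
    out = df.copy()
    for col, (lower, upper, median) in fences.items():
        outside = (out[col] < lower) | (out[col] > upper)
        out[col] = out[col].mask(outside, median)
    return out

# Toy data standing in for df_train and df_test
toy_train = pd.DataFrame({"Age": [30, 32, 35, 36, 38, 40, 41, 95]})
toy_test = pd.DataFrame({"Age": [29, 37, 88]})

fences = fit_fences(toy_train, ["Age"])
toy_test_clean = apply_fences(toy_test, fences)
print(toy_test_clean)
```

Because `apply_fences` works on a copy, the original frame stays available for comparison.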
    

In [None]:
# Create box plots for each numerical column
plt.figure(figsize=(18, 10))

# Set the color palette
custom_palette = sns.dark_palette("#1a1a1a", n_colors=len(training_num_columns))
sns.set_palette(custom_palette)
sns.set(style="whitegrid")

# Adjust the number of rows and columns in the subplot
num_rows = 5
num_cols = 3

for i, col in enumerate(iqr_columns, 1):
    plt.subplot(num_rows, num_cols, i)
    sns.boxplot(x=df_test[col], color='#004080', width=0.5)    
    plt.title(col)
    plt.xlabel("")

plt.suptitle("Box Plots of CreditScore and Age After Outlier Handling (Test Data)", y=1.02, fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95]) 
plt.show()




<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

**The most extreme values of `CreditScore` and `Age` are now largely under control**

</div>


<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Descriptive Analysis of Data
</div>


8.1 Descriptive analysis on `Numerical data`

In [None]:
round(df_train.describe().T,2)

8.2 Descriptive analysis on `Categorical data`

In [None]:
df_train.describe(include="O").T



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

+ **The data shows that all customers are adults: the minimum value of `Age` is `19`**
+ **The most frequent `Gender` is `Male` and the most frequent `Geography` is `France`**

</div>


<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Exploratory Data Analysis (EDA)
</div>


### 1. Visualizing the Customer `churn` Rate

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the color palette
background_color = '#1a1a1a'
border_color = '#004080'
text_color = '#d9d9d9'

# Create a figure with a dark background
fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(12, 6), facecolor=background_color)

# Plotting the bar chart
churn_counts = df_train["Exited"].value_counts()
sns.barplot(x=churn_counts.index, y=churn_counts.values, palette='Set2', ax=ax_bar)
ax_bar.set(title="Churn Counts", xlabel="Attrition", ylabel="Count", facecolor=background_color)
for i, v in enumerate(churn_counts.values):
    ax_bar.text(i, v, str(v), ha="center", va="bottom", fontsize=12, color=text_color)

# Create a pie chart
churn_colors = sns.color_palette('Set2')
ax_pie.pie(churn_counts, labels=churn_counts.index, autopct="%.2f%%", colors=churn_colors, startangle=90)
ax_pie.set(title="Churn Percentage", facecolor=background_color)

# Set border color
for axis in [ax_bar, ax_pie]:
    for spine in axis.spines.values():
        spine.set_color(border_color)

# Set ticks color
for axis in [ax_bar, ax_pie]:
    axis.tick_params(axis='both', colors=text_color)

plt.show()




<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

+ **The customer `churn` rate of this bank is 21.16%**

</div>
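
The percentage shown in the pie chart can also be computed directly, which is handy when the number is needed later in code. A minimal sketch on toy data follows; the ten sample labels are made up for illustration and do not reproduce the 21.16% figure.

```python
import pandas as pd

# Toy target column: 2 churned customers out of 10
toy = pd.DataFrame({"Exited": [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]})

# Share of each class in percent
churn_share = toy["Exited"].value_counts(normalize=True).mul(100).round(2)

# For a 0/1 target, the mean is the churn rate
churn_rate = toy["Exited"].mean() * 100

print(churn_share)
print(f"Churn rate: {churn_rate:.2f}%")
```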


# Defining Functions to `Analyse` the `churn rate` against different `Features` of the dataset

We define the `following functions:`
+ pie_bar_plot
+ count_percent_plot
+ hist_with_hue

In [None]:

def pie_bar_plot(df, col, churn_col):
    # Set the color palette
    background_color = '#1a1a1a'
    border_color = '#004080'
    text_color = '#d9d9d9'
    
    # Extract value counts for the specified column
    value_counts = df[col].value_counts().sort_index()

    # Create a DataFrame for the pie chart
    pie_df = pd.DataFrame({'labels': value_counts.index, 'values': value_counts.values})

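    # Create a DataFrame for the bar plot: share of churned customers (%) per category.
    # reindex keeps the churn counts aligned with value_counts even if a category has no churners.
    churn_counts = df[df[churn_col] == 1][col].value_counts().reindex(value_counts.index, fill_value=0)
    bar_df = pd.DataFrame({'labels': value_counts.index, 'values': (churn_counts / value_counts * 100).round(2).values})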

    # Plotly Pie Chart
    fig = px.pie(pie_df, names='labels', values='values', title=f"Distribution by {col}",
                 labels={'labels': f"Distribution by {col}"}, 
                 color_discrete_sequence=px.colors.qualitative.Set2)
    
    fig.update_traces(textinfo='percent+label', pull=[0.1] * len(pie_df), hoverinfo='label+percent')

    # Update layout for the first subplot (Pie chart)
    fig.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
    )

    # Plotly Bar Chart
    bar_fig = px.bar(bar_df, x='labels', y='values', title=f"Churn Rate by {col}",
                     labels={'labels': f"Churn Rate by {col}", 'values': 'Churn Rate'},
                     color_discrete_sequence=px.colors.qualitative.Set2)

    # Update layout for the second subplot (Bar plot)
    bar_fig.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
        xaxis=dict(title=dict(font=dict(size=10, color=text_color))),
        yaxis=dict(title=dict(font=dict(size=10, color=text_color))),
    )

    # Display both charts, one after the other
    fig.show()
    bar_fig.show()

# Example usage:
# pie_bar_plot(your_dataframe, 'some_column', 'churn_col')


In [None]:

def count_percent_plot(df, col, attrition_col):
    # Set the color palette
    background_color = '#1a1a1a'
    border_color = '#004080'
    text_color = '#d9d9d9'

    # Count customers per category once so labels and values stay aligned
    counts = df[col].value_counts()
    bar_df = pd.DataFrame({'labels': counts.index, 'values': counts.values})

    # Count churned customers per category in the same label order
    # (a boolean mask avoids adding a helper column to the caller's DataFrame)
    value_2 = df.loc[df[attrition_col] == 1, col].value_counts().reindex(counts.index, fill_value=0)
    # Churn rate (%) per category; computed for reference, not plotted below
    attrition_rate = value_2.values / counts.values * 100

    # Plotly Bar Chart for customers by col
    fig = px.bar(bar_df, x='labels', y='values', title=f"Customers by {col}",
                 labels={'labels': f"{col}", 'values': 'Count'},
                 color_discrete_sequence=px.colors.qualitative.Set2)

    # Update layout for the first subplot
    fig.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
        xaxis=dict(title=dict(font=dict(size=10, color=text_color))),
        yaxis=dict(title=dict(font=dict(size=10, color=text_color))),
    )

    # Plotly Bar Chart for customer churn by col
    attrition_fig = px.bar(x=bar_df['labels'], y=value_2.values, title=f"Customer Churn by {col}",
                           labels={'x': f"{col}", 'y': 'Count'},
                           color_discrete_sequence=px.colors.qualitative.Set2)

    # Update layout for the second subplot
    attrition_fig.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
        xaxis=dict(title=dict(font=dict(size=10, color=text_color))),
        yaxis=dict(title=dict(font=dict(size=10, color=text_color))),
    )

    # Display both charts side by side
    fig.show()
    attrition_fig.show()

# Example usage:
# count_percent_plot(df_train, 'some_column', 'churn_col')


In [None]:

def hist_with_hue(df, col, attrition_col):
    # Set the color palette
    background_color = '#1a1a1a'
    border_color = '#004080'
    text_color = '#d9d9d9'
    
    # Convert integer attrition column to 'Yes' and 'No'
    # (kept as a local array so the caller's DataFrame is not modified)
    attrition_label = np.where(df[attrition_col] == 1, 'Yes', 'No')

    # Create a DataFrame for the first subplot
    hist_df = pd.DataFrame({
        'values': df[col],
        'attrition_label': attrition_label
    })

    # Create a DataFrame for the second subplot
    box_df = pd.DataFrame({
        'values': df[col],
        'attrition_label': attrition_label
    })

    # Plotly Histogram with Hue
    fig_hist = px.histogram(hist_df, x='values', color='attrition_label', marginal='box',
                            title=f"Distribution by {col}",
                            labels={'values': f"{col}", 'attrition_label': 'Attrition'},
                            color_discrete_sequence=px.colors.qualitative.Set2)

    # Update layout for the first subplot
    fig_hist.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
        xaxis=dict(title=dict(font=dict(size=10, color=text_color))),
        yaxis=dict(title=dict(font=dict(size=10, color=text_color))),
    )

    # Plotly Box Plot
    fig_box = px.box(box_df, x='attrition_label', y='values', title=f"Distribution by {col} & {attrition_col}",
                     labels={'values': f"{col}", 'attrition_label': 'Attrition'},
                     color_discrete_sequence=px.colors.qualitative.Set2)

    # Update layout for the second subplot
    fig_box.update_layout(
        title=dict(font=dict(size=14, color=text_color)),
        paper_bgcolor=background_color,
        font=dict(color=text_color),
        xaxis=dict(title=dict(font=dict(size=10, color=text_color))),
        yaxis=dict(title=dict(font=dict(size=10, color=text_color))),
    )

    # Display both charts side by side
    fig_hist.show()
    fig_box.show()

# Example usage:
# hist_with_hue(df_train, 'some_column', 'churn_col')


### 2. Analyzing Customer Churn by `CreditScore`

In [None]:
hist_with_hue(df_train, 'CreditScore', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>

+ **Among churned customers, most credit scores fall between 600 and 700**
+ **No clear conclusion about churn can be drawn from this feature alone**

</div>


### 3. Analyzing Customer Churn by Geography

In [None]:
pie_bar_plot(df_train, 'Geography', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>


1. A significant majority of customers, approximately 57.1%, originate from France.

2. Customers from Germany exhibit a notably high attrition rate, signaling a considerable proportion leaving the organization.

3. Conversely, customers hailing from France experience a comparatively lower attrition rate, indicating greater retention within the company.

</div>
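
Statements about attrition rates per category, such as the comparison between Germany and France above, can be checked numerically with a group-wise mean of the 0/1 target. The helper `churn_rate_by` and the toy rows below are assumptions made for illustration; the resulting numbers are not the dataset's real rates.

```python
import pandas as pd

def churn_rate_by(df, col, churn_col="Exited"):
    """Percentage of churned customers within each category of `col`, highest first."""
    return (
        df.groupby(col)[churn_col]
        .mean()
        .mul(100)
        .round(2)
        .sort_values(ascending=False)
    )

# Toy data: France 1 of 4 churned, Germany 1 of 2, Spain 0 of 2
toy = pd.DataFrame({
    "Geography": ["France", "France", "France", "France",
                  "Germany", "Germany", "Spain", "Spain"],
    "Exited":    [0, 0, 0, 1, 1, 0, 0, 0],
})

rates = churn_rate_by(toy, "Geography")
print(rates)
```

The same helper could be called with `'Gender'`, `'Tenure'`, `'HasCrCard'` or `'IsActiveMember'` to back up the percentages quoted in the later insight boxes.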


### 4. Analyzing Customer Churn by Gender

In [None]:
pie_bar_plot(df_train, 'Gender', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>


* Male customers account for a higher proportion than female customers, by more than 12.8%.
* Attrition in female customers is higher compared to male customers.

</div>


### 5. Customer Distribution by Age

In [None]:
hist_with_hue(df_train, 'Age', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>




1. The majority of customers fall within the age range of 30 to 40, indicating a concentration of clientele in this demographic.

2. A noticeable pattern emerges, revealing a positive correlation between age and attrition. As age increases, the likelihood of customers leaving the organization also rises.

3. The median age of customers who exited is higher than that of customers who stayed, suggesting that departing customers tend to be older on average.

4. Notably, there is a higher attrition rate among customers in older age groups compared to their younger counterparts, indicating a trend of increased departure with advancing age.

</div>
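
The age observations above can be quantified in two short steps: compare the median age of both groups, then compute the churn rate per age band. The sketch below does this on toy data. The bin edges and the sample values are assumptions chosen for illustration, not values derived from the dataset.

```python
import pandas as pd

# Toy data: ages and the 0/1 churn target
toy = pd.DataFrame({
    "Age":    [22, 28, 31, 35, 38, 44, 47, 52, 58, 63],
    "Exited": [0,  0,  0,  0,  1,  0,  1,  1,  0,  1],
})

# Median age of retained (0) versus exited (1) customers
median_age = toy.groupby("Exited")["Age"].median()

# Churn rate (%) per age band; right=False makes each band include its left edge
age_band = pd.cut(toy["Age"], bins=[18, 30, 40, 50, 60, 100], right=False)
rate_by_band = toy.groupby(age_band, observed=True)["Exited"].mean().mul(100)

print(median_age)
print(rate_by_band.round(2))
```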


#### 6. Analyzing Customer Churn by Tenure

In [None]:
pie_bar_plot(df_train, 'Tenure', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>




* Only the tenure values 0 and 10 have noticeably fewer customers, i.e. 3% and 6% respectively.
* The attrition rate is nearly the same in every tenure category (between 19% and 22%), except tenure 0, which has the highest rate at 25%.

</div>


#### 7. Analyzing Customer Churn by Balance

In [None]:
hist_with_hue(df_train, 'Balance', 'Exited')

#### 8. Analyzing Customer Churn by NumOfProducts

In [None]:
count_percent_plot(df_train,'NumOfProducts','Exited')

#### 9. Analyzing Customer Churn by HasCrCard

In [None]:
pie_bar_plot(df_train, 'HasCrCard', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>




* 75% of customers have a credit card
* 25% of customers don't have a credit card
* Both classes have almost the same attrition rate, i.e. 20-22%
* No meaningful information about attrition is seen here.

</div>


#### 10. Analyzing Customer Churn by IsActiveMember

In [None]:
pie_bar_plot(df_train, 'IsActiveMember', 'Exited')



<div style="border-radius: 8px; border: 2px solid #004080; padding: 15px; background-color: #1a1a1a; font-size: 18px; text-align: left; color: #d9d9d9;">

<h3 align="left"><font color="#004080"> 🔍 Insights:</font></h3>




* 50.2% of customers are not active members, with an attrition rate of 29%
* 49.8% are active members, with an attrition rate of 12%
* Inactive members are more likely to exit

</div>


#### 11. Analyzing Customer Churn by EstimatedSalary

In [None]:
hist_with_hue(df_train, 'EstimatedSalary', 'Exited')

In [None]:
df_train.columns

<div style="text-align: left; background-color: #1a1a1a; font-family: 'Trebuchet MS', sans-serif; color: #d9d9d9; padding: 20px; line-height: 1.2; border-radius: 8px; margin-bottom: 0em; text-align: center; font-size: 24px; border: 2px solid #004080;">
Statistical Analysis

---

**To be continued...**

</div>
