# Assignment-7

This notebook contains questions that test your proficiency in `Exploratory Data Analysis`.

### Date: 28th June, 2025

### Steps to solve and upload the assignment 

- Download the notebook to your local machine.
- Solve the assignment and save it.
- Rename the file as `Assignment-07-<your_name>_<your_surname>.ipynb`. For example, if your name is Dipika Chopra, name the file `Assignment-07-Dipika_Chopra.ipynb`.
- Upload the solved notebook to the Google Drive location: https://drive.google.com/drive/folders/1lziCZ4VgEyOvI_kAngLMVlls0S-cugZE?usp=drive_link
<h3><span style="color:red"> Deadline: 26th July, 2025 </span></h3>

### Problem Statement:

The banking churn prediction dataset (uploaded in the same folder) contains attributes of the bank's customers and indicates whether each customer has churned.
Churning means that the customer has closed their relationship with the bank. The columns in the dataset are described below.

- customer_id: Unique identifier for each customer.
- vintage: The duration of the customer's relationship with the company.
- age: Age of the customer.
- gender: Gender of the customer.
- dependents: Number of dependents the customer has.
- occupation: The occupation of the customer.
- city: City in which the customer is located.
- customer_nw_category: Net worth category of the customer.
- branch_code: Code identifying the branch associated with the customer.
- current_balance: Current balance in the customer's account.
- previous_month_end_balance: Account balance at the end of the previous month.
- average_monthly_balance_prevQ: Average monthly balance in the previous quarter.
- average_monthly_balance_prevQ2: Average monthly balance in the second previous quarter.
- current_month_credit: Credit amount in the current month.
- previous_month_credit: Credit amount in the previous month.
- current_month_debit: Debit amount in the current month.
- previous_month_debit: Debit amount in the previous month.
- current_month_balance: Account balance in the current month.
- previous_month_balance: Account balance in the previous month.
- churn: The target variable indicating whether the customer has churned (1 for churned, 0 for not churned).
- last_transaction: Timestamp of the customer's last transaction.
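
A quick schema check before starting the analysis can catch a misnamed or missing column early. The sketch below is a minimal example, assuming the column names listed above match the CSV header exactly. `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced here, and the two-row frame is made-up data used only to demonstrate the call.

```python
import pandas as pd

# Column names copied from the list above (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "customer_id", "vintage", "age", "gender", "dependents", "occupation",
    "city", "customer_nw_category", "branch_code", "current_balance",
    "previous_month_end_balance", "average_monthly_balance_prevQ",
    "average_monthly_balance_prevQ2", "current_month_credit",
    "previous_month_credit", "current_month_debit", "previous_month_debit",
    "current_month_balance", "previous_month_balance", "churn",
    "last_transaction",
]


def check_columns(df, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to `expected`."""
    missing = [c for c in expected if c not in df.columns]
    unexpected = [c for c in df.columns if c not in expected]
    return missing, unexpected


# Made-up frame that lacks most columns and has one extra.
demo = pd.DataFrame({"customer_id": [1, 2], "churn": [0, 1], "extra": [0, 0]})
missing, unexpected = check_columns(demo)
print("Missing:", len(missing), "Unexpected:", unexpected)
```

On the real data, the same call would be `check_columns(df)` right after `pd.read_csv`; two empty lists mean the header matches the description.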


Your task is to perform exploratory data analysis (EDA) on the Banking Churn Prediction dataset.

In [None]:
# -------------------------------
# 1. Import Libraries
# -------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# -------------------------------
# 2. Load Dataset
# -------------------------------
file_path = "Banking_churn_prediction.csv"  # Change path if needed
df = pd.read_csv(file_path)

# Convert the date column; values that cannot be parsed become NaT
df['last_transaction'] = pd.to_datetime(df['last_transaction'], errors='coerce')

# -------------------------------
# 3. Basic Info & Missing Values
# -------------------------------
print("Shape of Dataset:", df.shape)
print("\nBasic Info:")
df.info()  # info() prints its report directly and returns None, so no print() is needed

# Missing Values
missing_values = df.isnull().sum()
missing_percent = (missing_values / len(df)) * 100
missing_summary = pd.DataFrame({"Missing Values": missing_values, "Missing %": missing_percent.round(2)})
print("\nMissing Value Summary:")
print(missing_summary[missing_summary["Missing Values"] > 0])

# -------------------------------
# 4. Summary Statistics
# -------------------------------
print("\nNumerical Summary:")
print(df.describe())

print("\nCategorical Summary:")
print(df.describe(include=['object']))

# -------------------------------
# 5. Target Variable Distribution (Churn)
# -------------------------------
plt.figure(figsize=(5,4))
sns.countplot(x='churn', data=df, palette='Set2')
plt.title("Churn Distribution")
plt.show()

print("\nChurn Percentage:")
print(df['churn'].value_counts(normalize=True) * 100)

# -------------------------------
# 6. Gender vs Churn
# -------------------------------
plt.figure(figsize=(6,4))
sns.countplot(x='gender', hue='churn', data=df, palette='Set2')
plt.title("Gender vs Churn")
plt.show()

# -------------------------------
# 7. Age Distribution by Churn
# -------------------------------
df['age'] = pd.to_numeric(df['age'], errors='coerce')  # Ensure numeric

plt.figure(figsize=(7,5))
sns.histplot(df[df['churn']==0]['age'].dropna(), color='blue', kde=True, label="Not Churned", alpha=0.5)
sns.histplot(df[df['churn']==1]['age'].dropna(), color='red', kde=True, label="Churned", alpha=0.5)
plt.title("Age Distribution by Churn")
plt.legend()
plt.show()

# -------------------------------
# 8. Occupation vs Churn (Top 10)
# -------------------------------
plt.figure(figsize=(8,5))
top_occ = df['occupation'].value_counts().head(10).index
sns.countplot(y='occupation', hue='churn', data=df[df['occupation'].isin(top_occ)], palette='Set2')
plt.title("Occupation vs Churn (Top 10)")
plt.show()

# -------------------------------
# 9. Correlation Heatmap (Numerical Features)
# -------------------------------
plt.figure(figsize=(12,8))
numeric_df = df.select_dtypes(include=[np.number])
corr = numeric_df.corr()
sns.heatmap(corr, cmap='coolwarm', center=0, annot=False)
plt.title("Correlation Heatmap")
plt.show()

# -------------------------------
# 10. Boxplot: Current Balance vs Churn
# -------------------------------
plt.figure(figsize=(6,5))
sns.boxplot(x='churn', y='current_balance', data=df, palette='Set2')
plt.title("Current Balance vs Churn")
plt.show()

# -------------------------------
# 11. Last Transaction Analysis
# -------------------------------
plt.figure(figsize=(7,4))
df['last_transaction'].hist(bins=50)
plt.title("Distribution of Last Transaction Dates")
plt.xlabel("Last Transaction Date")
plt.ylabel("Frequency")
plt.show()


print("\nMost Recent Transaction:", df['last_transaction'].max())
print("Oldest Transaction:", df['last_transaction'].min())
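
The count plots above show raw counts, which are hard to compare when the classes are imbalanced or the groups differ in size. As a possible follow-up, the sketch below computes the churn rate per group and the number of days since each customer's last transaction. `churn_rate_by` and `days_since_last_transaction` are hypothetical helper names, the small frame is made-up data standing in for `df`, and using the latest transaction date in the data as the reference date is an assumption.

```python
import pandas as pd


def churn_rate_by(df, column):
    """Churn rate (%) and customer count for each value of `column`."""
    summary = df.groupby(column)["churn"].agg(churn_rate="mean", customers="size")
    summary["churn_rate"] = (summary["churn_rate"] * 100).round(2)
    return summary.sort_values("churn_rate", ascending=False)


def days_since_last_transaction(df, reference=None):
    """Days between each last transaction and a reference date.

    Assumption: when no reference is given, the latest transaction
    date in the data is used. Missing dates stay missing (NaN).
    """
    dates = pd.to_datetime(df["last_transaction"], errors="coerce")
    if reference is None:
        reference = dates.max()
    return (reference - dates).dt.days


# Made-up data standing in for the real dataset.
demo = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "churn": [1, 0, 0, 0],
    "last_transaction": ["2019-12-01", "2019-12-31", None, "2019-12-21"],
})
rates = churn_rate_by(demo, "gender")
recency = days_since_last_transaction(demo)
print(rates)
print(recency.tolist())
```

With the real data, `churn_rate_by(df, 'occupation')` would give a rate per occupation, and the recency values could be compared across churn groups with a boxplot like the one used for `current_balance`.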

