# 🔎📊🧩 **Exploratory Data Analysis (EDA)**

<img src="../assets/banner_eda.jpg" style="width:95%">

_Estimated run time: **2 min**_

This notebook provides a structured exploration of the dataset to uncover patterns, trends, and insights. The analysis guides feature engineering, preprocessing, and modeling decisions, supporting a robust and interpretable machine learning pipeline.

To ensure **clarity**, **consistency**, and **ease of navigation**, the following intuitive indicators have been adopted to categorize and guide the reader through the various stages of the analysis:

- **⬜ General Steps**: Core tasks such as data loading, inspection, and preprocessing to set up for analysis.

- **📊 Analytical Procedures**: Statistical tests, visualizations, and exploratory techniques to uncover patterns and relationships.

- **💡 Key Insights**: Actionable findings and hypotheses derived to guide feature engineering and modeling decisions.

- **🧼 Data Cleaning**: Addressing missing values, outliers, and inconsistencies to ensure data integrity and reliability.

- **🛠️ Feature Engineering**: Creating, transforming, and selecting features to enhance model performance and interpretability while reflecting domain knowledge.

- **❌ Ineffective Attempts**: Feature engineering efforts that did not yield improvements but are worth noting for transparency or future reference.

- **⚠️ Alerts**: Important notes or warnings highlighting challenges, limitations, or implications for the analysis and pipeline design.

---
---
---
# 🎯 **1. Introduction** 

---
## └─ **1.1. Project Goals**

This project aims to **predict client subscriptions to term deposits at AI-Vive-Banking using client attributes and marketing campaign data. Accurate predictions will help optimize marketing strategies, allocate resources effectively, and improve customer engagement.** `TO UPDATE`

The EDA focuses on three key areas:

- Data Overview: Assess dataset structure, quality, missing values, and outliers to ensure reliability for modeling.

- Feature Analysis: Examine feature-target relationships, distributions, and correlations to identify predictive signals.

- Pattern Recognition: Discover trends, behavioral patterns, and client segments to guide targeted marketing and enhance predictive accuracy.

Insights from this EDA will inform feature engineering, preprocessing, and model selection, enabling robust, interpretable machine learning aligned with business goals.

---
## └─ **1.2. Preliminary Understanding**

---
---
---
# **⚙️ 2. Project Setup**

---
### └─ **2.1. Import general libraries**

In [None]:
import os
import joblib
import yaml
import requests
from pathlib import Path

---
### └─ **2.2. Import and configure data libraries**

In [None]:
import numpy as np
import pandas as pd

np.random.seed(42)  # Seed value for numpy.random for reproducibility

pd.set_option('display.max_colwidth', 40)  # Truncate cell contents to 40 characters for better readability (full option name avoids ambiguous partial matches)
pd.set_option('display.max_columns', None)  # Display all columns in the output (instead of truncating them with "..." for wide DataFrames)
pd.set_option('display.float_format', lambda x: '%.3f' % x)  # Limit float precision for cleaner output

---
### └─ **2.3. Import and configure visualization libraries**

In [None]:
from matplotlib import rcParams
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

# Use the ggplot style for a clean, professional look
plt.style.use('ggplot')

# Remove the top and right spines (borders) for a minimalist design
rcParams['axes.spines.right'] = False
rcParams['axes.spines.top'] = False

# Set the default figure size (width, height) in inches
rcParams['figure.figsize'] = [12, 9]

%config InlineBackend.figure_format = 'retina'  # Enhance plot resolution for high-quality rendering on Retina displays

---
### └─ **2.4. Configure notebook interactivity**

In [None]:
import logging
import warnings
from IPython.core.interactiveshell import InteractiveShell

# Configure logging to display messages with level INFO and above, with a simple format
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s: %(message)s"
)

# Suppress all warnings (e.g. deprecation notices, convergence warnings) to keep cell output readable
warnings.filterwarnings('ignore')

# Configure IPython to display all expressions in a cell, not just the last one
# InteractiveShell.ast_node_interactivity = 'all'  # Postpone until after data download to avoid excessive outputs

# Disable Jedi autocompletion in IPython for faster tab completion
%config Completer.use_jedi = False  
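
The blanket `warnings.filterwarnings('ignore')` in the cell above hides every warning, including ones that may point to real data problems. As a hedged alternative, the sketch below silences a single category and lets the rest through. The function names `emit_demo_warnings` and `surviving_warnings` are hypothetical and exist only for this illustration.

```python
import warnings


def emit_demo_warnings() -> None:
    """Hypothetical stand-in for library code that raises two kinds of warnings."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    warnings.warn("results may be unreliable", UserWarning)


def surviving_warnings() -> list:
    """Return the categories that still surface when only DeprecationWarning is ignored."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        warnings.filterwarnings("ignore", category=DeprecationWarning)  # silence one category
        emit_demo_warnings()
    return [w.category for w in caught]
```

In a notebook, the same `filterwarnings("ignore", category=...)` call could replace the blanket filter if only a known, noisy category needs to be hidden.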

---
### └─ **2.5. Configure relative imports**

In [None]:
import sys
from pathlib import Path

# Add project root to sys.path
project_root = Path().resolve().parent  # Assumes the notebook runs from a folder directly under the project root
sys.path.append(str(project_root))
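
Re-running the cell above appends the same path to `sys.path` each time. The following is a minimal sketch of an idempotent variant. The helper name `add_to_sys_path` and its `search_path` parameter are assumptions made for illustration and are not part of the project code.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory, search_path=None) -> bool:
    """Append `directory` to the import search path once; return True if it was added."""
    if search_path is None:
        search_path = sys.path  # default to the interpreter's real search path
    entry = str(Path(directory).resolve())
    if entry in search_path:
        return False  # already present, nothing to do
    search_path.append(entry)
    return True
```

Calling `add_to_sys_path(Path().resolve().parent)` would then have the same effect as the cell above, without accumulating duplicates across re-runs.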

---
### └─ **2.6. Get configurations from `config.yaml`**

In [None]:
config_path = "../config.yaml"
with open(config_path, "r", encoding="utf-8") as file:
    config = yaml.safe_load(file)

DATA_FILE_PATH = config["data_file_path"]
IDENTIFIER_COLUMN = config["identifier_column"]
TARGET_COLUMN = config["target_column"]
RANDOM_STATE = config["random_state"]
TEST_SIZE = config["test_size"]
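
The cell above assumes that every key exists in `config.yaml`; a missing key would surface only as a bare `KeyError`. A minimal sketch of an upfront check follows. The helper name `validate_config` and the rule that `test_size` must be a fraction strictly between 0 and 1 are assumptions for illustration, not requirements stated by the project.

```python
from typing import Any, Mapping

# Keys read by the configuration cell in this notebook
REQUIRED_KEYS = (
    "data_file_path",
    "identifier_column",
    "target_column",
    "random_state",
    "test_size",
)


def validate_config(config: Mapping[str, Any]) -> None:
    """Raise a descriptive error if the config is incomplete (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.yaml is missing required keys: {missing}")

    # Assumption: test_size is expressed as a fraction of the dataset
    test_size = config["test_size"]
    if isinstance(test_size, bool) or not isinstance(test_size, (int, float)):
        raise ValueError(f"test_size must be numeric, got {test_size!r}")
    if not 0 < test_size < 1:
        raise ValueError(f"test_size must be a fraction in (0, 1), got {test_size!r}")
```

Calling `validate_config(config)` right after `yaml.safe_load` would fail fast with a message that names the missing keys.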

---
### └─ **2.7. Load dataset into Pandas DataFrame**

In [None]:
RELATIVE_FILE_PATH = Path("../", DATA_FILE_PATH)

df = pd.read_csv(RELATIVE_FILE_PATH)
df.head()
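
Before moving on, it can help to confirm that the loaded frame contains the columns named in the configuration. The sketch below is one possible check. The helper name `check_loaded_frame` and the fields it reports are assumptions for illustration and do not come from the project code.

```python
import pandas as pd


def check_loaded_frame(df: pd.DataFrame, identifier_column: str, target_column: str) -> dict:
    """Return basic sanity facts about a freshly loaded dataset (hypothetical helper)."""
    for column in (identifier_column, target_column):
        if column not in df.columns:
            raise KeyError(f"Expected column {column!r} not found in the dataset")
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        # Rows whose identifier repeats an earlier row
        "duplicate_ids": int(df[identifier_column].duplicated().sum()),
        # Rows with no target value
        "missing_target": int(df[target_column].isna().sum()),
    }
```

In this notebook it could be called as `check_loaded_frame(df, IDENTIFIER_COLUMN, TARGET_COLUMN)`.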

---
---
---
# **🔎 3. Basic Exploration**

- Insights from the basic exploration are captured here; the original analysis is in `basic_exploration.ipynb`.

- That notebook runs generic, reusable workflows, so it can be executed automatically without edits for fast analysis.

- This notebook focuses on project-specific, in-depth analysis, while only summarizing key findings from the basic exploration.

- [➡️ Go to the Basic Exploration Notebook](basic_exploration.ipynb)

[Insert Findings]

---
---
---
# **🧼 4. Data Cleaning**

In [None]:
from src.data_explorer import DataExplorer
from src.data_cleaner import DataCleaner

explorer = DataExplorer()
cleaner = DataCleaner()

In [None]:
explorer.perform_univariate_analysis(df=df, feature="shares")

---
---
## └─ **4.1. Apply snake_case**



In [None]:
df_cleaned = cleaner.convert_column_names_to_snake_case(df=df, show=True)
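
The implementation of `convert_column_names_to_snake_case` lives in `src/data_cleaner.py` and is not shown in this notebook. As a rough illustration of what such a conversion typically involves, here is a minimal sketch using only the standard library. The function `to_snake_case` is hypothetical and may differ from the project's actual rules.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name such as 'Client ID' or 'maritalStatus' to snake_case."""
    name = name.strip()
    # Mark camelCase boundaries: "maritalStatus" -> "marital_Status"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace any run of non-alphanumeric characters with a single underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()
```

Applied to a DataFrame, this could look like `df.columns = [to_snake_case(c) for c in df.columns]`.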