<a href="https://colab.research.google.com/github/Rushil-K/Deep-Learning/blob/main/ANN/nmrk2627_ANN_DLM_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deep Learning Project 1 : Artificial Neural Networks**

# **Executive Summary: ANN-Driven Predictive Analytics for Marketing Conversion**

## **Project Overview**

Predictive analytics helps marketing teams decide where to focus customer acquisition effort. This project trains a **deep learning-based Artificial Neural Network (ANN)** on a **1-million-record marketing dataset** to predict customer conversion probabilities. The interactive **Streamlit dashboard** ([View Here](https://deep-learning-y9mzjiqycyib63ewyfwkgg.streamlit.app/)) lets marketing teams adjust model hyperparameters, visualize model performance in real time, and extract actionable insights.

The pipeline combines **SMOTE (Synthetic Minority Over-sampling Technique)** for class balancing, **SHAP-based feature importance analysis**, and **hyperparameter optimization** to improve predictive accuracy while addressing two key challenges: **class imbalance and overfitting (poor generalization)**.

---

## **Key Business Impact & Insights**

### **1️⃣ Conversion Rate Optimization (CRO)**

* The model **predicts customer conversion probabilities** from behavioral and demographic features: **Age, Gender, Income, Purchases, Clicks, and Ad Spend** (the `Spent` column).  
* **SHAP feature importance analysis** helps in understanding the influence of each factor on conversion, enabling **data-driven marketing budget allocation**.

### **2️⃣ Addressing Class Imbalance for Improved Predictions**

* Initial model training revealed a **severe class imbalance**, leading to a high number of **False Negatives (missed conversions)**.  
* **Solution:** We applied **SMOTE** to synthetically oversample the minority class, ensuring the model effectively learns from both converted and non-converted users.  
* This significantly improved **recall for the converted class**, making the model more effective for conversion predictions.
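
The oversampling idea can be illustrated with a minimal sketch. It is a simplified, pure-Python version of the interpolation step behind SMOTE, not the imbalanced-learn implementation and not necessarily the exact call used in this project. The function name `smote_like_oversample` and the sample values are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea: pick a minority sample, pick one
    of its k nearest minority neighbours, and place a new point on the segment
    between them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical scaled (Clicks, Spent) values for four converted customers
converted = [(0.2, 0.4), (0.3, 0.5), (0.25, 0.45), (0.8, 0.9)]
new_points = smote_like_oversample(converted, n_new=6)
print(len(new_points))
```

Every synthetic point lies between two existing minority samples, so the minority class gains new examples without exact duplication.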

### **3️⃣ AI-Driven Campaign Targeting & Personalization**

* High **precision and recall metrics** enable the identification of **high-intent customers**, improving audience segmentation.  
* Marketers can **personalize campaigns and optimize ad spend allocation**, leading to better **conversion rates and return on ad spend (ROAS)**.

---

## **Hyperparameters Used & Challenges Overcome**

### **📌 1\. Model Architecture & Activation Functions**

* **Hyperparameters:**

  * **Dense Layers:** 2 to 5 layers  
  * **Neurons per Layer:** 32, 64, 128, 256, 512, 1024 (2ⁿ for n = 5 to 10)  
  * **Activation Functions:** ReLU, Sigmoid, Tanh, Softmax  
* **Problems Faced:**

  * The initial model suffered from **vanishing gradients** caused by saturating activation functions in the hidden layers.  
  * In particular, using **Sigmoid** in deeper layers led to slow convergence and poor generalization.  
* **Solution:**

  * **ReLU** was chosen for hidden layers to improve learning efficiency, and **Sigmoid** was retained in the output layer for binary classification.  
  * This improved model convergence and reduced training time.
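
A short numeric sketch shows why Sigmoid in hidden layers tends to shrink gradients. It multiplies only the activation derivatives along one path through the network and ignores the weight terms, so it illustrates the effect rather than modelling this project's network. The pre-activation values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25, reached at z == 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_grad(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


pre_acts = [0.5] * 5  # hypothetical pre-activations, one per hidden layer
print("sigmoid:", chained_grad(sigmoid_grad, pre_acts))
print("relu:   ", chained_grad(relu_grad, pre_acts))
```

Because each Sigmoid derivative is at most 0.25, the product over five layers is at most 0.25⁵, whereas ReLU passes the gradient through unchanged for positive inputs.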

### **📌 2\. Optimizer & Learning Rate Selection**

* **Hyperparameters:**

  * **Optimizers:** Adam (default), SGD, RMSprop  
  * **Learning Rate:** 0.01, 0.001, 0.0001 (default: 0.001)  
* **Problems Faced:**

  * **SGD led to slow convergence** and inconsistent updates, affecting model accuracy.  
  * **Adam performed best** but required careful learning rate tuning to keep training stable.  
* **Solution:**

  * **Adam optimizer with a learning rate of 0.001** provided the best balance between convergence speed and stability.
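
For readers unfamiliar with Adam, the following is a minimal sketch of its update rule on a one-dimensional toy problem. The function `adam_minimize` and the quadratic objective are hypothetical illustrations; the project itself relies on the Keras optimizer, and the defaults for `beta1`, `beta2`, and `eps` below are the commonly used values, assumed here rather than taken from the notebook.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


print(adam_minimize(toy_grad, x0=0.0, lr=0.1, steps=1000))
```

The first update has magnitude close to `lr` regardless of the gradient scale, which is one reason the learning rate is the main setting to tune with Adam.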

### **📌 3\. Backpropagation Processing & Batch Gradient Strategy**

* **Hyperparameters:**

  * **Batch Gradient Strategies:**  
    * **Batch Gradient Descent (100%)**  
    * **Mini-Batch Gradient Descent (25%, 30%)**  
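    * **Stochastic Gradient Descent (20%)** (this project's label for the smallest batch fraction; strictly, SGD updates on a single sample per step)  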
* **Problems Faced:**

  * **Full batch gradient descent** was computationally expensive and slowed down training.  
  * **Stochastic gradient descent (SGD) introduced high variance**, causing unstable learning.  
* **Solution:**

  * **Mini-batch gradient descent (25% batch size)** was used to achieve an optimal trade-off between convergence stability and computational efficiency.
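
The percentage options above translate into a batch size and a number of weight updates per epoch. The helper below is a hypothetical sketch of that arithmetic; `batch_plan` is not a function from the notebook, and the training-set size is an assumed value because the train/test split is not stated in this summary.

```python
import math


def batch_plan(n_train, fraction):
    """Convert a batch fraction into (batch_size, updates_per_epoch)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    batch_size = max(1, round(n_train * fraction))
    steps_per_epoch = math.ceil(n_train / batch_size)
    return batch_size, steps_per_epoch


n_train = 800_000  # hypothetical training-set size
for fraction in (1.0, 0.30, 0.25, 0.20):
    batch_size, steps = batch_plan(n_train, fraction)
    print(f"{fraction:.0%}: batch_size={batch_size}, updates per epoch={steps}")
```

With fractions this large, each epoch performs only a handful of weight updates, which is worth keeping in mind when comparing convergence across the three strategies.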

### **📌 4\. Dropout Regularization for Overfitting Prevention**

* **Hyperparameter:** Dropout Rate (0.1 to 0.5, default: 0.3)

* **Problems Faced:**

  * Without dropout, the model **memorized** training data, leading to **poor generalization on test data**.  
* **Solution:**

  * Implementing **dropout at 30%** prevented overfitting while preserving useful feature interactions.
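
What a 30% dropout rate does to a layer's activations can be sketched in a few lines. This is a pure-Python illustration of inverted dropout (the variant in which kept activations are rescaled during training), written as an assumption-labelled example rather than the Keras `Dropout` layer itself; the function name `dropout` and its `rng` parameter are hypothetical.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate` and
    rescale the survivors by 1 / (1 - rate) so the expected value is unchanged.
    At inference time the activations pass through untouched."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


layer_output = [0.5, 1.2, 0.0, 2.0, 0.7]  # hypothetical hidden-layer activations
print(dropout(layer_output, rate=0.3, rng=random.Random(42)))
print(dropout(layer_output, rate=0.3, training=False))
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit to memorize training examples.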

---

## **Performance Evaluation & Insights**

### **📊 1\. Accuracy & Loss Trends**

* The dashboard provides **real-time graphs** displaying accuracy and loss trends over multiple epochs.  
* The model converged stably after **~20 epochs**, which suggests the tuned hyperparameters were effective.

### **📊 2\. Confusion Matrix & Classification Report**

* **Pre-SMOTE Model Issue:**

  * Predicted almost all cases as **Not Converted** due to class imbalance.  
  * **True Positives (Converted) = 0**, so recall for the Converted class was zero.  
* **Post-SMOTE Model Improvement:**

  * Recall for the **Converted** class significantly improved.  
  * Balanced confusion matrix with **higher precision and recall scores**.
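
The pre-SMOTE failure mode is easier to see with numbers. The sketch below derives the standard metrics from confusion-matrix counts; the counts are hypothetical and chosen only to illustrate the pattern, they are not this project's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive (Converted) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: a model that predicts "Not Converted" for every customer
always_negative = binary_metrics(tp=0, fp=0, fn=100, tn=900)
print(always_negative)
```

In this hypothetical case accuracy looks high while recall is zero, which is why accuracy alone is a misleading metric on imbalanced data.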

### **📊 3\. Feature Importance Analysis (SHAP Values)**

* The **SHAP-based analysis** provides insights into which features impact conversion probability the most.  
* **Top Features Influencing Conversion:**  
  * **Spent & Clicks:** High engagement increases conversion likelihood.  
  * **Purchases:** Previous purchase behavior is a strong predictor of future conversions.  
  * **Income & Age:** Certain income brackets and age groups show higher conversion probabilities.
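
SHAP attributions are based on Shapley values. The sketch below computes exact Shapley values by enumerating feature subsets for a tiny scoring function, which is feasible only for a handful of features; the `shap` library uses approximations for real models. The function `toy_score`, its coefficients, and the baseline are hypothetical stand-ins, not the trained ANN.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(features):
    """Hypothetical linear score over (Clicks, Spent, Purchases)."""
    clicks, spent, purchases = features
    return 0.02 * clicks + 0.001 * spent + 0.05 * purchases


customer = [67, 2434.0, 26]   # Clicks, Spent, Purchases of one customer
baseline = [50, 2000.0, 10]   # hypothetical reference customer
print(shapley_values(toy_score, customer, baseline))
```

The attributions sum to the difference between the customer's score and the baseline score, which is the property that makes SHAP plots additive and comparable across features.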

---

## **Strategic Marketing Takeaways**

✅ **Optimized Customer Targeting:** The model identifies **high-converting demographics**, enabling more effective audience targeting.  
✅ **Budget Allocation Efficiency:** **Feature importance analysis** highlights which marketing levers **maximize ROI**.  
✅ **Personalized Marketing Strategies:** AI-driven insights allow brands to **tailor messaging based on conversion likelihood**.  
✅ **Scalability & Automation:** The dashboard enables **real-time model retraining**, ensuring marketing strategies evolve with **changing customer behaviors**.

---

## **Conclusion**

This project showcases the **power of AI-driven predictive modeling in marketing analytics**. The interactive **Streamlit dashboard** offers marketing professionals a **user-friendly interface to fine-tune ANN hyperparameters, analyze model performance, and extract data-driven insights for conversion optimization**.

🔗 **Live Dashboard:** [Click Here](https://deep-learning-y9mzjiqycyib63ewyfwkgg.streamlit.app/)  
👥 **Contributors:** **Rushil Kohli & Navneet Mittal**

### Analysis

In [None]:
# Import necessary libraries
import os
import requests
import io
from io import StringIO
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

In [None]:
# Google Drive file ID of the dataset (replace with your actual file ID)
file_id = '1OPmMFUQmeZuaiYb0FQhwOMZfEbVrWKEK'

# Construct the URL for direct download (using export)
url = f'https://drive.google.com/uc?export=download&id={file_id}'

# Fetch the data; the timeout keeps the cell from hanging on a stalled connection
response = requests.get(url, timeout=60)
response.raise_for_status()  # Raise an exception for HTTP error responses

# Parse the raw bytes so that the encoding argument is actually applied
# (it has no effect on text that has already been decoded into a StringIO)
nmrk2627_df = pd.read_csv(io.BytesIO(response.content), encoding='utf-8')

# Display the first rows to verify that the data loaded correctly
display(nmrk2627_df.head())

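| | CustomerID | Age | Gender | Income | Purchases | Clicks | Spent | Converted |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 41 | Female | 52618.0 | 26 | 67 | 2434.0 | 0 |
| 1 | 2 | 43 | Male | 53114.0 | 3 | 14 | 2937.0 | 0 |
| 2 | 3 | 43 | Female | 96145.0 | 4 | 78 | 2076.0 | 0 |
| 3 | 4 | 35 | Female | 92590.0 | 10 | 13 | 1437.0 | 1 |
| 4 | 5 | 23 | Female | 69262.0 | 14 | 62 | 1675.0 | 1 |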


In [None]:
nmrk2627_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 8 columns):
 #   Column      Non-Null Count    Dtype  
---  ------      --------------    -----  
 0   CustomerID  1000000 non-null  int64  
 1   Age         1000000 non-null  int64  
 2   Gender      1000000 non-null  object 
 3   Income      1000000 non-null  float64
 4   Purchases   1000000 non-null  int64  
 5   Clicks      1000000 non-null  int64  
 6   Spent       1000000 non-null  float64
 7   Converted   1000000 non-null  int64  
dtypes: float64(2), int64(5), object(1)
memory usage: 61.0+ MB
