<a href="https://colab.research.google.com/github/Rushil-K/Deep-Learning/blob/main/ANN/nmrk2627_ANN_DLM_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Learning Project 1: Artificial Neural Networks
Contributors:
- Navneet Mittal
- Rushil Kohli

# Dataset Description

## Source
The dataset was synthetically generated using Python.

## Overview  
The dataset used in this project is a **marketing dataset** containing information about customers, their demographics, online behavior, and conversion status. The dataset was preprocessed to create a cleaned version for training an **Artificial Neural Network (ANN) model** for binary classification.

## Dataset Summary  
- **Total Entries:** 1,000,000 (as reported by `nmrk2627_df.info()` in the Analysis section)  
- **Total Columns:** 8  
- **Missing Values:** None  

## Features and Data Types  

| Column Name  | Data Type | Description |
|-------------|-----------|-------------|
| `CustomerID` | int64 | Unique identifier for each customer |
| `Age`        | int64 | Age of the customer |
| `Gender`     | object | Gender of the customer (`Male` or `Female`) |
| `Income`     | float64 | Annual income of the customer (in monetary units) |
| `Purchases`  | int64 | Number of purchases made by the customer |
| `Clicks`     | int64 | Number of times the customer clicked on an advertisement |
| `Spent`      | float64 | Total amount spent by the customer (in monetary units) |
| `Converted`  | int64 | Target variable (`1` if the customer converted, `0` otherwise) |

## Key Insights  

1. **No Missing Values:**  
   - The dataset does not contain any missing values, ensuring data completeness.  

2. **Categorical Features:**  
   - The `Gender` column is the only categorical variable, with two unique values: `Male` and `Female`.  

3. **Target Variable (`Converted`):**  
   - This is a **binary classification problem**, where the goal is to predict whether a customer converts (`1`) or not (`0`).  
   - `Converted` takes exactly **two unique values**; whether the two classes are balanced or imbalanced should be checked during analysis.  

4. **Numerical Feature Distribution:**  
   - `Age` ranges from young adults to senior customers.  
   - `Income` and `Spent` have a wide range, which may require normalization or scaling.  
   - `Clicks` and `Purchases` vary significantly across users, potentially influencing conversion rates.  
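
The class-balance check mentioned in point 3 could look like the minimal sketch below. It runs on a tiny hand-made frame rather than the real data, and `class_balance` is a hypothetical helper name, not something defined in the notebook.

```python
import pandas as pd

def class_balance(df: pd.DataFrame, target: str = "Converted") -> pd.Series:
    """Return the share of each class in the target column, ordered by class label."""
    return df[target].value_counts(normalize=True).sort_index()

# Illustrative stand-in data, not rows taken from the actual dataset.
toy_df = pd.DataFrame({"Converted": [0, 0, 0, 1, 1]})
shares = class_balance(toy_df)
print(shares)
```

On the real data the same call would be `class_balance(nmrk2627_df)`. A share far from 0.5 suggests using stratified splits or class weights.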

## Preprocessing Steps Applied  
To prepare the dataset for model training, the following steps were performed:  

- **Handling Categorical Variables:** Converted `Gender` into numerical form using encoding.  
- **Feature Scaling:** Standardized the numerical features so that wide-ranging columns such as `Income` and `Spent` share a comparable scale.  
- **Outlier Detection:** Identified and handled outliers in `Income`, `Spent`, and `Purchases`.  
- **Feature Selection:** Removed unnecessary columns such as `CustomerID`, which does not contribute to model prediction.  
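
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual pipeline: the `Gender` mapping, the 1.5 × IQR clipping rule, and the helper name `preprocess` are all assumed here. The five input rows mirror the preview printed in the Analysis section.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> tuple[np.ndarray, np.ndarray]:
    """Drop the ID, encode Gender, clip outliers, and scale; return (X, y)."""
    df = df.drop(columns=["CustomerID"])  # returns a new frame; the input stays untouched
    # Assumed encoding: Female -> 0, Male -> 1
    df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})
    # Assumed outlier rule: clip values lying more than 1.5 * IQR beyond the quartiles
    for col in ["Income", "Spent", "Purchases"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    y = df.pop("Converted").to_numpy()
    X = StandardScaler().fit_transform(df)  # zero mean, unit variance per column
    return X, y

preview = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Age": [41, 43, 43, 35, 23],
    "Gender": ["Female", "Male", "Female", "Female", "Female"],
    "Income": [52618.0, 53114.0, 96145.0, 92590.0, 69262.0],
    "Purchases": [26, 3, 4, 10, 14],
    "Clicks": [67, 14, 78, 13, 62],
    "Spent": [2434.0, 2937.0, 2076.0, 1437.0, 1675.0],
    "Converted": [0, 0, 0, 1, 1],
})
X, y = preprocess(preview)
print(X.shape, y)
```

For real training, the scaler would normally be fitted on the training split only and then applied to the test split, so that test statistics do not leak into the model.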

## Conclusion  
The preprocessed dataset was used to train an **Artificial Neural Network (ANN) model**, with **hyperparameter tuning and visualizations integrated into a Streamlit dashboard** for better interpretability. The dataset provides valuable insights into customer behavior and conversion likelihood.



# Applications of the ANN Dashboard in Marketing

## 1. Conversion Rate Optimization (CRO)
### Purpose:
- Identify the likelihood of a lead converting into a paying customer.
- Optimize marketing strategies to improve the conversion funnel.
- Minimize bounce rates by understanding drop-off points.

### How the Dashboard Helps:
- **ANN Model Prediction**: The model predicts conversion probability based on customer data.
- **Confusion Matrix**: Shows how well the ANN classifies converted vs. non-converted leads.
- **ROC Curve & AUC Score**: Measures how well the model distinguishes between converted and non-converted users.
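
As a rough illustration of these two metrics, the sketch below computes a confusion matrix and an AUC score with scikit-learn. The labels and probabilities are made up for the example and are not model output; the 0.5 decision threshold is also an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Illustrative labels and predicted conversion probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

y_pred = (y_prob >= 0.5).astype(int)   # assumed 0.5 decision threshold
cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class
auc = roc_auc_score(y_true, y_prob)    # uses the probabilities, not the thresholded labels

print(cm)
print(round(auc, 3))
```

The confusion matrix depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds, which is why the two are usually read together.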

---

## 2. Customer Segmentation & Targeting
### Purpose:
- Categorize customers based on behavior, demographics, and past purchases.
- Improve targeted ad campaigns by focusing on high-value segments.
- Reduce Customer Acquisition Cost (CAC) by focusing on high-converting segments.

### How the Dashboard Helps:
- **Feature Importance (RandomForest Surrogate Model)**: Identifies which customer attributes contribute most to conversion.
- **Pie Chart of Target Distribution**: Shows the proportion of converted vs. non-converted users.
- **Pairplot (Data Distribution Analysis)**: Reveals clustering patterns among different customer attributes.
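
The surrogate-model idea can be sketched as follows: a random forest is fitted to the predictions of the black-box model rather than to the true labels, and its feature importances are read as an approximation of what the black box relies on. Everything here is illustrative: the data is random, and `ann_pred` is a stand-in rule that replaces the real ANN's output.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["Age", "Income", "Clicks"]
X = rng.normal(size=(500, 3))  # synthetic, already-scaled features

# Stand-in for the ANN's predicted labels: this "black box" depends only on Clicks.
ann_pred = (X[:, 2] > 0).astype(int)

surrogate = RandomForestClassifier(n_estimators=50, random_state=0)
surrogate.fit(X, ann_pred)  # fit to the model's outputs, not to the ground truth

importances = dict(zip(feature_names, surrogate.feature_importances_))
top_feature = max(importances, key=importances.get)
print(importances, top_feature)
```

The importances describe the surrogate, so they are only as trustworthy as the surrogate's agreement with the ANN. Checking that agreement on held-out data is a sensible first step.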

---

## 3. Predictive Analytics for Lead Scoring
### Purpose:
- Assign a conversion likelihood score to each lead.
- Prioritize high-intent leads for direct engagement.
- Automate lead nurturing with personalized emails or ad retargeting.

### How the Dashboard Helps:
- **ANN Model Predictions**: Generates a probability score for conversion.
- **Live Accuracy & Loss Graphs**: Show whether training has converged and whether the model is overfitting, which should be checked before its scores are used for lead scoring.
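
A minimal sketch of turning probability scores into a prioritized lead list is shown below. The customer IDs, scores, and tier boundaries (0.4 and 0.7) are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical leads with model-assigned conversion probabilities.
leads = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "score": [0.82, 0.15, 0.55, 0.91],
})

# Assumed tier boundaries; in practice these would be tuned to sales capacity.
leads["tier"] = pd.cut(
    leads["score"],
    bins=[0.0, 0.4, 0.7, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

# Highest-scoring leads first.
ranked = leads.sort_values("score", ascending=False).reset_index(drop=True)
print(ranked)
```

High-tier leads could then be routed to direct engagement, and the remaining tiers to automated nurturing.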

---

## 4. Advertising Budget Allocation & Optimization
### Purpose:
- Identify which marketing campaigns provide the best ROI.
- Allocate ad spend efficiently across different platforms.
- Reduce wasted spending on low-impact marketing channels.

### How the Dashboard Helps:
- **Feature Importance Analysis**: Highlights which marketing variables impact conversion. In this dataset those are `Clicks`, `Purchases`, and `Spent`; variables such as campaign type would have to be added to the data first.
- **Confusion Matrix**: Helps marketers assess model reliability before making spending decisions.

---

## 5. Customer Lifetime Value (CLV) Prediction
### Purpose:
- Forecast the potential revenue a customer will generate over time.
- Improve retention strategies by focusing on high-value customers.
- Enhance personalized offers and loyalty programs.

### How the Dashboard Helps:
- **ANN Prediction on Key Features**: Analyzes spending behavior to predict customer longevity.
- **Pairplot Visualization**: Shows relationships between spending patterns and conversion likelihood.

---

## 6. Behavioral Retargeting & Recommendation Systems
### Purpose:
- Retarget potential customers based on their past interactions.
- Recommend personalized products or services to improve engagement.
- Improve conversion rates for returning visitors.

### How the Dashboard Helps:
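- **Feature Importance Analysis**: Shows which past behaviors contribute to conversion. In this dataset those are `Clicks` and `Purchases`; behaviors such as time spent on a website or click-through rates would have to be added to the data first.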
- **Pie Chart of Conversions**: Shows the share of non-converted users, who are the candidates for retargeting.

---

## 7. A/B Testing & Campaign Performance Analysis
### Purpose:
- Compare different versions of ad creatives, landing pages, or pricing strategies.
- Determine which approach leads to the highest engagement and conversion.
- Optimize email marketing and content strategies.

### How the Dashboard Helps:
- **Model Accuracy & Loss Graphs**: Indicate whether the model has trained reliably enough for its predictions to be used when comparing campaign variants.
- **Confusion Matrix**: Helps assess whether the model correctly identifies potential conversions, and how often it misses them.

---

## 8. Fraud Detection & Anomaly Detection
### Purpose:
- Identify fake clicks or bot traffic that skews marketing analytics.
- Prevent fraudulent transactions or suspicious behaviors.
- Improve ad quality by reducing invalid engagements.

### How the Dashboard Helps:
- **Pairplot & Data Distribution Analysis**: Detects unusual clusters that might indicate fraudulent activity.
- **Feature Importance Analysis**: Identifies key fraudulent behaviors by analyzing unexpected data patterns.


---

## Conclusion
The **ANN Dashboard** is an essential tool for data-driven marketing, offering powerful insights into **lead conversion, customer segmentation, budget allocation, and campaign performance**. By leveraging **predictive analytics, real-time visualization, and deep learning**, marketers can refine strategies, reduce costs, and improve **overall marketing efficiency**.


### Analysis

In [None]:
# Import necessary libraries
import os
import requests
import io
from io import StringIO
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

In [None]:
# Replace with your actual file ID
file_id = '1OPmMFUQmeZuaiYb0FQhwOMZfEbVrWKEK'

# Construct the URL for direct download (using export)
url = f'https://drive.google.com/uc?export=download&id={file_id}'

# Fetch the data using requests

response = requests.get(url, timeout=60)  # Fail instead of hanging if the server does not respond
response.raise_for_status()  # Raise an exception for HTTP error responses (4xx/5xx)

# Read the raw bytes into a pandas DataFrame using BytesIO so that `encoding` takes effect
# (with StringIO the text is already decoded, so `encoding` would have no effect)
nmrk2627_df = pd.read_csv(io.BytesIO(response.content), encoding='utf-8')

# Display the head of the dataframe to verify data loading.
display(nmrk2627_df.head())

Unnamed: 0,CustomerID,Age,Gender,Income,Purchases,Clicks,Spent,Converted
0,1,41,Female,52618.0,26,67,2434.0,0
1,2,43,Male,53114.0,3,14,2937.0,0
2,3,43,Female,96145.0,4,78,2076.0,0
3,4,35,Female,92590.0,10,13,1437.0,1
4,5,23,Female,69262.0,14,62,1675.0,1


In [None]:
nmrk2627_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 8 columns):
 #   Column      Non-Null Count    Dtype  
---  ------      --------------    -----  
 0   CustomerID  1000000 non-null  int64  
 1   Age         1000000 non-null  int64  
 2   Gender      1000000 non-null  object 
 3   Income      1000000 non-null  float64
 4   Purchases   1000000 non-null  int64  
 5   Clicks      1000000 non-null  int64  
 6   Spent       1000000 non-null  float64
 7   Converted   1000000 non-null  int64  
dtypes: float64(2), int64(5), object(1)
memory usage: 61.0+ MB
