<a href="https://colab.research.google.com/github/abhinav4201/Classification-Email-Campaign-Effectiveness-Prediction/blob/main/Classification_Email_Campaign_Effectiveness_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **<font color='#FF3206'>Email Campaign Effectiveness Prediction</font>**

---

Email marketing combines marketing expertise with creative copy. In its most basic form, it is an email sent to a list of customers that typically includes a sales presentation and a "call to action." This could be as simple as encouraging the customer to click on a web link embedded in the email.

You can also use email when you don’t have anything specific to market, as a mechanism to maintain consumer engagement, strengthen brand perception, and add credibility to your business. In fact, even in the Web 2.0 world of blogs, social networks, and RSS feeds, email newsletters are still incredibly popular and offer a very effective way to get your brand in front of your list of prospects on a regular basis.

An effective email campaign is one where a business can engage its customers through continuous two-way communication, or where it can target specific groups with tailored email offerings. This effectiveness can depend on several factors, for example the content of the email, the time it is sent, how often it is sent, how many links or attachments it contains, its word count, and how relevant the target audience is.

Most small and medium business owners make effective use of Gmail-based email marketing strategies for offline targeting, converting prospective customers into leads so that they stay with the business.

When email effectiveness is left unaddressed, messages most often fall into the spam category, that is, unsolicited and unwanted junk email sent out in bulk to an indiscriminate recipient list.

<center><img src="https://drive.google.com/uc?id=1GdCoClrJknAn6jQnIMHp_P0654BBaSjj" width = "70%" height="260vh"/> </center>

<center><h2>“A small list that wants exactly what you're offering is better than a bigger list that isn't committed.”</h2></center>
<p align=right><b>~ Ramsay Leimenstoll</b>, Investment Advisor and Financial Planner · Bell Investment Advisors</p><br>

## **Problem Statement**

* **The primary goal is to build a machine learning model that classifies each email as ignored, read, or acknowledged by the reader, so that campaigns can avoid being filtered as spam.**

## **Importing Libraries**

In [6]:
#data visualization libraries(matplotlib,seaborn)
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("ticks")
sns.set_context("poster");

# Importing numpy, pandas and tensorflow
import pandas as pd
import numpy as np
import tensorflow as tf

#VIF
from statsmodels.stats.outliers_influence import variance_inflation_factor

#Modelling
#Train-Test Split
from sklearn.model_selection import train_test_split
#Grid Search for Hyperparameter Tuning
from sklearn.model_selection import GridSearchCV

#Metrics
from sklearn import metrics
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, roc_auc_score, f1_score, recall_score,roc_curve, classification_report

# The following lines adjust the granularity of reporting. 
pd.options.display.max_rows = 50
pd.options.display.float_format = "{:.3f}".format

# Importing warnings library. The warnings module handles warnings in Python.
import warnings
warnings.filterwarnings('ignore')

## **Loading Data**

---

Before proceeding, we need to load the data into the notebook, since data is the most important part of any machine learning analysis.

In [1]:
#here google drive is attached to colab so that files can be accessed easily
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
#creating variable for path of data
path = '/content/drive/MyDrive/almabetter/Supervised-ML-Classification/Classification-Email Campaign Effectiveness Prediction/Data/data_email_campaign.csv'

In [4]:
data = pd.read_csv(path)
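
The `pd.read_csv(path)` call above assumes the Drive mount succeeded and the file sits at exactly that path. As a minimal sketch (the helper name `load_email_data` is hypothetical and not part of the original notebook), the load can be wrapped so that a missing file fails early with a readable message:

```python
from pathlib import Path

import pandas as pd


def load_email_data(csv_path):
    """Read the campaign CSV, failing early with a clear message if it is missing.

    Hypothetical helper for illustration; the notebook itself calls pd.read_csv directly.
    """
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"No CSV found at {csv_path}; check that Drive is mounted and the path is correct."
        )
    return pd.read_csv(csv_path)
```

In this notebook it would be used as `data = load_email_data(path)`, which should behave the same as the direct call whenever the file exists.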

## **Studying Data (Preprocessing Data)**

In [5]:
#displaying data
data.head()

| | Email_ID | Email_Type | Subject_Hotness_Score | Email_Source_Type | Customer_Location | Email_Campaign_Type | Total_Past_Communications | Time_Email_sent_Category | Word_Count | Total_Links | Total_Images | Email_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EMA00081000034500 | 1 | 2.2 | 2 | E | 2 | 33.0 | 1 | 440 | 8.0 | 0.0 | 0 |
| 1 | EMA00081000045360 | 2 | 2.1 | 1 | NaN | 2 | 15.0 | 2 | 504 | 5.0 | 0.0 | 0 |
| 2 | EMA00081000066290 | 2 | 0.1 | 1 | B | 3 | 36.0 | 2 | 962 | 5.0 | 0.0 | 1 |
| 3 | EMA00081000076560 | 1 | 3.0 | 2 | E | 2 | 25.0 | 2 | 610 | 16.0 | 0.0 | 0 |
| 4 | EMA00081000109720 | 1 | 0.0 | 2 | C | 3 | 18.0 | 2 | 947 | 4.0 | 0.0 | 0 |


In [13]:
#info about data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68353 entries, 0 to 68352
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Email_ID                   68353 non-null  object 
 1   Email_Type                 68353 non-null  int64  
 2   Subject_Hotness_Score      68353 non-null  float64
 3   Email_Source_Type          68353 non-null  int64  
 4   Customer_Location          56758 non-null  object 
 5   Email_Campaign_Type        68353 non-null  int64  
 6   Total_Past_Communications  61528 non-null  float64
 7   Time_Email_sent_Category   68353 non-null  int64  
 8   Word_Count                 68353 non-null  int64  
 9   Total_Links                66152 non-null  float64
 10  Total_Images               66676 non-null  float64
 11  Email_Status               68353 non-null  int64  
dtypes: float64(4), int64(6), object(2)
memory usage: 6.3+ MB


In [10]:
#counting null values
data.isnull().sum()

Email_ID                         0
Email_Type                       0
Subject_Hotness_Score            0
Email_Source_Type                0
Customer_Location            11595
Email_Campaign_Type              0
Total_Past_Communications     6825
Time_Email_sent_Category         0
Word_Count                       0
Total_Links                   2201
Total_Images                  1677
Email_Status                     0
dtype: int64

In [14]:
#finding feature and value count of data
print(f'Shape of email data is : {data.shape}')

Shape of email data is : (68353, 12)


### **Describing Features**

---


* The dataset comprises **68353** rows and **12** different **features**. Four of the features (**Customer_Location**, **Total_Past_Communications**, **Total_Links**, **Total_Images**) have null values, which should be treated for better model prediction.

**Attribute Information ▶**

---
* **Email_ID** - Email ID information
* **Email_Type** - Type of email
* **Subject_Hotness_Score** - Score of the email's subject line
* **Email_Source_Type** - Source of the email
* **Customer_Location** - Demographic data of the customer: the location where the customer resides
* **Email_Campaign_Type** - The campaign type of the email
* **Total_Past_Communications** - Count of previous emails from the same source, i.e. the number of past communications
* **Time_Email_sent_Category** - Time of day when the email was sent: morning, evening, or night
* **Word_Count** - Total number of words in the email
* **Total_Links** - Total number of links in the email
* **Total_Images** - Total number of images in the email
* **Email_Status** - Our target variable, which records whether the email was ignored, read, or acknowledged by the reader
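
Since **Email_Status** is the target, it can help to look at how its classes are distributed before modelling. The sketch below is illustrative only: the helper `status_distribution` is hypothetical, the code-to-label mapping is an assumption (the data only contains integer codes), and the toy frame stands in for the real `data`.

```python
import pandas as pd

# ASSUMPTION: this code-to-label mapping is illustrative; the dataset itself
# only stores integer codes in Email_Status.
status_labels = {0: "ignored", 1: "read", 2: "acknowledged"}


def status_distribution(df, target="Email_Status"):
    """Return the count and percentage share of each target class."""
    counts = df[target].value_counts().sort_index()
    summary = pd.DataFrame({"count": counts, "percent": counts / len(df) * 100})
    summary["label"] = [status_labels.get(code, "unknown") for code in summary.index]
    return summary


# Toy frame for illustration only, not the real campaign data
toy = pd.DataFrame({"Email_Status": [0, 0, 0, 1, 2, 0, 1, 0]})
print(status_distribution(toy))
```

On the real data the call would be `status_distribution(data)`; a strongly skewed result would suggest handling class imbalance before training.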


In [15]:
data.columns

Index(['Email_ID', 'Email_Type', 'Subject_Hotness_Score', 'Email_Source_Type',
       'Customer_Location', 'Email_Campaign_Type', 'Total_Past_Communications',
       'Time_Email_sent_Category', 'Word_Count', 'Total_Links', 'Total_Images',
       'Email_Status'],
      dtype='object')

* Of the 12 features, only 5 are numerical (**'Subject_Hotness_Score'**, **'Total_Past_Communications'**, **'Word_Count'**, **'Total_Links'**, **'Total_Images'**); the rest are categorical.
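
The split described above can be written down as explicit column lists. This is a sketch under stated assumptions: the list names and the helper `mark_categoricals` are hypothetical, and treating `Email_ID` as an identifier and `Email_Status` as the target (rather than as categorical inputs) is a choice made here, not something the notebook states. The two sample rows are copied from the `data.head()` output.

```python
import pandas as pd

# Numerical columns as listed in the text above
numerical_cols = [
    "Subject_Hotness_Score", "Total_Past_Communications",
    "Word_Count", "Total_Links", "Total_Images",
]
# ASSUMPTION: Email_ID (identifier) and Email_Status (target) are left out here
categorical_cols = [
    "Email_Type", "Email_Source_Type", "Customer_Location",
    "Email_Campaign_Type", "Time_Email_sent_Category",
]


def mark_categoricals(df, columns=categorical_cols):
    """Return a copy of df with the given columns cast to pandas' category dtype."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


# Two rows taken from the data.head() output, for illustration
sample = pd.DataFrame({
    "Email_Type": [1, 2], "Email_Source_Type": [2, 1],
    "Customer_Location": ["E", "B"], "Email_Campaign_Type": [2, 3],
    "Time_Email_sent_Category": [1, 2], "Word_Count": [440, 962],
})
print(mark_categoricals(sample).dtypes)
```

Casting integer-coded columns such as `Email_Type` to `category` makes it harder to treat them as ordinary numbers by accident in later plots or encodings.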

In [16]:
# Return the names of the columns that contain at least one missing value
def showMissing():
    missing = data.columns[data.isnull().any()].tolist()
    return missing

# Count missing values per affected column (largest first),
# then express each count as a percentage of all rows
missingCount = data[showMissing()].isnull().sum().sort_values(ascending = False)
missingVal = pd.DataFrame()
missingVal['Missing Data Count'] = missingCount
missingVal['Missing Data Percentage'] = missingCount/len(data)*100

missingVal

| | Missing Data Count | Missing Data Percentage |
|---|---|---|
| Customer_Location | 11595 | 16.963 |
| Total_Past_Communications | 6825 | 9.985 |
| Total_Links | 2201 | 3.22 |
| Total_Images | 1677 | 2.453 |
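
One possible way to treat these gaps is simple imputation. The sketch below is an assumption, not necessarily the approach this notebook takes: it fills the three numeric columns with their median and `Customer_Location` with its most frequent value. The helper name `impute_missing` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_missing(df):
    """Fill numeric gaps with the column median and location gaps with the mode.

    ASSUMPTION: median/mode imputation is one common choice, used here for illustration.
    """
    out = df.copy()
    for col in ["Total_Past_Communications", "Total_Links", "Total_Images"]:
        out[col] = out[col].fillna(out[col].median())
    most_common = out["Customer_Location"].mode()[0]
    out["Customer_Location"] = out["Customer_Location"].fillna(most_common)
    return out


# Toy frame for illustration only, not the real campaign data
toy = pd.DataFrame({
    "Customer_Location": ["E", np.nan, "B", "E"],
    "Total_Past_Communications": [33.0, 15.0, np.nan, 25.0],
    "Total_Links": [8.0, 5.0, 5.0, np.nan],
    "Total_Images": [0.0, np.nan, 0.0, 0.0],
})
filled = impute_missing(toy)
print(filled.isnull().sum())
```

With roughly 17% of `Customer_Location` missing, filling with the mode may over-represent one location; an explicit placeholder category or dropping the column are alternatives worth comparing.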
