### Metadata

Prospect ID: A unique ID that identifies the customer.<br>
Lead Number: A lead number assigned to each lead procured.<br>
Lead Origin: The origin identifier with which the customer was identified as a lead. Includes API, Landing Page Submission, etc.<br>
Lead Source: The source of the lead. Includes Google, Organic Search, Drift Chat, etc.<br>
Do Not Email: An indicator variable set by the customer that states whether or not they want to be emailed about the course.<br>
Do Not Call: An indicator variable set by the customer that states whether or not they want to be called about the course.<br>
Converted: The target variable. Indicates whether a lead has been successfully converted or not.<br>
TotalVisits: The total number of visits made by the customer on the website.<br>
Total Time Spent on Website: The total time spent by the customer on the website.<br>
Page Views Per Visit: Average number of pages on the website viewed during the visits.<br>
Last Activity: Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.<br>
Country: The country of the customer.<br>
Specialization: The industry domain in which the customer works. Includes the level 'Select Specialization', which means the customer had not selected this option while filling in the form.<br>
How did you hear about us: The source from which the customer heard about our company.<br>
Search<br>
Magazine<br>
Newspaper Article<br>
Forums<br>
Newspaper<br>
Digital Advertisement<br>
Through Recommendations: Indicates whether the customer came in through recommendations.<br>
Product Interested: The product the customer is interested in knowing more about.<br>
Lead Quality: Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.<br>
Lead Profile: A lead level assigned to each customer based on their profile.<br>
Asymmetrique Activity Index<br>
Asymmetrique Profile Index<br>
Asymmetrique Activity Score<br>
Asymmetrique Profile Score<br>
a free copy of Mastering The CRM: Indicates whether the customer wants a free copy of 'Mastering the CRM' or not.<br>
Last Notable Activity: The last notable activity performed by the customer.<br>
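
The data dictionary above notes that a level such as 'Select Specialization' means the customer left the form field untouched, so it behaves like a missing value rather than a real category. Below is a minimal sketch of how such placeholders could be converted to `NaN` and how a Yes/No indicator could be mapped to 0/1. The toy DataFrame, the `"Yes"`/`"No"` encoding, and the rule "any level starting with `Select`" are assumptions made for illustration; they are not taken from the actual `leads.csv`.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the data dictionary, values are made up
toy = pd.DataFrame({
    "Specialization": ["Select Specialization", "Finance", "Select", "Marketing"],
    "Do Not Email": ["No", "Yes", "No", "No"],
})

# Assumption: a level that starts with "Select" is an untouched form default
is_placeholder = toy["Specialization"].str.startswith("Select", na=False)
toy.loc[is_placeholder, "Specialization"] = np.nan

# Assumption: the indicator columns are stored as the strings "Yes" / "No"
toy["Do Not Email"] = toy["Do Not Email"].map({"Yes": 1, "No": 0})

n_missing = int(toy["Specialization"].isna().sum())
print(toy)
print("Placeholder values converted to NaN:", n_missing)
```

Treating placeholders as missing before counting nulls keeps the later missing-value summary honest, since `isnull()` would otherwise report those rows as filled.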

In [1]:
#importing all necessary libraries

# Data Analysis
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

#----regular expressions----
import re

#----model and support imports----
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, recall_score, precision_score, f1_score, roc_auc_score, ConfusionMatrixDisplay
from imblearn.under_sampling import RandomUnderSampler
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from scipy.stats import uniform, randint

# Feature Engineering
from category_encoders import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from scipy.stats import pearsonr

# Deployment
from flask import Flask, render_template, request
import pickle

# Set the default style for plotting
sns.set(style="whitegrid")

import warnings
warnings.filterwarnings('ignore')


In [2]:
# Load the dataset
df = pd.read_csv('data/leads.csv')

# Summarize column dtypes and non-null counts to get an overview
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9240 entries, 0 to 9239
Data columns (total 30 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Prospect ID                       9240 non-null   object 
 1   Lead Number                       9240 non-null   int64  
 2   Lead Origin                       9240 non-null   object 
 3   Lead Source                       9240 non-null   object 
 4   Do Not Email                      9240 non-null   object 
 5   Do Not Call                       9240 non-null   object 
 6   TotalVisits                       9103 non-null   float64
 7   Total Time Spent on Website       9240 non-null   int64  
 8   Page Views Per Visit              9103 non-null   float64
 9   Last Activity                     8254 non-null   object 
 10  Country                           8333 non-null   object 
 11  Industry                          7802 non-null   object 
 12  How di

In [3]:
df.describe()

|  | Lead Number | TotalVisits | Total Time Spent on Website | Page Views Per Visit | Asymmetrique Activity Score | Asymmetrique Profile Score | Converted |
|---|---|---|---|---|---|---|---|
| count | 9240.0 | 9103.0 | 9240.0 | 9103.0 | 5022.0 | 5022.0 | 9240.0 |
| mean | 617188.435606 | 3.445238 | 487.698268 | 2.36282 | 14.306252 | 16.344883 | 0.38539 |
| std | 23405.995698 | 4.854853 | 548.021466 | 2.161418 | 1.386694 | 1.811395 | 0.486714 |
| min | 579533.0 | 0.0 | 0.0 | 0.0 | 7.0 | 11.0 | 0.0 |
| 25% | 596484.5 | 1.0 | 12.0 | 1.0 | 14.0 | 15.0 | 0.0 |
| 50% | 615479.0 | 3.0 | 248.0 | 2.0 | 14.0 | 16.0 | 0.0 |
| 75% | 637387.25 | 5.0 | 936.0 | 3.0 | 15.0 | 18.0 | 1.0 |
| max | 660737.0 | 251.0 | 2272.0 | 55.0 | 18.0 | 20.0 | 1.0 |
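
The `count` row of the summary is lower than the 9240 rows of the frame for some columns (for example 9103 for TotalVisits), which points to missing values. A possible way to express that gap as a percentage per column is sketched below on a toy frame; the data and the name `missing_pct` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the leads data (values are made up)
toy = pd.DataFrame({
    "TotalVisits": [1.0, np.nan, 3.0, 5.0],
    "Country": ["India", None, None, "India"],
    "Converted": [0, 1, 0, 1],
})

# isna() gives a boolean frame; the column mean is the share of missing rows
missing_pct = (toy.isna().mean() * 100).round(2).sort_values(ascending=False)
print(missing_pct)
```

Percentages are often easier to compare than raw counts when deciding whether a column is worth imputing or should be dropped.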


In [4]:
# Count missing values per column (no cleaning has been applied yet)
missing_values = df.isnull().sum()
print("Missing Values:\n", missing_values)

Missing Values:
 Prospect ID                            0
Lead Number                            0
Lead Origin                            0
Lead Source                            0
Do Not Email                           0
Do Not Call                            0
TotalVisits                          137
Total Time Spent on Website            0
Page Views Per Visit                 137
Last Activity                        986
Country                              907
Industry                            1438
How did you hear about us           2207
Search                                 0
Magazine                               0
Newspaper Article                      0
Forums                                 0
Newspaper                              0
Digital Advertisement                  0
Through Recommendations                0
Product Interested                     0
Lead Quality                        4767
Lead Profile                        2709
Asymmetrique Activity Index   