## Smart Lead Scoring Engine  

Can you identify the potential leads for a D2C startup?

### Problem Statement  

A D2C startup develops products using cutting-edge technologies such as Web 3.0. Over the past few months, the company has launched multiple marketing campaigns, both offline and digital. As a result, users have started showing interest in the product on the website. Users who intend to buy one or more products are generally known as leads (potential customers).   

Leads are captured in two ways: directly and indirectly.   

Direct leads are captured via forms embedded in the website, while indirect leads are captured based on a user's activity on the platform, such as time spent on the website, number of user sessions, etc.  
The marketing and sales team wants to identify the leads who are most likely to buy the product, so that the sales team can use its bandwidth efficiently by targeting these leads and increase sales in a shorter span of time.  

As a data scientist, your task is to predict the propensity to buy a product based on the user's past activities and user-level information.  

### About Dataset

You are provided with last year's lead data, containing both direct and indirect leads. Each lead record includes the user's activity on the platform, signup information and campaign information. Based on each user's past activity on the platform, you need to build a predictive model that classifies whether the user will buy the product in the next 3 months.  

#### Data Dictionary  

You are provided with 3 files - train.csv, test.csv and sample_submission.csv  

**Training set**
train.csv contains the lead information for the last year, from Jan 2021 to Dec 2021, along with the target variable indicating whether the user will buy the product in the next 3 months.   
![image.png](attachment:image.png)

**Test set**
test.csv contains the lead information for the current year, from Jan 2022 to March 2022. You need to predict whether each lead will buy the product in the next 3 months.
![image-2.png](attachment:image-2.png)  

#### Evaluation metric
The evaluation metric for this hackathon is the **F1 score of class 1** (the leads who buy).
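
To make the metric concrete, here is a minimal sketch of how the F1 score of class 1 is computed from true positives, false positives and false negatives. The function name `f1_for_positive_class` and the toy labels are assumptions made up for illustration; they are not part of the hackathon data.

```python
def f1_for_positive_class(y_true, y_pred, positive=1):
    """F1 of the positive class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: define F1 as 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 2 buyers among 10 leads
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

toy_f1 = f1_for_positive_class(y_true, y_pred)
toy_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("F1 (class 1):", toy_f1)
print("Accuracy:", toy_accuracy)
```

In this toy case accuracy is 0.8 while the class 1 F1 is only 0.5, which is why a model that mostly predicts "no purchase" can look good on accuracy and still score poorly on this metric.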

In [50]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

## Reading and Understanding Data

In [51]:
# Read train and test data
df_train = pd.read_csv("train_wn75k28.csv")
df_test = pd.read_csv("test_Wf7sxXF.csv")

# Top 5 datapoints for train
df_train.head()

Unnamed: 0,id,created_at,campaign_var_1,campaign_var_2,products_purchased,signup_date,user_activity_var_1,user_activity_var_2,user_activity_var_3,user_activity_var_4,user_activity_var_5,user_activity_var_6,user_activity_var_7,user_activity_var_8,user_activity_var_9,user_activity_var_10,user_activity_var_11,user_activity_var_12,buy
0,1,2021-01-01,1,2,2.0,2020-09-24,0,0,0,0,0,0,1,1,0,0,0,0,0
1,2,2021-01-01,2,1,2.0,2020-09-19,1,0,1,0,0,0,1,0,0,0,0,0,0
2,3,2021-01-01,9,3,3.0,2021-08-11,1,0,0,0,0,0,0,0,0,0,0,0,0
3,4,2021-01-01,6,7,2.0,2017-10-04,0,0,0,0,0,0,0,0,0,0,0,0,0
4,5,2021-01-01,4,6,,2020-06-08,0,0,0,0,0,0,1,0,0,0,1,0,0


In [52]:
# Top 5 datapoints for test
df_test.head()

Unnamed: 0,id,created_at,campaign_var_1,campaign_var_2,products_purchased,signup_date,user_activity_var_1,user_activity_var_2,user_activity_var_3,user_activity_var_4,user_activity_var_5,user_activity_var_6,user_activity_var_7,user_activity_var_8,user_activity_var_9,user_activity_var_10,user_activity_var_11,user_activity_var_12
0,39162,2022-01-01,2,2,,2021-08-17,1,1,0,0,0,1,0,0,0,0,1,0
1,39163,2022-01-01,4,7,3.0,2020-05-21,1,0,0,0,0,0,0,1,0,0,0,0
2,39164,2022-01-01,8,7,,,0,0,0,0,1,1,0,0,0,0,0,0
3,39165,2022-01-01,9,8,2.0,2020-06-22,0,0,0,0,1,1,1,0,0,0,2,0
4,39166,2022-01-01,4,5,2.0,2021-03-10,1,0,0,0,0,0,0,0,0,0,0,0


In [53]:
# Let's check the shape of the datasets
print('The shape of train dataset is : ',df_train.shape)
print('The shape of test dataset is : ',df_test.shape)

The shape of train dataset is :  (39161, 19)
The shape of test dataset is :  (13184, 18)


In [54]:
# Check if duplicate records are in dataset
print('The duplicate datapoints in train: ', df_train.duplicated().sum())
print('The duplicate datapoints in test: ', df_test.duplicated().sum())

The duplicate datapoints in train:  0
The duplicate datapoints in test:  0


In [55]:
# Checking information on train
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39161 entries, 0 to 39160
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    39161 non-null  int64  
 1   created_at            39161 non-null  object 
 2   campaign_var_1        39161 non-null  int64  
 3   campaign_var_2        39161 non-null  int64  
 4   products_purchased    18250 non-null  float64
 5   signup_date           24048 non-null  object 
 6   user_activity_var_1   39161 non-null  int64  
 7   user_activity_var_2   39161 non-null  int64  
 8   user_activity_var_3   39161 non-null  int64  
 9   user_activity_var_4   39161 non-null  int64  
 10  user_activity_var_5   39161 non-null  int64  
 11  user_activity_var_6   39161 non-null  int64  
 12  user_activity_var_7   39161 non-null  int64  
 13  user_activity_var_8   39161 non-null  int64  
 14  user_activity_var_9   39161 non-null  int64  
 15  user_activity_var_1

In [56]:
# Checking information on test
df_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13184 entries, 0 to 13183
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    13184 non-null  int64  
 1   created_at            13184 non-null  object 
 2   campaign_var_1        13184 non-null  int64  
 3   campaign_var_2        13184 non-null  int64  
 4   products_purchased    5048 non-null   float64
 5   signup_date           6535 non-null   object 
 6   user_activity_var_1   13184 non-null  int64  
 7   user_activity_var_2   13184 non-null  int64  
 8   user_activity_var_3   13184 non-null  int64  
 9   user_activity_var_4   13184 non-null  int64  
 10  user_activity_var_5   13184 non-null  int64  
 11  user_activity_var_6   13184 non-null  int64  
 12  user_activity_var_7   13184 non-null  int64  
 13  user_activity_var_8   13184 non-null  int64  
 14  user_activity_var_9   13184 non-null  int64  
 15  user_activity_var_1

The date columns have the `object` dtype, so let's parse them as datetimes. The cell below then casts the parsed values to `int64` timestamps.

In [57]:
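# Datatype conversion for train: parse the dates, then cast to int64 timestamps
# Note: in this run the int64 cast turns missing dates (NaT) into the int64 minimum
# (about -9.22e18), so isnull() no longer reports them as missing afterwards.
df_train['created_at'] = pd.to_datetime(df_train['created_at']).astype('int64')
df_train['signup_date'] = pd.to_datetime(df_train['signup_date']).astype('int64')

# Datatype conversion for test (same caveat applies)
df_test['created_at'] = pd.to_datetime(df_test['created_at']).astype('int64')
df_test['signup_date'] = pd.to_datetime(df_test['signup_date']).astype('int64')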
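
The integer cast above is why `signup_date` shows a minimum of about -9.22e18 in the summary statistics further down. As a hedged alternative, the sketch below keeps the columns as datetimes so that missing signup dates stay `NaT`, and derives two features from them. The toy frame and the feature names `signup_missing` and `days_since_signup` are assumptions for illustration; they are not used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame with the same two date columns (values are illustrative only)
toy = pd.DataFrame({
    "created_at": ["2021-01-01", "2021-01-01", "2022-01-01"],
    "signup_date": ["2020-09-24", None, "2021-08-17"],
})

# Parse to datetime and stop there: missing values remain NaT
toy["created_at"] = pd.to_datetime(toy["created_at"])
toy["signup_date"] = pd.to_datetime(toy["signup_date"])

# Hypothetical feature: explicit flag for a missing signup date
toy["signup_missing"] = toy["signup_date"].isna().astype(int)

# Hypothetical feature: days between signup and lead creation (NaN if unknown)
toy["days_since_signup"] = (toy["created_at"] - toy["signup_date"]).dt.days

print(toy)
```

With this approach `isnull()` still reports the missing signup dates, and the day difference can be imputed or left to a model that handles missing values. In the real data this difference can be negative, since some signup dates are later than `created_at`.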

In [58]:
# Let's check the datatypes
print("Datatype verification for train:\n")
print(df_train.dtypes, '\n')
print("Datatype verification for test:\n")
print(df_test.dtypes)

Datatype verification for train:

id                        int64
created_at                int64
campaign_var_1            int64
campaign_var_2            int64
products_purchased      float64
signup_date               int64
user_activity_var_1       int64
user_activity_var_2       int64
user_activity_var_3       int64
user_activity_var_4       int64
user_activity_var_5       int64
user_activity_var_6       int64
user_activity_var_7       int64
user_activity_var_8       int64
user_activity_var_9       int64
user_activity_var_10      int64
user_activity_var_11      int64
user_activity_var_12      int64
buy                       int64
dtype: object 

Datatype verification for test:

id                        int64
created_at                int64
campaign_var_1            int64
campaign_var_2            int64
products_purchased      float64
signup_date               int64
user_activity_var_1       int64
user_activity_var_2       int64
user_activity_var_3       int64
user_activity_var_4  

In [59]:
# Statistical summary for train
df_train.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
id,39161.0,19581.0,11304.95,1.0,9791.0,19581.0,29371.0,39161.0
created_at,39161.0,1.628511e+18,8288287000000000.0,1.609459e+18,1.622333e+18,1.629677e+18,1.635552e+18,1.640909e+18
campaign_var_1,39161.0,6.523812,3.472944,1.0,4.0,6.0,9.0,16.0
campaign_var_2,39161.0,6.452746,2.614296,1.0,5.0,6.0,8.0,15.0
products_purchased,18250.0,2.154137,0.7798145,1.0,2.0,2.0,3.0,4.0
signup_date,39161.0,-2.575871e+18,5.269934e+18,-9.223372e+18,-9.223372e+18,1.572394e+18,1.615507e+18,1.648598e+18
user_activity_var_1,39161.0,0.4000919,0.5091943,0.0,0.0,0.0,1.0,3.0
user_activity_var_2,39161.0,0.006715865,0.08167578,0.0,0.0,0.0,0.0,1.0
user_activity_var_3,39161.0,0.1028319,0.3037431,0.0,0.0,0.0,0.0,1.0
user_activity_var_4,39161.0,0.01146549,0.1064628,0.0,0.0,0.0,0.0,1.0


In [60]:
# Statistical summary for test
df_test.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
id,13184.0,45753.5,3806.037,39162.0,42457.75,45753.5,49049.25,52345.0
created_at,13184.0,1.644737e+18,2198196000000000.0,1.640995e+18,1.64281e+18,1.64471e+18,1.646611e+18,1.648598e+18
campaign_var_1,13184.0,7.724439,3.599981,1.0,5.0,8.0,10.0,16.0
campaign_var_2,13184.0,7.943416,2.687832,1.0,6.0,8.0,10.0,15.0
products_purchased,5048.0,2.108954,0.7834826,1.0,2.0,2.0,3.0,4.0
signup_date,13184.0,-3.851932e+18,5.418332e+18,-9.223372e+18,-9.223372e+18,-9.223372e+18,1.619136e+18,1.648598e+18
user_activity_var_1,13184.0,0.3866808,0.5056535,0.0,0.0,0.0,1.0,3.0
user_activity_var_2,13184.0,0.004247573,0.06503731,0.0,0.0,0.0,0.0,1.0
user_activity_var_3,13184.0,0.103307,0.304371,0.0,0.0,0.0,0.0,1.0
user_activity_var_4,13184.0,0.007129854,0.0841401,0.0,0.0,0.0,0.0,1.0


## Exploratory Data Analysis
### Data Cleaning

In [61]:
# Missing values 
print("Missing values in train: ")
print(df_train.isnull().sum(),'\n')
print("Missing values in test: ")
print(df_test.isnull().sum())

Missing values in train: 
id                          0
created_at                  0
campaign_var_1              0
campaign_var_2              0
products_purchased      20911
signup_date                 0
user_activity_var_1         0
user_activity_var_2         0
user_activity_var_3         0
user_activity_var_4         0
user_activity_var_5         0
user_activity_var_6         0
user_activity_var_7         0
user_activity_var_8         0
user_activity_var_9         0
user_activity_var_10        0
user_activity_var_11        0
user_activity_var_12        0
buy                         0
dtype: int64 

Missing values in test: 
id                         0
created_at                 0
campaign_var_1             0
campaign_var_2             0
products_purchased      8136
signup_date                0
user_activity_var_1        0
user_activity_var_2        0
user_activity_var_3        0
user_activity_var_4        0
user_activity_var_5        0
user_activity_var_6        0
user_activity_va

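**Two variables, `products_purchased` and `signup_date`, have missing values. `signup_date` shows 0 in the counts above only because its missing dates were cast to integers during the datatype conversion; `info()` reported 24048 non-null values out of 39161 in train and 6535 out of 13184 in test. Let's calculate the missing percentage.**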

In [62]:
# Percentage missing values
print("Percent missing values in train: ")
print(100*df_train.isnull().sum()/len(df_train),'\n')
print("Percent missing values in test: ")
print(100*df_test.isnull().sum()/len(df_test))

Percent missing values in train: 
id                       0.000000
created_at               0.000000
campaign_var_1           0.000000
campaign_var_2           0.000000
products_purchased      53.397513
signup_date              0.000000
user_activity_var_1      0.000000
user_activity_var_2      0.000000
user_activity_var_3      0.000000
user_activity_var_4      0.000000
user_activity_var_5      0.000000
user_activity_var_6      0.000000
user_activity_var_7      0.000000
user_activity_var_8      0.000000
user_activity_var_9      0.000000
user_activity_var_10     0.000000
user_activity_var_11     0.000000
user_activity_var_12     0.000000
buy                      0.000000
dtype: float64 

Percent missing values in test: 
id                       0.000000
created_at               0.000000
campaign_var_1           0.000000
campaign_var_2           0.000000
products_purchased      61.711165
signup_date              0.000000
user_activity_var_1      0.000000
user_activity_var_2      0.00000

In [63]:
# Impute missing values in 'products_purchased' with 0, assuming the customer has not bought any product.
df_train['products_purchased'] = df_train['products_purchased'].fillna(0.0)

# Same imputation for test
df_test['products_purchased'] = df_test['products_purchased'].fillna(0.0)

In [64]:
ID = df_test['id']

In [65]:
# Let's drop the signup date, created-at and id columns
df_train.drop(['signup_date',"created_at",'id'],axis=1, inplace=True)
df_test.drop(['signup_date',"created_at",'id'],axis=1, inplace=True)


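Let's create **month**, **weekofyear**, **weekday** and **holiday** features. The feature-engineering cells below are commented out in this version, so these features are not used by the model.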

In [66]:
# #Lets create new feature such as month, weekday, week of the year
# #df_train["weekofyear"] = df_train['created_at'].dt.weekofyear
# df_train["month"] = df_train['created_at'].dt.month
# # The day of the week with Monday=0, Sunday=6.
# df_train["weekday"] = df_train['created_at'].dt.weekday
# # Holiday column if day of week is 5 or 6
# df_train['holiday'] = df_train["weekday"].apply(lambda x: 'Yes' if x in [5,6] else 'No' )


# #Lets create new feature such as month, weekday, week of the year
# #df_test["weekofyear"] = df_test['created_at'].dt.weekofyear
# df_test["month"] = df_test['created_at'].dt.month
# # The day of the week with Monday=0, Sunday=6.
# df_test["weekday"] = df_test['created_at'].dt.weekday
# # Holiday column if day of week is 5 or 6
# df_test['holiday'] = df_test["weekday"].apply(lambda x: 'Yes' if x in [5,6] else 'No' )

In [67]:
# # Now drop Created at 
# df_train.drop('created_at',axis=1, inplace=True)
# df_test.drop('created_at',axis=1, inplace=True)

In [68]:
# top 5 observations
df_train.head()

Unnamed: 0,campaign_var_1,campaign_var_2,products_purchased,user_activity_var_1,user_activity_var_2,user_activity_var_3,user_activity_var_4,user_activity_var_5,user_activity_var_6,user_activity_var_7,user_activity_var_8,user_activity_var_9,user_activity_var_10,user_activity_var_11,user_activity_var_12,buy
0,1,2,2.0,0,0,0,0,0,0,1,1,0,0,0,0,0
1,2,1,2.0,1,0,1,0,0,0,1,0,0,0,0,0,0
2,9,3,3.0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,6,7,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,6,0.0,0,0,0,0,0,0,1,0,0,0,1,0,0


In [69]:
# # Train
# # Mapping months
# monthDict={1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'}
# df_train['month'] = df_train['month'].map({1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 
#                                            8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'})

# # Mapping weekday (pandas uses Monday=0, Sunday=6)
# df_train["weekday"] = df_train["weekday"].map({0:'Mon',1:"Tue",2:"Wed",3:"Thu",4:"Fri",5:"Sat",6:"Sun"})

# #Test
# df_test['month'] = df_test['month'].map({1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 
#                                            8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'})

# df_test["weekday"] = df_test["weekday"].map({0:'Mon',1:"Tue",2:"Wed",3:"Thu",4:"Fri",5:"Sat",6:"Sun"})

In [70]:
# list1 = df_train.columns.to_list()

In [71]:
# # Lets check value counts for all variables
# for col in list1:
#     print(df_train[col].value_counts(normalize=True))

In [72]:
# # Create a list of categorical variables
# catvars = ['campaign_var_1',"campaign_var_2",'products_purchased','user_activity_var_1','user_activity_var_2',
#            'user_activity_var_3','user_activity_var_4','user_activity_var_5','user_activity_var_6','user_activity_var_7',
#           'user_activity_var_8','user_activity_var_9','user_activity_var_10','user_activity_var_11','user_activity_var_12',
#           'month','weekday','holiday']

# # Count plots
# fig = plt.figure(figsize=(20,40))
# i = 1
# for vars in catvars:
#     plt.subplot(10,3,i)
#     sns.countplot(data = df_train, x = vars)
#     i = i + 1

# fig.tight_layout()
# plt.show()

In [73]:
# # The variance of these variables is very low, so we drop them
# drop_cols = ['user_activity_var_12','user_activity_var_10','user_activity_var_9','user_activity_var_4','user_activity_var_2']

# # train drop
# df_train.drop(drop_cols, axis=1, inplace=True)

# # test drop
# df_test.drop(drop_cols, axis=1, inplace=True)

In [74]:
# # Create a list of numerical variables
# numvars = ['campaign_var_1','campaign_var_2','user_activity_var_1', 'user_activity_var_3', 'user_activity_var_5', 'user_activity_var_6', 'user_activity_var_7',
#            'user_activity_var_8', 'user_activity_var_11']

# # Box plots
# fig = plt.figure(figsize=(20,40))
# i = 1
# for vars in numvars:
#     plt.subplot(10,2,i)
#     sns.boxplot(data = df_train, x = vars)
#     i = i + 1

# fig.tight_layout()
# plt.show()

In [75]:
# catvars = df_train.columns.to_list()
# catvars

In [76]:
# Looking at the value counts and plots above, a few categories have very few observations, so they could be combined into a single 'other' category

### Bivariate Analysis

In [77]:
# sns.countplot(df_train['campaign_var_1'],hue=df_train['buy'])

In [78]:
# #correlations of numeric values
# plt.figure(figsize=(12,8))

# # heatmap
# sns.heatmap(df_train.corr(), cmap="coolwarm", annot=True)

# # Adding and formatting title
# plt.title("Heatmap for Numerical Variables ", fontdict={'fontsize': 15, 'fontweight' : 5, 'color' : 'Black'})
# plt.show()

## Model Building

In [79]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39161 entries, 0 to 39160
Data columns (total 16 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   campaign_var_1        39161 non-null  int64  
 1   campaign_var_2        39161 non-null  int64  
 2   products_purchased    39161 non-null  float64
 3   user_activity_var_1   39161 non-null  int64  
 4   user_activity_var_2   39161 non-null  int64  
 5   user_activity_var_3   39161 non-null  int64  
 6   user_activity_var_4   39161 non-null  int64  
 7   user_activity_var_5   39161 non-null  int64  
 8   user_activity_var_6   39161 non-null  int64  
 9   user_activity_var_7   39161 non-null  int64  
 10  user_activity_var_8   39161 non-null  int64  
 11  user_activity_var_9   39161 non-null  int64  
 12  user_activity_var_10  39161 non-null  int64  
 13  user_activity_var_11  39161 non-null  int64  
 14  user_activity_var_12  39161 non-null  int64  
 15  buy                

In [80]:
# cat_cols= df_train.select_dtypes(include=['object']).columns
# cat_cols

In [81]:
# # Lets print the value counts
# for col in cat_cols:
#     print(df_train[col].value_counts())
#     print('='*50)

In [82]:
# # Mapping for yes and no
# df_train['holiday'] = df_train['holiday'].apply(lambda x : 1 if x=='Yes' else 0)
# df_test['holiday'] = df_test['holiday'].apply(lambda x : 1 if x=='Yes' else 0)

In [83]:
# # Creating dummy variables for the categorical variables
# month_df_train =  pd.get_dummies(df_train['month'], drop_first=True)
# week_df_train =  pd.get_dummies(df_train['weekday'], drop_first=True)

# month_df_test =  pd.get_dummies(df_test['month'], drop_first=True)
# week_df_test =  pd.get_dummies(df_test['weekday'], drop_first=True)

In [84]:
# # Concatenating the dummy_vars dataframe to original dataframe
# df_train = pd.concat([df_train,month_df_train,week_df_train],axis=1)
# df_train.head()

In [85]:
# # Concatenating the dummy_vars dataframe to original test dataframe
# df_test = pd.concat([df_test,month_df_test,week_df_test],axis=1)
# df_test.head()

In [86]:
# # Lets drop the original columns from which dummies were created
# df_train.drop( [ 'month', 'weekday'],axis=1, inplace=True)
# df_test.drop( [ 'month', 'weekday'],axis=1, inplace=True)

#### **Train-Test Split**

In [87]:
# Load required libraries
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier,RandomForestClassifier
from sklearn.feature_selection import RFE
from statsmodels.stats.outliers_influence import variance_inflation_factor

from sklearn import metrics

import statsmodels.api as sm

In [88]:
# Putting feature variable to X
X = df_train.drop('buy', axis = 1)
X.head()

Unnamed: 0,campaign_var_1,campaign_var_2,products_purchased,user_activity_var_1,user_activity_var_2,user_activity_var_3,user_activity_var_4,user_activity_var_5,user_activity_var_6,user_activity_var_7,user_activity_var_8,user_activity_var_9,user_activity_var_10,user_activity_var_11,user_activity_var_12
0,1,2,2.0,0,0,0,0,0,0,1,1,0,0,0,0
1,2,1,2.0,1,0,1,0,0,0,1,0,0,0,0,0
2,9,3,3.0,1,0,0,0,0,0,0,0,0,0,0,0
3,6,7,2.0,0,0,0,0,0,0,0,0,0,0,0,0
4,4,6,0.0,0,0,0,0,0,0,1,0,0,0,1,0


In [89]:
# Putting response variable to y
y = df_train['buy']

y.head()

0    0
1    0
2    0
3    0
4    0
Name: buy, dtype: int64

In [90]:
# Splitting the labelled data into train (80%) and validation (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)

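**SMOTE for class imbalance**  
The target classes are imbalanced, and oversampling the minority class with SMOTE is one way to address this. The SMOTE cells below are commented out, so the model in this version is trained on the original, unbalanced training data.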

In [91]:
# # Perform oversampling on the training data only: pass both X_train and y_train to SMOTE
# from imblearn.over_sampling import SMOTE
# smote = SMOTE(random_state=42)
# X_train_smote, y_train_smote = smote.fit_resample(X_train,y_train)

In [92]:
# # Check the shape after Oversampling
# print('Shape of train data after oversampling:{}'.format(X_train_smote.shape))
# print('Value count of training target variable:\n{}'.format(y_train_smote.value_counts()))
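
Since the SMOTE cells are disabled, one lighter-weight option is to reweight the classes instead of resampling them. The sketch below computes weights with the same heuristic scikit-learn documents for `class_weight='balanced'`, namely `n_samples / (n_classes * count_per_class)`. The helper name `balanced_class_weights` and the 95/5 toy split are assumptions for illustration; the actual class ratio of `buy` is not shown in this notebook.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy target (illustrative only): 95 non-buyers and 5 buyers
toy_labels = [0] * 95 + [1] * 5

class_share = {cls: cnt / len(toy_labels) for cls, cnt in Counter(toy_labels).items()}
weights = balanced_class_weights(toy_labels)

print("Class share:", class_share)
print("Balanced weights:", weights)
```

The rarer class receives the larger weight, so its errors count more during training. In scikit-learn the same effect is available by passing `class_weight='balanced'` to `LogisticRegression`, which would be a one-argument change to the model cell that follows.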

## Logistic Regression

In [93]:
from sklearn.linear_model import LogisticRegression

In [94]:
log = LogisticRegression(n_jobs=-1,random_state=21)

In [95]:
log.fit(X_train,y_train)

y_train_pred = log.predict(X_train)

y_test_pred = log.predict(X_test)

In [96]:
from sklearn.metrics import f1_score

In [97]:
print(' Train F1 score:',f1_score(y_train, y_train_pred))
print(' Test F1 score:',f1_score(y_test, y_test_pred))

 Train F1 score: 0.650352550808793
 Test F1 score: 0.662379421221865


In [101]:
df_test['id'] = ID

In [102]:
# Create the CSV file for submission with two columns: id and buy
df_test[['id','buy']].to_csv(r'C:\Users\Admin\Desktop\Data Scienc 2022\05_Github_Desktop\Portfolio_DS\Smart Lead Scoring Engine_hackathon\AV_Submission.csv',index=False)
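
The cell above selects a `buy` column from `df_test`, but the cells that create it (In [98] to In [100]) are not included in this export. The sketch below shows one way such a submission frame could be assembled, assuming the predictions are an array of 0/1 labels aligned with the saved ids, for example from `log.predict(...)` on the test features before `id` is re-attached. The helper `build_submission`, the toy ids and the toy predictions are hypothetical.

```python
import pandas as pd


def build_submission(ids, predictions):
    """Pair each test id with its predicted label in the submission layout."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return pd.DataFrame({
        "id": list(ids),
        "buy": [int(p) for p in predictions],
    })


# Toy ids and predictions (illustrative only)
toy_submission = build_submission([39162, 39163, 39164], [0, 1, 0])
print(toy_submission)

# Writing the file would then be a single call, for example:
# toy_submission.to_csv("submission.csv", index=False)
```

Building the frame from the saved `ID` series and the prediction array keeps the feature frame untouched, which avoids accidentally feeding `id` or `buy` back into the model.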