## Marketing Strategy - Personalised Offer

### Problem Statement
<b>Understand the user behaviours regarding the offers</b>

Description:
The data was collected through a survey to understand drivers' preferences for discounts and offers on dining and takeaway. The researchers collected it by presenting different scenarios to various users.

Example Scenario: You are driving from IIT Madras to Chennai Airport along with your family and you get an offer (10 percent discount on the bill) from the famous Chinese restaurant in Guindy. Will you avail of the offer while traveling?

Along with the user response, some basic information about the users is collected.

### Column Descriptions:

| Column Name                        | Description                                          |
|------------------------------------|------------------------------------------------------|
| Offer Expiration                   | How long the offer is valid (10 hours or 2 days).    |
| Income Range                       | Income range.                                        |
| Visits to Cold Drinks               | Number of times visited cold drinks.                 |
| Travel > 15 Mins for Offer         | Traveled more than 15 minutes to avail an offer?      |
| Rest. Spend < $20                  | Number of times spent less than $20 in a restaurant. |
| Marital Status                     | Marital status.                                     |
| Restaurant Type                    | Type of restaurant.                                 |
| Age                                | Age.                                                |
| Pref. Western Over Chinese          | Prefer Western over Chinese?                         |
| Travel > 25 Mins for Offer         | Traveled more than 25 minutes for an offer?          |
| Travel > 5 Mins for Offer          | Traveled more than 5 minutes for an offer?           |
| Visits to Bars                      | Number of times visited bars.                       |
| Gender                             | Gender.                                             |
| Type of Vehicle Used                | Type of vehicle you use.                             |
| Rest. Same Direction as House       | Coupon offered restaurant in the same direction as your house? |
| Cooks Regularly                     | Cook regularly?                                     |
| Customer Type                      | Preferred company for dining.                        |
| Qualification                      | Qualification.                                      |
| Is Foodie                           | Are you a foodie?                                   |
| Take-Aways                         | Number of times opted for take-away.                 |
| Job/Job Industry                    | Type of industry you work in.                        |
| Rest. Opposite Direction to House   | Coupon offered restaurant opposite direction to your house? |
| Has Children                       | Do you have children?                               |
| Visit Rest. with Rating (Avg)      | Average rating of the restaurant which gave an offer.|
| Temperature                        | Current temperature.                                |
| Rest. Spend > $20                  | Number of times spent greater than $20 in a restaurant. |
| Travel Time                         | Travel time for the restaurant which gave an offer.  |
| Climate                             | Current climate.                                    |
| Drop Location                       | Where are you heading to?                            |
| Pref. Home Food                     | Prefer home food?                                   |


**Target Variable: Offer Accepted**
   - Did you accept the offer?



In [26]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# from sklearn.impute import SimpleImputer
from sklearn.impute import KNNImputer
# from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder
# from sklearn.preprocessing import LabelEncoder
# from sklearn.preprocessing import StandardScaler
# from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import RobustScaler

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.model_selection import RandomizedSearchCV

from sklearn.metrics import f1_score
from sklearn.metrics import classification_report

In [27]:
data = pd.read_csv('data.csv')

### Data Exploration

In [28]:
data.head()

Unnamed: 0,offer expiration,income_range,no_visited_Cold drinks,travelled_more_than_15mins_for_offer,Restaur_spend_less_than20,Marital Status,restaurant type,age,Prefer western over chinese,travelled_more_than_25mins_for_offer,...,restuarant_opposite_direction_house,has Children,visit restaurant with rating (avg),temperature,Restaur_spend_greater_than20,Travel Time,Climate,drop location,Prefer home food,Offer Accepted
0,2days,₹100000 or More,4~8,1,less1,Married partner,4 star restaurant,36,0,0,...,0,0,4,67,less1,22,Spring,Location B,0,No
1,2days,₹87500 - ₹99999,4~8,0,4~8,Married partner,Take-away restaurant,50plus,0,0,...,0,1,3,89,1~3,18,Summer,Location B,0,Yes
2,2days,₹87500 - ₹99999,less1,1,1~3,Single,Cold drinks,26,1,0,...,1,1,4,67,less1,7,Winter,Location A,1,Yes
3,10hours,₹37500 - ₹49999,less1,0,1~3,Single,Take-away restaurant,46,1,0,...,0,1,3,89,1~3,7,Summer,Location C,0,No
4,2days,₹100000 or More,never,1,1~3,Single,4 star restaurant,21,0,1,...,1,0,3,40,less1,7,Summer,Location C,0,No


In [29]:
data.dtypes

offer expiration                        object
income_range                            object
no_visited_Cold drinks                  object
travelled_more_than_15mins_for_offer     int64
Restaur_spend_less_than20               object
Marital Status                          object
restaurant type                         object
age                                     object
Prefer western over chinese              int64
travelled_more_than_25mins_for_offer     int64
travelled_more_than_5mins_for_offer      int64
no_visited_bars                         object
gender                                  object
car                                     object
restuarant_same_direction_house          int64
Cooks regularly                          int64
Customer type                           object
Qualification                           object
is foodie                                int64
no_Take-aways                           object
Job/Job Industry                        object
restuarant_op

In [30]:
data.shape

(12379, 31)

In [31]:
data.isnull().sum()

offer expiration                            0
income_range                                0
no_visited_Cold drinks                    198
travelled_more_than_15mins_for_offer        0
Restaur_spend_less_than20                 121
Marital Status                              0
restaurant type                             0
age                                         0
Prefer western over chinese                 0
travelled_more_than_25mins_for_offer        0
travelled_more_than_5mins_for_offer         0
no_visited_bars                            93
gender                                      0
car                                     12268
restuarant_same_direction_house             0
Cooks regularly                             0
Customer type                               0
Qualification                               0
is foodie                                   0
no_Take-aways                             144
Job/Job Industry                            0
restuarant_opposite_direction_hous

In [32]:
data['car'].value_counts()

car
Car that is too old to install Onstar :D    24
Scooter and motorcycle                      23
do not drive                                22
Mazda5                                      21
crossover                                   21
Name: count, dtype: int64

The `car` column has 99.1% missing values, and the remaining values don't give any useful insight. Hence, we can drop this column.
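
The 99.1% figure can be checked from the outputs already shown: 12,268 missing `car` values out of 12,379 rows. A small arithmetic check, with the numbers copied by hand from the `data.shape`, `data.isnull().sum()` and `value_counts()` outputs rather than recomputed from the file:

```python
# Counts copied from the notebook outputs above (not recomputed from data.csv).
n_rows = 12379
n_missing_car = 12268

missing_pct = 100 * n_missing_car / n_rows
print(f"{missing_pct:.1f}% of 'car' values are missing")

# The non-missing rows should equal the total of data['car'].value_counts().
n_present = n_rows - n_missing_car
value_counts_total = 24 + 23 + 22 + 21 + 21
print(n_present, value_counts_total)
```

Both totals come to 111 rows, so the five listed categories account for every non-missing entry.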

In [33]:
data = data.drop(['car'], axis=1)

In [34]:
# print the unique values of every column
for column in data.columns:
    print(column, ':', data[column].unique(), '\n')

offer expiration : ['2days' '10hours'] 

income_range : ['₹100000 or More' '₹87500 - ₹99999' '₹37500 - ₹49999' 'Less than ₹12500'
 '₹50000 - ₹62499' '₹12500 - ₹24999' '₹75000 - ₹87499' '₹25000 - ₹37499'
 '₹62500 - ₹74999'] 

no_visited_Cold drinks : ['4~8' 'less1' 'never' '1~3' 'gt8' nan] 

travelled_more_than_15mins_for_offer : [1 0] 

Restaur_spend_less_than20 : ['less1' '4~8' '1~3' 'gt8' 'never' nan] 

Marital Status : ['Married partner' 'Single' 'Divorced' 'Unmarried partner' 'Widowed'] 

restaurant type : ['4 star restaurant' 'Take-away restaurant' 'Cold drinks'
 'Restaurant with pub' '2 star restaurant'] 

age : ['36' '50plus' '26' '46' '21' 'below21' '41' '31'] 

Prefer western over chinese : [0 1] 

travelled_more_than_25mins_for_offer : [0 1] 

travelled_more_than_5mins_for_offer : [1] 

no_visited_bars : ['less1' 'never' '1~3' nan '4~8' 'gt8'] 

gender : ['Female' 'Male'] 

restuarant_same_direction_house : [0 1] 

Cooks regularly : [1 0] 

Customer type : ['Individual' 'With

Because the data was collected through a survey, each column takes only a limited set of values. Hence, let's check the distribution of the unique values in each column.

In [35]:
# number of columns in the grid
num_columns = 3

# number of rows needed
num_rows = (len(data.columns) + num_columns - 1) // num_columns

fig = make_subplots(rows=num_rows, cols=num_columns, subplot_titles=data.columns)

# Loop through each column in the DataFrame
for i, column in enumerate(data.columns):
    row, col = divmod(i, num_columns)

    # Plot the distribution of the column (compute the counts once)
    counts = data[column].value_counts()
    fig.add_trace(go.Bar(x=counts.index, y=counts.values),
                  row=row + 1, col=col + 1)

# Update layout for better readability
fig.update_layout(height=num_rows * 400, width=400 * num_columns,
                  title_text="Distribution of Features in Train Data",
                  showlegend=False)

# Show the plot
fig.show()


Changes as per observation:
- Age can be grouped into the following categories: below 21, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, above 50.
- A column for the time taken to travel to the restaurant can be created with the following categories: less than 5, 5-15, 15-25, more than 25. It can be derived from the travel time column, after which the redundant columns (travel > 5 mins for offer, travel > 15 mins for offer, and travel > 25 mins for offer) can be dropped.
- Temperature can be grouped into the following categories: below 40, 40-60, 60-80, above 80.
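
The temperature and travel-time groupings listed above could be sketched with `pd.cut`. This is only an illustration of the idea, not what the preparation cells below do (they ordinal-encode the raw values instead). The toy values are taken from the `data.head()` output, the names `temperature_group` and `travel_group` are made up for the example, and the handling of edge values (whether 40 belongs to "below 40" or "40-60") is an assumption.

```python
import pandas as pd

# Toy values taken from the data.head() output; not the full dataset.
sample = pd.DataFrame({
    "temperature": [67, 89, 67, 89, 40],
    "Travel Time": [22, 18, 7, 7, 7],
})

inf = float("inf")

# right=False makes each bin include its lower edge, so 40 falls in "40-60".
sample["temperature_group"] = pd.cut(
    sample["temperature"],
    bins=[-inf, 40, 60, 80, inf],
    labels=["below 40", "40-60", "60-80", "above 80"],
    right=False,
)
sample["travel_group"] = pd.cut(
    sample["Travel Time"],
    bins=[-inf, 5, 15, 25, inf],
    labels=["less than 5", "5-15", "15-25", "more than 25"],
    right=False,
)
print(sample)
```

Passing `labels` in increasing order keeps the resulting categories ordered, which matters if they are later ordinal-encoded.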

### Data Preparation

In [36]:
train= data.copy()

In [37]:
# defining columns
Ordinalless=['Restaur_spend_greater_than20','no_Take-aways','no_visited_bars','no_visited_Cold drinks', 'Restaur_spend_less_than20']
OrdinalColumns=['Travel Time','temperature','offer expiration']
Onehotcolumns=['Customer type','restaurant type','Marital Status','Job/Job Industry','gender', 'drop location', 'Climate']

In [38]:
for i in Ordinalless:
  print(train[i].unique())

['less1' '1~3' 'never' 'gt8' nan '4~8']
['1~3' 'gt8' '4~8' 'less1' nan 'never']
['less1' 'never' '1~3' nan '4~8' 'gt8']
['4~8' 'less1' 'never' '1~3' 'gt8' nan]
['less1' '4~8' '1~3' 'gt8' 'never' nan]


In [39]:
train, test = train_test_split(train, test_size=0.2, random_state=42)
train_y= train['Offer Accepted']
train_x= train.drop('Offer Accepted', axis = 1)

In [40]:
# preprocessing all columns using Column Transformer
transformer= ColumnTransformer(transformers=[('ord_pipe',OrdinalEncoder(handle_unknown = 'use_encoded_value', unknown_value= np.nan),OrdinalColumns),
                                             ('ord_pipe2',OrdinalEncoder(categories=[['never','less1','1~3','4~8','gt8']] * len(Ordinalless), handle_unknown = 'use_encoded_value', unknown_value= np.nan) ,Ordinalless),
                                             ('income_pipe',OrdinalEncoder(categories=[['Less than ₹12500','₹12500 - ₹24999','₹25000 - ₹37499',
                                                                                        '₹37500 - ₹49999','₹50000 - ₹62499','₹62500 - ₹74999',
                                                                                        '₹75000 - ₹87499','₹87500 - ₹99999','₹100000 or More']]),['income_range']),
                                             ('onehot_pipe',OneHotEncoder(sparse_output=False, handle_unknown='ignore'),Onehotcolumns),
                                             ('age_pipe', OrdinalEncoder(categories=[['below21','21','26','31','36','41','46','50plus']], handle_unknown = 'use_encoded_value', unknown_value= np.nan),['age']),
                                             ('qualification', OrdinalEncoder(categories=[['Some High School', 'High School Graduate','Some college - no degree','Associates degree','Bachelors degree','Graduate degree (Masters or Doctorate)']]
                                                                              ,handle_unknown = 'use_encoded_value', unknown_value= np.nan),['Qualification'])
                                            ],remainder='passthrough')
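
The explicit `categories` lists in the transformer matter because `OrdinalEncoder` otherwise sorts the labels lexically, which does not match the frequency order of labels such as `never < less1 < 1~3 < 4~8 < gt8`. A minimal sketch on a toy column (the five values are the real labels, but the frame itself is made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Intended order, from least to most frequent visits.
order = ["never", "less1", "1~3", "4~8", "gt8"]
toy = pd.DataFrame({"no_visited_bars": ["4~8", "less1", "never", "1~3", "gt8"]})

# With explicit categories the codes follow the intended order.
explicit = OrdinalEncoder(categories=[order]).fit_transform(toy).ravel().tolist()

# Without them the categories are sorted as strings: '1~3', '4~8', 'gt8', 'less1', 'never'.
default = OrdinalEncoder().fit_transform(toy).ravel().tolist()

print(explicit)  # [3.0, 1.0, 0.0, 2.0, 4.0]
print(default)   # [1.0, 3.0, 4.0, 0.0, 2.0]
```

With the default ordering, `never` would receive the highest code, so a linear model would see it as the most frequent visitor.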


In [41]:
# combining travelled columns into one
train_x['travel']= train_x['travelled_more_than_5mins_for_offer']+train_x['travelled_more_than_15mins_for_offer']+train_x['travelled_more_than_25mins_for_offer']
train_x= train_x.drop(['travelled_more_than_5mins_for_offer','travelled_more_than_15mins_for_offer','travelled_more_than_25mins_for_offer'], axis = 1)
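
Summing the three flags gives an ordinal travel level, assuming the flags are nested thresholds (a trip of more than 25 minutes is also more than 15 and more than 5). Since `travelled_more_than_5mins_for_offer` is always 1 in this data, the sum ranges from 1 to 3. The same step is repeated for the test set further down, so it could be wrapped in a small function; `add_travel_level` is a hypothetical helper name, not part of the notebook:

```python
import pandas as pd

FLAG_COLUMNS = [
    "travelled_more_than_5mins_for_offer",
    "travelled_more_than_15mins_for_offer",
    "travelled_more_than_25mins_for_offer",
]

def add_travel_level(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the three flags replaced by their sum (hypothetical helper)."""
    out = df.copy()
    out["travel"] = out[FLAG_COLUMNS].sum(axis=1)
    return out.drop(columns=FLAG_COLUMNS)

# Toy rows: one per flag combination when the 5-minute flag is always 1.
toy = pd.DataFrame({
    FLAG_COLUMNS[0]: [1, 1, 1],
    FLAG_COLUMNS[1]: [0, 1, 1],
    FLAG_COLUMNS[2]: [0, 0, 1],
})
levels = add_travel_level(toy)["travel"].tolist()
print(levels)  # [1, 2, 3]
```

Calling the same function on both the train and test frames would keep the two in sync.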

In [42]:
# fit the encoder, imputer and scaler on the training data only
train_x = transformer.fit_transform(train_x)
knn = KNNImputer(n_neighbors=3)
train_x = knn.fit_transform(train_x)
rs = RobustScaler()
train_x = rs.fit_transform(train_x)

In [43]:
lr = LogisticRegression( max_iter= 1000, random_state= 42)
lr.fit(train_x, train_y)

In [44]:
test['travel']= test['travelled_more_than_5mins_for_offer']+test['travelled_more_than_15mins_for_offer']+test['travelled_more_than_25mins_for_offer']
test= test.drop(['travelled_more_than_5mins_for_offer','travelled_more_than_15mins_for_offer','travelled_more_than_25mins_for_offer'], axis = 1)

In [45]:
test_x= test.drop('Offer Accepted', axis = 1)
test_y= test['Offer Accepted']

In [46]:
# apply the transformers fitted on train to the test data
test_x = transformer.transform(test_x)
test_x = knn.transform(test_x)
test_x = rs.transform(test_x)
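
The fit-on-train, transform-on-test sequence used here could also be bundled into a scikit-learn `Pipeline`, so that imputation and scaling are always fitted on the training data and replayed on new data in the same order. A minimal sketch on synthetic, already-encoded features (the array, labels and step names are invented for the example; in the notebook, the `ColumnTransformer` would be the first step):

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for encoded features, with a few missing values.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = np.where(np.arange(40) % 2 == 0, "Yes", "No")
X[:, 0] += (y == "Yes") * 2.0   # make the first feature informative
X[::7, 1] = np.nan              # inject missing values

X_train, X_test = X[:30], X[30:]
y_train, y_test = y[:30], y[30:]

model = Pipeline(steps=[
    ("impute", KNNImputer(n_neighbors=3)),
    ("scale", RobustScaler()),
    ("clf", LogisticRegression(max_iter=1000, random_state=42)),
])

# fit() fits every step on the training data; predict() only transforms.
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(pred.shape)
```

A single `model` object also makes it harder to accidentally call `fit_transform` on the test set.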

In [47]:
lr_pred = lr.predict(test_x)

In [48]:
f1_score(test_y, lr_pred, pos_label='Yes')

0.6836998706338939

In [49]:
print(classification_report(test_y, lr_pred))

              precision    recall  f1-score   support

          No       0.58      0.40      0.47      1105
         Yes       0.61      0.77      0.68      1371

    accuracy                           0.61      2476
   macro avg       0.60      0.59      0.58      2476
weighted avg       0.60      0.61      0.59      2476
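
The per-class F1 values in the report follow from precision and recall as F1 = 2PR / (P + R). A quick check using the rounded figures from the table (because the inputs are rounded to two decimals, the result for `Yes` agrees with the unrounded 0.6837 above only to two decimals):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall copied from the classification report above.
f1_yes = f1(0.61, 0.77)
f1_no = f1(0.58, 0.40)

print(round(f1_yes, 2), round(f1_no, 2))  # 0.68 0.47
```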



In [50]:
f1_score(train_y, lr.predict(train_x), pos_label='Yes')

0.699603489294211