# iFood CRM Data Analyst Case - Part2

### Description

The team's objective is to build a predictive model that will produce the highest profit for the
next direct marketing campaign, scheduled for next month.
The new campaign, the sixth, aims to
sell a new gadget to the customer database.
To build the model, a pilot campaign involving **2,240 customers** was carried out.
The customers were selected at random and contacted by phone regarding the acquisition of the gadget.
During the following months, customers who bought the offer were properly labeled.
The total cost of the sample campaign was 6,720 MU and the revenue generated by the customers who accepted the offer was 3,674 MU.
Overall, the campaign made a loss, with a profit of -3,046 MU.
The success rate of the campaign was 15%.
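
The pilot figures above can be checked with a few lines of arithmetic, which also give a rough break-even response rate for targeting. This is only a sketch: the cost per contact follows directly from the quoted totals, but the respondent count is an assumption derived from the rounded 15% success rate, so the revenue per sale and the break-even rate are approximate.

```python
# Figures quoted in the case description (MU = monetary units)
n_contacted = 2240
total_cost = 6720
total_revenue = 3674

profit = total_revenue - total_cost            # revenue minus cost
cost_per_contact = total_cost / n_contacted    # average cost of one phone contact

# ASSUMPTION: respondent count estimated from the rounded 15% success rate
approx_respondents = round(0.15 * n_contacted)
approx_revenue_per_sale = total_revenue / approx_respondents

# A contact pays off on average only if p * revenue_per_sale > cost_per_contact
break_even_rate = cost_per_contact / approx_revenue_per_sale

print(f"profit: {profit} MU")
print(f"cost per contact: {cost_per_contact} MU")
print(f"approximate break-even response rate: {break_even_rate:.1%}")
```

Under these assumptions, contacting a customer is worthwhile only when the predicted purchase probability is above roughly 27%, which is consistent with a random sample responding at about 15% losing money.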

The team's objective is to develop a model that predicts customer behavior and to apply it to the rest of the customer base.
Ideally, the model will allow the company to cherry-pick the customers who are most likely to
purchase the offer while leaving out the non-respondents, making the next campaign highly
profitable. Beyond maximizing the profit of the campaign, the CMO is also interested in
understanding the characteristic features of the customers who are willing to buy the
gadget.

### Key Objectives:

1. Explore the data – don’t just plot means and counts. Provide insights, define cause and
effect. Provide a better understanding of the characteristic features of respondents;
2. Propose and describe a customer segmentation based on customer behavior;
3. Create a predictive model which allows the company to maximize the profit of the next
marketing campaign.
4. Whatever else you think is necessary.

### Deliverables:

1. Data Exploration;
2. Segmentation;
3. Classification Model;
4. Feature Importance.

In [4]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import metrics

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Read Dataset
ifood_df = pd.read_csv('/Users/adityaagarwal/Aditya Ag/Jupyter Notebook/Resume Projects/CRM Analysis for Marketing data/data/ifood_df.csv')

# Split dataset into features and labels
features = ifood_df.drop('Response', axis=1)
labels = ifood_df['Response']

# Split dataset into training set (60%) and test set (40%)
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.40, random_state=5)
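
Because only about 15% of the pilot customers responded, a plain random split can leave the training and test sets with slightly different response rates. The sketch below shows a stratified variant, which is an alternative to the split used in this notebook and not what the notebook does. The data is a synthetic stand-in (1,000 rows, 15% positives) and the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr` and `y_te` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)

# ASSUMPTION: synthetic stand-in for the real data, 1,000 rows with 15% positives
X_demo = rng.normal(size=(1000, 4))
y_demo = np.zeros(1000, dtype=int)
y_demo[:150] = 1

# stratify=y_demo keeps the class proportions (almost) identical in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.40, random_state=5, stratify=y_demo
)

print("train response rate:", y_tr.mean())
print("test response rate:", y_te.mean())
```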

## Generating the Model - Random Forest Classifier using GridSearchCV


In [5]:
# Use grid search to find the best hyperparameters
param_grid = {
    'n_estimators': [50, 100, 200],
    # 'auto' meant 'sqrt' for classifiers and is no longer accepted by recent scikit-learn releases
    'max_features': ['sqrt'],
    'max_depth': [None, 3, 5, 8],
    'criterion': ['gini'],
    'min_samples_split': [2, 3, 4]
}

# Train the RF models with 5-fold cross-validation
rf_models = GridSearchCV(RandomForestClassifier(random_state=5),
                         param_grid=param_grid, cv=5, verbose=1)

rf_models.fit(X_train, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
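
The counts in the log line above follow from the size of the grid: one candidate per combination of hyperparameter values, each fitted once per fold. A minimal check, where `grid_sizes` is a hypothetical helper dictionary holding the number of values listed for each hyperparameter:

```python
from math import prod

# Number of values listed for each hyperparameter in the grid
grid_sizes = {
    'n_estimators': 3,
    'max_features': 1,
    'max_depth': 4,
    'criterion': 1,
    'min_samples_split': 3,
}
cv_folds = 5

n_candidates = prod(grid_sizes.values())   # one candidate per combination
n_fits = n_candidates * cv_folds           # each candidate is fitted once per fold

print(n_candidates, n_fits)  # 36 180
```

After fitting, the `best_params_` and `best_score_` attributes of the `GridSearchCV` object report the selected combination and its mean cross-validated score. With no `scoring` argument, that score is the classifier's default, accuracy.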


In [6]:
# Get the predictions
predictions = rf_models.predict(X_test)

# Print the model accuracy: the share of test customers classified correctly
print("Accuracy:", metrics.accuracy_score(y_test, predictions))

Accuracy: 0.8707482993197279
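
Accuracy alone says little about campaign profit: with a pilot success rate of about 15%, a model that never predicts a purchase would already be right roughly 85% of the time. Since the stated goal is profit, one option is to score customers by predicted probability and evaluate the profit of contacting only those above a threshold. The sketch below illustrates the idea on toy data. `campaign_profit` is a hypothetical helper, `cost_per_contact=3.0` comes from 6,720 MU spread over 2,240 contacts, and `revenue_per_sale=11.0` is an assumed placeholder rather than a figure from the case.

```python
import numpy as np

def campaign_profit(y_true, proba, threshold,
                    cost_per_contact=3.0, revenue_per_sale=11.0):
    """Profit of contacting only customers with proba >= threshold.

    ASSUMPTION: revenue_per_sale is a hypothetical placeholder value.
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    contacted = proba >= threshold              # who receives a call
    n_contacted = int(contacted.sum())
    n_sales = int(y_true[contacted].sum())      # buyers among those contacted
    return revenue_per_sale * n_sales - cost_per_contact * n_contacted

# Toy labels and predicted probabilities, for illustration only
y_toy = np.array([1, 0, 0, 1, 0, 0, 0, 1])
p_toy = np.array([0.9, 0.2, 0.4, 0.6, 0.1, 0.35, 0.05, 0.3])

for t in (0.0, 0.3, 0.5):
    print(f"threshold {t}: profit {campaign_profit(y_toy, p_toy, t)} MU")
```

In this notebook, the probabilities could come from `rf_models.predict_proba(X_test)[:, 1]`, with the threshold chosen on held-out data rather than on the test set.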


## Feature Importance

In [9]:
# Print Feature Importance

feature_importance = pd.DataFrame(data={"features":X_test.columns, 
                            "importance":rf_models.best_estimator_.feature_importances_*100})

feature_importance.sort_values('importance', 
            ascending=False).head(10).style.background_gradient(cmap='coolwarm', low=1, high=0)

| | features | importance |
|---|---|---|
| 3 | Recency | 8.475428 |
| 37 | AcceptedCmpOverall | 8.151117 |
| 24 | Customer_Days | 7.842309 |
| 0 | Income | 5.412607 |
| 35 | MntTotal | 5.098285 |
| 4 | MntWines | 4.935032 |
| 36 | MntRegularProds | 4.71386 |
| 6 | MntMeatProducts | 4.602892 |
| 18 | AcceptedCmp1 | 4.353262 |
| 23 | Age | 4.002047 |
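
A random forest's `feature_importances_` sum to 1, so the values above (multiplied by 100) can be read as percentages of total importance. The short sketch below adds up the ten values shown to see how much of the total they cover. Grouping the `Mnt*` columns as spending features is an assumption based on their names.

```python
# Top-10 importances copied from the output above (percent of total importance)
top10 = {
    "Recency": 8.475428,
    "AcceptedCmpOverall": 8.151117,
    "Customer_Days": 7.842309,
    "Income": 5.412607,
    "MntTotal": 5.098285,
    "MntWines": 4.935032,
    "MntRegularProds": 4.71386,
    "MntMeatProducts": 4.602892,
    "AcceptedCmp1": 4.353262,
    "Age": 4.002047,
}

top10_share = sum(top10.values())

# ASSUMPTION: columns prefixed with "Mnt" are amounts spent
spend_share = sum(v for name, v in top10.items() if name.startswith("Mnt"))

print(f"top-10 share: {top10_share:.2f}%")        # 57.59%
print(f"Mnt* share within top 10: {spend_share:.2f}%")  # 19.35%
```

The ten listed features account for about 58% of total importance, so the remaining features still carry a substantial share. No single feature dominates.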
