# Customer Analytics Project

## Background:
Simulated data downloaded from the LinkedIn course called "Predictive Customer Analytics".


### *Primary Question*

*Can we predict customer behavior?*


## Outline:
* Part 1 - Customer Propensity
* Part 2 - Customer Preferences
* Part 3 - Predict Customer Lifetime Value (CLV)
* Part 4 - Group Customer Issues
* Part 5 - Identify Customer Retention

***

# PART 1: CUSTOMER PROPENSITY

# Import libraries and load data

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
browsing = pd.read_csv('data/browsing_data.csv')

# Explore the Data

In [3]:
# No missing data; every column is an integer, including the target variable "BUY"
browsing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 12 columns):
SESSION_ID         500 non-null int64
IMAGES             500 non-null int64
REVIEWS            500 non-null int64
FAQ                500 non-null int64
SPECS              500 non-null int64
SHIPPING           500 non-null int64
BOUGHT_TOGETHER    500 non-null int64
COMPARE_SIMILAR    500 non-null int64
VIEW_SIMILAR       500 non-null int64
WARRANTY           500 non-null int64
SPONSORED_LINKS    500 non-null int64
BUY                500 non-null int64
dtypes: int64(12)
memory usage: 47.0 KB


In [4]:
browsing.describe()

| | SESSION_ID | IMAGES | REVIEWS | FAQ | SPECS | SHIPPING | BOUGHT_TOGETHER | COMPARE_SIMILAR | VIEW_SIMILAR | WARRANTY | SPONSORED_LINKS | BUY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 1250.5 | 0.51 | 0.52 | 0.44 | 0.48 | 0.528 | 0.5 | 0.58 | 0.468 | 0.532 | 0.55 | 0.37 |
| std | 144.481833 | 0.500401 | 0.5001 | 0.496884 | 0.5001 | 0.499715 | 0.500501 | 0.494053 | 0.499475 | 0.499475 | 0.497992 | 0.483288 |
| min | 1001.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 1125.75 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1250.5 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.5 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 75% | 1375.25 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 1500.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


In [5]:
# SESSION_ID identifies each website session; every other column is binary (0/1)
# With only binary columns, distribution-based EDA adds little here
browsing.head()

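| | SESSION_ID | IMAGES | REVIEWS | FAQ | SPECS | SHIPPING | BOUGHT_TOGETHER | COMPARE_SIMILAR | VIEW_SIMILAR | WARRANTY | SPONSORED_LINKS | BUY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1002 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1003 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1004 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | 1005 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |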


In [6]:
# Correlation of each variable with the target BUY
browsing.corr()['BUY']

SESSION_ID         0.026677
IMAGES             0.046819
REVIEWS            0.404628
FAQ               -0.095136
SPECS              0.009950
SHIPPING          -0.022239
BOUGHT_TOGETHER   -0.103562
COMPARE_SIMILAR    0.190522
VIEW_SIMILAR      -0.096137
WARRANTY           0.179156
SPONSORED_LINKS    0.110328
BUY                1.000000
Name: BUY, dtype: float64

In [7]:
# Keep only the features whose absolute correlation with BUY is above roughly 0.1
browse_feat = browsing[['REVIEWS','BOUGHT_TOGETHER','COMPARE_SIMILAR','WARRANTY','SPONSORED_LINKS']]
target = browsing.BUY
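
The column list above is typed by hand. As a minimal sketch, the same selection could be derived from the correlations directly. The cutoff of 0.1 is an assumption inferred from which columns were kept, and `corr_with_buy` and `THRESHOLD` are illustrative names; the values are copied from the `browsing.corr()['BUY']` output so the block runs on its own.

```python
import pandas as pd

# Correlations with BUY, copied from the output above.
# SESSION_ID (an identifier) and BUY (the target) are left out.
corr_with_buy = pd.Series({
    "IMAGES": 0.046819,
    "REVIEWS": 0.404628,
    "FAQ": -0.095136,
    "SPECS": 0.009950,
    "SHIPPING": -0.022239,
    "BOUGHT_TOGETHER": -0.103562,
    "COMPARE_SIMILAR": 0.190522,
    "VIEW_SIMILAR": -0.096137,
    "WARRANTY": 0.179156,
    "SPONSORED_LINKS": 0.110328,
})

THRESHOLD = 0.1  # assumed cutoff, not stated in the notebook

# Keep features whose absolute correlation with BUY exceeds the cutoff
selected = corr_with_buy[corr_with_buy.abs() > THRESHOLD].index.tolist()
print(selected)
```

In the notebook itself, `browsing.corr()['BUY'].drop(['SESSION_ID', 'BUY'])` would supply the same series without retyping the values.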

# Build and Train the Model

In [8]:
# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix
import sklearn.metrics
from sklearn.naive_bayes import GaussianNB

In [9]:
# Split Train and Test
X_train, X_test, y_train, y_test = train_test_split(browse_feat, target, test_size=0.30, random_state=123)

In [10]:
# Train the model, then predict on the test set
model = GaussianNB()
model.fit(X_train, y_train)

predict = model.predict(X_test)
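
Since this part is about propensity, a purchase probability may be more useful than a hard 0/1 label. The sketch below shows how `predict_proba` on a fitted `GaussianNB` returns such scores. It runs on randomly generated stand-in data (the arrays `X` and `y` are hypothetical, not the course dataset), so the printed numbers carry no meaning.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Hypothetical stand-in for five binary browsing features
X = rng.integers(0, 2, size=(200, 5))
# Hypothetical target: buying is made more likely when the first feature is 1
y = (rng.random(200) < 0.2 + 0.4 * X[:, 0]).astype(int)

model = GaussianNB().fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_;
# column 1 is the estimated probability of class 1 (a purchase)
propensity = model.predict_proba(X[:5])[:, 1]
print(propensity)
```

With the notebook's fitted model, `model.predict_proba(X_test)[:, 1]` would give one score per test session.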

In [11]:
print(confusion_matrix(y_test,predict))
print(classification_report(y_test,predict))

[[80 20]
 [20 30]]
             precision    recall  f1-score   support

          0       0.80      0.80      0.80       100
          1       0.60      0.60      0.60        50

avg / total       0.73      0.73      0.73       150
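
The figures in the report follow directly from the confusion matrix. As a worked check, the sketch below recomputes them by hand, using scikit-learn's layout (rows are actual classes, columns are predicted classes). It also computes the accuracy of always predicting the majority class, a comparison the notebook does not make; the variable names are illustrative.

```python
# Confusion matrix from the output above: [[TN, FP], [FN, TP]]
tn, fp = 80, 20
fn, tp = 20, 30

total = tn + fp + fn + tp

accuracy = (tp + tn) / total          # share of all sessions classified correctly
precision_buy = tp / (tp + fp)        # of predicted buyers, how many bought
recall_buy = tp / (tp + fn)           # of actual buyers, how many were found
f1_buy = 2 * precision_buy * recall_buy / (precision_buy + recall_buy)

# Accuracy of always predicting the larger class (here: no purchase)
baseline = max(tn + fp, fn + tp) / total

print(f"accuracy={accuracy:.2f} precision={precision_buy:.2f} "
      f"recall={recall_buy:.2f} f1={f1_buy:.2f} baseline={baseline:.2f}")
```

The model's accuracy of about 0.73 sits only modestly above the 0.67 obtained by predicting "no purchase" for every session.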



***
**Accuracy is modest (about 73%), which may be due to the small dataset of 500 sessions. Even so, the result suggests it is possible to predict whether a customer will buy a product from their browsing behavior on the website.**