<b><font size="6">Data-Driven Marketing with Google Merch Store Customer Data and Google Play Store App Reviews</font></b>

    Notebook by Allison Kelly - allisonkelly42@gmail.com

    Blog post - placeholder

    Presentation - placeholder

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Customer-Lifetime-Value-and-Customer-Attrition-Prediction" data-toc-modified-id="Customer-Lifetime-Value-and-Customer-Attrition-Prediction-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Customer Lifetime Value and Customer Attrition Prediction</a></span><ul class="toc-item"><li><span><a href="#Obtaining-the-data" data-toc-modified-id="Obtaining-the-data-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Obtaining the data</a></span></li><li><span><a href="#Cleaning-the-Data" data-toc-modified-id="Cleaning-the-Data-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Cleaning the Data</a></span></li></ul></li><li><span><a href="#NLP-Google-Play-Store-Reviews" data-toc-modified-id="NLP-Google-Play-Store-Reviews-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>NLP Google Play Store Reviews</a></span></li><li><span><a href="#Marketing-Analytics-with-Google-Demo-Account" data-toc-modified-id="Marketing-Analytics-with-Google-Demo-Account-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Marketing Analytics with Google Demo Account</a></span></li><li><span><a href="#Future-Work" data-toc-modified-id="Future-Work-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Future Work</a></span></li></ul></div>

# Introduction

Applying machine learning to the large volumes of consumer history, web traffic, and product reviews that businesses collect can have a measurable impact on marketing campaigns. Statistical and analytical tools can surface patterns that basic dashboards do not track, which can translate into higher conversion rates than traditional marketing strategies achieve. This notebook demonstrates one such approach.

Part one explores customer purchase history on the <a href="https://shop.googlemerchandisestore.com/">Google Merchandise Store</a>. Using machine learning, I will predict the customer lifetime value (CLV) of each customer. These predictions can then be used to segment ads, email marketing campaigns, and other traditional marketing ventures. I will also predict and measure customer attrition, which can inform proactive retention strategies and increase revenue.

Part two uses NLP to parse Google Play Store app reviews, conduct sentiment analysis, and classify reviews. The NLP models can be used to recommend apps and to gather feedback in real time, prompting bug fixes and improvements.

Finally, part three will focus on tracking important metrics and gleaning insights from the Google Analytics demo account for the Google Merchandise Store. Setting benchmarks for KPIs and tracking campaign results are essential to understanding consumer behavior and making informed business decisions.

# Imports

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 
import datetime
import numpy as np
import json # for data cleaning

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Customer Lifetime Value and Customer Attrition Prediction

In marketing, according to <a href="https://en.wikipedia.org/wiki/Customer_lifetime_value">Wikipedia</a>, Customer Lifetime Value (CLV) can be defined as "a prediction of the net profit attributed to the entire future relationship with a customer." CLV is an important metric to track because it can help set your customer acquisition budget, help your marketing team segment your customer base, and guide efforts to increase retention and satisfaction. CLV is expressed as a dollar amount and varies from customer to customer.

<img src="brandwise-clv-bellcurve.gif">
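
To make the definition above concrete, here is a minimal sketch of one common back-of-the-envelope CLV heuristic: average order value times purchase frequency times expected customer lifespan, scaled by a profit margin. This is not the machine learning approach this notebook sets out to build. The `orders` table, the `simple_clv` helper, the lifespan, and the margin are all hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical purchase log: one row per completed order (values are made up).
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "revenue": [20.0, 40.0, 15.0, 10.0, 30.0, 20.0],
})


def simple_clv(orders, expected_lifespan_periods, margin=1.0):
    """Heuristic CLV per customer.

    CLV = average order value * orders per period * lifespan * margin.
    Assumes `orders` covers exactly one period (for example, one year).
    """
    per_customer = orders.groupby("customer_id")["revenue"].agg(["mean", "count"])
    return (per_customer["mean"] * per_customer["count"]
            * expected_lifespan_periods * margin)


# Assumed inputs: a 3-period lifespan and a 50% margin.
clv = simple_clv(orders, expected_lifespan_periods=3, margin=0.5)
print(clv)
```

A heuristic like this treats every customer's past behavior as a constant rate, which is the main limitation a predictive model aims to address.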

## Obtaining the data

This dataset was derived from the Google Merchandise Store demo account as provided by Google BigQuery on Kaggle. You can find the data <a href="https://www.kaggle.com/c/ga-customer-revenue-prediction">here.</a> 

In [2]:
df = pd.read_csv('train.csv') # loading and checking out the dataset
df.head()



Unnamed: 0,channelGrouping,date,device,fullVisitorId,geoNetwork,sessionId,socialEngagementType,totals,trafficSource,visitId,visitNumber,visitStartTime
0,Organic Search,20160902,"{""browser"": ""Chrome"", ""browserVersion"": ""not a...",1131660440785968503,"{""continent"": ""Asia"", ""subContinent"": ""Western...",1131660440785968503_1472830385,Not Socially Engaged,"{""visits"": ""1"", ""hits"": ""1"", ""pageviews"": ""1"",...","{""campaign"": ""(not set)"", ""source"": ""google"", ...",1472830385,1,1472830385
1,Organic Search,20160902,"{""browser"": ""Firefox"", ""browserVersion"": ""not ...",377306020877927890,"{""continent"": ""Oceania"", ""subContinent"": ""Aust...",377306020877927890_1472880147,Not Socially Engaged,"{""visits"": ""1"", ""hits"": ""1"", ""pageviews"": ""1"",...","{""campaign"": ""(not set)"", ""source"": ""google"", ...",1472880147,1,1472880147
2,Organic Search,20160902,"{""browser"": ""Chrome"", ""browserVersion"": ""not a...",3895546263509774583,"{""continent"": ""Europe"", ""subContinent"": ""South...",3895546263509774583_1472865386,Not Socially Engaged,"{""visits"": ""1"", ""hits"": ""1"", ""pageviews"": ""1"",...","{""campaign"": ""(not set)"", ""source"": ""google"", ...",1472865386,1,1472865386
3,Organic Search,20160902,"{""browser"": ""UC Browser"", ""browserVersion"": ""n...",4763447161404445595,"{""continent"": ""Asia"", ""subContinent"": ""Southea...",4763447161404445595_1472881213,Not Socially Engaged,"{""visits"": ""1"", ""hits"": ""1"", ""pageviews"": ""1"",...","{""campaign"": ""(not set)"", ""source"": ""google"", ...",1472881213,1,1472881213
4,Organic Search,20160902,"{""browser"": ""Chrome"", ""browserVersion"": ""not a...",27294437909732085,"{""continent"": ""Europe"", ""subContinent"": ""North...",27294437909732085_1472822600,Not Socially Engaged,"{""visits"": ""1"", ""hits"": ""1"", ""pageviews"": ""1"",...","{""campaign"": ""(not set)"", ""source"": ""google"", ...",1472822600,2,1472822600


In [3]:
print(df.info())

print('\n\nNumber of individual visitors to the Google Merch Store: ', 
      len(df.fullVisitorId.unique()))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 903653 entries, 0 to 903652
Data columns (total 12 columns):
channelGrouping         903653 non-null object
date                    903653 non-null int64
device                  903653 non-null object
fullVisitorId           903653 non-null object
geoNetwork              903653 non-null object
sessionId               903653 non-null object
socialEngagementType    903653 non-null object
totals                  903653 non-null object
trafficSource           903653 non-null object
visitId                 903653 non-null int64
visitNumber             903653 non-null int64
visitStartTime          903653 non-null int64
dtypes: int64(4), object(8)
memory usage: 82.7+ MB
None


Number of individual visitors to the Google Merch Store:  742735


The dataframe consists of 903,653 rows, each describing a visit to the Google Merch Store, from 742,735 unique visitors. Most features are stored as objects; the integer columns hold dates, times, or immutable ID numbers and should be converted to the appropriate types.
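
One reason to treat ID columns as text rather than numbers is that numeric parsing drops leading zeros, which can merge distinct IDs. The sketch below illustrates this on a made-up two-row CSV that stands in for `train.csv`. The ID values are hypothetical, and whether the real file contains such IDs is an assumption worth checking.

```python
import io
import pandas as pd

# Toy CSV standing in for train.csv; the IDs below are made up.
toy_csv = io.StringIO(
    "fullVisitorId,visitId,date\n"
    "0012345,1472830385,20160902\n"
    "12345,1472880147,20160902\n"
)

# Default parsing infers integers for the ID columns.
as_default = pd.read_csv(toy_csv)

# Rewind the buffer and read the ID columns as strings instead.
toy_csv.seek(0)
as_text = pd.read_csv(toy_csv, dtype={"fullVisitorId": str, "visitId": str})

print(as_default["fullVisitorId"].nunique())  # leading zeros dropped, IDs collide
print(as_text["fullVisitorId"].nunique())     # IDs kept distinct
```

Passing `dtype` at load time also avoids a separate `astype(str)` step later on.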

## Cleaning the Data

In [4]:

def datetime_edits(dataframe):
    
    """This function changes the two columns 
    with dates/times into datetime objects.
    To be used with dataframes with similar
    setup to this one."""
    
    # astype returns a new Series, so the result must be assigned back
    dataframe.date = dataframe.date.astype(str)
    dataframe.date = pd.to_datetime(dataframe.date, 
                                    format='%Y%m%d')

    dataframe.visitStartTime = pd.to_datetime(dataframe.visitStartTime, unit='s')
    
    # confirming proper edits were made
    return "Date column:", dataframe.date.dtype, "Time column:", dataframe.visitStartTime.dtype


In [5]:
datetime_edits(df)

('Date column:', dtype('<M8[ns]'), 'Time column:', dtype('<M8[ns]'))

In [6]:
# changing immutable IDs to strings
df.visitId = df.visitId.astype(str)

In [7]:

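def unpack_col(df, col):
    
    """This function parses a column of JSON strings
    into dictionaries (in place) and returns a new
    dataframe with one column per JSON key."""
    
    # parse each JSON string into a dict; the source column is overwritten
    df[col] = df[col].dropna().apply(json.loads)
    # expand each dict into columns (slow on large dataframes)
    unpacked_df = df[col].apply(pd.Series)
        
    return unpacked_df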
        
device_df = unpack_col(df, 'device')
geo_df = unpack_col(df, 'geoNetwork')
totals_df = unpack_col(df, 'totals')
traffic_df = unpack_col(df, 'trafficSource')
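
Expanding dictionaries with `apply(pd.Series)` can be slow on a dataframe of this size. As a possible alternative, the sketch below uses `pd.json_normalize` on a made-up two-row frame. The helper name `unpack_col_fast` and the toy data are hypothetical, and the sketch assumes the column has no missing values, which matches the `df.info()` output above. `max_level=0` keeps nested dictionaries as single values rather than flattening them into dotted column names.

```python
import json
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "device": [
        '{"browser": "Chrome", "isMobile": false}',
        '{"browser": "Firefox", "isMobile": true}',
    ]
})


def unpack_col_fast(frame, col):
    """Parse a JSON-string column and return its keys as columns.

    Unlike unpack_col, this leaves `frame` unmodified.
    Assumes the column contains no missing values.
    """
    parsed = frame[col].map(json.loads)
    # max_level=0 expands only top-level keys; nested dicts stay as dicts.
    unpacked = pd.json_normalize(parsed.tolist(), max_level=0)
    # Reuse the original index so the result aligns in pd.concat(axis=1).
    unpacked.index = frame.index
    return unpacked


device_toy = unpack_col_fast(toy, "device")
print(device_toy)
```

Because this version does not overwrite the source column, the original JSON strings remain available if the parsing needs to be repeated.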

In [9]:
df_expanded = pd.concat([df,device_df, geo_df, totals_df, traffic_df], axis=1)

In [10]:
df_expanded.head()

Unnamed: 0,channelGrouping,date,device,fullVisitorId,geoNetwork,sessionId,socialEngagementType,totals,trafficSource,visitId,...,transactionRevenue,campaign,source,medium,keyword,adwordsClickInfo,isTrueDirect,referralPath,adContent,campaignCode
0,Organic Search,2016-09-02,"{'browser': 'Chrome', 'browserVersion': 'not a...",1131660440785968503,"{'continent': 'Asia', 'subContinent': 'Western...",1131660440785968503_1472830385,Not Socially Engaged,"{'visits': '1', 'hits': '1', 'pageviews': '1',...","{'campaign': '(not set)', 'source': 'google', ...",1472830385,...,,(not set),google,organic,(not provided),{'criteriaParameters': 'not available in demo ...,,,,
1,Organic Search,2016-09-02,"{'browser': 'Firefox', 'browserVersion': 'not ...",377306020877927890,"{'continent': 'Oceania', 'subContinent': 'Aust...",377306020877927890_1472880147,Not Socially Engaged,"{'visits': '1', 'hits': '1', 'pageviews': '1',...","{'campaign': '(not set)', 'source': 'google', ...",1472880147,...,,(not set),google,organic,(not provided),{'criteriaParameters': 'not available in demo ...,,,,
2,Organic Search,2016-09-02,"{'browser': 'Chrome', 'browserVersion': 'not a...",3895546263509774583,"{'continent': 'Europe', 'subContinent': 'South...",3895546263509774583_1472865386,Not Socially Engaged,"{'visits': '1', 'hits': '1', 'pageviews': '1',...","{'campaign': '(not set)', 'source': 'google', ...",1472865386,...,,(not set),google,organic,(not provided),{'criteriaParameters': 'not available in demo ...,,,,
3,Organic Search,2016-09-02,"{'browser': 'UC Browser', 'browserVersion': 'n...",4763447161404445595,"{'continent': 'Asia', 'subContinent': 'Southea...",4763447161404445595_1472881213,Not Socially Engaged,"{'visits': '1', 'hits': '1', 'pageviews': '1',...","{'campaign': '(not set)', 'source': 'google', ...",1472881213,...,,(not set),google,organic,google + online,{'criteriaParameters': 'not available in demo ...,,,,
4,Organic Search,2016-09-02,"{'browser': 'Chrome', 'browserVersion': 'not a...",27294437909732085,"{'continent': 'Europe', 'subContinent': 'North...",27294437909732085_1472822600,Not Socially Engaged,"{'visits': '1', 'hits': '1', 'pageviews': '1',...","{'campaign': '(not set)', 'source': 'google', ...",1472822600,...,,(not set),google,organic,(not provided),{'criteriaParameters': 'not available in demo ...,True,,,


In [11]:
df_expanded.to_csv('expanded_df.csv')
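
CSV files do not store column types, so datetime and string ID columns need to be restored when the saved file is read back. The sketch below shows one way to round-trip a made-up two-row frame. An in-memory buffer stands in for `expanded_df.csv`, and the column values are hypothetical.

```python
import io
import pandas as pd

# Toy frame standing in for df_expanded; the values are made up.
toy = pd.DataFrame({
    "fullVisitorId": ["0012345", "12345"],
    "date": pd.to_datetime(["2016-09-02", "2016-09-03"]),
})

buffer = io.StringIO()           # stands in for 'expanded_df.csv'
toy.to_csv(buffer, index=False)  # index=False avoids an extra "Unnamed: 0" column on reload
buffer.seek(0)

# Restore the intended types while reading the file back.
reloaded = pd.read_csv(buffer, dtype={"fullVisitorId": str}, parse_dates=["date"])
print(reloaded.dtypes)
```

A binary format such as Parquet or pickle would preserve the types without these extra arguments, at the cost of a file that is not human-readable.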

# NLP Google Play Store Reviews

# Marketing Analytics with Google Demo Account

# Future Work