# Machine Learning: Classification 
The goal of this notebook is to use session-level data from Google Analytics to classify whether or not each session resulted in a transaction. The data was sourced from the Google Analytics sample dataset in BigQuery, with the query and data maintained in this notebook. Marketers can use the results to see which targeting features (such as campaign, device type, and country) have the highest impact on conversion probability!

## Part 0: Package Import
Goal:
- Import requisite packages

In [1]:
# Package Import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import scipy
import statsmodels.formula.api as smf
import statsmodels.api as sm
from sklearn import model_selection, preprocessing, feature_selection, ensemble, linear_model, metrics, decomposition
import datetime

## Part 1: Data Import
Goal:
- Read in raw data from working directory
- Note: data was sourced from BigQuery

In [2]:
# Data Import
rawData = pd.read_csv(r"rawData.csv", parse_dates = ['date'])

rawData.head()

| | date | campaign | transactions | pageviews | ismobile | country |
|---|---|---|---|---|---|---|
| 0 | 2017-02-25 | AW - Dynamic Search Ads Whole Site | 0 | 1 | False | United States |
| 1 | 2017-02-25 | Data Share Promo | 0 | 1 | False | India |
| 2 | 2017-02-25 | Data Share Promo | 0 | 1 | False | China |
| 3 | 2017-02-25 | Data Share Promo | 0 | 1 | False | Spain |
| 4 | 2017-02-25 | AW - Accessories | 0 | 1 | False | United States |
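The `parse_dates=['date']` argument makes pandas convert `date` into a datetime column while reading, instead of leaving it as text. As a self-contained sketch, the snippet below re-reads the five rows printed above from an in-memory string (`io.StringIO` standing in for `rawData.csv`, an assumption made only so the example runs anywhere), once without and once with `parse_dates`.

```python
import io

import pandas as pd

# The five rows printed by rawData.head() above, copied as CSV text.
# io.StringIO stands in for rawData.csv so this runs on its own (assumption).
sample_csv = """date,campaign,transactions,pageviews,ismobile,country
2017-02-25,AW - Dynamic Search Ads Whole Site,0,1,False,United States
2017-02-25,Data Share Promo,0,1,False,India
2017-02-25,Data Share Promo,0,1,False,China
2017-02-25,Data Share Promo,0,1,False,Spain
2017-02-25,AW - Accessories,0,1,False,United States
"""

# Without parse_dates, 'date' stays a column of plain strings.
as_text = pd.read_csv(io.StringIO(sample_csv))

# With parse_dates, 'date' becomes a datetime64 column at read time,
# which unlocks the .dt accessor (day of week, month, and so on).
as_dates = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])

print(as_text["date"].dtype)
print(as_dates["date"].dtype)
print(as_dates["date"].dt.day_name().iloc[0])  # Saturday
```

pandas also recognizes the literal strings `True`/`False` by default, which is why `ismobile` arrives as `bool` without extra arguments.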


## Part 2: Exploratory Analysis

In [3]:
# Info on the dataset
rawData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   date          5000 non-null   datetime64[ns]
 1   campaign      5000 non-null   object        
 2   transactions  5000 non-null   int64         
 3   pageviews     5000 non-null   int64         
 4   ismobile      5000 non-null   bool          
 5   country       5000 non-null   object        
dtypes: bool(1), datetime64[ns](1), int64(2), object(2)
memory usage: 200.3+ KB
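`info()` reports no missing values, two `object` columns (`campaign`, `country`), one `bool` (`ismobile`) and one `datetime64` (`date`). Most scikit-learn estimators need all-numeric input, so these columns have to be encoded before modeling. The sketch below is one hedged option, not necessarily the preprocessing this notebook uses later: `to_design_matrix` is a hypothetical helper that one-hot encodes the object columns with `pd.get_dummies`, casts `ismobile` to 0/1 and, as a simplifying assumption, drops `date` and the `transactions` target.

```python
import pandas as pd


def to_design_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn the raw session frame into all-numeric features."""
    # Assumption: 'date' is not used as a feature; 'transactions' is the target.
    features = df.drop(columns=["date", "transactions"])
    # bool -> 0/1 integers
    features["ismobile"] = features["ismobile"].astype(int)
    # One-hot encode the two object columns reported by info()
    return pd.get_dummies(features, columns=["campaign", "country"], dtype=int)


# Toy input built from the first three rows shown by rawData.head()
sample = pd.DataFrame({
    "date": pd.to_datetime(["2017-02-25"] * 3),
    "campaign": ["AW - Dynamic Search Ads Whole Site", "Data Share Promo", "Data Share Promo"],
    "transactions": [0, 0, 0],
    "pageviews": [1, 1, 1],
    "ismobile": [False, False, False],
    "country": ["United States", "India", "China"],
})

X = to_design_matrix(sample)
print(X.columns.tolist())
```

Each campaign and country becomes its own 0/1 indicator column. With many distinct countries the matrix gets wide, so grouping rare levels may be worth considering on the full data.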


In [4]:
# Describing the dataset
rawData.describe()

| | transactions | pageviews |
|---|---|---|
| count | 5000.0 | 5000.0 |
| mean | 0.0078 | 4.1698 |
| std | 0.087981 | 6.250407 |
| min | 0.0 | 1.0 |
| 25% | 0.0 | 1.0 |
| 50% | 0.0 | 2.0 |
| 75% | 0.0 | 5.0 |
| max | 1.0 | 108.0 |
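Because `transactions` only takes the values 0 and 1 (min 0, max 1), it can serve directly as the binary target, and its mean of 0.0078 is the share of sessions with a transaction: 0.0078 × 5000 = 39 converting sessions out of 5,000. The worked arithmetic below uses only the figures printed above. It also shows why plain accuracy is a weak metric for this data: a model that always predicts "no transaction" is already 99.22% accurate.

```python
import pandas as pd

# Summary figures copied from the describe() output above
n_sessions = 5000
mean_transactions = 0.0078  # 0/1 column, so the mean is the positive-class rate

# Number of sessions with and without a transaction
n_positive = round(n_sessions * mean_transactions)
n_negative = n_sessions - n_positive
print(n_positive, n_negative)  # 39 4961

# A label vector with the same class counts (row order is irrelevant here)
labels = pd.Series([1] * n_positive + [0] * n_negative, name="transactions")
print(labels.value_counts(normalize=True))

# Accuracy of a trivial model that always predicts "no transaction"
majority_baseline_accuracy = (labels == 0).mean()
print(majority_baseline_accuracy)  # 0.9922
```

Metrics that focus on the rare class, such as precision, recall, or ROC AUC from `sklearn.metrics` (already imported in Part 0), are more informative than accuracy at this level of imbalance.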


In [5]:
# Counting how many transactions we have to work with

# Total transactions
totalTransactions = rawData['transactions'].sum()

# Counting transactions by campaign
campaignTransactions = rawData[['campaign', 'transactions']].groupby('campaign').sum()

campaignTransactions

| campaign | transactions |
|---|---|
| AW - Accessories | 22 |
| AW - Apparel | 0 |
| AW - Dynamic Search Ads Whole Site | 17 |
| AW - Electronics | 0 |
| All Products | 0 |
| Data Share Promo | 0 |
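The campaign totals (22 + 17 = 39) match the 39 converting sessions implied by `describe()`, and only two campaigns, AW - Accessories and AW - Dynamic Search Ads Whole Site, account for any transactions. Raw sums still ignore how many sessions each campaign drove, so a high-traffic campaign can look better simply by volume. The sketch below first repeats the consistency check with the copied totals, then defines a hypothetical `campaign_summary` helper that uses pandas named aggregation to report sessions, conversions, and conversion rate side by side. It is demonstrated on the five `head()` rows, none of which converted, so every rate there is 0.

```python
import pandas as pd

# Per-campaign totals copied from the groupby output above
campaign_transactions = pd.Series({
    "AW - Accessories": 22,
    "AW - Apparel": 0,
    "AW - Dynamic Search Ads Whole Site": 17,
    "AW - Electronics": 0,
    "All Products": 0,
    "Data Share Promo": 0,
})

# Sanity check: campaign totals should equal mean * count from describe()
assert campaign_transactions.sum() == round(5000 * 0.0078)  # 39


def campaign_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sessions, conversions and conversion rate per campaign."""
    return (
        df.groupby("campaign")
        .agg(
            sessions=("transactions", "size"),
            conversions=("transactions", "sum"),
            conversion_rate=("transactions", "mean"),
        )
        .sort_values("conversion_rate", ascending=False)
    )


# Demo on the five rows shown by rawData.head(); none converted, so all rates are 0.
head_rows = pd.DataFrame({
    "campaign": ["AW - Dynamic Search Ads Whole Site", "Data Share Promo",
                 "Data Share Promo", "Data Share Promo", "AW - Accessories"],
    "transactions": [0, 0, 0, 0, 0],
})
summary = campaign_summary(head_rows)
print(summary)
```

Applied to the full `rawData`, the same helper would rank campaigns by conversion rate rather than by raw transaction count.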
