# Looking at event session data in R using Google Analytics
To save time, I use the Query Explorer at https://ga-dev-tools.appspot.com/explorer/ to build the query before I run it here.

In [8]:
setwd("C:/Users/Bill/Desktop/Tahzoo.com Redesign/GoogleAnalytics/Experiments")
library(RGoogleAnalytics)

Loading required package: lubridate

Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date

Loading required package: httr


In [94]:
load("C:/Users/Bill/Documents/google-analytics/GoogleAnalytics/toekn_file")
ValidateToken(token)

Access Token successfully updated


The process for building a query is very straightforward. I usually play around with the Query Explorer provided by Google until I have the table that I want, and then just type the arguments in here.

In [78]:
days_back <- 120
ql = Init(start.date = format(Sys.Date()-days_back,"%Y-%m-%d"), #note the super clever way of counting days
            end.date = format(Sys.Date(),"%Y-%m-%d"),  #you can enter specific dates as well. 
            metrics =  "ga:sessions,ga:totalEvents",
            dimensions = "ga:eventLabel",
            max.results = 10000,            # 10000 is the per-request maximum; larger result sets must be paginated
            table.id = "ga:56928074")

q2 = Init(start.date = format(Sys.Date()-days_back,"%Y-%m-%d"), 
            end.date = format(Sys.Date(),"%Y-%m-%d"),
            metrics =  "ga:users",
            max.results = 10000,   
            table.id = "ga:56928074")

paste("From:", format(Sys.Date()-days_back,"%Y-%m-%d"), "To:", format(Sys.Date(),"%Y-%m-%d"))
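
The date arithmetic used in both queries can be wrapped in a small helper so the start and end strings are built in one place. This is only a sketch: `date_window` is a hypothetical name, not part of RGoogleAnalytics, and the reference date below is an arbitrary fixed value chosen so the result is reproducible.

```r
# Hypothetical helper: start/end date strings for a look-back window.
# `today` defaults to the current date but can be fixed for reproducibility.
date_window <- function(days_back, today = Sys.Date()) {
  c(start = format(today - days_back, "%Y-%m-%d"),
    end   = format(today, "%Y-%m-%d"))
}

# Arbitrary fixed reference date (an assumption, not taken from the analysis)
w <- date_window(120, today = as.Date("2016-06-01"))
print(w)
```

The two elements could then be passed as `start.date = w[["start"]]` and `end.date = w[["end"]]`.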



In [79]:
gq = QueryBuilder(ql)
gq2 = QueryBuilder(q2)

gd = GetReportData(gq, token, paginate_query = FALSE)
users = GetReportData(gq2, token)

Status of Query:
The API returned 4723 results
Status of Query:
The API returned 1 results


In [62]:
head(gd)

                   eventLabel sessions totalEvents
1                                 2628           2
2 GA1.2.1000767845.1461767781        2           2
3 GA1.2.1001075852.1456520683        2           4
4 GA1.2.1001299827.1464302474        1           2
5 GA1.2.1001361534.1458136980        4           5
6 GA1.2.1001524715.1453749442        5           6


In [63]:
c("Share of sessions with no GA code", gd$sessions[gd$eventLabel==""]/sum(gd$sessions))
c("Total sessions in dataset",sum(gd$sessions))

In [64]:
c("Number of unique visitors who are tracked",length(unique(gd$eventLabel))-1)
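
To make the two calculations above concrete, here is a small sketch that runs them on the six rows shown in the `head(gd)` output. The variable names are my own, and because the sample includes the one large blank-label row, the share it produces is not representative of the full dataset.

```r
# First six rows of gd, copied from the head(gd) output
gd_head <- data.frame(
  eventLabel = c("",
                 "GA1.2.1000767845.1461767781",
                 "GA1.2.1001075852.1456520683",
                 "GA1.2.1001299827.1464302474",
                 "GA1.2.1001361534.1458136980",
                 "GA1.2.1001524715.1453749442"),
  sessions    = c(2628, 2, 2, 1, 4, 5),
  totalEvents = c(2, 2, 4, 2, 5, 6),
  stringsAsFactors = FALSE
)

# Rows where no GA code was captured in the event label
untagged <- gd_head$eventLabel == ""

# Share of sessions that carry no GA code (within this six-row sample only)
untagged_share <- sum(gd_head$sessions[untagged]) / sum(gd_head$sessions)

# Distinct tracked visitors, counted without assuming a blank row exists
tracked_visitors <- length(unique(gd_head$eventLabel[!untagged]))

print(untagged_share)
print(tracked_visitors)
```

Counting labels after excluding the blank row gives the same answer as `length(unique(...)) - 1` whenever a blank label is present, and it stays correct if the blank row is ever absent.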

In [65]:
gd$eventsPerSession <- gd$totalEvents/gd$sessions
summary(gd)

  eventLabel           sessions         totalEvents      eventsPerSession   
 Length:4723        Min.   :   1.000   Min.   :  1.000   Min.   : 0.000761  
 Class :character   1st Qu.:   1.000   1st Qu.:  1.000   1st Qu.: 1.000000  
 Mode  :character   Median :   2.000   Median :  3.000   Median : 1.000000  
                    Mean   :   4.985   Mean   :  6.266   Mean   : 1.268770  
                    3rd Qu.:   5.000   3rd Qu.:  6.000   3rd Qu.: 1.250000  
                    Max.   :2628.000   Max.   :627.000   Max.   :21.000000  

I'm going to drop the blank label from the dataset because it's bad data. I just wanted to show that untagged sessions are a small portion of the total: 2,628 of roughly 23,500 sessions, or about 11%.

In [66]:
clean_gd <- gd[gd$eventLabel!="",]
summary(clean_gd)

  eventLabel           sessions        totalEvents      eventsPerSession
 Length:4722        Min.   :  1.000   Min.   :  1.000   Min.   : 1.000  
 Class :character   1st Qu.:  1.000   1st Qu.:  1.000   1st Qu.: 1.000  
 Mode  :character   Median :  2.000   Median :  3.000   Median : 1.000  
                    Mean   :  4.429   Mean   :  6.267   Mean   : 1.269  
                    3rd Qu.:  5.000   3rd Qu.:  6.000   3rd Qu.: 1.250  
                    Max.   :231.000   Max.   :627.000   Max.   :21.000  

In [80]:
users

  users
1  7806


In [89]:
c("Users without a tracking code (count, share of total users): ",
      users[1,"users"]-length(unique(clean_gd$eventLabel)),
      1-(length(unique(clean_gd$eventLabel))/users[1,"users"]))
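
The same arithmetic can be worked through with the numbers printed earlier: 7806 users from the `users` table and 4722 rows in `clean_gd`. This sketch assumes each row of `clean_gd` is a distinct tracking code, which follows from `ga:eventLabel` being the only dimension in that query. The variable names are my own.

```r
total_users   <- 7806  # from the `users` output
tracked_codes <- 4722  # rows in clean_gd, assumed one per tracking code

# Users that Google Analytics counted but that have no tracking code here
untracked_n     <- total_users - tracked_codes
untracked_share <- untracked_n / total_users

msg <- sprintf("%d of %d users (%.1f%%) have no tracking code in the dataset",
               untracked_n, total_users, 100 * untracked_share)
print(msg)
```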

The GA tracking code most likely changes when visitors delete cookies or set their browser to refuse them, because Google Analytics reports more unique users (7,806) than I have GA tracking codes in my dataset (4,722). Although cookies cannot be used to target ALL individuals for personalized messaging, they can be used to build models that indicate which items are predictors of certain behavior groups, or "Personas". For those models I'm turning to the scikit-learn library in Python, but I'm including the data extraction here because it pertains to the RGoogleAnalytics library.

In [100]:
q3 = Init(start.date = format(Sys.Date()-days_back,"%Y-%m-%d"), 
            end.date = format(Sys.Date(),"%Y-%m-%d"),
            metrics =  "ga:users",   #users is just a placeholder here, I just want a list of events and labels
            dimensions = "ga:eventLabel, ga:eventCategory",
            max.results = 10000,   
            filters = "ga:eventCategory!=::",  # filter out events where the category is blank; "::" is a blank category
            table.id = "ga:56928074")

gq3 = QueryBuilder(q3)
events = GetReportData(gq3, token, paginate_query = TRUE)


Access Token is valid
Getting data starting at row 10001 
The API returned 18648 results
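
The "Getting data starting at row 10001" message is consistent with the results being fetched in pages of 10,000 rows. The sketch below shows only the page arithmetic, not how the library implements it internally; `page_starts` is a hypothetical helper of my own.

```r
# Hypothetical helper: first row index of each page in a paginated pull.
page_starts <- function(total_results, page_size = 10000) {
  seq(from = 1, to = total_results, by = page_size)
}

# 18648 is the result count reported by the API in the output above
starts <- page_starts(18648)
print(starts)
```

With 18,648 results, two requests are needed, and the second begins at row 10001, which matches the log line.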


In [98]:
summary(events)

  eventLabel        eventCategory          users  
 Length:19228       Length:19228       Min.   :1  
 Class :character   Class :character   1st Qu.:1  
 Mode  :character   Mode  :character   Median :1  
                                       Mean   :1  
                                       3rd Qu.:1  
                                       Max.   :2  

In [101]:
head(events)

                   eventLabel                                         eventCategory users
1 GA1.2.1000767845.1461767781                                               ::ABOUT     1
2 GA1.2.1000767845.1461767781 column twelve::BRAD HEIDEMANN Chief Executive Officer     1
3 GA1.2.1001075852.1456520683                                             ::Clients     1
4 GA1.2.1001075852.1456520683                                      slick-next::Next     1
5 GA1.2.1001361534.1458136980                                               ::ABOUT     1
6 GA1.2.1001361534.1458136980                      list__item__link link::Locations     1
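
Since this table is headed for a modeling step, one possible reshaping is a visitor-by-category indicator matrix, with one row per tracking code and a 1 wherever that visitor triggered a given event category. This is only a sketch of one option, run on the six rows shown by `head(events)`; it is not necessarily the layout the later models use, and the variable names are my own.

```r
# First six rows of events, copied from the head(events) output
events_head <- data.frame(
  eventLabel = c("GA1.2.1000767845.1461767781",
                 "GA1.2.1000767845.1461767781",
                 "GA1.2.1001075852.1456520683",
                 "GA1.2.1001075852.1456520683",
                 "GA1.2.1001361534.1458136980",
                 "GA1.2.1001361534.1458136980"),
  eventCategory = c("::ABOUT",
                    "column twelve::BRAD HEIDEMANN Chief Executive Officer",
                    "::Clients",
                    "slick-next::Next",
                    "::ABOUT",
                    "list__item__link link::Locations"),
  users = 1,
  stringsAsFactors = FALSE
)

# Cross-tabulate visitors against categories, then reduce counts to 0/1 flags
counts   <- table(visitor  = events_head$eventLabel,
                  category = events_head$eventCategory)
features <- 1 * (unclass(counts) > 0)

print(dim(features))
```

Each row of `features` can then be treated as one visitor's feature vector.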


In [104]:
write.table(events, file = "labels_categories_actions.tsv", sep = "\t",
            fileEncoding = "utf-8")
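
By default `write.table` also writes the row names, and it gives that column no header, so the header line has one fewer field than the data rows. If the file is going to be read outside R, a variant without row names or quoting may be easier to consume. The sketch below writes two sample rows to a temporary file and reads them back as a check; it assumes no field contains a tab character.

```r
# Two sample rows copied from the head(events) output
events_sample <- data.frame(
  eventLabel    = c("GA1.2.1000767845.1461767781",
                    "GA1.2.1001075852.1456520683"),
  eventCategory = c("::ABOUT", "slick-next::Next"),
  users         = c(1, 1),
  stringsAsFactors = FALSE
)

# Temporary file so the example does not overwrite the real export
out_file <- tempfile(fileext = ".tsv")

# row.names = FALSE keeps the header aligned with the data columns;
# quote = FALSE is safe only while no field contains a tab
write.table(events_sample, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE, fileEncoding = "UTF-8")

# Read the file back to confirm the round trip
check <- read.delim(out_file, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
print(names(check))
```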