Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting data into multiple rows when using anti_sample=true #49

Closed
martijnvv opened this issue Dec 7, 2016 · 7 comments
Closed

Splitting data into multiple rows when using anti_sample=true #49

martijnvv opened this issue Dec 7, 2016 · 7 comments
Labels

Comments

@martijnvv
Copy link

martijnvv commented Dec 7, 2016

I run the following code with or without anti_sample:

cf <- dim_filter("dimension7","EXACT",campaign_name,not = FALSE)
fc <- filter_clause_ga4(list(cf))
gaDataFunnel_dev1 <- google_analytics_4(viewId, 
                                    date_range = c(dateStart, dateYesterday),
                                    dimensions=c("deviceCategory", "dimension7"), 
                                    metrics = c("uniquePageviews"),
                                    order = order_type("deviceCategory", "ASCENDING"),
                                    dim_filters = fc,
                                    anti_sample = TRUE)

The code with anti_sample=true returns the same values separated into two rows:
anti_sample true

Without anti_sample=true, the table is no longer split into multiple rows:
no_anti_sample true

This seems to occur only in the past few days. It didn't happen before.
I'm running version 0.3.0 of the package

@MarkEdmondson1234
Copy link
Collaborator

Thanks - may I ask how many rows are in the tables you are fetching? Also the results of sessionInfo() just to check all dependency versions. You may want to try using latest version of googleAuthR from Github to see if that helps.

@martijnvv
Copy link
Author

The table has 6 vs 3 rows (deviceCategory).
I just ran the same query after updating the googleAuthR version from github. Issue persists.

See sessionInfo() below:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gdata_2.17.0           ggthemes_3.2.0         ggplot2_2.1.0          highcharter_0.4.0     
 [5] forecast_7.3           timeDate_3012.100      zoo_1.7-13             reshape2_1.4.1        
 [9] rmarkdown_1.1          RJSONIO_1.3-0          tidyjson_0.2.1         jsonlite_1.1          
[13] knitr_1.14             lubridate_1.6.0        googlesheets_0.2.1     dplyr_0.5.0           
[17] openxlsx_3.0.0         RGA_0.4.2              googleAnalyticsR_0.3.0

@MarkEdmondson1234
Copy link
Collaborator

Weird, ok thanks, will take a look.

@MarkEdmondson1234
Copy link
Collaborator

Hi @martijnvv , I can't replicate this with the below code:

> dateStart <- Sys.Date() - 30
> dateYesterday <- Sys.Date() - 1
> cf <- dim_filter("source","EXACT","google",not = FALSE)
> fc <- filter_clause_ga4(list(cf))
> gaDataFunnel_dev1 <- google_analytics_4(viewId, 
+                                         date_range = c(dateStart, dateYesterday),
+                                         dimensions=c("deviceCategory", "source"), 
+                                         metrics = c("uniquePageviews"),
+                                         order = order_type("deviceCategory", "ASCENDING"),
+                                         dim_filters = fc,
+                                         anti_sample = TRUE)
anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Auto-refreshing stale OAuth token.
Downloaded [3] rows from a total of [3].
No sampling found, returning call
Downloaded [3] rows from a total of [3].

Could you perhaps also report the message feedback when you set options(googleAuthR.verbose = 1) before running your call.

And also install the version of googleAnalyticsR from Github v0.3.0.9000, there was a bug with zero-rows in anti-sample calls that may have an effect.

@martijnvv
Copy link
Author

martijnvv commented Dec 8, 2016

Hi @MarkEdmondson1234 ,

I first updated to version 0.3.0.9000, that made no difference...
Below is the response to my query. Your query may not have this issue, because there is no unsampling of the data required in your request.

anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":10,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 639
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":10,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:27 GMT
<- server: ESF
<- cache-control: private
<- content-length: 364
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Data is sampled, based on 90.3% of sessions.
Finding number of sessions for anti-sample calculations...
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:date"}],"metrics":[{"expression":"ga:sessions","alias":"sessions","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":1000,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 328
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:date"}],"metrics":[{"expression":"ga:sessions","alias":"sessions","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":1000,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:28 GMT
<- server: ESF
<- cache-control: private
<- content-length: 465
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [21] rows from a total of [21].
Calculated [2] batches are needed to download approx. [4] rows unsampled.
Found [447099] sampleReadCounts from a [549840] samplingSpaceSize.
Anti-sample call covering 16 days: 2016-11-17, 2016-12-02
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-02"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 638
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-02"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:31 GMT
<- server: ESF
<- cache-control: private
<- content-length: 338
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Anti-sample call covering 5 days: 2016-12-03, 2016-12-07
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-12-03","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 638
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-12-03","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:33 GMT
<- server: ESF
<- cache-control: private
<- content-length: 321
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Finished unsampled data request, total rows [6]
Successfully avoided sampling

@MarkEdmondson1234
Copy link
Collaborator

@martijnvv The latest Github version now includes a aggregateGAData function that aggregates across dimensions, which is useful on its own, but I have also applied it to the results of anti_sample runs which should solve the above issue. Let me know if it works, in particular keep an eye on if a metric should be summed or averaged as this is performed by the library and not via the API as is done in other calls, so could be different and have strange behaviour (things like bounceRate, avgTimeOnsite etc.)

@martijnvv
Copy link
Author

@MarkEdmondson1234 Consider that bug squashed. I haven't run any queries with averaged values or percentages, only absolutes.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants