calls with slow_fetch = TRUE only return the first data batch #321

Closed
wdwatkins opened this issue Jun 20, 2020 · 4 comments

wdwatkins commented Jun 20, 2020

What goes wrong

When slow_fetch = TRUE is set, only the first batch of data (up to 10,000 rows) is returned, even when more rows are available. See #320 for a potential fix.

Steps to reproduce the problem

library(googleAnalyticsR)

# First authenticate with your Google account, e.g. via ga_auth()
# This is a fairly high-traffic view
view_id <- "xxxxx"
large_load_time <- google_analytics(viewId = view_id,
                                    date_range = c("2020-05-01", "2020-05-31"),
                                    dimensions = c("pagePath"),
                                    metrics = c("pageLoadSample", "avgPageLoadTime",
                                                "avgPageDownloadTime",
                                                "avgDomContentLoadedTime",
                                                "exitRate"),
                                    anti_sample = TRUE, slow_fetch = TRUE)

Expected output

All available rows are returned

ℹ 2020-06-19 16:51:42 > anti_sample set to TRUE. Mitigating sampling via multiple API calls.
ℹ 2020-06-19 16:51:42 > Finding how much sampling in data request...
ℹ 2020-06-19 16:51:42 > Downloaded [ 10 ] rows from a total of [ 16660 ].
ℹ 2020-06-19 16:51:42 > No sampling found, returning call
ℹ 2020-06-19 16:51:42 > Slow fetch: [ 0 ] from estimated actual Rows [ 9999 ]
ℹ 2020-06-19 16:51:48 > Downloaded [ 6660 ] rows from a total of [ 16660 ].
ℹ 2020-06-19 16:51:48 > All data downloaded, total of [ 16660 ] - download time:  5.49545 secs
nrow(large_load_time)
[1] 16660

Actual output

(with verbose logging set to level 2)

ℹ 2020-06-19 16:59:20 > anti_sample set to TRUE. Mitigating sampling via multiple API calls.
ℹ 2020-06-19 16:59:20 > Finding how much sampling in data request...
ℹ 2020-06-19 16:59:20 > Calling APIv4....
ℹ 2020-06-19 16:59:20 > Single v4 batch
ℹ 2020-06-19 16:59:20 > Token exists.
ℹ 2020-06-19 16:59:20 > Request:  https://analyticsreporting.googleapis.com/v4/reports:batchGet/?quotaUser=wwatkins
ℹ 2020-06-19 16:59:20 > Body JSON parsed to:  {"reportRequests":[{"viewId":"ga:xxxx","dateRanges":[{"startDate":"2020-05-01","endDate":"2020-05-31"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:pagePath"}],"metrics":[{"expression":"ga:pageLoadSample","alias":"pageLoadSample","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageLoadTime","alias":"avgPageLoadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageDownloadTime","alias":"avgPageDownloadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgDomContentLoadedTime","alias":"avgDomContentLoadedTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:exitRate","alias":"exitRate","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":10,"includeEmptyRows":true}]}
ℹ 2020-06-19 16:59:20 > Downloaded [ 10 ] rows from a total of [ 16660 ].
ℹ 2020-06-19 16:59:20 > No sampling found, returning call
ℹ 2020-06-19 16:59:20 > Calling APIv4 slowly....
ℹ 2020-06-19 16:59:20 > Slow fetch: [ 0 ] from estimated actual Rows [ 9999 ]
ℹ 2020-06-19 16:59:20 > Token exists.
ℹ 2020-06-19 16:59:20 > Request:  https://analyticsreporting.googleapis.com/v4/reports:batchGet/?quotaUser=wwatkins
ℹ 2020-06-19 16:59:20 > Body JSON parsed to:  {"reportRequests":{"viewId":"ga:xxxxx","dateRanges":[{"startDate":"2020-05-01","endDate":"2020-05-31"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:pagePath"}],"metrics":[{"expression":"ga:pageLoadSample","alias":"pageLoadSample","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageLoadTime","alias":"avgPageLoadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageDownloadTime","alias":"avgPageDownloadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgDomContentLoadedTime","alias":"avgDomContentLoadedTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:exitRate","alias":"exitRate","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":10000,"includeEmptyRows":true}} ]
> nrow(large_load_time)
[1] 10000

Session Info

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] googleAnalyticsR_0.7.1.9001 lubridate_1.7.8             arrow_0.17.1               
[4] tidyr_1.1.0                 dplyr_1.0.0                 googleAuthR_1.3.0          

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     rstudioapi_0.11  magrittr_1.5     bit_1.1-15.2     tidyselect_1.1.0
 [6] R6_2.4.1         rlang_0.4.6      fansi_0.4.1      httr_1.4.1       tools_4.0.0     
[11] cli_2.0.2        askpass_1.1      ellipsis_0.3.1   openssl_1.4.1    bit64_0.9-7     
[16] assertthat_0.2.1 digest_0.6.25    tibble_3.0.1     gargle_0.5.0     lifecycle_0.2.0 
[21] crayon_1.3.4     purrr_0.3.4      vctrs_0.3.1      fs_1.4.1         curl_4.3        
[26] memoise_1.1.0    glue_1.4.1       compiler_4.0.0   pillar_1.4.4     generics_0.0.2  
[31] jsonlite_1.6.1   pkgconfig_2.0.3 

@MarkEdmondson1234 (Collaborator)

Thanks. It looks like anti_sample isn’t necessary here, so does the fetch complete when that is turned off as well?

wdwatkins commented Jun 23, 2020

No, the same thing happens without anti_sample = TRUE:

large_load_time <- google_analytics(viewId = view_id,
                                    date_range = c("2020-05-01", "2020-05-31"),
                                    dimensions = c("pagePath"),
                                    metrics = c("pageLoadSample", "avgPageLoadTime",
                                                "avgPageDownloadTime",
                                                "avgDomContentLoadedTime",
                                                "exitRate"),
                                    slow_fetch = TRUE, max = -1)
ℹ 2020-06-22 17:12:41 > Calling APIv4 slowly....
ℹ 2020-06-22 17:12:41 > Slow fetch: [ 0 ] from estimated actual Rows [ 9999 ]
ℹ 2020-06-22 17:12:41 > Token exists.
ℹ 2020-06-22 17:12:41 > Request:  https://analyticsreporting.googleapis.com/v4/reports:batchGet/?quotaUser=wwatkins
ℹ 2020-06-22 17:12:41 > Body JSON parsed to:  {"reportRequests":{"viewId":"ga:xxxxx","dateRanges":[{"startDate":"2020-05-01","endDate":"2020-05-31"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:pagePath"}],"metrics":[{"expression":"ga:pageLoadSample","alias":"pageLoadSample","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageLoadTime","alias":"avgPageLoadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgPageDownloadTime","alias":"avgPageDownloadTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:avgDomContentLoadedTime","alias":"avgDomContentLoadedTime","formattingType":"METRIC_TYPE_UNSPECIFIED"},{"expression":"ga:exitRate","alias":"exitRate","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":10000,"includeEmptyRows":true}}
> nrow(large_load_time)
[1] 10000

My use case involves running the same query over multiple views. Some are large enough to need anti-sampling and some aren't, so I'd like to leave anti_sample set to TRUE so that it applies when needed.
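
For readers following along, here is a minimal sketch of the paging behaviour being discussed. It is not the googleAnalyticsR implementation: mock_fetch_page() and fetch_all_pages() are hypothetical helpers that simulate an API returning rows in pages, using the 16660 total rows and pageSize of 10000 from the logs above. It only illustrates why stopping after the first page yields 10000 rows while following the page token to the end yields all 16660.

```r
# Hypothetical stand-in for a single paged API call (not part of googleAnalyticsR).
# Returns up to `page_size` rows starting after offset `page_token`, plus the
# token for the next page (NULL when no rows remain). Assumes total_rows >= 1.
mock_fetch_page <- function(page_token, page_size, total_rows) {
  start <- as.integer(page_token) + 1L
  end <- min(start + page_size - 1L, total_rows)
  rows <- data.frame(row_id = seq.int(start, end))
  next_token <- if (end < total_rows) as.character(end) else NULL
  list(rows = rows, next_page_token = next_token)
}

# Keep requesting pages until the response carries no next-page token,
# then bind all pages into one data frame.
fetch_all_pages <- function(page_size, total_rows) {
  pages <- list()
  token <- "0"
  while (!is.null(token)) {
    res <- mock_fetch_page(token, page_size, total_rows)
    pages[[length(pages) + 1L]] <- res$rows
    token <- res$next_page_token
  }
  do.call(rbind, pages)
}

# Following every page versus stopping after the first one
all_rows <- fetch_all_pages(page_size = 10000L, total_rows = 16660L)
first_batch_only <- mock_fetch_page("0", 10000L, 16660L)$rows

nrow(all_rows)          # 16660
nrow(first_batch_only)  # 10000
```

The row counts match the expected and actual outputs reported above, which is consistent with the slow fetch returning after its first page rather than continuing to the next one.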

@MarkEdmondson1234 (Collaborator)

Many thanks @wdwatkins. This should now be fixed in the GitHub version and will be in CRAN version 0.8.0 soon-ish.

@wdwatkins (Author)

Thanks for maintaining this package!
