unsampled report downloads #44

Closed
MarkEdmondson1234 opened this issue Oct 18, 2016 · 21 comments

Comments

@MarkEdmondson1234
Collaborator

Fetch from management API for 360 properties

@j450h1
Contributor

j450h1 commented Nov 22, 2017

I can work on this one before the end of the year. Feel free to assign it to me.

@MarkEdmondson1234
Collaborator Author

@j450h1 that would be great! Several management API functions have been requested, mostly ones that help with setup (like editing filters), so if you are looking for more, just say!

@j450h1
Contributor

j450h1 commented Dec 18, 2017

See PR #139. A couple things to note:

  • The user must authenticate with the additional Google Drive scope. I'm not sure where this should be documented.

options(googleAuthR.scopes.selected = c("https://www.googleapis.com/auth/analytics",
"https://www.googleapis.com/auth/drive"))

  • I'm aware of a bug where the progress bar overwrites the text if multiple reports share the same report title. I also chose to request the title (which may not be unique) rather than the id. I'm not sure that's the best way; thinking from the user's perspective, would they get the name from the UI, or would they list all reports in R first, in which case the id might be better?

  • This function currently only supports files < 25 MB. I do have pseudocode that I think should work for larger files (maybe I'll get to it in the new year):

    • if the file is over 25 MB, save the response as HTML instead of CSV
    • parse that HTML for the "confirm" link
    • download with that confirm link, saving as CSV
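
The middle step of that pseudocode (finding the "confirm" link in the saved HTML) could be sketched in base R roughly as follows. This is only an illustration: `extract_confirm_link()` is a hypothetical helper, and the sample markup is invented, so the real warning page returned for large files may need a different pattern.

```r
# Hypothetical helper: pull the first link carrying a "confirm" parameter
# out of an HTML page supplied as a single string.
extract_confirm_link <- function(html) {
  # Find every href="..." attribute in the page
  hrefs <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  # Strip the href="..." wrapper to leave the bare URL
  urls <- sub('^href="([^"]*)"$', "\\1", hrefs)
  # Keep only links that carry a confirm parameter
  confirm <- urls[grepl("confirm=", urls, fixed = TRUE)]
  if (length(confirm) == 0) {
    return(NA_character_)
  }
  # HTML-escaped ampersands need decoding before the URL can be requested
  gsub("&amp;", "&", confirm[[1]], fixed = TRUE)
}

# Invented markup, standing in for the saved HTML response
sample_html <- paste0(
  '<html><body><a href="/help">Help</a>',
  '<a id="download" href="/uc?export=download&amp;confirm=abcd&amp;id=FILE_ID">',
  'Download anyway</a></body></html>'
)

confirm_link <- extract_confirm_link(sample_html)
```

The helper returns `NA` when no confirm link is found, so the caller can fall back to treating the response as the CSV itself.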

@MarkEdmondson1234
Collaborator Author

@j450h1 did you look into using googledrive as a method of downloading? It may mean some code can be taken out if we depend on that.

@j450h1
Contributor

j450h1 commented Jan 20, 2018

Yes, I explored that option. I believe the problem was that v3 of the API only allows you to export Google Doc type files (mimeType): https://developers.google.com/apis-explorer/#p/drive/v3/drive.files.export and I believe that package uses v3. I think another issue was that the file was not in "my" Google Drive, which is where the functions in that package download files from and upload files to. In any case, I tried to go with a simple approach first.

@MarkEdmondson1234
Collaborator Author

Working with it now. To handle the extra scope issue, I don't want to add the general scope to the whole package. I'd rather just have some documentation on adding the scope when needed, and a check at the start of the download that checks options(googleAuthR.scopes.selected) that raises an error if the drive scope isn't present.
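
A minimal sketch of such a check, assuming the scopes are stored as a character vector in the googleAuthR.scopes.selected option. `check_drive_scope()` is a hypothetical name used for illustration, not necessarily what ends up in the package:

```r
drive_scope <- "https://www.googleapis.com/auth/drive"

# Hypothetical helper: stop early when the Drive scope has not been selected.
check_drive_scope <- function(scopes = getOption("googleAuthR.scopes.selected")) {
  if (!(drive_scope %in% scopes)) {
    stop(
      "The Google Drive scope is needed to download unsampled reports. ",
      "Add ", drive_scope, " to options(googleAuthR.scopes.selected) ",
      "and re-authenticate.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```

Called at the top of the download function, this fails before any request is made, and an unset option (`NULL`) is treated the same as a missing scope.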

Also, I'd like ga_unsampled_download to do a little less: only download the file for a given reportTitle, so it can chain with ga_unsampled_list. That is, make the output of ga_unsampled_list easier to work with, then pass it to ga_unsampled_download(reportTitle) so it requires fewer arguments.

@j450h1
Contributor

j450h1 commented Jan 20, 2018

Sure sounds good to me. Whatever will make it easier to work with.

@j450h1
Contributor

j450h1 commented Jan 27, 2018

Just following up on this one, do you want me to remove the save to dataframe option or are you going to take care of it?

@MarkEdmondson1234
Collaborator Author

Didn't want to remove the save to dataframe option, but rather change ga_unsampled_list() so the parsing currently done at the start of ga_unsampled_download() is unnecessary.

That way it will only need these arguments:

ga_unsampled_download <- function(reportTitle, 
                                  file=sprintf("%s.csv", reportTitle), 
                                  downloadFile=TRUE)

...and could be chained and looped via something like:

library(googleAnalyticsR)
library(tidyverse)

## download all unsampled reports
ga_unsampled_list(accountId, webPropertyId, profileId) %>%
   pull(title) %>%   # a vector of titles, so map() calls the download once per title
   map(ga_unsampled_download)

Some more documentation and examples won't hurt either.

@j450h1
Contributor

j450h1 commented Feb 3, 2018

Okay, so "a check at the start of the download that checks options(googleAuthR.scopes.selected) that raises an error if the drive scope isn't present." is done.

Regarding the 2nd point, I have changed ga_unsampled_list to return a dataframe. This might require updating other code that uses this function.

However, it is working quite well for this ga_unsampled_download function. I'm having a little trouble using walk2 and map2 in a pipe, so as you can see the code is split up when it shouldn't need to be. (Technically map2 isn't needed; it would only be there in case the user enters the title when they want the dataframes, even though that argument won't be used.) I managed to get these two test cases working for now:

library(tidyverse)
## download all unsampled reports and create a list of dataframes
test <- ga_unsampled_list(accountId, webPropertyId, profileId) %>%
  select(driveDownloadDetails, title) %>% 
  na.omit() 

#download
walk2(unlist(test$driveDownloadDetails), 
      unlist(test$title),
      ga_unsampled_download)

#list of dataframes
dataframes_test <- map(unlist(test$driveDownloadDetails), 
      ga_unsampled_download, downloadFile=FALSE)

@j450h1
Contributor

j450h1 commented Feb 3, 2018

If a user wanted to download just 1 report:

reportTitle <- "googleanalyticsR_test_download"

small <- ga_unsampled_list(accountId, webPropertyId, profileId) %>%
  filter(title == reportTitle) %>%
  select(driveDownloadDetails) %>% 
  na.omit() %>%
  walk(ga_unsampled_download)

This is what I mean by saying that walk2, map2, or a walk like this one should all be in one pipeline (not sure if there is a better word?). I just couldn't get it working; maybe you know the correct syntax? This one only works because it is 1 item; if it wasn't filtered, it wouldn't work properly. So I realize it definitely needs more testing.

Also, by default right now the filename is the documentId/driveDownloadDetails (example: 19dydgPj1A9L7QRDgNvE6qTNcXN0rqBxj.csv), unless the user also selects the title column, which should be best practice when downloading (hence the use of walk2). When creating a list of dataframes, the title is not needed or used.

@j450h1
Contributor

j450h1 commented Feb 3, 2018

The downside of this approach is that a user who doesn't want to use pipes will need to enter the driveDownloadDetails and can't just enter the report name. That was the advantage of the old approach. But I guess it's fine if it's documented that you should review the list first and choose the driveDownloadDetails, or maybe most people are using %>% now.

@MarkEdmondson1234
Collaborator Author

It's perhaps easier to download the files as they are named in the GA UI (perhaps with the date too), then show an example of how to rename them if they want. That way you don't have to juggle two-argument loops.
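
A rename example along those lines might look like the sketch below. It uses only base R, and the folder and file names are placeholders created in a temporary directory rather than real downloads:

```r
# Stand-in for a folder of downloaded reports (placeholder file names)
download_dir <- file.path(tempdir(), "unsampled_reports")
dir.create(download_dir, showWarnings = FALSE)
old_paths <- file.path(download_dir, c("report_a.csv", "report_b.csv"))
file.create(old_paths)

# Prefix each file name with today's date, keeping the original name after it
date_prefix <- format(Sys.Date(), "%Y-%m-%d")
new_paths <- file.path(download_dir, paste0(date_prefix, "_", basename(old_paths)))

# file.rename() is vectorised and returns TRUE for each successful rename
renamed <- file.rename(old_paths, new_paths)
```

Keeping the rename separate from the download means the download function only ever needs the report title.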

My overall strategy over time has been to get each function to be as useful as possible while doing as little as possible, with sensible defaults for beginners and the arguments there for advanced users who want them, as that is the most flexible and easiest to maintain.

If it's just one report, I guess they could just read the reportTitle from the UI and pass that in:

ga_unsampled_download(what_I_copied_from_ui)

How come you needed na.omit() in your example? Is that something that could perhaps be within ga_unsampled_list()?

@j450h1
Contributor

j450h1 commented Feb 5, 2018

I think that approach makes a lot of sense.

Regarding na.omit(), it is required because the list includes what appear to be standard reports not created by the user, which therefore have no download details. Yes, I can include it as part of ga_unsampled_list. Should a tibble be returned instead of a dataframe? Just trying to be consistent with the rest of the repo.
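
For illustration, the filtering step could look something like this on a toy data frame. The two column names are the ones used earlier in this thread; the rows are made up, and the real listing has more columns:

```r
# Toy stand-in for the report listing
reports <- data.frame(
  title = c("standard_report", "googleanalyticsR_test_download"),
  driveDownloadDetails = c(NA, "19dydgPj1A9L7QRDgNvE6qTNcXN0rqBxj"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually have download details. Unlike na.omit(),
# this ignores missing values in any other column.
downloadable <- reports[!is.na(reports$driveDownloadDetails), , drop = FALSE]
```

Filtering on the one column, rather than calling na.omit() on the whole data frame, avoids dropping a downloadable report just because some unrelated field is empty.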

@j450h1
Contributor

j450h1 commented Feb 5, 2018

Regarding allowing the user to enter the reportTitle: if we go with that approach, it will require either all the same arguments as ga_unsampled_list (as was originally done) or the dataframe/tibble returned by that function. So there is a tradeoff: you can enter the reportTitle, which will essentially just filter the dataframe, but then you cannot simply pipe the reportTitle as mentioned. It looks like the simpler approach for the user is to allow the reportTitle to be entered, so I will change it back to that approach. What do you think of requiring the same arguments as ga_unsampled_list versus just requiring the returned df/tibble?
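
To make the "essentially just filter the dataframe" part concrete, here is a rough sketch of the lookup either option would need internally. `find_report()` is a hypothetical helper and the data frame is a toy one; it also shows one possible way to handle the non-unique titles mentioned earlier:

```r
# Hypothetical helper: look up one report by title in a listing data frame.
find_report <- function(report_df, reportTitle) {
  # %in% (rather than ==) treats a missing title as a non-match
  matches <- report_df[report_df$title %in% reportTitle, , drop = FALSE]
  if (nrow(matches) == 0) {
    stop("No unsampled report found with title: ", reportTitle, call. = FALSE)
  }
  if (nrow(matches) > 1) {
    warning("Several reports share the title '", reportTitle,
            "'; using the first one.", call. = FALSE)
  }
  matches[1, , drop = FALSE]
}

# Toy listing with a duplicated title
reports <- data.frame(
  title = c("report_a", "report_b", "report_b"),
  driveDownloadDetails = c("id1", "id2", "id3"),
  stringsAsFactors = FALSE
)

one_report <- find_report(reports, "report_a")
```

Whether the listing is fetched inside the download function or passed in by the user, the lookup itself stays the same.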

@MarkEdmondson1234
Collaborator Author

> Should a tibble be returned instead of a dataframe? Just trying to be consistent with the rest of the repo.

I'm not sure it is consistent in the package, but IIRC it should fall back from tibble to data.frame if they don't have it loaded; ga_account_summary() does that.
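
A generic sketch of that kind of fallback is below. It is not taken from ga_account_summary(), and it checks whether tibble is installed rather than whether it is attached, which is an assumption about what is wanted. `as_tibble_if_available()` is a hypothetical name:

```r
# Hypothetical helper: return a tibble when the tibble package is installed,
# otherwise hand back the plain data.frame unchanged.
as_tibble_if_available <- function(df) {
  if (requireNamespace("tibble", quietly = TRUE)) {
    tibble::as_tibble(df)
  } else {
    df
  }
}

listing <- as_tibble_if_available(
  data.frame(title = c("report_a", "report_b"), stringsAsFactors = FALSE)
)
```

Since a tibble is also a data.frame, downstream code that works on data frames should behave the same in both cases.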

It doesn't necessarily have to be piped. In fact, one motive is to allow for other uses that we haven't thought of. A base R example may be:

library(googleAnalyticsR)

## download all unsampled reports
unsample_df <- ga_unsampled_list(accountId, webPropertyId, profileId)

# you need the title to pass in to ga_unsampled_download
lapply(unsample_df$title, ga_unsampled_download)

@j450h1
Contributor

j450h1 commented Feb 8, 2018

Ok thanks for clarifying. I think I know what updates to make now.

@j450h1
Contributor

j450h1 commented Feb 8, 2018

Give this a crack and let me know. Note: accountId, webPropertyId, and profileId are required again because I have to call ga_unsampled_list within ga_unsampled_download. The only other option I see is to pass the dataframe/tibble object itself (similar to what I was doing before), in which case the user could no longer enter a reportTitle if they wanted to. Anyway, here it is:

# Download multiple reports with lapply
## download all unsampled reports
unsample_df <- ga_unsampled_list(accountId, webPropertyId, profileId)
lapply(unsample_df$title, ga_unsampled_download, accountId, webPropertyId, profileId)

library(tidyverse)
# Download multiple reports with pipes
ga_unsampled_list(accountId, webPropertyId, profileId) %>%
  select(title) %>%
  unlist() %>%
  map(ga_unsampled_download, accountId, webPropertyId, profileId)

@j450h1
Contributor

j450h1 commented Feb 8, 2018

# Download 1 file
# The user can enter the title directly, without having to call
# ga_unsampled_list first and then possibly filter, which the 2nd option
# (a dataframe/tibble argument instead of reportTitle plus the 3 ids) would need.
reportTitle <- "googleanalyticsR_test_download"
ga_unsampled_download(reportTitle,
                      accountId,
                      webPropertyId,
                      profileId)

MarkEdmondson1234 added a commit that referenced this issue Feb 8, 2018
unsampled report downloads #44
MarkEdmondson1234 added a commit that referenced this issue Feb 8, 2018
@j450h1
Contributor

j450h1 commented May 17, 2018

Hey Mark. Just wondering if this issue can be closed or is there a reason it is still open? I'm happy to clear up any loose ends if there are any!

@MarkEdmondson1234
Collaborator Author

I just forgot to close it :)
