Pagination #30
That's a good question. This is not a particularly strong opinion, but all else being equal, I would prefer to let pagination be handled downstream, for a few reasons.
I would have assumed that GitHub itself wouldn't be a fan of auto-pagination, because it hits their servers more often than necessary, but octokit is written by GitHubbers, so I'm clearly wrong there. So let's look at it from the other side and assume we want to implement auto-pagination. What would it look like in the API?

Option 1) Make it a named parameter that is interpreted in a special way. The solution here would be to find all the HTTP entry points that return paginated results and write a wrapper around each.

Option 2) For all entry points that return paginated results, create a new entry point that does auto-pagination.

Which would you prefer?
This is not a proper response, but here is another link where the GitHubbers really dig into the details: Traversing with Pagination. It seems relevant to the discussion regardless.
I haven't done enough of this manually yet to say how deeply I want this integrated into the package.
Your mention of the idiom made me think of a general “pagination wrapper” call which uses “computing-on-the-language” (to use Hadley’s parlance) to do the right thing. Something like this:
I think that should be doable, looks elegant, and wouldn't require changing the base code at all. What do you think?
I think that is a great idea! I will try to turn my evolving "manual" approach into that. I have already written helpers to parse the link headers, which will be generally useful here.
Maybe this is a simple question, but I can't figure it out. How do I implement a "manual" approach? I'm trying to get a list of all files changed by a specific pull request. How can I access them all?
You will have to do all of this "manually". I have written some functions to facilitate this, but it is all in my own local fork of this package and is very rough. Here's the sketch of what I do. In my case, I was trying to list repositories, but I would imagine the same thing works for pull requests.

First, max out the number of repos per page, i.e. set it to 100, and see if that gets everything:

repos <- get.organization.repositories(org = "STAT545-UBC", per_page = 100)

Then check whether repos$headers$link contains a "next" link. If it does not, you have gotten everything! YAY! You're done. If it does, you absolutely must traverse pages. Boo. We continue.

I get the total number of repositories by asking for all repositories, setting per_page to 1:

repos <- get.organization.repositories(org = "STAT545-UBC", per_page = 1)

Then you must parse the information in the link header. You go back to asking for 100 repos per page and determine how many pages the results will be broken into. Inside some iterative structure, request the repos with per_page set to 100, parse the link field of the headers to get the URL for the next page, and repeat until you're done. Then you also have to glue all the results together again manually, which was my motivation for a question I put out on Twitter. In short, this is possible but a total pain in the butt.
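The iterative structure described above can be sketched as a small R loop. This is a hypothetical sketch, not code from the package: fetch_page is a stand-in for a call like get.organization.repositories(), mocked here so the loop logic can run offline.

```r
# Hypothetical sketch of the "manual" traversal described above.
# fetch_page is a stand-in for an API call such as
# get.organization.repositories(); it is mocked so the loop runs offline.
fetch_page <- function(page, per_page = 100) {
  server_items <- as.list(seq_len(230))  # pretend the server holds 230 items
  start <- (page - 1) * per_page + 1
  if (start > length(server_items)) return(list())
  server_items[start:min(start + per_page - 1, length(server_items))]
}

# Request pages of 100 until an empty page comes back, gluing results together.
collect_all <- function(per_page = 100) {
  results <- list()
  page <- 1
  repeat {
    chunk <- fetch_page(page, per_page)
    if (length(chunk) == 0) break  # past the last page: done
    results <- c(results, chunk)
    page <- page + 1
  }
  results
}

length(collect_all())  # 230: all items, reassembled across 3 pages
```

In real use, the stopping condition would come from the absence of a "next" link in the headers (or an empty content field), not from a mocked item count.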
@jennybc, would you mind pointing to where in your forked repo you have the workaround?
@aronlindberg I haven't pushed that stuff to my fork on GitHub. It's in a truly embarrassing state. If you look at the headers in the example of listing pull requests via the API, you'll see there are indeed the usual "next" and "last" links: https://developer.github.com/v3/pulls/#list-pull-requests. So obviously pagination happens, as of course you have already discovered. Looking at the relevant function in this package, I see there is no way for the user to pass extra query parameters (lines 84 to 85 at commit 153bde1).
Contrast that with the function for requesting a list of organization repositories, which uses the `...` idiom (lines 35 to 36 at commit 153bde1).
Best case, the definition of every such function gets updated to accept `...`.
I tried adding `...` to the function definition. However, calling it still didn't work for me. Looking at https://developer.github.com/v3/pulls/#list-pull-requests-files, it seems I should be able to pass the pagination parameters.
Isn't it a different spelling of the parameter?
Neither works... Shouldn't I be able to pass both of them once the `...` is added?
I'm going to try changing the definition of get.pull.request.files().
OK that worked. I'll make a PR to @cscheid in a moment. Here you can prove it to yourself:

library(github)
# I store my access token in an environment variable
ctx <- create.github.context(access_token = Sys.getenv("GITHUB_TOKEN"))
## using rgithub "as is"
req <- get.pull.request.files(owner = "rails", repo = "rails", id = 572)
length(req$content) # by default, 30 items are retrieved
req$headers$link # I can see 30 items per page will imply we need 3 pages
## new definition of get.pull.request.files()
## I will make a PR to cscheid/rgithub momentarily
jfun <- function(owner, repo, id, ..., ctx = get.github.context())
github:::.api.get.request(ctx, c("repos", owner, repo, "pulls", id, "files"),
params=list(...))
jreq <- jfun(owner = "rails", repo = "rails", id = 572, per_page = 3)
length(jreq$content) # yep, it's honoring my request for 3 per page
jreq$headers$link # at this rate, we'll need 23 pages
Cool! =) So I was using one way of authenticating while you are using another. Is it then that the token allows you to pass these extra parameters?
No, I don't think this has anything to do with our different approaches to authentication. When you redefined the function, did you keep the `...` in the argument list? That is what lets the extra parameters through.
Aaaah! I see, yes, it works with my authentication too! The dots were just sooooo small, so I missed them. =) Thanks!
@aronlindberg I've made a PR now (#39). This change is reflected in my fork, in case you're in a huge hurry. You could install from my fork to get the functionality and switch back once it's merged.
@jennybc - thanks, I was able to implement the same change in my own fork. Thanks for having a dialogue about it; it helps me better understand how these functions are written. Now I can start trying to construct a manual pagination approach for the script I'm writing.
@aronlindberg Here's the function I have in my "private and embarrassing" branch I haven't pushed. It's a good start on digesting link headers for the depagination operation 😄 https://gist.github.com/jennybc/862a01dc9243118d83c9#file-digest_header_links-r As you can see, I use some helper functions from other packages.
Thanks @jennybc, that's helpful! It seems that since per_page can go up to 1000, maybe I can avoid traversing pages at all?
I thought it was 100. Are you actually getting 1000 items in one request?
Whoops. I think it's pretty clear that the limit is 100: https://developer.github.com/guides/traversing-with-pagination/ I'll have to start digging into your manual mechanism...
A rather messy example (mine) of how to implement @jennybc's function can be found here: https://gist.github.com/aronlindberg/2a9e9802579b2d239655
Glad you are making progress @aronlindberg! I think that will work for your specific application (pull request files). However, to build something into this package, you actually have to rely on the "next" links, rather than requesting specific pages via the page parameter.
So, you'd use a loop that follows the "next" link until it runs out.
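Following the "next" links could look something like this. parse_next_link is a hypothetical helper, not part of this package; it pulls the rel="next" URL out of a raw Link header string.

```r
# Hypothetical helper: extract the rel="next" URL from a GitHub Link header.
parse_next_link <- function(link_header) {
  parts <- strsplit(link_header, ",\\s*")[[1]]
  next_part <- grep('rel="next"', parts, value = TRUE)
  if (length(next_part) == 0) return(NA_character_)  # no next page: done
  sub("^<([^>]+)>.*$", "\\1", next_part)
}

hdr <- paste0('<https://api.github.com/repositories?page=2>; rel="next", ',
              '<https://api.github.com/repositories?page=3>; rel="last"')
parse_next_link(hdr)  # "https://api.github.com/repositories?page=2"
```

A depagination loop would then keep requesting whatever URL parse_next_link() returns, stopping when it returns NA.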
Just skimming, but I don't think you should need computing on the language to do this; if the response object were a bit richer, you should be able to do auto-pagination by inspecting the request metadata.
I am now enjoying the page traversal @gaborcsardi put into gh: https://github.com/gaborcsardi/gh/blob/master/R/pagination.R
Oh wait, is there any reason we're using our own approach instead? EDIT: That one is a little too low-level for my taste, so we'll leave things as they are.
Maybe rgithub could rely on gh for the low-level stuff?
Just wanted to chime in. I did create a simple paginator for this package. I used it a lot over the last few months and it seems to work reliably. It could be used as follows:

# without paginator
first_30_forks <- get.repository.forks(owner_name, repo_name)
# with paginator
all_forks <- auto.page( get.repository.forks(owner_name, repo_name) )

Here is the code of that future pull request (given that it is short, I am typing it here as a comment as well):

# Make automated paging till response is empty
auto.page <- function(f) {
  f_call <- substitute(f)
  stopifnot(is.call(f_call))

  i <- 1
  req <- list()
  result_lst <- list()

  repeat {
    # Specify the page to download
    f_call$page <- i
    req <- eval(f_call, parent.frame())

    # Last page has empty content
    if (length(req$content) == 0) break

    result_lst[[i]] <- req$content
    i <- i + 1
  }

  result_req <- req
  result_req$content <- unlist(result_lst, recursive = FALSE)
  result_req
}
@cscheid great! Would love to hear your feedback! One word of caution: say a function is currently defined like this:

get.organization.teams <- function(org, ctx = get.github.context())
  .api.get.request(ctx, c("orgs", org, "teams"))

However, auto-paging needs a variant that forwards extra parameters:

get.organization.teams2 <- function(org, ctx = get.github.context(), ...)
  .api.get.request(ctx, c("orgs", org, "teams"), params = list(...))

The latter would work perfectly with auto.page(), while the former would not.
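To see why the `...` version matters without touching the API, here is an offline illustration. build_request is a hypothetical stand-in for .api.get.request(); the point is simply that `...` forwards arbitrary named query parameters such as per_page and page.

```r
# Hypothetical stand-in for .api.get.request(): just records path and params.
build_request <- function(ctx, path, params = list()) {
  list(path = path, params = params)
}

# Without dots: no way to smuggle per_page through.
get.teams.v1 <- function(org, ctx = NULL)
  build_request(ctx, c("orgs", org, "teams"))

# With dots: extra named arguments land in params.
get.teams.v2 <- function(org, ctx = NULL, ...)
  build_request(ctx, c("orgs", org, "teams"), params = list(...))

req <- get.teams.v2("STAT545-UBC", per_page = 100, page = 2)
req$params  # a list with per_page = 100 and page = 2
```

Calling get.teams.v1() with per_page = 100 would instead fail with an "unused argument" error, which is exactly the situation described earlier in this thread.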
One observation from the pagination workaround I was using with this package: once you deal with the #40 problem, you can make an initial request and inspect the link header to learn how many pages of results there will be.
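Reading the page count up front could look like this. last_page_of is a hypothetical helper (not from this package) that digests a raw Link header string by finding the rel="last" entry.

```r
# Hypothetical helper: read the final page number from a GitHub Link header,
# so the total number of requests can be planned up front.
last_page_of <- function(link_header) {
  parts <- strsplit(link_header, ",\\s*")[[1]]
  last_part <- grep('rel="last"', parts, value = TRUE)
  if (length(last_part) == 0) return(1L)  # no "last" link: single page
  as.integer(sub(".*[?&]page=([0-9]+).*", "\\1", last_part))
}

hdr <- paste0('<https://api.github.com/repositories?page=2>; rel="next", ',
              '<https://api.github.com/repositories?page=23>; rel="last"')
last_page_of(hdr)  # 23
```

As noted later in this thread, though, page numbers are not guaranteed for every endpoint, so following rel="next" links is the more robust strategy.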
@jennybc Your points are well-taken! Unfortunately, that approach did not work for me (we are using GitHub Enterprise).
That's weird! No, I haven't had that problem on Github.com. In Traversing with Pagination they try to scare us away from asking for specific pages by number.
but ... I've never accessed an exotic endpoint w/ non-numeric pagination. Sounds like you have something that works 😃.
Hello, is it just me, or does the pagination not work anymore?
Yes, this is an issue. As you can see, I've been unable to work on this for a long time now. I would really welcome a pull request if you have one. Sorry about that.
Closed (wontfix) by #70.
Have you thought about pagination at all?
https://developer.github.com/v3/#pagination
I'm not sure what the conventions are for API wrappers and assisting the user to fetch multiple or all pages. But the Ruby wrapper for the GitHub API talks about this in its README:
I'm thinking about this because I'm using
get.my.repositories()
and have realized that by default I'm getting only the first 30. Before I start playing around with explicit requests for specific pages, I'm wondering if you are contemplating adding any auto-pagination to your package.