Output a single JSON document from api --paginate #1268

Closed
carnei-ro opened this issue Jun 25, 2020 · 29 comments · Fixed by #7190 or #8620
Labels
enhancement a request to improve CLI gh-api relating to the gh api command help wanted Contributions welcome

Comments

@carnei-ro

First of all, thanks for the --paginate flag. However, I'd like to get a single array of objects instead of one array per page. Today I'm using jq to do the work:

gh api --paginate repos/me/my-repo/pulls/5/files | jq 'reduce inputs as $i (.; . += $i)'
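
For readers who cannot rely on jq, the same reduction can be sketched in Go. This is only an illustration, assuming the input is a sequence of JSON arrays printed one after another; `mergePages` and the sample file names are hypothetical and are not part of gh.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// mergePages decodes consecutive JSON arrays (one per page) from data and
// returns their elements as one combined slice.
func mergePages(data []byte) ([]json.RawMessage, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var all []json.RawMessage
	for {
		var page []json.RawMessage
		err := dec.Decode(&page)
		if err == io.EOF {
			return all, nil // no more pages in the stream
		}
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
	}
}

func main() {
	// Hypothetical two-page output; real pages would hold pull request file objects.
	pages := []byte(`[{"filename":"a.go"},{"filename":"b.go"}]` + "\n" + `[{"filename":"c.go"}]`)
	items, err := mergePages(pages)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(items)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The decoder reads one JSON value at a time, so the pages do not need to be separated by newlines.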
@carnei-ro carnei-ro added the enhancement a request to improve CLI label Jun 25, 2020
@mislav
Contributor

mislav commented Jun 26, 2020

Thanks for the feedback! I think reducing is a good idea, but it might be tricky to achieve for GraphQL requests. That's why we currently dump raw JSON responses without changing them.

@vilmibm vilmibm added the core This issue is not accepting PRs from outside contributors label Oct 7, 2020
@jglick

jglick commented Nov 9, 2020

At least document the current behavior. I would have assumed that --paginate does the intuitive thing of giving you one aggregate list (the way hub4j/github-api does for example). Instead

$ gh api -XGET --paginate /repos/:owner/:repo/pulls -F state=closed | jq ' . | length'
100
100
100
54

@mislav mislav changed the title Auto reduce when use --paginate Output a single JSON document from api --paginate Jan 11, 2021
@mislav mislav added help wanted Contributions welcome and removed core This issue is not accepting PRs from outside contributors labels Jan 11, 2021
@heaths
Contributor

heaths commented Aug 25, 2021

Will GraphQL always output a JSON object? If so, could graphql --paginate wrap the output with [ and ] and insert a comma between each JSON object it outputs? At least it would come back as valid JSON. Merging could be an exercise left to the caller, but since the schema should always be the same it may be great to merge them internally, perhaps by opting in with e.g. --merge. After all, it's cmd/api/api.go that best knows when it's processing each response.
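
The wrapping idea can be sketched as follows. This is a minimal illustration that assumes the responses arrive as concatenated JSON documents; `wrapDocuments` is a hypothetical helper, not code from cmd/api/api.go.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// wrapDocuments turns a stream of concatenated JSON documents into one JSON
// array by writing "[", the documents separated by ",", and "]".
func wrapDocuments(data []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var out bytes.Buffer
	out.WriteByte('[')
	for i := 0; ; i++ {
		var doc json.RawMessage
		err := dec.Decode(&doc)
		if err == io.EOF {
			break // end of the response stream
		}
		if err != nil {
			return nil, err
		}
		if i > 0 {
			out.WriteByte(',')
		}
		out.Write(doc) // each document is copied through unchanged
	}
	out.WriteByte(']')
	return out.Bytes(), nil
}

func main() {
	// Two hypothetical GraphQL responses printed back to back.
	stream := []byte(`{"data":{"page":1}}` + "\n" + `{"data":{"page":2}}`)
	wrapped, err := wrapDocuments(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(wrapped))
}
```

The result is valid JSON, but each response keeps its own envelope, so the caller still has to merge the nested arrays.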

@mislav
Contributor

mislav commented Aug 25, 2021

Individual API responses during REST pagination always return JSON arrays. We can easily concatenate those arrays into a single one.

Individual API responses during GraphQL pagination will return something like this:

{
  "data": {
    "repository": {
      "branchProtectionRules": {
        "nodes": ["PAGINATED DATA APPEARS HERE"],
        "pageInfo": {
          "hasNextPage": false,
          "endCursor": "Y3Vyc29yOnYyOpK0MjAyMC0wNS0yN1QxMjowNDo1N1rOAPZ2gA=="
        }
      }
    }
  }
}

If we concatenate each of these responses as a whole into a JSON array, it will significantly modify the overall structure of the response and might be confusing to parse. Instead, we should probably identify the nested array that's being paginated and concatenate all results into that one. In the example above, the paginated array is .data.repository.branchProtectionRules.nodes.
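
As a rough sketch of that idea, the following Go code concatenates the array found at a known path across several responses. It assumes the caller already knows the path; `nodesAt` and `collectNodes` are hypothetical names, and the sample pages are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodesAt walks the object keys in path and returns the array found there.
func nodesAt(doc map[string]interface{}, path ...string) ([]interface{}, error) {
	var cur interface{} = doc
	for _, key := range path {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("parent of %q is not an object", key)
		}
		cur = obj[key]
	}
	arr, ok := cur.([]interface{})
	if !ok {
		return nil, fmt.Errorf("no array at %v", path)
	}
	return arr, nil
}

// collectNodes concatenates the array at path across all pages.
func collectNodes(pages [][]byte, path ...string) ([]interface{}, error) {
	var all []interface{}
	for _, p := range pages {
		var doc map[string]interface{}
		if err := json.Unmarshal(p, &doc); err != nil {
			return nil, err
		}
		nodes, err := nodesAt(doc, path...)
		if err != nil {
			return nil, err
		}
		all = append(all, nodes...)
	}
	return all, nil
}

func main() {
	// Hypothetical responses shaped like the example above.
	pages := [][]byte{
		[]byte(`{"data":{"repository":{"branchProtectionRules":{"nodes":["rule-1"]}}}}`),
		[]byte(`{"data":{"repository":{"branchProtectionRules":{"nodes":["rule-2","rule-3"]}}}}`),
	}
	all, err := collectNodes(pages, "data", "repository", "branchProtectionRules", "nodes")
	if err != nil {
		panic(err)
	}
	fmt.Println(all)
}
```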

As a workaround until this is solved, I would use the --jq filter to isolate only the nested array of each response and then pipe it to jq --slurp again:

# should produce a single JSON array of all paginated results concatenated:
gh api graphql -f query='QUERY' --paginate --jq '.data.repository.branchProtectionRules.nodes[]' | jq -s '.'

@heaths
Contributor

heaths commented Aug 25, 2021

I had considered that, but jq may not be installed. It wasn't in Ubuntu 18.04, for example. I know gh relies on git's installed bash shell for invocation. Is it installed in there?

After playing around with it a bunch last night, I discovered a solution using bytes.SplitAfter to split the individual responses (can work for GraphQL or REST responses).

As for backward-compatible support - since templates (maybe) or processing raw responses (definitely) will change - a switch to opt into this behavior would be ideal.

As for cleaning up the pageInfo, I don't even think it'd be necessary as long as valid JSON is returned. On my coffee, for example, I can simply turn Data into an array, take out the splitter, and it'd just work.

@ldelossa

ldelossa commented May 14, 2022

Hey, I've been working on https://github.com/ldelossa/gh.nvim.

This Neovim plugin uses the gh CLI tool extensively to retrieve pull request data.
In Neovim, there's not really a great way to deal with JSON data streams.

Ideally, if you specify "--paginate" we receive one JSON object with all the data that can be iterated.

Another solution would be to add a "slurp" option to the --jq flag.
I'd be disappointed if I had to make the users of gh.nvim depend on the external jq binary because of this. It's also a bit of a pain to set up a stdin/stdout pipe flow from gh -> jq -> neovim to pipe the gh --paginate output to jq.

I'm not seeing a great way to paginate manually with the gh CLI tool as of today either, is that correct? I.e. I do not get any headers or response data indicating how many more pages I should call for, nor a way to start the request at an offset?

@heaths
Contributor

heaths commented May 15, 2022

I started working on a possible solution. Basically, a utility function that will merge JSON by appending arrays and overwriting properties. Effectively,

  1. Any arrays are merged into a single array.
  2. The hasNextPage ends up being false because it's the last one. While this probably isn't that big a deal for anyone passing --paginate, seems fitting.
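
A minimal sketch of such a merge, written against plain decoded JSON values, might look like this. It only illustrates the two rules above and is not the code from the pull request; `merge` and `mergeJSON` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// merge folds src into dst: nested objects are merged key by key, arrays are
// appended, and any other value in src overwrites the one in dst.
func merge(dst, src interface{}) interface{} {
	switch s := src.(type) {
	case map[string]interface{}:
		d, ok := dst.(map[string]interface{})
		if !ok {
			return s
		}
		for k, v := range s {
			d[k] = merge(d[k], v)
		}
		return d
	case []interface{}:
		d, ok := dst.([]interface{})
		if !ok {
			return s
		}
		return append(d, s...)
	default:
		return s
	}
}

// mergeJSON merges the given JSON documents in order and re-encodes the result.
func mergeJSON(pages ...string) (string, error) {
	var acc interface{}
	for _, p := range pages {
		var doc interface{}
		if err := json.Unmarshal([]byte(p), &doc); err != nil {
			return "", err
		}
		acc = merge(acc, doc)
	}
	out, err := json.Marshal(acc)
	return string(out), err
}

func main() {
	// Hypothetical pages; the last page's pageInfo wins.
	merged, err := mergeJSON(
		`{"data":{"labels":{"nodes":["bug"],"pageInfo":{"hasNextPage":true,"endCursor":"abc"}}}}`,
		`{"data":{"labels":{"nodes":["docs"],"pageInfo":{"hasNextPage":false,"endCursor":"def"}}}}`,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged)
}
```

Because scalars are overwritten, hasNextPage and endCursor in the result come from the final page, which matches point 2.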

This also has the benefit that it will fix any template that uses {{tablerender}} explicitly. Consider the following:

gh api graphql --paginate -f owner=cli -f name=cli -f query='query($owner:String!,$name:String!,$endCursor:String){repository(owner:$owner,name:$name){labels(first:10,after:$endCursor){nodes{name,description},pageInfo{hasNextPage,endCursor}}}}' --template '{{range .data.repository.labels.nodes}}{{tablerow .name .description}}{{end}}{{tablerender}}'

This renders misaligned columns:

bug               Something isn't working
tracking issue    
blocked           
needs-design      An engineering task needs design to proceed
enhancement       a request to improve CLI
windows           
needs-user-input  
mac               
linux             
tech debt         A chore that addresses technical debt
packaging            
p2                   Affects more than a few users but doesn't prevent core functions
p3                   Affects a small number of users or is largely cosmetic
p1                   Affects a large population and inhibits work
docs                 
needs-investigation  CLI team needs to investigate
help wanted          Contributions welcome
config               
checks               
gist                 
auth              related to tokens, authentication state, or oauth
hackday           PRs that came out of a Hack Day
accessibility     
feedback          
actions           
core              This issue is not accepting PRs from outside contributors
good first issue  
prompts           
extensions        
dependencies      Pull requests that update a dependency file
go                      Pull requests that update Go code
needs-triage            needs to be reviewed
hacktoberfest-accepted  
discuss                 Feature changes that require discussion primarily among the GitHub CLI team
platform                Problems with the GitHub platform rather than the CLI client
external                pull request originating outside of the CLI core team
extension idea          An idea that could make a good GitHub CLI extension
github_actions          Pull requests that update GitHub Actions code

The workaround is simple enough: don't use {{tablerender}} and let the implicit rendering I added occur when the template rendering is complete. But that may not be obvious.

So I'm thinking that processResponse in pkg/cmd/api/api.go needs to be refactored to return individual responses that the caller can merge if and only if --json or --template is specified. This would work for both REST and GraphQL.

@ldelossa

I'm not familiar with GraphQL pagination or "tablerender". Does this affect me if I simply want an aggregated array of pagination responses?

One other question: how does the "--paginate" flag work with nested GraphQL queries? For instance, I typically retrieve 100 pullRequestReviewThreads which themselves request 100 pullReviewRequestComments. Would the inner comments be paginated also?

@heaths
Contributor

heaths commented May 16, 2022

My reply was intended for @mislav, about a possible solution. As he mentioned, for now you can pipe the output to jq -s to merge the arrays. I hope to do this within the gh api implementation itself, effectively.

heaths added a commit to heaths/cli that referenced this issue May 16, 2022
heaths added a commit to heaths/cli that referenced this issue May 17, 2022
heaths added a commit to heaths/cli that referenced this issue May 17, 2022
heaths added a commit to heaths/cli that referenced this issue May 17, 2022
heaths added a commit to heaths/cli that referenced this issue May 26, 2022
heaths added a commit to heaths/cli that referenced this issue Jun 4, 2022
zerowidth added a commit to zerowidth/dotfiles that referenced this issue Jul 11, 2022
@naikrovek

As a workaround until this is solved, I would use the --jq filter to isolate only the nested array of each response and then pipe it to jq --slurp again:

# should produce a single JSON array of all paginated results concatenated:
gh api graphql -f query='QUERY' --paginate --jq '.data.repository.branchProtectionRules.nodes[]' | jq -s '.'

This is everything I needed right here... I think... It certainly solves my immediate need. Thank you.

heaths added a commit to heaths/cli that referenced this issue Dec 12, 2022
@heaths
Contributor

heaths commented Jun 8, 2023

@justin-octo that's a good idea. @mislav could we put my code changes behind an opt-in --merge flag so that current behavior isn't broken? ("slurp" probably doesn't have a lot of meaning across different spoken languages.) That said, I can't imagine how the current behavior is useful, nor how it would break even if passed to jq --slurp.

@heaths
Contributor

heaths commented Jun 9, 2023

mislav closed this as completed in https://github.com/cli/cli/pull/7190 2 hours ago

@mislav should this really be closed, though? GraphQL responses still have this problem and given that's what GitHub documents as preferred over REST, shouldn't we still tackle this problem like we had previously discussed? Should I open a separate issue to track it just for GraphQL responses instead?

@samcoe samcoe reopened this Jun 10, 2023
@mislav
Contributor

mislav commented Jul 11, 2023

@heaths Sorry for prematurely closing. I set for myself a goal to first solve the pagination for REST requests, and only later think about a potential approach to GraphQL ones that satisfies the streaming requirement. However, I will not work on this solution anymore, as I have left GitHub and the CLI team some weeks ago.

I think that a proper solution to GraphQL pagination wouldn't just solve the JSON array problem, but also alleviate the need for the GraphQL query to be specially prepared for CLI pagination. Ideally, the user should just be able to supply the "selector" within a GraphQL query to be paginated, and gh api would:

  • add the pageInfo{hasNextPage,endCursor} block to the query;
  • add the $endCursor query input to subsequent, paginated queries;
  • output the paginated items at the defined selector as a single JSON array.

Implementing this would be non-trivial, but I think that GraphQL pagination should either have a full solution, or no solution (i.e. either keep exactly what we have now, or delegate a pagination implementation to the user of gh api).
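
To make the proposed loop concrete, here is a small Go sketch of cursor-based pagination. It assumes a stand-in `fetch` function in place of a real GraphQL request; the `page` type, `paginate`, and `fakeFetch` are all hypothetical and do not reflect how gh api is implemented.

```go
package main

import "fmt"

// page is a simplified view of one response: the paginated items plus the
// pageInfo fields that drive the loop.
type page struct {
	Nodes       []string
	HasNextPage bool
	EndCursor   string
}

// fetchFunc stands in for one GraphQL request; cursor is "" for the first page.
type fetchFunc func(cursor string) (page, error)

// paginate follows endCursor until hasNextPage is false and returns all nodes
// as a single slice.
func paginate(fetch fetchFunc) ([]string, error) {
	var all []string
	cursor := ""
	for {
		p, err := fetch(cursor)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Nodes...)
		if !p.HasNextPage {
			return all, nil
		}
		cursor = p.EndCursor
	}
}

// fakeFetch serves canned pages keyed by cursor, for demonstration only.
func fakeFetch(cursor string) (page, error) {
	pages := map[string]page{
		"":   {Nodes: []string{"bug", "docs"}, HasNextPage: true, EndCursor: "c1"},
		"c1": {Nodes: []string{"core"}, HasNextPage: false, EndCursor: "c2"},
	}
	p, ok := pages[cursor]
	if !ok {
		return page{}, fmt.Errorf("unknown cursor %q", cursor)
	}
	return p, nil
}

func main() {
	labels, err := paginate(fakeFetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(labels)
}
```

The hard part described above, locating the selector in an arbitrary query and injecting pageInfo and $endCursor, is deliberately left out of this sketch.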

@justin-octo

Sorry to see you go! Hope you have a great summer and great success in your future endeavors.

I would love to see this implemented if anyone else on the team is available. If they are please hand it off to them :)

If not, I have been using --paginate and jq with success using the following extremely long command line (on Mac):
gh api --method=GET search/code --paginate -f q='gruntwork-io/terraform-aws-security.git org:your_org_here' | jq '.items | map({ repo_name: .repository.name, file_name: .path })' | jq 'reduce inputs as $i (.; . += $i) | reduce .[] as $d (null; .[$d.repo_name] += [$d.file_name])' | jq -s

@heaths
Contributor

heaths commented Jul 11, 2023

@mislav I echo what @justin-octo says. Thanks for all your hard work on the CLI over the years.

@justin-octo I do have a PR open that solves this for both JSON and GraphQL in a way, though it changes streaming; however, I don't think that's a problem given - without jq -s or my extension that basically does what my PR does - it was already broken. I think the major concern was using an external module, though it seems fairly well-used. If @samcoe or @vilmibm or whoever would prefer, I could write something similar (or lift and attribute, or vendor the module) in the CLI. Maintaining a behavior for a scenario that was already broken (and only when it was already broken) doesn't seem concerning.

@heaths
Contributor

heaths commented Jul 11, 2023

@mislav wrote,

Ideally, the user should just be able to supply the "selector" within a GraphQL query to be paginated, and gh api would:

  • add the pageInfo{hasNextPage,endCursor} block to the query;
  • add the $endCursor query input to subsequent, paginated queries;
  • output the paginated items at the defined selector as a single JSON array.

Conceptually I like it, but it seems the only robust way to do that is to hit the service once to get the schema based on the user query to find where (and probably err on more than one) the pagination response schema should go. Or should the CLI even cache the full schema? Are ETags (or conditional requests in general) supported? Might be overkill for this, though. The schema is fairly large.

@andyfeller
Contributor

Summarizing meeting between @heaths, @samcoe, @williammartin and myself:

  1. @heaths is going to work on bifurcating the new --paginate-all/--paginate-slurp capability being developed here into a code path separate from --paginate, to avoid breaking any existing integrations that use it, and on keeping the usage docs for both close together
  2. @samcoe @williammartin @andyfeller will prioritize this PR for review and feedback to resolve effectively
  3. When we consider future 3.0 plans, that might be a good time to make this the default behavior
  4. We are all on the same page about ensuring no modifications to underlying JSON responses from GitHub REST and GraphQL APIs aside from stitching them together
  5. Including in the help usage docs that this is an experimental feature, which we can later graduate once we get usage and feedback from the community

Once again, big thanks to @heaths for his diligence in addressing this recurring need for the GitHub CLI community! ✨ :fishsticks:

@heaths
Contributor

heaths commented Dec 12, 2023

I'll also add a bevy of tests for mergo to make sure we don't regress any behavior now or later e.g., date-time strings should never change format. It's unlikely already since we're just working with []interface{} or map[string]interface{}, but tests will help ensure that.
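
A check of that kind could be sketched as below. This is an illustration using only encoding/json, not the actual test suite or the merge library; `roundTrip` and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip decodes a JSON document into generic Go values
// (map[string]interface{}, []interface{}, string, ...) and encodes it again.
func roundTrip(in string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(in), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	// Date-time values are plain JSON strings, so they should pass through
	// without being reparsed or reformatted.
	in := `{"createdAt":"2020-05-27T12:04:57Z"}`
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == in)
}
```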

heaths added a commit to heaths/go-gh that referenced this issue Jan 25, 2024
heaths added a commit to heaths/cli that referenced this issue Jan 25, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this issue Jan 31, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this issue Feb 2, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this issue Feb 15, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this issue Mar 12, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
heaths added a commit to heaths/cli that referenced this issue Apr 4, 2024
Partly resolves cli#1268 and replaces cli#5652. Requires cli/go-gh#148 to be merged and optionally released.
@jglick

jglick commented Apr 17, 2024

My example from #1268 (comment) now just works in 2.48.0. Thanks!

@carnei-ro
Author

thank you guys

@williammartin
Member

williammartin commented Apr 17, 2024

My example from #1268 (comment) now just works in 2.48.0. Thanks!

Reading your example and reproducing targeting go-gh:

➜  ~ gh api -X GET --paginate /repos/cli/go-gh/pulls -F state=closed | jq ' . | length'
100

I was very confused about whether we'd broken something in pagination. Turns out go-gh actually has exactly 100 closed PRs. 😅

Thanks for the feedback!

@jglick

jglick commented Apr 17, 2024

Just as a stress test, I checked on jenkinsci/jenkins and (after a while) got 9059, which sounds about right.

@heaths
Contributor

heaths commented Apr 17, 2024

My example from #1268 (comment) now just works in 2.48.0. Thanks!

Reading your example and reproducing targeting go-gh:

➜  ~ gh api -X GET --paginate /repos/cli/go-gh/pulls -F state=closed | jq ' . | length'
100

I was very confused about whether we'd broken something in pagination. Turns out go-gh actually has exactly 100 closed PRs. 😅

Thanks for the feedback!

FWIW, technically that was using Nate's merging of only REST array responses done previously, and does not use anything from my PR that was just merged. Good to know it wasn't regressed, but the command should've worked already.

@williammartin
Member

FWIW, technically that was using Nate's merging of only REST array responses done previously, and does not use anything from my PR that was just merged. Good to know it wasn't regressed, but the command should've worked already.

Yeh I just bisected it because it occurred to me that nothing should have changed without the addition of --slurp and I got a bit worried. Indeed I tracked it to #7190 which went into https://github.com/cli/cli/releases/tag/v2.31.0. Crisis averted 😅
