
Consider using the Git Data API to support larger files #3

Open · JamesMGreene opened this issue Sep 22, 2021 · 5 comments · May be fixed by #9 or #23

@JamesMGreene

During a recent workflow run, I attempted to use the checkout-files Action to check out a fairly large package-lock.json file (npm lockfile) weighing in at about 2 MB. I received this unhandled promise rejection:

Run Bhacaz/checkout-files@c8f01756bfd894ba746d5bf48205e19000b0742b
(node:1939) UnhandledPromiseRejectionWarning: HttpError: This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.: {"resource":"Blob","field":"data","code":"too_large"}
    at /home/runner/work/_actions/Bhacaz/checkout-files/c8f01756bfd894ba746d5bf48205e19000b0742b/node_modules/@octokit/request/dist-node/index.js:66:23
    at processTicksAndRejections (internal/process/task_queues.js:93:5)

You should consider using the Git Data API to support downloading larger files. 📦
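
For reference, a minimal sketch of the Git Data (Blobs) API call that the error message points to, assuming @octokit/rest; the owner/repo values are placeholders, and the blob SHA still has to be looked up somehow (discussed below):

  const { Octokit } = require("@octokit/rest");

  async function fetchBlob(owner, repo, fileSha) {
    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
    // The Blobs API serves files up to 100 MB; content comes back base64-encoded.
    const { data } = await octokit.rest.git.getBlob({ owner, repo, file_sha: fileSha });
    return Buffer.from(data.content, "base64").toString("utf8");
  }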

@sun (Contributor) commented Oct 18, 2021

How do we get the value of file_sha for https://docs.github.com/en/rest/reference/git#get-a-blob?

The current action uses getContent(), which accepts the file path.
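
For context, the current behaviour is roughly the following sketch (assuming @octokit/rest stands in for the action's own request wrapper); this is the call that fails with the "too_large" error for files over 1 MB:

  // Roughly the current behaviour: one Contents API call per requested file path.
  async function fetchFile(octokit, owner, repo, path, ref) {
    const { data } = await octokit.rest.repos.getContent({ owner, repo, path, ref });
    // The content field is base64-encoded and is missing once the file exceeds 1 MB.
    return Buffer.from(data.content, "base64").toString("utf8");
  }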

@JamesMGreene (Author) commented Nov 17, 2021

I had to do a little digging to explore that question! I've identified at least 2 viable ways.

Given that both of these approaches result in extra API calls, it might also be worthwhile to keep the current approach as the primary one and only fall back to the secondary approach if the request fails with a 403 status code. That would mean a bit more code to maintain, but it is probably the best approach for most use cases. 🤷🏻
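
That fallback wiring could look something like this sketch (fetchLargeFileViaBlobApi is a hypothetical helper, sketched under the first approach below):

  async function fetchFileWithFallback(octokit, owner, repo, path, ref) {
    try {
      // Primary path: the existing Contents API call, sufficient for most files.
      const { data } = await octokit.rest.repos.getContent({ owner, repo, path, ref });
      return Buffer.from(data.content, "base64").toString("utf8");
    } catch (error) {
      // Only handle the over-1 MB ("too_large") failure; re-throw anything else.
      if (error.status !== 403) throw error;
      return fetchLargeFileViaBlobApi(octokit, owner, repo, path, ref); // hypothetical helper, see below
    }
  }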

Combining the Repository Contents and Git Data APIs

  1. Reduce the list of requested file paths into a unique list of parent directories
  2. Get the repo contents for each parent directory path (instead of for each file path)
  3. In each response, find the files with matching path values and grab their sha property (or just the full Git Data Blob API URL from the _links.git property if you don't want to dynamically build the URL)
  4. Get a blob for each entry, still converting the responses from base64 as the action does currently (see the sketch after this list)
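
A minimal sketch of that flow with @octokit/rest (fetchLargeFileViaBlobApi is a hypothetical name; the real action would fold this into its existing loop over file paths):

  async function fetchLargeFileViaBlobApi(octokit, owner, repo, path, ref) {
    // Steps 1-2: list the parent directory; directory listings stay small even
    // when the files inside them are large.
    const dir = path.includes("/") ? path.slice(0, path.lastIndexOf("/")) : "";
    const { data: entries } = await octokit.rest.repos.getContent({ owner, repo, path: dir, ref });
    // Step 3: find the requested file in the listing and grab its blob sha.
    const match = entries.find((entry) => entry.path === path);
    if (!match) throw new Error(`File not found: ${path}`);
    // Step 4: fetch the blob (up to 100 MB) and decode it from base64 as before.
    const { data: blob } = await octokit.rest.git.getBlob({ owner, repo, file_sha: match.sha });
    return Buffer.from(blob.content, "base64").toString("utf8");
  }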

Using only the Git Data API

  1. Get a tree for the current branch/sha
  • Would require PR #6 (Added input for branch name), or analysis of the GitHub Actions event data to get the branch/sha/repository default_branch (or PR base branch, perhaps?)
  • If any of the requested file paths are not in the root directory, then you must add the ?recursive=true query param, or else make multiple queries to get individual trees based on the first response (especially if the response has truncated: true)
  2. For the entries in the response's tree array, find those with matching path values and grab their sha property (or just the full Git Data Blob API URL from the url property if you don't want to dynamically build the URL)
  3. Get a blob for each entry, still converting the responses from base64 as the action does currently (see the sketch after this list)
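
And a sketch of the tree-based route (again assuming @octokit/rest; fetchFilesViaTree is a hypothetical name, and ref would come from the new input or the event payload):

  async function fetchFilesViaTree(octokit, owner, repo, ref, paths) {
    // Step 1: one recursive tree lookup per ref; note the response can come
    // back with truncated: true on very large repositories.
    const { data: tree } = await octokit.rest.git.getTree({
      owner,
      repo,
      tree_sha: ref,
      recursive: "true",
    });
    const files = {};
    for (const path of paths) {
      // Step 2: match each requested path against the tree entries to find its blob sha.
      const entry = tree.tree.find((item) => item.path === path && item.type === "blob");
      if (!entry) throw new Error(`File not found in tree: ${path}`);
      // Step 3: fetch the blob and decode it from base64 as before.
      const { data: blob } = await octokit.rest.git.getBlob({ owner, repo, file_sha: entry.sha });
      files[path] = Buffer.from(blob.content, "base64").toString("utf8");
    }
    return files;
  }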

@JamesMGreene (Author) commented Jun 1, 2022

Here's an updated and easier option! 🎉

https://github.blog/changelog/2022-05-03-increased-file-size-limit-when-retrieving-file-contents-via-rest-api/

TL;DR: Keep doing things as you are today, but just set this custom media type on the file retrieval request headers:

Accept: application/vnd.github.v3.raw
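
With @octokit/rest, that header can be set via the mediaType option, so the change is small (a sketch; the parameter names mirror the getContent call above):

  async function fetchRawFile(octokit, owner, repo, path, ref) {
    // format: "raw" sends Accept: application/vnd.github.v3.raw, so the
    // Contents API returns the file body directly (up to 100 MB) as a string,
    // with no base64 decoding or separate blob lookup needed.
    const { data } = await octokit.rest.repos.getContent({
      owner,
      repo,
      path,
      ref,
      mediaType: { format: "raw" },
    });
    return data;
  }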

JamesMGreene added a commit to JamesMGreene/checkout-files that referenced this issue Jun 1, 2022
JamesMGreene linked a pull request Jun 1, 2022 that will close this issue
@JamesMGreene (Author)

Created PR: #9

@jordanmnunez

@sun + @Bhacaz, I have opened a PR to utilize the raw endpoint that @JamesMGreene mentioned: #23

Any interest in merging this into the action? If not, I will promote this to my own public action.

Thanks!
