Use Azure storage SDK to download cache #497
joshmgross left a comment
Overall looks good to me, left some comments around making this option more friendly to other action authors.
Could we add some more context to the PR description? It would be good to have some background and a link to the actions/cache issue that this relates to.
const disableAzureSdk = process.env['DISABLE_AZURE_SDK'] ?? ''

if (
  archiveUrl.hostname.endsWith('.blob.core.windows.net') &&
Do we have any guarantees that all blob storage downloads will be at this address?
Guarantee? No. This is what they've always used and I don't see it changing anytime soon. If it were to change, we would just fall back to http-client.
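A minimal sketch of the routing decision discussed above, assuming the check reduces to a hostname suffix test plus the DISABLE_AZURE_SDK opt-out. The helper name `shouldUseAzureSdk` is hypothetical, and treating any non-empty value as "disabled" is an assumption; the real code may interpret the variable differently.

```typescript
// Hypothetical helper, not the code under review: decide whether a download
// should go through the Azure SDK or fall back to http-client.
function shouldUseAzureSdk(hostname: string, disableAzureSdk: string): boolean {
  // Assumption: any non-empty DISABLE_AZURE_SDK value turns the SDK path off.
  if (disableAzureSdk !== '') {
    return false
  }
  // Only hosts under the Azure blob storage domain take the SDK path;
  // every other host falls back to http-client.
  return hostname.endsWith('.blob.core.windows.net')
}
```

If the blob storage address ever changed, a check like this would simply return false and the download would take the http-client path, which matches the fallback described in the reply.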
@joshmgross Any idea why the toolkit When the action runs fine in the cache tests?
Looks like it was just a type issue with
  "devDependencies": {
    "@types/jest": "^24.0.11",
-   "@types/node": "^11.13.5",
+   "@types/node": "^12.12.47",
@thboop @ericsciple FYI I bumped this package version since all actions run on Node 12 and it was causing issues in the test, see #497 (comment)
Restructure files and add tests
@joshmgross Can you please take one more pass? My last commit added some testing. In order to make it work with
If you're good, then this should be ready to merge (I don't have write permission on this repo so I can't merge).
joshmgross left a comment
Changes look good to me, but moving the exported functions to new files will break existing imports. We can either bump the major version or keep the functions where they are and have them call the new function locations.
This package is well tested, so I'm fine with releasing a version 1
retryOptions: {
  // Override the timeout used when downloading each 4 MB chunk
  // The default is 2 min / MB, which is way too slow
  tryTimeoutInMs: options?.timeoutInMs ?? 30000
I think we should do something similar to https://github.com/actions/toolkit/blob/master/packages/glob/src/internal-glob-options-helper.ts so that we can initialize a default options object and keep all the defaults in a single place.
Done. The one issue I hit with this approach is that even though we fill in the default value if it's undefined, the interface still shows the properties as optional, so we still need some check if it's undefined. I ended up adding an assertDefined method that throws if the property is undefined, but that should never happen in practice...it's just to make TypeScript happy.
If you know of a better way to do this, please let me know.
(I also tried using the non-null operator !, but lint doesn't like it)
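One possible way around the optional-property problem, sketched under assumptions: have the helper return a `Required<...>` version of the options type, so every field is known to be defined after defaults are applied and no assertDefined call or non-null operator is needed. The interface shape and the `useAzureSdk: true` default are hypothetical; only the 30000 ms timeout default comes from the diff above.

```typescript
// Hypothetical option shape; field names mirror those quoted in the review.
interface DownloadOptions {
  useAzureSdk?: boolean
  timeoutInMs?: number
}

// Same fields, but none optional: the type callers see after defaults apply.
type ResolvedDownloadOptions = Required<DownloadOptions>

function getDownloadOptions(copy?: DownloadOptions): ResolvedDownloadOptions {
  // All defaults live in one place.
  const result: ResolvedDownloadOptions = {
    useAzureSdk: true, // assumed default, not confirmed by the source
    timeoutInMs: 30000
  }

  if (copy) {
    // typeof checks skip both undefined and null values.
    if (typeof copy.useAzureSdk === 'boolean') {
      result.useAzureSdk = copy.useAzureSdk
    }
    if (typeof copy.timeoutInMs === 'number') {
      result.timeoutInMs = copy.timeoutInMs
    }
  }

  return result
}
```

With this shape, `getDownloadOptions(options).timeoutInMs` is typed as `number` rather than `number | undefined`, so the compiler is satisfied without a runtime assertion.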
- Fix to await async function getCompressionMethod

### 1.0.0
- Downloads Azure-hosted caches using the Azure SDK for speed and reliability
Could we list the breaking changes here?
Done. Are there any other breaking changes we would want to make now?
I'm working on a follow-on PR to display progress. Let's not release 1.0.0 just yet.
@joshmgross Here's the follow-on PR - https://github.com/dhadka/toolkit/pull/2
    return false
  }
  const retryableStatusCodes = [
    HttpCodes.BadGateway,
Do we get throttled here? Should we retry on 429?
I don't think we should retry on throttling - it probably will make it worse, no? :D
(Note that I didn't add this code, just moved it to a different file for testing purposes)
I believe it's this way because a 429 would generally mean we need to wait for some period of time. The question then becomes whether it's better for us to wait and retry, or just "cache miss" and let the user rebuild the cache content. These retries are only used when accessing the ArtifactCache API, so this would mean RU is throttling the host and that host would likely remain throttled for some time.
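For illustration, a self-contained sketch of a retry check that leaves 429 out, in line with the reasoning above. Only `HttpCodes.BadGateway` appears in the quoted diff; the other two codes in the list and the local `HttpCodes` enum are assumptions made so the example runs on its own.

```typescript
// Local stand-in for the HttpCodes enum; defined here only for the sketch.
enum HttpCodes {
  TooManyRequests = 429,
  BadGateway = 502,
  ServiceUnavailable = 503,
  GatewayTimeout = 504
}

function isRetryableStatusCode(statusCode?: number): boolean {
  if (!statusCode) {
    return false
  }
  // Assumed list: transient gateway and availability errors only.
  // 429 is deliberately absent, so a throttled request is not retried
  // and surfaces to the caller instead.
  const retryableStatusCodes: number[] = [
    HttpCodes.BadGateway,
    HttpCodes.ServiceUnavailable,
    HttpCodes.GatewayTimeout
  ]
  return retryableStatusCodes.includes(statusCode)
}
```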
  }

  if (copy) {
    if (typeof copy.useAzureSdk === 'boolean') {
any reason for checking these types? we are already in typescript, we should have good typechecking. I guess are these sent by customers and they can send anything they want? Why don't we validate them and fail if that's the case
I think the purpose of this if condition is to check if the value is set (not undefined and not null). So I think this is just an easier expression. (This pattern is used elsewhere, such as
)
The @actions/cache version currently being used has a performance issue with large gem caches. See actions/cache#267. This was addressed in actions/toolkit#497. Bump the @actions/cache version to a newer release that contains that fix.
Use the Azure Storage SDK for Node (https://github.com/Azure/azure-storage-node) when downloading cache content hosted on Azure blob storage. This is intended to improve reliability and performance because the download call, downloadToBuffer, downloads the file in 4 MB chunks that can be independently retried and parallelized.
In testing, we observed download times reduced by about 50% using this approach and a significant improvement in reliability. Additionally, this follows the suggestions made by Azure in our investigation into download flakiness (actions/cache#267).
One limitation of downloadToBuffer is that Node only supports buffers up to ~1 GB on 32-bit systems and ~2 GB on 64-bit systems. While the vast majority of caches are < 1 GB, there are a good number of caches > 1 GB and we technically allow up to 5 GB. As a result, any download that exceeds the buffer limit will be split up and downloaded in multiple calls to downloadToBuffer.
Proxies are still supported using the existing http_proxy and https_proxy environment variables.
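The splitting of oversized downloads can be sketched as a pure function that turns a total length into (offset, count) ranges, each small enough to fit in one buffer. This is only an illustration under assumptions: the names `Segment` and `getSegments` are hypothetical, and the actual PR may compute and consume the ranges differently.

```typescript
// One contiguous byte range of the file, sized to fit in a single buffer.
interface Segment {
  offset: number
  count: number
}

// Hypothetical helper: split totalLength bytes into ranges of at most
// maxSegmentSize bytes. Each range would map to one downloadToBuffer call.
function getSegments(totalLength: number, maxSegmentSize: number): Segment[] {
  if (maxSegmentSize <= 0) {
    throw new Error('maxSegmentSize must be positive')
  }
  const segments: Segment[] = []
  let offset = 0
  while (offset < totalLength) {
    // The last segment may be shorter than maxSegmentSize.
    const count = Math.min(maxSegmentSize, totalLength - offset)
    segments.push({offset, count})
    offset += count
  }
  return segments
}
```

For example, a 10-byte file with a 4-byte limit yields three segments covering bytes 0 to 3, 4 to 7, and 8 to 9. In practice the limit would be chosen below the platform's maximum buffer size.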