Fix slow cache restore on Windows #515

Open: antontroshin wants to merge 18 commits into base: main

antontroshin commented Nov 19, 2024

Description:
Restoring relatively large cache files with setup-go is very slow on GitHub Actions Windows runners.
Similar to the change made in v5.0.0 (PR #411), this PR overrides the default GOCACHE and GOMODCACHE variables on Windows runners so that they point to drive D.
On average, a ~1.7 GB cache took about 13-15 minutes to extract on Windows pipelines.
With this change, the extraction time dropped to about 1 minute.

Restore cache logs before the change:

Wed, 20 Nov 2024 02:24:29 GMT Cache Size: ~1784 MB (1870225672 B)
Wed, 20 Nov 2024 02:24:30 GMT Received 1870225672 of 1870225672 (100.0%), 295.9 MBs/sec
Wed, 20 Nov 2024 02:37:21 GMT Cache restored successfully
Restore elapsed: 12 min 51 sec
Wed, 20 Nov 2024 01:38:54 GMT Cache Size: ~1784 MB (1870225672 B)
Wed, 20 Nov 2024 01:38:55 GMT Received 1870225672 of 1870225672 (100.0%), 118.7 MBs/sec
Wed, 20 Nov 2024 01:54:19 GMT Cache restored successfully
Restore elapsed: 15 min 24 sec

Restore cache logs after the change:

Wed, 20 Nov 2024 03:38:03 GMT Cache Size: ~1780 MB (1866088667 B)
Wed, 20 Nov 2024 03:38:04 GMT Received 1866088667 of 1866088667 (100.0%), 197.3 MBs/sec 
Wed, 20 Nov 2024 03:38:59 GMT Cache restored successfully
Restore elapsed: 55 sec
Wed, 20 Nov 2024 03:38:10 GMT Cache Size: ~1780 MB (1866088667 B)
Wed, 20 Nov 2024 03:38:11 GMT Received 1866088667 of 1866088667 (100.0%), 197.6 MBs/sec
Wed, 20 Nov 2024 03:39:06 GMT Cache restored successfully
Restore elapsed: 55 sec
Wed, 20 Nov 2024 04:04:07 GMT Cache Size: ~1780 MB (1866088667 B)
Wed, 20 Nov 2024 04:04:07 GMT Received 1866088667 of 1866088667 (100.0%), 88.8 MBs/sec
Wed, 20 Nov 2024 04:05:05 GMT Cache restored successfully
Restore elapsed: 58 sec
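
To make the comparison concrete, the following is a small sketch that recomputes the elapsed time and the effective extraction throughput from the first "before" run and the first "after" run above. The `RestoreRun` type and the function names are illustrative and not part of setup-go; elapsed time is measured from the "Received ... (100.0%)" line to "Cache restored successfully", which appears to be how the "Restore elapsed" figures were derived.

```typescript
// Figures copied from the first "before" run and the first "after" run in the logs above.
interface RestoreRun {
  label: string;
  bytes: number; // archive size reported on the "Cache Size" line
  received: string; // timestamp of the "Received ... (100.0%)" line
  restored: string; // timestamp of "Cache restored successfully"
}

const before: RestoreRun = {
  label: 'before (default paths)',
  bytes: 1870225672,
  received: 'Wed, 20 Nov 2024 02:24:30 GMT',
  restored: 'Wed, 20 Nov 2024 02:37:21 GMT'
};

const after: RestoreRun = {
  label: 'after (drive D)',
  bytes: 1866088667,
  received: 'Wed, 20 Nov 2024 03:38:04 GMT',
  restored: 'Wed, 20 Nov 2024 03:38:59 GMT'
};

// Seconds between the end of the download and the end of the restore.
function elapsedSeconds(run: RestoreRun): number {
  return (Date.parse(run.restored) - Date.parse(run.received)) / 1000;
}

// Effective extraction throughput in MB/s (1 MB = 1024 * 1024 bytes,
// which matches the "~1784 MB" shown in the log for 1870225672 B).
function extractionRate(run: RestoreRun): number {
  return run.bytes / (1024 * 1024) / elapsedSeconds(run);
}

for (const run of [before, after]) {
  console.log(
    `${run.label}: ${elapsedSeconds(run)} s, ${extractionRate(run).toFixed(1)} MB/s`
  );
}
console.log(
  `speedup: ${(elapsedSeconds(before) / elapsedSeconds(after)).toFixed(1)}x`
);
```

For these two runs the elapsed times come out to 771 s and 55 s, a roughly 14-fold reduction, while the download speed reported in the logs is in the same range in both cases.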

Solution:

At first, I tried to implement a solution similar to #411 using symlinks. However, testing showed that the tooling creates the GOCACHE and GOMODCACHE directories early, so there was no easy way to create the symlinks without deleting directories that were already in use and making bigger code changes.

Overriding the env var paths as early as possible, and only on Windows runners, is simpler.

This PR makes that change the default for Windows runners. The same "workaround" is already available from the GitHub Actions workflow itself by manually overriding the env vars.
Preferably, this change would be accepted and merged (after review, of course), so that users get the improvement by default without manually modifying every GitHub Actions workflow.
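
The override idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the function names `goCacheOverrides` and `toEnvFileLines` are hypothetical, and the `D:\gocache` and `D:\gomodcache` paths are placeholders. The `NAME=value` line format is the one GitHub Actions reads from the file referenced by the GITHUB_ENV variable; the PR may export the variables through a different mechanism.

```typescript
// Hypothetical helper: which Go cache variables to override for a given platform.
// The target paths are placeholders chosen for illustration.
function goCacheOverrides(platform: string): Record<string, string> {
  if (platform !== 'win32') {
    return {}; // leave the Go defaults untouched on Linux and macOS
  }
  return {
    GOCACHE: 'D:\\gocache',
    GOMODCACHE: 'D:\\gomodcache'
  };
}

// Render the overrides as NAME=value lines, the single-line format accepted
// by the environment file that GITHUB_ENV points to.
function toEnvFileLines(vars: Record<string, string>): string[] {
  return Object.keys(vars).map(name => `${name}=${vars[name]}`);
}

console.log(toEnvFileLines(goCacheOverrides('win32')).join('\n'));
```

Keeping the decision in a pure function makes the "only on Windows" condition easy to test without a Windows machine; the caller decides how and when to export the result.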

Related issue:
Numerous related issues have been raised about this problem, and some improvements have already been made to address it.

Check list:

  • Mark if documentation changes are required.
  • Mark if tests were added or updated to cover the changes.

Use D drive for faster cache restore

Signed-off-by: Anton Troshin <anton@diagrid.io>
…rs for Windows

Signed-off-by: Anton Troshin <anton@diagrid.io>
antontroshin changed the title from "Add GOCACHE AND GOMODCACHE symlink on Windows" to "Fix slow cache restore on Windows" on Nov 20, 2024
antontroshin marked this pull request as ready for review on November 20, 2024 04:33
antontroshin requested a review from a team as a code owner on November 20, 2024 04:33

y-bruin commented Dec 31, 2024

@antontroshin When do you plan to merge it?

@antontroshin
Author

> @antontroshin When do you plan to merge it?

@y-bruin Unfortunately, I cannot merge it myself. That is up to the repo maintainers, once the PR has been reviewed and approved.


lmb commented Jan 21, 2025

For anyone else also waiting for this to get some attention:

    env:
      # Fix slow Go compile and cache restore
      # See https://github.com/actions/setup-go/pull/515
      GOCACHE: D:\gocache
      GOMODCACHE: D:\gomodcache
      GOTMPDIR: D:\gotmp

    steps:
      # Go requires gotmp to be present
      - run: mkdir D:\gotmp
        shell: pwsh

Setting GOTMPDIR to D: was necessary to cut down the time it took to go install from cache.

add validation for GOCACHE, GOMODCACHE, and GOTMPDIR on Windows

Signed-off-by: Anton Troshin <anton@diagrid.io>
@antontroshin
Author

Hi @HarithaVattikuti, @priya-kinthali, @aparnajyothi-y, @priyagupta108
I'm very sorry for tagging you directly. Is it possible to request your review of this PR?
I, and many others, would benefit greatly from having this feature merged and enabled by default.
I would gladly address any comments or requests.
Thanks!

@aparnajyothi-y
Contributor

Hello @antontroshin, thank you for submitting this pull request. We will review it and get back to you once we have some feedback :)

@HarithaVattikuti
Contributor

Closing and reopening the PR to trigger the CI checks.

@lmvysakh

Hi @antontroshin,

First of all, thank you for your contribution and the code changes addressing the slow cache restore on Windows. Your PR successfully reduces the cache restore time on GitHub-hosted Windows runners.

However, we noticed that one test case is failing in the checks, and that the changes in this PR currently do not work on self-hosted runners: they cause failures there in both cases we checked (with and without a D drive). A screenshot of the failed workflow, run on a self-hosted runner that has a D drive, is shown below.

[Screenshot: Failed workflow]

Could you please update the PR to fix the failing test case and ensure that the changes do not negatively affect self-hosted runners, while still resolving the cache timing issue for GitHub-hosted Windows runners?

@antontroshin
Author

Hi @lmvysakh
Thanks for checking the PR. The code was a bit outdated, so I've merged the changes from main, and I hope this will fix the failed Action run. I didn't find the check as it appears in the screenshot, so I wasn't able to investigate the exact cause.

As for the lack of a "D:" drive, I've added a check that skips the customization of the Go cache paths, because the only safe assumption in that case is that drive "C:" (the default path location) is the only drive that exists.

It may be a good idea to add this limitation for Windows machines to the README, with instructions on how to customize these paths by setting ENV variables manually, as @lmb commented above.
WDYT?

@lmvysakh

Hi @antontroshin,

Thank you for your contribution and for working to improve cache restore performance on Windows runners!

We wanted to highlight that, as per the update, the D drive will no longer be accessible on GitHub-hosted Windows Server 2025 runners starting from July 14, 2025. Most Windows runners already do not have a D drive available, and this change will make it unavailable across all relevant runners for performance and startup reasons.

Since the current implementation in this PR achieves its speed improvement primarily by utilising the D drive, it might not work reliably for the majority of users and will break entirely once the D drive is removed from the images.

Would you be open to exploring alternative approaches that do not rely on the D drive? This will help ensure broader compatibility and long-term support for all users of the action.

Thank you again for your efforts and for considering this feedback!

@antontroshin
Author

Hi @lmvysakh,
Thanks for the update, that is indeed very welcome news! It seems that the new runners will be using fast disks for the C drive.
To address the absence of drive D, I've added the following check in my last commit.

    export const setWindowsCacheDirectories = async () => {
      if (os.platform() !== 'win32') return;

      if (!fs.existsSync('D:')) return;
      ...
    }

If there's no D drive, the function exits early and the env variables are left as-is, pointing to the default locations on drive C, which preserves the original behavior.
This should work seamlessly both on the new runners, given that their drives will be fast, and on the old runners that still have the D drive.
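
The early-exit behavior described above can be sketched as a pure function so that all three cases (non-Windows, Windows without D, Windows with D) are testable on any machine. This is an illustration only: `resolveGoCacheEnv`, the injected `driveExists` predicate, and the target paths are hypothetical, and the elided part of the snippet above is not reproduced here. In the PR snippet the drive check is `fs.existsSync('D:')`.

```typescript
type Env = Readonly<Record<string, string>>;

// Hypothetical sketch: return the environment to use for the Go caches.
// `driveExists` stands in for a real check such as fs.existsSync('D:').
function resolveGoCacheEnv(
  platform: string,
  driveExists: (drive: string) => boolean,
  env: Env
): Env {
  if (platform !== 'win32') return env; // not Windows: nothing to do
  if (!driveExists('D:')) return env; // no D drive: keep the defaults on C

  // D drive present: point the caches at it (placeholder paths).
  return {
    ...env,
    GOCACHE: 'D:\\gocache',
    GOMODCACHE: 'D:\\gomodcache'
  };
}

const defaults: Env = {GOFLAGS: '-mod=readonly'};

// Windows runner without a D drive: the environment is returned unchanged.
console.log(resolveGoCacheEnv('win32', () => false, defaults));

// Windows runner that still has a D drive: cache paths are overridden.
console.log(resolveGoCacheEnv('win32', drive => drive === 'D:', defaults));
```

Returning the input object untouched in the first two branches mirrors the stated goal of preserving the original behavior when the D drive is absent.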

Please let me know if this answers the concerns. Of course, I'm open to feedback if anything needs to be changed or improved.

One last thing: would you be so kind as to rerun the one failed action, Validate 'setup-go'? It seems to have failed with an HTTP 500 error on the day of the Google outage, so it was most likely a temporary issue.

Thanks.
