
Cache Stuck #658

Open
ScottPierce opened this issue Dec 2, 2020 · 19 comments
Labels: bug (Something isn't working), cache

Comments

@ScottPierce

ScottPierce commented Dec 2, 2020

Describe the bug
All my actions are running, but every one of them fails to save its cache with the same error:

Post job cleanup.
Unable to reserve cache with key Linux-modules-0c1e633988f0463573ae79cec4e4e741153037a6a270a901c8bf040b3d674a4f, another job may be creating this cache.

There must be some sort of race condition, and the cache is stuck, because they all have the same precache log:

Run actions/cache@v2
  with:
    path: **/node_modules
    key: Linux-modules-0c1e633988f0463573ae79cec4e4e741153037a6a270a901c8bf040b3d674a4f
Cache not found for input keys: Linux-modules-0c1e633988f0463573ae79cec4e4e741153037a6a270a901c8bf040b3d674a4f

I imagine that if I changed my yarn lock file, which the cache is based on, this issue would resolve itself.
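For illustration, a minimal sketch of that key-bump workaround using @actions/cache and @actions/glob (the variable names, version suffix, and glob patterns below are placeholders, not taken from the workflow above):

const cache = require('@actions/cache');
const glob = require('@actions/glob');

// Bumping CACHE_VERSION (e.g. v1 -> v2) produces a fresh key, so a stuck
// reservation on the old key no longer blocks saves.
const CACHE_VERSION = 'v2';
const lockHash = await glob.hashFiles('**/yarn.lock');
const key = `${process.env.RUNNER_OS}-modules-${CACHE_VERSION}-${lockHash}`;

const restoredKey = await cache.restoreCache(['**/node_modules'], key);
if (!restoredKey) {
  // ...install dependencies here, then save under the new key:
  await cache.saveCache(['**/node_modules'], key);
}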

Expected behavior
That the cache always works, and that when it gets stuck like this, it automatically recovers.

Additional context

  • My GitHub runner is running inside a Docker container. It isn't delegating jobs to a separate container; the runner itself is installed and running inside one. I don't think this affects things, as my cache normally works.
  • I don't know how to reproduce this. Normally the cache works; there is clearly just some sort of race condition where it can get stuck.
ScottPierce added the bug label Dec 2, 2020
@Pentarctagon

I've run into the same issue. Each time it starts with Cache not found for input keys: windows-master-N002 and ends with Unable to reserve cache with key windows-master-N002, another job may be creating this cache., so no cache ends up actually getting created.

@rcowsill
Contributor

rcowsill commented Dec 5, 2020

I'm getting this too, on a GitHub-hosted runner. Logs are here: https://github.com/rcowsill/NodeGoat/runs/1504051837 (see the Post Use cache (docker layers) section)

In my case it's happening when the cache with the matching key already exists. Normally that would fail with the message "Cache already exists." instead of "Unable to reserve cache with key [...]"

I'm using satackey/action-docker-layer-caching@v0.0.8, which imports @actions/cache@1.0.4. In case it's important, that action is running up to four cache uploads in parallel, but all caches have unique keys within a single build. I was only running that one build using those cache keys at the time.

EDIT: I tested with parallel uploads switched off and got the same result. I'm wondering if the server-side response has been changed for the "cache exists" case, to avoid retries that will never succeed.

@ScottPierce
Author

I've seen this happen again. This might be the initial error:

Error: Cache upload failed because file read failed with EBADF: bad file descriptor, read

@ScottPierce
Author

I've also noticed this is extremely reproducible for some reason. This failure happens a lot for my Docker runners when jobs run in parallel, especially now that I have 8 runners.

@ScottPierce
Author

ScottPierce commented Dec 12, 2020

This is now blocking the build system. I'm seeing this build error constantly:

Warning: uploadChunk (start: 100663296, end: 134217727) failed: Cache service responded with 503
/_work/_actions/actions/cache/v2/dist/save/index.js:3305
                        throw new Error(`Cache upload failed because file read failed with ${error.message}`);
                        ^

Error: Cache upload failed because file read failed with EBADF: bad file descriptor, read
    at ReadStream.<anonymous> (/_work/_actions/actions/cache/v2/dist/save/index.js:3305:31)
    at ReadStream.emit (events.js:210:5)
    at internal/fs/streams.js:167:12
    at FSReqCallback.wrapper [as oncomplete] (fs.js:470:5)

I'm going to remove caching to fix it.

@klausbadelt

Same issue on a macOS runner. It works on a GitHub-hosted runner, but not on our local one. Our key is key: ${{ runner.os }}-yarn-${{ hashFiles('web/yarn.lock') }}.

Unable to reserve cache with key macOS-yarn-5967645e7e3b673f1a9f0792c61e21863dc66df3a592de7561b991f867052c7b, another job may be creating this cache

@christianchownsan

I think we encountered this problem by manually cancelling a run that had begun caching with a certain key, before it completed the caching step. The key never seems to get released, and subsequent runs that try to use the same key fail to reserve it:

Run actions/cache@v2
Cache not found for input keys: <key>

...

Post Run actions/cache@v2
Post job cleanup.
Unable to reserve cache with key <key>, another job may be creating this cache.

Our only workaround was to change the key.

@bilelmoussaoui

I'm also experiencing this issue with my custom GitHub action. It seems that it really insists on the cache key being "unique"; otherwise it keeps failing with Error: Failed to save cache: ReserveCacheError: Unable to reserve cache with key flatpak-builder-0f51bddf4f15c39e39b55b6c92c8249d99d44205, another job may be creating this cache., even though there are no other jobs running at the same time. The issue can be triggered by re-running a workflow job that just created the cache.
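For custom actions hitting this on a re-run, a minimal sketch of one way to keep the reservation failure from failing the job, loosely modelled on how the official actions/cache action tolerates it (the paths and key are placeholders; the check relies only on the error name shown in the message above):

const core = require('@actions/core');
const cache = require('@actions/cache');

async function trySaveCache(paths, key) {
  try {
    await cache.saveCache(paths, key);
  } catch (error) {
    if (error.name === 'ReserveCacheError') {
      // The key could not be reserved (it already exists, or another job
      // holds the reservation); log and continue instead of failing the step.
      core.info(error.message);
    } else {
      throw error;
    }
  }
}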

@RSickenberg

Same here, or sometimes a Warning: uploadChunk (start: 67108864, end: 100663295) failed: Cache service responded with 503

@rafis

rafis commented May 9, 2021

Post job cleanup.
/usr/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/***/*** --files-from manifest.txt
Warning: uploadChunk (start: 33554432, end: 67108863) failed: Cache service responded with 503
/home/runner/work/_actions/actions/cache/v2/dist/save/index.js:4043
                        throw new Error(`Cache upload failed because file read failed with ${error.message}`);
                        ^

Error: Cache upload failed because file read failed with EBADF: bad file descriptor, read
    at ReadStream.<anonymous> (/home/runner/work/_actions/actions/cache/v2/dist/save/index.js:4043:31)
    at ReadStream.emit (events.js:210:5)
    at internal/fs/streams.js:167:12
    at FSReqCallback.wrapper [as oncomplete] (fs.js:470:5)

After rerunning the workflow, the error is gone.

@Mubelotix

I had a relative path in the list of files to cache. But it seems that relative paths are not supported. Removing it fixed the error.
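If relative paths really are the culprit, a minimal sketch of that workaround (the entries and key below are placeholders) is to resolve everything against the workspace before handing it to the cache calls:

const path = require('path');
const cache = require('@actions/cache');

const workspace = process.env.GITHUB_WORKSPACE || process.cwd();
// Placeholder entries; resolving them ensures no relative paths reach saveCache.
const paths = ['node_modules', 'packages/app/dist'].map((p) => path.resolve(workspace, p));
const key = 'example-cache-key'; // placeholder

await cache.saveCache(paths, key);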

@btrepp

btrepp commented Dec 11, 2021

I think I am hitting this on two workflows, as rust-cache uses this underneath.
Is there any way to view or delete the caches if they are in a 'funky' state?

Manually overriding cache keys and committing feels like a whack-a-mole approach. I'd be happy to manually delete the cache from a UI (or API); that feels like a cleaner workaround until the root cause is fixed.
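For anyone finding this later: GitHub has since added a REST endpoint for deleting caches by key, which covers this case. A minimal sketch (OWNER/REPO are placeholders; it assumes a token with actions: write permission and Node 18+ for the built-in fetch):

const key = process.env.STUCK_CACHE_KEY; // placeholder for the key to delete

const res = await fetch(
  `https://api.github.com/repos/OWNER/REPO/actions/caches?key=${encodeURIComponent(key)}`,
  {
    method: 'DELETE',
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
    },
  }
);
console.log(res.status); // 200 on success, with the deleted cache entries in the body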

@MichaelTamm

I also experienced Warning: uploadChunk (start: 67108864, end: 100663295) failed: Cache service responded with 503

@mabdullahadeel

Faced the same on a macOS matrix job.

Error: uploadChunk (start: 0, end: 15553027) failed: Cache service responded with 503

Though everything worked on the very next run.

@swalkinshaw

swalkinshaw commented Jan 23, 2022

I ran into this issue and think I figured out the most common cause. It seems there are two causes for this:

  1. Transient upload issues (bad file descriptor, failed chunk, 503s, etc.)
  2. Trying to save the cache for a cache key that was already saved (and restored)

If you're running into this problem and don't have any errors uploading, I'm guessing the root cause is the last one.

Basically I compared what this cache library was doing with how the official actions/cache action actually uses it and here's the key part: https://github.com/actions/cache/blob/611465405cc89ab98caca2c42504d9289fb5f22e/src/save.ts#L39-L54

The official action does not try to save the cache if there was previously an exact key match on a cache hit. If you just naively do a restore cache + save cache (like I tried), you'll run into this error every time there's a cache hit (meaning you're trying to save a cache key which is already cached). Ideally saveCache would be an atomic operation, but since it's not, we have to replicate that behaviour.

So unfortunately the solution is to replicate all the logic within https://github.com/actions/cache/blob/611465405cc89ab98caca2c42504d9289fb5f22e/src/save.ts.

Here's a utility function to wrap a cacheable function (like calling exec.exec('some command')) which works for me:

const cache = require('@actions/cache');
const core = require('@actions/core');
const exec = require('@actions/exec');
const glob = require('@actions/glob');

async function withCache(cacheable, paths, baseKey, hashPattern) {
  const keyPrefix = `${process.env.RUNNER_OS}-${baseKey}-`;
  const hash = await glob.hashFiles(hashPattern);
  const primaryKey = `${keyPrefix}${hash}`;
  const restoreKeys = [keyPrefix];

  const cacheKey = await cache.restoreCache(paths, primaryKey, restoreKeys);

  if (!cacheKey) {
    core.info(`Cache not found for keys: ${[primaryKey, ...restoreKeys].join(", ")}`);
  } else {
    core.info(`Cache restored from key: ${cacheKey}`);
  }

  await cacheable();

  if (isExactCacheKeyMatch(primaryKey, cacheKey)) {
    core.info(`Cache hit occurred on the primary key ${primaryKey}, not saving cache.`);
    return;
  }

  await cache.saveCache(paths, primaryKey);
}

await withCache(async () => {
  await exec.exec('npm install')
}, ['node_modules'], 'npm', '**/package.json');

You might want to customize the arguments and how the keys are built (maybe accept a list of restore keys too).
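Note that the snippet above uses isExactCacheKeyMatch without defining it; a minimal version, modelled on the helper the official actions/cache action uses (a case-insensitive comparison that treats a missing restored key as no match), could look like this:

function isExactCacheKeyMatch(key, cacheKey) {
  return !!(
    cacheKey &&
    cacheKey.localeCompare(key, undefined, { sensitivity: 'accent' }) === 0
  );
}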

Kubuxu pushed a commit to filecoin-project/ref-fvm that referenced this issue Mar 27, 2022
Current SSCACHE is stuck due to: actions/toolkit#658

The simplest solution is to invalidate it by switching the start of the
cache key to v1.

Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@filipworksdev

A build fails with Unable to reserve cache with key 3.1.10-x64-master, another job may be creating this cache. I first saw this about 1-2 weeks ago on a different repo. Now I see it at least once per week in every repo.

@Francesco146

The issue is still reproducible; there's no workaround at the moment.

@natemcintosh

Had a similar issue with ruff:

ruff failed
  Cause: Failed to create cache file '/home/runner/work/aoc_2023/aoc_2023/.ruff_cache/0.1.12/1323824952410372998'
  Cause: No such file or directory (os error 2)

Emptying the workflow file, committing, then pasting it back in fixed the issue.

@Alfmac22

ruff failed
Cause: Failed to create cache file '/home/runner/work/aoc_2023/aoc_2023/.ruff_cache/0.1.12/1323824952410372998'
Cause: No such file or directory (os error 2)

yorhodes added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue May 26, 2024
### Description

- Uses derived hook and ISM config and dispatchTx of message to
implement metadata fetching

### Drive-by changes

- Change yarn cache key to workaround
actions/toolkit#658
- Make `hyperlane message send` use `HyperlaneCore.sendMessage`

### Related issues

- Fixes #3450

### Backward compatibility

Yes

### Testing

E2E testing BaseMetadataBuilder