
bvfs: fix cache race #2642

Merged
BareosBot merged 2 commits into bareos:master from pstorz:dev/pstorz/master/fix-bvfs-race
May 4, 2026
Conversation

@pstorz
Member

@pstorz pstorz commented Apr 30, 2026

Atomically claim BVFS cache generation before filling PathVisibility so concurrent .bvfs_update runs cannot start the same job twice.

Add a python-bareos system test that runs concurrent BVFS updates against the same job set to cover the regression.
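As an illustration of the idea only, here is a minimal sketch of several workers racing to claim the same job. It assumes a stand-in SQLite table with just the `JobId` and `HasCache` columns; the real catalog schema, the C++ code in `bvfs.cc`, and the actual python-bareos test are not reproduced here. The helper names `try_claim` and `run_concurrent_claims` are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

# The claim statement quoted in the review walkthrough, with a bound parameter.
CLAIM_SQL = "UPDATE Job SET HasCache=-1 WHERE JobId=? AND HasCache=0"


def try_claim(db_path, job_id, barrier, winners, lock):
    """Hypothetical worker: open its own connection and try to claim the job."""
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        barrier.wait()  # release all workers at roughly the same moment
        cur = conn.execute(CLAIM_SQL, (job_id,))
        if cur.rowcount == 1:  # exactly one row changed: this worker owns the job
            with lock:
                winners.append(threading.get_ident())
    finally:
        conn.close()


def run_concurrent_claims(workers=8):
    """Return how many workers managed to claim job 1."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "catalog.db")
        setup = sqlite3.connect(db_path)
        # Stand-in schema (assumption): only the two columns the claim touches.
        setup.execute(
            "CREATE TABLE Job (JobId INTEGER PRIMARY KEY, HasCache INTEGER NOT NULL)"
        )
        setup.execute("INSERT INTO Job (JobId, HasCache) VALUES (1, 0)")
        setup.commit()
        setup.close()

        barrier = threading.Barrier(workers)
        winners, lock = [], threading.Lock()
        threads = [
            threading.Thread(target=try_claim, args=(db_path, 1, barrier, winners, lock))
            for _ in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(winners)


if __name__ == "__main__":
    print("workers that claimed the job:", run_concurrent_claims())
```

Because the `HasCache=0` condition and the write happen in one statement, only one worker should see a row count of 1, however the threads interleave.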

Thank you for contributing to the Bareos Project!

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Required backport PRs have been created
  • If a release should wait for this PR to be finished, set that release's milestone.
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR

@coderabbitai
Contributor

coderabbitai Bot commented Apr 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24d0bee3-9380-4bdd-9454-084f1712c796

📥 Commits

Reviewing files that changed from the base of the PR and between c972e6a and 3a76494.

📒 Files selected for processing (1)
  • CHANGELOG.md

📝 Walkthrough

UpdatePathHierarchyCache in core/src/cats/bvfs.cc now atomically claims cache work with UPDATE Job SET HasCache=-1 WHERE JobId=%s AND HasCache=0, then uses the affected row count to decide whether to proceed, return "already computed", or return "in progress"; prior separate SELECT checks and unconditional UPDATE were removed.

Changes

Cache Claim Optimization

| Layer / File(s) | Summary |
|---|---|
| Atomic claim: core/src/cats/bvfs.cc | Adds UPDATE Job SET HasCache=-1 WHERE JobId=%s AND HasCache=0 and captures updated_rows to atomically claim cache population. |
| Decision / Early exit: core/src/cats/bvfs.cc | Handles updated_rows < 0 as failure; when updated_rows == 0 performs SELECT HasCache to set retval = true (already computed) or retval = false (in-progress) and returns early. |
| Removed previous flow: core/src/cats/bvfs.cc | Removes prior separate SELECT checks for HasCache=1 / HasCache=-1 and the unconditional UPDATE to HasCache=-1. |
| Changelog: CHANGELOG.md | Adds "bvfs: fix cache race [PR #2642]" under Unreleased and the [PR #2642] reference link. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • bvfs: fix cache race #2642: Same change to BareosDb::UpdatePathHierarchyCache (adjusts HasCache update logic to fix BVFS cache race).

Suggested reviewers

  • sebsura

Poem

A rabbit taps the database gate,
"Claim once, don't spin, let's keep it straight."
A single UPDATE, swift and small,
Quiet caches, no race to brawl. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'bvfs: fix cache race' clearly identifies the component (bvfs) and the specific issue being fixed (cache race condition). |
| Description check | ✅ Passed | The PR description includes a clear purpose above the template, marks required checklist items appropriately, and covers the main objectives of atomically claiming cache and adding test coverage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@sebsura sebsura left a comment


The fix looks correct, but I have not been able to reliably reproduce the problem.

The test itself does not seem to reproduce the issue: it always succeeds, even when I run it without the proposed fix.
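One way to see the race without relying on timing is to force the interleaving by hand. The sketch below is an assumption-laden simplification: it models the previous flow as "SELECT the flag, then UPDATE unconditionally" (as summarised in the bot walkthrough) against a stand-in SQLite table, and the helper names `old_check`, `old_mark`, and `atomic_claim` are hypothetical, not Bareos functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in schema (assumption): only the columns involved in the claim.
conn.execute("CREATE TABLE Job (JobId INTEGER PRIMARY KEY, HasCache INTEGER NOT NULL)")
conn.execute("INSERT INTO Job (JobId, HasCache) VALUES (1, 0)")


def old_check(job_id):
    """Simplified old step 1: read the flag; True means 'nobody has started yet'."""
    (has_cache,) = conn.execute(
        "SELECT HasCache FROM Job WHERE JobId=?", (job_id,)
    ).fetchone()
    return has_cache == 0


def old_mark(job_id):
    """Simplified old step 2: unconditionally mark the job as in progress."""
    conn.execute("UPDATE Job SET HasCache=-1 WHERE JobId=?", (job_id,))


def atomic_claim(job_id):
    """New flow: check and mark in one statement; True only for the single winner."""
    cur = conn.execute(
        "UPDATE Job SET HasCache=-1 WHERE JobId=? AND HasCache=0", (job_id,)
    )
    return cur.rowcount == 1


# Forced interleaving: both callers check before either one marks.
a_may_start = old_check(1)
b_may_start = old_check(1)
old_mark(1)
old_mark(1)
old_flow_starters = [a_may_start, b_may_start].count(True)

# Reset the flag and repeat with the atomic claim.
conn.execute("UPDATE Job SET HasCache=0 WHERE JobId=1")
atomic_starters = [atomic_claim(1), atomic_claim(1)].count(True)

print("old flow starters:", old_flow_starters)
print("atomic claim starters:", atomic_starters)
```

In the old flow both callers pass the check, while the atomic claim admits only one. The window between check and update is narrow in practice, which may be why an unforced concurrent test passes with or without the fix.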

Comment thread systemtests/tests/python-bareos/test_bvfs.py Outdated
Comment thread systemtests/tests/python-bareos/test_bvfs.py Outdated
Comment thread systemtests/tests/python-bareos/test_bvfs.py Outdated
Atomically claim BVFS cache generation before filling
PathVisibility so concurrent .bvfs_update runs cannot start
the same job twice.
@pstorz pstorz force-pushed the dev/pstorz/master/fix-bvfs-race branch from 3aaa258 to c972e6a Compare May 4, 2026 10:56
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/cats/bvfs.cc`:
- Around line 150-160: The current branch treats a benign "already in progress"
as an error because it only checks existence with "SELECT 1 ... HasCache=1" and
returns false; change the query and return semantics so the caller can
distinguish three outcomes: computed, in-progress, and missing/error. In the
bvfs.cc path that checks updated_rows (the block using Mmsg/QueryDb/SqlNumRows
and JobId), SELECT HasCache FROM Job WHERE JobId = %s and examine the returned
HasCache value (1 => computed -> retval=true; 0 or -1 => in-progress -> return a
distinct status or encoded value, e.g., set retval to a new enum/constant
BVFS_IN_PROGRESS instead of false; NULL/no row => treat as missing/error). Then
update DotBvfsUpdateCmd to handle the new BVFS_IN_PROGRESS value separately from
an error (currently it treats any false as ERROR).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fe6e4749-91bd-4fb1-82e6-1166749b95d9

📥 Commits

Reviewing files that changed from the base of the PR and between 3aaa258 and c972e6a.

📒 Files selected for processing (1)
  • core/src/cats/bvfs.cc

Comment thread core/src/cats/bvfs.cc
Comment on lines +150 to +160
      if (updated_rows == 0) {
        Mmsg(cmd, "SELECT 1 FROM Job WHERE JobId = %s AND HasCache=1", jobid);
    +   if (!QueryDb(jcr, cmd)) { goto bail_out; }

    -   if (!QueryDb(jcr, cmd) || SqlNumRows() > 0) {
    -     Dmsg1(dbglevel, "already in progress %d\n", (uint32_t)JobId);
    -     retval = false;
    +   if (SqlNumRows() > 0) {
    +     Dmsg1(dbglevel, "Already computed %d\n", (uint32_t)JobId);
    +     retval = true;
    +   } else {
    +     Dmsg1(dbglevel, "already in progress %d\n", (uint32_t)JobId);
    +     retval = false;
    +   }

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't report the expected "already in progress" path as an error.

This branch returns false for the benign concurrent case, and DotBvfsUpdateCmd() in core/src/dird/ua_dotcmds.cc:110-126 treats every false as ERROR. It also can't distinguish a missing JobId from HasCache=-1, because SELECT 1 ... HasCache=1 only tells you "computed" vs "not computed". Please surface a distinct status here, or otherwise handle "in progress" separately from real failures.
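To make the suggestion concrete, here is a rough sketch of what "a distinct status" could look like, again against a stand-in SQLite table rather than the Bareos catalog. `CacheStatus`, `claim_or_report`, and `describe` are hypothetical names introduced for illustration; they are not part of `bvfs.cc` or `DotBvfsUpdateCmd()`.

```python
import enum
import sqlite3


class CacheStatus(enum.Enum):
    """Hypothetical outcomes of trying to claim cache generation for a job."""
    CLAIMED = "claimed"          # this caller should build the cache
    COMPUTED = "computed"        # HasCache=1, nothing to do
    IN_PROGRESS = "in progress"  # HasCache=-1, another caller is working on it
    MISSING = "missing"          # no such JobId: a real error


def claim_or_report(conn, job_id):
    """Try the atomic claim; if it changes no row, find out why."""
    cur = conn.execute(
        "UPDATE Job SET HasCache=-1 WHERE JobId=? AND HasCache=0", (job_id,)
    )
    if cur.rowcount == 1:
        return CacheStatus.CLAIMED
    # Selecting the value (not just "SELECT 1 ... HasCache=1") separates
    # "in progress" from "job does not exist".
    row = conn.execute(
        "SELECT HasCache FROM Job WHERE JobId=?", (job_id,)
    ).fetchone()
    if row is None:
        return CacheStatus.MISSING
    return CacheStatus.COMPUTED if row[0] == 1 else CacheStatus.IN_PROGRESS


def describe(status):
    """Hypothetical caller: only a missing job is reported as an error."""
    if status is CacheStatus.MISSING:
        return "ERROR"
    if status is CacheStatus.IN_PROGRESS:
        return "SKIPPED (in progress)"
    return "OK"


conn = sqlite3.connect(":memory:")
# Stand-in schema (assumption): only the columns the claim needs.
conn.execute("CREATE TABLE Job (JobId INTEGER PRIMARY KEY, HasCache INTEGER NOT NULL)")
conn.executemany("INSERT INTO Job (JobId, HasCache) VALUES (?, ?)", [(1, 0), (2, 1)])

first = claim_or_report(conn, 1)    # claims the job
second = claim_or_report(conn, 1)   # sees the claim made by the first call
done = claim_or_report(conn, 2)     # cache already computed
absent = claim_or_report(conn, 99)  # JobId not present
print([describe(s) for s in (first, second, done, absent)])
```

With a status like this, the caller can treat the concurrent case as benign and reserve the error path for a missing job or a failed query.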


@BareosBot BareosBot merged commit 59b3fe1 into bareos:master May 4, 2026
1 check was pending
@sebsura sebsura linked an issue May 7, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Parallel invocations of bvfs_update lead to database errors

3 participants