fix: correct FSI before throwing "Not enough space" to break poisoned-cache loop#59
Conversation
…-cache loop When InsertIntoPage detects that the physical page has less free space than the FSI claimed, update the FSI with the actual physical free-space value before throwing the InvalidOperationException. This prevents the "poisoned-cache loop" where subsequent inserts keep routing to the same full page indefinitely after a transaction rollback caused FSI desync. Apply the same fix in DynamicCollection.InsertIntoPage. Add regression test: InsertAsync_SucceedsAfterRollbackInducedFsiDesync. Agent-Logs-Url: https://github.com/EntglDb/BLite/sessions/42159dfa-db7f-49de-b7b8-352a160c3070 Co-authored-by: mrdevrobot <12503462+mrdevrobot@users.noreply.github.com>
- Show the stale FSI value (before correction) in the "Not enough space" error message for clearer diagnosis - Fix American English spelling in test comment Agent-Logs-Url: https://github.com/EntglDb/BLite/sessions/42159dfa-db7f-49de-b7b8-352a160c3070 Co-authored-by: mrdevrobot <12503462+mrdevrobot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Addresses a rollback-induced Free Space Index (FSI) desynchronization where a stale (overly optimistic) in-memory FSI entry can cause repeated inserts to be routed to an actually-full page, resulting in a “Not enough space” failure loop.
Changes:
- Update FSI with the page’s actual free space before throwing “Not enough space” in
DocumentCollection.InsertIntoPage. - Apply the same “correct FSI before throw” behavior in
DynamicCollection.InsertIntoPage. - Add a regression test that reproduces the rollback-induced FSI desync and asserts inserts succeed within two attempts.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/BLite.Tests/InsertBulkTests.cs | Adds regression test reproducing the poisoned-cache loop scenario. |
| src/BLite.Core/DynamicCollection.cs | Updates FSI before throwing on insufficient page space to break the loop. |
| src/BLite.Core/Collections/DocumentCollection.cs | Captures stale FSI value for diagnostics and corrects FSI before throwing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| // Correct the FSI to match the physical page state so that the next call to | ||
| // FindPageWithSpace does not route back to this page (breaks the poisoned-cache loop | ||
| // that can arise when a transaction that updated the FSI was later rolled back). | ||
| _fsi.Update(pageId, header.AvailableFreeSpace); | ||
| throw new InvalidOperationException($"Not enough space in page {pageId}: need {requiredSpace}, have {header.AvailableFreeSpace}"); | ||
| } |
There was a problem hiding this comment.
This fix was applied to DynamicCollection.InsertIntoPage as well, but the regression test added in this PR only exercises the typed DocumentCollection path. Since DynamicCollection has its own test suite (e.g., tests/BLite.Tests/DynamicCollectionTests.cs), consider adding a corresponding rollback-induced FSI desync test for DynamicCollection to prevent regressions in that codepath too.
| _fsi.TryGetFreeBytes(pageId, out var staleFsi); | ||
| _fsi.Update(pageId, freeSpace); | ||
| throw new InvalidOperationException($"Not enough space: need {requiredSpace}, have {freeSpace} | PageId={pageId} | SlotCount={header.SlotCount} | Start={header.FreeSpaceStart} | End={header.FreeSpaceEnd} | FSI={staleFsi}"); |
There was a problem hiding this comment.
staleFsi will be 0 when TryGetFreeBytes returns false, which changes the diagnostic from the previous FSI=n/a to a misleading FSI=0. Consider preserving the prior behavior by capturing the bool return and formatting FSI as n/a when the page isn't tracked in the FSI yet.
| _fsi.TryGetFreeBytes(pageId, out var staleFsi); | |
| _fsi.Update(pageId, freeSpace); | |
| throw new InvalidOperationException($"Not enough space: need {requiredSpace}, have {freeSpace} | PageId={pageId} | SlotCount={header.SlotCount} | Start={header.FreeSpaceStart} | End={header.FreeSpaceEnd} | FSI={staleFsi}"); | |
| var hasStaleFsi = _fsi.TryGetFreeBytes(pageId, out var staleFsi); | |
| var fsiDiagnostic = hasStaleFsi ? staleFsi.ToString() : "n/a"; | |
| _fsi.Update(pageId, freeSpace); | |
| throw new InvalidOperationException($"Not enough space: need {requiredSpace}, have {freeSpace} | PageId={pageId} | SlotCount={header.SlotCount} | Start={header.FreeSpaceStart} | End={header.FreeSpaceEnd} | FSI={fsiDiagnostic}"); |
| inserted = true; | ||
| break; | ||
| } | ||
| catch (InvalidOperationException ex) when (ex.Message.StartsWith("Not enough space")) |
There was a problem hiding this comment.
The exception filter uses ex.Message.StartsWith("Not enough space") without a StringComparison, which makes the match culture-sensitive and a bit brittle. Use the StartsWith(string, StringComparison.Ordinal) overload (or another non-message-based discriminator if available) to keep the test stable across environments.
| catch (InvalidOperationException ex) when (ex.Message.StartsWith("Not enough space")) | |
| catch (InvalidOperationException ex) when (ex.Message.StartsWith("Not enough space", System.StringComparison.Ordinal)) |
The in-memory Free Space Index (
_fsi) is updated immediately (non-transactionally) whenDeleteCorecompacts a page. A rolled-back delete reverts the physical page via WAL but leaves the FSI with an inflated free-space value. Every subsequent insert is routed to that stale page, hits the physical capacity limit, and throws — without ever correcting the FSI — creating an unrecoverable loop.Changes
DocumentCollection.InsertIntoPage— call_fsi.Update(pageId, actualFreeSpace)before throwing"Not enough space". The error message now captures the stale FSI value (pre-correction) for diagnostics:DynamicCollection.InsertIntoPage— same fix.Regression test
InsertAsync_SucceedsAfterRollbackInducedFsiDesync— populates pages with large documents, induces desync via delete + rollback, then asserts that a subsequent insert succeeds within two attempts (first attempt corrects the FSI; second allocates a fresh page).Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
av-build-tel-api-v1.avaloniaui.net/usr/share/dotnet/dotnet dotnet exec --runtimeconfig /home/REDACTED/.nuget/packages/avalonia.buildservices/11.3.2/tools/netstandard2.0/runtimeconfig.json /home/REDACTED/.nuget/packages/avalonia.buildservices/11.3.2/tools/netstandard2.0/Avalonia.BuildServices.Collector.dll(dns block)If you need me to access, download, or install something from one of these locations, you can either: