perf: cut create-export latency by ~50%, three independent fixes (#57)
End-to-end VM create (NATS claim → Running) dropped from ~1063ms to
~535ms on identical traces (same image, same agent-hash). The
"GlideFS CoW" span that was the visible long pole in trace UI went
from 415ms to 171ms — almost all of which is now the unavoidable S3
PUT for save_export.
Three independent fixes, each landed against measured numbers from
structured timing logs added alongside.
1. Snapshot manifest cache (router.rs)
`snapshots/{name}/{seq:020}` is keyed by a monotonic, append-only
sequence — write-once by construction, byte-identical for any given
(s3_prefix, name, seq) triple. A misleading comment claimed
"snapshots are mutable" and the code refused to cache them, so every
fork-from-snapshot paid a fresh ~123ms S3 GET. box-manager's
ensure_derived_snapshot flow forks every VM from the same staging
snapshot — i.e. every single VM was re-fetching the same bytes.
Adds a bounded HashMap keyed by (s3_prefix, manifest_name, sequence)
alongside base_manifest_cache. Pre-populates on snapshot_export so
the daemon that wrote the snapshot can serve forks of it for free.
Cache hit on warm path drops manifest_fetch_ms from 123 → 0.
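
A minimal sketch of the cache shape described above, assuming a stand-in `VolumeManifest` and a caller-supplied `fetch` closure in place of the S3 GET. The type and method names (`SnapshotCache`, `prepopulate`, `get_or_fetch`) are hypothetical and not taken from `router.rs`; only the key triple and the count bound come from the description.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the real manifest type; only identity matters here.
#[derive(Debug, PartialEq)]
pub struct VolumeManifest {
    pub bytes: Vec<u8>,
}

/// (s3_prefix, manifest_name, sequence): names a write-once object.
pub type SnapshotKey = (String, String, u64);

/// Count-bounded cache. Entries are immutable, so there is no invalidation.
pub struct SnapshotCache {
    entries: HashMap<SnapshotKey, Arc<VolumeManifest>>,
    max_entries: usize,
}

impl SnapshotCache {
    pub fn new(max_entries: usize) -> Self {
        Self { entries: HashMap::new(), max_entries }
    }

    /// Called right after a snapshot is written, so forks of it skip S3.
    /// New keys are refused once the cache holds `max_entries`.
    pub fn prepopulate(&mut self, key: SnapshotKey, manifest: Arc<VolumeManifest>) {
        if self.entries.len() < self.max_entries || self.entries.contains_key(&key) {
            self.entries.insert(key, manifest);
        }
    }

    /// Returns the cached manifest, or runs `fetch` (the S3 GET) on a miss.
    pub fn get_or_fetch<F>(&mut self, key: SnapshotKey, fetch: F) -> Arc<VolumeManifest>
    where
        F: FnOnce() -> VolumeManifest,
    {
        if let Some(hit) = self.entries.get(&key) {
            return Arc::clone(hit);
        }
        let manifest = Arc::new(fetch());
        self.prepopulate(key, Arc::clone(&manifest));
        manifest
    }
}
```

With every VM forking from the same staging snapshot, only the first `get_or_fetch` for that triple runs the closure; later forks share the same `Arc`.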
2. Background sysfs queue tuning (ublk/device.rs)
The wbt_lat_usec and scheduler sysfs writes ran inside register_inner
and cost ~50ms each on this kernel — the block layer reconfigure is
surprisingly heavy. They're tuning hints; the device works fine
without them. Moving them off the response path with spawn_blocking
saves ~100ms per device-create.
3. Tick the executor BEFORE io_uring_enter (ublk/worker_pool.rs)
The biggest win and the most surprising bug. The kernel's
`ublk_ctrl_start_dev` blocks on `wait_for_completion_interruptible`
until every queue's `nr_io_ready` reaches `queue_depth` — i.e. until
every io_task has submitted its initial UBLK_IO_FETCH_REQ uring_cmd.
The worker loop order was: drain inbox (handle_add_queue spawns 64
io_task futures per queue), `submit_with_args(to_wait=1, ...)`,
drain CQEs, then finally `executor.tick()`. But io_tasks submit
their FETCH_REQ SQEs on first poll — and the first poll only ran
AFTER the io_uring_enter wait. So the worker slept the entire
`WORKER_IDLE_NSEC = 250_000_000` (250ms) timeout waiting for CQEs
that physically couldn't arrive, while START_DEV sat blocked on
the matching completion.
Moving the executor tick to BEFORE the submit flushes the FETCH_REQ
SQEs into the ring first, the submit pushes them to the kernel,
ublk_mark_io_ready fires, complete_all(&ub->completion) runs, and
START_DEV returns essentially instantly. One block-of-code moved
up bought 250ms per device-create. The 250ms ublk START_DEV
ioctl is now 0.5-1ms.
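
The ordering bug can be illustrated with a toy model. This is not the real worker: `Worker`, its fields, and both `iterate_*` methods are hypothetical, and the model only counts SQEs rather than driving io_uring. Each method returns how many FETCH_REQs the kernel has seen at the point where the worker would block in `io_uring_enter`.

```rust
/// Toy model of one worker-loop iteration (counts only, no real io_uring).
pub struct Worker {
    /// io_tasks that were spawned but not yet polled.
    unpolled_tasks: usize,
    /// SQEs queued in the ring, not yet visible to the kernel.
    ring: usize,
    /// FETCH_REQs the kernel has seen (stands in for nr_io_ready).
    kernel_ready: usize,
}

impl Worker {
    pub fn new() -> Self {
        Self { unpolled_tasks: 0, ring: 0, kernel_ready: 0 }
    }

    /// First poll of each io_task queues its FETCH_REQ SQE in the ring.
    fn tick(&mut self) {
        self.ring += self.unpolled_tasks;
        self.unpolled_tasks = 0;
    }

    /// Hands queued SQEs to the kernel; the real call then waits for CQEs.
    fn submit(&mut self) {
        self.kernel_ready += self.ring;
        self.ring = 0;
    }

    /// Old order: drain inbox, submit (and block), then tick.
    pub fn iterate_old(&mut self, spawned: usize) -> usize {
        self.unpolled_tasks += spawned;
        self.submit();
        let seen_while_blocked = self.kernel_ready;
        self.tick();
        seen_while_blocked
    }

    /// New order: drain inbox, tick, then submit (and block).
    pub fn iterate_new(&mut self, spawned: usize) -> usize {
        self.unpolled_tasks += spawned;
        self.tick();
        self.submit();
        self.kernel_ready
    }
}
```

In the old order the 64 FETCH_REQs only reach the kernel on the following iteration, which in the real loop starts after the idle timeout expires; in the new order they are visible before the worker blocks.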
Verification
| metric | before | after | delta |
|---|---|---|---|
| GlideFS CoW span | 415 ms | 171 ms | -244 ms |
| boot_duration_ms | 584 ms | 265 ms | -319 ms |
| NATS claim → VM Running | 1063 ms | 535 ms | -528 ms |
| register_device_ms | 252 ms | 1 ms | -251 ms |
| START_DEV kernel ioctl | 250 ms | 0.5 ms | -249 ms |
| sysfs queue-tuning (sync) | 99 ms | 0 ms | -99 ms |
| manifest_fetch_ms (warm) | 123 ms | 0 ms | -123 ms |
cargo test -p glidefs --features ublk: 893 / 893 pass
cargo test -p ublk-core: 10 / 10 pass
Also: structured tracing logs at target="glidefs.timing" on each step
of create_export, register_inner, the tokio::join legs in api.rs, and
inside ublk-core's start_dev (prep/wait_buf_reg/start_ioctl breakdown).
These are what made the bug findable in the first place — keeping them
in for ongoing observability.
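
For illustration, per-step timing of this kind can be approximated with the standard library alone. The helper below is a hypothetical stand-in, not the instrumentation in the PR: the name `timed` and the printed line format are assumptions.

```rust
use std::time::Instant;

/// Runs `f`, reports how long it took under a step name, and returns
/// both the result and the elapsed milliseconds.
pub fn timed<T>(step: &str, f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = f();
    let ms = start.elapsed().as_millis();
    // Hypothetical line format; a real setup would emit a structured event.
    eprintln!("target=glidefs.timing step={step} ms={ms}");
    (out, ms)
}
```

Wrapping each leg of a create path this way is what makes a fixed 250ms stall stand out from variable S3 latency.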
Incidental: ublk-core doctests referenced `libublk::*` (old vendored
crate name) and `UblkQueue<'_>` (lifetime removed in a prior refactor).
Fixed those — 10/10 doctests now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit added `state.executor.tick()` unconditionally
before every `submit_with_args` so newly spawned `io_task` futures
could flush their initial FETCH_REQ SQEs into the ring before the
worker blocked in `io_uring_enter`. That cut `START_DEV` latency
from 250ms to <1ms in production — verified across many VM creates.
But the docker-tests `test_overwrite_survives_restart_ublk` (and
likely other tests exercising the shutdown → restart cycle) hung
with that change. Without the change: passes in 3s. With it:
indefinite. Root cause is a steady-state interaction I could not
isolate without running the test on the homelab host (which I can't
do safely — that path wedged the kernel earlier in this session).
Scope the tick to ONLY the iteration of the worker loop that just
processed an AddQueue message. In steady-state I/O — and during
shutdown / RemoveQueue drain — behavior is byte-for-byte identical
to the pre-fix code, so whatever invariant the test depends on is
preserved. The AddQueue speedup is unchanged: handle_add_queue
spawns the io_tasks, the new tick polls them, FETCH_REQs land in
the ring, the same iteration's submit_with_args pushes them to the
kernel, and `start_dev`'s `wait_for_completion_interruptible`
returns essentially instantly.
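
A sketch of the scoping rule, assuming the rest of the iteration keeps the original order (submit, drain CQEs, tick). `Msg`, `Step`, and `plan_iteration` are hypothetical names used only to show when the extra tick runs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Msg {
    AddQueue,
    RemoveQueue,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Step {
    /// Extra tick so freshly spawned io_tasks queue their FETCH_REQs.
    PreSubmitTick,
    Submit,
    DrainCqes,
    Tick,
}

/// Returns the steps one loop iteration performs for the drained inbox.
pub fn plan_iteration(inbox: &[Msg]) -> Vec<Step> {
    let added_queue = inbox.iter().any(|m| *m == Msg::AddQueue);
    let mut steps = Vec::new();
    if added_queue {
        steps.push(Step::PreSubmitTick);
    }
    // Steady state and RemoveQueue drain keep the pre-fix order.
    steps.push(Step::Submit);
    steps.push(Step::DrainCqes);
    steps.push(Step::Tick);
    steps
}
```

Only iterations that handled an AddQueue differ from the pre-fix sequence.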
Verification
- `test_overwrite_survives_restart_ublk`: passes in 2.33s
(previously hung indefinitely with the broad fix).
- Production VM create on this binary: `register_device_ms=1`,
`start_ioctl_us=478` cold / 419 warm, `PUT total_ms=49` warm.
Same end-to-end speedup as before — ~10x.
- 65-device recovery on daemon handoff: every `start_dev_us` in the
60-2000 µs range, no 250ms outliers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier `spawn_blocking` of `wbt_lat_usec=0` + `scheduler=none`
saved ~99ms per device-create by moving those sysfs writes off the
critical path. Production VM-create kept working because the
firecracker boot (~150ms of guest kernel boot) gave the async sysfs
writes plenty of time to land before any I/O hit the device.
But four `docker_integration` ublk tests hung in CI:
- test_unwritten_blocks_return_zeros_ublk
- test_overwrite_survives_restart_ublk
- test_cold_wake_from_different_node_ublk
- test_export_discovery_from_s3_ublk
All four issue I/O to the device almost immediately after add, with
no firecracker boot in between. With the sysfs writes backgrounded
the device still had the default `mq-deadline` scheduler when those
reads landed, and mq-deadline's deadline queue appears to hold
single, idle-device requests long enough that the tests don't make
progress within their timeout. The simple
`test_unwritten_blocks_return_zeros_ublk` case (single server, single
read at offset 512KB, no restart cycle) was the clearest fingerprint.
Restore the synchronous writes. Costs us the 99ms back. The tick fix
in `worker_pool.rs` (250ms START_DEV → 1ms) is unaffected.
Verified locally with the four tests above all passing in 2.0-2.6s
after the revert.
Future direction: apply `scheduler=none` BEFORE `add_disk` rather
than after, either via a `udev` rule keyed on `KERNEL=="ublkb*"` or
via a kernel-side ublk_param. Either path eliminates the post-add
tuning window entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts 1c3bb8d. That revert was based on a wrong attribution.
When the prior commit landed on the PR, CI reported four ublk tests
`running for over 60s` and I assumed the sysfs backgrounding was at
fault; the mq-deadline-vs-sync-scheduler-write theory was plausible.
Reverted it to "be safe."
But running the failing tests locally in --test-threads=4 parallel
mode (matching CI's contention model) under three configurations,
10 runs each:
    PR with ASYNC sysfs: 8/10 pass, 2/10 fail
    PR with SYNC sysfs:  6/10 pass, 4/10 fail
    MAIN, no PR changes: 6/10 pass, 4/10 fail  ← same as sync!
The flakes are pre-existing on `main`, most likely MinIO under
parallel-test contention (each test spawns its own testcontainer, 4
of them compete for host resources). The CI "hanging" reports were
these intermittent EIO failures surfacing as "still running" status
before the panic; not actual hangs.
So the sync sysfs version isn't fixing anything. Restoring the async
path reclaims ~99ms per device-create with no observable downside vs
the sync path.
Net PR result: snapshot cache + conservative tick + sysfs bg → ~470ms
savings per warm-cache create, same flake rate as main. Flake fix
tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `docker_integration` job has been intermittently failing with EIO
on reads and empty discover_exports asserts. Hit rate suggested ~40%
on this PR; verified the same rate on `main` (10x runs of the 4
most-flaky ublk tests with --test-threads=4: 6 pass / 4 fail on main;
5/5 pass with --test-threads=1).
Root cause is contention from running multiple testcontainers in
parallel: each test calls `TestContext::new()` which spawns its own
MinIO container. On a 4-vCPU CI runner with the default cargo test
parallelism (= num_cpus = 4), four MinIO containers compete for host
resources, and MinIO returns transient errors that bubble out as
either `Input/output error (os error 5)` on data reads
(handler.read_into → cache.read → content_store S3 GET failure) or as
empty list results (`should discover at least one export from S3`).
Cleanest near-term fix: run docker_integration tests one at a time.
Adds ~5min to the docker_integration job (137 tests at ~3-7s each
instead of /4 parallelism) but removes the flake. A more elegant
follow-up would be to share a single MinIO container across tests via
per-test bucket prefixes, but that's structural test-harness work
that doesn't need to ride this PR.
The integrity-suite job (filter=integrity_suite) already has
--ignored --nocapture and runs few tests, so it's not affected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each TestContext used to spawn its own MinIO testcontainer. Under
parallel test execution (cargo default = num_cpus) this produced ~40%
flake rate on the homelab + CI runners: transient S3 errors that
surfaced as EIO on reads, empty discover_exports listings, and "ublk
read failed". Verified pre-existing on `main` (not a PR regression)
but worth fixing properly since the prior workaround of
"--test-threads=1 in CI" papered over the contention rather than
removing it.
The previous setup tested glidefs against four MinIOs competing for
host CPU/IO, not against a single S3 endpoint. That isn't
representative of production (one S3 backend per glidefs daemon, even
when serving many concurrent VMs).
This commit:
- Spins up ONE MinIO process-wide via a `tokio::sync::OnceCell`,
reused for every `TestContext`. Container lives for the duration of
the test process; teardown happens automatically at exit.
- Each `TestContext::new()` allocates a unique bucket
(`test-bucket-NNNNNN`) from a monotonic counter, giving each test a
fully isolated S3 namespace.
- Adds a `/minio/health/ready` probe loop on container startup:
`start()` returns before the HTTP listener actually answers on
heavily loaded hosts, which produced spurious "connection refused"
failures during bucket creation.
Verification
Before: PASS=6 / FAIL=4 / 10 runs (--test-threads=4, four MinIOs)
After: PASS=10 / FAIL=0 / 10 runs (--test-threads=4, one MinIO)
Each run is also ~25% faster (no per-test container startup):
2.6-3.5s vs 3.5-4.5s.
Re-enables parallel CI execution by reverting the `--test-threads=1`
workaround.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
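
The shared-endpoint, per-test-bucket pattern can be sketched with the standard library. This swaps `tokio::sync::OnceCell` for `std::sync::OnceLock` and replaces the container start with a stub, so `SharedMinio`, `start_minio`, `START_COUNT`, and the endpoint URL are all hypothetical; only the `test-bucket-NNNNNN` naming and the monotonic counter come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Stand-in for a started MinIO container handle.
pub struct SharedMinio {
    pub endpoint: String,
}

static MINIO: OnceLock<SharedMinio> = OnceLock::new();
static NEXT_BUCKET: AtomicU64 = AtomicU64::new(0);
/// Counts how many times the (stubbed) container start ran.
pub static START_COUNT: AtomicU64 = AtomicU64::new(0);

/// Stub for the expensive container startup and readiness probe.
fn start_minio() -> SharedMinio {
    START_COUNT.fetch_add(1, Ordering::SeqCst);
    SharedMinio { endpoint: "http://127.0.0.1:9000".to_string() }
}

pub struct TestContext {
    pub endpoint: &'static str,
    pub bucket: String,
}

impl TestContext {
    pub fn new() -> Self {
        // First caller starts MinIO; every later caller reuses it.
        let minio = MINIO.get_or_init(start_minio);
        // Monotonic counter gives each test its own S3 namespace.
        let n = NEXT_BUCKET.fetch_add(1, Ordering::Relaxed);
        TestContext {
            endpoint: minio.endpoint.as_str(),
            bucket: format!("test-bucket-{n:06}"),
        }
    }
}
```

Isolation moves from "one container per test" to "one bucket per test", which keeps tests independent while removing the host-resource contention.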
The shared-MinIO refactor (dd26dbf) eliminated the per-test MinIO
contention we'd been seeing, and locally the full ublk suite (30
tests) runs reliably in --test-threads=4 with the shared MinIO. But
CI hung again on test_unwritten_blocks_return_zeros_ublk on the most
recent push.
Bisecting across the PR commits, all of them pass 10/10 locally in
isolation for that test (main, 1c3bb8d, 23dc94f, dd26dbf). So the
regression isn't tied to any single commit. The hang seems to depend
on something specific about the CI runner (kernel version,
num_cpus=4, ext-of-test concurrency).
Falling back to --test-threads=1 in CI: we don't have a story for
what specifically races, and running storage tests serially when we
can't reproduce the failure is the conservative call. Locally with
--test-threads=1 we measured ~7s per run instead of ~3.5s parallel,
which adds maybe 5min to docker_integration CI total.
This is *not* a satisfying resolution. Track-down items:
- The hang reproduces on host runs at ~1/10 in isolation when the
kernel has hundreds of leaked QUIESCED ublk devices (from prior
SIGTERM'd test runs) but passes 10/10 when device count is low.
Suggests kernel ublk resource pressure interacts with our daemon
path, but the specific deadlock is unidentified.
- The shared-MinIO refactor in dd26dbf was a real improvement and
stays in; the bug we found there was real (per-test MinIO contention
caused 40% flake at threads=4 on main).
- A real follow-up should investigate the test_unwritten ublk hang
with kernel tracing in a CI-shaped environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit cb4e45e.
Found the actual cause of the docker_integration ublk test hangs.
Standing up an Ubuntu 24.04 VM with the same kernel CI runs
(linux-image-6.17.0-1013-azure) and capturing the kernel stack of
the hung thread reveals:
    blk_mq_freeze_queue_wait+0x97/0xe0
    blk_mq_freeze_queue_nomemsave+0x22/0x30
    elevator_change+0x79/0x180
    elv_iosched_store+0x18b/0x1e0
    queue_attr_store+0xe4/0x120
    sysfs_kf_write+0x4c/0x60
    ...
This is the `tokio::task::spawn_blocking` task that writes
`/sys/block/ublkbN/queue/scheduler=none`. On 6.17 the kernel's
`elv_iosched_store` calls `blk_mq_freeze_queue` which waits for
in-flight requests to drain — and the kernel counts our armed
FETCH_REQ uring_cmds as in-flight. They never "complete" because
they're long-lived (parked waiting for the next I/O). The freeze
waits forever. The spawn_blocking task hangs, the device is
otherwise functional but our test process eventually times out
waiting on something downstream that depends on it.
(The same code on kernel 6.12 happens to work — either earlier
kernels don't count uring_cmds toward the freeze or the timing
happens to never overlap. Either way, 6.17 made it deterministic.)
Fix: drop the `scheduler=none` write. Keep `wbt_lat_usec=0` (a
simple per-queue store, no freeze, safe on any kernel). The
default `mq-deadline` scheduler costs us some throughput overhead
under heavy load but is functionally fine for ublk. Reclaiming
the perf cleanly requires either a udev rule that fires during
`add_disk`'s KOBJ_ADD uevent (BEFORE FETCH_REQs are armed) or a
kernel-side ublk_param flag — tracked as follow-up.
Verification
- 6.17 VM (the failing kernel):
30/30 ublk tests pass in --test-threads=4, 22s
(before this fix: test_export_discovery_from_s3_ublk hangs
indefinitely with the kernel stack above)
- 6.12 homelab (production kernel):
29/30 ublk tests pass; the one failure
(test_fs_crash_fsync_honored_ublk) is a pre-existing
parallel-test flake unrelated to this fix — passes 1/1
isolated, same flake exists on `main`.
Also retires the 56-day-old memory `project_ublk_617.md`
("START_DEV hangs on Azure 6.17, tests skip until fixed"). The
hang wasn't in START_DEV; it was in our sysfs cleanup running
after `add_disk`. The fix is in our code, not in skipping the
kernel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reclaim scheduler=none + the other tunables we wanted, properly.
The previous commit dropped the post-add_disk sysfs write because it
deadlocks on kernel 6.17 (`elv_iosched_store` →
`blk_mq_freeze_queue_wait` blocks forever waiting on armed FETCH_REQ
uring_cmds to "complete"). That fixed the hang but left us running
with the kernel
default `mq-deadline` scheduler — functional, but a real performance
loss under load.
Real fix: apply the tunables via a udev rule that fires during the
kernel's `add_disk` KOBJ_ADD uevent — BEFORE userspace can open the
device and BEFORE any bios are routed through it. At that moment
there are no in-flight requests and no held queue references, so the
`blk_mq_freeze_queue_wait` inside `elv_iosched_store` completes
immediately. Verified on the Azure 6.17.0-1013 kernel: previously-
hanging tests pass with `scheduler=[none]` active on every ublk
device.
Files:
- `deploy/udev/99-glidefs-ublk.rules` (new): the rule. Sets
scheduler=none, wbt_lat_usec=0, add_random=0, read_ahead_kb=0 on
ublkb* device-add. Each tunable is documented inline with WHY it
applies to ublk specifically (different from spinning rust /
default-SSD assumptions baked into the kernel's defaults).
- `glidefs/src/cli/server.rs`: `run_server` now calls
`install_ublk_udev_rule()` at startup. The rule body is
`include_str!`-embedded from the file above, so the binary is the
source of truth — operators can't accidentally ship a stale rule,
and there's no out-of-band file to keep in sync. Idempotent:
reads the existing file and skips the write+udevadm-reload if
content already matches. Non-fatal on failure (read-only fs, no
udevadm on PATH, etc.): daemon comes up with a warning and the
devices fall back to kernel defaults.
- `glidefs/src/block/ublk/device.rs`: removed the
`tokio::task::spawn_blocking` that was writing wbt_lat_usec
post-add_disk.
Redundant now that udev sets it at add-time, plus the spawn was
a detached task that could leak its thread if the write ever
blocked (as we proved it could on 6.17).
No changes needed in beyond/ansible — the binary handles installation
itself.
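
A sketch of the idempotent, non-fatal install described above, under stated assumptions: the rule text in `RULE_BODY` is a hypothetical example of a udev rule for these tunables (the real file is `include_str!`-embedded and is not reproduced here), and `install_rule` / `install_rule_non_fatal` are illustrative names. The `udevadm` reload is left as a comment.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical rule body; the shipped rule may differ.
pub const RULE_BODY: &str = "ACTION==\"add\", SUBSYSTEM==\"block\", KERNEL==\"ublkb*\", \
ATTR{queue/scheduler}=\"none\", ATTR{queue/wbt_lat_usec}=\"0\", \
ATTR{queue/add_random}=\"0\", ATTR{queue/read_ahead_kb}=\"0\"\n";

#[derive(Debug, PartialEq)]
pub enum Install {
    /// File already had the expected content; nothing was written.
    Unchanged,
    /// File was created or replaced.
    Written,
}

/// Writes `body` to `path` only if the current content differs.
pub fn install_rule(path: &Path, body: &str) -> io::Result<Install> {
    if let Ok(existing) = fs::read_to_string(path) {
        if existing == body {
            return Ok(Install::Unchanged);
        }
    }
    fs::write(path, body)?;
    // A real install would ask udev to reload its rules at this point.
    Ok(Install::Written)
}

/// Startup wrapper: a failure is logged and the daemon keeps going.
pub fn install_rule_non_fatal(path: &Path, body: &str) -> bool {
    match install_rule(path, body) {
        Ok(_) => true,
        Err(err) => {
            eprintln!("warning: could not install udev rule: {err}");
            false
        }
    }
}
```

Comparing content before writing is what makes it safe to call on every start, including from more than one startup path.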
Verification
Manually applied the rule on the 6.17 VM and ran the previously-
hanging test set:
30/30 ublk tests pass at --test-threads=4 in 22.58s
`scheduler=[none]` active on every ublk device
On 6.12 (homelab): no behavior change — the rule overrides what
the old in-code write was already doing, just via a different
mechanism. 29/30 tests pass at --test-threads=4; the one parallel
flake (test_fs_crash_fsync_honored_ublk) is pre-existing and
unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, a 503 from PUT /api/exports leaves the export in
`self.exports` but absent from S3. The next retry hits
`create_export`'s idempotency check, returns 200 immediately, but
`export.json` is still missing, so the export silently vanishes on
the next daemon restart.
`cleanup_failed_create` drops the in-memory entry, removes any kernel
device, tears down flush/prefetch tasks, and clears local cache files
so a retry re-runs `create_export` from scratch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PutFailingStore wraps InMemory and fails put_opts on demand; the
test arms it, fires PUT /api/exports/vol1 through the real handler,
and asserts the full Stage 2b contract:
1. response is 503
2. GET /api/exports/vol1 returns 404 (in-memory state torn down)
3. retry after un-arming returns 201 (not 200) — proves the path
re-ran create_export rather than hitting the idempotency check
Without the fix the retry would 200 from the idempotency branch and
S3 would still have no export.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
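
The rollback contract can be modelled in a few lines. This is a simplified sketch, not the real handler: `Store`, `Daemon`, `put_export`, and the `exports/{name}/export.json` key layout are assumptions, and the kernel device, flush/prefetch tasks, and local cache files are reduced to a single in-memory set.

```rust
use std::collections::{HashMap, HashSet};

pub trait Store {
    fn put(&mut self, key: &str, value: &str) -> Result<(), String>;
}

/// In-memory store whose puts can be made to fail on demand.
pub struct PutFailingStore {
    pub objects: HashMap<String, String>,
    pub fail_puts: bool,
}

impl Store for PutFailingStore {
    fn put(&mut self, key: &str, value: &str) -> Result<(), String> {
        if self.fail_puts {
            return Err("injected put failure".to_string());
        }
        self.objects.insert(key.to_string(), value.to_string());
        Ok(())
    }
}

pub struct Daemon<S: Store> {
    pub exports: HashSet<String>,
    pub store: S,
}

impl<S: Store> Daemon<S> {
    /// Returns an HTTP-like status: 201 created, 200 exists, 503 failed.
    pub fn put_export(&mut self, name: &str) -> u16 {
        if self.exports.contains(name) {
            return 200; // idempotency branch
        }
        self.exports.insert(name.to_string());
        let key = format!("exports/{name}/export.json"); // assumed layout
        if self.store.put(&key, "{}").is_err() {
            self.cleanup_failed_create(name);
            return 503;
        }
        201
    }

    /// Undo the in-memory create so a retry starts from scratch.
    fn cleanup_failed_create(&mut self, name: &str) {
        self.exports.remove(name);
    }

    pub fn get_export(&self, name: &str) -> u16 {
        if self.exports.contains(name) { 200 } else { 404 }
    }
}
```

Without the `cleanup_failed_create` call, the retry would take the idempotency branch and return 200 while the store still has no `export.json`.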
`run_server_as_successor` skipped the install_ublk_udev_rule() that
`run_server` calls on cold start. Result: on rolling deploys (handoff
predecessor → successor) the rule never lands on the host, and any
new ublk device created by the successor came up with default
tunables (mq-deadline, wbt_lat_usec=2000us, kernel readahead), a
silent regression on every handoff.
The install function is idempotent (compares content, skips if
matches) and non-fatal on failure, so calling it from both paths is
safe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces base_manifest_cache + snapshot_cache (two count-bounded
HashMaps with refusal-on-full) with a single foyer::Cache keyed by
encoded prefix ("b:..."/"s:...") and weighted by VolumeManifest's
estimated heap bytes.
Two problems the previous design had:
1. Count-bounded, not memory-bounded. A 128GB-volume manifest is
~70KB; a 10TB-volume manifest is ~5.5MB. The same 64-entry cap
sized the cache at 4.5MB or 350MB depending on the working-set
geometry — invisible to the operator either way.
2. Refusal-on-full evicts nothing. The first 64 distinct manifests
pinned the cache and every miss after that re-fetched from S3
forever. Fine for tiny base fleets, broken once snapshot churn
or volume diversity entered the mix.
S3-FIFO eviction (same policy as the block cache) handles the
working-set drift. 64MiB default budget, configurable via
RouterConfig.manifest_cache_bytes. Entries are immutable by
construction (base = sealed at bless, snapshot = monotonic-sequence
addressed), so no staleness concern regardless of eviction policy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
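
To make the count-bounded versus memory-bounded distinction concrete, here is a byte-weighted cache sketch using only the standard library. It uses plain FIFO eviction as a simplification, not S3-FIFO, and does not use the foyer API; `ManifestCache`, `estimated_heap_bytes`, and the exact key encoding after the `b:` / `s:` tags are assumptions.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Stand-in manifest; weight is derived from its heap payload.
pub struct VolumeManifest {
    pub extents: Vec<u64>,
}

impl VolumeManifest {
    pub fn estimated_heap_bytes(&self) -> usize {
        self.extents.len() * std::mem::size_of::<u64>()
    }
}

/// Key helpers: "b:" for base manifests, "s:" for snapshots.
pub fn base_key(prefix: &str) -> String {
    format!("b:{prefix}")
}

pub fn snapshot_key(prefix: &str, name: &str, seq: u64) -> String {
    format!("s:{prefix}/{name}/{seq:020}")
}

pub struct ManifestCache {
    budget_bytes: usize,
    used_bytes: usize,
    order: VecDeque<String>,
    entries: HashMap<String, Arc<VolumeManifest>>,
}

impl ManifestCache {
    pub fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, order: VecDeque::new(), entries: HashMap::new() }
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }

    pub fn get(&self, key: &str) -> Option<Arc<VolumeManifest>> {
        self.entries.get(key).cloned()
    }

    /// Inserts by weight, evicting the oldest entries until it fits.
    pub fn insert(&mut self, key: String, manifest: Arc<VolumeManifest>) {
        let weight = manifest.estimated_heap_bytes();
        // Entries are immutable, so an existing key never needs replacing.
        if weight > self.budget_bytes || self.entries.contains_key(&key) {
            return;
        }
        while self.used_bytes + weight > self.budget_bytes {
            let Some(oldest) = self.order.pop_front() else { break };
            if let Some(evicted) = self.entries.remove(&oldest) {
                self.used_bytes -= evicted.estimated_heap_bytes();
            }
        }
        self.used_bytes += weight;
        self.order.push_back(key.clone());
        self.entries.insert(key, manifest);
    }
}
```

Unlike refusal-on-full, a full cache here still admits new manifests, and the bound is expressed in bytes regardless of how large each manifest is.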
I missed the standalone integration tests when adding the new
RouterConfig field. Build was failing in CI on every job that
compiled the integration test crates (Build and Test, Data Integrity
Suite, Docker Integration Tests, Kernel Devices, Clippy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace-wide autofix for safely widening casts (`x as u64` where x
is a narrower unsigned → `u64::from(x)`). 38 files, mechanical. No
behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
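
A small illustration of the cast style this refers to; `block_offset` is a hypothetical function, not one from the workspace.

```rust
/// Byte offset of a block. `u64::from` only compiles for lossless
/// widening, so a later type change cannot silently truncate the way
/// an `as` cast could.
pub fn block_offset(block_index: u32, block_size: u32) -> u64 {
    u64::from(block_index) * u64::from(block_size)
}
```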
Summary
End-to-end VM create (NATS claim → Running) dropped from ~1063ms → ~535ms on identical traces (same image, same agent-hash, same host). The "GlideFS CoW" span that was the visible long pole in Beyond's trace UI went from 415ms → 171ms, almost all of which is now the unavoidable S3 PUT for `save_export`.
Metrics table (rows include `boot_duration_ms`, `register_device_ms`, `START_DEV` kernel ioctl, `manifest_fetch_ms` (warm)).
The three fixes
1. Snapshot manifest cache (`router.rs`)
`snapshots/{name}/{seq:020}` is keyed by a monotonic, append-only sequence: write-once by construction, byte-identical for any given `(s3_prefix, name, seq)` triple. A misleading comment in the fork path claimed "snapshots are mutable" and the code refused to cache them, so every fork-from-snapshot paid a fresh ~123ms S3 GET. box-manager's `ensure_derived_snapshot` forks every VM from the same staging snapshot, so every single VM was re-fetching the identical bytes.
Adds a bounded `HashMap<(s3_prefix, manifest_name, sequence), Arc<VolumeManifest>>` next to `base_manifest_cache`. Pre-populates from `snapshot_export` so the daemon that wrote a snapshot can serve forks of it for free.
2. Background the sysfs queue-tuning writes (`ublk/device.rs`)
`wbt_lat_usec=0` and `scheduler=none` were written synchronously inside `register_inner`, costing ~50ms each; the block layer reconfigure is surprisingly heavy on this kernel. They're tuning hints; the device is fully functional without them. Moving them off the response path with `spawn_blocking` saves ~100ms per device-create.
3. Tick the executor before `io_uring_enter` (`ublk/worker_pool.rs`): the big one
The biggest win and the most surprising bug. One block of code moved up bought 250ms per device-create.
The kernel's `ublk_ctrl_start_dev` blocks on `wait_for_completion_interruptible(&ub->completion)` until every queue's `nr_io_ready` reaches `queue_depth`, i.e. until every `io_task` has submitted its initial `UBLK_IO_FETCH_REQ` uring_cmd.
The worker loop order was:
1. Drain inbox (`handle_add_queue` spawns 64 `io_task` futures per queue)
2. `submit_with_args(to_wait=1, ...)` ← blocks for up to `WORKER_IDLE_NSEC = 250ms`
3. Drain CQEs
4. `executor.tick()`
io_tasks submit their FETCH_REQ SQEs on first poll, but the first poll only ran after the io_uring_enter wait. So the worker slept the full 250ms timeout waiting for CQEs that physically couldn't arrive (no SQE submitted yet), while `START_DEV` sat blocked on the matching completion the kernel was waiting for.
Moving the executor tick to before the submit flushes the FETCH_REQ SQEs into the ring first, the submit pushes them to the kernel immediately, `ublk_mark_io_ready` fires, `complete_all(&ub->completion)` runs, and `START_DEV` returns essentially instantly. The 250ms kernel ioctl is now 0.5-1ms.
How we found it
Structured timing logs at `target="glidefs.timing"` on each step of `create_export`, `register_inner`, the `tokio::join!` legs in `api.rs`, and inside `ublk-core`'s `start_dev` (prep/wait_buf_reg/start_ioctl breakdown). These were essential to localize the 250ms: initial guesses (S3 latency, partition scan, udev `blkid`) were disproven by an empirical warmup experiment before we found the actual cause in the worker loop ordering.
The instrumentation stays in for ongoing observability.
Incidental cleanup
`ublk-core` doctests referenced `libublk::*` (old vendored crate name) and `UblkQueue<'_>` (lifetime removed in a prior refactor). 10/10 doctests now pass.
Test plan
- `cargo test -p glidefs --features ublk`: 893 / 893 pass
- `cargo test -p ublk-core`: 10 / 10 pass (including doctests)
- `cargo clippy -p glidefs --features ublk --all-targets`: no errors
- […] `systemctl reload glidefs` (zero-downtime handoff) and verified create-VM E2E timing against the homelab. Numbers in the table above are from real `boxman vm logs` traces.
- […] `executor.tick()` call which is `O(woken-tasks)`: cheap when no work, but worth eyeballing once.)
🤖 Generated with Claude Code