Disk buffer instrumented#540

Merged
xgreenx merged 14 commits into main from disk-buffer-instrumented on Apr 19, 2026

Conversation

xgreenx (Collaborator) commented Apr 19, 2026

No description provided.

mchristopher and others added 14 commits on February 4, 2026 at 10:58
Add instrumentation to track object lifetimes via AtomicI64 counters
that only decrement from Drop impls, proving actual deallocation.

Tracked types: GraphqlFetcher, BlockStream, AvroFileWriters,
AvroFileWriter, FinalizedBatchFiles, plus tokio alive task count.

Counters log every 100 run() iterations — any counter trending
upward over time confirms a leak of that object type.
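The pattern described in the commit message can be sketched as follows. This is a minimal, hypothetical illustration (the names `FETCHER_ALIVE` and `TrackedFetcher` here stand in for the PR's actual tracked types): a static counter is incremented on construction and decremented only from `Drop`, so a value that returns to baseline proves real deallocation rather than mere reference release.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Hypothetical counter mirroring the PR's approach: incremented on
// construction, decremented ONLY from Drop, so a flat trend over time
// proves the objects are actually being deallocated.
static FETCHER_ALIVE: AtomicI64 = AtomicI64::new(0);

struct TrackedFetcher;

impl TrackedFetcher {
    fn new() -> Self {
        FETCHER_ALIVE.fetch_add(1, Ordering::Relaxed);
        TrackedFetcher
    }
}

impl Drop for TrackedFetcher {
    fn drop(&mut self) {
        // The decrement lives only here, so the counter can never
        // under-report the number of live objects.
        FETCHER_ALIVE.fetch_sub(1, Ordering::Relaxed);
    }
}

fn main() {
    let a = TrackedFetcher::new();
    let b = TrackedFetcher::new();
    assert_eq!(FETCHER_ALIVE.load(Ordering::Relaxed), 2);
    drop(a);
    drop(b);
    // Counter returns to zero once both objects are dropped.
    assert_eq!(FETCHER_ALIVE.load(Ordering::Relaxed), 0);
    println!("alive={}", FETCHER_ALIVE.load(Ordering::Relaxed));
}
```

Logging such counters periodically (the PR does so every 100 `run()` iterations) turns a suspected leak into a single monotonically increasing number.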

cursor Bot commented Apr 19, 2026

PR Summary

Medium Risk
Changes the core sv-dune ingestion/upload pipeline to use on-disk Avro buffering and new file-streaming S3 uploads, which could affect correctness and retry behavior under failures. Also alters stream reconnection behavior by recreating GraphQL fetchers and tightening channel capacities, which may impact stability if misconfigured.

Overview
Switches sv-dune batch handling from in-memory accumulation to a disk-backed Avro buffer. New DiskBuffer writes blocks/txs/receipts to temporary Avro files (with per-block flush) and finalizes to file paths for upload, then resets to prepare the next batch.
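The write-flush-finalize-reset lifecycle described above can be sketched with plain file I/O. This is an assumption-laden simplification (the struct name `DiskBuffer` matches the summary, but the real implementation writes Avro-encoded blocks/txs/receipts, not raw bytes, and its method names may differ):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};

// Simplified sketch of a disk-backed batch buffer: append records to a
// temp file, flush after each block, then finalize by renaming the temp
// file to a stable path for upload. The real buffer writes Avro.
struct DiskBuffer {
    tmp_path: PathBuf,
    file: File,
}

impl DiskBuffer {
    fn new(dir: &Path) -> std::io::Result<Self> {
        let tmp_path = dir.join("batch.avro.tmp");
        let file = File::create(&tmp_path)?;
        Ok(DiskBuffer { tmp_path, file })
    }

    // Per-block flush keeps the on-disk file consistent after each block,
    // bounding memory use to one block at a time.
    fn write_block(&mut self, record: &[u8]) -> std::io::Result<()> {
        self.file.write_all(record)?;
        self.file.flush()
    }

    // Finalize: close the handle, then move the temp file to its upload
    // path. The caller streams the finalized file to S3 and constructs a
    // fresh buffer for the next batch.
    fn finalize(self, out: &Path) -> std::io::Result<PathBuf> {
        let DiskBuffer { tmp_path, file } = self;
        drop(file); // ensure the handle is closed before renaming
        fs::rename(&tmp_path, out)?;
        Ok(out.to_path_buf())
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let mut buf = DiskBuffer::new(&dir)?;
    buf.write_block(b"block-1")?;
    buf.write_block(b"block-2")?;
    let path = buf.finalize(&dir.join("batch.avro"))?;
    assert_eq!(fs::read(&path)?, b"block-1block-2".to_vec());
    fs::remove_file(&path)?;
    println!("finalized");
    Ok(())
}
```

Consuming `self` in `finalize` enforces at the type level that a finalized buffer cannot be written to again, which matches the "finalizes... then resets" description.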

Adds streaming uploads to avoid loading large Avro files into memory. Processor gains process_data_from_file, S3Storage adds store_from_file (streaming put_object or multipart upload from disk), and the service uploads blocks/transactions/receipts sequentially from the finalized files.

Introduces leak instrumentation and reconnection hardening. Adds alloc_counter plus TrackedFetcher/TrackedStream, recreates GraphqlFetcher on each reconnect with reduced channel capacities, and periodically logs counters (including tokio alive task metrics); also tightens multipart upload handling by requiring ETags and making abort best-effort.

Also updates the dune Docker image to run as nobody and installs extra debugging tools, and bumps a few lockfile dependencies plus adds tempfile for new tests.

Reviewed by Cursor Bugbot for commit c07dea3. Bugbot is set up for automated code reviews on this repo. Configure here.

@xgreenx xgreenx merged commit e4899cb into main Apr 19, 2026
7 of 8 checks passed
@xgreenx xgreenx deleted the disk-buffer-instrumented branch April 19, 2026 01:11

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.



let file_path = self.create_output(data, &key).await?;
tracing::info!("New file saved: {}", file_path);
Ok(file_path)
}

Unused process_data method is dead code

Low Severity

The newly added process_data method on Processor is never called anywhere in the codebase. Only process_data_from_file is used (by process_finalized_batch in service.rs). This is dead code that adds maintenance burden without providing value.



// Read a chunk
let bytes_read = file.read(&mut buffer).map_err(|e| {
StorageError::StoreError(format!("Failed to read file: {}", e))
})?;

Short reads may produce undersized S3 multipart parts

Low Severity

upload_multipart_from_file uses a single file.read(&mut buffer) call per chunk, which is not guaranteed to fill the buffer. Read::read may return fewer bytes than requested. S3 requires all non-final multipart parts to be at least 5 MiB; an undersized part would cause complete_multipart_upload to fail with EntityTooSmall. A read loop or Read::read_exact (with EOF handling for the last chunk) would be more robust.
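The read loop Bugbot suggests can be sketched as below. This is a hedged illustration, not the PR's code: the helper name `read_full` is hypothetical, but the logic is the standard fix, looping until the buffer is full or EOF so that every non-final part meets S3's 5 MiB minimum even when the underlying reader returns short reads.

```rust
use std::io::{self, Read};

// Fill `buf` completely unless EOF arrives first; returns the number of
// bytes actually read. Only the final part of an upload may be short.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..])? {
            0 => break, // EOF: this is the last (possibly short) part
            n => filled += n,
        }
    }
    Ok(filled)
}

fn main() -> io::Result<()> {
    // A reader that returns at most 3 bytes per call, simulating the
    // short reads that a single `file.read` per part would mishandle.
    struct Chunky(io::Cursor<Vec<u8>>);
    impl Read for Chunky {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            let n = buf.len().min(3);
            self.0.read(&mut buf[..n])
        }
    }

    let mut r = Chunky(io::Cursor::new(vec![1u8; 10]));
    let mut part = [0u8; 8];
    let n = read_full(&mut r, &mut part)?;
    assert_eq!(n, 8); // full-size part despite 3-byte short reads
    let n = read_full(&mut r, &mut part)?;
    assert_eq!(n, 2); // final short part at EOF
    println!("ok");
    Ok(())
}
```

`Read::read_exact` would also work for non-final parts, but it returns `UnexpectedEof` on a short final chunk, so the explicit loop handles both cases uniformly.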




3 participants