Improve speed of new client#3

Open
CEbbinghaus wants to merge 5 commits into RewriteClient from feat/ImproveSpeed
Conversation

@CEbbinghaus
Collaborator

No description provided.

CEbbinghaus and others added 5 commits March 21, 2026 04:32
Key optimizations:
- Add Store::from_fs_path() with direct filesystem I/O methods that
  bypass OpenDAL's atomic write/fsync overhead (fsync was 148s for 10K files)
- Combine file read + hash computation + store write into single
  spawn_blocking calls to eliminate async/blocking context switches
- Remove redundant exists() checks before writes (content-addressed
  storage makes writes idempotent)
- Eliminate double-reads in restore: use entry.mode instead of
  fetching object header, read tree from store only once
- Add read_object_into_headers_parallel() that reads entire object
  tree in a single blocking task instead of sequential async calls
- Fix unnecessary .to_vec() clone in ArchiveBody::from_data()
- Increase semaphore limit from 64 to 256

Benchmark results (10K files, 40MB total):
  commit:  6.064s → 0.599s  (10.1x faster)
  pack:    2.293s → 0.085s  (27.0x faster)
  unpack:  5.875s → 0.367s  (16.0x faster)
  restore: 1.003s → 0.259s  (3.9x faster)

Root cause: OpenDAL's Fs backend does fsync() after every write and
creates parent directories per-file. For content-addressable storage,
neither is necessary — partial writes are harmless (hash won't match
on read) and all objects are flat files in a single directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Make push_cache upload objects concurrently (was sequential)
- Make pull_tree download blob objects concurrently (tree reads stay
  sequential since we need to discover children)
- Use direct blocking I/O for index reads in pack, push-archive,
  restore (avoids OpenDAL async overhead)
- Optimize download_object exists check to use direct blocking read
  instead of async get_object + full body consumption

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Index::to_data(): estimate capacity from metadata size
- Tree::to_data(): pre-allocate based on entry count (92 bytes avg)
- Tree::from_data(): estimate entry count from data size

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key changes:
- Eliminate TOCTOU exists-then-get patterns in read_index, get_object, get_metadata
- Use get_raw_bytes (single read) instead of get_object+read_to_end (stream reader)
- Concurrent blob header reads in collect_entry_metadata and tree_walk
- Direct I/O for upload_archive and check_missing on Fs backend
- Concurrent upload writes for non-Fs backends
- Server Fs backend uses from_fs_path for direct I/O support
- Derive PartialEq/Eq on Mode enum

Benchmarks (10K files, 40MB):
  GET /metadata:    1.920s → 0.689s (2.8x)
  GET /archive:     4.266s → 0.613s (7.0x)
  GET /zip:         3.850s → 1.619s (2.4x)
  POST /missing:    0.589s → 0.179s (3.3x)
  POST /upload:     0.648s → 0.037s (17.5x)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… speedup

Add blocking tree walk implementations that bypass async runtime overhead:
- collect_tree_metadata_blocking: single blocking task for metadata endpoint
- collect_entry_metadata_blocking: single blocking task for archive/supplemental
- collect_objects_blocking: blocking recursive tree walk

Automatically selects blocking path when store has direct I/O support.
Async path with concurrent blob reads retained for S3/Memory backends.

Server benchmarks (10K files, Fs backend):
  GET /metadata:  1.920s → 0.095s (20x)
  GET /archive:   4.266s → 0.103s (41x)
  GET /zip:       3.850s → 1.711s (2.3x, deflate CPU-bound)
  POST /missing:  0.589s → 0.055s (10.7x)
  POST /upload:   0.648s → 0.047s (13.8x)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>