Improve speed of new client#3

Open
CEbbinghaus wants to merge 5 commits into RewriteClient from feat/ImproveSpeed
Conversation

@CEbbinghaus
Collaborator

No description provided.

CEbbinghaus and others added 5 commits March 21, 2026 04:32
Key optimizations:
- Add Store::from_fs_path() with direct filesystem I/O methods that
  bypass OpenDAL's atomic write/fsync overhead (fsync was 148s for 10K files)
- Combine file read + hash computation + store write into single
  spawn_blocking calls to eliminate async/blocking context switches
- Remove redundant exists() checks before writes (content-addressed
  storage makes writes idempotent)
- Eliminate double-reads in restore: use entry.mode instead of
  fetching object header, read tree from store only once
- Add read_object_into_headers_parallel() that reads entire object
  tree in a single blocking task instead of sequential async calls
- Fix unnecessary .to_vec() clone in ArchiveBody::from_data()
- Increase semaphore limit from 64 to 256

Benchmark results (10K files, 40MB total):
  commit:  6.064s → 0.599s  (10.1x faster)
  pack:    2.293s → 0.085s  (27.0x faster)
  unpack:  5.875s → 0.367s  (16.0x faster)
  restore: 1.003s → 0.259s  (3.9x faster)

Root cause: OpenDAL's Fs backend does fsync() after every write and
creates parent directories per-file. For content-addressable storage,
neither is necessary — partial writes are harmless (hash won't match
on read) and all objects are flat files in a single directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Make push_cache upload objects concurrently (was sequential)
- Make pull_tree download blob objects concurrently (tree reads stay
  sequential since we need to discover children)
- Use direct blocking I/O for index reads in pack, push-archive,
  restore (avoids OpenDAL async overhead)
- Optimize download_object exists check to use direct blocking read
  instead of async get_object + full body consumption

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Index::to_data(): estimate capacity from metadata size
- Tree::to_data(): pre-allocate based on entry count (92 bytes avg)
- Tree::from_data(): estimate entry count from data size

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key changes:
- Eliminate TOCTOU exists-then-get patterns in read_index, get_object, get_metadata
- Use get_raw_bytes (single read) instead of get_object+read_to_end (stream reader)
- Concurrent blob header reads in collect_entry_metadata and tree_walk
- Direct I/O for upload_archive and check_missing on Fs backend
- Concurrent upload writes for non-Fs backends
- Server Fs backend uses from_fs_path for direct I/O support
- Derive PartialEq/Eq on Mode enum

Benchmarks (10K files, 40MB):
  GET /metadata:    1.920s → 0.689s (2.8x)
  GET /archive:     4.266s → 0.613s (7.0x)
  GET /zip:         3.850s → 1.619s (2.4x)
  POST /missing:    0.589s → 0.179s (3.3x)
  POST /upload:     0.648s → 0.037s (17.5x)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… speedup

Add blocking tree walk implementations that bypass async runtime overhead:
- collect_tree_metadata_blocking: single blocking task for metadata endpoint
- collect_entry_metadata_blocking: single blocking task for archive/supplemental
- collect_objects_blocking: blocking recursive tree walk

Automatically selects blocking path when store has direct I/O support.
Async path with concurrent blob reads retained for S3/Memory backends.

Server benchmarks (10K files, Fs backend):
  GET /metadata:  1.920s → 0.095s (20x)
  GET /archive:   4.266s → 0.103s (41x)
  GET /zip:       3.850s → 1.711s (2.3x, deflate CPU-bound)
  POST /missing:  0.589s → 0.055s (10.7x)
  POST /upload:   0.648s → 0.047s (13.8x)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>