This repository was archived by the owner on Mar 3, 2026. It is now read-only.
feat: Use object store and async, byte-range reads #465
Open
Labels: enhancement (New feature or request)
Description
Summary
Currently, compute uses the S3 client to retrieve Parquet files before reading them. We have started a transition to using https://docs.rs/object_store/latest/object_store/ which supports (a) reading from multiple object stores and (b) doing a direct byte-range read without fetching the file locally first.
We should finish this migration to fully benefit from object_store.
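As a concrete example of why byte-range reads matter here: Parquet files end with an 8-byte trailer (a 4-byte little-endian footer length followed by the `PAR1` magic), so the metadata can be fetched with two small ranged GETs instead of downloading the whole file. A minimal sketch of the range arithmetic, with the `object_store` wiring shown only as hedged comments (the exact `store`/`path` setup and signatures depend on the crate version):

```rust
use std::ops::Range;

// Parquet trailer: 4-byte little-endian footer length + "PAR1" magic.
const TRAILER_LEN: u64 = 8;

/// Byte range of the fixed-size trailer at the end of the file.
fn trailer_range(file_len: u64) -> Range<u64> {
    file_len - TRAILER_LEN..file_len
}

/// Byte range of the footer metadata, once the trailer has been read.
fn footer_range(file_len: u64, footer_len: u32) -> Range<u64> {
    let end = file_len - TRAILER_LEN;
    end - footer_len as u64..end
}

// With an `object_store` store (sketch; not tied to one crate version):
//
//   let meta = store.head(&path).await?;
//   let trailer = store.get_range(&path, trailer_range(meta.size)).await?;
//   let footer_len = u32::from_le_bytes(trailer[0..4].try_into()?);
//   let footer = store
//       .get_range(&path, footer_range(meta.size, footer_len))
//       .await?;
```

Issue #505 below takes this further by caching the parsed metadata in the `PreparedFile` proto so even these two reads can be skipped.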
- Call `abort_multipart` after failures
- Store Parquet metadata in PreparedFile proto to avoid multi-step reads #505
- Cleanup: Pass object store URLs as `ObjectStoreUrl` rather than `&str` or `String`
- Consider how we keep the object stores (and credentials) separate in the multi-tenant case.
- For reading files during compute (feat: Read directly during compute #471)
- For writing files during prepare (feat: Prepare directly to object stores #475)
- For writing metadata files during prepare (feat: Prepare directly to object stores #475)
- For reading metadata files during compute (feat: Use object_store for metadata #476)
- For determining file schemas (feat: Use metadata for retrieving the schema #479)
- For writing CSV files during compute (moved to feat: Async Read/Write via object_store for CSV #486)
- For writing Parquet files during compute (feat: Use object_store to write files #492)
- For reading files during prepare (rather than copying to disk) (feat: Use object_store to read prepare inputs #495)
- Make the `key` method and `ObjectStore` crate private (ref: Cleanup object_store code a bit #501)
- For reading (or at least fetching) the incremental checkpoint (rocksdb) (feat: Use object store for rocksdb and debugging #503)
- For writing (or at least uploading) the incremental checkpoint (rocksdb) (feat: Use object store for rocksdb and debugging #503)
- For uploading the plan yaml and flight records (feat: Use object store for rocksdb and debugging #503)
- Remove s3 helper and s3 crates (feat: Use object store for rocksdb and debugging #503)
- Delete `ConvertURI` methods (https://github.com/kaskada-ai/kaskada/blob/main/wren/compute/helpers.go#L19-L23) (feat: Use object store for rocksdb and debugging #503)
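The abort-on-failure item in the list above follows a common pattern: if any part of a multipart upload fails, abort the upload so the store does not accumulate orphaned parts. A minimal sketch with a stand-in store type (`FakeStore` and its method names are illustrative, not the real `object_store` signatures, which are async and version-dependent):

```rust
// Stand-in for an object store's multipart API; names are illustrative.
struct FakeStore {
    aborted: bool,
}

impl FakeStore {
    fn start_upload(&mut self) -> Result<(), String> {
        Ok(())
    }
    fn write_part(&mut self, part: &[u8]) -> Result<(), String> {
        if part.is_empty() {
            Err("empty part".to_string())
        } else {
            Ok(())
        }
    }
    fn complete(&mut self) -> Result<(), String> {
        Ok(())
    }
    fn abort(&mut self) {
        self.aborted = true;
    }
}

/// Upload all parts, aborting the multipart upload on any failure so
/// no partially-written server-side state is left behind.
fn upload(store: &mut FakeStore, parts: &[&[u8]]) -> Result<(), String> {
    store.start_upload()?;
    for part in parts {
        if let Err(e) = store.write_part(part) {
            store.abort(); // release server-side multipart state
            return Err(e);
        }
    }
    store.complete()
}
```

The same shape applies to the real crate: hold on to the multipart handle, and call its abort path in every error branch before propagating the error.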