Commit bc6e0e0

docs: update blob downloader readme
Signed-off-by: Tony Chen <a122774007@gmail.com>
1 parent a51c7ca commit bc6e0e0

1 file changed: docs/blob_downloader.md (+117 additions, −119 deletions)

## Blob Downloader

Blob downloader is AIStore's facility for **downloading large remote objects (BLOBs)** using **concurrent range-reads**.
Instead of pulling a 10–100+ GiB object with a single sequential stream, blob downloader:

- **splits the object into chunks** (configurable chunk size),
- **fetches those chunks in parallel** from the remote backend (configurable number of workers),
- **writes them directly into AIStore's chunked object layout**, so that all target disks are writing in parallel, effectively aggregating the node's full disk write bandwidth.

![Blob Downloader](/docs/assets/blob_downloader/blob_downloader_workflow.png)
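
To make the mechanics concrete, here is a minimal, hypothetical Python sketch of the same idea: fixed-size HTTP range-reads fanned out across a worker pool. This is an illustration only, not AIStore's implementation (which is written in Go and writes chunks straight to target disks); the `requests`-based helper, the URL, and the 2 MiB / 4-worker settings are assumptions made for the example.

```python
# Illustrative sketch of concurrent range-reads; NOT AIStore source code.
from concurrent.futures import ThreadPoolExecutor

import requests

CHUNK_SIZE = 2 * 1024 * 1024  # assumed chunk size (2 MiB)
NUM_WORKERS = 4               # assumed number of chunk readers

def read_chunk(url: str, start: int, end: int) -> bytes:
    # Each worker fetches one fixed-size chunk via an HTTP Range request
    resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
    resp.raise_for_status()
    return resp.content

def blob_download(url: str, total_size: int) -> bytes:
    # Split [0, total_size) into fixed-size ranges and read them in parallel
    ranges = [(off, min(off + CHUNK_SIZE, total_size) - 1)
              for off in range(0, total_size, CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        chunks = pool.map(lambda r: read_chunk(url, *r), ranges)
    # AIStore writes each chunk to disk as it arrives; here we just concatenate
    return b"".join(chunks)
```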

The result is that, beyond a certain object size, blob downloader can deliver **much higher throughput** than a regular cold GET. In our internal benchmarks, a 4 GiB S3 object fetched with blob downloader was up to **4× faster** than a monolithic cold GET.

Blob downloader is also **load-aware**: it consults AIStore's internal load advisors to avoid overcommitting memory or disks, backing off when the node is under pressure and running at full speed when the system has headroom.

For a deeper dive into the internals and detailed benchmarks, see the [blog post](https://aistore.nvidia.com/blog/2025/11/26/blob-downloader).

---

## Usage

AIStore exposes blob download functionality through three distinct interfaces, each suited to a different use case:

- **Single-object blob-download job** – explicitly start a blob-download job for one or more objects.
- **Prefetch + blob-threshold** – route large objects in a prefetch job through blob downloader.
- **Streaming GET** – stream a large object via blob downloader while it is being cached in AIS.

### 1. Single-object blob-download job

Use this when you want direct control over which objects are fetched with blob downloader, and how.

**Help and options**:

```console
$ ais blob-download --help
...
USAGE:
   ais blob-download BUCKET/OBJECT_NAME [command options]

OPTIONS:
   --chunk-size value    Chunk size in IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k)
   --num-workers value   Number of concurrent blob-downloading workers (readers); system default when omitted or zero (default: 0)
   --list value          Comma-separated list of object or file names
   --latest              Check and, optionally, synchronize the latest object version from the remote bucket
   --progress            Show progress bar(s) in real time
   --wait                Block until the job finishes (optionally, use '--timeout' to limit the waiting time)
   ...
```

**Examples**:

- **Single large object**:

  ```console
  $ ais blob-download s3://my-bucket/large-model.bin \
      --chunk-size 4MiB \
      --num-workers 8 \
      --wait --progress
  ```

- **Multiple objects in one job**:

  ```console
  $ ais blob-download s3://my-bucket \
      --list "obj1.tar,obj2.bin,obj3.dat" \
      --chunk-size 8MiB \
      --num-workers 4 \
      --wait --progress
  ```

### 2. Prefetch with blob-threshold

`prefetch` is AIStore's **multi-object "warm-up" job** for remote buckets. When you add a **blob size threshold**, it automatically decides which objects are large enough to benefit from blob downloader:

- Objects **≥ `--blob-threshold`** are fetched via blob downloader (parallel range-reads, chunked writes).
- Objects **< `--blob-threshold`** are fetched via the normal cold-GET path.

This lets you get the large-object gains of blob downloader just by tuning prefetch's knobs.

**Example**:

```console
# Inspect a remote bucket
$ ais ls s3://my-bucket
NAME          SIZE        CACHED
model.ckpt    12.50GiB    no
dataset.tar   8.30GiB     no
config.json   4.20KiB     no

# Prefetch with a 1 GiB threshold:
# - objects ≥ threshold use blob downloader (parallel chunks)
# - objects < threshold use the standard cold GET
$ ais prefetch s3://my-bucket \
    --blob-threshold 1GiB \
    --blob-chunk-size 8MiB \
    --wait --progress
prefetch-objects[E-abc123]: prefetch entire bucket s3://my-bucket
```

Key prefetch options:

- **`--blob-threshold SIZE`**: turn blob downloader on for objects at/above `SIZE`.
- **`--blob-chunk-size SIZE`** (if available in your build): override the default blob chunk size for this prefetch.
- **`--prefix` / `--list` / `--template`**: scope which objects are prefetched.
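
If you drive AIStore from Python, the same threshold can also be applied on the SDK's multi-object prefetch call. A minimal sketch, assuming your installed SDK version exposes a `blob_threshold` parameter on `ObjectGroup.prefetch` (check the signature before relying on it); the endpoint, bucket, and object names below are placeholders:

```python
from aistore import Client

client = Client("AIS_ENDPOINT")  # placeholder endpoint
bucket = client.bucket(name="my-bucket", provider="aws")

# Prefetch a batch of objects; those at/above 1 GiB would be routed
# through blob downloader, smaller ones through the regular cold GET.
job_id = bucket.objects(obj_names=["model.ckpt", "dataset.tar", "config.json"]).prefetch(
    blob_threshold=1 * 1024**3,  # bytes; assumed parameter name
)
client.job(job_id).wait()  # block until the prefetch job finishes
```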

### 3. Streaming GET (Python SDK only)

In addition to the CLI jobs above, blob downloader can stream a large object to the caller while it is being concurrently downloaded into the cluster. This is useful when you want to feed data directly into an application (for example, model loading or preprocessing) and still keep a cached copy in AIS.

```python
from aistore import Client
from aistore.sdk.blob_download_config import BlobDownloadConfig

# Set up AIS client and bucket
client = Client("AIS_ENDPOINT")
bucket = client.bucket(name="my_bucket", provider="aws")

# Configure blob downloader (4 MiB chunks, 16 workers)
blob_cfg = BlobDownloadConfig(chunk_size="4MiB", num_workers="16")

# Stream the large object using the blob downloader settings
reader = bucket.object("my_large_object").get_reader(blob_download_config=blob_cfg)
data = reader.read_all()
```

---

## Selecting an effective blob-threshold for prefetch

The ideal `--blob-threshold` depends on your cluster (CPU, disks, network), backend (S3, GCS, ...), and object size distribution.
Running full `prefetch` experiments for many candidate values can easily take **hours**, so instead we recommend a **shorter single-object blob-download benchmark**: use it to pick a good starting point, then apply that value directly in your prefetch job.

To do this in practice, **compare cold GET vs. blob-download on a single object**:

1. **Pick a representative large remote object** in your bucket (for example, a model shard or a big archive).
2. **Evict it from AIStore** to ensure a cold path:

   ```console
   $ ais evict s3://my-bucket --list "large-model.bin"
   ```

3. **Measure cold GET time** for that object:

   ```console
   $ time ais get s3://my-bucket/large-model.bin /dev/null
   ```

4. **Measure blob-download time** for the same object:

   ```console
   $ ais evict s3://my-bucket --list "large-model.bin"

   $ time ais blob-download s3://my-bucket/large-model.bin --wait
   ```

5. Repeat the above for a few object sizes (for example: 64 MiB, 256 MiB, 1 GiB, 4 GiB) until you see a pattern:

   - **Below some size**, cold GET is as fast or faster (blob overhead dominates).
   - **Above that size**, blob-download is consistently faster.

The **crossover size** where blob-download _wins_ is your **blob-threshold** for prefetch: use that size as `--blob-threshold` when you run your real `ais prefetch` job. This single-object comparison gives you a quick, reasonable approximation.
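
The comparison is easy to script. Below is a minimal sketch that shells out to the CLI and times both paths per object; the bucket and object names are placeholders, and it assumes `ais` is on your `PATH`:

```python
import subprocess
import time

BUCKET = "s3://my-bucket"                                      # placeholder
OBJECTS = ["obj-64MiB", "obj-256MiB", "obj-1GiB", "obj-4GiB"]  # placeholders

def timed(*cmd: str) -> float:
    # Run a CLI command and return its wall-clock duration in seconds
    start = time.monotonic()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start

for obj in OBJECTS:
    # Cold GET: evict first so the object is not cached in-cluster
    subprocess.run(["ais", "evict", BUCKET, "--list", obj], check=True)
    cold = timed("ais", "get", f"{BUCKET}/{obj}", "/dev/null")

    # Blob download: evict again, then fetch with --wait
    subprocess.run(["ais", "evict", BUCKET, "--list", obj], check=True)
    blob = timed("ais", "blob-download", f"{BUCKET}/{obj}", "--wait")

    print(f"{obj}: cold GET {cold:.1f}s, blob-download {blob:.1f}s")
```

The size at which the blob-download timings start beating cold GET is the crossover to use as `--blob-threshold`.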
In our internal 1.56 TiB S3 benchmark, applying this method led us to a threshold of about **256 MiB**. This value provided the best trade-off for that specific cluster and workload and delivered roughly **2.3× faster** end-to-end prefetch compared to a pure cold-GET baseline.
