# Batch Requests with AIStore: Full Tutorial

The GetBatch API is a high-performance data retrieval interface that lets clients fetch data from multiple objects in a single HTTP request, rather than issuing a separate request for each object. This batching approach is particularly valuable for applications that need to retrieve large numbers of objects, or individual files stored inside archive objects.

The API accepts a batch request that lists multiple object specifications. Each specification can include optional parameters such as an archive path (to extract a specific file from an archive), a byte range (for partial retrieval), and custom metadata. The server processes all entries and delivers the response either as a streaming archive containing every requested file, or as a structured multipart response that carries per-object metadata alongside the file contents. The multipart form lets clients process results efficiently while retaining detailed information about each retrieved item, including errors for objects that are missing or inaccessible.
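To make the streaming form concrete, here is a minimal sketch of how a client could consume a `.tar` stream sequentially with Python's standard `tarfile` module. It does not call AIStore: `simulate_batch_tar` is a hypothetical helper that builds an in-memory archive standing in for the server response, and the member names it uses are assumptions, not necessarily the names AIStore writes.

```python
import io
import tarfile


def simulate_batch_tar(entries):
    """Build an in-memory .tar standing in for a streamed batch response (hypothetical)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_tar_stream(fileobj):
    """Yield (member_name, bytes) pairs one at a time from a tar stream."""
    # Mode "r|" reads strictly front to back, so it also works on non-seekable streams
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                # In stream mode, the current member must be read before advancing
                yield member.name, tar.extractfile(member).read()


stream = simulate_batch_tar([
    ("test-obj-1", b"This is the data stored in test-obj-1"),
    ("test-obj-2", b"This is the data stored in test-obj-2"),
])
for name, data in iter_tar_stream(stream):
    print(name, data)
```

Because each member is handled as soon as it is read, the client never needs to buffer the whole response before starting work.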

### 0. Ensure the AIStore SDK is installed and the cluster is running

In [12]:
! pip show aistore | grep -E "Name|Version"

Name: aistore
Version: 1.14.0
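If you prefer to check the installed SDK from Python rather than the shell, a minimal sketch using the standard `importlib.metadata` module follows; `installed_version` is a hypothetical helper name, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("aistore:", installed_version("aistore") or "not installed")
```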


### 1. Initialize your Client and configure your Bucket

In [13]:
import os

from aistore.sdk import Client

DEFAULT_ENDPOINT = "http://localhost:8080"
BCK_NAME = "get_batch_bck"

# Read the AIS cluster endpoint from the environment (defaults to localhost)
ais_url = os.getenv("AIS_ENDPOINT", DEFAULT_ENDPOINT)

# Create the client. If requests keep retrying, the client cannot reach the cluster.
client = Client(ais_url)

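# Delete any bucket left over from a previous run, then create a fresh one
# (delete() does not return the bucket, so only create()'s result is kept)
client.bucket(BCK_NAME).delete(missing_ok=True)
bucket = client.bucket(BCK_NAME).create(exist_ok=True)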
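Because a misconfigured endpoint only shows up as retries, a quick reachability probe before creating buckets can save time. The sketch below uses only the standard library and assumes the cluster answers `GET /v1/health` with HTTP 200 when healthy; `cluster_is_reachable` is a hypothetical helper, not part of the SDK.

```python
import http.client
import os
import urllib.request


def cluster_is_reachable(endpoint, timeout=2.0):
    """Return True if GET <endpoint>/v1/health answers with HTTP 200.

    Assumes (not confirmed by this tutorial) that the AIS proxy exposes /v1/health.
    """
    url = endpoint.rstrip("/") + "/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs, HTTPException covers non-HTTP replies
        return False


endpoint = os.getenv("AIS_ENDPOINT", "http://localhost:8080")
if not cluster_is_reachable(endpoint):
    print(f"Cannot reach an AIS cluster at {endpoint}")
```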

### 2. Populate the bucket with a few basic objects

In [14]:
OBJECT_NAME = "test-obj"
OBJECT_DATA = b"This is the data stored in test-obj-"
NUM_OBJECTS = 5

objects = []

# Create basic test objects
for i in range(1, NUM_OBJECTS + 1):
    obj = bucket.object(f"{OBJECT_NAME}-{i}")
    obj.get_writer().put_content(OBJECT_DATA + str(i).encode())

    objects.append(obj)

# Verify that the PUTs succeeded by listing the bucket
for entry in bucket.list_all_objects():
    print(entry.name, entry.size)

test-obj-1 37
test-obj-2 37
test-obj-3 37
test-obj-4 37
test-obj-5 37
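The listed sizes follow directly from the payload: the 36-byte prefix `b"This is the data stored in test-obj-"` plus one digit gives 37 bytes. A small sketch that rebuilds the payloads locally (no cluster needed) confirms this; `expected` is a hypothetical name.

```python
OBJECT_NAME = "test-obj"
OBJECT_DATA = b"This is the data stored in test-obj-"
NUM_OBJECTS = 5

# Rebuild the same payloads the cell above uploads
expected = {
    f"{OBJECT_NAME}-{i}": OBJECT_DATA + str(i).encode()
    for i in range(1, NUM_OBJECTS + 1)
}

for name, payload in expected.items():
    print(name, len(payload))
```

With ten or more objects, the two-digit suffixes would make the later objects 38 bytes instead.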


### 3. Initialize batch request spec (`BatchRequest`)

In [15]:
from aistore.sdk import BatchRequest

"""
We're defining the behavior of the batch request:

* Output format: `.tar` archive
* Continue on errors: skip missing objects instead of failing
* Use streaming: return a streamable `.tar` instead of multipart content

**Note:** Creating a `BatchRequest` only builds the request spec — no data is fetched until you call `get_batch()`.
"""
batch_request = BatchRequest(output_format=".tar", continue_on_err=True, streaming=True)

# Add each object to the batch request
for obj in objects:
    batch_request.add_object_request(obj)


# Verify BatchRequest has expected contents
print([item["objname"] for item in batch_request.to_dict()["in"]])

['test-obj-1', 'test-obj-2', 'test-obj-3', 'test-obj-4', 'test-obj-5']
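To see roughly what a spec like this holds, the following sketch mirrors it with plain dataclasses. Only the `"in"` list and its `"objname"` key are confirmed by the `to_dict()` output above; the `bucket`, `archpath`, `start`, and `length` keys and all top-level keys are hypothetical illustrations of the per-object options described in the introduction, not the SDK's actual wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class BatchEntry:
    """Hypothetical mirror of one item in the spec's "in" list."""

    objname: str
    bucket: str
    archpath: Optional[str] = None  # file to extract from an archive object
    start: Optional[int] = None  # byte-range offset
    length: Optional[int] = None  # byte-range length


@dataclass
class BatchSpec:
    """Hypothetical mirror of a whole batch request spec."""

    output_format: str = ".tar"
    continue_on_err: bool = True
    streaming: bool = True
    entries: List[BatchEntry] = field(default_factory=list)

    def to_dict(self):
        # Omit unset optional fields to keep the payload compact
        items = [
            {k: v for k, v in asdict(e).items() if v is not None}
            for e in self.entries
        ]
        return {
            "output_format": self.output_format,
            "continue_on_err": self.continue_on_err,
            "streaming": self.streaming,
            "in": items,
        }


spec = BatchSpec()
for i in range(1, 6):
    spec.entries.append(BatchEntry(objname=f"test-obj-{i}", bucket="get_batch_bck"))

print([item["objname"] for item in spec.to_dict()["in"]])
print(json.dumps(spec.to_dict()["in"][0]))
```

Keeping the spec as plain data is what makes it cheap to build, inspect, and reuse before anything is sent to the cluster.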


### 4. Use `BatchLoader` to send batch requests to AIStore

In [16]:
# Initialize loader using Client
batch_loader = client.batch_loader()

# Send the batch request and receive the data
batch_iter = batch_loader.get_batch(batch_request)

# Iterate over (response item, content) pairs
for resp_item, data in batch_iter:
    # resp_item is a BatchResponseItem (from aistore.sdk import BatchResponseItem)
    print(f"Name: {resp_item.obj_name}, Content: {data}")

Name: test-obj-1, Content: b'This is the data stored in test-obj-1'
Name: test-obj-2, Content: b'This is the data stored in test-obj-2'
Name: test-obj-3, Content: b'This is the data stored in test-obj-3'
Name: test-obj-4, Content: b'This is the data stored in test-obj-4'
Name: test-obj-5, Content: b'This is the data stored in test-obj-5'
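Since `get_batch()` returns an iterator, you can process each item as it arrives instead of collecting everything in memory first. The sketch below writes each result to a local directory. `FakeItem` is a hypothetical stand-in for `BatchResponseItem` exposing only the `obj_name` and `archpath` attributes this tutorial prints, so the example runs without a cluster; against a live cluster you would pass `batch_loader.get_batch(batch_request)` instead of the fake list.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

# Hypothetical stand-in for BatchResponseItem (only the attributes used here)
FakeItem = namedtuple("FakeItem", ["obj_name", "archpath"])


def save_batch(batch_iter, out_dir):
    """Write each (item, data) pair under out_dir as it arrives; return total bytes."""
    out = Path(out_dir)
    total = 0
    for item, data in batch_iter:
        if item.archpath:
            # Archived files go under a directory named after their archive
            target = out / item.obj_name / item.archpath
        else:
            target = out / item.obj_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        total += len(data)
    return total


fake_results = [
    (FakeItem("test-obj-1", ""), b"This is the data stored in test-obj-1"),
    (FakeItem("test-obj-2", ""), b"This is the data stored in test-obj-2"),
]
with tempfile.TemporaryDirectory() as tmp:
    print(save_batch(fake_results, tmp))  # 74
```

This layout assumes a batch does not also fetch an archive object itself alongside files from it, since that would need a file and a directory with the same name.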


### 5. Upload archive objects to a second bucket

In [17]:
BCK_NAME_ARCH = "get_batch_bck_arch"

# Create a second bucket, deleting any leftover from a previous run
client.bucket(BCK_NAME_ARCH).delete(missing_ok=True)
arch_bucket = client.bucket(BCK_NAME_ARCH).create(exist_ok=True)

In [18]:
import tarfile
from io import BytesIO

# Create tarfile archives
NUM_ARCHIVES = 3
NUM_FILES = 5

ARCH_NAME = "archive"
FILE_NAME = "file"
FILE_DATA = b"This is the data stored in file_"

archive_objs = []

for arch_i in range(1, NUM_ARCHIVES + 1):
    tar_buffer = BytesIO()

    # For each archive, create NUM_FILES text files
    with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
        for file_i in range(1, NUM_FILES + 1):
            content = FILE_DATA + str(file_i).encode()
            tarinfo = tarfile.TarInfo(name=f"{FILE_NAME}_{file_i}.txt")
            tarinfo.size = len(content)  # size must be set before addfile()
            tar.addfile(tarinfo, BytesIO(content))

    # Put archive in AIStore
    obj = arch_bucket.object(f"{ARCH_NAME}-{arch_i}.tar")
    obj.get_writer().put_content(tar_buffer.getvalue())

    archive_objs.append(obj)

# Verify that the archives were PUT and list the files inside each one
for obj in archive_objs:
    print(obj.name)
    for entry in arch_bucket.list_archive(obj.name):
        print("-", entry.name)

archive-1.tar
- archive-1.tar/file_1.txt
- archive-1.tar/file_2.txt
- archive-1.tar/file_3.txt
- archive-1.tar/file_4.txt
- archive-1.tar/file_5.txt
archive-2.tar
- archive-2.tar/file_1.txt
- archive-2.tar/file_2.txt
- archive-2.tar/file_3.txt
- archive-2.tar/file_4.txt
- archive-2.tar/file_5.txt
archive-3.tar
- archive-3.tar/file_1.txt
- archive-3.tar/file_2.txt
- archive-3.tar/file_3.txt
- archive-3.tar/file_4.txt
- archive-3.tar/file_5.txt
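The listing names each archived file as `<archive object>/<path inside archive>`, while the batch request in the next step takes only the path inside the archive as `archpath`. A small hypothetical helper, `to_archpath`, converts one form into the other:

```python
def to_archpath(archive_name, entry_name):
    """Convert a list_archive() entry name into a path inside the archive.

    Entries are listed as "<archive object name>/<path inside archive>",
    e.g. "archive-1.tar/file_1.txt" -> "file_1.txt".
    """
    prefix = archive_name.rstrip("/") + "/"
    if not entry_name.startswith(prefix):
        raise ValueError(f"{entry_name!r} is not inside {archive_name!r}")
    return entry_name[len(prefix):]


listed = [f"archive-1.tar/file_{i}.txt" for i in range(1, 6)]
print([to_archpath("archive-1.tar", name) for name in listed])
```

With a live cluster, you could use it to request every file in an archive rather than a random one, e.g. `for entry in arch_bucket.list_archive(obj.name): batch_request.add_object_request(obj, archpath=to_archpath(obj.name, entry.name))`.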


### 6. Extend the `BatchRequest` with files from the archives

In [19]:
import random

random.seed(42)

# A BatchRequest is only a spec, so we can keep extending the earlier one.
# Its existing entries (the plain objects) are fetched again alongside the new archive entries.

# Add one file from each archive to the batch request
for obj in archive_objs:
    # Pick a random text file from this archive
    random_file_i = random.randint(1, NUM_FILES)
    batch_request.add_object_request(obj, archpath=f"{FILE_NAME}_{random_file_i}.txt")

# Verify BatchRequest has expected contents
print([item["objname"] for item in batch_request.to_dict()["in"]])

['test-obj-1', 'test-obj-2', 'test-obj-3', 'test-obj-4', 'test-obj-5', 'archive-1.tar', 'archive-2.tar', 'archive-3.tar']
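Conceptually, `archpath` asks the server to return a single member of the archive instead of the whole `.tar`. The local equivalent with the standard `tarfile` module looks like this; it is a sketch for intuition only, not how AIStore implements extraction, and `read_member` is a hypothetical helper.

```python
import io
import tarfile


def read_member(archive_bytes, archpath):
    """Return the bytes of one file inside a .tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        member = tar.extractfile(archpath)  # raises KeyError if archpath is absent
        if member is None:  # directories and other non-regular entries
            raise ValueError(f"{archpath!r} is not a regular file")
        return member.read()


# Build an archive shaped like the ones uploaded in step 5
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for i in range(1, 6):
        payload = b"This is the data stored in file_" + str(i).encode()
        info = tarfile.TarInfo(name=f"file_{i}.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

print(read_member(buf.getvalue(), "file_3.txt"))
```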


### 7. Fetch data across multiple buckets and types (plain objects and archived files)

In [20]:
# We can reuse BatchLoader too!
batch_iter = batch_loader.get_batch(batch_request)

# Iterate over (response item, content) pairs
for resp_item, data in batch_iter:
    # resp_item is a BatchResponseItem (from aistore.sdk import BatchResponseItem)
    print(
        f"Name: {resp_item.obj_name}, Archpath: {resp_item.archpath}, Content: {data}"
    )

Name: test-obj-1, Archpath: , Content: b'This is the data stored in test-obj-1'
Name: test-obj-2, Archpath: , Content: b'This is the data stored in test-obj-2'
Name: test-obj-3, Archpath: , Content: b'This is the data stored in test-obj-3'
Name: test-obj-4, Archpath: , Content: b'This is the data stored in test-obj-4'
Name: test-obj-5, Archpath: , Content: b'This is the data stored in test-obj-5'
Name: archive-1.tar, Archpath: file_1.txt, Content: b'This is the data stored in file_1'
Name: archive-2.tar, Archpath: file_1.txt, Content: b'This is the data stored in file_1'
Name: archive-3.tar, Archpath: file_3.txt, Content: b'This is the data stored in file_3'
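Once archives are involved, `obj_name` alone no longer identifies an item: two files requested from the same archive would share the same `obj_name`. A minimal sketch that indexes results by the `(obj_name, archpath)` pair instead follows; as before, `FakeItem` is a hypothetical stand-in for `BatchResponseItem` so the example runs without a cluster, and the sample entries are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for BatchResponseItem with the two attributes printed above
FakeItem = namedtuple("FakeItem", ["obj_name", "archpath"])


def index_results(batch_iter):
    """Collect batch results into a dict keyed by (obj_name, archpath)."""
    results = {}
    for item, data in batch_iter:
        # Plain objects have an empty archpath; normalize None to "" as well
        results[(item.obj_name, item.archpath or "")] = data
    return results


sample = [
    (FakeItem("test-obj-1", ""), b"This is the data stored in test-obj-1"),
    (FakeItem("archive-1.tar", "file_1.txt"), b"This is the data stored in file_1"),
    (FakeItem("archive-1.tar", "file_4.txt"), b"This is the data stored in file_4"),
]
index = index_results(sample)
print(len(index))  # 3
```

Keyed by `obj_name` alone, the two `archive-1.tar` entries would collapse into one and the dict would hold only 2 items.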
