
Releases: NVIDIA/aistore

3.28

10 May 18:05

The latest AIStore release, version 3.28, arrives nearly three months after the previous release. As always, v3.28 maintains compatibility with the previous version. We fully expect it to upgrade cleanly from earlier versions.

This release delivers significantly improved ETL offload with a newly added WebSocket communicator and optimized data flows between ETL pods and AIS targets in a Kubernetes cluster.

For Python users, we added resilient retrying logic that maintains seamless connectivity during lifecycle events - a capability that can be critical when running multi-hour training workloads. We've also improved the JobStats and JobSnapshot models, added MultiJobSnapshot, extended and fixed URL encoding, and added a props accessor method to the Object class.

The Python SDK's ETL has been extended with a new ETL server framework that provides three Python-based web server implementations: FastAPI, Flask, and HTTPMultiThreadedServer.

Separately, 3.28 adds a dual-layer rate-limiting capability with configurable support for both frontend (client-facing) and backend (cloud-facing, adaptive) operation.

On the CLI side, there are multiple usability improvements listed below. These include a further improved list-objects (ais ls) operation, as well as amended and improved inline help and CLI documentation. The ais show job command now displays cluster-wide objects and bytes totals for distributed jobs.

Enhancements to observability are also detailed below and include new metrics to track rate-limited operations and extended job statistics. Most of the supported jobs will now report a j-w-f metric: number of mountpath joggers, number of (user-specified) workers, and a work-channel-full count.

Other improvements include a new (and faster) content checksum, fast URL parsing (for the Go API), optimized buffer allocation for multi-object operations and ETL, and support for Unicode and special characters in object names. We've refactored and micro-optimized numerous components, and amended numerous docs, including the main readme and overview.

Last but not least, for better networking parallelism, we now support multiple long-lived peer-to-peer connections. The number of connections is configurable, and the supported batch jobs include distributed sort, erasure coding, multi-object and bucket-to-bucket copy, ETL, global rebalance, and more.

Assorted commits for each section are also included below with detailed changelog available at this link.

Configuration Changes

We made several additions to global (cluster-wide) and bucket configuration settings.

Multiple xactions (jobs) now universally include a standard configuration triplet that provides for:

  • In-flight compression
  • Minimum size of work channel(s)
  • Number of peer-to-peer TCP connections (referred to as stream bundle multiplier)

The following jobs are now separately configurable at the cluster level:

  • EC (Erasure Coding)
  • Dsort (Distributed Shuffle)
  • Rebalance (Global Rebalance)
  • TCB (Bucket-to-Bucket Copy/Transform)
  • TCO (Multi-Object Copy/Transform)
  • Archive (Multi-Object Archiving or Sharding)

In addition, EC is configurable on a per-bucket basis, allowing for further fine-tuning.

Commit Highlights

  • 15cf1ca: Add backward compatible config and BMD changes.
  • fc3d8f3: Add cluster config (tco, arch) sections and tcb burst.
  • 7b46f0c: Update configuration part two.
  • 8c49b6b: Add rate-limiting sections to global configuration and BMD (bucket metadata).
  • 15d4ed5: [config change] and [BMD change]: the following jobs now universally support XactConf triplet

New Default Checksum

AIStore 3.28 adds a new default content checksum. While still using xxhash, it uses a different implementation that delivers better performance in large-size streaming scenarios.

The system now makes a clear internal delineation between classic xxhash for system metadata (node IDs, bucket names, object metadata, etc.) and cespare/xxhash (designated as "xxhash2" in configuration) for user data. All newly created buckets now use "xxhash2" by default.

Benchmark tests show improved performance with the new implementation, especially for large objects and streaming operations.

Commit Highlights

  • a045b21: Implement new content checksum.
  • 7b69dc5: Update modules for new content checksum.
  • 9fa9265: Refine cespare vs one-of-one implementation.
  • d630c1f: Add cespare to hash micro-benchmark.

Rate Limiting

Version 3.28 introduces rate-limiting capability that operates at both frontend (client-facing) and backend (cloud-facing) layers.

On the frontend, each AIS proxy enforces configurable limits with burst capacity allowance. You can set different limits for each bucket based on its usage patterns, with separate configurations for GET, PUT, and DELETE operations.
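A token bucket is one common way to implement a rate limit with a burst allowance. The following is a minimal sketch of that general technique, not AIStore's implementation: the class name, its parameters (tokens_per_interval, interval, burst), and the example bucket name are all hypothetical and do not correspond to actual AIStore configuration keys.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical; not AIStore code)."""

    def __init__(self, tokens_per_interval, interval, burst=0, clock=time.monotonic):
        self.rate = tokens_per_interval / interval   # tokens replenished per second
        self.capacity = tokens_per_interval + burst  # most tokens that can accumulate
        self.tokens = float(self.capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token and return True if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Replenish for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One limiter per (bucket, operation), mirroring separate GET/PUT/DELETE limits.
limits = {
    ("my-bucket", "GET"): TokenBucket(tokens_per_interval=1000, interval=10, burst=100),
    ("my-bucket", "PUT"): TokenBucket(tokens_per_interval=200, interval=10, burst=20),
}
```

A request handler would look up the limiter for the bucket and operation, call allow(), and reject the request (for instance, with HTTP 429) when it returns False.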

For backend operations, the system implements an adaptive rate shaping mechanism that dynamically adjusts request rates based on cloud provider responses. This approach prevents hitting cloud provider limits proactively and implements exponential backoff when 429/503 responses are received. The implementation ensures zero overhead when rate limiting is disabled.
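The exponential-backoff part of this behavior can be sketched as a simple retry loop. This covers only the reaction to 429/503 responses, not the adaptive rate shaping itself, and it is an illustration under stated assumptions: `send` is a hypothetical callable that returns an HTTP status code, and the retry parameters are made-up defaults rather than AIStore settings.

```python
import random
import time

RETRYABLE = {429, 503}  # "too many requests" and "service unavailable"


def call_with_backoff(send, max_retries=5, base_delay=0.1, max_delay=10.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is a hypothetical callable returning an HTTP status code.
    Returns a (status, attempts) tuple.
    """
    delay = base_delay
    for attempt in range(1, max_retries + 2):
        status = send()
        if status not in RETRYABLE or attempt > max_retries:
            return status, attempt
        # Wait for the current delay plus a little jitter, then double the delay.
        sleep(delay + random.uniform(0, delay / 10))
        delay = min(delay * 2, max_delay)
```

The jitter keeps many concurrent callers from retrying in lockstep, and the cap (max_delay) bounds the worst-case wait between attempts.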

Configuration follows a hierarchical model with cluster-wide defaults that can be overridden per bucket. You can adjust intervals, token counts, burst sizes, and retry policies without service disruption.

Commit Highlights

  • e71c2b1: Implemented frontend/backend dual-layer rate limiting system.
  • 9f4d321: Added per-bucket overrides and exponential backoff for cloud 429 errors.
  • 12e5787: Not rate-limiting remote bucket with no props.
  • be309fd: docs: add rate limiting readme and blog.
  • b011945: Rate-limited backend: complete transition.
  • fcba62b: Rate-limit: add stats; prometheus metrics.
  • 8ee8b44: Rate-limited backend; context to propagate vlabs; prefetch.
  • c4b796a: Enable/disable rate-limited backends.
  • 666796f: Core: rate-limited backends (major update).

ETL

ETL (Extract, Transform, Load) is a cornerstone feature designed to execute transformations close to the data with an extremely high level of node-level parallelism across all nodes in the AIS cluster.

WebSocket

Version 3.28 adds WebSocket (ws://) as yet another fundamental communication mechanism between AIS nodes and ETL containers, complementing the existing HTTP and IO (STDIN/STDOUT) communications.

The WebSocket implementation supports multiple concurrent connections per transform session, preserves message order and boundaries for reliable communication, and provides stateful session management for long-lived, per-xaction sessions.

Direct PUT

The release implements a new direct PUT capability for ETL transformations that optimizes the data flow between components. Traditionally, data would flow from a source AIS target to an ETL container, back to the source AIS target, and finally to the destination target. With direct PUT, data flows directly from the ETL container to the destination AIS target.

Stress tests show 3x to 5x performance improvement with direct PUT enabled. This capability is available acro...


3.27

15 Feb 22:34

Changelog

List objects

  • "skip-lookup followed by list-remote (fix)" 7762faf2c
    • refactor and fix the relevant snippet
    • main loop: check 'aborted' also after getting next page

CLI

  • "show all supported feature-flags and descriptions (usability)" 9f222a215
    • cluster scope and bucket scope
    • set and show operations
    • color current (set) features
    • update readme
  • "colored help (all variations)" e95f3ac7f628
    • commands, subcommands, and the app ('ais --help') itself
    • a bunch of colored templates and wiring
    • separately, more pagination: replace memsys with simple buffer
    • with refactoring and cleanup

Python & ETL

  • "fix ETL tests" b639c0d68
  • "feat: add 'max_pool_size' parameter to Client and SessionManager" 8630b853b
  • "[Go API change] extend api.ETLObject - add transform args" 4b434184a
    • add etl_args argument
    • add TestETLInlineObjWithMetadata integration test
  • "add ETL transformation args QParam support" 8d9f2d11ae7d
    • introduce ETLConfig dataclass to encapsulate ETL-related parameters.
    • update get_reader and get methods to support ETLConfig, ensuring consistent handling of ETL metadata.
    • add ETL-related query parameter (QPARAM_ETL_ARGS) in Python SDK.
    • refactor get_reader and get to use the new ETL configuration approach.

Build & Lint

  • "upgrade all OSS packages" 1b65a37a6
  • "assorted lint; align fields" a5f7cfea1
    • build: add trimpath
    • production mode:
      • go build -trimpath
      • all executables, including aisnode and cli
  • "downgrade all aws-sdk-go-v2 packages" 462d7f4

3.26

08 Feb 01:37

Version 3.26 arrives 4 months after the previous release and contains more than 400 commits.

The core changes in v3.26 address the last remaining limitations. A new scrub capability has been added, supporting bidirectional diffing to detect remote out-of-band deletions and version changes. The cluster can now also reload updated user credentials at runtime without requiring downtime.

Enhancements to observability are detailed below, and performance improvements include memory pooling for HTTP requests, global rebalance optimizations, and micro-optimizations across the codebase. Key fixes include better error-handling logic (with a new category for IO errors and improvements to the filesystem health checker) and enhanced object metadata caching.

The release also introduces the ability to resolve split-brain scenarios by merging splintered clusters. When and if a network partition occurs and two islands of nodes independently elect primaries, the "set primary with force" feature enables the administrative action of joining one cluster to another, effectively restoring the original node count. This functionality provides greater control for handling extreme and unlikely events that involve network partitioning.

On the CLI side, users can now view not only the fact that a specific UUID-ed instance of operations like prefetch, copy, etl, or rebalance is running, but also the exact command line that was used to launch the batch operation. This makes it easier to track and understand batch job context.

For the detailed changelog, please see link.


CLI

The CLI in v3.26 features revamped inline help, reorganized command-line options with clearer descriptions, and added usage examples. Fixes include support for multi-object PUT with client-side checksumming and universal prefix support for all multi-object commands.

A notable new feature is the ais scrub command for validating in-cluster content. Additionally, the ais performance command has received several updates, including improved calculation of cluster-wide throughput. Top-level commands and their options have been reorganized for better clarity.

The ais scrub command in v3.26 focuses on detection rather than correction. It detects:

  • Misplaced objects (cluster-wide or within a specific multi-disk target)
  • Objects missing from the remote backend, and vice versa
  • In-cluster objects that no longer exist remotely
  • Objects with insufficient replicas
  • Objects larger or smaller than a specified size

The command generates both summary statistics and detailed reports for each identified issue. However, it does not attempt to fix misplaced or corrupted objects (those with invalid checksums). The ability to correct such issues is planned for v3.27.

For more details, see the full changelog here.


Observability

Version 3.26 includes several important updates. Prometheus metrics are now updated in real-time, eliminating the previous periodic updates via the prometheus.Collect interface.

Latencies and throughputs are no longer published as internally computed metrics; instead, .ns.total (nanoseconds) and .size (bytes) metrics are used to compute latency and throughput based on time intervals controlled by the monitoring client.
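As an illustration, a monitoring client that samples the cumulative counters at two points in time could derive interval averages along these lines. This is a sketch only: the dictionary keys (count, ns_total, size) are placeholders chosen for readability, not the exact metric names, and the sample values are made up.

```python
def interval_stats(prev, curr, elapsed_seconds):
    """Derive average latency and throughput between two counter samples."""
    requests = curr["count"] - prev["count"]
    total_ns = curr["ns_total"] - prev["ns_total"]
    total_bytes = curr["size"] - prev["size"]
    # Average latency: cumulative time divided by the number of requests.
    avg_latency_ms = (total_ns / requests) / 1e6 if requests else 0.0
    # Throughput: bytes moved during the interval, in MiB per second.
    throughput_mib_s = total_bytes / elapsed_seconds / (1024 * 1024)
    return avg_latency_ms, throughput_mib_s


# Two samples of cumulative GET counters taken 10 seconds apart (made-up numbers).
prev = {"count": 1_000, "ns_total": 5_000_000_000, "size": 1_048_576_000}
curr = {"count": 1_500, "ns_total": 7_500_000_000, "size": 1_572_864_000}
latency_ms, mib_per_s = interval_stats(prev, curr, elapsed_seconds=10)
# latency_ms == 5.0, mib_per_s == 50.0
```

Because the counters only ever grow, the client is free to choose any sampling interval; in Prometheus terms, this is the same arithmetic that a rate-over-interval query performs.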

Default Prometheus go_* counters and gauges, including metrics for tracking goroutines and garbage collection, have been removed.

In addition to the total aggregated metrics, separate latency and throughput metrics are now included for each backend.

Metrics resulting from actions on a specific bucket now include the bucket name as a Prometheus variable label.

In-cluster writing generated by xactions (jobs) also now includes xaction labels, including the respective kind and ID, which results in more PUT metrics, including those not generated from user PUT requests.

Finally, all GET, PUT, and DELETE errors include the bucket label, and FSHC-related IO errors now include the mount path (faulty disk) label.

Commit Highlights

  • Commit e6814a2: Added Prometheus variable labels; removed collector.
  • Commit 3b323ff: Polymorphic statsValue, removed switch kind.
  • Commit 9290dc5: Amended re-initializing backends.
  • Commit d2ceca3: Removed default metrics (go_gc_*, go_memstats_*), started counting PUTs generated by xactions.
  • Commit 118a821: Major update (with partial rewrite) - added variable labels.
  • Commit 2d181ab: Tracked and showed jobs run options (prefix, sync, range, etc.)
  • Commit 8690876: API change for xactions to provide initiating control message, added ctlmsg to all supported x-kinds.
  • Commit afef76b: Added CPU utilization tracking and alerts.

Separately, AIStore now supports distributed tracing via OpenTelemetry (OTEL). To enable it, use the oteltracing build tag.

  • Commit 1f19cde13: Added support for distributed tracing.

For more details, see the full changelog here.


Python SDK

Added the ObjectFileWriter class (extending io.BufferedWriter) for file-like writing operations. This enhancement builds upon the ObjectFile feature introduced in the previous release, providing zero-copy and resilient streaming capabilities. More information can be found in the tech blogs on enhancing ObjectFile performance and resilient streaming.

Additionally, this update includes various fixes and minor improvements, such as memory optimizations for ObjectFile, improved error handling, and enhancements to the API's usability and performance.

Support has also been added for:

  • multi-object transforms
  • OCI backend, and more.

Complete changelog is available here.


Erasure Coding

The v3.26 release introduces significant improvements to Erasure Coding in AIStore, focusing on enhanced performance, better data recovery, improved configuration options, and seamless integration with other features. Key updates include the ability to recover EC data in scenarios where multiple parts are lost, a reduced memory footprint, a fix for descriptor leakage when rebuilding an object from slices, and improved CPU utilization during EC operations. Additionally, intra-cluster networking has been optimized, with reduced overhead when erasure coding is not in use.


Oracle (OCI) Object Storage

Until recently, AIStore natively supported three cloud storage providers: AWS S3, GCS, and Microsoft Azure Blob Storage. With the v3.26 release, OCI (Oracle Cloud Infrastructure) Object Storage has been added as the fourth supported backend. This enhancement allows AIStore to utilize OCI Object Storage directly, providing improved performance for large object uploads and downloads.

Native support for OCI Object Storage includes tunable optimizations for efficient data transfer between AIStore and OCI's infrastructure. This new addition ensures that AIStore offers the same level of support and value-added functionality for OCI as it does for AWS S3, GCS, and Microsoft Azure Blob Storage.

For more details, see:


3.25

07 Oct 13:52

Changelog

  • "S3 compatibility API: add missing access control" c046cb8

  • "core: async shutdown/decommission; primary to reject node-level requests" 2e17aaf
    | * primary will now fail node-level decommission and similar lifecycle and cluster membership (changing) requests
    | * keeping shutdown-cluster exception when forced (in re: local playground)
    | * when shutting down or decommissioning an entire cluster primary will now perform the final step asynchronously
    | * (so that the API caller receives ok)

  • "python/sdk: improve error handling and logging for ObjectFile" b61b3db

  • "core: cold-GET vs upgrading rlock to wlock" 9857e78
    | * remove all sync.Cond related state and logic
    | * reduce low-level lock-info to just rc and wlock
    | * poll for up to host-busy timeout
    | * return err-busy if unsuccessful

  • "CLI show cluster to sort rows by POD names with primary on top" e469684

  • "health check to be forwarded to primary when invoked with 'primary-ready-to-rebalance' query param" a59f921
    | * (previously, non-primary would fail the request)

  • "python: avoid module level import of webds; remove 'webds' dependency" 228f23f
    | * refactor dataset_config.py: avoid module-level import of ShardWriter
    | * update pyproject.toml: add webdataset==0.2.86 as an optional dependency

  • "aisloader: '--subdir' vs prefix (clarify)" 7e7e8e4

  • "CLI: directory walk: do not call lstat on every entry (optimize)" 4a22b88
    | * skip errors iff "continue-on-error"
    | * add verbose mode to see all warnings - especially when invoked with the "continue-on-error" option
    | * otherwise, stop walking and return the error in question
    | * with partial rewrite

  • "docs: add tips for copying files from Lustre; ais put vs ais promote" 3cb20f6

  • "CLI: --num-workers option ('ais put', 'ais archive', and more)" d5e6fbc
    | * add; amend
    | * an option to execute serially (consistent with aistore)
    | * limit not to exceed (2 * num-CPUs)
    | * remove --conc flag (obsolete)
    | * fix inline help

  • "CLI: PUT and archive files from multiple matching directories" 16edff7
    | * GLOBalize
    | * PUT: add back --include-src-dir option

  • "trim prefix: list-objects; bucket-summary; multi-obj operations" 7cf1546
    | * rtrim(prefix, '*') to satisfy one common expectation
    | * proxy only (leaving CLI intact)

  • "unify 'validate-prefix' & 'validate-objname'; count list-objects errors" 5789273
    | * add ErrInvalidPrefix (type-code)
    | * refactor and micro-optimize validate-* helpers; unify
    | * move object name validation to proxies; proxies to (also) count err.list.n
    | * refactor ver-changed and obj-move

3.24

27 Sep 19:36

Version 3.24 arrives nearly 4 months after the previous one and contains more than 400 commits that fall into several main categories, topics, and sub-topics:

1. Core

1.1 Observability

We improved and optimized stats-reporting logic and introduced multiple new metrics and new management alerts.

There's now an easy way to observe per-backend performance and errors, if any. Instead of (or rather, in addition to) a single combined counter or latency, the system separately tracks requests that utilize AWS, GCP, and/or Azure backends.

For latencies, we additionally added cumulative "total-time" metrics:

  • "GET: total cumulative time (nanoseconds)"
  • "PUT: total cumulative time (nanoseconds)"
  • and more

Together with respective counters, those total-times can be used to compute precise latencies and throughputs over arbitrary time intervals - either on a per-backend basis or averaged across all remote backends, if any.

New management alerts include keep-alive, tls-certificate-will-soon-expire (see next section), low-memory, low-capacity, and more.

Build-wise, aisnode with StatsD will now require the corresponding build tag.
Prometheus is effectively the default; for details, see related:

1.2 HTTPS; TLS

HTTPS deployment implies (and requires) that each AIS node (aisnode) has a valid TLS (X.509) certificate.

TLS certificates do not last forever: each one expires, with a standard-defined maximum lifetime of 13 months (roughly 397 days).

AIS v3.24 automatically reloads updated certificates, tracks expiration times, and reports any inconsistencies between certificates in a cluster:

Associated Grafana and CLI-visible management alerts:

| alert | comment |
| --- | --- |
| tls-cert-will-soon-expire | Warning: less than 3 days remain until the current X.509 cert expires |
| tls-cert-expired | Critical (red) alert (as the name implies) |
| tls-cert-invalid | ditto |
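The expiration-related alerts listed above amount to a threshold check on the time remaining. A minimal sketch of that logic follows, assuming the certificate's expiration time (not_after) has already been parsed into a timezone-aware datetime; the function name is hypothetical, and the tls-cert-invalid case (which requires actual certificate validation) is left out.

```python
from datetime import datetime, timedelta, timezone

WARN_WINDOW = timedelta(days=3)  # "less than 3 days remain", per the alerts above


def tls_alert(not_after, now=None):
    """Map a certificate's expiration time to an alert name, or None."""
    now = now or datetime.now(timezone.utc)
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "tls-cert-expired"
    if remaining < WARN_WINDOW:
        return "tls-cert-will-soon-expire"
    return None


now = datetime(2024, 9, 27, tzinfo=timezone.utc)
print(tls_alert(now + timedelta(days=2), now))   # warning window
print(tls_alert(now - timedelta(hours=1), now))  # already expired
print(tls_alert(now + timedelta(days=90), now))  # no alert
```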

Finally, there's a brand-new management API and ais tls CLI.

1.3 Filesystem Health Checker (FSHC)

FSHC component detects disk faults, raises associated alerts, and disables degraded mountpaths.

AIS v3.24 comes with a major (version 2) FSHC update, with new capabilities that include:

  • detect mountpath changes at runtime;
  • differentiate in-cluster IO errors from network and remote backend errors;
  • support associated configuration (section "API changes; Config changes" below);
  • resolve (mountpath, filesystem) to disk(s), and handle:
    • no-disks exception;
    • disk loss, disk fault;
    • new disk attachments.

1.4 Keep-Alive; Primary Election

In-cluster keep-alive mechanism (a.k.a. heartbeat) was generally micro-optimized and improved. In particular, when and if failing to ping primary via intra-cluster control, an AIS node will now utilize its public network, if available.

And vice versa.

As an aside, AIS does not require provisioning 3 different networks at deployment time. This has always been and remains a recommended option. But our experience running Kubernetes clusters in production environments proves that it is, well, highly recommended.

1.5 Rebalance; Erasure Coding: Intra-Cluster streams

Needless to say, erasure coding produces a lot of in-cluster traffic. For all those erasure-coded slice-sending-receiving transactions, AIS targets establish long-living peer-to-peer connections dubbed streams.

Long story short, any operation on an erasure-coded bucket requires streams. But there's also the motivation not to keep those streams open when there's no erasure coding: the associated overhead (expectedly) grows proportionally with the size of the cluster.

In AIS v3.24, we solve this problem, or part of this problem, by piggybacking on keep-alive messages that provide timely updates. Closing EC streams is a lazy process that may take several extra minutes, which is still preferable given that AIS clusters may run for days and weeks at a time with no EC traffic at all.

1.6 List Virtual Directories

Unlike hierarchical POSIX, object storage is flat, treating forward slash ('/') in object names as simply another symbol.

But that's not the entire truth. The other part of it is that users may want to operate on (i.e., list, load, shuffle, copy, transform, etc.) a subset of objects in a dataset that, for lack of a better word, looks exactly like a directory.
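To make the idea concrete, here is a small sketch of how a flat list of object names can be presented as one level of a "directory" by splitting on the first slash after a prefix. It illustrates the concept only and is not AIStore's list-objects code; the function name and the sample object names are made up.

```python
def list_non_recursive(names, prefix=""):
    """Return (objects, virtual_dirs) found directly under `prefix`."""
    objects, dirs = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # Anything past the next slash collapses into one virtual directory.
            dirs.add(prefix + head + "/")
        elif rest:
            objects.append(name)
    return sorted(objects), sorted(dirs)


names = ["a/b/1.tar", "a/b/2.tar", "a/c/3.tar", "a/readme.txt", "z.txt"]
objects, dirs = list_non_recursive(names, prefix="a/")
# objects == ["a/readme.txt"], dirs == ["a/b/", "a/c/"]
```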

For details, please refer to:

1.7 API changes; Config changes

Including:

  • "[API change] show TLS certificate details; add top-level 'ais tls' command" 091f7b0
  • "[API change]: extend HEAD(object) to check remote metadata" c1004dd
  • "[config change]: FSHC v2: track and handle total number of soft errors" a2d04da
  • and more

1.8 Performance Optimization; Bug fixes; Improvements

Including:

  • "new RMD not to trigger rebalance when disabled in the config" 550cade20
  • "prefetch/copy/transform: number of concurrent workers" a5a30247d, 8aa832619
  • "intra-cluster notifications: reduce locking, mem allocations" b7965b7be
  • and much more

2. Initial Sharding (ishard); Distributed Shuffle (dsort)

Initial Sharding utility (ishard) is intended to create well-formed WebDataset-formatted shards from the original dataset.

Goes without saying: original ML datasets will have an arbitrary structure, a massive number of small files and/or very large files, and deeply nested directories. Notwithstanding, there's almost always the need to batch associated files (that constitute computable samples) together and maybe pre-shuffle them for immediate consumption by a model.
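The batching step can be pictured as follows. This is a simplified sketch, not the ishard implementation: it assumes that files sharing the same path minus extensions form one sample (the usual WebDataset convention), and it limits shards by file count, whereas a real tool would more likely limit them by size in bytes. Both function names are hypothetical.

```python
from collections import defaultdict
import posixpath


def group_samples(paths):
    """Group files that share a key (path without extensions) into samples."""
    samples = defaultdict(list)
    for path in paths:
        directory, base = posixpath.split(path)
        key = posixpath.join(directory, base.split(".", 1)[0])
        samples[key].append(path)
    return dict(samples)


def pack_shards(samples, max_files_per_shard):
    """Greedily pack whole samples into shards; a sample is never split."""
    shards, current = [], []
    for key in sorted(samples):
        files = samples[key]
        if current and len(current) + len(files) > max_files_per_shard:
            shards.append(current)
            current = []
        current.extend(files)
    if current:
        shards.append(current)
    return shards


paths = ["train/001.jpg", "train/001.cls", "train/002.jpg", "train/002.cls", "val/001.jpg"]
shards = pack_shards(group_samples(paths), max_files_per_shard=4)
# Two shards: the four train/* files together, then val/001.jpg on its own.
```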

Hence, ishard:

3. Authentication; Access Control

Other than code improvements and micro-optimizations (as in continuous refactoring) of the AuthN codebase, the most notable updates also include:

| topic | what changed |
| --- | --- |
| CLI | improved token handling; user-friendly (and improved) error management; easy-to-use configuration that entails admin credentials, secret keys, and tokens |
| Configuration | notable (and related) environment variables: AIS_AUTHN_SECRET_KEY, AIS_AUTHN_SU_NAME, AIS_AUTHN_SU_PASS, and AIS_AUTHN_TOKEN |
| AuthN container image | (new) tailored specifically for Kubernetes deployments - for seamless integration and easy setup in K8s environments |

4. CLI

Usability improvements across the board, including:

  • "add 'ais tls validate-certificates' command" 0a2f25c
  • "'ais put --retries ' with increasing timeout, if need be" 99b7a96
  • "copy/transform: add '--num-workers' (number of concurrent workers) option" 2414c68
  • "extend 'show cluster' - add 'alert' column" 40d6580df
  • "show configured backend providers" ba492a1
  • "per-backend...

3.23

28 May 14:49

Version 3.23 arrives three months after the previous one. In addition to datapath optimizations and bug fixes, most of the other changes are enumerated in the following

Table of Contents

  • List Objects; Bucket Inventory
  • Selecting Primary at startup; Restarting cluster when node IPs change (K8s)
  • S3 (backend, frontend)
  • BLOBs
  • Mountpath labels
  • Reading shards; Reading from shards

See also:

List Objects; Bucket Inventory

  • S3 backend: S3 ListObjectsV2 may return a directory !6672
  • list very large buckets using bucket inventory !6682, !6684, !6686, !6689, !6692
  • list-objects: optimize for prefix; add 'dont-optimize' feature flag !6685
  • list very large buckets using bucket inventory (major update, API changes) !6695, !6698
  • list very large buckets using bucket inventory !6704
  • list-objects: support non-recursive operation (new) !6711, !6712
  • refactor and code-generate (message pack) list-objects results !6714
  • bucket inventory; generic no-recursion helper !6715
  • bucket inventory: support arbitrary schema; add validation !6769
  • list-objects: micro-optimize setting custom properties of remote objects !6770
  • list very large buckets using bucket inventory !6775, !6776, !6777, !6778
  • list very large buckets using bucket inventory (major) !6810, !6811
  • list very large buckets using bucket inventory !6815
  • list-objects: skip virtual directories !6835
  • list very large buckets using bucket inventory !6847, !6851, !6853

Selecting Primary at startup; Restarting cluster when node IPs change (K8s)

  • primary role: add 'is-secondary' environment; precedence !6746
  • 'original' & 'discovery' URLs (major) !6747, !6749
  • cluster config: new convention for primary URL; role of the primary during: initial deployment, cluster restart !6752, !6755
  • cluster restart with simultaneous change of primary (major) !6758, !6760, !6761
  • primary startup: always update node net-infos !6762
  • all proxies to store RMD (previously, only primary) !6764
  • node join: remove duplicate IP check (is redundant) !6783
  • K8s startup with proxies change their network infos !6785
  • primary startup: initial version of the cluster map !6787
  • non-primary startup: retry and refactor; factor in !6788
  • K8s: primary startup when net-infos change !6789

S3 (backend, frontend)

  • backend put-object interface; presigned S3 (refactoring & cleanup) !6662
  • default AWS region (cleanup) !6679
  • s3cmd: add negative testing !6681
  • backend: S3 ListObjectsV2 may return a directory !6672
  • backend: consolidate environment and defaults !6678
  • backend: retain S3-specific error code !6688, !6691
  • move presigned URLs code to backend package !6801
  • multipart upload: read and send next part in parallel !6803
  • backend: refactor and simplify !6819
  • new feature flag to enable (older) path-style addressing !6821

BLOBs

  • config change: assorted feature flags now have bucket scope (major) !6664, !6666
  • Python: blob-download API !6687
  • Python: get and prefetch with blob-download !6708
  • blob downloader (minor ref) !6793
  • blob-downloader: finalize control structures; refactor !6812
  • GET via blob-download !6873
  • multiple blob-download jobs (fixes) !6876
  • prefetch via blob-downloader !6882

Mountpath labels

  • override-config, fspaths section (minor ref) !6718
  • config change, API change: mountpath labels (major) !6721, !6722, !6725, !6726, !6733, !6734, !6735, !6736, !6738
  • backward compatibility v3.22 and prior; bump CLI version !6740, !6742
  • log: mountpath labels vs shared filesystems; memory pressure !6744

Reading shards; Reading from shards

  • reading (from) shards: add read-until, read-one, and read-regex methods !6823
  • reading shards: read-until, read-one, read-regex !6824
  • WebDataset: add wds-key; add comments !6826
  • reading .TAR, .TGZ, etc. formatted objects (a.k.a. shards) - multiple selection !6827
  • GET request to select multiple archived files (feature) !6859
  • GET multiple archived files in one shot (feature) !6861, !6862, !6863, !6864, !6866
  • Python: GET multiple files from an archive (shard) !6860
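For intuition, selecting several archived files from a shard by regular expression looks roughly like the sketch below. It uses only Python's standard tarfile module on an in-memory archive and does the selection on the client side; it is not the AIStore SDK API, where the selection happens inside the cluster. The helper names and sample file names are hypothetical.

```python
import io
import re
import tarfile


def make_shard(files):
    """Build an in-memory TAR shard from {name: bytes} (for demonstration)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_matching(shard_bytes, pattern):
    """Return {name: bytes} for every archived file whose name matches `pattern`."""
    regex = re.compile(pattern)
    selected = {}
    # mode "r:*" lets tarfile detect compression (.tar, .tgz, and so on).
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile() and regex.search(member.name):
                selected[member.name] = tar.extractfile(member).read()
    return selected


shard = make_shard({"001.jpg": b"img1", "001.cls": b"7", "002.jpg": b"img2"})
images = read_matching(shard, r"\.jpg$")
# images == {"001.jpg": b"img1", "002.jpg": b"img2"}
```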

Core

  • backend put-object interface (refactoring & cleanup) !6662
  • get-stats API vs attach/detach mountpaths !6669
  • unwrap URL errors; remove mux.unhandle; CLI: more tips !6673
  • removing a node from a 2-node cluster (in re: rebalance) !6674
  • POST /v1/buckets handler: add one more check to URI validation !6690
  • last byte (minor ref) !6694
  • project layout: move and consolidate all scripts !6699
  • extend RMD to reinforce cluster integrity checking !6702
  • micro-optimize fast-path fqn parsing !6707
  • continued refactoring !6709, !6710
  • security dependabot: fix #15 and #16 !6713
  • aisnode: remove logs from conf !6727
  • extract and unify cluster information; add flags !6741
  • copy shared FS capacity; color high/low usage pct; up cli !6743
  • node flags in a cluster map vs (node | cluster) restart; node equality !6765
  • receive cluster-level metadata (minor ref) !6766
  • dsort: write compressed tar !6771
  • dsort: read compressed tar; add linter !6772
  • backend: uniform naming, common base !6774
  • remove AIS_IS_PRIMARY environment (is obsolete) !6781
  • nlog: allow setting logging to STDERR flag in config !6791
  • feature flags fsync-put will now have (also) bucket scope !6804
  • cold GET: write locally and transmit in parallel (new) !6805, !6807
  • move atomic 'stopping' (ref) !6817
  • aisloader: add 's3-use-path-style' command line, to use older path-style addressing !6822
  • cold GET (fast): fclose and check !6825
  • speed-up batch jobs (prefetch, archive, copy/transform, multi-object evict/delete) !6830
  • LOM: add open-file method !6836
  • nlog: while stopping !6837
  • multi-object TCB/TCO; not in-cluster objects; multi-page fix !6840, !6842
  • xaction registry: when hk call is premature !6843
  • add metrics: get-size and put-size !6849
  • memsys/SGL: add compliant 'write-to' interface impl.; amend fast/simplified 'write-to' !6854, !6856, !6857
  • stats and metrics: report cumulative GET and PUT sizes in bytes !6855
  • datapath query parameters: preparse, reduce size !6858
  • stats: fix Prometheus label for total size !6871
  • imports (ref) !6878
  • move and rename 'node-state-info' and 'node-state-flags' (ref) !6879
  • new metric: node-state-flags (bitwise, gauge) !6880
  • add management alerts: out-of-space & low-capacity (major) !6883
  • add management alerts: out-of-memory & low-on-memory !6885
  • microbench: use math/rand/v2 !6886
  • transition to Go 1.22 math/rand/v2; crypto/rand reader !6887
  • dsort test: use rand.v2 !6888
  • transition to Go 1.22 math/rand/v2; add seeded-reader !6890
  • cleanup 'cos/math' (ref) !6891
  • tests: fix prefix-test for remote ais cluster !6893

CLI

  • 'more' fixes !6665
  • more tips !6673
  • warn when switching cluster to operate in reverse proxy mode !6703
  • show feature flags symbolically !6705
  • backward compatibility v3.22 and prior; bump CLI version !6740
  • 'ais show cluster' to highlight nodes that are low on memory !6745
  • 'ls' and 'show object' to support size units (raw, SI, IEC) !6795
  • progress bar decorators; elapsed time !6797
  • fix used and available capacity !6806
  • fix 'show throughput' to not show throughput when !6813
  • quiet 'show cluster', 'show performance'; misplaced flags !6814
  • 'ais ls' help and inline examples; native GET: add query params !6816
  • copying remote objects; progress bar; usability !6839
  • extend 'ais gen-shards' to generate WD-formatted shards !6865
  • add '--count-and-time-only' option !6868, !6869
  • max-pages and limit !6870
  • stopping jobs !6875
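
The size-units item above ('ls' and 'show object' supporting raw, SI, and IEC) can be illustrated with a minimal sketch. This is not the CLI's implementation: the function name `format_size`, the suffix spellings, and the two-decimal output are assumptions made for illustration only. It shows the difference between the three conventions: raw prints the byte count as is, SI divides by powers of 1000, and IEC divides by powers of 1024.

```python
def format_size(num_bytes: int, units: str = "iec") -> str:
    """Render a byte count as raw, SI (base 1000), or IEC (base 1024).

    Hypothetical helper: name, suffixes, and precision are illustrative.
    """
    if units == "raw":
        return str(num_bytes)
    if units == "si":
        base, suffixes = 1000, ["B", "kB", "MB", "GB", "TB", "PB"]
    elif units == "iec":
        base, suffixes = 1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    else:
        raise ValueError(f"unknown units: {units!r}")

    value = float(num_bytes)
    idx = 0
    # Scale down until the value fits the current suffix (or suffixes run out).
    while value >= base and idx < len(suffixes) - 1:
        value /= base
        idx += 1
    if idx == 0:
        return f"{num_bytes}B"
    return f"{value:.2f}{suffixes[idx]}"


if __name__ == "__main__":
    for mode in ("raw", "si", "iec"):
        print(mode, format_size(1536, mode))
```

The same 1536-byte object is reported differently depending on the convention, which is why an explicit units option helps when comparing sizes across tools.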

Python

  • add test for invalid bucket name !6683
  • blob-download API !6687
  • add timeout option to client + version bump !6693
  • get and prefetch with blob-download !6708
  • tests constants and refactoring !6717
  • prefetch blob-download tests !6719
  • cluster performance API !6724
  • remote-enabled tests: cleanup and refactoring !6731
  • add missing job tests !6737
  • fix formatting issues !6753
  • PyTorch: add Iterable-style datasets for AIS Backend !6759
  • writer for image dataset !6767
  • AISSource: list all objects !6779
  • add example for dataset_writer !6794
  • add tests for dataset writer !6799
  • log missing attributes in write_dataset !6820
  • update docs !6844
  • add MultiShard Stream to PyTorch !6852
  • GET multiple files from an archive !6860

Build, CI

  • transition to Go 1.22 !6675
  • upgrade OSS packages !6680, !6750, !6768
  • lint: upgrade; Go 1.22 int range !6728, !6732
  • CI: MacOS fix !6729
  • remove HDFS backend !6773
  • upgrade golang.org/x/net !6831
  • lint; min/max shadow !6850
  • build: transition to Go 1.22 math/rand/v2 !6892
  • CI: maintenance !6838
  • lint: golangci-lint !6894

Documentation

  • docs: fix https getting-started !6668
  • docs: amend getting started !6670
  • docs: fix the broken table of contents link !6677
  • blog: Very large !6874

3.22

25 Feb 18:14

Highlights

  • Blob downloader
  • Multi-homing: support multiple user-facing network interfaces
  • Versioning and remote sync
    • execute in presence of out-of-band changes/deletions
    • support latest version: the capability to check in-cluster metadata and, possibly, GET, download, prefetch, and/or copy the latest remote (object) version
    • remote sync: same as above, plus: remove in-cluster object if its remote counterpart is not present (any longer)
    • both latest version and remote sync are supported in a variety of APIs (including GET primitive) and tools (CLI, aisloader)
  • Intra-cluster n-way mirroring
    • to withstand a loss of node(s), erasure coding is now optional
  • AWS S3 (frontend) API
    • multipart V2 (major upgrade); other productization
    • listing very large S3 datasets
    • support presigned S3 requests (beta)
  • List objects (job): show diff: in-cluster vs. remote
  • Prefetch (job): V2 (major upgrade)
  • Copy/transform (jobs): V2 (major upgrade)
  • AWS S3: migrate AWS backend to AWS SDK V2
  • Azure Blob Storage: transition to latest stable native SDK
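
The difference between "latest version" and "remote sync" in the highlights above can be sketched as a small decision function. This is a simplified model, not AIStore code: the names `Action`, `reconcile`, and `sync_remote` are hypothetical, and versions are compared as opaque strings.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep in-cluster copy as is"
    FETCH = "get the latest remote version"
    DELETE = "remove in-cluster copy"


def reconcile(local_version: Optional[str],
              remote_version: Optional[str],
              sync_remote: bool = False) -> Action:
    """Decide what to do with a single object (illustrative only).

    local_version  -- version in in-cluster metadata (None: not in cluster)
    remote_version -- version reported by the remote backend (None: not present)
    sync_remote    -- False models 'latest version'; True models 'remote sync'
    """
    if remote_version is None:
        # Remote counterpart is gone: only remote sync removes the local copy.
        if sync_remote and local_version is not None:
            return Action.DELETE
        return Action.KEEP
    if local_version != remote_version:
        # Out-of-band change (or object not yet in cluster): refresh it.
        return Action.FETCH
    return Action.KEEP


if __name__ == "__main__":
    print(reconcile("v1", "v2"))                    # version changed remotely
    print(reconcile("v1", None))                    # deleted remotely, latest-only
    print(reconcile("v1", None, sync_remote=True))  # deleted remotely, sync
```

In this model, the only behavioral difference between the two modes is the handling of a remotely deleted object, which matches the "same as above, plus" wording in the highlights.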

See also: aistore features and brief overview.

Core

  • NVMe multipathing: pick alternative block-stats location !6432
  • rotate logs; remove redundant interfaces, other refactoring !6433
  • cold GET: add stats !6435
  • http(s) clients: unify naming, construction; reduce code !6438, !6439
  • don't escape URL paths; up cli !6441
  • dsort: sort records (minor) !6445
  • core: micro-optimize copy-buffer !6447
  • list-objects utilities and helpers; rerun list-objects code-gen: refactor and optimize; cleanup !6450, !6451
  • intra-cluster transport: zero-copy header !6455
  • Go API: (object, multi-object): ref !6456
  • add 'read header timeout'; docs: aistore environment variables !6459
  • core: support target multi-homing - comma-separated IPs (part one) !6464
  • package 'ais': continued refactoring; up cli !6466
  • support multiple user-facing network interfaces (multi-homing) !6467, !6468
  • when setting backend two (or more) times in a row !6469
  • core: (begin, abort, commit) job - corner cases !6470
  • in-cluster K8s environment: prune and cleanup, comment, and document !6471
  • multi-object PUT - variations !6473, !6474
  • unify PUT and PROMOTE destination naming !6475
  • APPEND (verb) to append if exists; amend metadata (major) !6476
  • EC: refactor and simplify erasure-coding datapath; docs: remove all gitlab references !6477
  • list-objects: enforce intra-cluster access, validate !6480
  • EC: remove redundant state; simplify !6481
  • Go API get-bmd; follow-up !6483
  • EC: cleanup manager: remove rlock and unused map - micro-optimize !6490
  • copy bucket: extend the command to sync remote bucket !6491
  • extend 'copy bucket' to sync remote !6494, !6495, !6497, !6498, !6499
  • don't compare checksums of different (checksum) types !6496
  • when deleting non-present (remote) object !6502
  • move transform/copy-bucket from 'mirror' package to 'xs' !6503
  • don't create data mover in a single-node cluster !6504
  • multi-object transform/copy (job): add missing cleanup !6506
  • multi-object transform & copy !6507
  • core: abort all (jobs) of a given kind; CLI 'ais stop'; strings: Damerau-Levenshtein !6508
  • revamp target initialization !6509
  • copy/transform remote, non-present !6510
  • revamp target initialization !6512, !6513
  • [API change] get latest version (feature) !6516
  • amend Prefetch; flush atime cache when shutting down !6517
  • amend metadata cache flushing logic (atime, prefetch, is-dirty) !6518
  • core: remote reader to support 'latest version' !6519
  • extend config ROM; follow-up !6520
  • Prefetch v2 !6521
  • backend error formatting; notification-listener name !6522
  • [API change] Prefetch v2; multi-object operations !6523
  • Prefetch v2; cold-get stats; put size !6524
  • [config change] versioning vs remote version changed or deleted !6525, !6526
  • add 'remote-deleted' stats counter; Prefetch: test more !6528
  • AWS backend not-found; job status; other cleanup !6529
  • core: refactor 'copy-object' interface, prep to sync remote => in-cluster !6531
  • [Cluster Config change] versioning vs remote version: remote changed, deleted !6532
  • copy/transform (bucket | multi-object); intra-cluster notifications !6533
  • revise/simplify 'is-not-exist' check; ldp.reader to honor sync-remote option !6537
  • pre-parse (log-modules, log-level); micro-optimize !6538
  • amend error handling: not-found vs list iterator; OOS !6539
  • jobs ("xactions"): add and log non-critical errors; join(error) and friends !6540
  • [API change] list-objects to report 'version-changed' (new) !6541
  • list-objects to report 'version-changed' (new) !6543, !6545
  • list-objects to report: 'version-changed', 'deleted' !6546
  • list-objects to support (in-cluster <=> remote) diff !6547, !6548
  • copy/transform with an option to sync remote: prune destination !6549
  • copy/transform --sync: add stress test, extract "pruning" logic !6550
  • revise and refine object write transaction (OWT) !6554, !6555
  • Go API: amend 'wait-idle' helper method !6558
  • copy/transform '--sync': use probabilistic filtering !6559
  • refactor list-range-prefix iterator !6560
  • multi-object copy/transform with '--sync' option !6561
  • S3 API (on the front): fix list-objects !6562, !6563
  • multi-object copy/transform with '--sync' option !6564
  • core: reset idle timer; xaction names (micro-optimizations) !6565
  • core: ETag in response headers !6569
  • S3 API (frontend): validate object names; multipart pathnames !6570
  • copy/transform with '--sync' option: add scripted test !6571, !6573
  • backend: special case to return 404 instead of 403 !6575
  • productize Azure backend !6576, !6578, !6580
  • S3 multipart: write-through all parts !6585
  • multipart upload: write-through all parts !6586
  • multipart upload: add extended error message; add stress test !6587
  • all supported backends: revisit range read (make it consistent across) !6589
  • introduce blob downloader (new) !6592
  • xaction (job) descriptor: remove unused specifiers !6593
  • blob downloader: add dedicated (non-generic) control path !6595
  • blob downloader (new) !6596, !6599, !6603
  • multipart upload: fix s3cmd to run elsewhere !6600, !6601
  • blob downloader (new) !6605, !6606, !6608
  • blob downloader (new); remote AIS cluster !6613
  • silent HEAD(bucket) !6614
  • leverage erasure coding to provide intra-cluster mirroring (new) !6615, !6616
  • blob downloader (new) !6618
  • S3 (frontend): support presigned S3 requests (new) !6621
  • intra-cluster mirroring: add integration test (no limit) !6622
  • blob downloader (new) !6628, !6629, !6631, !6632, !6633, !6639
  • add target's get-cold-blob interface; refactoring !6634
  • AWS backend: nil client !6636
  • Prefetch via blob-downloader: add 'blob-threshold' option !6637, !6638
  • blob-downloader: user abort; expected checksum !6646
  • Azure: ETag as object version; build !6647
  • Azure: transition from preview to stable 1.x (major) !6648
  • AWS backend: use sync.Map instead !6649, !6651
  • (AWS, GCP) backend: log extended error info; RC5 !6653
  • S3: presigned S3 requests; bucket config: add max-page-size !6657

Python

  • v1.4.17 release !6431
  • add support for self-signed certificates with or without verification !6465
  • add 'latest' flag for GET !6536
  • latest flag for prefetch and copy !6542
  • release 1.4.19 !6544
  • stress test for copy w/ '--sync' !6552
  • fix pylint to pass !6556
  • test multi-object copy with '--sync' flag !6567
  • fix black formatter issues in github CI !6582
  • github-CI lint - follow up !6583
  • support range read (offset, length) !6588
  • update common requirements !6609
  • bump SDK version !6610
  • lint: add more !6454

Bench

  • aisloader-composer: install docker alongside latest cri-o on CentOS !6436
  • aisloader-composer: fix install-docker and update OCI inventory !6446, !6449
  • aisloader-composer: update OCI inventory; avoid using reserved variables in playbooks !6452
  • aisloader-composer: update dashboard with k8s only networking visualization !6453
  • aisloader: support latest-version !6581
  • aisloader: add '--cached' flag !6623

Build, CI

  • refactor common 'k8s' package; up cli mod; docs !6434
  • build/minikube: skip making cli !6437
  • gitlab-CI: scheduled pipeline changes !6442
  • upgrade OSS packages !6443
  • lint: enable gocritic "huge-param" !6457
  • lint: add gosec linter !6462
  • gitlab: add etl label & rule !6488
  • github-CI: publish pypi package for aistore !6492
  • build: upgrade all minors !6501
  • rename 'cluster' package !6514
  • 'api' package not to import 'core' !6515
  • tests, tests, and more tests !6530
  • CI: fix HDFS docker image !6566
  • CI: remove HDFS build and tests !6572
  • deployment: add jq to init container for parsing JSON in Bash scripts !6577
  • CI: update tgt cnt for test short !6579
  • gitlab CI: add short test for cloud providers and long test for Azure !6584
  • build: new linter !6624
  • add github issue templates !6630
  • build: release candidate 4 (rc4) !6640
  • build: rc7; fixes !6658

Documentation


3.21

05 Nov 22:20

Highlights

  • cold GET: extract and micro-optimize the flow
  • sync Cloud bucket
    • leverage validate-warm-GET bucket config, and
    • extend it to support non-versioned Cloud buckets, and
    • optionally, delete (remotely deleted) objects
  • bucket sizing and counting:
    • support very large buckets that are not necessarily present in the cluster;
    • unify ais ls --summary and ais storage summary to utilize the same control message and flags
  • list, summarize, and lookup the properties of remote buckets without adding them to cluster's BMD
  • HTTPS:
    • support TLS configuration to authenticate clients
    • switch cluster from HTTP to HTTPS, and vice versa
  • optimize metadata cache
  • optimize capacity management
  • bug fixes, performance improvements

Core

  • set prime-time to amend local generation of globally unique IDs !6325
  • multi-object (archive, copy, transform) jobs: transport endpoint !6326
  • core: (maintenance, decommission, shutdown) transition w/ rebalancing !6327
  • core: (maintenance, decommission, shutdown) transition w/ rebalancing !6328
  • intra-cluster transport: make receive-side stats optional !6329
  • intra-cluster transport: reduce receive side contention !6330
  • fix channel full condition; rebalance-cluster; transport !6331
  • feature flags: add limited-coexistence; transport: track closed endpoints !6334
  • fix prime-time: add caller-is-primary; up cli module !6335
  • switch existing cluster between HTTPS and HTTP !6336
  • Go 1.21: use built-in min and max functions !6337
  • list-objects(remote-bucket-and-only-remote-props); Go 1.21 clear built-in !6339
  • Go 1.20: use typed atomic pointer, remove unsafe !6343
  • core: assorted micro-optimizations; remove read locks !6346
  • tweak multi-error join-err, remove error channel (minor) !6347
  • [API change] capacity management !6348
  • xxhash; field-align vol package !6349
  • bucket: new-query help; silent GET; test tools !6350
  • etl: adding fqn param to spec templates !6351
  • low-level control structs: bucket, namespace !6352
  • etl: Keras template fix !6355
  • etl: fix hello-world ais-etl tests !6356
  • core: don't recompute uname hash !6359
  • repackage HRW methods !6361
  • core: lom cache v2 (major update) !6362
  • refactor: downloader's diff resolver; control plane (receive BMD) !6363
  • core: lom metadata cache (cont-ed) !6365
  • dsort: error handling, assorted cleanups, more scripted tests !6366
  • core transactions: concurrency !6368
  • downloader: throttle; wait !6369
  • optimize cold GET !6370
  • global rebalance: log; minor edits !6373
  • core: update backend 'get-reader' API (all supported backends) !6374
  • core: validate-warm-get to support non-versioned buckets, and more !6375
  • validate-warm-get to support non-versioned buckets !6376
  • [API change] silent HEAD(object) request !6378
  • core: add load-unsafe (the faster way to load local metadata) !6382
  • total disk size: compute at startup, recompute on change !6383
  • [API change] new bucket summary; unify list-objects and summary !6384, !6386, !6387
  • add config.Rom to consolidate assorted "read-mostly" config values; refactor and unify !6388
  • [API change] new bucket summary (major update) !6390
  • mountpath jogger: support bucket query !6392
  • backend providers: do not include (checksum, version) if not asked to !6394
  • python: updated bucket info API !6395
  • feature flags: don't-add-remote & don't-head-remote; log: add s3 module; verbosity !6398
  • support listing remote buckets without adding them to cluster's BMD !6399
  • concurrent HEAD(object) vs evict/create bucket - fix the race !6400
  • [API change] list and summarize remote buckets without adding remote buckets to cluster's BMD !6401
  • datapath query (dpq) !6402
  • Go-based API: response header to error message !6403
  • [API change] new bucket summary !6405, !6406
  • downloader: streamline and cleanup initialization sequence !6409
  • HTTPS: support TLS configuration !6410, !6411, !6412, !6413, !6414, !6415, !6416
  • assorted minor fixes !6417, !6418
  • core: cold GET: fast path & slow path !6419
  • cluster configuration: flip validate-cold-get !6420
  • downloader (major update); [API change]: xaction registry !6422
  • validate-warm-get: add scripted test utilizing remote ais cluster !6423
  • core: cold GET: fast path & slow path !6424, !6427
  • feature flags: add disable-fast-cold-get; show performance latency; up cli module !6425
  • refactor ais/utils !6429

Bench: aisloader and aisloader-composer

  • skip list objects for 100% put load !6332
  • composer: add playbook and script for initial aisloader copy !6333
  • composer: add support for aisloader --filelist option !6345
  • default value for duration should be infinite if num-epochs value is defined !6353
  • composer: add epochs option for GET workloads !6354
  • composer: add cluster name prefix to netdata sources for easier filtering !6357
  • new bucket not to be listed; usability !6358

CLI

  • typed does-not-exist error; misc !6358
  • always print dsort job description !6367
  • show cluster to report total num disks !6371
  • show performance: usability fixes, improvements !6379
  • show performance not to filter regex-selected zero columns !6380
  • attempt to copy/transform an empty remote bucket !6393
  • new bucket summary; evict multiple buckets in one shot; pretty print !6396, !6397
  • ais show bucket with an option to add remote bucket to cluster's BMD (effectively, create bucket) !6404
  • ais search: CLI command search results to include idiomatic extensions !6428

Build, CI, Deployment

  • tests: upon node shutdown: wait for the node to stop (tcp) listening !6338
  • CI: add gather-logs template for K8s tests !6340
  • deploy: ais with HTTPS in minikube !6364
  • build: bump urllib version !6372
  • tests: validate-warm-get (scripted) !6423
  • K8s playbooks: update kill aisloader command !6385
  • docs: validate-warm-get; assorted !6377
  • docs: add performance.md; inline help; rm all-columns flag (redundant) !6381
  • build: upgrade all minors !6389
  • CI: add checkmarx scan !6391
  • build: upgrade golangci-lint, add linters !6407, !6408

3.20

12 Sep 15:14

Core

  • tweak stop-maintenance logic; rebalance: cleanup log messages; assorted minor fixes !6288
  • do not timestamp err-aborted message !6290
  • [API change] dsort: remove extended metrics; add new counters; revise and refactor !6297
  • list-objects; house-keeper; aisloader, logger (assorted fixes) !6298
  • core stats: remove mutex and work channel - speed up !6299
  • slab allocator: remove stats mutex, do not sort !6300
  • consolidate and revise OOM handling !6301
  • ETL: require admin access to create & delete; add feature flag !6302
  • remove unused heartbeat tracker w/ minor ref !6308
  • reimplement keep-alive mechanism (major) !6309
  • keep-alive v2 (major update) !6312
  • keep-alive v2: remove timeout stats (control structure and code) !6317
  • keep-alive v2: add fast path !6320
  • micro-optimize get-all-running (jobs); atomic heard-from/timed-out !6321
  • node-restarted: remove 'lsof', use net dialer; fix node-decommissioning tests !6322

Tools and tests

  • CI: update fspath (aka mountpath) config for minikube-based aistore deployments !6289
  • aisloader: list and read s3 buckets directly !6291
  • aisloader: list, read, and write s3 buckets directly !6292
  • tests: K8s long tests (EchoGolang) fix !6293
  • aisloader: fix cleanup option for s3 bucket benchmarks !6294
  • aisloader: reimplement direct get from s3 - use SDK !6295
  • aisloader: show progress when listing s3 directly !6296
  • CLI: add show details param to etl !6304
  • tools: add check for ais etl deployment !6305
  • tools: add ETL_NAME var for CLI tests !6310
  • aisloader-composer: add playbook and script for clearing Linux Page Cache on all AIS targets !6311
  • aisloader-composer: add playbook for copying aws credentials !6314
  • tools: update check for aistore Kubernetes deployment !6315
  • CI: update github action version (all modules) !6316
  • CLI/ETL: support enumerated arg-type !6287, !6323

Build

  • upgrade all OSS packages (minor versions) !6313
  • transition to Go 1.21 !6318

3.19

29 Aug 17:31

Core

  • [API change] archive and download logs (feature) !6172, !6175
  • [API change] dsort: extend input format !6181
  • [API change] dsort spec; CLI: print job spec !6204
  • [API change] revise request spec (major upd) !6217
  • [API change] dsort: is now 'xaction' as well !6253
  • (downloader, dsort, ETL): disallow to run when out of space !6235
  • handle "DNS lookup fail" as one of the unreachable err types; nlog flush-exit !6164
  • when electing new primary; when joining nodes at startup !6165
  • k8s: Change prod k8s and docker default to not log all to stderr !6166
  • revise GFN !6167
  • stats runner is now responsible to periodically flush logs !6170
  • core: fail user attempt to abort global rebalance when !6184
  • new Go API; assorted fixes !6189
  • metasync BMD; up modules !6190
  • downloader: return not-found when not found !6196
  • start using scripted integration tests; CLI: 'dsort src dst spec' !6198
  • support S3 AWS profiles with alternative creds (feature) !6214
  • core: state transition => rebalance => (point of no return) !6216
  • amend low-level Go API check-response routine; add error type-code !6228, !6229
  • control plane: deserialize original error from call result !6230
  • xactions: when checking inactivity ("is idle") !6242, !6243
  • primary readiness vs cluster shutdown !6244
  • Go API: wait for xaction-related conditions !6245
  • assorted tuneups: space cleanup; housekeeping (HK) callback; log !6246
  • access control: when copying/transforming/dsorting to non-existing 'ais://' destination !6255
  • core: a call to update stats should never block !6257
  • core stats: add fast counters !6258, !6259, !6261
  • sparsify latency stats !6260
  • ETL: refactor and cleanup construction !6267
  • deploy/dev: updated minikube scripts !6272
  • new option to add Cloud bucket to aistore without checking accessibility !6275, !6277
  • un-throttle PUT mirroring; assorted changes !6278
  • feature: local generation of global (job) IDs !6280, !6282

Performance

  • Add distributed loader scripts and playbooks for using aisloader with multiple hosts !6156
  • pyaisloader: usability improvements !6215
  • Update Grafana dashboard to include latency statistics !6249
  • Reorganize benchmarks and related tools !6254
  • aisloader: no need to call rand for 100% or 50% read/write workloads !6256
  • aisloader-composer: add dashboard for DC network and disk !6266
  • aisloader: add an option to randomize gateways !6279
  • aisloader-composer: fix output files for GET bench !6283

Python

  • sdk: update ETL templates (docker migration) !6168
  • sdk: Release version 1.4.1 !6169
  • sdk: ETL templates (compress + ffmpeg decode) !6185
  • sdk: ETL templates (imagepullpolicy as always) !6191
  • sdk: adding keras_transform template !6200
  • sdk: ETL templates fix !6201
  • sdk: ETL templates (ffmpeg decode transformer) !6205
  • sdk: compress ETL template (updated usage) !6211
  • sdk: torchvision sample transformer ETL template !6221
  • sdk: fix comments (minor) !6240
  • sdk: update version !6248
  • sdk: increase timeout for torchvision transformer template (large image) !6252
  • sdk: updated torchvision transform ETL !6262
  • sdk: update dsort job info query and related tests !6265
  • sdk: switch ETL init code 'transform_url' boolean flag to 'arg_type' string !6269
  • docs: update ETL dev deployment for macOS !6163
  • ETL: keras template minor fix !6213
  • ETL: remove incorrect reference !6268
  • ETL: add 'arg-type=FQN' (new) !6271

Datasets (resize, resort, and shuffle)

  • [API change] dsort: extend input format !6181
  • dsort input format: iterate list, iterate range !6186, !6187
  • start using scripted integration tests; CLI: 'dsort src dst spec' !6198
  • add test scripts; memsys: init gmm only once !6192
  • refactoring and renaming !6193
  • move/consolidate error types; continued refactoring !6202
  • Go API change; add dsort/api.go; CLI: print job spec !6203
  • [API change]: dsort spec; CLI: print job spec !6204
  • CLI/dsort: extend inline help, pretty-print job spec; update docs !6206
  • dsort: continued refactoring (major update) !6208, !6209, !6210
  • free sgl on error; feature: any extension !6212
  • [API change] revise request spec (major upd) !6217
  • create destination on the fly !6218
  • record content path to retain full shard name !6219
  • output shard size estimation (rewrite) !6223
  • add is-compressed; refactor dsort-mem !6227
  • compressible shards (major) !6231
  • output ext; rcb buffer; fixes !6232
  • duplicated records (full coverage & stress); fixes !6233
  • fix tests; add stress !6234
  • rename subpackage, fix comments, refactor !6237
  • remove dsort-context, rewrite initialization !6238
  • static/stateless shard readers/writers; refactor and simplify !6239
  • two goroutines per each shard-distributing request !6241
  • [API change]: dsort: is now 'xaction' as well !6253
  • dsort: support generic abort-xaction API !6264
  • no need to block when sending shard records !6286

CLI

  • archive and download logs (feature) !6180
  • clarify "copying" vs "transforming" and "cached" vs "present" !6183
  • start using scripted integration tests; CLI: 'dsort src dst spec' !6198
  • dsort: extend inline help, pretty-print job spec; update docs !6206
  • dsort: Go API change; add dsort/api.go; CLI: print job spec !6203
  • 'archive get' is now a shortcut (an alias) !6222

Build, test, and tools

  • add test scripts; memsys: init gmm only once !6192
  • tests and tools: cleanup around stop-maintenance, wait-rebalance !6194
  • deployment: update local deployment script to allow target-only deployment with defined primary host !6195
  • deployment: optionally, skip deploying primary proxy !6197
  • start using scripted integration tests; CLI: 'dsort src dst spec' !6198
  • tools/generate shards: optimize buffer allocation !6224
  • deploy/dev: Add ansible deployment scripts for deploying locally on multiple nodes !6199
  • aistorage/CI docker image (lzma libraries) !6220
  • tests: init with cleanup and without !6226
  • CI: Retry stuck Python ETL tests in GitLab CI pipeline !6270
  • remove aisfs (FUSE) !6273
  • dev tools: readers; handle read from corrupted arch or non-arch !6250

Documentation

  • update getting started !6161
  • updated python sdk readme !6162
  • update ETL dev deployment for macOS !6163
  • update documentation with recent ETL changes !6173
  • CLI/dsort: extend inline help, pretty-print job spec; update docs !6206