Aim is built around several concepts allowing to make sure that it meets the following criteria:
- Run data isolation. Each training run process isolated in terms of data and do not require additional services to run.
- Scalability. Aim web app is able to handle 1000s of training runs. Starting from v3.4.0 Aim provides a Remote Tracking server allowing to run multiple parallel experiments in a distributed multi-host environment.
- Flexibility. Aim UI and query language allow users to select, group and filter the tracked data any way they want.
In order to understand how Aim works, lets take a quick look on a different components it has.
- Aim Storage. At its core Aim uses a custom-built storage, based on rocksdb. More details in 'Where is data collected?'. Data tracked by
different training runs collected and indexed in an aim repository (
.aim
). The storage itself is generic; it allows accessing the data as collection of dictionaries and arrays. - Aim SDK. On top of the storage Aim SDK provides functionality to track/select/query data. Additionally, SDK is a layer used by Web APIs and CLI.
- Aim UI. Web app allowing to browse run metadata, metrics, images and other tracked data.
- Aim CLI. A collection of command line utilities for running Aim web server, managing aim repositories, runs, etc.
- Remote Tracking server. A gRPC-based service accepting incoming traffic and storing data on a centralized server.
The next sections will describe various concepts Aim introduces and provide more detailed look on individual components introduced above.