JustinLinKK/astra-sim

 
 

ASTRA-sim

ASTRA-sim is a distributed AI system simulator. It models the end-to-end software and hardware stack of modern AI systems, including workload scheduling, collective communication algorithms, and hardware architectures (compute, memory, and network). Through a suite of APIs, external open or proprietary components can be plugged in to model different parts of the AI system. This provides end-to-end, multi-fidelity simulation to aid the design and deployment of next-generation distributed AI systems.

Analytical Modes

This repository includes a unified analytical layer for three inference studies:

  1. tp_pp_crossover: Sweeps TP versus PP tradeoffs for 70B-class dense models.
  2. serving_disagg_colocated: Runs the existing disaggregated-versus-colocated serving simulator through the analytical frontend without changing TTFT, TPOT, E2E, queueing, or goodput behavior.
  3. attention_ffn_disaggregation: Compares homogeneous GPU placement against heterogeneous GPU+LPU placement for trillion-parameter MoE decode.

The shared implementation lives under astra-sim/analytical and keeps model specs, hardware specs, interconnects, cost models, and result writers in one framework-native module.

Quick Start

Build the analytical binaries with:

./build/astra_analytical/build.sh

The main binaries are:

  • build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Aware
  • build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware

Run the shipped analytical configs with:

./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Aware \
  --analytical-config=$(realpath configs/tp_pp_70b.yaml)

./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Aware \
  --analytical-config=$(realpath configs/serving_disagg_colocated.yaml)

./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Aware \
  --analytical-config=$(realpath configs/attention_ffn_gpu_lpu_moe.yaml)

Important behavior:

  • --analytical-config is the public entrypoint for these analytical modes.
  • tp_pp_crossover and attention_ffn_disaggregation are closed-form analytical runs and do not instantiate the event-driven serving runtime.
  • serving_disagg_colocated still uses the existing serving simulator underneath, so serving behavior stays on the established path.
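The three invocations above differ only in the config path, so they can be wrapped in a small loop. The following is a minimal sketch, not part of the repository: the binary path, the `--analytical-config` flag, and the config names are taken from the commands above, while the helper names `analytical_cmd` and `run_all_analytical` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the analytical binary once per shipped config.
# analytical_cmd, run_all_analytical and DRY_RUN are hypothetical names.
set -euo pipefail

BIN="./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Aware"

# Print the command line for one config without executing it.
# Note: the printed path is the relative one; the real run below uses realpath.
analytical_cmd() {
  printf '%s --analytical-config=%s\n' "$BIN" "$1"
}

# Iterate over the three configs named in the Quick Start section.
run_all_analytical() {
  local cfg
  for cfg in \
    configs/tp_pp_70b.yaml \
    configs/serving_disagg_colocated.yaml \
    configs/attention_ffn_gpu_lpu_moe.yaml; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      analytical_cmd "$cfg"
    else
      "$BIN" --analytical-config="$(realpath "$cfg")"
    fi
  done
}
```

Calling `DRY_RUN=1 run_all_analytical` from the repository root prints the three command lines without requiring a built binary; calling `run_all_analytical` on its own executes them in sequence and stops at the first failure because of `set -e`.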

Analytical Configs

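The checked-in examples live in configs:

  • configs/tp_pp_70b.yaml
  • configs/serving_disagg_colocated.yaml
  • configs/attention_ffn_gpu_lpu_moe.yaml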

These YAMLs are fully explicit. Output paths are resolved relative to the YAML file location, which makes it easy to copy a config and redirect outputs without rewriting absolute paths.
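To make the path rule concrete, here is a small illustration of what "resolved relative to the YAML file location" means. It is a sketch of the rule as described, not the simulator's own code: the function name `resolve_output_path` and the example output name `results/out.csv` are hypothetical, and leaving absolute paths unchanged is an assumption.

```shell
# Illustration only: join an output path onto the directory holding the YAML.
resolve_output_path() {
  local yaml_path="$1" out_path="$2"
  case "$out_path" in
    # Assumption: an absolute output path is used unchanged.
    /*) printf '%s\n' "$out_path" ;;
    # Relative output paths are anchored at the YAML file's directory.
    *)  printf '%s/%s\n' "$(dirname "$yaml_path")" "$out_path" ;;
  esac
}
```

Under this reading, copying configs/tp_pp_70b.yaml into another directory moves its relative outputs along with it, so `resolve_output_path /tmp/exp/tp_pp_70b.yaml results/out.csv` yields a path under /tmp/exp rather than under the repository.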

Outputs

tp_pp_crossover writes a row-oriented CSV plus a summary JSON containing pure-TP versus pure-PP crossover records.

serving_disagg_colocated keeps the existing request metrics CSV, summary JSON, and metadata JSON format from the serving simulator.

attention_ffn_disaggregation writes a row-oriented CSV plus a summary JSON comparing:

  • homogeneous_gpu
  • heterogeneous_gpu_lpu

Tests

Run the full regression suite with:

./tests/run_all.sh

Run the analytical regression directly with:

./tests/rt_analytical/run.sh

Run the analytical logic tests directly with:

./build/astra_analytical/build/bin/AstraSim_Analytical_Logic_Tests

Documentation

For a focused usage guide covering config structure, commands, outputs, and extension points, see docs/project/analytical-guide.md.

Overview and Documentation

Here is a concise visual summary of ASTRA-sim, showing its layers and APIs: [Figure: ASTRA-sim layers and APIs]

For a comprehensive understanding of the tool, and to gain insights into its capabilities, please visit our website.

For information on how to use ASTRA-sim, please visit our Wiki.

ASTRA-sim accepts MLCommons Chakra Execution Traces as workload-layer inputs. For details, please visit the Chakra GitHub repository.

Releases and Contributions

ASTRA-sim is currently at version 2.0. The previous version, ASTRA-sim 1.0, is available in the ASTRA-sim-1.0 branch.

We encourage community contributions to ASTRA-sim via PRs.

Contact Us

For any questions about using ASTRA-sim, you can email the ASTRA-sim User Mailing List: astrasim-users@googlegroups.com

To join the mailing list, please fill out the following form: https://forms.gle/18KVS99SG3k9CGXm6

We appreciate your interest in and support of ASTRA-sim!

About

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
