
Add airstack test#344

Merged
andrewjong merged 25 commits into
mainfrom
omar/ci
Apr 23, 2026
Conversation

@andrewjong
Member

What does this pull request do?

Adds a pytest-based automated testing framework that runs system tests. It currently covers Docker image builds, Docker container liveness and connectivity, and a takeoff/hover/land flight.

Which issue number does this address? #16

Video: https://www.youtube.com/watch?v=EzgGHnYDI_k

How did you implement it?

Omar added stuff

Testing

System Testing

AirStack's system tests bring up the full Docker-based stack — simulator, robot containers, and GCS — and verify end-to-end behavior: container health, ROS 2 node presence, sensor publishing rates, and compute resource usage. Tests are written in Python with pytest and live under tests/ at the repo root.

<iframe width="1120" height="630" src="https://www.youtube.com/embed/EzgGHnYDI_k?si=vpqER-TXud5XEMUX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

Test Suite Structure

| Module | Mark | What it tests | Hardware required |
|---|---|---|---|
| test_build_docker.py | build_docker | Docker image builds (robot-desktop, gcs, isaac-sim, ms-airsim); records image sizes | Docker daemon |
| test_build_packages.py | build_packages | colcon build inside each container (robot, GCS, ms-airsim ROS workspace) | Docker daemon |
| test_liveliness.py | liveliness | Full stack up: container health, tmux process liveness, sentinel ROS 2 nodes, sim topic publishing rates, compute usage, sustained stability | Docker daemon, GPU, sim license |
| test_takeoff_hover_land.py | autonomy | End-to-end flight: PX4 readiness gate, takeoff to 10 m, hover stability, land; one chain per (sim, num_robots, iteration, velocity) | Docker daemon, GPU, sim license |

Marks can be selected or combined with pytest's -m expressions, for example
-m "build_docker or build_packages", -m liveliness, or -m autonomy.
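For -m expressions to select tests without warnings, pytest needs the marks declared. The following is a minimal sketch of how that could look in a conftest.py, using pytest's `pytest_configure` hook and `config.addinivalue_line`. The `MARKS` dictionary and its descriptions are illustrative assumptions, not taken from the actual tests/conftest.py.

```python
# Hypothetical conftest.py fragment: declares the four marks listed in the table.
# The description strings are assumptions written for this sketch.
MARKS = {
    "build_docker": "Docker image build tests",
    "build_packages": "colcon build inside each container",
    "liveliness": "full-stack health and stability checks",
    "autonomy": "end-to-end takeoff/hover/land flights",
}


def pytest_configure(config):
    """pytest hook: register custom marks so `-m` expressions can select them."""
    for name, description in MARKS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test class would then carry a decorator such as `@pytest.mark.liveliness` to be picked up by `-m liveliness`.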


Test Infrastructure

All shared fixtures, helpers, and configuration live in tests/conftest.py.

airstack_env fixture

Parametrized over (sim, num_robots, iteration) tuples derived from CLI flags. For each combination it:

  1. Calls airstack up with the appropriate COMPOSE_PROFILES, NUM_ROBOTS, and headless flags
  2. Records airstack_up_duration_s to metrics.json
  3. Yields an env dict used by every TestLiveliness test
  4. Tears down with airstack down and records airstack_down_duration_s
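The parametrization over (sim, num_robots, iteration) can be pictured as a Cartesian product of the comma-separated CLI values. The sketch below is an assumption about how such tuples could be derived; `expand_env_params` and `env_id` are hypothetical helper names, and the ID format is inferred from log names such as `test_stable[msairsim-1-iter0]`.

```python
from itertools import product


def expand_env_params(sims: str, num_robots: str, iterations: int):
    """Expand comma-separated CLI values into (sim, num_robots, iteration) tuples."""
    sim_list = [s.strip() for s in sims.split(",") if s.strip()]
    robot_counts = [int(n) for n in num_robots.split(",") if n.strip()]
    return list(product(sim_list, robot_counts, range(iterations)))


def env_id(params):
    """Build a readable test ID such as 'msairsim-1-iter0' (format is inferred)."""
    sim, robots, iteration = params
    return f"{sim}-{robots}-iter{iteration}"


if __name__ == "__main__":
    # With the documented defaults this yields 2 sims x 2 robot counts x 3 iterations.
    for combo in expand_env_params("msairsim,isaacsim", "1,3", 3):
        print(env_id(combo))
```

A fixture could pass such a list to `@pytest.fixture(params=..., ids=env_id)` so each combination gets its own stack bring-up and teardown.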

MetricsRecorder

Writes custom metrics to tests/results/<timestamp>/metrics.json after each record() call. Keys follow the pattern test_node_id → metric_key → {value, unit, direction}. Time-series data (Hz samples, compute snapshots) are stored as {key}_samples lists and expanded into scalar aggregates (mean, min, max, start_mean, end_mean) by parse_metrics.py.
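To make the nested key layout concrete, here is a minimal sketch of a recorder that rewrites the JSON file on every record() call, plus an aggregate helper for sample lists. `SketchMetricsRecorder`, the `direction` strings, and the `window` used for start_mean and end_mean are all assumptions for illustration; the real MetricsRecorder and parse_metrics.py may differ.

```python
import json
from pathlib import Path
from statistics import mean


class SketchMetricsRecorder:
    """Illustrative recorder: test_node_id -> metric_key -> {value, unit, direction}."""

    def __init__(self, path):
        self.path = Path(path)
        self.data = {}

    def record(self, test_node_id, key, value, unit="", direction="lower_is_better"):
        # The direction strings are assumed; the source only says a direction is stored.
        self.data.setdefault(test_node_id, {})[key] = {
            "value": value,
            "unit": unit,
            "direction": direction,
        }
        # Flush after every call so a crashed run still leaves partial metrics on disk.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data, indent=2))


def aggregate(samples, window=3):
    """Collapse a sample list into scalar aggregates (window size is an assumption)."""
    return {
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
        "start_mean": mean(samples[:window]),
        "end_mean": mean(samples[-window:]),
    }
```

Comparing start_mean with end_mean is one way to spot a publishing rate that degrades over the course of a run.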

Output files

Every test run produces a timestamped directory:

tests/results/
└── 2025-04-21_14-30-00/
    ├── results.xml        # JUnit XML — test durations and pass/fail status
    ├── metrics.json       # Custom metrics (image sizes, Hz, compute, timing)
    └── logs/
        ├── test_build_docker.TestDockerBuilds.test_build_robot_desktop.log
        ├── test_liveliness.TestLiveliness.test_stable[msairsim-1-iter0].log
        └── ...            # One log file per test execution
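Because results.xml is standard JUnit XML, it can be inspected with the Python standard library alone. The sketch below is not parse_metrics.py; `summarize_junit` and the `SAMPLE` document are hypothetical, and they assume the usual pytest layout of `testcase` elements with `classname`, `name`, and `time` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape pytest's --junitxml output usually takes.
SAMPLE = """
<testsuites>
  <testsuite name="pytest" tests="2">
    <testcase classname="test_build_docker.TestDockerBuilds"
              name="test_build_robot_desktop" time="312.4"/>
    <testcase classname="test_liveliness.TestLiveliness"
              name="test_stable[msairsim-1-iter0]" time="61.0">
      <failure message="topic rate dropped">details</failure>
    </testcase>
  </testsuite>
</testsuites>
"""


def summarize_junit(xml_text):
    """Return one dict per test case with its ID, duration, and pass/fail status."""
    rows = []
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "failed"
        elif case.find("skipped") is not None:
            status = "skipped"
        else:
            status = "passed"
        rows.append({
            "id": f"{case.get('classname')}.{case.get('name')}",
            "duration_s": float(case.get("time", "0")),
            "status": status,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_junit(SAMPLE):
        print(row)
```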

Running Tests

airstack test (primary interface)

airstack test is the standard way to run tests. It builds the containerized
test runner from tests/docker/, mounts the repo read-only, and forwards all
arguments directly to pytest. No local Python environment needed.

# From the repo root (AirStack must be set up: airstack setup):

# Build tests only — fast, no GPU needed
airstack test -m "build_docker or build_packages" -v

# Liveliness run — ms-airsim, 1 robot, 1 iteration, 60 s stability window
airstack test -m liveliness \
  --sim msairsim \
  --num-robots 1 \
  --stress-iterations 1 \
  --stable-duration 60 \
  -v

# Autonomy run — takeoff/hover/land at three velocities
airstack test -m autonomy \
  --sim msairsim \
  --num-robots 1 \
  --stress-iterations 1 \
  --takeoff-velocities 0.5,1,2 \
  -v

# Show GUI windows (for local visual inspection)
airstack test -m liveliness --gui -v

airstack test calls xhost + automatically so GUI-mode sim containers
can reach the host X server; it is a no-op when DISPLAY is not set.

Prerequisites

  • Docker daemon running with your user in the docker group
  • NVIDIA drivers + nvidia-container-toolkit for liveliness/autonomy tests
  • airstack setup completed (adds airstack to PATH)

Direct pytest (for development / debugging)

Run pytest directly when you need faster iteration (no container rebuild) or
want to attach a debugger. Requires a local Python environment.

export AIRSTACK_ROOT=$(pwd)
pip install -r tests/requirements.txt

# Build tests only
pytest tests/ -m "build_docker or build_packages" -v

# Liveliness run
pytest tests/ -m liveliness \
  --sim msairsim \
  --num-robots 1 \
  --stress-iterations 1 \
  --stable-duration 60 \
  -v

CLI option reference

| Option | Default | Description |
|---|---|---|
| --sim | msairsim,isaacsim | Comma-separated sim targets |
| --num-robots | 1,3 | Comma-separated robot counts |
| --stress-iterations | 3 | Up/down cycles per (sim, num_robots) config |
| --stable-duration | 120 | Seconds test_stable polls for |
| --stable-interval | 10 | Seconds between polls in test_stable |
| --gui | off | Show simulator GUI (disables headless mode) |
| --takeoff-velocities | 0.5,1,2 | Takeoff/land speeds in m/s |
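Custom pytest options like these are normally registered through the `pytest_addoption` hook. The following sketch shows one plausible registration using the defaults from the table; the `OPTIONS` list and the choice of `store_true` for --gui are assumptions, and the actual tests/conftest.py may be organized differently.

```python
# Hypothetical conftest.py fragment mirroring the documented options and defaults.
OPTIONS = [
    ("--sim", "msairsim,isaacsim", "Comma-separated sim targets"),
    ("--num-robots", "1,3", "Comma-separated robot counts"),
    ("--stress-iterations", 3, "Up/down cycles per (sim, num_robots) config"),
    ("--stable-duration", 120, "Seconds test_stable polls for"),
    ("--stable-interval", 10, "Seconds between polls in test_stable"),
    ("--takeoff-velocities", "0.5,1,2", "Takeoff/land speeds in m/s"),
]


def pytest_addoption(parser):
    """pytest hook: register the CLI options with their documented defaults."""
    for flag, default, help_text in OPTIONS:
        parser.addoption(
            flag,
            action="store",
            default=default,
            type=type(default),  # str for comma-separated lists, int for counts
            help=help_text,
        )
    # --gui is modeled as a boolean switch that is off unless passed (assumption).
    parser.addoption("--gui", action="store_true", default=False,
                     help="Show simulator GUI (disables headless mode)")
```

Inside a fixture or test, a value would then be read with `request.config.getoption("--sim")`.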

Autonomy Tests (test_takeoff_hover_land.py)

TestTakeoffHoverLand runs a 4-phase flight chain for every combination of
(sim, num_robots, iteration, velocity). The drone returns to the ground after
each velocity so the next velocity starts from a clean state.

Phase order

| Phase | Test | What happens |
|---|---|---|
| 1 | test_px4_ready | Waits for MAVROS + PX4 EKF ready; once per env |
| 2 | test_takeoff | Sends TakeoffTask; asserts altitude within 10 % |
| 3 | test_hover | Captures odom for 10 s; asserts altitude drift < 0.5 m |
| 4 | test_landing | Sends LandTask; asserts final altitude < 0.5 m |

If any phase other than test_hover fails, the remaining phases for that env
are skipped (the chain guard prevents a stuck-in-air drone from blocking later
velocity sweeps). A hover failure does not skip landing, so the drone always
returns to the ground.
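The skip rule can be expressed as a small predicate. This is a sketch of the logic for a single four-phase chain only, with hypothetical names (`PHASES`, `NON_BLOCKING`, `should_skip`); how the real chain guard tracks failures across velocities is not shown in this description.

```python
# Phase order as documented; test_hover is the only phase whose failure
# does not block the phases after it.
PHASES = ["test_px4_ready", "test_takeoff", "test_hover", "test_landing"]
NON_BLOCKING = {"test_hover"}


def should_skip(phase, failed):
    """Return True if an earlier blocking phase has failed.

    `failed` is the set of phase names that have already failed in this chain.
    """
    earlier = PHASES[:PHASES.index(phase)]
    return any(p in failed and p not in NON_BLOCKING for p in earlier)


if __name__ == "__main__":
    # A hover failure still lets landing run, so the drone comes back down.
    print(should_skip("test_landing", {"test_hover"}))
    # A takeoff failure skips both hover and landing.
    print(should_skip("test_hover", {"test_takeoff"}))
```

In a pytest suite, a guard like this would typically be called at the top of each phase and followed by `pytest.skip(...)` when it returns True.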

Recorded metrics

| Metric key | Unit | Description |
|---|---|---|
| ready_duration_sys_s | s | Wall-clock time from test start until PX4 ready |
| takeoff_duration_sim_s | s | Sim-time from first motion to 95 % of target |
| land_duration_sim_s | s | Sim time from 80 % peak descent to < 0.5 m |
| velocity_rmse_m_sim_s | m/s | RMSE of dz/dt vs commanded velocity during climb/descent |
| altitude_error_m | m | Signed steady-state error at takeoff success (+ = high) |
| overshoot_m | m | Unsigned transient overshoot above target |
| hover_altitude_mean_error_m | m | Mean altitude drift during hover |
| hover_position_stddev_m | m | 3-D position jitter (sqrt of summed axis variances) |
| final_altitude_m | m | Altitude at landing action completion |
| odometry_error_mean_m | m | Mean 3-D position error vs ground-truth odom |
| odometry_error_max_m | m | Peak 3-D error vs ground-truth odom |
| odometry_altitude_bias_m | m | Signed z-axis bias vs ground-truth odom |

Metrics are recorded per robot as robot_N.<key> and written to
tests/results/<timestamp>/metrics.json.
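Two of these metrics have definitions compact enough to sketch directly: the 3-D position jitter (square root of the summed axis variances) and the RMSE of dz/dt against the commanded velocity. The functions below are illustrative assumptions, not the test suite's code. In particular, the use of population variance and of finite differences between consecutive samples is a choice made for this sketch.

```python
import math
from statistics import pvariance


def hover_position_stddev(positions):
    """3-D jitter: sqrt(var_x + var_y + var_z) over (x, y, z) samples.

    Population variance is assumed; the real suite may use sample variance.
    """
    xs, ys, zs = zip(*positions)
    return math.sqrt(pvariance(xs) + pvariance(ys) + pvariance(zs))


def velocity_rmse(times, altitudes, commanded):
    """RMSE of finite-difference dz/dt against a commanded vertical speed."""
    errors = []
    for t0, t1, z0, z1 in zip(times, times[1:], altitudes, altitudes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        errors.append((z1 - z0) / dt - commanded)
    if not errors:
        raise ValueError("need at least two samples with increasing timestamps")
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For a descent, the commanded value would be passed as a negative number so that the sign convention matches dz/dt.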

Running autonomy tests

# Sweep velocities 0.5, 1, 2 m/s; 1 robot; ms-airsim
airstack test -m autonomy \
  --sim msairsim \
  --num-robots 1 \
  --stress-iterations 1 \
  --takeoff-velocities 0.5,1,2 \
  -v

# Single velocity, Isaac Sim, 3 robots
airstack test -m autonomy \
  --sim isaacsim \
  --num-robots 3 \
  --stress-iterations 1 \
  --takeoff-velocities 1 \
  -v

Metrics Reporting (parse_metrics.py)

tests/parse_metrics.py reads results.xml and metrics.json from a run directory and produces a markdown report. It has two modes:

Single-run report

python tests/parse_metrics.py \
  --current tests/results/2025-04-21_14-30-00/

Prints a markdown table of all recorded metrics. Always exits 0.

Diff / regression check

# --threshold (optional): regression if change% exceeds this value (default 20)
# --output (optional): also write the report to a file
python tests/parse_metrics.py \
  --current  tests/results/2025-04-21_14-30-00/ \
  --baseline tests/results/2025-04-20_09-00-00/ \
  --threshold 20 \
  --output   report.md

Prints a side-by-side comparison. Exits 1 if any metric regresses beyond the threshold; exits 0 otherwise.
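The regression rule can be illustrated with a direction-aware percent change. This is a sketch under stated assumptions rather than the logic in parse_metrics.py: the `direction` values ("lower_is_better", "higher_is_better"), the symmetric threshold for improvements, and the function names are all hypothetical.

```python
def percent_change(current, baseline):
    """Signed change of current relative to baseline, in percent."""
    if baseline == 0:
        return None  # relative change is undefined against a zero baseline
    return (current - baseline) / abs(baseline) * 100.0


def classify(current, baseline, direction, threshold=20.0):
    """Label a metric as 'regression', 'improvement', 'ok', or 'n/a'."""
    change = percent_change(current, baseline)
    if change is None:
        return "n/a"
    # Normalize so that a positive value always means "got worse".
    worse = change if direction == "lower_is_better" else -change
    if worse > threshold:
        return "regression"
    if worse < -threshold:
        return "improvement"
    return "ok"


def exit_code(labels):
    """Mirror the documented behavior: 1 if anything regressed, else 0."""
    return 1 if "regression" in labels else 0
```

For example, a startup time that grows from 100 s to 130 s is a 30 % change in the wrong direction and would be labeled a regression at the default threshold of 20.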

The report has three sections per test module:

  • Metrics — flat table of scalar metrics (test name, metric key, value/baseline, change%)
  • Sim publishing rates — pivot table of topic Hz aggregates (mean, start_mean, end_mean, min, max)
  • Compute usage — pivot table of CPU/memory/GPU metrics per container

Regressions are flagged with 🔴, improvements with 🟢.


Did you update the docs (and where)?

Yes, in tests/README.md, which was also added to mkdocs.yml.

OasisArtisan and others added 25 commits April 16, 2026 17:52
…verge to correct location at startup. More elegant solution is needed.

Also added tmux plugin to airsim tmux windows
Compare metrics still unchecked, and modelling cross-test dependencies is not done yet.
Still need to work on making metrics and logging messages more informative.

Also have not tested calculating state estimation errors, as we will need ground truth from airsim and isaacsim
…endered from results.xml using parse_metrics.py
@andrewjong andrewjong merged commit 1104628 into main Apr 23, 2026
1 of 3 checks passed
@andrewjong andrewjong deleted the omar/ci branch April 23, 2026 21:56