Add airstack test #344
Merged
…iables not just .env
…verge to correct location at startup. A more elegant solution is needed. Also added tmux plugin to airsim tmux windows
… markdown marking regressions
Comparing metrics is still unchecked, and modelling cross-test dependencies is not done yet
Still need to make the metrics and the logging messages more informative. Also have not tested calculating state estimation errors, as we will need ground truth from airsim and isaacsim
…endered from results.xml using parse_metrics.py
…or better reliability
What does this pull request do?
Adds an automated testing framework, built on pytest, that runs the system tests. It currently tests building images, Docker container liveness and connectivity, and a takeoff, hover, and landing flight.
Which issue number does this address? #16
Video: https://www.youtube.com/watch?v=EzgGHnYDI_k
How did you implement it?
Omar added stuff
Testing
System Testing
AirStack's system tests bring up the full Docker-based stack (simulator, robot containers, and GCS) and verify end-to-end behavior: container health, ROS 2 node presence, sensor publishing rates, and compute resource usage. Tests are written in Python with pytest and live under `tests/` at the repo root.
<iframe width="1120" height="630" src="https://www.youtube.com/embed/EzgGHnYDI_k?si=vpqER-TXud5XEMUX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
Test Suite Structure
- `test_build_docker.py` (mark: `build_docker`)
- `test_build_packages.py` (mark: `build_packages`): `colcon build` inside each container (robot, GCS, ms-airsim ROS workspace)
- `test_liveliness.py` (mark: `liveliness`)
- `test_takeoff_hover_land.py` (mark: `autonomy`)

Marks can be combined with pytest logic: `-m "build_docker or build_packages"`, `-m liveliness`, `-m autonomy`.
Test Infrastructure
All shared fixtures, helpers, and configuration live in `tests/conftest.py`.
`airstack_env` fixture
Parametrized over `(sim, num_robots, iteration)` tuples derived from CLI flags. For each combination it:
- Runs `airstack up` with the appropriate `COMPOSE_PROFILES`, `NUM_ROBOTS`, and headless flags
- Records `airstack_up_duration_s` to `metrics.json`
- Provides the `env` dict used by every `TestLiveliness` test
- Runs `airstack down` and records `airstack_down_duration_s`

`MetricsRecorder`
Writes custom metrics to `tests/results/<timestamp>/metrics.json` after each `record()` call. Keys follow the pattern `test_node_id → metric_key → {value, unit, direction}`. Time-series data (Hz samples, compute snapshots) is stored as `{key}_samples` lists and expanded into scalar aggregates (mean, min, max, start_mean, end_mean) by `parse_metrics.py`.
Output files
Every test run produces a timestamped directory, `tests/results/<timestamp>/`, which holds `results.xml` and `metrics.json`.
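As a rough illustration of the `metrics.json` layout described under `MetricsRecorder`, the sketch below expands `{key}_samples` lists into the scalar aggregates named there. It is not the code in `parse_metrics.py`. The following are assumptions made only for this example: the sample list sits under the entry's `value` field, `direction` is spelled `"lower"` or `"higher"`, `start_mean` and `end_mean` average the first and last quarter of the samples, and the node id and metric names in `example` are hypothetical.

```python
import json
from statistics import mean


def expand_samples(samples, edge_fraction=0.25):
    """Collapse a list of numeric samples into scalar aggregates.

    Assumption: start_mean / end_mean average the first / last
    `edge_fraction` of the samples (at least one sample each).
    """
    if not samples:
        return {}
    edge = max(1, int(len(samples) * edge_fraction))
    return {
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
        "start_mean": mean(samples[:edge]),
        "end_mean": mean(samples[-edge:]),
    }


def expand_metrics(metrics):
    """Walk test_node_id -> metric_key -> entry and flatten to plain values.

    Keys ending in "_samples" are replaced by "<base>_<aggregate>" scalars;
    every other key keeps its scalar value.
    """
    expanded = {}
    for node_id, entries in metrics.items():
        out = {}
        for key, entry in entries.items():
            if key.endswith("_samples"):
                base = key[: -len("_samples")]
                for name, value in expand_samples(entry["value"]).items():
                    out[f"{base}_{name}"] = value
            else:
                out[key] = entry["value"]
        expanded[node_id] = out
    return expanded


# Hypothetical content, shaped like the documented key pattern.
example = {
    "tests/test_liveliness.py::test_sensor_rate": {
        "airstack_up_duration_s": {"value": 42.0, "unit": "s", "direction": "lower"},
        "imu_hz_samples": {
            "value": [98.0, 100.0, 102.0, 100.0],
            "unit": "Hz",
            "direction": "higher",
        },
    }
}

if __name__ == "__main__":
    print(json.dumps(expand_metrics(example), indent=2))
```

Keeping the raw samples in `metrics.json` and deriving the aggregates at report time means the aggregation rule can change without rerunning the tests.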
Running Tests
`airstack test` (primary interface)
`airstack test` is the standard way to run tests. It builds the containerized test runner from `tests/docker/`, mounts the repo read-only, and forwards all arguments directly to pytest. No local Python environment is needed.
`airstack test` calls `xhost +` automatically so GUI-mode sim containers can reach the host X server; it is a no-op when `DISPLAY` is not set.
Prerequisites
- Membership in the `docker` group
- `nvidia-container-toolkit` for liveliness/autonomy tests
- `airstack setup` completed (adds `airstack` to `PATH`)

Direct pytest (for development / debugging)
Run pytest directly when you need faster iteration (no container rebuild) or
want to attach a debugger. Requires a local Python environment.
CLI option reference
- `--sim`: `msairsim,isaacsim`
- `--num-robots`: `1,3`
- `--stress-iterations`: `3`
- `--stable-duration`: `120` (duration `test_stable` polls for)
- `--stable-interval`: `10` (used by `test_stable`)
- `--gui`
- `--takeoff-velocities`: `0.5,1,2`

Autonomy Tests (`test_takeoff_hover_land.py`)
`TestTakeoffHoverLand` runs a 4-phase flight chain for every combination of `(sim, num_robots, iteration, velocity)`. The drone returns to the ground after each velocity so the next velocity starts from a clean state.
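To give a sense of how quickly the combinations grow, here is a minimal sketch that builds the `(sim, num_robots, iteration, velocity)` grid from comma-separated option values. The helper names `parse_csv` and `flight_combinations` are hypothetical and the real parametrization in `tests/conftest.py` may differ; zero-based iteration indices are also an assumption. The input values are the ones listed in the CLI option reference.

```python
from itertools import product


def parse_csv(value, cast=str):
    """Split a comma-separated option value and cast each item."""
    return [cast(item.strip()) for item in value.split(",") if item.strip()]


def flight_combinations(sims, num_robots, iterations, velocities):
    """Cartesian product of simulators, robot counts, iterations, velocities.

    Assumption: iterations are numbered from 0 to iterations - 1.
    """
    return list(
        product(
            parse_csv(sims),
            parse_csv(num_robots, int),
            range(iterations),
            parse_csv(velocities, float),
        )
    )


combos = flight_combinations("msairsim,isaacsim", "1,3", 3, "0.5,1,2")

if __name__ == "__main__":
    # 2 sims x 2 robot counts x 3 iterations x 3 velocities
    print(len(combos), "flight chains;", len(combos) * 4, "phase tests")
```

With those values the grid holds 36 flight chains, so narrowing `--sim` or `--takeoff-velocities` is the quickest way to shorten a local run.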
Phase order
1. `test_px4_ready`
2. `test_takeoff`
3. `test_hover`
4. `test_landing`

If any phase other than `test_hover` fails, the remaining phases for that env are skipped (the chain guard prevents a stuck-in-air drone from blocking later velocity sweeps). A hover failure does not skip landing, so the drone always returns to the ground.
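The skip rule can be sketched as a small pure function. This only illustrates the behavior described above for a single chain; it is not the guard implemented in the test suite, and the names `NON_BLOCKING` and `run_chain` are hypothetical.

```python
PHASES = ["test_px4_ready", "test_takeoff", "test_hover", "test_landing"]

# Phases whose failure does not break the chain (hover, per the text above).
NON_BLOCKING = {"test_hover"}


def run_chain(phase_results):
    """Return the outcome of each phase given whether it would pass.

    `phase_results` maps phase name -> bool (True means the phase passes
    if it is executed). Once a blocking phase fails, every later phase
    is reported as "skipped".
    """
    outcomes = {}
    chain_broken = False
    for phase in PHASES:
        if chain_broken:
            outcomes[phase] = "skipped"
            continue
        passed = phase_results[phase]
        outcomes[phase] = "passed" if passed else "failed"
        if not passed and phase not in NON_BLOCKING:
            chain_broken = True
    return outcomes


if __name__ == "__main__":
    all_ok = dict.fromkeys(PHASES, True)
    print(run_chain({**all_ok, "test_takeoff": False}))
    print(run_chain({**all_ok, "test_hover": False}))
```

In the first call the failed takeoff causes hover and landing to be skipped; in the second the failed hover is recorded but landing still runs.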
Recorded metrics
- `ready_duration_sys_s`
- `takeoff_duration_sim_s`
- `land_duration_sim_s`
- `velocity_rmse_m_sim_s`
- `altitude_error_m`
- `overshoot_m`
- `hover_altitude_mean_error_m`
- `hover_position_stddev_m`
- `final_altitude_m`
- `odometry_error_mean_m`
- `odometry_error_max_m`
- `odometry_altitude_bias_m`

Metrics are recorded per robot as `robot_N.<key>` and written to `tests/results/<timestamp>/metrics.json`.
Running autonomy tests
Metrics Reporting (`parse_metrics.py`)
`tests/parse_metrics.py` reads `results.xml` and `metrics.json` from a run directory and produces a markdown report. It has two modes:
Single-run report
Prints a markdown table of all recorded metrics. Always exits 0.
Diff / regression check
Prints a side-by-side comparison. Exits 1 if any metric regresses beyond the threshold; exits 0 otherwise.
The report has three sections per test module:
Regressions are flagged with 🔴, improvements with 🟢.
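The regression check can be pictured with the sketch below. It is an illustration under stated assumptions, not the logic in `parse_metrics.py`: the 10% relative threshold, the `"lower"` / `"higher"` spellings of `direction`, the use of the same threshold for improvements, and the function names are all hypothetical.

```python
import math


def relative_change(baseline, current):
    """Signed relative change from baseline to current."""
    if baseline == 0:
        if current == 0:
            return 0.0
        return math.copysign(math.inf, current)
    return (current - baseline) / abs(baseline)


def classify(baseline, current, direction, threshold=0.10):
    """Return a flag for one metric: regression, improvement, or none.

    Assumption: direction "lower" means smaller values are better,
    anything else means larger values are better.
    """
    change = relative_change(baseline, current)
    worse_by = change if direction == "lower" else -change
    if worse_by > threshold:
        return "🔴"
    if worse_by < -threshold:
        return "🟢"
    return ""


def diff_exit_code(baseline_metrics, current_metrics, threshold=0.10):
    """Exit code for a diff run: 1 if any shared metric regresses, else 0."""
    for key, base in baseline_metrics.items():
        cur = current_metrics.get(key)
        if cur is None:
            continue  # metric missing from the new run: not compared here
        flag = classify(base["value"], cur["value"], base["direction"], threshold)
        if flag == "🔴":
            return 1
    return 0


if __name__ == "__main__":
    old = {"robot_1.takeoff_duration_sim_s": {"value": 10.0, "direction": "lower"}}
    new = {"robot_1.takeoff_duration_sim_s": {"value": 12.0, "direction": "lower"}}
    print(diff_exit_code(old, new))
```

Storing `direction` next to each value lets one comparison routine handle both durations, where lower is better, and rates, where higher is better.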
Did you update the docs (and where)?
yes under the tests/README.md and added to mkdocs.yml