A universal conformance test suite for DICOM-RT-to-NIfTI converters.
This tool generates a deterministic synthetic CT volume paired with an RTSTRUCT containing seven analytically-defined ROIs (sphere, cube, cylinder, ellipsoid, torus, hollow sphere, hollow cylinder), and ships analytic ground-truth NIfTI masks for each. Any tool that converts RTSTRUCT contours into per-ROI NIfTI masks can be checked against those references — language-agnostic, in three steps:
1. `rtmask-conformance generate <fixture_dir>` writes the fixture (CT + RTSTRUCT + GT).
2. Run your converter against the fixture; have it emit one `<roi>.nii.gz` per ROI into a predictions directory.
3. `rtmask-conformance verify --predictions <pred_dir> --groundtruth <fixture_dir>/groundtruth` scores each ROI and exits `0` (pass) or `1` (fail).
Because ground truth is computed analytically rather than by a competing converter, the measurement is independent of the tool under test. That independence only matters if the metrics beneath the gate are themselves trustworthy — so before showing how to wire one into a build, here is what those metrics do under controlled perturbation.
A conformance gate is only as trustworthy as the metrics underneath it. The internal validation suite proves the metrics respond predictably to known geometric perturbation — not only on the seven shipped ROIs, but on synthetic cases whose answers can be recovered with arithmetic.
One such case pairs two identical 10³ binary cubes, one translated along x
by 0–10 voxels at 1 mm spacing. Dice falls linearly with the shift and
reaches zero once the shift equals the cube's edge length; HD95 tracks the
shift exactly, because the worst displaced contour voxel sits at precisely
the shift distance from its nearest match. The dotted ideal reference,
HD95 = shift × spacing, sits flush with the measurement at every step.
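The Dice half of this sweep is easy to reproduce by hand. The sketch below is a minimal NumPy illustration, not the suite's own test: `dice` and `cube_mask` are hypothetical helpers written for this example, and the volume shape is arbitrary. For an edge-10 cube shifted by `s` voxels, the overlap is `(10 - s) * 100` voxels, so Dice should equal `1 - s / 10`.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Volumetric Dice of two binary masks (0.0 if both are empty, a choice made for this sketch)."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 0.0
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def cube_mask(x_start: int, edge: int = 10, shape=(14, 14, 32)) -> np.ndarray:
    """Binary (Z, Y, X) volume holding one edge^3 cube whose x extent starts at x_start."""
    mask = np.zeros(shape, dtype=bool)
    mask[2:2 + edge, 2:2 + edge, x_start:x_start + edge] = True
    return mask


EDGE = 10
reference = cube_mask(x_start=5, edge=EDGE)

sweep = {}
for shift in range(EDGE + 1):
    moved = cube_mask(x_start=5 + shift, edge=EDGE)
    sweep[shift] = dice(reference, moved)
    print(f"shift={shift:2d} voxels  dice={sweep[shift]:.4f}  expected={1 - shift / EDGE:.4f}")
```

The measured and expected columns agree at every step, which is the linear fall-off described above.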
The end-to-end suite in
test_offset_overlap_e2e.py
applies the same translation to the real fixture. It rasterises the ideal
prediction for every conformance ROI, writes perturbed copies via
np.roll (with the wraparound zeroed so the result is a true translation),
and runs the full verifier against the analytic ground truth:
| Perturbation | Pinned behavior across all 7 ROIs |
|---|---|
| None (ideal predictions) | Dice ≥ 0.999, HD95 ≤ 1 mm |
| 1-voxel x-shift | HD95 ∈ [0.5, 3.0] mm; Dice < 1 |
| 3-voxel x-shift | every ROI's status = FAIL |
| Ideal → 1-voxel → 3-voxel | Dice and HD95 strictly monotone in shift |
The monotonicity row is the strongest cross-metric assertion in the suite.
A regression that swaps the operands of binary_dsc or breaks the signed
distance map will either invert the ordering across the three shifts or
collapse them — a class of bug that per-case threshold tests do not always
catch, because flipping a numerator and denominator can still leave a
single measurement on the right side of a pass/fail line. The same fixture
also runs a 1-voxel binary_erosion perturbation to verify volume error
is signed correctly: every ROI under-reports volume, with the relative
loss tracking surface-area-to-volume ratio.
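The "np.roll with the wraparound zeroed" step can be sketched as follows. This is an illustrative helper, not the code in test_offset_overlap_e2e.py; the name `translate_x` and the (Z, Y, X) axis convention are assumptions made for the example.

```python
import numpy as np


def translate_x(mask_zyx: np.ndarray, shift: int) -> np.ndarray:
    """Shift a (Z, Y, X) mask by `shift` voxels along x, zero-filling vacated voxels."""
    moved = np.roll(mask_zyx, shift, axis=2)  # np.roll returns a new array
    if shift > 0:
        moved[:, :, :shift] = 0   # discard voxels that wrapped in from the far end
    elif shift < 0:
        moved[:, :, shift:] = 0   # discard voxels that wrapped in from the near end
    return moved


mask = np.zeros((1, 1, 5), dtype=np.uint8)
mask[0, 0, 3:] = 1                       # occupies x = 3 and x = 4
rolled = np.roll(mask, 1, axis=2)        # wraps: the voxel at x = 4 reappears at x = 0
translated = translate_x(mask, 1)        # true translation: that voxel leaves the volume
print(rolled[0, 0], translated[0, 0])
```

Without the zeroing step, a voxel pushed out of one side of the volume reappears on the other, which would inflate overlap and distort the distance metrics.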
Every Dice, surface-DSC, HD95, MSD, and volume-error number this suite
reports comes from three vendored functions in
_vendor/metrics.py. Each is
pinned against hand-computable expected values on synthetic numpy arrays —
six axis-aligned cube configurations whose Dice is recoverable by counting
voxels: identity 1.0000, half-overlap 0.5000, eighth-overlap 0.1250, subset
0.2222, disjoint 0.0000, one-empty 0.0000. Measured values agree with the
analytical ones to four decimal places across every accepted dtype (bool,
uint8, uint16, int32, float32).
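The six expected values can be recovered by counting voxels. The sketch below is one way to build such configurations; the exact cube sizes and offsets used by the real unit tests are not given here, so the geometry (edge-10 cubes, a 5-voxel shift, an edge-5 subset) is an assumption chosen to reproduce the quoted numbers, and `dice` is a stand-in rather than the vendored function.

```python
import numpy as np


def dice(a, b) -> float:
    """Stand-in volumetric Dice; returns 0.0 when both masks are empty."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    total = int(a.sum()) + int(b.sum())
    return 0.0 if total == 0 else 2.0 * int((a & b).sum()) / total


def cube(origin, edge, dtype, shape=(32, 32, 32)):
    """Axis-aligned cube of the given edge length, stored in the requested dtype."""
    z, y, x = origin
    mask = np.zeros(shape, dtype=dtype)
    mask[z:z + edge, y:y + edge, x:x + edge] = 1
    return mask


def cases(dtype):
    base = cube((4, 4, 4), 10, dtype)
    return {
        "identity": (base, cube((4, 4, 4), 10, dtype), 1.0),
        "half-overlap": (base, cube((4, 4, 9), 10, dtype), 0.5),       # 500 shared voxels
        "eighth-overlap": (base, cube((9, 9, 9), 10, dtype), 0.125),   # 125 shared voxels
        "subset": (base, cube((4, 4, 4), 5, dtype), 2 / 9),            # 2*125 / (1000 + 125)
        "disjoint": (base, cube((20, 20, 20), 10, dtype), 0.0),
        "one-empty": (base, np.zeros((32, 32, 32), dtype=dtype), 0.0),
    }


results = {}
for dtype in (bool, np.uint8, np.uint16, np.int32, np.float32):
    for name, (a, b, expected) in cases(dtype).items():
        results[(np.dtype(dtype).name, name)] = (dice(a, b), expected)

for (dtype_name, name), (measured, expected) in results.items():
    print(f"{dtype_name:8s} {name:15s} measured={measured:.4f} expected={expected:.4f}")
```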
A Metric drift gate runs these unit tests on every CI build before the slower end-to-end suite begins, so a numeric regression in any vendored metric fails the build at the cheapest stage. The full battery — anisotropic spacing, surface DSC saturation under tolerance, erosion volume loss vs surface-area-to-volume ratio — is in docs/VALIDATION.md, with every figure programmatically regenerated from the live metric functions so visual drift mirrors numeric drift.
Seven closed-planar primitives, each centered in a different region of a 512×512×200 mm volume to avoid overlap:
| ROI name | Shape | Dimensions | Note |
|---|---|---|---|
| sphere | sphere | r = 40 mm | smooth, convex |
| cube | cube | side 60 mm | axis-aligned |
| cylinder | z-axis cylinder | r = 30, h = 80 mm | curved + flat caps |
| ellipsoid | ellipsoid | semi-axes (30, 50, 60) mm | anisotropic |
| torus | z-axis torus | R = 60, r = 20 mm | annular cross-sections |
| hollow_sphere | hollow sphere | R = 40, r = 20 mm | XOR (multi-contour) |
| straw | hollow cylinder | R = 40, r = 20, h = 120 mm | XOR (multi-contour) |
Tools that mishandle multi-contour even-odd fill produce a solid (Dice ≈ 0.6) on the two XOR primitives and will fail conformance loudly — that is a feature, not a bug.
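To make the even-odd rule concrete, here is a minimal 2D sketch of one slice through a hollow primitive. It is an illustration only: the contours are stood in for by analytically filled discs (the `disc` helper is hypothetical), whereas a real converter would rasterise each polygon first and then combine the results.

```python
import numpy as np


def disc(shape, center, radius):
    """Filled 2D disc: stands in for one rasterised closed contour."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    return (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2


shape = (101, 101)
outer = disc(shape, (50, 50), 40)   # outer wall contour
inner = disc(shape, (50, 50), 20)   # inner wall contour on the same slice

even_odd = np.logical_xor(outer, inner)   # inner contour carves a hole
union = np.logical_or(outer, inner)       # naive fill: the hole is lost

print("centre filled (even-odd):", bool(even_odd[50, 50]))
print("centre filled (union):   ", bool(union[50, 50]))
print("pixels:", int(even_odd.sum()), "vs", int(union.sum()))
```

A converter that ORs contours together (or fills each one independently) ends up with the second result: a solid where the reference has a cavity.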
Each ROI is scored on:
- Dice (volumetric)
- Surface DSC @ 1 mm (Nikolov-style, tolerance-bounded)
- Hausdorff 95 (mm)
- Mean surface distance (mm)
- Relative volume error
A geometry precheck runs first: if a prediction's (origin, spacing, size, direction)
differs from the ground-truth NIfTI by more than 1e-4, the ROI is flagged
GEOMETRY_MISMATCH rather than scored — most third-party tool bugs are geometry, not
voxel labeling, and surfacing them separately is more diagnostic.
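A precheck of this kind can be sketched in a few lines. The version below is an assumption-laden illustration, not the package's implementation: the `Geometry` container and `geometry_mismatches` function are hypothetical, the 1e-4 tolerance is applied as an absolute per-component difference, and size is compared exactly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Geometry:
    origin: Tuple[float, ...]
    spacing: Tuple[float, ...]
    size: Tuple[int, ...]
    direction: Tuple[float, ...]  # flattened direction-cosine matrix


def geometry_mismatches(pred: Geometry, ref: Geometry, tol: float = 1e-4) -> List[str]:
    """Return a human-readable entry for every geometry field that disagrees."""
    problems = []
    if tuple(pred.size) != tuple(ref.size):
        problems.append(f"size: {pred.size} != {ref.size}")
    for field in ("origin", "spacing", "direction"):
        a, b = getattr(pred, field), getattr(ref, field)
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            problems.append(f"{field}: {a} != {b}")
    return problems


ref = Geometry((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (512, 512, 200),
               (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0))
shifted = Geometry((0.5, 0.0, 0.0), ref.spacing, ref.size, ref.direction)

print(geometry_mismatches(ref, ref))        # empty list: safe to score
print(geometry_mismatches(shifted, ref))    # origin disagrees: report, do not score
```

Reporting the offending field separately from the metrics is what makes a half-voxel origin error distinguishable from a genuinely bad mask.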
pip install git+https://github.com/brianmanderson/RTMaskConformanceTest
Requires Python ≥ 3.10. Runtime deps: pydicom, SimpleITK, numpy, scipy, pyyaml.
# 1. Generate fixture (DICOM CT series + RTSTRUCT + ground-truth NIfTIs)
rtmask-conformance generate ./fixture
# 2. Run YOUR tool. It must produce one binary NIfTI per ROI:
# ./predictions/sphere.nii.gz
# ./predictions/cube.nii.gz
# ... etc
#
# Inputs to your tool:
# DICOM CT series : ./fixture/refct/
# RTSTRUCT : ./fixture/rtstruct/primitives_planar.dcm
# 3. Score the predictions
rtmask-conformance verify --predictions ./predictions --groundtruth ./fixture/groundtruth

Exit codes: 0 all ROIs PASS, 1 any FAIL/MISSING/GEOMETRY_MISMATCH, 2 usage error.
See the file README_FOR_TOOL_AUTHOR.md written into the fixture directory for the
complete contract a tool author must satisfy.
Defaults ship in src/rtmask_conformance/data/default_thresholds.yaml. Override with
your own YAML and pass --config conformance.yaml:
schema_version: 1
defaults:
  dice: 0.95
  surface_dice_1mm: 0.95
  hd95_mm: 2.0
  msd_mm: 0.5
  volume_rel_err: 0.03
primitives:
  torus:
    dice: 0.90   # relax for tools known to struggle with toroidal cross-sections

Per-primitive overrides shallow-merge over defaults. An unknown schema_version is
rejected.
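Shallow-merging means a per-primitive block replaces only the keys it names; every other threshold still comes from the defaults. A minimal sketch of that behaviour, using plain dicts in place of the parsed YAML (the `resolve_thresholds` helper is hypothetical and is not the package's loader):

```python
# Values mirror the example configuration above.
DEFAULTS = {
    "dice": 0.95,
    "surface_dice_1mm": 0.95,
    "hd95_mm": 2.0,
    "msd_mm": 0.5,
    "volume_rel_err": 0.03,
}
PRIMITIVES = {"torus": {"dice": 0.90}}


def resolve_thresholds(roi, defaults=DEFAULTS, primitives=PRIMITIVES):
    """Defaults first, then any per-primitive keys layered on top."""
    return {**defaults, **primitives.get(roi, {})}


print(resolve_thresholds("torus"))    # dice relaxed, everything else unchanged
print(resolve_thresholds("sphere"))   # no override block: identical to the defaults
```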
RTMASK_CONFORMANCE_PREDICTIONS=./predictions \
RTMASK_CONFORMANCE_GROUNDTRUTH=./fixture/groundtruth \
pytest --pyargs rtmask_conformance.tests

This produces one parametrized test per ROI with the same pass/fail semantics as the CLI.
For tools that already hold prediction and ground-truth masks in memory — or that
score data unrelated to the seven shipped ROIs — the package exposes two
format-flexible entry points. Either argument may be a numpy.ndarray, a path
to a NIfTI file, or a SimpleITK.Image, and the two arguments may differ in form:
from rtmask_conformance import evaluate_masks, evaluate_masks_with_thresholds
# 1. Raw metrics — no thresholds, no notion of "ROI". Just the numbers.
metrics = evaluate_masks(prediction, ground_truth, spacing_xyz=(1.0, 1.0, 1.0))
print(metrics.dice, metrics.surface_dice, metrics.hd95_mm,
      metrics.msd_mm, metrics.volume_rel_err)
# 2. Graded result: apply thresholds, get a ResultRecord with PASS/FAIL.
record = evaluate_masks_with_thresholds(
    prediction, ground_truth,
    spacing_xyz=(1.0, 1.0, 1.0),
    thresholds={"dice": 0.9, "hd95_mm": 2.0},  # partial dict OK; missing keys
                                               # fall back to shipped defaults
)
print(record.status.value, record.violations)

prediction and ground_truth each accept:
| Form | Spacing source |
|---|---|
| numpy.ndarray (Z, Y, X) | spacing_xyz= argument is required |
| pathlib.Path / str (NIfTI file) | inferred from the file's header |
| SimpleITK.Image | inferred from .GetSpacing() |
When both inputs carry geometry, an (origin, spacing, size, direction) precheck
runs first and any disagreement raises GeometryMismatchError (or returns a
ResultRecord with Status.GEOMETRY_MISMATCH from the thresholded variant).
Pass check_geometry=False to score arbitrary array pairs.
evaluate_masks_with_thresholds resolves thresholds in priority order:
1. `thresholds=`: a `Thresholds` instance (used as-is) or a `dict` (shallow-merged over `config.defaults`).
2. `roi="<one of CONFORMANCE_ROIS>"`: looks up per-ROI thresholds from the shipped YAML.
3. Fallback to the shipped `defaults` block when `roi` is unknown / unspecified.
evaluate_masks returns a MaskMetrics dataclass:
MaskMetrics(
    dice,                      # always populated
    surface_dice,              # None if both masks empty contour-wise
    hd95_mm,                   # None if either mask empty
    msd_mm,                    # None if either mask empty
    volume_rel_err,            # None if reference is empty
    volume_abs_err_mm3,
    tool_volume_mm3,
    reference_volume_mm3,
    surface_dsc_tolerance_mm,  # echoes the tolerance used
    spacing_xyz,               # echoes the spacing used
)

evaluate_masks_with_thresholds returns a ResultRecord with .status,
.metrics, .thresholds, .violations, and (on geometry failure)
.geometry_diagnostic. This is the same dataclass evaluate_one returns, so any
code that already consumes verify reports works unchanged.
Four end-to-end integrations live in sister projects. Each is the recommended template for its language / convention: copy the four pieces, adapt the converter call.
| Tool | Language / runtime | Pattern | Recommended for |
|---|---|---|---|
| DicomRTTool | Python / pip (PyPI-published) | requirements-file dep + pytest + CI job | Python packages with an existing pytest suite, especially PyPI-published ones |
| PyRaDiSe | Python / pip | same as DicomRTTool, with single-folder staging | Python packages whose API takes a single root directory |
| rt-utils | Python / pip | same four-piece pattern; mask returned in-memory rather than written to disk | Python packages whose converter returns numpy arrays |
| Dicom_RT_Images_Csharp | C# / .NET Framework 4.8 | CI-only: build → headless CLI → verify | Compiled tools with a CLI / headless mode |
The three Python integrations drive the suite from inside pytest; the C# integration is purely CLI-driven inside a GitHub Actions job. The accuracy gate is the same — only the surrounding plumbing differs.
The DicomRTTool
package wires this conformance suite in as a separate CI check. It's the
recommended pattern if your tool is a Python package with a pyproject.toml
and existing pytest suite — copy these four pieces and adapt the converter
call. Live files:
- requirements-conformance.txt: opt-in dependency (kept out of `pyproject.toml` so PyPI uploads aren't blocked; see below)
- tests/test_conformance.py: fixture + per-ROI assertions
- tests/conformance.yaml: calibrated thresholds
- .github/workflows/conformance.yml: separate "Conformance" CI check
The intuitive place for this dependency is a [project.optional-dependencies]
extra in pyproject.toml. Don't do that if you publish to PyPI. PyPI
rejects any uploaded distribution whose metadata contains a PEP 508 direct
URL reference (name @ git+https://...) with HTTPError 400: Invalid value for requires_dist. The dependency has to live somewhere pip understands but
the PyPI uploader never inspects — a plain requirements file is the
simplest such place:
# requirements-conformance.txt
# Conformance-suite-only dependency. Kept out of pyproject.toml because
# PyPI rejects metadata containing direct URL references (PEP 508
# `name @ url`). Install with:
# pip install -e .[dev] -r requirements-conformance.txt
rtmask-conformance @ git+https://github.com/brianmanderson/RTMaskConformanceTest
Developers and CI install with pip install -e ".[dev]" -r requirements-conformance.txt.
The base package install (pip install yourtool) is untouched: users who
only want the converter don't pull pyyaml/trimesh/etc., exactly as they
would not with an extras-based declaration.
If your tool is not PyPI-published, you can equivalently use a
[project.optional-dependencies] conformance = ["rtmask-conformance @ git+https://..."] extra and install with pip install -e .[conformance].
It's the same dependency, just declared in a place that PyPI's metadata
validator will reject on upload. When in doubt, use the requirements-file
form — it works in both cases.
"""Conformance test: <YourTool> vs RTMaskConformanceTest analytic ground truth."""
from __future__ import annotations

import os
from pathlib import Path

import pytest
import SimpleITK as sitk

# Skips the entire module if the conformance extra isn't installed,
# so the default `pytest` run is unaffected.
rtmask_conformance = pytest.importorskip(
    "rtmask_conformance",
    reason="install rtmask-conformance: pip install -e .[dev] -r requirements-conformance.txt",
)

from rtmask_conformance import CONFORMANCE_ROIS, generate_fixture, load_config  # noqa: E402
from rtmask_conformance.generate import GenerateOptions  # noqa: E402
from rtmask_conformance.verify import Status, evaluate_one  # noqa: E402

# >>> Replace this import with your tool's converter API <<<
from YourTool import RTStructToMaskConverter  # noqa: E402

_CONFIG_YAML = Path(__file__).with_name("conformance.yaml")


@pytest.fixture(scope="session")
def conformance_fixture(tmp_path_factory):
    """Synthetic CT + RTSTRUCT + analytic GT NIfTIs (one per ROI)."""
    out = tmp_path_factory.mktemp("conformance_fixture")
    # n_quadrature=2 keeps fixture build under ~30 s; n=8 is the published default.
    generate_fixture(out, options=GenerateOptions(n_quadrature=2))
    return out


@pytest.fixture(scope="session")
def predictions(conformance_fixture, tmp_path_factory):
    """Run YOUR tool against the fixture; emit one binary <roi>.nii.gz per ROI."""
    pred_dir = tmp_path_factory.mktemp("preds")
    # >>> Adapt this block to your tool's API <<<
    converter = RTStructToMaskConverter(roi_names=list(CONFORMANCE_ROIS))
    converter.load_dicom_series(conformance_fixture / "refct")
    converter.load_rtstruct(conformance_fixture / "rtstruct" / "primitives_planar.dcm")
    # The verifier expects <pred_dir>/<roi>.nii.gz per ROI. If your tool emits
    # a single labeled mask, split it into per-ROI binaries here:
    for roi in CONFORMANCE_ROIS:
        binary_mask = converter.get_roi_mask(roi)  # <-- your API
        img = sitk.GetImageFromArray(binary_mask.astype("uint8"))
        img.CopyInformation(converter.reference_image)  # <-- your API
        sitk.WriteImage(img, str(pred_dir / f"{roi}.nii.gz"))
    return pred_dir


@pytest.fixture(scope="session")
def conformance_config():
    """Resolution: env var > tests/conformance.yaml > package defaults."""
    config_path = os.environ.get("RTMASK_CONFORMANCE_CONFIG")
    if config_path is None and _CONFIG_YAML.is_file():
        config_path = str(_CONFIG_YAML)
    return load_config(config_path)


@pytest.mark.parametrize("roi", CONFORMANCE_ROIS)
def test_conformance(roi, conformance_fixture, predictions, conformance_config):
    pred = predictions / f"{roi}.nii.gz"
    gt = conformance_fixture / "groundtruth" / f"{roi}.nii.gz"
    result = evaluate_one(roi, pred, gt, conformance_config)
    if result.status != Status.PASS:
        pytest.fail(
            f"{roi}: {result.status.value}\n"
            f"  violations: {result.violations}\n"
            f"  metrics: {result.metrics}\n"
            f"  thresholds: {result.thresholds}"
        )

The only places you adapt are the marked >>> ... <<< blocks: the import and
the converter-driving block inside the predictions fixture. Everything
else (fixture wiring, geometry handling, parametrization, threshold
resolution) is identical across consumers.
The first time you run the test, expect one or two ROIs to land just under the published defaults — most rasterizers carry a half-voxel boundary bias. Rather than baking that into the package, document it locally so a future voxelizer fix can tighten it:
schema_version: 1
primitives:
  cube:
    # cv2.fillPoly is boundary-inclusive: every voxel touched by the polygon
    # is filled. For an axis-aligned 60 mm cube on 1 mm voxels the rasterized
    # mask gains ~3.4% volume from boundary pixels along each face. Surface
    # metrics (sDSC=0.999, HD95=1.0 mm, MSD=0.33 mm) confirm the geometry
    # is right; the volume gap is purely the boundary convention. Tighten
    # back to defaults once the rasterizer honours a half-voxel-shrink.
    dice: 0.98
    volume_rel_err: 0.04

The header of tests/conformance.yaml in DicomRTTool is the canonical example: every relaxation is dated, attributed to a specific behavior, and ends with the path back to the published default. That way the YAML stays self-explanatory as the rasterizer evolves.
A standalone job means "Conformance" appears as its own status check on PRs,
distinct from the existing Tests matrix:
name: Conformance
on:
  push:
    branches: [main, dev]
  pull_request:
    branches: [main]
  workflow_dispatch:   # manual run from any branch
jobs:
  conformance:
    name: RTSTRUCT->mask conformance
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
          cache-dependency-path: |
            pyproject.toml
            requirements-conformance.txt
      - run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]" -r requirements-conformance.txt
      - run: pytest tests/test_conformance.py -v

Conformance is an accuracy property; Python/OS portability is already
covered by your main test matrix, so a single ubuntu-latest × py3.12
job here is plenty. workflow_dispatch lets you re-run manually from
the Actions tab after an upstream rtmask-conformance change without
needing a code push.
- `sphere`, `cylinder`, `ellipsoid`, `torus`, `hollow_sphere`, `straw` typically pass on defaults if the converter is correct.
- `cube` is the most common near-miss for boundary-inclusive rasterizers (cv2.fillPoly, naïve scanline fill); document the relaxation per above.
- `hollow_sphere` and `straw` are the strongest signal: a ~0.6 Dice on these means even-odd / multi-contour XOR is broken, which is a real bug in the converter, not a threshold issue.
PyRaDiSe wires the suite in the same four-piece shape as DicomRTTool — see those subsections above for the canonical walkthrough. The only differences worth calling out are PyRaDiSe-specific:
- pyproject.toml: opt-in `conformance` extra (PyRaDiSe is not PyPI-published, so the extras form is fine here; if you later publish, switch to the requirements-file pattern shown for DicomRTTool above)
- tests/test_conformance.py: fixture + per-ROI assertions
- tests/conformance.yaml: calibrated thresholds
- .github/workflows/conformance.yml: separate "Conformance" CI check
Three adaptations for PyRaDiSe specifically — these will apply to most crawler-style packages:
1. Single-folder staging. PyRaDiSe's SubjectDicomCrawler walks one
root and groups by study/series — it can't take a CT folder and an
RTSTRUCT path as separate arguments. The predictions fixture hard-links
both into one temp dir before invoking the crawler:
import os
import shutil

def _stage_dicom_inputs(rtstruct, image_folder, stage):
    stage.mkdir(parents=True, exist_ok=True)
    # Hard-link where possible; fall back to a copy if linking fails.
    for src in image_folder.glob("*.dcm"):
        try:
            os.link(src, stage / src.name)
        except OSError:
            shutil.copy2(src, stage / src.name)
    try:
        os.link(rtstruct, stage / rtstruct.name)
    except OSError:
        shutil.copy2(rtstruct, stage / rtstruct.name)

This pattern generalizes: any tool that takes "a directory with both the CT and the RTSTRUCT" needs the same staging step.
2. Defensive image extraction. Different PyRaDiSe point releases
expose the underlying SimpleITK image under different attribute names
(get_image_data() vs get_image()). The fixture tries both:
def _extract_sitk_image(seg):
    for attr in ("get_image_data", "get_image"):
        if hasattr(seg, attr):
            try:
                v = getattr(seg, attr)()
                if v is not None:
                    return v
            except Exception:
                continue
    return None

If your tool's API has shifted across releases, the same try-multiple-names pattern keeps the test resilient without pinning a specific version.
3. Python version mismatch handled by pip. PyRaDiSe declares Python
≥ 3.8, but rtmask-conformance requires ≥ 3.10. The opt-in extra works
out automatically: pip simply refuses to install the extra on 3.8 / 3.9,
so users on older interpreters get PyRaDiSe minus the conformance gate
(the intended behavior — only CI / dev users on 3.10+ run the gate).
The CI workflow also pre-installs setuptools to provide the distutils
shim PyRaDiSe imports (removed from stdlib in Python 3.12). Same trick
applies to any package that hasn't yet migrated off the standard-library
distutils.
PyRaDiSe's first-run cube metrics came in bit-identical to DicomRTTool's:
| Metric on cube | DicomRTTool | PyRaDiSe |
|---|---|---|
| Dice | 0.9835 | 0.9835 |
| Surface DSC @ 1 mm | 0.999 | 0.999 |
| HD95 (mm) | 1.0 | 1.0 |
| MSD (mm) | 0.33 | 0.33 |
| Volume relative error | +3.36% | +3.36% |
Two ostensibly-different Python wrappers around contour-to-mask conversion
producing identical numbers down to four decimal places means they share
an underlying rasterizer (in this case, both call into cv2.fillPoly).
That's the suite functioning as a fingerprint, not just a pass/fail
gate — useful for reasoning about provenance when an upstream
implementation changes, or when validating that a new wrapper hasn't
introduced incidental drift on top of a shared dependency.
rt-utils wires the suite in the same four-piece shape as DicomRTTool. It's the recommended template when the converter returns the mask in memory as a numpy array rather than writing per-ROI NIfTIs to disk. Live files:
- setup.py: opt-in `conformance` extra in `extras_require` (this fork is not PyPI-published from here, so the extras form is fine; the PyPI-safe requirements-file pattern shown for DicomRTTool above is the drop-in replacement when you do publish)
- tests/test_conformance.py: fixture + per-ROI assertions
- tests/conformance.yaml: calibrated thresholds (cube relaxation only)
- .github/workflows/conformance.yml: separate "Conformance" CI check
Two adaptations specific to rt-utils — these will apply to most "give me masks back as numpy arrays" libraries:
1. Predictions fixture writes NIfTIs from numpy + GT geometry. The
rt-utils API is RTStructBuilder.create_from(...).get_roi_mask_by_name(roi)
and returns a bare bool numpy array. The verifier needs <roi>.nii.gz
files with origin / spacing / size / direction matching the ground truth,
so the fixture transposes each mask, wraps it in a SimpleITK.Image whose
geometry is copied verbatim from the GT NIfTI, and writes that to disk:
for roi in rtstruct.get_roi_names():
    gt_path = gt_dir / f"{roi}.nii.gz"
    if not gt_path.is_file():
        continue
    mask_yxz = rtstruct.get_roi_mask_by_name(roi)   # bool (Y, X, Z)
    mask_zyx = np.transpose(mask_yxz, (2, 0, 1)).astype(np.uint8)
    gt_img = sitk.ReadImage(str(gt_path))
    pred_img = sitk.GetImageFromArray(mask_zyx)
    pred_img.CopyInformation(gt_img)                # geometry inheritance
    sitk.WriteImage(pred_img, str(pred_dir / f"{roi}.nii.gz"))

Why we copy GT geometry rather than re-deriving it from the CT: the
fixture's CT slices and the analytic GT NIfTIs share the same geometry
by construction, so copying the GT's metadata onto the prediction is
equivalent to deriving it from the CT, and there is one less place to drift.
The same (numpy mask) → (sitk image with GT geometry) → (nii.gz)
recipe applies to any in-memory converter.
2. Empirical axis-order verification, not docstring trust. rt-utils'
image_helper.create_empty_series_mask allocates the array with
dimensions ordered (Columns, Rows, Slices), which would imply
(X, Y, Z) and np.transpose(2, 1, 0) to reach SimpleITK's (Z, Y, X).
That ordering is wrong: cv2.fillPoly writes into the per-slice mask at
[y, x] indices, leaving the populated array in (rows=Y, columns=X, slices=Z)
order. The conformance gate caught this on first run — transpose(2, 1, 0)
swapped Y and X for primitives whose centroid Y differed from X
(cylinder, ellipsoid, hollow_sphere, straw), giving Dice = 0
with volumes that otherwise matched the reference within ~1.7%; sphere,
cube, and torus passed because their centroids happened to be on the
Y = X diagonal. np.transpose(2, 0, 1) lined every primitive up on its
GT centroid (Dice ≥ 0.985 across the board, with cube at the expected
cv2.fillPoly fingerprint of 0.9835). The
upstream docstring on get_roi_mask_by_name
now spells out the populated (Y, X, Z) convention so future consumers
don't need to re-derive it. This is the analytic-fixture gate functioning
as a property test on the converter's documentation, not just its
correctness — exactly the kind of "right answer for the wrong reason"
case that download-based golden-mask tests can't surface.
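The axis-order failure is easy to reproduce on a toy array. The sketch below uses a hypothetical 6×6×3 mask in (Y, X, Z) order with one voxel off the Y = X diagonal and one on it; it does not call rt-utils, it only shows what the two transposes do to voxel positions.

```python
import numpy as np

# Hypothetical converter output in (Y, X, Z) order, as described above.
mask_yxz = np.zeros((6, 6, 3), dtype=bool)
mask_yxz[1, 4, 2] = True   # off-diagonal voxel: y=1, x=4, z=2
mask_yxz[3, 3, 0] = True   # on-diagonal voxel:  y=3, x=3, z=0

right = np.transpose(mask_yxz, (2, 0, 1))   # (Y, X, Z) -> (Z, Y, X)
wrong = np.transpose(mask_yxz, (2, 1, 0))   # (Y, X, Z) -> (Z, X, Y), then read as (Z, Y, X)

print("correct (z, y, x):", np.argwhere(right).tolist())
print("swapped (z, y, x):", np.argwhere(wrong).tolist())
```

The on-diagonal voxel lands in the same place under both transposes, which is why primitives centred on Y = X can pass while the others score Dice = 0 with an otherwise plausible volume.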
rt-utils' first-run cube metrics joined the cv2.fillPoly cluster bit-for-bit:
| Metric on cube | DicomRTTool | PyRaDiSe | rt-utils |
|---|---|---|---|
| Dice | 0.9835 | 0.9835 | 0.9835 |
| Surface DSC @ 1 mm | 0.999 | 0.999 | 0.999 |
| HD95 (mm) | 1.0 | 1.0 | 1.0 |
| MSD (mm) | 0.33 | 0.33 | 0.33 |
| Volume relative error | +3.36% | +3.36% | +3.36% |
Three independent Python wrappers sitting on top of cv2.fillPoly
producing identical metrics to four decimals confirms the boundary-bias
is the rasterizer's, not any one wrapper's. If a fourth wrapper
landed at, say, +1.7% relative volume error on the cube, you'd know it
either patched the rasterizer or rolled its own.
The Dicom_RT_Images_Csharp project wires the suite in as a CI-only gate. It's the recommended pattern when the tool under test is a compiled binary with a CLI or headless mode — no Python package wrapping, no test runner needed. The pieces are:
- conformance.yaml at the repo root — calibrated thresholds (same schema as the Python case).
- .github/workflows/conformance.yml — single workflow that builds the C# project, runs its headless converter against the fixture, and verifies.
The C# tool's headless CLI surface (the bit you'll need an equivalent of in
your tool) takes the fixture's RTSTRUCT + reference CT folder and writes one
binary <roi>.nii.gz per ROI into an output directory:
Dicom_RT_images_Csharp.exe --headless --forward \
--rtstruct PATH \
--image-folder PATH \
--output-folder PATH
Sourced in Cli/HeadlessRunner.cs — it returns 0/non-zero and emits per-ROI volume rows on stdout. Matching that contract from your tool (whatever it's written in) is the only language-side work; everything else is YAML.
The CI job runs the same generate → run-tool → verify chain, just shelling out to the C# binary in the middle:
name: Conformance
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:
# Bump these to pin a different SimpleITK release. Required because the
# project's .csproj references SimpleITK C# binaries via a relative
# HintPath, and they're not on NuGet.
env:
  SIMPLEITK_VERSION: "2.5.0"
  SIMPLEITK_ZIP_URL: "https://github.com/SimpleITK/SimpleITK/releases/download/v2.5.0/SimpleITK-2.5.0-CSharp-win64-x64.zip"
  # Opt the JS actions into the upcoming Node.js 24 runtime ahead of
  # GitHub's 2026-06-02 forced switch.
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
jobs:
  conformance:
    name: RTSTRUCT->mask conformance (C# headless)
    runs-on: windows-latest   # WPF / .NET Framework 4.8 means Windows-only.
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install rtmask-conformance
        # `python -m pip` rather than `pip` directly: on Windows runners the
        # bare pip.exe console script can't be overwritten if pip ever tries
        # to self-upgrade.
        run: python -m pip install git+https://github.com/brianmanderson/RTMaskConformanceTest
      - uses: microsoft/setup-msbuild@v2
      - uses: NuGet/setup-nuget@v2
      - name: Stage SimpleITK C# binaries one level above the repo
        # The .csproj's HintPath resolves to <repo_parent>/SimpleITK/, so
        # that's where the DLLs need to land. The release ZIP layout has
        # an inner directory; flatten it.
        shell: pwsh
        run: |
          Invoke-WebRequest -Uri "${{ env.SIMPLEITK_ZIP_URL }}" -OutFile sitk.zip -UseBasicParsing
          Expand-Archive -Path sitk.zip -DestinationPath sitk-extracted -Force
          $inner = Get-ChildItem sitk-extracted -Directory | Select-Object -First 1
          $sourcePath = if ($null -eq $inner) { "sitk-extracted" } else { $inner.FullName }
          $target = Join-Path (Split-Path -Parent $env:GITHUB_WORKSPACE) "SimpleITK"
          New-Item -ItemType Directory -Path $target -Force | Out-Null
          Copy-Item -Path "$sourcePath\*" -Destination $target -Recurse -Force
      - run: nuget restore Dicom_RT_images_Csharp.sln
      - name: Build C# project (Release|x64)
        # x64 (not AnyCPU) makes the 64-bit native SimpleITK requirement
        # explicit and matches how the project is built locally.
        run: msbuild Dicom_RT_images_Csharp.sln /p:Configuration=Release /p:Platform=x64 /m
      - name: Inspect build output
        # Asserts the three runtime artifacts (managed exe + managed
        # SimpleITK + native SimpleITK) are present before invoking the
        # binary. Catches the most common silent failure: a missing
        # native DLL produces a DllNotFoundException that AttachConsole
        # can swallow.
        shell: pwsh
        run: |
          $bin = "Dicom_RT_images_Csharp\bin\x64\Release"
          Get-ChildItem $bin
          foreach ($f in @("Dicom_RT_images_Csharp.exe", "SimpleITKCSharpManaged.dll", "SimpleITKCSharpNative.dll")) {
            if (-not (Test-Path (Join-Path $bin $f))) { throw "Missing artifact: $f" }
          }
      - run: rtmask-conformance generate ./fixture --n-quadrature 2
      - name: Run C# headless forward conversion
        # AttachConsole(ATTACH_PARENT_PROCESS) inside HeadlessRunner is
        # unreliable on hosted runners: pwsh's `&` invocation against a
        # WinExe doesn't always plumb stdout/stderr back to the GitHub
        # Actions log. Use Start-Process -Wait with explicit redirection
        # to files, then dump them unconditionally so any failure in the
        # binary is debuggable from the run log alone.
        shell: pwsh
        run: |
          $exe = (Resolve-Path "Dicom_RT_images_Csharp\bin\x64\Release\Dicom_RT_images_Csharp.exe").Path
          New-Item -ItemType Directory -Path predictions -Force | Out-Null
          $stdoutPath = Join-Path $env:RUNNER_TEMP "csharp.stdout.log"
          $stderrPath = Join-Path $env:RUNNER_TEMP "csharp.stderr.log"
          $proc = Start-Process -FilePath $exe -NoNewWindow -Wait -PassThru `
            -ArgumentList @(
              "--headless", "--forward",
              "--rtstruct", (Resolve-Path fixture/rtstruct/primitives_planar.dcm).Path,
              "--image-folder", (Resolve-Path fixture/refct).Path,
              "--output-folder", (Resolve-Path predictions).Path
            ) `
            -RedirectStandardOutput $stdoutPath -RedirectStandardError $stderrPath
          Write-Host "----- C# stdout -----"; Get-Content $stdoutPath
          Write-Host "----- C# stderr -----"; Get-Content $stderrPath
          if ($proc.ExitCode -ne 0) { throw "Headless conversion failed: $($proc.ExitCode)" }
      - name: Verify
        run: |
          rtmask-conformance verify `
            --predictions ./predictions `
            --groundtruth ./fixture/groundtruth `
            --config ./conformance.yaml `
            --report-json conformance-report.json
      - name: Upload report + artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: conformance-report
          path: |
            conformance-report.json
            predictions/
            fixture/groundtruth/
            fixture/manifest.json
          retention-days: 30

Some of the steps above look like over-engineering until you hit the failure mode. The four that bit us during this integration:
- `pip install --upgrade pip` fails on Windows runners. Pip can't overwrite its own running `pip.exe`. Either use `python -m pip install` (the python.exe entry point isn't the locked file) or skip the upgrade; the runner's bundled pip is fresh enough.
- `& exe.exe ...` against a WinExe doesn't always surface stdout. A `WinExe` (any GUI-subsystem .NET executable, even one with a CLI mode) has no console attached by default. `AttachConsole(ATTACH_PARENT_PROCESS)` inside the binary tries to attach to pwsh's console, but the timing and redirection on hosted runners are flaky. Use `Start-Process -Wait` with `-RedirectStandardOutput` / `-RedirectStandardError` to files and dump them with `Get-Content` regardless of exit code. The exit code propagates correctly even when the streams don't.
- External native DLLs need explicit staging. SimpleITK's C# wrapper isn't on NuGet; the binaries ship as a separate ZIP from github.com/SimpleITK/SimpleITK/releases. Match whatever path-relative HintPath your `.csproj` uses (this project uses `..\..\SimpleITK\`, resolving to `<repo_parent>/SimpleITK/`).
- Release|x64, not AnyCPU, when you have native deps. The native SimpleITK DLL is x86-64; building AnyCPU on a 64-bit runner technically works because the framework picks 64-bit at runtime, but pinning the platform makes the requirement explicit and unambiguous in the artifact path.

Mention these in your own workflow's comments; future-you will thank past-you.
The verifier produces the same plain-text table the Python case does:
rtmask-conformance verify config=conformance.yaml 7/7 passed
status ROI dice sDSC1 HD95mm MSD mm dV%
-----------------------------------------------------------------------------
PASS sphere 0.9898 1.0000 1.000 0.326 0.75
PASS cube 0.9833 1.0000 1.000 0.333 0.00
PASS cylinder 0.9872 1.0000 1.000 0.304 1.71
PASS ellipsoid 0.9912 1.0000 1.000 0.284 0.84
PASS torus 0.9850 1.0000 1.000 0.350 1.63
PASS hollow_sphere 0.9857 1.0000 1.000 0.323 0.99
PASS straw 0.9821 1.0000 1.000 0.339 1.88
The cube's relaxation in conformance.yaml is documented in the file's header — every override should be.
What makes this gate worth shipping across all four projects is that the same fixture surfaces qualitatively different rasterizer behaviors:
| Metric on cube | DicomRTTool (cv2.fillPoly) | PyRaDiSe (cv2.fillPoly) | rt-utils (cv2.fillPoly) | DicomRTToolC# (C# scanline) |
|---|---|---|---|---|
| Dice | 0.9835 | 0.9835 | 0.9835 | 0.9833 |
| Surface DSC @ 1 mm | 0.999 | 0.999 | 0.999 | 1.000 |
| HD95 (mm) | 1.0 | 1.0 | 1.0 | 1.0 |
| MSD (mm) | 0.33 | 0.33 | 0.33 | 0.33 |
| Volume relative error | +3.36% | +3.36% | +3.36% | 0.00% |
cv2.fillPoly is biased: it counts ~3.4% more voxels than ground truth, and all three Python wrappers around it inherit the bias bit-for-bit. The C# scanline implementation gets the volume exactly right but disagrees with the GT on which ~3500 voxels along the boundary belong to the cube — symmetric error, not systematic over-fill. The three cv2 wrappers and the C# scanline land at near-identical Dice on the cube but for completely different reasons. Without the analytic ground truth this distinction would be invisible; with it, all four implementations get a sharper picture of where they actually sit, and you can tell at a glance which converters share a rasterizer.
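The over-fill versus symmetric-error distinction can be checked with arithmetic on the table alone. Since Dice = 2·TP / (V_gt + V_pred), the overlap TP follows from Dice and the relative volume error, and the false-positive and false-negative counts follow from TP. The sketch below assumes the 60 mm cube occupies 60³ = 216,000 voxels at 1 mm spacing and works from the rounded table values, so its outputs are approximate; `boundary_errors` is a hypothetical helper, not part of the package.

```python
def boundary_errors(dice, volume_rel_err, gt_voxels=60 ** 3):
    """Back out (false_positive, false_negative) voxel counts from Dice and volume error."""
    pred_voxels = gt_voxels * (1.0 + volume_rel_err)
    overlap = dice * (gt_voxels + pred_voxels) / 2.0   # TP, from the Dice definition
    false_pos = pred_voxels - overlap                  # predicted but not in ground truth
    false_neg = gt_voxels - overlap                    # in ground truth but not predicted
    return false_pos, false_neg


fill_poly = boundary_errors(dice=0.9835, volume_rel_err=0.0336)
scanline = boundary_errors(dice=0.9833, volume_rel_err=0.0)

print(f"cv2.fillPoly: +{fill_poly[0]:.0f} extra voxels, {fill_poly[1]:.0f} missing")
print(f"C# scanline:  +{scanline[0]:.0f} extra voxels, {scanline[1]:.0f} missing")
```

Under these assumptions the fillPoly column comes out at roughly 7,250 extra voxels and essentially none missing (the small negative residual is rounding in the table inputs), while the scanline column splits about 3,600 voxels each way, in line with the figure quoted above.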
Ground-truth is computed by partial-volume sub-voxel quadrature against the analytic
shape definition (default: 8³ samples per voxel, thresholded at 0.5). The primitive
classes, voxelizer, RTSTRUCT writer, and metric implementations are vendored from
the upstream rtmask_validation project — see tools/UPSTREAM_VERSION.txt for the
exact source commit, and tools/sync_from_upstream.py for the re-vendor script.
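As an illustration of the idea, here is a minimal partial-volume quadrature for a sphere. It is a sketch under stated assumptions, not the vendored voxelizer: the function name is hypothetical, voxel index `i` is assumed to be centred at `i * spacing`, samples use the midpoint rule, and the threshold is taken as "fraction ≥ 0.5". It uses n = 4 rather than the default 8 to keep the example fast.

```python
import numpy as np


def sphere_fraction(shape_zyx, spacing_xyz, center_xyz, radius, n=8):
    """Per-voxel occupancy fraction of a sphere from n^3 midpoint samples per voxel."""
    sx, sy, sz = spacing_xyz
    cx, cy, cz = center_xyz
    nz, ny, nx = shape_zyx
    # Sub-voxel sample offsets in voxel units, symmetric about the voxel centre.
    offsets = (np.arange(n) + 0.5) / n - 0.5
    iz = np.arange(nz)[:, None, None]
    iy = np.arange(ny)[None, :, None]
    ix = np.arange(nx)[None, None, :]
    inside = np.zeros(shape_zyx, dtype=np.int64)
    for oz in offsets:
        dz2 = ((iz + oz) * sz - cz) ** 2
        for oy in offsets:
            dy2 = ((iy + oy) * sy - cy) ** 2
            for ox in offsets:
                dx2 = ((ix + ox) * sx - cx) ** 2
                inside += (dz2 + dy2 + dx2) <= radius ** 2   # count samples inside the sphere
    return inside / n ** 3


fraction = sphere_fraction((25, 25, 25), (1.0, 1.0, 1.0), (12.0, 12.0, 12.0), radius=8.0, n=4)
mask = fraction >= 0.5                      # binarise the partial-volume map
analytic_mm3 = 4.0 / 3.0 * np.pi * 8.0 ** 3

print(f"analytic volume:       {analytic_mm3:.1f} mm^3")
print(f"partial-volume volume: {fraction.sum():.1f} mm^3")
print(f"thresholded mask:      {int(mask.sum())} voxels")
```

Because occupancy is integrated inside each voxel rather than sampled once at its centre, boundary voxels are assigned by how much of the shape they actually contain, which is what lets the reference stay independent of any particular contour rasterizer.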
Apache-2.0

