fix(tests): poll for runner exit instead of single-step assert (FILTER-461) by stwilt · Pull Request #94 · PlainsightAI/openfilter

stwilt · 2026-05-15T21:17:12Z

📋 What does this PR do?

Adds a wait_for_runner_exit(runner, timeout=5.0) helper in tests/test_filter.py that polls runner.step() until it returns something other than False (or times out), and replaces five fragile single-step shutdown assertions across four TestFilterOld tests:

test_topo_balance_step
test_topo_ephemeral_join_step
test_topo_doubly_ephemeral_tee_step (two callsites)
test_topo_subscribe_wildcard_all_step

Each callsite previously asserted that a single runner.step() immediately after qout.put(None) would return the exit-codes list ([0] * N for N filters). With the helper, the assertion now waits up to 5 s for the runner to actually report termination — preserving the original assertion semantics, but with the same step → settle → assert discipline the data-frame iterations already use.

🔍 Why is this needed?

test_topo_balance_step started failing on Python 3.10 only as of Create Release dry-run run #25936326919, while passing on 3.11 / 3.12 / 3.13. The same test passed on 3.10 for the v0.1.30 release (run #24800470941, 2026-04-22). Between those, #82 introduced the zero-copy IPC transport with SHM — a perf rewrite of the layer this test exercises. The faster IPC shifted timing margins such that on 3.10's GIL/scheduling, at least one of N filters routinely hasn't yet drained the None shutdown sentinel + reported its exit code by the time the next step() runs. Result: step() returns False ("not done") instead of [0, 0, …].

This is test fragility, not a runtime regression — the 9 data-frame iterations in the same test exercise the full pipeline correctly across 3.10 too. But it was also a release blocker: the Create Release workflow gates on every matrix Python version passing run-tests, so any release dispatch was rolling dice on the 3.10 job.

See FILTER-461 for the full root-cause analysis.

🧪 How was it tested?

Set up a fresh Python 3.10 venv with the project in editable mode (uv venv .venv-310 --python 3.10, uv pip install -e ".[all]", install pytest), and ran the failing test:

$ .venv-310/bin/pytest tests/test_filter.py::TestFilterOld::test_topo_balance_step -v
…
tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED  [100%]
1 passed in 3.66s

Repeated 3× consecutively — all green. Also ran all four patched tests together — all pass:

tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED  [ 25%]
tests/test_filter.py::TestFilterOld::test_topo_ephemeral_join_step PASSED  [ 50%]
tests/test_filter.py::TestFilterOld::test_topo_doubly_ephemeral_tee_step PASSED  [ 75%]
tests/test_filter.py::TestFilterOld::test_topo_subscribe_wildcard_all_step PASSED  [100%]
4 passed in 8.72s

Final CI confirmation will come from the Create Release re-dispatch after this merges — all four Python matrix versions should pass run-tests consistently.

🔗 Related Issues

FILTER-461 — root-cause analysis and fix proposal (this PR implements the proposed fix).
Unblocks the v0.2.1 release ceremony (validation of the PLAINSIGHT_PYPI_TOKEN publish path was caught by this flake before reaching the PyPI upload step).

✅ Checklist

I have read and agreed to the terms of the LICENSE
I have read the CONTRIBUTING guide
I have followed the coding style
I have signed all commits in compliance with the DCO (git commit -s)
I have added or updated tests as needed (the fix IS the test change; helper is documented in-line)
I have added or updated documentation as needed (helper has a docstring explaining the FILTER-461 context)

lucasmundim

LGTM — the polling helper is sound and the call-site replacements are correct. Two non-blocking nits below.

…R-461) ## 📋 What does this PR do? Adds a `wait_for_runner_exit(runner, timeout=5.0)` helper in `tests/test_filter.py` that polls `runner.step()` until it returns something other than `False` (or times out), and replaces five fragile single-step shutdown assertions across four `TestFilterOld` tests: - `test_topo_balance_step` - `test_topo_ephemeral_join_step` - `test_topo_doubly_ephemeral_tee_step` (two callsites) - `test_topo_subscribe_wildcard_all_step` Each callsite previously asserted that a single `runner.step()` immediately after `qout.put(None)` would return the exit-codes list (`[0] * N` for N filters). With the helper, the assertion now waits up to 5 s for the runner to actually report termination — preserving the original assertion semantics, but with the same `step → settle → assert` discipline the data-frame iterations already use. ## 🔍 Why is this needed? `test_topo_balance_step` started failing on Python 3.10 only as of [Create Release dry-run run #25936326919](https://github.com/PlainsightAI/openfilter/actions/runs/25936326919), while passing on 3.11 / 3.12 / 3.13. The same test passed on 3.10 for the v0.1.30 release ([run #24800470941](https://github.com/PlainsightAI/openfilter/actions/runs/24800470941), 2026-04-22). Between those, #82 introduced the zero-copy IPC transport with SHM — a perf rewrite of the layer this test exercises. The faster IPC shifted timing margins such that on 3.10's GIL/scheduling, at least one of N filters routinely hasn't yet drained the `None` shutdown sentinel + reported its exit code by the time the next `step()` runs. Result: `step()` returns `False` ("not done") instead of `[0, 0, …]`. This is **test fragility, not a runtime regression** — the 9 data-frame iterations in the same test exercise the full pipeline correctly across 3.10 too. But it was also a **release blocker**: the Create Release workflow gates on every matrix Python version passing run-tests, so any release dispatch was rolling dice on the 3.10 job. See [FILTER-461](https://plainsight-ai.atlassian.net/browse/FILTER-461) for the full root-cause analysis. ## 🧪 How was it tested? Set up a fresh Python 3.10 venv with the project in editable mode (`uv venv .venv-310 --python 3.10`, `uv pip install -e ".[all]"`, install pytest), and ran the failing test: ``` $ .venv-310/bin/pytest tests/test_filter.py::TestFilterOld::test_topo_balance_step -v … tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED [100%] 1 passed in 3.66s ``` Repeated 3× consecutively — all green. Also ran all four patched tests together — all pass: ``` tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED [ 25%] tests/test_filter.py::TestFilterOld::test_topo_ephemeral_join_step PASSED [ 50%] tests/test_filter.py::TestFilterOld::test_topo_doubly_ephemeral_tee_step PASSED [ 75%] tests/test_filter.py::TestFilterOld::test_topo_subscribe_wildcard_all_step PASSED [100%] 4 passed in 8.72s ``` Final CI confirmation will come from the Create Release re-dispatch after this merges — all four Python matrix versions should pass run-tests consistently. ## 🔗 Related Issues - [FILTER-461](https://plainsight-ai.atlassian.net/browse/FILTER-461) — root-cause analysis and fix proposal (this PR implements the proposed fix). - Unblocks the v0.2.1 release ceremony (validation of the `PLAINSIGHT_PYPI_TOKEN` publish path was caught by this flake before reaching the PyPI upload step). ## ✅ Checklist - [x] I have read and agreed to the terms of the [LICENSE](../LICENSE) - [x] I have read the [CONTRIBUTING](../CONTRIBUTING.md) guide - [x] I have followed the [coding style](../CONTRIBUTING.md#coding-style) - [x] I have signed all commits in compliance with the DCO (`git commit -s`) - [x] I have added or updated **tests** as needed (the fix IS the test change; helper is documented in-line) - [x] I have added or updated **documentation** as needed (helper has a docstring explaining the FILTER-461 context) Signed-off-by: stwilt <swilt@plainsight.ai>

Addresses two non-blocking observations from shingonoide on #96: 1. Adds a bullet under `### Infrastructure` for **#85** (`feat(cascade): tag-triggered bump-PR cascade`), the foundational DT-145 work that replaced `cloudbuild-cascade.yaml`. The previous draft credited the follow-on #93 (cascade upper-bound widening) but not #85, which read as an editorial inconsistency rather than a deliberate scoping choice. 2. Extends the preamble so RELEASE.md is self-contained on the rollup semantics: explicitly notes that DT-145 (#85 / #93) and FILTER-461 (#94) are folded in here despite being merged *after* the original v0.2.0 changelog window. The PR body already documented this; now future readers don't have to reach for the PR. No changes to the FILTER-44x / OTLP TLS / FILTER-461 entries; no version bump; parser regex still matches `v0.2.1 - 2026-05-16`. Signed-off-by: stwilt <swilt@plainsight.ai>

## 📋 What does this PR do? Edits `RELEASE.md` so the top entry collapses the previously-separate `v0.2.0` (2026-05-08, FILTER-441/442/444/452 declarative-config work) and `v0.2.1` (2026-05-09, OTLP TLS inference) entries under a single `## v0.2.1 - 2026-05-16` header. `VERSION` already says `v0.2.1` and is not modified by this PR. Also folds in two changes that shipped to main *after* the v0.2.0 entry's authored date but before this rollup: - DT-145 cascade-widening (#93) under `### Infrastructure` - FILTER-461 test-fragility fix (#94) under `### Fixed` ## 🔍 Why is this needed? `v0.2.0` was changelogged-but-never-tagged. The Create Release workflow's `changelog-parser-action` reads only the top entry off `RELEASE.md`; dispatching against main as-is would have tagged `v0.2.1` with release notes covering only the OTLP TLS change, leaving the FILTER-44x declarative-config work shipped-but-invisible in the GitHub release UI. Choosing rollup-into-v0.2.1 over the alternative paths: - **Cut v0.2.0 first then v0.2.1 sequentially**: requires a temp branch resetting `VERSION` to `v0.2.0` and dispatching from there — three cascades over a few days, churn across 30 consumer repos. - **Just cut v0.2.1 as-is**: loses the FILTER-44x notes on the GitHub release page; consumers reading the release page would not see the declarative-config work. Roll-up is fastest to prod and gives reviewers a complete view of what's actually shipping under the v0.2.1 tag. ## 🧪 How was it tested? - Parser regex compatibility verified locally — the `changelog-parser-action` regex `^v([0-9]+)\.([0-9]+)\.([0-9]+)(?:\.([0-9]+)-((?:dev|rc|int)))?(?:\s*-\s*(\d{4}-\d{2}-\d{2}))?$` matches `v0.2.1 - 2026-05-16` and extracts `(0, 2, 1, None, None, '2026-05-16')`. - `VERSION` and the new RELEASE.md top header agree on `v0.2.1`, satisfying the workflow's mandatory match check. - No code change; one file modified (`RELEASE.md`), pure content edit. ## 🔗 Related Issues After merge, dispatching `Create Release` from main will tag `v0.2.1`, build + publish to PyPI under the rolled-up notes, and fire `cascade-on-tag.yaml` against all 30 eligible filter-* repos (per the earlier dry-run discover sweep). The cascade-widening shipped in #93 means consumers pinning `>=0.1.30,<0.2.0` will get a constraint-widening PR opened in their repos rather than being silently skipped. Final two ops items before dispatch: - Confirm gitops PR #117 (filter telemetry endpoint `http://` prefix) is merged so newly-released filter pods don't break on TLS-by-default against the plaintext in-cluster collector. - Decide on cascade auto-merge stance: default is on; can override per-cascade via `workflow_dispatch` if a phased rollout is desired. ## ✅ Checklist - [x] I have read and agreed to the terms of the [LICENSE](../LICENSE) - [x] I have read the [CONTRIBUTING](../CONTRIBUTING.md) guide - [x] I have followed the [coding style](../CONTRIBUTING.md#coding-style) - [x] I have signed all commits in compliance with the DCO (`git commit -s`) - [x] I have added or updated **tests** as needed (documentation-only change; no test impact) - [x] I have added or updated **documentation** as needed (this PR *is* the documentation change) --------- Signed-off-by: stwilt <swilt@plainsight.ai>

stwilt requested a review from lucasmundim May 15, 2026 21:17

lucasmundim approved these changes May 15, 2026

View reviewed changes

Comment thread tests/test_filter.py Outdated

Comment thread tests/test_filter.py

stwilt force-pushed the fix/filter-461-topo-balance-shutdown-poll branch from 75901fd to b018167 Compare May 15, 2026 23:19

stwilt merged commit d936bc8 into main May 15, 2026
11 checks passed

stwilt deleted the fix/filter-461-topo-balance-shutdown-poll branch May 15, 2026 23:27

stwilt mentioned this pull request May 16, 2026

chore(release): roll up v0.2.0 + v0.2.1 release notes under v0.2.1 #96

Merged

6 tasks

lucasmundim mentioned this pull request May 16, 2026

feat(filter_runtime): windowed async batch dispatch (FILTER-457/458/459/460) #95

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tests): poll for runner exit instead of single-step assert (FILTER-461)#94

fix(tests): poll for runner exit instead of single-step assert (FILTER-461)#94
stwilt merged 1 commit into
mainfrom
fix/filter-461-topo-balance-shutdown-poll

stwilt commented May 15, 2026 •

edited by atlassian Bot

Loading

Uh oh!

lucasmundim left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stwilt commented May 15, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📋 What does this PR do?

🔍 Why is this needed?

🧪 How was it tested?

🔗 Related Issues

✅ Checklist

Uh oh!

lucasmundim left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

stwilt commented May 15, 2026 •

edited by atlassian Bot

Loading