fix(tests): poll for runner exit instead of single-step assert (FILTER-461)#94
Merged
Merged
Conversation
lucasmundim
approved these changes
May 15, 2026
Contributor
lucasmundim
left a comment
There was a problem hiding this comment.
LGTM — the polling helper is sound and the call-site replacements are correct. Two non-blocking nits below.
…R-461) ## 📋 What does this PR do? Adds a `wait_for_runner_exit(runner, timeout=5.0)` helper in `tests/test_filter.py` that polls `runner.step()` until it returns something other than `False` (or times out), and replaces five fragile single-step shutdown assertions across four `TestFilterOld` tests: - `test_topo_balance_step` - `test_topo_ephemeral_join_step` - `test_topo_doubly_ephemeral_tee_step` (two callsites) - `test_topo_subscribe_wildcard_all_step` Each callsite previously asserted that a single `runner.step()` immediately after `qout.put(None)` would return the exit-codes list (`[0] * N` for N filters). With the helper, the assertion now waits up to 5 s for the runner to actually report termination — preserving the original assertion semantics, but with the same `step → settle → assert` discipline the data-frame iterations already use. ## 🔍 Why is this needed? `test_topo_balance_step` started failing on Python 3.10 only as of [Create Release dry-run run #25936326919](https://github.com/PlainsightAI/openfilter/actions/runs/25936326919), while passing on 3.11 / 3.12 / 3.13. The same test passed on 3.10 for the v0.1.30 release ([run #24800470941](https://github.com/PlainsightAI/openfilter/actions/runs/24800470941), 2026-04-22). Between those, #82 introduced the zero-copy IPC transport with SHM — a perf rewrite of the layer this test exercises. The faster IPC shifted timing margins such that on 3.10's GIL/scheduling, at least one of N filters routinely hasn't yet drained the `None` shutdown sentinel + reported its exit code by the time the next `step()` runs. Result: `step()` returns `False` ("not done") instead of `[0, 0, …]`. This is **test fragility, not a runtime regression** — the 9 data-frame iterations in the same test exercise the full pipeline correctly across 3.10 too. But it was also a **release blocker**: the Create Release workflow gates on every matrix Python version passing run-tests, so any release dispatch was rolling dice on the 3.10 job. See [FILTER-461](https://plainsight-ai.atlassian.net/browse/FILTER-461) for the full root-cause analysis. ## 🧪 How was it tested? Set up a fresh Python 3.10 venv with the project in editable mode (`uv venv .venv-310 --python 3.10`, `uv pip install -e ".[all]"`, install pytest), and ran the failing test: ``` $ .venv-310/bin/pytest tests/test_filter.py::TestFilterOld::test_topo_balance_step -v … tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED [100%] 1 passed in 3.66s ``` Repeated 3× consecutively — all green. Also ran all four patched tests together — all pass: ``` tests/test_filter.py::TestFilterOld::test_topo_balance_step PASSED [ 25%] tests/test_filter.py::TestFilterOld::test_topo_ephemeral_join_step PASSED [ 50%] tests/test_filter.py::TestFilterOld::test_topo_doubly_ephemeral_tee_step PASSED [ 75%] tests/test_filter.py::TestFilterOld::test_topo_subscribe_wildcard_all_step PASSED [100%] 4 passed in 8.72s ``` Final CI confirmation will come from the Create Release re-dispatch after this merges — all four Python matrix versions should pass run-tests consistently. ## 🔗 Related Issues - [FILTER-461](https://plainsight-ai.atlassian.net/browse/FILTER-461) — root-cause analysis and fix proposal (this PR implements the proposed fix). - Unblocks the v0.2.1 release ceremony (validation of the `PLAINSIGHT_PYPI_TOKEN` publish path was caught by this flake before reaching the PyPI upload step). ## ✅ Checklist - [x] I have read and agreed to the terms of the [LICENSE](../LICENSE) - [x] I have read the [CONTRIBUTING](../CONTRIBUTING.md) guide - [x] I have followed the [coding style](../CONTRIBUTING.md#coding-style) - [x] I have signed all commits in compliance with the DCO (`git commit -s`) - [x] I have added or updated **tests** as needed (the fix IS the test change; helper is documented in-line) - [x] I have added or updated **documentation** as needed (helper has a docstring explaining the FILTER-461 context) Signed-off-by: stwilt <swilt@plainsight.ai>
75901fd to
b018167
Compare
6 tasks
stwilt
added a commit
that referenced
this pull request
May 16, 2026
Addresses two non-blocking observations from shingonoide on #96: 1. Adds a bullet under `### Infrastructure` for **#85** (`feat(cascade): tag-triggered bump-PR cascade`), the foundational DT-145 work that replaced `cloudbuild-cascade.yaml`. The previous draft credited the follow-on #93 (cascade upper-bound widening) but not #85, which read as an editorial inconsistency rather than a deliberate scoping choice. 2. Extends the preamble so RELEASE.md is self-contained on the rollup semantics: explicitly notes that DT-145 (#85 / #93) and FILTER-461 (#94) are folded in here despite being merged *after* the original v0.2.0 changelog window. The PR body already documented this; now future readers don't have to reach for the PR. No changes to the FILTER-44x / OTLP TLS / FILTER-461 entries; no version bump; parser regex still matches `v0.2.1 - 2026-05-16`. Signed-off-by: stwilt <swilt@plainsight.ai>
stwilt
added a commit
that referenced
this pull request
May 16, 2026
## 📋 What does this PR do? Edits `RELEASE.md` so the top entry collapses the previously-separate `v0.2.0` (2026-05-08, FILTER-441/442/444/452 declarative-config work) and `v0.2.1` (2026-05-09, OTLP TLS inference) entries under a single `## v0.2.1 - 2026-05-16` header. `VERSION` already says `v0.2.1` and is not modified by this PR. Also folds in two changes that shipped to main *after* the v0.2.0 entry's authored date but before this rollup: - DT-145 cascade-widening (#93) under `### Infrastructure` - FILTER-461 test-fragility fix (#94) under `### Fixed` ## 🔍 Why is this needed? `v0.2.0` was changelogged-but-never-tagged. The Create Release workflow's `changelog-parser-action` reads only the top entry off `RELEASE.md`; dispatching against main as-is would have tagged `v0.2.1` with release notes covering only the OTLP TLS change, leaving the FILTER-44x declarative-config work shipped-but-invisible in the GitHub release UI. Choosing rollup-into-v0.2.1 over the alternative paths: - **Cut v0.2.0 first then v0.2.1 sequentially**: requires a temp branch resetting `VERSION` to `v0.2.0` and dispatching from there — three cascades over a few days, churn across 30 consumer repos. - **Just cut v0.2.1 as-is**: loses the FILTER-44x notes on the GitHub release page; consumers reading the release page would not see the declarative-config work. Roll-up is fastest to prod and gives reviewers a complete view of what's actually shipping under the v0.2.1 tag. ## 🧪 How was it tested? - Parser regex compatibility verified locally — the `changelog-parser-action` regex `^v([0-9]+)\.([0-9]+)\.([0-9]+)(?:\.([0-9]+)-((?:dev|rc|int)))?(?:\s*-\s*(\d{4}-\d{2}-\d{2}))?$` matches `v0.2.1 - 2026-05-16` and extracts `(0, 2, 1, None, None, '2026-05-16')`. - `VERSION` and the new RELEASE.md top header agree on `v0.2.1`, satisfying the workflow's mandatory match check. - No code change; one file modified (`RELEASE.md`), pure content edit. ## 🔗 Related Issues After merge, dispatching `Create Release` from main will tag `v0.2.1`, build + publish to PyPI under the rolled-up notes, and fire `cascade-on-tag.yaml` against all 30 eligible filter-* repos (per the earlier dry-run discover sweep). The cascade-widening shipped in #93 means consumers pinning `>=0.1.30,<0.2.0` will get a constraint-widening PR opened in their repos rather than being silently skipped. Final two ops items before dispatch: - Confirm gitops PR #117 (filter telemetry endpoint `http://` prefix) is merged so newly-released filter pods don't break on TLS-by-default against the plaintext in-cluster collector. - Decide on cascade auto-merge stance: default is on; can override per-cascade via `workflow_dispatch` if a phased rollout is desired. ## ✅ Checklist - [x] I have read and agreed to the terms of the [LICENSE](../LICENSE) - [x] I have read the [CONTRIBUTING](../CONTRIBUTING.md) guide - [x] I have followed the [coding style](../CONTRIBUTING.md#coding-style) - [x] I have signed all commits in compliance with the DCO (`git commit -s`) - [x] I have added or updated **tests** as needed (documentation-only change; no test impact) - [x] I have added or updated **documentation** as needed (this PR *is* the documentation change) --------- Signed-off-by: stwilt <swilt@plainsight.ai>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📋 What does this PR do?
Adds a
wait_for_runner_exit(runner, timeout=5.0)helper intests/test_filter.pythat pollsrunner.step()until it returns something other thanFalse(or times out), and replaces five fragile single-step shutdown assertions across fourTestFilterOldtests:test_topo_balance_steptest_topo_ephemeral_join_steptest_topo_doubly_ephemeral_tee_step(two callsites)test_topo_subscribe_wildcard_all_stepEach callsite previously asserted that a single
runner.step()immediately afterqout.put(None)would return the exit-codes list ([0] * Nfor N filters). With the helper, the assertion now waits up to 5 s for the runner to actually report termination — preserving the original assertion semantics, but with the samestep → settle → assertdiscipline the data-frame iterations already use.🔍 Why is this needed?
test_topo_balance_stepstarted failing on Python 3.10 only as of Create Release dry-run run #25936326919, while passing on 3.11 / 3.12 / 3.13. The same test passed on 3.10 for the v0.1.30 release (run #24800470941, 2026-04-22). Between those, #82 introduced the zero-copy IPC transport with SHM — a perf rewrite of the layer this test exercises. The faster IPC shifted timing margins such that on 3.10's GIL/scheduling, at least one of N filters routinely hasn't yet drained theNoneshutdown sentinel + reported its exit code by the time the nextstep()runs. Result:step()returnsFalse("not done") instead of[0, 0, …].This is test fragility, not a runtime regression — the 9 data-frame iterations in the same test exercise the full pipeline correctly across 3.10 too. But it was also a release blocker: the Create Release workflow gates on every matrix Python version passing run-tests, so any release dispatch was rolling dice on the 3.10 job.
See FILTER-461 for the full root-cause analysis.
🧪 How was it tested?
Set up a fresh Python 3.10 venv with the project in editable mode (
uv venv .venv-310 --python 3.10,uv pip install -e ".[all]", install pytest), and ran the failing test:Repeated 3× consecutively — all green. Also ran all four patched tests together — all pass:
Final CI confirmation will come from the Create Release re-dispatch after this merges — all four Python matrix versions should pass run-tests consistently.
🔗 Related Issues
PLAINSIGHT_PYPI_TOKENpublish path was caught by this flake before reaching the PyPI upload step).✅ Checklist
git commit -s)