fix: IPC build reliability and test improvements #27

Merged
w4bremer merged 11 commits into main from fix/cmake-ipc-reliability on Apr 7, 2026

@w4bremer (Contributor) commented on Apr 5, 2026

Summary

  • NATS cnats fallback: Replace hardcoded nats_static requirement with static→dynamic→fatal fallback chain, including MSVC debug-suffix probes and shared-aware Conan target selection
  • MQTT config template: Drop hardcoded paho-mqtt3as-static component from ApigearConfig.cmake.in — the CMakeLists.txt already handles the fallback
  • Conan fixes: Merge duplicate configure() methods (fPIC removal was dead code); correct nats/* option patterns to cnats/* to match the actual package reference
  • MQTT/NATS tests: Simplify MQTT test teardown, use readiness notifications in NATS tests
  • Deps: Bump objectlink-core-cpp to v0.2.12

Test plan

  • CI generates goldenmaster and verifies no diff (ci_generate.yml)
  • CI builds and tests on Windows/Ubuntu/macOS (ci_build_test.yml)
  • CI tests each feature in isolation (ci_build_features.yml)
  • Conan build with NATS enabled passes
  • Non-Conan CMake build with only dynamic cnats available succeeds

w4bremer added 7 commits April 5, 2026 02:42
Update olink core dependency across all version pins:
templates, apigear fallback, and test_cmake script.
NATS tests polled _is_ready() via wait_for without subscribing
for the readiness callback, causing each wait to hit the full
1000ms timeout. Subscribe to _subscribeForIsReady and notify
the condition variable immediately, matching the MQTT pattern.

Also reduce thread pool from 10 to 4 workers, add graceful
parameter to disconnect(), and lower reply_timeout to 2000ms.

Counter module: 30s → 0.09s (~320x speedup).
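A minimal sketch of the readiness-notification pattern described above, assuming a client that reports readiness through a registered callback. `FakeClient`, `subscribeForIsReady` and `waitUntilReady` are hypothetical stand-ins (the tests themselves use `_subscribeForIsReady`), and the simulated client exists only to keep the example self-contained. Because the callback notifies the condition variable, `wait_for` returns as soon as the predicate holds instead of sleeping until the timeout.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical client that becomes ready asynchronously and reports it
// through registered callbacks (a stand-in for _subscribeForIsReady).
class FakeClient {
public:
    ~FakeClient() {
        if (worker_.joinable()) worker_.join();
    }
    void subscribeForIsReady(std::function<void(bool)> cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        callbacks_.push_back(std::move(cb));
    }
    bool isReady() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ready_;
    }
    // Simulates the connection becoming ready on another thread.
    void becomeReadyAsync(std::chrono::milliseconds delay) {
        worker_ = std::thread([this, delay] {
            std::this_thread::sleep_for(delay);
            std::vector<std::function<void(bool)>> callbacks;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                ready_ = true;
                callbacks = callbacks_;
            }
            for (auto& cb : callbacks) cb(true);  // invoke outside the lock
        });
    }

private:
    std::mutex mutex_;
    bool ready_ = false;
    std::vector<std::function<void(bool)>> callbacks_;
    std::thread worker_;
};

// Waits until the client is ready or the timeout expires.
bool waitUntilReady(FakeClient& client, std::chrono::milliseconds timeout) {
    // Shared state stays alive even if the callback fires after we return.
    struct State {
        std::mutex mutex;
        std::condition_variable cv;
        bool ready = false;
    };
    auto state = std::make_shared<State>();
    // The readiness callback wakes the waiter immediately; without it,
    // wait_for typically sleeps until the timeout before re-checking.
    client.subscribeForIsReady([state](bool isReady) {
        {
            std::lock_guard<std::mutex> lock(state->mutex);
            state->ready = isReady;
        }
        state->cv.notify_all();
    });
    std::unique_lock<std::mutex> lock(state->mutex);
    // client.isReady() covers readiness that happened before subscribing.
    return state->cv.wait_for(lock, timeout, [&] {
        return state->ready || client.isReady();
    });
}
```

Dropping the `subscribeForIsReady` call in this sketch brings back the original problem: the predicate is only re-evaluated when the wait times out (or on a spurious wakeup).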
Remove redundant disconnect wait_for calls in MQTT test teardown.
CWrapper::disconnect() fires callbacks synchronously before
returning, so the condition is already satisfied. Also reduce
test timeout from 2000ms to 1000ms to match other IPC tests.
Replace hardcoded nats_static requirement with
target-existence checks that prefer static, fall
back to dynamic, and fail if neither is found.
Adds MSVC debug-suffix probes and shared-aware
Conan target selection.
The MQTT CMakeLists.txt already handles static and
dynamic fallback via target-existence checks. The
config template should not force a specific component
since exported targets encode the linked variant.
The second configure() silently overwrote the first,
making fPIC removal for shared builds dead code.
Merge both into a single method so fPIC is properly
removed when building shared libraries.
The default_options used "nats/*:" patterns but the
Conan package reference is "cnats/3.9.1". The mismatch
meant options like shared=False were silently ignored,
working only because cnats defaults happened to match.
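The pattern mismatch can be illustrated with shell-style glob matching. This sketch assumes Conan matches option patterns against the full package reference in an fnmatch-like way; `optionPatternApplies` is a hypothetical helper, and because it uses POSIX `fnmatch(3)` it will not build with MSVC.

```cpp
#include <fnmatch.h>  // POSIX only; not available with MSVC
#include <string>

// Assumption: an option pattern such as "cnats/*" applies to a package
// when the glob matches the whole reference, e.g. "cnats/3.9.1".
bool optionPatternApplies(const std::string& pattern, const std::string& reference) {
    return fnmatch(pattern.c_str(), reference.c_str(), 0) == 0;
}
```

Since the glob must match from the first character, `optionPatternApplies("nats/*", "cnats/3.9.1")` is false, while `optionPatternApplies("cnats/*", "cnats/3.9.1")` is true.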
w4bremer added 3 commits April 7, 2026 04:47
Heap-allocate CallbackContext with its own shared_ptr
so the void* pointers cnats holds remain valid after
CWrapper destruction. The closedCB (last callback per
cnats guarantee) signals a condition variable so the
destructor and disconnect wait until all async
callbacks have completed before freeing the context.
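A simplified sketch of that lifetime rule, assuming a C library that keeps only a raw `void*` to the context and guarantees that the closed callback runs last. `CallbackContext` follows the name in the commit message, but its fields, the `Wrapper` class, and the detached thread that plays the role of the library are hypothetical.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical context handed to a C library as a raw void*.
struct CallbackContext {
    std::mutex mutex;
    std::condition_variable cv;
    bool closed = false;
    int messagesSeen = 0;
};

// C-style callbacks, invoked by the library with the closure pointer.
void onMessage(void* closure) {
    auto* ctx = static_cast<CallbackContext*>(closure);
    std::lock_guard<std::mutex> lock(ctx->mutex);
    ++ctx->messagesSeen;
}

void onClosed(void* closure) {
    auto* ctx = static_cast<CallbackContext*>(closure);
    std::lock_guard<std::mutex> lock(ctx->mutex);
    ctx->closed = true;
    // Notify while still holding the lock: once it is released the waiter
    // may free the context, so nothing may touch ctx afterwards.
    ctx->cv.notify_all();
}

class Wrapper {
public:
    Wrapper() : ctx_(std::make_shared<CallbackContext>()) {}
    ~Wrapper() { disconnect(); }

    // Simulates a library delivering callbacks on a thread it owns; the
    // wrapper cannot join that thread, hence the detach.
    void connect() {
        void* closure = ctx_.get();
        std::thread([closure] {
            onMessage(closure);
            onMessage(closure);
            onClosed(closure);  // simulated guarantee: always the last callback
        }).detach();
        connected_ = true;
    }

    // Blocks until the closed callback has run, so the context can be
    // released safely afterwards.
    void disconnect() {
        if (!connected_) return;
        std::unique_lock<std::mutex> lock(ctx_->mutex);
        ctx_->cv.wait(lock, [this] { return ctx_->closed; });
        connected_ = false;
    }

    int messagesSeen() {
        std::lock_guard<std::mutex> lock(ctx_->mutex);
        return ctx_->messagesSeen;
    }

private:
    std::shared_ptr<CallbackContext> ctx_;
    bool connected_ = false;
};
```

Calling `notify_all()` under the lock is deliberate: if it ran after unlocking, the waiter could wake (for example spuriously), see `closed`, return, and free the context before the notifying thread touched the condition variable.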

Copy the shared_ptr<natsSubscription> in unsubscribe
before releasing the lock — the previous code
dereferenced a map iterator after unlock, racing with
cleanSubscription on the dispatch thread.
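The fix follows a common pattern: copy the `shared_ptr` out of the map while the lock is held, then use only the copy after unlocking. A sketch with hypothetical names (`SubscriptionRegistry`, and `Subscription` standing in for `natsSubscription`):

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical subscription handle standing in for natsSubscription.
struct Subscription {
    std::atomic<bool> active{true};
};

class SubscriptionRegistry {
public:
    void add(const std::string& topic) {
        std::lock_guard<std::mutex> lock(mutex_);
        subs_[topic] = std::make_shared<Subscription>();
    }

    // May run concurrently on another (dispatch) thread.
    void cleanSubscription(const std::string& topic) {
        std::lock_guard<std::mutex> lock(mutex_);
        subs_.erase(topic);
    }

    bool unsubscribe(const std::string& topic) {
        std::shared_ptr<Subscription> sub;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = subs_.find(topic);
            if (it == subs_.end()) return false;
            // Copy while locked; the iterator must not be used after unlock
            // because cleanSubscription may erase the entry concurrently.
            sub = it->second;
        }
        // Safe: the copy keeps the object alive even if the entry is gone.
        sub->active = false;  // stand-in for the (possibly slow) unsubscribe call
        return true;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Subscription>> subs_;
};
```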

Drain thread pools on disconnect to prevent workers
from accessing destroyed CWrapper members. Disconnect
before destroying adapters so BaseAdapter teardown
runs while the connection is still alive.
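A minimal sketch of draining a pool on disconnect, assuming a simple queue-based pool. `ThreadPool`, `Connection` and their methods are hypothetical, not the project's classes. `drain()` lets the workers finish every queued task and then joins them, so no task can run against members that are about to be destroyed.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~ThreadPool() { drain(); }

    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return;  // no new work once draining has begun
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Runs all queued tasks to completion, then joins the workers.
    void drain() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_)
            if (w.joinable()) w.join();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: threads start after other members exist
};

// Hypothetical owner: disconnect() drains the pool while counter_ is alive.
class Connection {
public:
    ~Connection() { disconnect(); }
    void onMessage() {
        pool_.post([this] {
            std::lock_guard<std::mutex> lock(mutex_);
            ++counter_;
        });
    }
    void disconnect() { pool_.drain(); }
    int counter() {
        std::lock_guard<std::mutex> lock(mutex_);
        return counter_;
    }

private:
    std::mutex mutex_;
    int counter_ = 0;
    ThreadPool pool_{4};
};
```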

Use readiness notifications in NATS tests to avoid
fixed-delay waits.
Replace getPtr() (shared_from_this, throws bad_weak_ptr
when expired) with weak_from_this() in all Paho callback
contexts. The existing lock() guard in each callback
already handles the expired case.
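The difference between the two calls, sketched with a hypothetical `Client` (C++17): `shared_from_this()` throws `std::bad_weak_ptr` when no `shared_ptr` owns the object, whereas `weak_from_this()` returns an empty `weak_ptr`, so the `lock()` check inside the callback covers both the never-owned and the already-destroyed case.

```cpp
#include <functional>
#include <memory>

// Hypothetical client whose callbacks are held by a longer-lived library.
class Client : public std::enable_shared_from_this<Client> {
public:
    // Builds a callback holding only a weak reference to this client.
    std::function<bool()> makeCallback() {
        std::weak_ptr<Client> weak = weak_from_this();  // never throws (C++17)
        return [weak] {
            auto self = weak.lock();
            if (!self) return false;  // client gone (or never shared): do nothing
            self->handled_ = true;
            return true;
        };
    }
    bool handled() const { return handled_; }

private:
    bool handled_ = false;
};
```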

Guard onSubscribed/onUnsubscribed callbacks with
m_disconnectRequested to prevent invoking adapter
callbacks during teardown. Clear all topic maps in
disconnect() after joining the run thread so late Paho
callbacks find nothing to invoke.
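A sketch of that teardown ordering, assuming a client whose transport can deliver subscription acknowledgements late. `MqttLikeClient` and its members are hypothetical; only the roles of the disconnect flag and the topic maps come from the commit message.

```cpp
#include <atomic>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class MqttLikeClient {
public:
    ~MqttLikeClient() { disconnect(); }

    // Placeholder for the network/run loop thread.
    void start() { runThread_ = std::thread([] {}); }

    void subscribe(const std::string& topic, std::function<void()> onSubscribed) {
        std::lock_guard<std::mutex> lock(mutex_);
        onSubscribed_[topic] = std::move(onSubscribed);
    }

    // Invoked by the transport, possibly late, while teardown is running.
    void handleSubscribed(const std::string& topic) {
        if (disconnectRequested_.load()) return;  // skip adapter callbacks during teardown
        std::function<void()> cb;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = onSubscribed_.find(topic);
            if (it == onSubscribed_.end()) return;
            cb = it->second;
        }
        cb();  // call outside the lock
    }

    void disconnect() {
        disconnectRequested_.store(true);
        if (runThread_.joinable()) runThread_.join();
        // After the join, clear the map so any callback that slips past
        // the flag check finds nothing to invoke.
        std::lock_guard<std::mutex> lock(mutex_);
        onSubscribed_.clear();
    }

private:
    std::atomic<bool> disconnectRequested_{false};
    std::mutex mutex_;
    std::map<std::string, std::function<void()>> onSubscribed_;
    std::thread runThread_;
};
```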

Add m_disconnectRequested check and 10-second timeout
to waitForPendingMessages() — the previous loop could
spin forever because m_connected was set to false only
after the thread was joined.
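A bounded version of the wait might look like this, assuming pending messages are tracked by a mutex-guarded counter. `PendingTracker` and its methods are hypothetical; the 10-second default mirrors the timeout in the commit message.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

class PendingTracker {
public:
    void messageSent() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
    }
    void messageDelivered() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_ > 0) --pending_;
        }
        cv_.notify_all();
    }
    void requestDisconnect() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            disconnectRequested_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if every pending message was delivered. Unlike an
    // unbounded loop, it also stops on a disconnect request or timeout.
    bool waitForPendingMessages(std::chrono::milliseconds timeout = std::chrono::seconds(10)) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait_for(lock, timeout, [this] {
            return pending_ == 0 || disconnectRequested_;
        });
        return pending_ == 0;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool disconnectRequested_ = false;
};
```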

Lock m_subscribedTopicsMutex in handleTextMessage() to
fix a data race with concurrent modifications.

Skip unsubscribeAllTopics() in run() when disconnecting
intentionally — MQTTAsync_disconnect handles cleanup.
w4bremer force-pushed the fix/cmake-ipc-reliability branch from 6ec1e1b to d761532 on April 7, 2026 03:01
TSan adds 5-10x overhead to all mutex operations.
The hardcoded 1000ms timeout in IPC tests is too
tight, causing flaky failures when async round-trips
don't complete in time.

Use __SANITIZE_THREAD__ (GCC) and __has_feature(thread_sanitizer)
(Clang) to detect TSan builds and raise the timeout
to 10 seconds.
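One way to write the detection, as a sketch: the macro `IPC_TEST_UNDER_TSAN` and the constant `kTestTimeout` are hypothetical names, while `__SANITIZE_THREAD__` and `__has_feature(thread_sanitizer)` are the compiler-provided checks. The nested `#if` keeps compilers that lack `__has_feature` from rejecting the expression.

```cpp
#include <chrono>

// GCC defines __SANITIZE_THREAD__ under -fsanitize=thread; Clang exposes
// __has_feature(thread_sanitizer). The inner #if is only evaluated when
// __has_feature exists.
#if defined(__SANITIZE_THREAD__)
#define IPC_TEST_UNDER_TSAN 1
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define IPC_TEST_UNDER_TSAN 1
#endif
#endif
#ifndef IPC_TEST_UNDER_TSAN
#define IPC_TEST_UNDER_TSAN 0
#endif

// Timeout for IPC round-trips: 1 s normally, 10 s under TSan.
constexpr std::chrono::milliseconds kTestTimeout =
    IPC_TEST_UNDER_TSAN ? std::chrono::milliseconds(10000)
                        : std::chrono::milliseconds(1000);
```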
w4bremer force-pushed the fix/cmake-ipc-reliability branch from 1ffed6c to 6a19a96 on April 7, 2026 11:15
w4bremer merged commit b88dc6e into main on Apr 7, 2026
28 checks passed
w4bremer deleted the fix/cmake-ipc-reliability branch on April 7, 2026 11:49