
fix(agent): persist non-zero InstanceId on every reset (XSD + contract fix) #170

Merged
PatrickRitchie merged 4 commits into TrakHound:master from ottobolyos:fix/agent-instanceid-persist-nonzero
May 27, 2026

@ottobolyos
Contributor

Problem

MTConnectAgentApplication.StartAgent reset agentInformation.InstanceId to 0 immediately before calling agentInformation.Save(). That zero is written to agent.information.json on every non-durable or data-item-initializing boot, producing three defect surfaces:

  1. XSD schema violation -- InstanceIdType carries xs:minInclusive value='1'; every file-reading consumer receives a schema-invalid document on each restart.

  2. Persist-and-restore contract -- same-process StopAgent/StartAgent cycles cannot recover the previous session's instanceId, destroying the change-detection signal that clients rely on to detect a buffer flush.

  3. Per-restart uniqueness -- writing 0 trivially collides across every restart, reducing uniqueness to zero on the persisted surface.

The live HTTP/MQTT envelopes were not affected because MTConnectAgent's ctor regenerates _instanceId via CreateInstanceId() when given 0, but the persisted file and any consumer reading it directly were impacted.

Fix

Replace the single assignment on line 389 of MTConnectAgentApplication.cs:

    // Before
    agentInformation.InstanceId = 0;

    // After
    agentInformation.InstanceId = (ulong)UnixDateTime.Now;

This mirrors MTConnectAgentInformation's parameterless constructor at libraries/MTConnect.NET-Common/Agents/MTConnectAgentInformation.cs line 39 and guarantees a non-zero, schema-valid value on every reset.
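
The effect of the one-line change can be sketched in isolation. This is a minimal illustration, not the library's code: AgentInfoSketch and UnixTicksNow are hypothetical stand-ins, and the sketch assumes UnixDateTime.Now is a count of 100 ns ticks since the Unix epoch.

```csharp
using System;

// Hypothetical stand-in for MTConnectAgentInformation; only the field under discussion.
public sealed class AgentInfoSketch
{
    public ulong InstanceId { get; set; }
}

public static class InstanceIdReset
{
    // Assumption: approximates UnixDateTime.Now as 100 ns ticks since 1970-01-01 UTC.
    public static long UnixTicksNow() => (DateTime.UtcNow - DateTime.UnixEpoch).Ticks;

    // Mirrors the "After" line: the reset stores a clock-derived value instead of 0.
    public static void Reset(AgentInfoSketch info)
    {
        info.InstanceId = (ulong)UnixTicksNow();
    }
}
```

Any value produced this way is far above the XSD floor of 1, which is the property the first two tests check.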

Tests

Three new tests in tests/MTConnect.NET-Common-Tests/Agents/AgentInstanceIdPersistenceTests.cs:

  • InstanceId_reset_must_persist_nonzero_to_state_file -- asserts InstanceId > 0 in the written file (RED before fix, GREEN after).
  • InstanceId_reset_must_be_xsd_spec_compliant_minInclusive_1 -- asserts InstanceId >= 1, citing the XSD constraint explicitly (RED before, GREEN after).
  • InstanceId_two_consecutive_resets_in_same_second_collide_under_unix_second_resolution -- documentation test; asserts both post-fix values are >= 1 and logs whether a second-resolution collision occurs (passes both before and after the fix).

Full MTConnect.NET-Common-Tests: 3889 passed, 0 failed, 0 skipped.
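
The write-then-read-back shape of these tests can be sketched as follows. This is an illustration under assumptions: StateSketch and StateFileRoundTrip are hypothetical names, the real agent.information.json carries more fields, and the real tests may serialize differently.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical state record; the real state file has more fields.
public sealed class StateSketch
{
    public ulong InstanceId { get; set; }
}

public static class StateFileRoundTrip
{
    // Writes the state to a unique temp file, reads it back, and deletes the file.
    public static ulong SaveAndReload(ulong instanceId)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".json");
        try
        {
            var state = new StateSketch { InstanceId = instanceId };
            File.WriteAllText(path, JsonSerializer.Serialize(state));

            var loaded = JsonSerializer.Deserialize<StateSketch>(File.ReadAllText(path));
            return loaded == null ? 0UL : loaded.InstanceId;
        }
        finally
        {
            // File.Delete does not throw if the file is already gone.
            File.Delete(path);
        }
    }
}
```

A persisted 0 comes back as 0 and fails an InstanceId >= 1 assertion; a clock-derived value comes back unchanged and passes.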

Deferred follow-up

The current fix uses UnixDateTime.Now (same resolution as the parameterless ctor) and can still produce identical InstanceId values if two resets fall in the same tick window. A counter-based variant, Math.Max(prev + 1, UnixDateTime.Now), would eliminate that risk and is deferred to a follow-up PR.

Conventions

No Co-Authored-By trailer per project conventions; commits are ottobolyos-only.

@ottobolyos ottobolyos moved this to In Progress in MTConnect.NET-Development May 25, 2026
@ottobolyos ottobolyos force-pushed the fix/agent-instanceid-persist-nonzero branch 2 times, most recently from d3fa245 to a334976 Compare May 27, 2026 00:15
Three behavioural tests in AgentInstanceIdPersistenceTests exercise the
reset-and-save path that StartAgent applies on every non-durable or
data-item-initializing boot.

Test 1 (InstanceId_reset_must_persist_nonzero_to_state_file): replays
the StartAgent conditional, writes agent.information.json to a temp path,
reads it back, and asserts InstanceId > 0. Fails RED because the current
code stores 0.

Test 2 (InstanceId_reset_must_be_xsd_spec_compliant_minInclusive_1):
same path but asserts InstanceId >= 1, directly citing the XSD constraint
InstanceIdType xs:minInclusive value='1'. Also fails RED -- 0 is
schema-invalid for every XML/JSON consumer that reads the state file.

Test 3 (InstanceId_two_consecutive_resets_in_same_tick_collide): a
documentation test that asserts the pre-fix trivial-collision case: two
back-to-back SimulateBoot calls both write 0, so they collide. It passes
in this RED commit (and will continue to pass after the fix whenever both
resets read the same tick value). This test documents the
Unix-tick-resolution limitation noted in the bug report; a counter-based
improvement is deferred to a follow-up PR.

SimulateBoot uses a temp-file path to avoid polluting AppDomain.BaseDirectory,
matching the state-file isolation pattern used in existing agent tests.

StartAgent formerly set agentInformation.InstanceId = 0 immediately
before calling agentInformation.Save(), which persists the value to
agent.information.json. That zero value violates three invariants:

  1. XSD schema validity -- InstanceIdType carries xs:minInclusive
     value='1'; every file-reading consumer receives a schema-invalid
     document on each restart.

  2. Persist-and-restore contract -- same-process StopAgent/StartAgent
     cycles cannot recover the previous session's instanceId because the
     file always stores 0, destroying the change-detection signal that
     clients rely on to detect a buffer flush.

  3. Per-restart uniqueness -- storing 0 reduces uniqueness to zero;
     after the fix, UnixDateTime.Now (ticks since Unix epoch) provides
     sub-millisecond resolution, matching the parameterless ctor of
     MTConnectAgentInformation (line 39 of MTConnectAgentInformation.cs).

The fix replaces the single assignment with:

    agentInformation.InstanceId = (ulong)UnixDateTime.Now;

The three AgentInstanceIdPersistenceTests now go GREEN: tests 1 and 2
assert InstanceId > 0 and >= 1 respectively; test 3 (documentation) asserts
both post-fix values are >= 1 and logs whether a second-resolution collision
occurs -- that Unix-tick-level collision risk is a known limitation and is
deferred to a follow-up counter-based PR (e.g. Math.Max(prev + 1, now)).

Full MTConnect.NET-Common-Tests: 3889 passed, 0 failed, 0 skipped.
…ets (RED)

The SysML XMI MTConnectSysMLModel_V2.7.xml line 15608 imposes a MUST-clause:
"instanceId MUST be changed to a different unique number each time the buffer
is cleared." This is a behavioural contract beyond the XSD schema-validity check
(xs:minInclusive value='1'): two resets that produce the same InstanceId are
XSD-valid but XMI-invalid because clients cannot distinguish the restarts from
state-file data alone.

The new test InstanceId_two_consecutive_resets_in_same_second_must_be_strictly_monotonic
injects a fixed tick value (638_400_000_000_000_000) into both SimulateBootAtTime
calls, modelling two agent restarts within the same tick window. The resulting
second == first (638400000000000000) violates the strict-greater assertion and
causes a deterministic RED failure -- the expected state before the GREEN fix.

Two new test helpers accompany the test: SimulateBootAtTime accepts an explicit
ulong now parameter (pre-GREEN, no counter) and SimulateBootMonotonicAtTime
applies Math.Max(prev+1, now) (the target GREEN algorithm). The latter is not
used in this commit; it is called by the updated test in the GREEN commit.
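
The difference between the two helpers can be sketched with simplified in-memory variants. The names follow the helpers described above, but the signatures here are hypothetical: the real helpers work against a state file, whereas these take the previous value directly.

```csharp
using System;

public static class BootSketch
{
    // Pre-GREEN behaviour: the new id is the clock value, whatever was stored before.
    public static ulong SimulateBootAtTime(ulong prev, ulong now) => now;

    // Target GREEN behaviour: the new id is never at or below the previous id.
    // Note: prev + 1 wraps if prev is UInt64.MaxValue; this sketch ignores that edge.
    public static ulong SimulateBootMonotonicAtTime(ulong prev, ulong now) =>
        Math.Max(prev + 1, now);
}
```

With both boots pinned to 638_400_000_000_000_000, the first helper returns the same value twice (the RED failure), while the second returns the fixed value plus one on the second boot.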

The existing collision-documentation test (Test 4) is left unchanged in this RED
commit. The GREEN commit will rename and invert it to assert the new strict-
monotonic contract (no collision allowed), matching the behaviour of the
Math.Max counter-floor implementation.
… floor (GREEN)

The SysML XMI MTConnectSysMLModel_V2.7.xml line 15608 requires: "instanceId MUST
be changed to a different unique number each time the buffer is cleared." The
previous UnixDateTime.Now-only assignment satisfied XSD xs:minInclusive value='1'
but could produce identical values on two consecutive restarts within the same
tick window, violating the XMI MUST-clause.

The replacement algorithm Math.Max(agentInformation.InstanceId + 1, UnixDateTime.Now)
achieves strict monotonicity with the following invariants:
- First boot: InstanceId from the just-constructed object is 0 (or UnixDateTime.Now
  per the parameterless ctor); 0+1=1, Max returns UnixDateTime.Now (wall clock wins).
- Same-second restart: persisted prev == UnixDateTime.Now at the time of second boot;
  prev+1 > now, so Max returns prev+1 -- strictly greater than the previous value.
- Later restart: UnixDateTime.Now > prev+1 in all typical cases; Max returns the
  wall clock -- fresh, time-meaningful, and naturally monotonic.
- Overflow analysis: UnixDateTime.Now is ~6.4e17 ticks (2024-era); UInt64.MaxValue
  is ~1.8e19; headroom exceeds 10,000 years at one restart per tick.
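
The overflow bullet can be checked with a few lines of arithmetic. This is a rough sketch, assuming 100 ns ticks (10,000,000 per second) and the ~6.4e17 starting value quoted above; HeadroomSketch is a hypothetical name.

```csharp
using System;

public static class HeadroomSketch
{
    // Assumption: 100 ns ticks, i.e. the same unit as TimeSpan.TicksPerSecond.
    public const double TicksPerSecond = 10_000_000d;

    // Years until a value starting at `start` and advancing one tick per tick
    // of wall time would reach UInt64.MaxValue.
    public static double YearsOfHeadroom(ulong start)
    {
        double remainingTicks = (double)(ulong.MaxValue - start);
        double seconds = remainingTicks / TicksPerSecond;
        return seconds / (365.25 * 24 * 3600);
    }
}
```

For a start of 640_000_000_000_000_000 this works out to roughly 56,000 years, comfortably above the 10,000-year bound stated in the bullet.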

The test InstanceId_two_consecutive_resets_in_same_second_must_be_strictly_monotonic
is updated from SimulateBootAtTime (no counter, RED) to SimulateBootMonotonicAtTime
(Math.Max counter floor, GREEN): the second boot now reads prevStatePath to obtain
prev, applies Math.Max(prev+1, fixedNow), and produces fixedNow+1 -- strictly
greater than fixedNow written by the first boot.

The former collision-documentation test
InstanceId_two_consecutive_resets_in_same_second_collide_under_unix_second_resolution
is renamed to
InstanceId_two_consecutive_resets_in_same_second_must_be_strictly_monotonic_under_counter_floor
and its assertion is inverted from "documents the collision" to "asserts no
collision", now using SimulateBootMonotonicAtTime with a second-floor-aligned
fixed timestamp. The deferred-fix note in the old test body is removed.
@ottobolyos ottobolyos force-pushed the fix/agent-instanceid-persist-nonzero branch from a334976 to 0ec22ea Compare May 27, 2026 17:40
@ottobolyos ottobolyos marked this pull request as ready for review May 27, 2026 18:05
@PatrickRitchie PatrickRitchie moved this from In Progress to Ready to Merge in MTConnect.NET-Development May 27, 2026
@PatrickRitchie PatrickRitchie merged commit f3200b2 into TrakHound:master May 27, 2026
3 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Merge to Done in MTConnect.NET-Development May 27, 2026
@ottobolyos ottobolyos deleted the fix/agent-instanceid-persist-nonzero branch May 27, 2026 21:01