@Sxnan Sxnan commented Jan 31, 2026

Linked issue: #xxx

This PR fixes issues with the Kafka ActionStateStore that prevented actions from being correctly skipped after checkpoint recovery:

  1. Fix ActionState deserialization failure: Removed @JsonIgnore from PythonEvent.getEvent() so that Python event bytes are persisted in ActionState. Without this, recovered ActionState had null event bytes, causing TypeError: a bytes-like object is required, not 'NoneType' after recovery.

  2. Fix ActionStateStore not initialized during state recovery: ActionStateStore initialization now also runs in initializeState(), so the store is available when state is rebuilt from recovery markers.

  3. Fix state key mismatch after recovery: Changed Python Event.id from random UUID to deterministic content-based UUID (MD5 hash). This ensures the same event produces the same ActionState key across restarts, enabling proper state lookup and divergence detection.

  4. Fix Kafka consumer partition assignment: Use consumer.assign() instead of subscribe() for explicit partition control during state rebuild.
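
To illustrate item 3, here is a minimal sketch of deriving an ID from event content instead of from randomness. It is not the code in this PR: the function name `content_based_id`, the `event_type`/`payload` fields, and the choice of canonical JSON as the hashed representation are all assumptions made for the example. It only shows why an MD5-derived UUID stays stable across restarts while `uuid.uuid4()` does not.

```python
import hashlib
import json
import uuid


def content_based_id(event_type: str, payload: dict) -> uuid.UUID:
    """Hypothetical helper: derive a UUID from the event's content."""
    # Canonical JSON (sorted keys, fixed separators) so that equal content
    # always yields identical bytes, regardless of dict insertion order.
    canonical = json.dumps(
        {"type": event_type, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    # An MD5 digest is 16 bytes, which is exactly the size of a UUID.
    digest = hashlib.md5(canonical).digest()
    # version=3 marks the result as a name-based (MD5) UUID.
    return uuid.UUID(bytes=digest, version=3)


# Same content gives the same ID, even if the keys arrive in another order.
first = content_based_id("input", {"x": 1, "y": 2})
replayed = content_based_id("input", {"y": 2, "x": 1})
print(first == replayed)

# Different content gives a different ID.
print(first == content_based_id("input", {"x": 2, "y": 2}))
```

With a random ID, an event replayed after recovery would get a new key and its persisted ActionState could not be found; with a content-derived ID the lookup key is reproducible. The trade-off is that two events with identical content map to the same ID.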

Tests

Manually verified with a local Kafka instance. E2E tests will be added in a follow-up PR.

API

No public API changes.

Documentation

  • doc-needed
  • doc-not-needed (selected)
  • doc-included

@github-actions github-actions bot added the priority/major, fixVersion/0.2.0, and doc-not-needed labels Jan 31, 2026
- // Byte array should not be in the log
- assertThat(eventNode.has("event")).isFalse();
+ // Byte array should be serialized for ActionState persistence
+ assertThat(eventNode.has("event")).isTrue();

This test case should not be changed. It verifies serialization for event logs, not for action states, so its expected result should stay as it was. This change is the cause of the CI failures.

@xintongsong xintongsong left a comment

There's one minor issue, which I've fixed with an extra commit. Other than that, LGTM.

@xintongsong xintongsong merged commit 4bd4379 into apache:main Feb 1, 2026
20 checks passed
xintongsong pushed a commit that referenced this pull request Feb 1, 2026
@xintongsong

Ported to release-0.2 in 9ada100

@Sxnan Sxnan deleted the fix-action-state branch February 1, 2026 02:34