Skip to content

Handle correctly subscription messages#2201

Merged
palfrey merged 1 commit intoTraceMachina:mainfrom
palfrey:redis-subscription-messages
Mar 11, 2026
Merged

Handle correctly subscription messages#2201
palfrey merged 1 commit intoTraceMachina:mainfrom
palfrey:redis-subscription-messages

Conversation

@palfrey
Copy link
Member

@palfrey palfrey commented Mar 11, 2026

Description

I was seeing error messages of the following two forms

{"timestamp":"2026-03-11T12:19:00.042579Z","level":"ERROR","fields":{"message":"Expected exactly one message on subscriber_channel","push_info":"PushInfo { kind: PSubscribe, data: [bulk-string('\"scheduler_key_change\"'), int(1)] }"},"target":"nativelink_store::redis_store","span":{"name":"redis_subscribe_spawn"},"spans":[{"name":"redis_subscribe_spawn"}]}

and

{"timestamp":"2026-03-11T12:19:07.346092Z","level":"ERROR","fields":{"message":"Expected exactly one message on subscriber_channel","push_info":"PushInfo { kind: PMessage, data: [bulk-string('\"scheduler_key_change\"'), bulk-string('\"scheduler_key_change\"'), bulk-string('\"e5307a240708b532a929004354399d84fd286798a39ef37da8d8ae6f57190b7a-1048576\"')] }"},"target":"nativelink_store::redis_store","span":{"name":"redis_subscribe_spawn"},"spans":[{"name":"redis_subscribe_spawn"}]}

These were both there before #2190, but that moved them from "used only in clusters" to "used all the time", which made the problem much worse. This PR properly handles PMessage's and ignores all the other PushInfo cases as we don't really care about them. They should mostly be just PSubscribe's so we log the rest as warnings for easier debug later.

Type of change

Please delete options that aren't relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

bazel test //...

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@palfrey palfrey marked this pull request as ready for review March 11, 2026 13:21
@palfrey
Copy link
Member Author

palfrey commented Mar 11, 2026

/build-image nativelink-worker-init

@palfrey
Copy link
Member Author

palfrey commented Mar 11, 2026

/build-image

@github-actions
Copy link

Image built and pushed!

ghcr.io/TraceMachina/nativelink-worker-init:1156efe

@github-actions
Copy link

Image built and pushed!

ghcr.io/TraceMachina/nativelink:1156efe

Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisstaite-menlo reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: 0 of 1 LGTMs obtained, and all files reviewed.


nativelink-store/src/redis_store.rs line 1366 at r1 (raw file):

                                },
                                maybe_push_info = local_subscriber_channel.next() => {
                                    if let Some(push_info) = maybe_push_info {

Did the existing code never work for cluster mode either or will this change break it in cluster?

@MarcusSorealheis
Copy link
Collaborator

@chrisstaite-menlo i think. Ever worked is the answer but @palfrey needs to be the source of truth.

@palfrey
Copy link
Member Author

palfrey commented Mar 11, 2026

Did the existing code never work for cluster mode either or will this change break it in cluster?

I suspect the former. Cluster mode hasn't been well tested - we don't use it in our cloud or in most of the integration testing. It is now tested as part of redis_store_tester, but that's limited. I'm fairly confident this new code is more correct than the previous forms, but there's probably still issues in cluster mode.

@palfrey palfrey merged commit 2ea428b into TraceMachina:main Mar 11, 2026
28 of 29 checks passed
@palfrey palfrey deleted the redis-subscription-messages branch March 11, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants