Skip to content

Conversation

adutra
Copy link
Contributor

@adutra adutra commented Sep 30, 2025

Summary of changes:

  1. Extracted an interface from RequestIdGenerator.
  2. The generateRequestId method now returns a Uni<String> in case custom implementations need to perform I/O or other blocking calls during request ID generation.
  3. Also addressed comments in Generate Request IDs (if not specified); Return Request ID as a Header #2602.

Summary of changes:

1. Extracted an interface from `RequestIdGenerator`.
2. The `generateRequestId` method now returns a `Uni<String>` in case custom implementations need to perform I/O or other blocking calls during request ID generation.
3. Also addressed comments in apache#2602.
@adutra adutra force-pushed the request-id-generator-interface branch from 8f5f2b6 to 6e58c97 Compare September 30, 2025 13:18
@adutra
Copy link
Contributor Author

adutra commented Sep 30, 2025

cc @adnanhemani

: requestIdGenerator.generateRequestId(rc))
.onItem()
.invoke(id -> rc.setProperty(REQUEST_ID_KEY, id))
.invoke(id -> ContextLocals.put(REQUEST_ID_KEY, id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as we do in RealmContextFilter: this places the request ID in the current Vertx context for later retrieval by another event loop thread:

https://quarkus.io/guides/duplicated-context#context-local-data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, ContextLocals seems a global container to hold anything could be used within one request context. Another option is to use a explicit RequestScoped bean to store these, like we did in the call context bean. In that case, it's a more typed solution. How would you compare both approaches?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a slightly orthogonal direction: I don't see us using the variables we've already stored in ContextLocals anywhere. So while I see that we are putting the information in RealmContextFilter, what is our (proposed?) use of this?

Another thought (I'm not sure if this actually answers the question) - if we were to use the request ID later downstream in the application (for example, Event Listeners), would it be better to use ContextLocals or ContainerRequestContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you compare both approaches?

Indeed, a request-scoped bean is undoubtedly better.

So while I see that we are putting the information in RealmContextFilter, what is our (proposed?) use of this?

This was done a while ago, as I thought at that time that we would be leveraging context propagation to "transfer" some information from the initial request scope to, say, the async tasks framework.

It seems though that, since that time, we've been taking the opposite direction: that of avoiding context propagation (due to some issues with bean proxies, etc.)

The situation today is that those calls to ContextLocals are probably not necessary anymore. They can be removed.

Are we OK if I remove the call to ContextLocals in RequestIdFilter in this PR, then remove the other call in RealmContextFilter in a different PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: #2747 (removed ContextLocals and seized the opportunity for some minor cleanup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPDATE: I actually remember now 😄

ContextLocals is required in this specific place:

context.requestContextLocalData(RealmContextFilter.REALM_CONTEXT_KEY);

I even remember asking the Quarkus devs for this feature:

quarkusio/quarkus#47887

So, TLDR: we need to keep ContextLocals in RealmContextFilter. As for RequestIdFilter we can remove it, at least for now.

Comment on lines 33 to 34
* Generates a new request ID. IDs must be fast to generate and unique. If the generation involves
* I/O (which is not recommended), it should be performed asynchronously.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these comments, if I/O involving is not recommended?

Suggested change
* Generates a new request ID. IDs must be fast to generate and unique. If the generation involves
* I/O (which is not recommended), it should be performed asynchronously.
* Generates a new request ID. IDs must be fast to generate and unique.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adutra for the PR — it looks good to me overall!
Would you mind elaborating a bit on the motivation behind it?
In what situation would we need a custom request ID generator?

@adutra
Copy link
Contributor Author

adutra commented Oct 1, 2025

Would you mind elaborating a bit on the motivation behind it?

I'm trying to avoid what happened with the RealmContextFilter a while ago: #1345.

IOW, we want to make sure that whatever technique is used to generate a request ID, we won't see Quarkus complaining about it being blocking.

In what situation would we need a custom request ID generator?

I could see e.g. users storing their request IDs in the database, or using Zookeeper's distributed atomic counter.

dimas-b
dimas-b previously approved these changes Oct 1, 2025
@Override
@Nonnull
public String toString() {
return String.format("%s_%019d", uuid(), counter());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if lexicographic ordering is desired (based on previous conversations), it might be worth using something like <MILLIS_SINCE_EPOCH_PADDED>_<COUNTER_PADDED>_<UUID> WDYT?

Copy link
Contributor

@dimas-b dimas-b Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or <MILLIS_PADDED>_<SMALL_COUNTER>_<UUID> - SMALL_COUNTER meaning something in the range of 10K to avoid calling the system clock too often.

but this might be getting too much into the monotonic clock area 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure lexicographic sort order is desirable for anything else than making it easier for humans to visualize and compare request IDs – so any of your suggestions is fine 😄
How about we get this in without changing the generation logic, and then, we look into ways of making the logic better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think it's fine if the requests are only lexicographically sorted across all requests made per node. IMO request ID sorting doesn't mean much for API clients as they also usually log timestamps.

I'm okay to change it, if you both feel there's something better - but just my two cents 😃

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Oct 1, 2025
@adnanhemani
Copy link
Contributor

In what situation would we need a custom request ID generator?

I could see e.g. users storing their request IDs in the database, or using Zookeeper's distributed atomic counter.

To be clear, I am mostly fine with the changes and asking mainly to learn here - isn't the expected workflow that, in the case users want to store their request IDs in a database prior to the API call, that they would store their request IDs on client-side and pass it into Polaris via the request headers? Can you please explain further in what use case the server-side would actually want to store (and then use) pre-computed request IDs?

I think ZK's distributed atomic counter would be a good use case where making request ID asynchronous would help, although I'm sure most developers nowadays would prefer not to use ZK for a use case like creating request IDs unless they have no other option :)

@adutra
Copy link
Contributor Author

adutra commented Oct 2, 2025

FYI. One thing that annoys me a bit but I didn't want to change it without prior discussion:

The classes RequestIdFilter, RequestIdGenerator and RequestIdResponseFilter are all in the org.apache.polaris.service.tracing package – but the configuration they use is declared in the org.apache.polaris.service.logging package and the configuration option is prefixed with polaris.log.

I find this setup a little confusing, especially for developers.

I can provide a "fix" for that later if you all agree.

adutra added a commit to adutra/polaris that referenced this pull request Oct 2, 2025
Summary of changes:

1. Remove the call to ContextLocals (context: apache#2720 (comment)).
2. Don't include the exception's message in the response as it can leak details about Polaris internals.
3. Add a small test for success and failure cases.
Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adutra !

@adutra adutra merged commit d2a607a into apache:main Oct 3, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Oct 3, 2025
@adutra adutra deleted the request-id-generator-interface branch October 3, 2025 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants