Fixed a bug where the topic leader is not refreshed in the same metadata call even if the broker is present. #4315

Merged
2 commits merged into master from dev_fix-first-produce-taking-1s on Jun 16, 2023

Conversation

@pranavrth (Member) commented Jun 12, 2023

@pranavrth force-pushed the dev_fix-first-produce-taking-1s branch 3 times, most recently from 7d64754 to 264cd86, on June 14, 2023 20:47
@pranavrth marked this pull request as ready for review on June 14, 2023 20:52
@pranavrth force-pushed the dev_fix-first-produce-taking-1s branch 3 times, most recently from 7493e67 to b1450e8, on June 15, 2023 07:35
@emasab (Collaborator) left a comment:

Great catch Pranav!
Just some small changes to the test and then it's ready.

tests/0125-immediate_flush.c (review thread resolved)
tests/0125-immediate_flush.c (outdated, resolved)
qsort(mdi->topics[i].partitions, md->topics[i].partition_cnt,
      sizeof(*mdi->topics[i].partitions),
      rd_kafka_metadata_partition_internal_cmp);

if (compute_racks)
@pranavrth (Member, PR author) commented:

I have made this change, but I think this should be part of the first topic loop itself, since tbuf is populated only in that loop. @milindl - can you confirm this?

@emasab (Collaborator) commented Jun 15, 2023:

@pranavrth It seems to work in both places because it doesn't use rd_kafka_broker_find_by_nodeid, which was causing the problem; putting it there just means we also do it for blacklisted topics. Fair enough: it's not using it, but it has the same guarantees as the other topics in the metadata_internal struct.

@emasab (Collaborator) commented Jun 15, 2023:

Remember to add the changelog entry too, with the bug description

@milindl (Contributor) commented:

Looks good to me now, with blacklisted topics being ignored.

@emasab (Collaborator) commented:

If we chose the previous way for performance, we should also avoid sorting in the first loop. I see there can be many blacklisted topics, since the blacklist entries are regular expressions.

@milindl (Contributor) commented:

Looking at #1834, it seems like the topic blacklist already causes some secondary effects, such as the topic not being assigned to other clients if the leader has it in its blacklist. So having the rack computation below the blacklist if-clause should be okay.
But it's weird that we return the blacklisted topics in the metadata pointer even though the documentation says: "Topic blacklist, a comma-separated list of regular expressions for matching topic names that should be ignored in broker metadata information as if the topics did not exist."

@milindl (Contributor) commented:

On the whole, I think moving both the sorts to just before the compute-racks if-clause is the best way: it preserves the pre-2.2.0 behaviour and has no performance impact. What do you think, @emasab, @pranavrth?

@emasab (Collaborator) commented Jun 15, 2023:

You mean everything starting from the blacklist condition in the second loop? It's ok for me

@milindl (Contributor) commented Jun 15, 2023:

Yes, that's exactly what I meant, thanks

@pranavrth (Member, PR author) commented:

I have updated the PR. Moved the sorting into the second loop as well.
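
For readers following the thread, here is a small self-contained sketch (not the actual librdkafka code; all types and helpers below are simplified stand-ins) of the loop shape agreed on above: the blacklist check runs first, and partition sorting and rack computation are done only for topics that survive it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct partition {
        int id;
        int leader;
};

struct topic {
        const char *name;
        struct partition *partitions;
        int partition_cnt;
};

/* Stand-in for the internal partition comparator: sort by partition id. */
static int partition_cmp(const void *a, const void *b) {
        const struct partition *pa = a, *pb = b;
        return (pa->id > pb->id) - (pa->id < pb->id);
}

/* Stand-in for the topic blacklist check (a simple prefix match here,
 * rather than the comma-separated regex list librdkafka uses). */
static int topic_is_blacklisted(const struct topic *t) {
        return strncmp(t->name, "__", 2) == 0;
}

static void process_topics(struct topic *topics, int topic_cnt,
                           int compute_racks) {
        int i;

        for (i = 0; i < topic_cnt; i++) {
                /* Blacklisted topics are skipped before any per-topic work,
                 * so neither the sort nor the rack computation is paid
                 * for them. */
                if (topic_is_blacklisted(&topics[i]))
                        continue;

                qsort(topics[i].partitions, topics[i].partition_cnt,
                      sizeof(*topics[i].partitions), partition_cmp);

                if (compute_racks) {
                        /* Rack computation would go here; omitted. */
                }
        }
}

int main(void) {
        struct partition parts[] = {{2, 1}, {0, 2}, {1, 1}};
        struct topic topics[] = {
                {"orders", parts, 3},
                {"__consumer_offsets", NULL, 0},
        };

        process_topics(topics, 2, 1);
        printf("first partition id after sort: %d\n",
               topics[0].partitions[0].id);
        return 0;
}

Since blacklisted topics never reach the qsort, the extra sorting cost discussed above is not paid for them.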

tests/test.c (outdated, resolved)
@pranavrth force-pushed the dev_fix-first-produce-taking-1s branch from 92dd3cb to d207996 on June 15, 2023 13:24
@milindl (Contributor) left a comment:

LGTM after the discussion yesterday

@pranavrth merged commit 25da531 into master on Jun 16, 2023 (1 check passed)
@pranavrth deleted the dev_fix-first-produce-taking-1s branch on June 16, 2023 04:55
@r-owen commented Jul 27, 2023:

Was this supposed to make it into librdkafka 2.2.0? Because I am still seeing the same issue there.

@jzvelc commented Oct 20, 2023:

Any news on this one? I have the same problem with librdkafka 2.2.0 in Python.

Successfully merging this pull request may close these issues.

First call to Producer.flush takes 1 second for each topic
5 participants