-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add subscription backlog size info for topicstats. #9302
Conversation
doc = "Enable expose the backlog size fro each subscription when generating stats.\n" + | ||
" Locking is used for fetching the status so default to false." | ||
) | ||
private boolean exposeSubscriptionBacklokSizeInPrometheus = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private boolean exposeSubscriptionBacklokSizeInPrometheus = false; | |
private boolean exposeSubscriptionBacklogSizeInPrometheus = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to introduce a new configuration for this metric? If users enabled the exposeTopicLevelMetricsInPrometheus
, this metric should be exposed? If it will bring a lot of expenses, we should consider adding a new configuration such as exposePreciseBacklogInPrometheus
. What do you think @zymap @MarvinCai?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codelipenghui
I think it is actually quiet expensive, it's using existing estimateBacklogFromPosition(PositionImpl pos) to get the backlog size, which requires locking the whole MLedger object. And this will happen for every subscription, periodic metric generation with this subscription backlog size could affect performance and could potentially compete with the ledger trimming task, so I think we should add a flag for it and disable by default.
@@ -764,10 +766,11 @@ default void delete(String topic, boolean force) throws PulsarAdminException { | |||
* @throws PulsarAdminException | |||
* Unexpected error | |||
*/ | |||
TopicStats getStats(String topic, boolean getPreciseBacklog) throws PulsarAdminException; | |||
TopicStats getStats(String topic, boolean getPreciseBacklog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't change the original API, that's will bring the compatibility issue with the PulsarAdmin. Let's create a new API for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zymap
From what I've seen, this is only used by Admin CLI (CmdTopics.java), which provides default value . for all needed parameters, e.g. getPreciseBacklog and subscriptionBacklogSize. And there's another default implementation of TopicStats getStats(String topic) which is used in some other places, it also has default value provided for getPreciseBacklog and subscriptionBacklogSize.
So it shouldn't create compatibility issue for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking this is a user-faced API, user can call it from the PulsarAdmin
. This change might be breaking their code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zymap added a default default TopicStats getStats(String topic, boolean getPreciseBacklog)
to make it backward compatible. And existing customer rely on this API won't be affected.
@zymap can you take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me.
/pulsarbot run-failure-checks |
2 similar comments
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@MarvinCai Can you rebase it? |
/pulsarbot run-failure-checks |
3 similar comments
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
2 similar comments
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
3 similar comments
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@sijie done, rebased upon newer version master and resolved conflict. |
@MarvinCai There has a new conflicting file need to resolve... |
/pulsarbot run-failure-checks |
3 similar comments
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
There has something wrong with the SQL tests and @gaoran10 is working on it. |
/pulsarbot run-failure-checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
/pulsarbot run-failure-checks |
@Override | ||
void run() throws Exception { | ||
String topic = validateTopicName(params); | ||
print(getTopics().getPartitionedStats(topic, perPartition, getPreciseBacklog)); | ||
print(topics.getPartitionedStats(topic, perPartition, getPreciseBacklog, subscriptionBacklogSize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarvinCai it seems that getTopics()
was changed to topics
here unintentionally. Would you mind fixing this?
Fixes #9254 Add ability to fetch backlog size for subscription, add flag in topic-stats partitioned-topic-stats for getting backlog size for subscriptions. Sample output ``` ./pulsar-admin topics partitioned-stats zxc-t/zxc-ns/zxc-p -gpb -sbs { "msgRateIn" : 0.0, "msgThroughputIn" : 0.0, "msgRateOut" : 0.0, "msgThroughputOut" : 0.0, "bytesInCounter" : 0, "msgInCounter" : 0, "bytesOutCounter" : 0, "msgOutCounter" : 0, "averageMsgSize" : 0.0, "msgChunkPublished" : false, "storageSize" : 875, "backlogSize" : 585, "publishers" : [ ], "waitingPublishers" : 0, "subscriptions" : { "zxc-sub-1" : { "msgRateOut" : 0.0, "msgThroughputOut" : 0.0, "bytesOutCounter" : 0, "msgOutCounter" : 0, "msgRateRedeliver" : 0.0, "chuckedMessageRate" : 0, "msgBacklog" : 10, "backlogSize" : 585, "msgBacklogNoDelayed" : 10, "blockedSubscriptionOnUnackedMsgs" : false, "msgDelayed" : 0, "unackedMessages" : 0, "msgRateExpired" : 0.0, "totalMsgExpired" : 0, "lastExpireTimestamp" : 0, "lastConsumedFlowTimestamp" : 0, "lastConsumedTimestamp" : 0, "lastAckedTimestamp" : 0, "lastMarkDeleteAdvancedTimestamp" : 0, "consumers" : [ ], "isDurable" : true, "isReplicated" : false, "consumersAfterMarkDeletePosition" : { }, "nonContiguousDeletedMessagesRanges" : 0, "nonContiguousDeletedMessagesRangesSerializedSize" : 0 }, "zxc-sub-2" : { "msgRateOut" : 0.0, "msgThroughputOut" : 0.0, "bytesOutCounter" : 0, "msgOutCounter" : 0, "msgRateRedeliver" : 0.0, "chuckedMessageRate" : 0, "msgBacklog" : 5, "backlogSize" : 295, "msgBacklogNoDelayed" : 5, "blockedSubscriptionOnUnackedMsgs" : false, "msgDelayed" : 0, "unackedMessages" : 0, "msgRateExpired" : 0.0, "totalMsgExpired" : 0, "lastExpireTimestamp" : 0, "lastConsumedFlowTimestamp" : 0, "lastConsumedTimestamp" : 0, "lastAckedTimestamp" : 0, "lastMarkDeleteAdvancedTimestamp" : 0, "consumers" : [ ], "isDurable" : true, "isReplicated" : false, "consumersAfterMarkDeletePosition" : { }, "nonContiguousDeletedMessagesRanges" : 0, "nonContiguousDeletedMessagesRangesSerializedSize" : 0 } }, "replication" : { }, "nonContiguousDeletedMessagesRanges" : 0, "nonContiguousDeletedMessagesRangesSerializedSize" : 0, "metadata" : { "partitions" : 5 }, "partitions" : { } } ``` In ManagedLedgerImpl add API to get backlog size starting from specific position. In Admin Rest API and CLI add option to get subscription backlog. This change added tests and can be verified as follows: Added check in existing test to verify backlog size. (cherry picked from commit ab94743)
This version update is convenient for tests in real environment since there's no binary download url for original pulsar `2.8.0-rc-202101252233`. This PR fixes the API incompatibility problems that are introduced by apache/pulsar#9397 and apache/pulsar#9302. Another significant change between these two versions is apache/pulsar#9338, which introduced metadata-store API for cluster resources. This PR fixed the test failure caused by it as well. Since KoP `tests` module only uses one `MockZooKeeper` to manage z-nodes, see `KopProtocolHandlerTestBase#createMockZooKeeper`, the mocked `createConfigurationMetadataStore` method returns `mockedZooKeeper` here instead of a `mockedZooKeeperGlobal` like what Pulsar did in `MockedPulsarServiceBaseTest`. Besides, there's a test bug in `testBrokerHandleTopicMetadataRequest` that was not exposed by the previous Pulsar. This PR fixes it.
Fixes #9254
Motivation
Add ability to fetch backlog size for subscription, add flag in topic-stats partitioned-topic-stats for getting backlog size for subscriptions.
Sample output
Modifications
In ManagedLedgerImpl add API to get backlog size starting from specific position.
In Admin Rest API and CLI add option to get subscription backlog.
Verifying this change
This change added tests and can be verified as follows:
Added check in existing test to verify backlog size.
Does this pull request potentially affect one of the following parts:
Documentation