Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subscription backlog size info for topicstats. #9302

Merged
merged 11 commits into from
Feb 8, 2021

Conversation

MarvinCai
Copy link
Contributor

@MarvinCai MarvinCai commented Jan 25, 2021

Fixes #9254

Motivation

Add ability to fetch backlog size for subscription, add flag in topic-stats partitioned-topic-stats for getting backlog size for subscriptions.
Sample output

./pulsar-admin topics partitioned-stats zxc-t/zxc-ns/zxc-p -gpb -sbs
{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "bytesInCounter" : 0,
  "msgInCounter" : 0,
  "bytesOutCounter" : 0,
  "msgOutCounter" : 0,
  "averageMsgSize" : 0.0,
  "msgChunkPublished" : false,
  "storageSize" : 875,
  "backlogSize" : 585,
  "publishers" : [ ],
  "waitingPublishers" : 0,
  "subscriptions" : {
    "zxc-sub-1" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "bytesOutCounter" : 0,
      "msgOutCounter" : 0,
      "msgRateRedeliver" : 0.0,
      "chuckedMessageRate" : 0,
      "msgBacklog" : 10,
      "backlogSize" : 585,
      "msgBacklogNoDelayed" : 10,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ ],
      "isDurable" : true,
      "isReplicated" : false,
      "consumersAfterMarkDeletePosition" : { },
      "nonContiguousDeletedMessagesRanges" : 0,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 0
    },
    "zxc-sub-2" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "bytesOutCounter" : 0,
      "msgOutCounter" : 0,
      "msgRateRedeliver" : 0.0,
      "chuckedMessageRate" : 0,
      "msgBacklog" : 5,
      "backlogSize" : 295,
      "msgBacklogNoDelayed" : 5,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ ],
      "isDurable" : true,
      "isReplicated" : false,
      "consumersAfterMarkDeletePosition" : { },
      "nonContiguousDeletedMessagesRanges" : 0,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 0
    }
  },
  "replication" : { },
  "nonContiguousDeletedMessagesRanges" : 0,
  "nonContiguousDeletedMessagesRangesSerializedSize" : 0,
  "metadata" : {
    "partitions" : 5
  },
  "partitions" : { }
}

Modifications

In ManagedLedgerImpl add API to get backlog size starting from specific position.
In Admin Rest API and CLI add option to get subscription backlog.

Verifying this change

  • Make sure that the change passes the CI checks.
    This change added tests and can be verified as follows:
    Added check in existing test to verify backlog size.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: yes (added option to get subscription's backlog size when getting topic-stats/partitioned-topic-stats)
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: yes (added option to get subscription's backlog size when getting topic-stats/partitioned-topic-stats)
  • Anything that affects deployment: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs / JavaDocs

doc = "Enable expose the backlog size fro each subscription when generating stats.\n" +
" Locking is used for fetching the status so default to false."
)
private boolean exposeSubscriptionBacklokSizeInPrometheus = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private boolean exposeSubscriptionBacklokSizeInPrometheus = false;
private boolean exposeSubscriptionBacklogSizeInPrometheus = false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to introduce a new configuration for this metric? If users enabled the exposeTopicLevelMetricsInPrometheus, this metric should be exposed? If it will bring a lot of expenses, we should consider adding a new configuration such as exposePreciseBacklogInPrometheus. What do you think @zymap @MarvinCai?

Copy link
Contributor Author

@MarvinCai MarvinCai Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codelipenghui
I think it is actually quiet expensive, it's using existing estimateBacklogFromPosition(PositionImpl pos) to get the backlog size, which requires locking the whole MLedger object. And this will happen for every subscription, periodic metric generation with this subscription backlog size could affect performance and could potentially compete with the ledger trimming task, so I think we should add a flag for it and disable by default.

@@ -764,10 +766,11 @@ default void delete(String topic, boolean force) throws PulsarAdminException {
* @throws PulsarAdminException
* Unexpected error
*/
TopicStats getStats(String topic, boolean getPreciseBacklog) throws PulsarAdminException;
TopicStats getStats(String topic, boolean getPreciseBacklog,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't change the original API, that's will bring the compatibility issue with the PulsarAdmin. Let's create a new API for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zymap
From what I've seen, this is only used by Admin CLI (CmdTopics.java), which provides default value . for all needed parameters, e.g. getPreciseBacklog and subscriptionBacklogSize. And there's another default implementation of TopicStats getStats(String topic) which is used in some other places, it also has default value provided for getPreciseBacklog and subscriptionBacklogSize.
So it shouldn't create compatibility issue for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking this is a user-faced API, user can call it from the PulsarAdmin. This change might be breaking their code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zymap added a default default TopicStats getStats(String topic, boolean getPreciseBacklog) to make it backward compatible. And existing customer rely on this API won't be affected.

@zymap zymap added this to the 2.8.0 milestone Jan 25, 2021
@MarvinCai
Copy link
Contributor Author

@zymap can you take another look

Copy link
Member

@zymap zymap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

2 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@sijie
Copy link
Member

sijie commented Jan 30, 2021

@MarvinCai Can you rebase it?

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

3 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

2 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

3 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

@sijie done, rebased upon newer version master and resolved conflict.

@zymap
Copy link
Member

zymap commented Feb 2, 2021

@MarvinCai There has a new conflicting file need to resolve...

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

3 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@zymap
Copy link
Member

zymap commented Feb 4, 2021

There has something wrong with the SQL tests and @gaoran10 is working on it.

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@gaoran10 gaoran10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@sijie sijie merged commit ab94743 into apache:master Feb 8, 2021
@Override
void run() throws Exception {
String topic = validateTopicName(params);
print(getTopics().getPartitionedStats(topic, perPartition, getPreciseBacklog));
print(topics.getPartitionedStats(topic, perPartition, getPreciseBacklog, subscriptionBacklogSize));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarvinCai it seems that getTopics() was changed to topics here unintentionally. Would you mind fixing this?

@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Feb 18, 2021
codelipenghui pushed a commit that referenced this pull request Feb 18, 2021
Fixes #9254

Add ability to fetch backlog size for subscription, add flag in topic-stats partitioned-topic-stats for getting backlog size for subscriptions.
Sample output
```
./pulsar-admin topics partitioned-stats zxc-t/zxc-ns/zxc-p -gpb -sbs
{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "bytesInCounter" : 0,
  "msgInCounter" : 0,
  "bytesOutCounter" : 0,
  "msgOutCounter" : 0,
  "averageMsgSize" : 0.0,
  "msgChunkPublished" : false,
  "storageSize" : 875,
  "backlogSize" : 585,
  "publishers" : [ ],
  "waitingPublishers" : 0,
  "subscriptions" : {
    "zxc-sub-1" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "bytesOutCounter" : 0,
      "msgOutCounter" : 0,
      "msgRateRedeliver" : 0.0,
      "chuckedMessageRate" : 0,
      "msgBacklog" : 10,
      "backlogSize" : 585,
      "msgBacklogNoDelayed" : 10,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ ],
      "isDurable" : true,
      "isReplicated" : false,
      "consumersAfterMarkDeletePosition" : { },
      "nonContiguousDeletedMessagesRanges" : 0,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 0
    },
    "zxc-sub-2" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "bytesOutCounter" : 0,
      "msgOutCounter" : 0,
      "msgRateRedeliver" : 0.0,
      "chuckedMessageRate" : 0,
      "msgBacklog" : 5,
      "backlogSize" : 295,
      "msgBacklogNoDelayed" : 5,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ ],
      "isDurable" : true,
      "isReplicated" : false,
      "consumersAfterMarkDeletePosition" : { },
      "nonContiguousDeletedMessagesRanges" : 0,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 0
    }
  },
  "replication" : { },
  "nonContiguousDeletedMessagesRanges" : 0,
  "nonContiguousDeletedMessagesRangesSerializedSize" : 0,
  "metadata" : {
    "partitions" : 5
  },
  "partitions" : { }
}
```

In ManagedLedgerImpl add API to get backlog size starting from specific position.
In Admin Rest API and CLI add option to get subscription backlog.

This change added tests and can be verified as follows:
Added check in existing test to verify backlog size.

(cherry picked from commit ab94743)
jiazhai pushed a commit to streamnative/kop that referenced this pull request Feb 26, 2021
This version update is convenient for tests in real environment since there's no binary download url for original pulsar `2.8.0-rc-202101252233`.

This PR fixes the API incompatibility problems that are introduced by apache/pulsar#9397 and apache/pulsar#9302.

Another significant change between these two versions is apache/pulsar#9338, which introduced metadata-store API for cluster resources. This PR fixed the test failure caused by it as well. Since KoP `tests` module only uses one `MockZooKeeper` to manage z-nodes, see `KopProtocolHandlerTestBase#createMockZooKeeper`, the mocked `createConfigurationMetadataStore` method returns `mockedZooKeeper` here instead of a `mockedZooKeeperGlobal` like what Pulsar did in `MockedPulsarServiceBaseTest`.

Besides, there's a test bug in `testBrokerHandleTopicMetadataRequest` that was not exposed by the previous Pulsar. This PR fixes it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life release/2.7.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Size of the messages to be acknowledged at subscription level
6 participants