[pulsar-broker] add pending read subscription metrics to stats-internal #9788

Merged
merged 1 commit into from
Mar 8, 2021

rdhabalia
Contributor

Motivation

We frequently see a consumer get stuck: the broker stops dispatching messages even though it should. We need additional pending-read metrics for the topic to debug this. For example, when a subscription is stuck, we want to know its pendingRead and pendingReplayRead state.
For now, we have to verify this from a heap dump.

Example: the output below is from a stuck subscription. The metrics do not show whether the subscription has a pending read.
stats

"subscriptions" : {
    "stuckSub" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 3658305,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 1341,
      "type" : "Shared",
      "msgRateExpired" : 1894,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "c105b",
        "availablePermits" : 915,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "connectedSince" : "2021-01-24T17:58:57.134185Z",
        "address" : "/1.1.1.1:4444"
      }, {

stats-internal

"cursors" : {
    "stuckSub" : {
      "markDeletePosition" : "1111111:101782",
      "readPosition" : "1111111:101783",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 168673209,
      "cursorLedger" : 1111112,
      "cursorLedgerLastEntry" : 61,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2021-01-25T13:39:44.82Z",
      "state" : "Open",
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "properties" : { }
    }


@rdhabalia rdhabalia added this to the 2.8.0 milestone Mar 4, 2021
@rdhabalia rdhabalia self-assigned this Mar 4, 2021
@merlimat merlimat merged commit 926bb69 into apache:master Mar 8, 2021
codelipenghui pushed a commit that referenced this pull request Mar 10, 2021
…on (#9789)

### Motivation
We frequently see subscriptions get stuck on different topics: the broker stops dispatching messages even though the consumer has available permits and there are no pending reads (example #9788). This can happen due to a regression bug or an unknown issue when expiry runs. One workaround is to manually unload and reload the topic, which is not feasible when this happens frequently across many topics. Alternatively, the broker should be able to discover such stuck subscriptions and unblock them.
The example below shows a subscription with available permits > 0 and no pending reads whose cursor read position is not moving forward, so the backlog builds up until we unload the topic. This happens frequently for an unknown reason:
```
STATS-INTERNAL:
"sub1" : {
      "markDeletePosition" : "11111111:15520",
      "readPosition" : "11111111:15521",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 115521,
      "cursorLedger" : 585099247,
      "cursorLedgerLastEntry" : 597,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2021-02-25T19:55:50.357Z",
      "state" : "Open",
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,

STATS:
"sub1" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 30350,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "type" : "Shared",
      "msgRateExpired" : 0.0,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "C1",
        "availablePermits" : 723,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "connectedSince" : "2021-02-25T19:55:50.358285Z",

```

![image](https://user-images.githubusercontent.com/2898254/109894631-ab62d980-7c42-11eb-8dcc-a1a5f4f5d14e.png)
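
The symptoms described above can be written as a simple predicate over two consecutive samples of the stats and stats-internal output. The following is only a minimal sketch: `StuckSubscriptionCheck`, `Snapshot`, and `looksStuck` are hypothetical names and not part of the Pulsar codebase, and the earlier sample's backlog value is made up for illustration. It is a heuristic, since these fields alone do not show whether the dispatcher itself has a read in flight (the gap #9788 addresses).

```java
/** Sketch only: these types are hypothetical and not part of the Pulsar codebase. */
final class StuckSubscriptionCheck {

    /** One sample of the fields shown in the stats and stats-internal output. */
    static final class Snapshot {
        final String readPosition;    // stats-internal: readPosition
        final int pendingReadOps;     // stats-internal: pendingReadOps
        final boolean waitingReadOp;  // stats-internal: waitingReadOp
        final long msgBacklog;        // stats: msgBacklog
        final double msgRateOut;      // stats: msgRateOut
        final int availablePermits;   // stats: sum of the consumers' availablePermits

        Snapshot(String readPosition, int pendingReadOps, boolean waitingReadOp,
                 long msgBacklog, double msgRateOut, int availablePermits) {
            this.readPosition = readPosition;
            this.pendingReadOps = pendingReadOps;
            this.waitingReadOp = waitingReadOp;
            this.msgBacklog = msgBacklog;
            this.msgRateOut = msgRateOut;
            this.availablePermits = availablePermits;
        }
    }

    /**
     * Returns true when consumers could receive messages, a backlog exists,
     * no read is pending, nothing is being dispatched, and the read position
     * did not move between the two samples.
     */
    static boolean looksStuck(Snapshot previous, Snapshot current) {
        boolean consumersCanReceive = current.availablePermits > 0;
        boolean noReadInFlight = current.pendingReadOps == 0 && !current.waitingReadOp;
        boolean hasBacklog = current.msgBacklog > 0;
        boolean notDispatching = current.msgRateOut == 0.0;
        boolean readPositionFrozen = previous.readPosition.equals(current.readPosition);
        return consumersCanReceive && noReadInFlight && hasBacklog
                && notDispatching && readPositionFrozen;
    }

    public static void main(String[] args) {
        // "later" uses the values from the example output; "earlier" is illustrative.
        Snapshot earlier = new Snapshot("11111111:15521", 0, false, 30000L, 0.0, 723);
        Snapshot later = new Snapshot("11111111:15521", 0, false, 30350L, 0.0, 723);
        System.out.println(looksStuck(earlier, later)); // prints "true"
    }
}
```

Comparing two samples rather than one avoids flagging a subscription that is merely idle at the instant of sampling but still advancing its read position.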


### Modification
Add the capability in the broker to periodically check whether a subscription is stuck and unblock it if needed. The check is controlled by a flag; for the initial release it can be disabled by default, and we can enable it by default in a later release.
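
A flag-controlled periodic check of this kind could look roughly like the sketch below. It is not the code from this change: `StuckSubscriptionMonitor`, the `Subscription` interface, and the `checkEnabled` flag are hypothetical names chosen for illustration, and the real broker configuration key and dispatcher calls are not shown here.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch only: hypothetical types, not the Pulsar broker implementation. */
final class StuckSubscriptionMonitor {

    /** Hypothetical view of a subscription; not a Pulsar interface. */
    interface Subscription {
        String name();
        boolean isStuck();
        void unblock();
    }

    private final boolean checkEnabled; // hypothetical flag; pass false to disable the check
    private final List<Subscription> subscriptions;

    StuckSubscriptionMonitor(boolean checkEnabled, List<Subscription> subscriptions) {
        this.checkEnabled = checkEnabled;
        this.subscriptions = subscriptions;
    }

    /** Runs one pass over all subscriptions and returns how many were unblocked. */
    int checkOnce() {
        if (!checkEnabled) {
            return 0;
        }
        int unblocked = 0;
        for (Subscription sub : subscriptions) {
            if (sub.isStuck()) {
                // Log before unblocking so the event can be investigated later.
                System.out.println("Unblocking stuck subscription " + sub.name());
                sub.unblock();
                unblocked++;
            }
        }
        return unblocked;
    }

    /** Schedules checkOnce at a fixed rate; the caller must shut the scheduler down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report and continue.
                System.err.println("Stuck-subscription check failed: " + e);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        Subscription sub = new Subscription() {
            public String name() { return "sub1"; }
            public boolean isStuck() { return true; }
            public void unblock() { /* a real broker would trigger a new read here */ }
        };
        StuckSubscriptionMonitor monitor =
                new StuckSubscriptionMonitor(true, Collections.singletonList(sub));
        System.out.println(monitor.checkOnce());
    }
}
```

Keeping the single pass in `checkOnce` separate from the scheduling makes the check testable without waiting on a timer, and the flag makes the whole pass a no-op when the feature is disabled.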


### Result
This helps the broker handle stuck subscriptions and logs a message for later debugging.
eolivelli pushed a commit that referenced this pull request May 13, 2021
…on (#9789)
