Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix transaction pending ack generate managedLedgerStats fail exception. #10797

Merged
merged 1 commit into from
Jun 3, 2021

Conversation

yangl
Copy link
Contributor

@yangl yangl commented Jun 3, 2021

Motivation

prometheus generate method should not throw runtime exception, this interrupt the prometheus data generate logic. The detail
log as follow:

10:36:17.294 [pulsar-web-43-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.26.0" 0
10:36:17.317 [prometheus-stats-44-1] ERROR org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet - Failed to generate prometheus stats
io.jsonwebtoken.io.IOException: Transaction pending ack generate managedLedgerStats fail!
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$0(TransactionAggregator.java:69) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$1(TransactionAggregator.java:61) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$2(TransactionAggregator.java:59) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$3(TransactionAggregator.java:58) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.generate(TransactionAggregator.java:56) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:110) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:72) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
10:36:17.318 [prometheus-stats-44-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics/ HTTP/1.1" 500 0 "http://10.206.128.216:18080/metrics" "Prometheus/2.26.0" 23

Modifications

throw exception --> log.warn()

@hangc0276 hangc0276 added this to the 2.8.0 milestone Jun 3, 2021
@yangl
Copy link
Contributor Author

yangl commented Jun 3, 2021

/pulsarbot run-failure-checks

@@ -66,7 +67,7 @@ public static void generate(PulsarService pulsar, SimpleTextOutputStream stream,
generateManageLedgerStats(managedLedger,
stream, cluster, namespace, name, subscription.getName());
} catch (Exception e) {
throw new IOException("Transaction pending ack generate managedLedgerStats fail!");
log.warn("Transaction pending ack generate managedLedgerStats fail!", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you investigate the root cause of the exception ?

I am fine with committing this patch, but we should understand the cause.
It is unfortunate that we are not adding the exception as "cause" to the IOException

can you investigate more ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subscription.pendingAckHandle instanceof PendingAckHandleDisabled, not PendingAckHandleImpl

image

PersistentSubscription.java

    public CompletableFuture<ManagedLedger> getPendingAckManageLedger() {
        if (this.pendingAckHandle instanceof PendingAckHandleImpl) {
            return ((PendingAckHandleImpl) this.pendingAckHandle).getStoreManageLedger();
        } else {
            return FutureUtil.failedFuture(new NotAllowedException("Pending ack handle don't use managedLedger!"));
        }
    }

return FutureUtil.failedFuture(new NotAllowedException("Pending ack handle don't use managedLedger!"));

PersistentSubscription.java

if (topic.getBrokerService().getPulsar().getConfig().isTransactionCoordinatorEnabled()
                && !checkTopicIsEventsNames(topicName)
                && !topicName.startsWith(TopicName.TRANSACTION_COORDINATOR_ASSIGN.getLocalName())
                && !topicName.startsWith(MLTransactionLogImpl.TRANSACTION_LOG_PREFIX)
                && !topicName.endsWith(MLPendingAckStore.PENDING_ACK_STORE_SUFFIX)) {
            this.pendingAckHandle = new PendingAckHandleImpl(this);
        } else {
            this.pendingAckHandle = new PendingAckHandleDisabled();
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or add the pendingAckHandle instanceof logic, like this

PersistentSubscription sub = ((PersistentSubscription) subscription);
Field pendingAckHandleField = PersistentSubscription.class.getDeclaredField("pendingAckHandle");
pendingAckHandleField.setAccessible(true);
PendingAckHandle pendingAckHandle =
        (PendingAckHandle) pendingAckHandleField.get(sub);
if (pendingAckHandle instanceof PendingAckHandleImpl){
    generateManageLedgerStats(managedLedger,
            stream, cluster, namespace, name, subscription.getName());
}

@codelipenghui codelipenghui merged commit 7585bc7 into apache:master Jun 3, 2021
@yangl yangl deleted the prometheus_generate branch June 4, 2021 01:22
yangl added a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
…n. (apache#10797)

### Motivation

prometheus generate method should not throw runtime exception, this interrupt the prometheus data generate  logic. The detail   
log as follow:
```
10:36:17.294 [pulsar-web-43-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.26.0" 0
10:36:17.317 [prometheus-stats-44-1] ERROR org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet - Failed to generate prometheus stats
io.jsonwebtoken.io.IOException: Transaction pending ack generate managedLedgerStats fail!
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$0(TransactionAggregator.java:69) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$1(TransactionAggregator.java:61) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$2(TransactionAggregator.java:59) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$3(TransactionAggregator.java:58) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.generate(TransactionAggregator.java:56) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:110) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:72) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
10:36:17.318 [prometheus-stats-44-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics/ HTTP/1.1" 500 0 "http://10.206.128.216:18080/metrics" "Prometheus/2.26.0" 23
```

### Modifications

throw exception --> log.warn()
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…n. (apache#10797)

### Motivation

prometheus generate method should not throw runtime exception, this interrupt the prometheus data generate  logic. The detail   
log as follow:
```
10:36:17.294 [pulsar-web-43-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.26.0" 0
10:36:17.317 [prometheus-stats-44-1] ERROR org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet - Failed to generate prometheus stats
io.jsonwebtoken.io.IOException: Transaction pending ack generate managedLedgerStats fail!
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$0(TransactionAggregator.java:69) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$1(TransactionAggregator.java:61) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$2(TransactionAggregator.java:59) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.lambda$generate$3(TransactionAggregator.java:58) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.TransactionAggregator.generate(TransactionAggregator.java:56) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:110) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:72) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
10:36:17.318 [prometheus-stats-44-1] INFO  org.eclipse.jetty.server.RequestLog - 10.206.28.67 - - [03/Jun/2021:10:36:17 +0800] "GET /metrics/ HTTP/1.1" 500 0 "http://10.206.128.216:18080/metrics" "Prometheus/2.26.0" 23
```

### Modifications

throw exception --> log.warn()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants