
broker error for prometheus metrics #3112

Closed
yifan opened this issue Dec 3, 2018 · 5 comments · Fixed by #4183
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@yifan
Contributor

yifan commented Dec 3, 2018

Expected behavior

I deployed Pulsar on Kubernetes with the Helm chart and the minikube configuration. I have tried multiple times and keep running into a problem with one of the brokers, and sometimes with both. The problem has been consistent across three deployments so far.

Steps to reproduce

deploy pulsar with helm

System configuration

Pulsar version: x.y
image: apachepulsar/pulsar-all

Error in Broker.log

06:40:34.280 [pulsar-web-30-30] INFO org.eclipse.jetty.server.RequestLog - 10.244.6.29 - - [03/Dec/2018:06:40:34 +0000] "GET //xx.xx.xx.xx:8080/metrics HTTP/1.1" 302 0 "-" "Prometheus/1.6.3" 1
06:40:34.287 [prometheus-stats-31-1] ERROR org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet - Failed to generate prometheus stats
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:197) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:420) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:375) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.SelectChannelEndPoint$3.run(SelectChannelEndPoint.java:107) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(SelectChannelEndPoint.java:193) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.processSelected(ManagedSelector.java:283) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:181) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249) ~[org.eclipse.jetty-jetty-util-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) ~[org.eclipse.jetty-jetty-util-9.3.11.v20160721.jar:9.3.11.v20160721]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) ~[org.eclipse.jetty-jetty-util-9.3.11.v20160721.jar:9.3.11.v20160721]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [org.apache.pulsar-pulsar-functions-metrics-2.2.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.8.0_171]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.8.0_171]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.8.0_171]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.8.0_171]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[?:1.8.0_171]
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:175) ~[org.eclipse.jetty-jetty-io-9.3.11.v20160721.jar:9.3.11.v20160721]
... 13 more

Error in Prometheus

text format parsing error in line 182: second TYPE line for metric name "pulsar_subscription_back_log", or TYPE reported after samples

@ivankelly added the type/bug and component/deploy labels Dec 5, 2018
@lelabo-m
Contributor

We have something similar:

text format parsing error in line 189: second TYPE line for metric name "pulsar_subscriptions_count", or TYPE reported after samples

When doing a port forward on my pod:
kubectl port-forward $(kubectl get pods -l component=broker -o jsonpath='{.items[*].metadata.name}') 8080

I can see the metrics at http://localhost:8080/metrics/.

Some metrics, such as pulsar_subscriptions_count, are duplicated: they appear once with only the cluster label and again for each namespace/topic.

# TYPE log4j2_appender_total counter
log4j2_appender_total{cluster="pulsar",level="debug"} 0.0
log4j2_appender_total{cluster="pulsar",level="warn"} 4.0
log4j2_appender_total{cluster="pulsar",level="trace"} 0.0
log4j2_appender_total{cluster="pulsar",level="error"} 0.0
log4j2_appender_total{cluster="pulsar",level="fatal"} 0.0
log4j2_appender_total{cluster="pulsar",level="info"} 477.0
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{cluster="pulsar",pool="direct"} 1374353.0
jvm_buffer_pool_used_bytes{cluster="pulsar",pool="mapped"} 0.0
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{cluster="pulsar",pool="direct"} 1374352.0
jvm_buffer_pool_capacity_bytes{cluster="pulsar",pool="mapped"} 0.0
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{cluster="pulsar",pool="direct"} 50.0
jvm_buffer_pool_used_buffers{cluster="pulsar",pool="mapped"} 0.0
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{cluster="pulsar",gc="G1 Young Generation"} 2.0
jvm_gc_collection_seconds_sum{cluster="pulsar",gc="G1 Young Generation"} 1.667
jvm_gc_collection_seconds_count{cluster="pulsar",gc="G1 Old Generation"} 0.0
jvm_gc_collection_seconds_sum{cluster="pulsar",gc="G1 Old Generation"} 0.0
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total{cluster="pulsar"} 60.95
# TYPE process_start_time_seconds gauge
process_start_time_seconds{cluster="pulsar"} 1.551261824492E9
# TYPE process_open_fds gauge
process_open_fds{cluster="pulsar"} 367.0
# TYPE process_max_fds gauge
process_max_fds{cluster="pulsar"} 8192.0
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes{cluster="pulsar"} 1.743130624E10
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes{cluster="pulsar"} 1.637597184E9
# TYPE zk_write_latency summary
zk_write_latency{cluster="pulsar",quantile="0.5"} NaN
zk_write_latency{cluster="pulsar",quantile="0.75"} NaN
zk_write_latency{cluster="pulsar",quantile="0.95"} NaN
zk_write_latency{cluster="pulsar",quantile="0.99"} NaN
zk_write_latency{cluster="pulsar",quantile="0.999"} NaN
zk_write_latency{cluster="pulsar",quantile="0.9999"} NaN
zk_write_latency_count{cluster="pulsar"} 0.0
zk_write_latency_sum{cluster="pulsar"} 0.0
# TYPE zk_read_latency summary
zk_read_latency{cluster="pulsar",quantile="0.5"} NaN
zk_read_latency{cluster="pulsar",quantile="0.75"} NaN
zk_read_latency{cluster="pulsar",quantile="0.95"} NaN
zk_read_latency{cluster="pulsar",quantile="0.99"} NaN
zk_read_latency{cluster="pulsar",quantile="0.999"} NaN
zk_read_latency{cluster="pulsar",quantile="0.9999"} NaN
zk_read_latency_count{cluster="pulsar"} 0.0
zk_read_latency_sum{cluster="pulsar"} 0.0
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded{cluster="pulsar"} 9550.0
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total{cluster="pulsar"} 9550.0
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total{cluster="pulsar"} 0.0
# TYPE jvm_threads_current gauge
jvm_threads_current{cluster="pulsar"} 113.0
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon{cluster="pulsar"} 9.0
# TYPE jvm_threads_peak gauge
jvm_threads_peak{cluster="pulsar"} 116.0
# TYPE jvm_threads_started_total counter
jvm_threads_started_total{cluster="pulsar"} 144.0
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked{cluster="pulsar"} 0.0
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor{cluster="pulsar"} 0.0
# TYPE jvm_memory_direct_bytes_used gauge
jvm_memory_direct_bytes_used{cluster="pulsar"} 8.388608E7
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{cluster="pulsar",area="heap"} 1.003231264E9
jvm_memory_bytes_used{cluster="pulsar",area="nonheap"} 8.194224E7
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{cluster="pulsar",area="heap"} 1.2884901888E10
jvm_memory_bytes_committed{cluster="pulsar",area="nonheap"} 8.5991424E7
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{cluster="pulsar",area="heap"} 1.2884901888E10
jvm_memory_bytes_max{cluster="pulsar",area="nonheap"} -1.0
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{cluster="pulsar",area="heap"} 1.2884901888E10
jvm_memory_bytes_init{cluster="pulsar",area="nonheap"} 2555904.0
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{cluster="pulsar",pool="Code Cache"} 2.0103808E7
jvm_memory_pool_bytes_used{cluster="pulsar",pool="Metaspace"} 5.5292816E7
jvm_memory_pool_bytes_used{cluster="pulsar",pool="Compressed Class Space"} 6545616.0
jvm_memory_pool_bytes_used{cluster="pulsar",pool="G1 Eden Space"} 9.68884224E8
jvm_memory_pool_bytes_used{cluster="pulsar",pool="G1 Survivor Space"} 2.5165824E7
jvm_memory_pool_bytes_used{cluster="pulsar",pool="G1 Old Gen"} 9181216.0
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="Code Cache"} 2.1037056E7
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="Metaspace"} 5.7745408E7
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="Compressed Class Space"} 7208960.0
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="G1 Eden Space"} 6.740246528E9
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="G1 Survivor Space"} 2.5165824E7
jvm_memory_pool_bytes_committed{cluster="pulsar",pool="G1 Old Gen"} 6.119489536E9
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{cluster="pulsar",pool="Code Cache"} 2.5165824E8
jvm_memory_pool_bytes_max{cluster="pulsar",pool="Metaspace"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar",pool="Compressed Class Space"} 1.073741824E9
jvm_memory_pool_bytes_max{cluster="pulsar",pool="G1 Eden Space"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar",pool="G1 Survivor Space"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar",pool="G1 Old Gen"} 1.2884901888E10
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{cluster="pulsar",pool="Code Cache"} 2555904.0
jvm_memory_pool_bytes_init{cluster="pulsar",pool="Metaspace"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar",pool="Compressed Class Space"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar",pool="G1 Eden Space"} 6.765412352E9
jvm_memory_pool_bytes_init{cluster="pulsar",pool="G1 Survivor Space"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar",pool="G1 Old Gen"} 6.119489536E9
# TYPE jvm_info gauge
jvm_info{cluster="pulsar",version="1.8.0_181-8u181-b13-2~deb9u1-b13",vendor="Oracle Corporation",runtime="OpenJDK Runtime Environment"} 1.0
# TYPE topic_load_times summary
topic_load_times{cluster="pulsar",quantile="0.5"} NaN
topic_load_times{cluster="pulsar",quantile="0.75"} NaN
topic_load_times{cluster="pulsar",quantile="0.95"} NaN
topic_load_times{cluster="pulsar",quantile="0.99"} NaN
topic_load_times{cluster="pulsar",quantile="0.999"} NaN
topic_load_times{cluster="pulsar",quantile="0.9999"} NaN
topic_load_times_count{cluster="pulsar"} 0.0
topic_load_times_sum{cluster="pulsar"} 0.0
# TYPE jvm_memory_direct_bytes_max gauge
jvm_memory_direct_bytes_max{cluster="pulsar"} 1.5032385536E10
# TYPE jetty_requests_total counter
jetty_requests_total{cluster="pulsar"} 239.0
# TYPE jetty_requests_active gauge
jetty_requests_active{cluster="pulsar"} 1.0
# TYPE jetty_requests_active_max gauge
jetty_requests_active_max{cluster="pulsar"} 1.0
# TYPE jetty_request_time_max_seconds gauge
jetty_request_time_max_seconds{cluster="pulsar"} 0.101
# TYPE jetty_request_time_seconds_total counter
jetty_request_time_seconds_total{cluster="pulsar"} 0.914
# TYPE jetty_dispatched_total counter
jetty_dispatched_total{cluster="pulsar"} 239.0
# TYPE jetty_dispatched_active gauge
jetty_dispatched_active{cluster="pulsar"} 0.0
# TYPE jetty_dispatched_active_max gauge
jetty_dispatched_active_max{cluster="pulsar"} 1.0
# TYPE jetty_dispatched_time_max gauge
jetty_dispatched_time_max{cluster="pulsar"} 101.0
# TYPE jetty_dispatched_time_seconds_total counter
jetty_dispatched_time_seconds_total{cluster="pulsar"} 0.588
# TYPE jetty_async_requests_total counter
jetty_async_requests_total{cluster="pulsar"} 80.0
# TYPE jetty_async_requests_waiting gauge
jetty_async_requests_waiting{cluster="pulsar"} 1.0
# TYPE jetty_async_requests_waiting_max gauge
jetty_async_requests_waiting_max{cluster="pulsar"} 1.0
# TYPE jetty_async_dispatches_total counter
jetty_async_dispatches_total{cluster="pulsar"} 0.0
# TYPE jetty_expires_total counter
jetty_expires_total{cluster="pulsar"} 0.0
# TYPE jetty_responses_total counter
jetty_responses_total{cluster="pulsar",code="1xx"} 0.0
jetty_responses_total{cluster="pulsar",code="2xx"} 158.0
jetty_responses_total{cluster="pulsar",code="3xx"} 78.0
jetty_responses_total{cluster="pulsar",code="4xx"} 2.0
jetty_responses_total{cluster="pulsar",code="5xx"} 0.0
# TYPE jetty_stats_seconds gauge
jetty_stats_seconds{cluster="pulsar"} 1524.577
# TYPE jetty_responses_bytes_total counter
jetty_responses_bytes_total{cluster="pulsar"} 910107.0
# TYPE pulsar_topics_count gauge
pulsar_topics_count{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_subscriptions_count gauge
pulsar_subscriptions_count{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_producers_count gauge
pulsar_producers_count{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_consumers_count gauge
pulsar_consumers_count{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_rate_in gauge
pulsar_rate_in{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_rate_out gauge
pulsar_rate_out{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_throughput_in gauge
pulsar_throughput_in{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_throughput_out gauge
pulsar_throughput_out{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_storage_size gauge
pulsar_storage_size{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_storage_write_rate gauge
pulsar_storage_write_rate{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_storage_read_rate gauge
pulsar_storage_read_rate{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_msg_backlog gauge
pulsar_msg_backlog{cluster="pulsar"} 0 1551263358513
# TYPE pulsar_subscriptions_count gauge
pulsar_subscriptions_count{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 1.0 1551263358513
# TYPE pulsar_producers_count gauge
pulsar_producers_count{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_consumers_count gauge
pulsar_consumers_count{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_rate_in gauge
pulsar_rate_in{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_rate_out gauge
pulsar_rate_out{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_throughput_in gauge
pulsar_throughput_in{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_throughput_out gauge
pulsar_throughput_out{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_size gauge
pulsar_storage_size{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 3254281406.0 1551263358513
# TYPE pulsar_msg_backlog gauge
pulsar_msg_backlog{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 589081.0 1551263358513
# TYPE pulsar_storage_write_latency_le_0_5 gauge
pulsar_storage_write_latency_le_0_5{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_1 gauge
pulsar_storage_write_latency_le_1{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_5 gauge
pulsar_storage_write_latency_le_5{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_10 gauge
pulsar_storage_write_latency_le_10{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_20 gauge
pulsar_storage_write_latency_le_20{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_50 gauge
pulsar_storage_write_latency_le_50{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_100 gauge
pulsar_storage_write_latency_le_100{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_200 gauge
pulsar_storage_write_latency_le_200{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_le_1000 gauge
pulsar_storage_write_latency_le_1000{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_overflow gauge
pulsar_storage_write_latency_overflow{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_count gauge
pulsar_storage_write_latency_count{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_storage_write_latency_sum gauge
pulsar_storage_write_latency_sum{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_128 gauge
pulsar_entry_size_le_128{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_512 gauge
pulsar_entry_size_le_512{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_1_kb gauge
pulsar_entry_size_le_1_kb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_2_kb gauge
pulsar_entry_size_le_2_kb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_4_kb gauge
pulsar_entry_size_le_4_kb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_16_kb gauge
pulsar_entry_size_le_16_kb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_100_kb gauge
pulsar_entry_size_le_100_kb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_1_mb gauge
pulsar_entry_size_le_1_mb{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_le_overflow gauge
pulsar_entry_size_le_overflow{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_count gauge
pulsar_entry_size_count{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_entry_size_sum gauge
pulsar_entry_size_sum{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic"} 0.0 1551263358513
# TYPE pulsar_subscription_back_log gauge
pulsar_subscription_back_log{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 589081 1551263358513
# TYPE pulsar_subscription_msg_rate_redeliver gauge
pulsar_subscription_msg_rate_redeliver{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 0.0 1551263358513
# TYPE pulsar_subscription_unacked_massages gauge
pulsar_subscription_unacked_massages{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 0 1551263358513
# TYPE pulsar_subscription_blocked_on_unacked_messages gauge
pulsar_subscription_blocked_on_unacked_messages{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 0 1551263358513
# TYPE pulsar_subscription_msg_rate_out gauge
pulsar_subscription_msg_rate_out{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 0.0 1551263358513
# TYPE pulsar_subscription_msg_throughput_out gauge
pulsar_subscription_msg_throughput_out{cluster="pulsar",namespace="ten/ns",topic="persistent://ten/ns/my-topic",subscription="mysubscriber"} 0.0 1551263358513
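To confirm the duplication on a saved scrape, a minimal sketch like the following can be used. It is not part of Pulsar; the file name metrics.txt is just an assumption for wherever the /metrics output was saved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class DuplicateTypeCheck {
    public static void main(String[] args) throws IOException {
        // Count the "# TYPE <name> <type>" lines per metric name.
        Map<String, Integer> typeLines = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("metrics.txt"))) {
            if (line.startsWith("# TYPE ")) {
                String name = line.split("\\s+")[2];
                typeLines.merge(name, 1, Integer::sum);
            }
        }
        // Any metric name with more than one TYPE line trips the Prometheus parser.
        typeLines.forEach((name, count) -> {
            if (count > 1) {
                System.out.println("duplicate TYPE for " + name + " (" + count + " occurrences)");
            }
        });
    }
}
```

Run against the dump above, this flags pulsar_subscriptions_count, pulsar_producers_count, pulsar_consumers_count and the other metrics that are emitted both cluster-wide and per namespace/topic.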

@alexsplashex

alexsplashex commented Apr 3, 2019

Same story. I get this error in Prometheus:

text format parsing error in line 189: second TYPE line for metric name "pulsar_subscriptions_count", or TYPE reported after samples

but I can still see all the metrics.

Any suggestions?

@lelabo-m
Contributor

lelabo-m commented Apr 3, 2019

We had to update Prometheus and Grafana, and refactor the existing configuration for both until it was compatible with the latest version, in order to get rid of these errors.

jiazhai pushed a commit that referenced this issue Aug 26, 2019
… fixed format issue in metricWithRemoteCluster; added test for Prometheus types (#4183)

Fixes #3112 

### Motivation

In this pull request, I set out to fix #3112, which is caused by duplicate TYPE statements in the metrics output; these make recent versions of Prometheus fail to parse the metrics. Because of this, Prometheus reports the broker target as down.

Since I started looking at this, the type definitions have been removed (#4136) from the topic metrics output. I think these types are useful in Prometheus and have added them back in.

While testing this fix in my geo-replicated setup, I found a format error (a missing quote and comma) in the TopicStats.metricWithRemoteCluster method. This pull request includes a fix for that issue as well.

I have also added a new test to PrometheusMetricsTest.java that fails without these changes but passes with them.

### Modifications

I added a static HashMap to TopicStats to keep track of the TYPEs that have already been output. All writing of the TYPE for topics and namespaces is done with the TopicStats.metricType method. I modified that method to update the HashMap and only print the TYPE for the first occurrence of a metric name. I also added a method to reset the HashMap, which gets called in NamespaceStatsAggregator.generate.
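Conceptually, the deduplication looks like the sketch below. This is not the actual TopicStats code (the PR uses a static HashMap and the existing metricType/generate methods); the class and method names here are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the "write TYPE only once per scrape" idea.
class MetricTypeWriter {
    // Metric names whose "# TYPE" line has already been emitted in the current pass.
    private static final Set<String> writtenTypes = new HashSet<>();

    static void writeType(StringBuilder out, String name, String type) {
        // Only the first occurrence of a metric name gets a TYPE header;
        // later occurrences (e.g. per-namespace or per-topic samples) skip it.
        if (writtenTypes.add(name)) {
            out.append("# TYPE ").append(name).append(' ').append(type).append('\n');
        }
    }

    // Reset at the start of every metrics generation pass
    // (in the PR this reset is called from NamespaceStatsAggregator.generate).
    static void reset() {
        writtenTypes.clear();
    }
}
```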

### Verifying this change

This change added tests and can be verified as follows:
  - Added testDuplicateMetricTypeDefinitions, which:
       - checks that there are no duplicate TYPE definitions in the Prometheus output
       - validates that no TYPE definition appears after the first metric sample
       - ensures that all metrics have a defined type

I execute the test twice to make sure that resetting the HashMap of already-seen metric type definitions works correctly. The test passes reliably for me on both runs.

I have confirmed using promtool that the metrics output now parses without error on versions 2.7.1 and 2.9.2 (the latest). There are many warnings about missing HELP definitions and metrics using reserved suffixes (e.g. _count), but no errors.

In addition, I have patched 2.3.1 with this fix and am currently running it in my cluster. Prometheus (2.7.1) successfully parses the metrics and I am able to see namespace and topic-level metrics.
jiazhai pushed a commit that referenced this issue Aug 28, 2019
… fixed format issue in metricWithRemoteCluster; added test for Prometheus types (#4183)

(cherry picked from commit 8d32b58)
@codelipenghui reopened this Nov 18, 2019
@codelipenghui
Contributor

It seems the problem can still happen, but I can't get the REST response; the broker crashed at that time.

21:23:33.629 [prometheus-stats-39-1] ERROR org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet - Failed to generate prometheus stats
org.eclipse.jetty.io.EofException: null
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:283) ~[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277) ~[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) ~[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:818) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) ~[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) ~[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:549) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:857) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:929) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:250) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:546) ~[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:78) ~[org.apache.pulsar-pulsar-broker-2.5.0-d3cb10859.jar:2.5.0-d3cb10859]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.5.0-d3cb10859.jar:2.5.0-d3cb10859]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.5.0-d3cb10859.jar:2.5.0-d3cb10859]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.9.2.jar:4.9.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_171]
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_171]
	at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_171]
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_171]
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:263) ~[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
	... 24 more

@sijie
Member

sijie commented Jun 9, 2020

I think this issue has been fixed in the latest Pulsar release. Closing this issue now. Please reopen it if there are still problems.

@sijie closed this as completed Jun 9, 2020