[SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet/Resource #25741

Closed
dongjoon-hyun wants to merge 7 commits into apache:master from dongjoon-hyun:SPARK-29032

@dongjoon-hyun dongjoon-hyun commented Sep 10, 2019

What changes were proposed in this pull request?

Prometheus.io is a CNCF project used widely with K8s.

This PR aims to simplify Prometheus support by adding PrometheusServlet and PrometheusResource. The main use cases are K8s and Spark Standalone cluster environments.

Note that this PR focuses on Spark-generated metrics. Also, we need to update the documentation later in a separate PR.

Why are the changes needed?

There exist a few ways to support Prometheus. However, they require extra configuration and new resources (port numbers to pull from or gateways to push to). Also, the endpoints are widely spread over Master/Slave/Driver/CoarseGrainedExecutorBackend. This PR aims to natively export Spark's metrics, which start with the metrics_ prefix, from the existing (1) method at Master/Worker/Driver. For task metrics, instead of using CoarseGrainedExecutorBackend, we will reuse the collected information used for the Apache Spark REST API, api/v1.

  1. Spark JMX Sink + Prometheus JMX Converter
  2. Custom Sink + Prometheus Pushgateway

Does this PR introduce any user-facing change?

Yes. New web interfaces are added along with the existing JSON API.

Component  JSON End Point                                Prometheus End Point
Master     /metrics/master/json/                         /metrics/master/prometheus/
Master     /metrics/applications/json/                   /metrics/applications/prometheus/
Worker     /metrics/json/                                /metrics/prometheus/
Driver     /metrics/json/                                /metrics/prometheus/
Driver     /api/v1/applications/{id}/(executors|stages)  /metrics/detail/prometheus/

$ bin/spark-shell
...
Spark context Web UI available at http://localhost:4040
...
$ curl --silent http://localhost:4040/metrics/prometheus/ | head -n5
metrics_local_1568101220707_driver_BlockManager_disk_diskSpaceUsed_MB_Value 0
metrics_local_1568101220707_driver_BlockManager_memory_maxMem_MB_Value 366
metrics_local_1568101220707_driver_BlockManager_memory_maxOffHeapMem_MB_Value 0
metrics_local_1568101220707_driver_BlockManager_memory_maxOnHeapMem_MB_Value 366
metrics_local_1568101220707_driver_BlockManager_memory_memUsed_MB_Value 0
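The sample names above suggest that each metric key is flattened into a Prometheus-safe name. A minimal sketch of such a mapping, assuming that every non-alphanumeric character is replaced with an underscore and that gauges get a `_Value` suffix. The names `PrometheusNameSketch`, `normalize`, and `toPrometheusLine` are hypothetical and not taken from the patch, and the input key is inferred from the output shown above.

```scala
object PrometheusNameSketch {
  // Hypothetical helper: flatten a metric key into a Prometheus-safe name.
  // Assumption: every character outside [a-zA-Z0-9] becomes an underscore.
  def normalize(key: String): String =
    "metrics_" + key.replaceAll("[^a-zA-Z0-9]", "_")

  // Render one gauge as a single text-format line: "<name>_Value <value>".
  def toPrometheusLine(key: String, value: Any): String =
    s"${normalize(key)}_Value $value"

  def main(args: Array[String]): Unit = {
    // Assumed input key, chosen to reproduce the first sample line above.
    val key = "local-1568101220707.driver.BlockManager.disk.diskSpaceUsed_MB"
    println(toPrometheusLine(key, 0))
  }
}
```

Flattening to underscores keeps the names valid for Prometheus, at the cost of losing the original separators between the application ID, the component, and the metric.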

How was this patch tested?

Manually connect the new end-points with curl.

Or, run prometheus --config.file=config.yaml with the following configuration and check the scraped metrics in the Prometheus UI.

config.yaml

global:
  scrape_interval:     5s
  evaluation_interval: 15s
  external_labels:
      monitor: 'codelab-monitor'
rule_files:
scrape_configs:
  - job_name: 'spark-master'
    metrics_path: '/metrics/master/prometheus/'
    static_configs:
      - targets: ['localhost:8080']
  - job_name: 'spark-applications'
    metrics_path: '/metrics/applications/prometheus/'
    static_configs:
      - targets: ['localhost:8080']
  - job_name: 'spark-worker'
    metrics_path: '/metrics/prometheus/'
    static_configs:
      - targets: ['localhost:8081']
  - job_name: 'spark-driver'
    metrics_path: '/metrics/prometheus/'
    static_configs:
      - targets: ['localhost:4040']
  - job_name: 'active-executors-and-stages'
    metrics_path: '/metrics/detail/prometheus/'
    static_configs:
      - targets: ['localhost:4040']
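
For the manual curl check, it can help to confirm that each returned line parses as a name and a numeric value. A rough sketch, assuming label-free sample lines like the ones shown in the description; `ScrapeCheckSketch` and `parseSample` are hypothetical names, and this is not a full parser for the Prometheus text format.

```scala
import scala.util.Try

object ScrapeCheckSketch {
  // Split one sample line ("<name> <value>") into a (name, value) pair.
  // Returns None for blank lines, comment lines, or lines that do not parse.
  // Assumption: no labels, so the line holds exactly two whitespace-separated fields.
  def parseSample(line: String): Option[(String, Double)] = {
    val trimmed = line.trim
    if (trimmed.isEmpty || trimmed.startsWith("#")) None
    else trimmed.split("\\s+") match {
      case Array(name, value) => Try(value.toDouble).toOption.map(v => (name, v))
      case _                  => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Sample line copied from the curl output in the description.
    val line = "metrics_local_1568101220707_driver_BlockManager_memory_maxMem_MB_Value 366"
    println(parseSample(line))
  }
}
```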

@dongjoon-hyun
Member Author

Hi, @vanzin and @srowen. Could you review this, please?

@dongjoon-hyun
Member Author

Oops. I missed the task metrics. I'll update this PR later.

@dongjoon-hyun changed the title from "[SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet" to "[WIP][SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet" on Sep 10, 2019

Member

@srowen srowen left a comment

What is Prometheus, BTW? Is it worth adding special code for?

classOf[Properties], classOf[MetricRegistry], classOf[SecurityManager])
.newInstance(kv._2, registry, securityMgr)
metricsServlet = Some(servlet)
} else if (kv._1 == "prometheusServlet") {

If you want, you can make kv into case (key, value) here for clarity, but it's not necessary.
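
The suggestion above amounts to destructuring the tuple in a pattern match instead of reading `kv._1` and `kv._2`. A minimal sketch of that style, assuming the loop iterates over a map from sink name to `Properties`; `SinkDispatchSketch` and `describe` are hypothetical stand-ins rather than the code in the patch, and the `"servlet"` key is assumed.

```scala
import java.util.Properties

object SinkDispatchSketch {
  // Hypothetical stand-in for the loop under review: classify each sink entry by its key.
  def describe(sinkConfigs: Map[String, Properties]): Seq[String] =
    sinkConfigs.toSeq
      .sortBy { case (key, _) => key }
      .map {
        // Destructure the tuple once instead of using kv._1 / kv._2.
        case (key, value) =>
          if (key == "servlet") {
            s"metrics servlet (${value.size()} properties)"
          } else if (key == "prometheusServlet") {
            s"prometheus servlet (${value.size()} properties)"
          } else {
            s"other sink: $key"
          }
      }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("path", "/metrics/prometheus")
    describe(Map("prometheusServlet" -> props)).foreach(println)
  }
}
```

Named bindings make each branch self-describing, which is the clarity gain the reviewer points at; behavior is the same as with positional accessors.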

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 10, 2019

Thank you for the review, @srowen. Prometheus.io is a CNCF project which is widely used with K8s.

@dongjoon-hyun changed the title from "[WIP][SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet" to "[SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet" on Sep 10, 2019
@dongjoon-hyun changed the title from "[SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet" to "[SPARK-29032][CORE] Simplify Prometheus support by adding PrometheusServlet/Resource" on Sep 10, 2019

SparkQA commented Sep 12, 2019

Test build #110491 has finished for PR 25741 at commit 8f063cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Although this is not a long PR, I decided to split it into two PRs because PrometheusResource is a new feature aligned with SPARK-23429 (which was added in 3.0.0). That will make the review easier.

@dongjoon-hyun
Member Author

Sorry for this change~

@dongjoon-hyun dongjoon-hyun deleted the SPARK-29032 branch September 19, 2019 01:19