add prometheus vendor and pull images api metric and go-related metric like gc time #91

Merged
merged 1 commit into from
Nov 15, 2017

Conversation

WIZARD-CXY
Contributor

@WIZARD-CXY WIZARD-CXY commented Nov 13, 2017

1.Describe what this PR did
Add the Prometheus package to vendor and add latency monitoring for image pulls.

This PR is related to issue #16

2.Does this pull request fix one issue?

3.Describe how you did it
Added the monitoring code in the apiserver part.

4.Describe how to verify it
Use pouch to pull an image, then fetch the metrics from the /metrics HTTP endpoint to verify.

5.Special notes for reviews
NONE

@pouchrobot
Collaborator

Thanks for your contribution. 🍻 @WIZARD-CXY
Please sign off in each of your commits.

@WIZARD-CXY
Contributor Author

With the basic Prometheus package we also get Go-related metrics.

@allencloud
Collaborator

Could you please guide us on how to use the Prometheus integration?
Maybe some documentation is needed. @WIZARD-CXY 😄

@allencloud allencloud added the need-docs This pull request should also add more document label Nov 13, 2017
// record the time spent during image pull procedure.
defer func(start time.Time) {
	metrics.ImagePullSummary.WithLabelValues(image + ":" + tag).Observe(metrics.SinceInMicroseconds(start))
}(time.Now())
Contributor

I think this code should be moved into "apis/server/router.go: filter()", so that we don't need to insert it in many places.

Contributor Author

This PR focuses on a simple metric for image pull duration, so there is no need to move it into filter() now. Besides, I see

r.Path("/_ping").Methods(http.MethodGet).Handler(s.filter(s.ping))
r.Path("/info").Methods(http.MethodGet).Handler(s.filter(s.info))
r.Path("/version").Methods(http.MethodGet).Handler(s.filter(s.version))

and we don't need to record latency for the ping, info, and version APIs.

Contributor Author

I don't think this is a blocker for this PR. Feel free to add it in a future PR, @skoo87.

Collaborator

I still insist that we should add a doc for Prometheus.
Once this is merged, nobody will ever add the doc, believe me. @WIZARD-CXY
WDYT?

Contributor Author

I will add it in another PR, @allencloud.

Collaborator

OK, thanks a lot.
And could you add this before the weekend? @WIZARD-CXY


"github.com/alibaba/pouch/apis/metrics"
"github.com/sirupsen/logrus"
Contributor

Add a blank line before "logrus"

@WIZARD-CXY WIZARD-CXY changed the title [WIP] add prometheus vendor and pull images api metric add prometheus vendor and pull images api metric and go-related metric like gc time Nov 14, 2017
@WIZARD-CXY
Contributor Author

With this PR, we now get many useful metrics:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000111176
go_gc_duration_seconds{quantile="0.25"} 0.000198062
go_gc_duration_seconds{quantile="0.5"} 0.000269599
go_gc_duration_seconds{quantile="0.75"} 0.000474291
go_gc_duration_seconds{quantile="1"} 0.002013351
go_gc_duration_seconds_sum 0.021835193
go_gc_duration_seconds_count 52
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 22
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 7.910168e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.02160608e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.463676e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 545450
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.0005226951640063396
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 643072
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 7.910168e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.326528e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.0584064e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 37901
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.2910592e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5106441661824794e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 1007
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 583351
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6944
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 129504
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 147456
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 9.974368e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.184636e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 720896
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 720896
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.7086712e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 13
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} NaN
http_request_duration_microseconds_sum{handler="prometheus"} 0
http_request_duration_microseconds_count{handler="prometheus"} 0
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_request_size_bytes_sum{handler="prometheus"} 0
http_request_size_bytes_count{handler="prometheus"} 0
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_response_size_bytes_sum{handler="prometheus"} 0
http_response_size_bytes_count{handler="prometheus"} 0
# HELP pouch_image_pull_latency_microseconds Latency in microseconds to pull a image.
# TYPE pouch_image_pull_latency_microseconds summary
pouch_image_pull_latency_microseconds{image="docker.io/library/ubuntu:latest",quantile="0.5"} 3.7803132e+07
pouch_image_pull_latency_microseconds{image="docker.io/library/ubuntu:latest",quantile="0.9"} 3.7803132e+07
pouch_image_pull_latency_microseconds{image="docker.io/library/ubuntu:latest",quantile="0.99"} 3.7803132e+07
pouch_image_pull_latency_microseconds_sum{image="docker.io/library/ubuntu:latest"} 3.7803132e+07
pouch_image_pull_latency_microseconds_count{image="docker.io/library/ubuntu:latest"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.78
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.4521088e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.51064406778e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.91610112e+08

@WIZARD-CXY
Contributor Author

pouch_image_pull_latency_microseconds{image="docker.io/library/ubuntu:latest",quantile="0.5"} 3.7803132e+07

indicates that pouchd took about 37.8 s to pull the docker.io/library/ubuntu:latest image.

@WIZARD-CXY
Contributor Author

@allencloud I will add new documentation showing how to add new metrics, along with some conventions. I think we can delegate the API request latency metrics to the sel guys.

…-related metrics.

Signed-off-by: 宇慕 <xingyu.chenxingyu@alibaba-inc.com>
@allencloud
Collaborator

First, for end users, the most important thing is how to use Prometheus in pouch. Second, for developers, the next most important thing is how to add new metrics to pouch. @WIZARD-CXY

@WIZARD-CXY
Contributor Author

@allencloud good point

@allencloud allencloud self-assigned this Nov 14, 2017
@allencloud
Collaborator

I would like to hear some thoughts from @skoo87.
Does this PR's handler or router stuff satisfy you? @skoo87
If the answer is yes, could we move this forward?

@allencloud
Collaborator

I would like to merge this first, so that we can iterate fast.

@allencloud
Collaborator

LGTM

@pouchrobot pouchrobot added the LGTM one maintainer or community participant agrees to merge the pull reuqest. label Nov 15, 2017
@allencloud allencloud merged commit 53f1f36 into AliyunContainerService:master Nov 15, 2017
Labels
areas/monitoring LGTM one maintainer or community participant agrees to merge the pull reuqest. need-docs This pull request should also add more document size/XXL
4 participants