minimal fastapi prom metrics #1426

Merged
merged 6 commits into from Feb 10, 2023

andrewm4894 (Collaborator) commented Feb 10, 2023

Summary:

  • adds a /metrics endpoint to the FastAPI backend
  • adds a /metrics endpoint to the inference FastAPI backend

This PR uses https://github.com/trallnag/prometheus-fastapi-instrumentator to add a /metrics endpoint to the FastAPI apps. Prometheus, Netdata, or any other monitoring tool can then scrape that endpoint.

Here is an example of the default metrics endpoint:

[screenshot: default /metrics endpoint output]

As you can see, it provides the usual metrics for each endpoint: request counts, request and response sizes, and latencies.

If we want, we can also add custom metrics easily: https://github.com/trallnag/prometheus-fastapi-instrumentator#creating-new-metrics

andrewm4894 marked this pull request as ready for review February 10, 2023 12:14
github-actions commented:

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

andrewm4894 (Collaborator, Author) commented:

Here is an example of all the default metrics this gives:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 3608.0
python_gc_objects_collected_total{generation="1"} 1322.0
python_gc_objects_collected_total{generation="2"} 1665.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 517.0
python_gc_collections_total{generation="1"} 46.0
python_gc_collections_total{generation="2"} 4.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="9",version="3.10.9"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.386524672e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.5613952e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.6760310371e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.38
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 28.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="none",method="GET",status="4xx"} 1.0
http_requests_total{handler="/docs",method="GET",status="2xx"} 1.0
http_requests_total{handler="/api/v1/openapi.json",method="GET",status="2xx"} 1.0
http_requests_total{handler="/metrics",method="GET",status="2xx"} 1.0
http_requests_total{handler="/api/v1/frontend_users/{auth_method}/{username}",method="GET",status="4xx"} 2.0
http_requests_total{handler="/api/v1/auth/check",method="GET",status="5xx"} 2.0
http_requests_total{handler="/api/v1/leaderboards/{time_frame}",method="GET",status="2xx"} 2.0
http_requests_total{handler="/api/v1/tasks/availability",method="POST",status="2xx"} 2.0
http_requests_total{handler="/api/v1/frontend_users/{auth_method}/{username}",method="GET",status="2xx"} 2.0
http_requests_total{handler="/api/v1/frontend_users/",method="POST",status="2xx"} 1.0
http_requests_total{handler="/api/v1/users/{user_id}",method="PUT",status="2xx"} 1.0
# HELP http_requests_created Total number of requests by method, status and handler.
# TYPE http_requests_created gauge
http_requests_created{handler="none",method="GET",status="4xx"} 1.6760310625765345e+09
http_requests_created{handler="/docs",method="GET",status="2xx"} 1.676031066198686e+09
http_requests_created{handler="/api/v1/openapi.json",method="GET",status="2xx"} 1.6760310669857574e+09
http_requests_created{handler="/metrics",method="GET",status="2xx"} 1.6760310709086044e+09
http_requests_created{handler="/api/v1/frontend_users/{auth_method}/{username}",method="GET",status="4xx"} 1.6760310831132445e+09
http_requests_created{handler="/api/v1/auth/check",method="GET",status="5xx"} 1.6760310834009001e+09
http_requests_created{handler="/api/v1/leaderboards/{time_frame}",method="GET",status="2xx"} 1.6760310835967855e+09
http_requests_created{handler="/api/v1/tasks/availability",method="POST",status="2xx"} 1.6760310836020613e+09
http_requests_created{handler="/api/v1/frontend_users/{auth_method}/{username}",method="GET",status="2xx"} 1.6760311554392397e+09
http_requests_created{handler="/api/v1/frontend_users/",method="POST",status="2xx"} 1.6760311584514124e+09
http_requests_created{handler="/api/v1/users/{user_id}",method="PUT",status="2xx"} 1.676031158459187e+09
# HELP http_request_size_bytes Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes summary
http_request_size_bytes_count{handler="none"} 1.0
http_request_size_bytes_sum{handler="none"} 0.0
http_request_size_bytes_count{handler="/docs"} 1.0
http_request_size_bytes_sum{handler="/docs"} 0.0
http_request_size_bytes_count{handler="/api/v1/openapi.json"} 1.0
http_request_size_bytes_sum{handler="/api/v1/openapi.json"} 0.0
http_request_size_bytes_count{handler="/metrics"} 1.0
http_request_size_bytes_sum{handler="/metrics"} 0.0
http_request_size_bytes_count{handler="/api/v1/frontend_users/{auth_method}/{username}"} 4.0
http_request_size_bytes_sum{handler="/api/v1/frontend_users/{auth_method}/{username}"} 0.0
http_request_size_bytes_count{handler="/api/v1/auth/check"} 2.0
http_request_size_bytes_sum{handler="/api/v1/auth/check"} 0.0
http_request_size_bytes_count{handler="/api/v1/leaderboards/{time_frame}"} 2.0
http_request_size_bytes_sum{handler="/api/v1/leaderboards/{time_frame}"} 0.0
http_request_size_bytes_count{handler="/api/v1/tasks/availability"} 2.0
http_request_size_bytes_sum{handler="/api/v1/tasks/availability"} 110.0
http_request_size_bytes_count{handler="/api/v1/frontend_users/"} 1.0
http_request_size_bytes_sum{handler="/api/v1/frontend_users/"} 55.0
http_request_size_bytes_count{handler="/api/v1/users/{user_id}"} 1.0
http_request_size_bytes_sum{handler="/api/v1/users/{user_id}"} 0.0
# HELP http_request_size_bytes_created Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes_created gauge
http_request_size_bytes_created{handler="none"} 1.6760310625765564e+09
http_request_size_bytes_created{handler="/docs"} 1.6760310661987078e+09
http_request_size_bytes_created{handler="/api/v1/openapi.json"} 1.676031066985779e+09
http_request_size_bytes_created{handler="/metrics"} 1.6760310709086244e+09
http_request_size_bytes_created{handler="/api/v1/frontend_users/{auth_method}/{username}"} 1.6760310831132627e+09
http_request_size_bytes_created{handler="/api/v1/auth/check"} 1.6760310834009166e+09
http_request_size_bytes_created{handler="/api/v1/leaderboards/{time_frame}"} 1.6760310835968053e+09
http_request_size_bytes_created{handler="/api/v1/tasks/availability"} 1.6760310836020775e+09
http_request_size_bytes_created{handler="/api/v1/frontend_users/"} 1.67603115845143e+09
http_request_size_bytes_created{handler="/api/v1/users/{user_id}"} 1.6760311584592013e+09
# HELP http_response_size_bytes Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes summary
http_response_size_bytes_count{handler="none"} 1.0
http_response_size_bytes_sum{handler="none"} 22.0
http_response_size_bytes_count{handler="/docs"} 1.0
http_response_size_bytes_sum{handler="/docs"} 953.0
http_response_size_bytes_count{handler="/api/v1/openapi.json"} 1.0
http_response_size_bytes_sum{handler="/api/v1/openapi.json"} 96290.0
http_response_size_bytes_count{handler="/metrics"} 1.0
http_response_size_bytes_sum{handler="/metrics"} 8356.0
http_response_size_bytes_count{handler="/api/v1/frontend_users/{auth_method}/{username}"} 4.0
http_response_size_bytes_sum{handler="/api/v1/frontend_users/{auth_method}/{username}"} 828.0
http_response_size_bytes_count{handler="/api/v1/auth/check"} 2.0
http_response_size_bytes_sum{handler="/api/v1/auth/check"} 0.0
http_response_size_bytes_count{handler="/api/v1/leaderboards/{time_frame}"} 2.0
http_response_size_bytes_sum{handler="/api/v1/leaderboards/{time_frame}"} 174.0
http_response_size_bytes_count{handler="/api/v1/tasks/availability"} 2.0
http_response_size_bytes_sum{handler="/api/v1/tasks/availability"} 538.0
http_response_size_bytes_count{handler="/api/v1/frontend_users/"} 1.0
http_response_size_bytes_sum{handler="/api/v1/frontend_users/"} 353.0
http_response_size_bytes_count{handler="/api/v1/users/{user_id}"} 1.0
http_response_size_bytes_sum{handler="/api/v1/users/{user_id}"} 0.0
# HELP http_response_size_bytes_created Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes_created gauge
http_response_size_bytes_created{handler="none"} 1.676031062576586e+09
http_response_size_bytes_created{handler="/docs"} 1.676031066198741e+09
http_response_size_bytes_created{handler="/api/v1/openapi.json"} 1.6760310669858077e+09
http_response_size_bytes_created{handler="/metrics"} 1.6760310709086478e+09
http_response_size_bytes_created{handler="/api/v1/frontend_users/{auth_method}/{username}"} 1.676031083113282e+09
http_response_size_bytes_created{handler="/api/v1/auth/check"} 1.6760310834009411e+09
http_response_size_bytes_created{handler="/api/v1/leaderboards/{time_frame}"} 1.6760310835968306e+09
http_response_size_bytes_created{handler="/api/v1/tasks/availability"} 1.676031083602097e+09
http_response_size_bytes_created{handler="/api/v1/frontend_users/"} 1.67603115845145e+09
http_response_size_bytes_created{handler="/api/v1/users/{user_id}"} 1.6760311584592183e+09
# HELP http_request_duration_highr_seconds Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_bucket{le="0.01"} 9.0
http_request_duration_highr_seconds_bucket{le="0.025"} 11.0
http_request_duration_highr_seconds_bucket{le="0.05"} 13.0
http_request_duration_highr_seconds_bucket{le="0.075"} 13.0
http_request_duration_highr_seconds_bucket{le="0.1"} 13.0
http_request_duration_highr_seconds_bucket{le="0.25"} 16.0
http_request_duration_highr_seconds_bucket{le="0.5"} 16.0
http_request_duration_highr_seconds_bucket{le="0.75"} 16.0
http_request_duration_highr_seconds_bucket{le="1.0"} 16.0
http_request_duration_highr_seconds_bucket{le="1.5"} 16.0
http_request_duration_highr_seconds_bucket{le="2.0"} 16.0
http_request_duration_highr_seconds_bucket{le="2.5"} 16.0
http_request_duration_highr_seconds_bucket{le="3.0"} 16.0
http_request_duration_highr_seconds_bucket{le="3.5"} 16.0
http_request_duration_highr_seconds_bucket{le="4.0"} 16.0
http_request_duration_highr_seconds_bucket{le="4.5"} 16.0
http_request_duration_highr_seconds_bucket{le="5.0"} 16.0
http_request_duration_highr_seconds_bucket{le="7.5"} 16.0
http_request_duration_highr_seconds_bucket{le="10.0"} 16.0
http_request_duration_highr_seconds_bucket{le="30.0"} 16.0
http_request_duration_highr_seconds_bucket{le="60.0"} 16.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 16.0
http_request_duration_highr_seconds_count 16.0
http_request_duration_highr_seconds_sum 0.7020050129794981
# HELP http_request_duration_highr_seconds_created Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds_created gauge
http_request_duration_highr_seconds_created 1.676031039374932e+09
# HELP http_request_duration_seconds Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="none",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="none",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="none",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="none",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="none"} 1.0
http_request_duration_seconds_sum{handler="none"} 0.0007602169935125858
http_request_duration_seconds_bucket{handler="/docs",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="/docs",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="/docs",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="/docs",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="/docs"} 1.0
http_request_duration_seconds_sum{handler="/docs"} 0.00037947800592519343
http_request_duration_seconds_bucket{handler="/api/v1/openapi.json",le="0.1"} 0.0
http_request_duration_seconds_bucket{handler="/api/v1/openapi.json",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/openapi.json",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/openapi.json",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="/api/v1/openapi.json"} 1.0
http_request_duration_seconds_sum{handler="/api/v1/openapi.json"} 0.23256462899735197
http_request_duration_seconds_bucket{handler="/metrics",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="/metrics",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="/metrics",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="/metrics",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="/metrics"} 1.0
http_request_duration_seconds_sum{handler="/metrics"} 0.0031614219915354624
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/{auth_method}/{username}",le="0.1"} 3.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/{auth_method}/{username}",le="0.5"} 4.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/{auth_method}/{username}",le="1.0"} 4.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/{auth_method}/{username}",le="+Inf"} 4.0
http_request_duration_seconds_count{handler="/api/v1/frontend_users/{auth_method}/{username}"} 4.0
http_request_duration_seconds_sum{handler="/api/v1/frontend_users/{auth_method}/{username}"} 0.1464438099937979
http_request_duration_seconds_bucket{handler="/api/v1/auth/check",le="0.1"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/auth/check",le="0.5"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/auth/check",le="1.0"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/auth/check",le="+Inf"} 2.0
http_request_duration_seconds_count{handler="/api/v1/auth/check"} 2.0
http_request_duration_seconds_sum{handler="/api/v1/auth/check"} 0.004731963999802247
http_request_duration_seconds_bucket{handler="/api/v1/leaderboards/{time_frame}",le="0.1"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/leaderboards/{time_frame}",le="0.5"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/leaderboards/{time_frame}",le="1.0"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/leaderboards/{time_frame}",le="+Inf"} 2.0
http_request_duration_seconds_count{handler="/api/v1/leaderboards/{time_frame}"} 2.0
http_request_duration_seconds_sum{handler="/api/v1/leaderboards/{time_frame}"} 0.055905182991409674
http_request_duration_seconds_bucket{handler="/api/v1/tasks/availability",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/tasks/availability",le="0.5"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/tasks/availability",le="1.0"} 2.0
http_request_duration_seconds_bucket{handler="/api/v1/tasks/availability",le="+Inf"} 2.0
http_request_duration_seconds_count{handler="/api/v1/tasks/availability"} 2.0
http_request_duration_seconds_sum{handler="/api/v1/tasks/availability"} 0.24354792400845326
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/frontend_users/",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="/api/v1/frontend_users/"} 1.0
http_request_duration_seconds_sum{handler="/api/v1/frontend_users/"} 0.0073766929999692366
http_request_duration_seconds_bucket{handler="/api/v1/users/{user_id}",le="0.1"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/users/{user_id}",le="0.5"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/users/{user_id}",le="1.0"} 1.0
http_request_duration_seconds_bucket{handler="/api/v1/users/{user_id}",le="+Inf"} 1.0
http_request_duration_seconds_count{handler="/api/v1/users/{user_id}"} 1.0
http_request_duration_seconds_sum{handler="/api/v1/users/{user_id}"} 0.007133692997740582
# HELP http_request_duration_seconds_created Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds_created gauge
http_request_duration_seconds_created{handler="none"} 1.6760310625766513e+09
http_request_duration_seconds_created{handler="/docs"} 1.6760310661987698e+09
http_request_duration_seconds_created{handler="/api/v1/openapi.json"} 1.6760310669858382e+09
http_request_duration_seconds_created{handler="/metrics"} 1.6760310709086726e+09
http_request_duration_seconds_created{handler="/api/v1/frontend_users/{auth_method}/{username}"} 1.6760310831133058e+09
http_request_duration_seconds_created{handler="/api/v1/auth/check"} 1.6760310834009697e+09
http_request_duration_seconds_created{handler="/api/v1/leaderboards/{time_frame}"} 1.676031083596859e+09
http_request_duration_seconds_created{handler="/api/v1/tasks/availability"} 1.67603108360212e+09
http_request_duration_seconds_created{handler="/api/v1/frontend_users/"} 1.6760311584514742e+09
http_request_duration_seconds_created{handler="/api/v1/users/{user_id}"} 1.6760311584592369e+09
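Because this output uses the plain Prometheus text format, derived numbers can be computed directly from it. The following is a minimal stdlib-only sketch that runs on an excerpt of the output above. It computes the mean latency per handler (`_sum / _count`) and estimates a percentile from the high-resolution histogram. The helper names `parse`, `mean_latency_by_handler` and `histogram_quantile` are hypothetical. The quantile estimate mimics the linear interpolation used by PromQL's `histogram_quantile`; it does not call Prometheus.

```python
import re

# Excerpt of the /metrics output above (Prometheus text exposition format).
SAMPLE = """\
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_bucket{le="0.01"} 9.0
http_request_duration_highr_seconds_bucket{le="0.025"} 11.0
http_request_duration_highr_seconds_bucket{le="0.05"} 13.0
http_request_duration_highr_seconds_bucket{le="0.075"} 13.0
http_request_duration_highr_seconds_bucket{le="0.1"} 13.0
http_request_duration_highr_seconds_bucket{le="0.25"} 16.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 16.0
http_request_duration_seconds_count{handler="/docs"} 1.0
http_request_duration_seconds_sum{handler="/docs"} 0.00037947800592519343
http_request_duration_seconds_count{handler="/api/v1/tasks/availability"} 2.0
http_request_duration_seconds_sum{handler="/api/v1/tasks/availability"} 0.24354792400845326
"""

# name{labels} value [timestamp]; the label block and timestamp are optional.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse(text):
    """Yield (name, labels, value) for each sample line, skipping # comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            # float() understands "+Inf" and exponents such as "1.6e+09".
            yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def mean_latency_by_handler(samples, metric="http_request_duration_seconds"):
    """Average latency per handler: <metric>_sum / <metric>_count."""
    sums, counts = {}, {}
    for name, labels, value in samples:
        if name == metric + "_sum":
            sums[labels.get("handler")] = value
        elif name == metric + "_count":
            counts[labels.get("handler")] = value
    return {h: sums[h] / n for h, n in counts.items() if n > 0 and h in sums}


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets by
    interpolating linearly inside the bucket that contains the target rank."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return float("nan")
    rank = q * total
    lower, below = 0.0, 0.0  # previous bucket's upper bound and cumulative count
    for upper, count in buckets:
        if count >= rank:
            if upper == float("inf"):
                return lower  # rank falls in the +Inf bucket: highest finite bound
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return lower


samples = list(parse(SAMPLE))
buckets = [(float(labels["le"]), value) for name, labels, value in samples
           if name == "http_request_duration_highr_seconds_bucket"]
print(mean_latency_by_handler(samples))
print("p90 estimate:", histogram_quantile(0.9, buckets))
```

For this excerpt, the mean latency of `/api/v1/tasks/availability` is about 0.2435 s / 2 ≈ 0.122 s. The p90 rank is 0.9 × 16 = 14.4, which falls in the (0.1, 0.25] bucket, so the estimate is 0.1 + 0.15 × (14.4 − 13) / 3 = 0.17 s. With only 16 requests recorded, these figures illustrate the arithmetic; they are not meaningful latency measurements.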

andrewm4894 (Collaborator, Author) commented:

pre-commit is acting strangely here; I think it deleted the file /workspaces/Open-Assistant/website/public/mockServiceWorker.js for some reason.

andrewm4894 enabled auto-merge (squash) February 10, 2023 12:32
andrewm4894 (Collaborator, Author) commented Feb 10, 2023

If we merge this, I can update this PR to scrape those metrics so they are also available in Netdata.

I will also try to make another PR to add Prometheus and Grafana.

andreaskoepf (Collaborator) left a comment:

ok, looks great!

andrewm4894 merged commit b60eb1e into LAION-AI:main Feb 10, 2023
andrewm4894 deleted the add-metrics-endpoints branch February 10, 2023 14:01