RFC - Native Support for Prometheus Metrics Endpoint #3377

Closed
tonysun83 opened this issue Feb 22, 2021 · 8 comments

@tonysun83
Contributor

tonysun83 commented Feb 22, 2021


name: Formal RFC
about: Submit a formal Request For Comments for consideration by the team.
title: Support for native Prometheus Endpoints
labels: rfc, discussion
assignees: @tonysun83

Introduction

This is a formal proposal to add a /_node/{node-name}/_prometheus endpoint that outputs Prometheus (https://prometheus.io/) metrics data. @garrensmith and @jiangphcn began this discussion on the mailing list, and this proposal consolidates the list of options for this new endpoint.

Abstract

Currently, CouchDB's metrics and diagnostic information can be obtained via the node-specific _stats, _active_tasks, and _system endpoints. Prometheus has become a prevalent, standard approach for exposing metrics. Support for exposing CouchDB metrics in Prometheus format is in demand, as demonstrated by the CouchDB Prometheus Exporter.

One solution is to bundle the CouchDB Prometheus Exporter as part of CouchDB. This requires bundling Go as part of the build, and the exporter does not include _system info.

The proposed solution is to add a native module or app that receives the /_node/{node-name}/_prometheus call and consolidates the _stats, _active_tasks, and _system calls into one response. The response Content-Type will be text/plain, and the body will be in the format that Prometheus expects.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in
RFC 2119.

Terminology

TBD - To Be Determined


Detailed Description

  1. A new module, or perhaps a new app, will be written to consolidate the results of our _stats, _active_tasks, and _system internal function calls. The exact form is TBD. The JSON format should be straightforward, since it is just an aggregation of existing calls. The Prometheus format, however, requires a more detailed conversion. See HTTP API additions for return format proposals.

  2. The endpoint will be node-specific and not for the entire cluster. It is up to an external monitoring tool to aggregate and present the entire cluster's data. This is the current design choice and is open for discussion. The exact endpoint is:
    /_node/{node-name}/_prometheus. The endpoint will simply return text/plain as a response with the metrics listed. The endpoint will have an optional Accept header that determines whether JSON or Prometheus output is returned.
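The conversion step in item 1 could look roughly like the sketch below. It is written in Python purely for illustration (the real module would be Erlang), and the function names and the shape of the input dictionary are assumptions, not part of this proposal. It only shows how a flat set of values can be rendered as a `# TYPE` line followed by labelled samples.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, samples):
    """Render one metric family in the Prometheus text format.

    `samples` is a list of (labels_dict, value) pairs; an empty dict
    produces a sample line without a label block.
    """
    lines = [f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of the memory section a _system call might return.
system_memory = {"total": 71237784, "processes": 12248504}

output = render_metric(
    "couchdb_erlang_memory_bytes",
    "gauge",
    [({"memory_type": kind}, size) for kind, size in system_memory.items()],
)
print(output)
```

Keeping the rendering separate from the code that gathers the values means the same collector could feed other formats later without changes.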

Advantages and Disadvantages

Advantages

  • Native functionality without relying on external converters.
  • No need to bundle Go as part of our release (we would need to bundle Go if we simply included the CouchDB Prometheus Exporter)
  • Ability to see all metrics info via one endpoint

Disadvantages

  • Prometheus scraping would require issuing a _prometheus endpoint call for every node.
  • Re-implementation of functionality (_stats, _active_tasks) already available with CouchDB Prometheus Exporter
  • Prometheus is a standard that may become obsolete in the future.

Key Changes

A new node specific endpoint will be added to the API.

Applications and Modules affected

  1. chttpd

    • chttpd_node
    • chttpd_prometheus (new module)
  2. (new app perhaps)

HTTP API additions

GET /_node/{node-name}/_prometheus HTTP/1.1

Returns consolidated metrics info (_stats, _active_tasks, _system) via JSON or Prometheus standard.

Request Headers:
If no header is provided, the default is JSON. Content-Type application/json will return JSON, while prometheus will return Prometheus formatting.

Response Headers:

Content-Type:

  • ~~application/json~~
  • text/plain; charset=utf-8

Valid Status Codes

200 OK - Request completed successfully
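
A client-side view of this endpoint might look like the following minimal sketch. The helper names, the base URL, and the timeout are assumptions made for illustration; authentication is omitted entirely. The sketch builds the per-node URL and accepts the response only if it is a 200 with a text/plain media type.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def prometheus_url(base_url: str, node_name: str) -> str:
    """Build the per-node endpoint URL proposed in this RFC."""
    encoded_node = quote(node_name, safe="@")
    return base_url.rstrip("/") + "/_node/" + encoded_node + "/_prometheus"


def is_text_plain(content_type: str) -> bool:
    """True if a Content-Type header value has the text/plain media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/plain"


def scrape(base_url: str, node_name: str, timeout: float = 5.0) -> str:
    """Fetch the metrics text from one node (performs a real HTTP request)."""
    request = Request(prometheus_url(base_url, node_name))
    with urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        if response.status != 200 or not is_text_plain(content_type):
            raise RuntimeError(
                f"unexpected response: {response.status} {content_type!r}"
            )
        return response.read().decode("utf-8")


# Hypothetical local node; nothing is fetched until scrape() is called.
print(prometheus_url("http://localhost:5984/", "_local"))
```

Because the endpoint is per node, a monitoring setup would call something like `scrape` once for each node in the cluster.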

Sample Prometheus Output:

# TYPE couchdb_uptime_seconds counter
couchdb_uptime_seconds 1
# TYPE couchdb_erlang_memory_bytes gauge
couchdb_erlang_memory_bytes{memory_type="total"} 71237784
couchdb_erlang_memory_bytes{memory_type="processes"} 12248504
couchdb_erlang_memory_bytes{memory_type="processes_used"} 12235928
couchdb_erlang_memory_bytes{memory_type="system"} 58989280
couchdb_erlang_memory_bytes{memory_type="atom"} 1172689
couchdb_erlang_memory_bytes{memory_type="atom_used"} 1156575
couchdb_erlang_memory_bytes{memory_type="binary"} 182568
couchdb_erlang_memory_bytes{memory_type="code"} 27819083
couchdb_erlang_memory_bytes{memory_type="ets"} 3143536
# TYPE couchdb_erlang_gc_collections_total counter
couchdb_erlang_gc_collections_total 13417
# TYPE couchdb_erlang_gc_words_reclaimed_total counter
couchdb_erlang_gc_words_reclaimed_total 71296018
# TYPE couchdb_erlang_context_switches_total counter
couchdb_erlang_context_switches_total 358276
# TYPE couchdb_erlang_reductions_total counter
couchdb_erlang_reductions_total 46527253
# TYPE couchdb_erlang_processes gauge
couchdb_erlang_processes 528
# TYPE couchdb_erlang_process_limit gauge
couchdb_erlang_process_limit 262144
# TYPE couchdb_erlang_io_recv_bytes_total counter
couchdb_erlang_io_recv_bytes_total 23291839
# TYPE couchdb_erlang_io_sent_bytes_total counter
couchdb_erlang_io_sent_bytes_total 8915261
# TYPE couchdb_erlang_message_queues gauge
couchdb_erlang_message_queues 0
# TYPE couchdb_erlang_message_queue_min gauge
couchdb_active_task{type="replication",source="mailbox",target="http://mailsrv:5984/mailbox",docs_count="docs_read"} 4524
couchdb_active_task{type="replication",source="mailbox",target="http://mailsrv:5984/mailbox",docs_count="docs_written"} 4524
couchdb_active_task{type="replication",source="mailbox",target="http://mailsrv:5984/mailbox",docs_count="missing_revisions_found"} 4524
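
To check that output of this shape is machine-readable, a deliberately simplified reader can be sketched as below. This is not a complete parser for the Prometheus text format: it ignores escaped characters in label values and optional timestamps, and the function name `parse_samples` is made up for this example.

```python
import re

# metric name, optional {label block}, whitespace, value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# label="value" pairs; does not handle escaped quotes inside values
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping # lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable sample line: {line!r}")
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


text = """\
# TYPE couchdb_erlang_processes gauge
couchdb_erlang_processes 528
# TYPE couchdb_erlang_memory_bytes gauge
couchdb_erlang_memory_bytes{memory_type="total"} 71237784
"""
samples = parse_samples(text)
for name, labels, value in samples:
    print(name, labels, value)
```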

HTTP API deprecations

None

Security Considerations

N/A

References

http://couchdb-development.1959287.n2.nabble.com/DISCUSS-Prometheus-endpoint-in-CouchDB-4-x-td7607648.html

Acknowledgements

Summary and implementation ideas mostly from the mailing list discussion responses.

@jiangphcn @garrensmith @janl @wohali @davisp @willholley

@rnewson
Member

rnewson commented Feb 22, 2021

It seems the discussion is still open, are we to continue it here in the Issue or on the mailing list? I was about to raise several points but they've all come up in the discussion (without resolution).

Consolidating my thoughts;

  1. We're adding support for Prometheus but also inventing a new format for it that nothing understands.
  2. Negotiation by content-type when Prometheus doesn't have a registered one is a mistake (The accept= query parameter idea is better).
  3. As not every item in active_tasks is a metric, the new endpoint is not a replacement for it, so we'll have to retain both.
  4. The output, in any format, remains Prometheus specific anyway (though, again, noting that tools that understand prometheus format could not consume our novel JSON format).

It's an odd proposal that appears to satisfy no particular audience, I don't feel it's appropriate to add this to CouchDB in this form.

The goal seemed clear, "integrate with Prometheus", and the solution is equally clear, a new endpoint ("/_prometheus") that emits the documented Prometheus format with the 'text/plain; version=0.0.4' content type. The CouchDB project can accept or reject such a thing. Should Prometheus become old hat, we can remove the endpoint, losing no other functionality.

The code to interrogate CouchDB internals and produce the Prometheus output is useful regardless and could live on as a plugin or separate library (system administrators could add in the 'couchdb-prometheus-plugin' library as part of a custom 'make release') if the project were not to accept the work.

imo it is not a blocker that Prometheus does not speak JSON and that CouchDB does. This is an integration endpoint and, by definition, the two systems being integrated are not identical.

I would vote to accept a clean Prometheus endpoint.

@janl
Member

janl commented Feb 22, 2021

+1 to all of @rnewson’s points

@tonysun83
Contributor Author

@rnewson @janl points taken. Let's continue the discussion here.

From these comments

  • We're adding support for Prometheus but also inventing a new format for it that nothing understands.
  • Negotiation by content-type when Prometheus doesn't have a registered one is a mistake (The accept= query parameter idea is better).
  • The output, in any format, remains Prometheus specific anyway (though, again, noting that tools that understand prometheus format could not consume our novel JSON format).

It seems that neither of you is a big fan of having JSON as a return type and would rather only return the 'text/plain; version=0.0.4' content type. Am I understanding that correctly?

@rnewson
Member

rnewson commented Feb 22, 2021

yes, that's right. and make the endpoint specifically about prometheus. the ad hoc attempts to make it generic aren't convincing.

@tonysun83
Contributor Author

@janl @rnewson @chewbranca updated the RFC to reflect Bob's comments

@rnewson
Member

rnewson commented Mar 2, 2021

looks good to me, direct and to the point.

@nickva
Contributor

nickva commented Nov 4, 2021

Prometheus support shipped in 3.2.0

https://docs.couchdb.org/en/3.2.0/whatsnew/3.2.html#features-and-enhancements

@nickva closed this as completed Nov 4, 2021

@kocolosk
Member

kocolosk commented Nov 4, 2021

Yeah, I was just wondering if there's supposed to be some place where we store these RFCs in perpetuity. We started using the couchdb-documentation repo for this purpose but that didn't really feel natural either.

Agree this issue can be closed, but I think there's an opportunity for improvement in the RFC process here.
