Expose the number of errors that occurred on the server via the Prometheus endpoint. #57209

Merged
merged 2 commits into from Nov 29, 2023
@@ -1837,9 +1837,10 @@ Settings:

- `endpoint` – HTTP endpoint from which the Prometheus server scrapes metrics. Must start with ‘/’.
- `port` – Port for `endpoint`.
- `metrics` – Flag that sets to expose metrics from the [system.metrics](../../operations/system-tables/metrics.md#system_tables-metrics) table.
- `events` – Flag that sets to expose metrics from the [system.events](../../operations/system-tables/events.md#system_tables-events) table.
- `asynchronous_metrics` – Flag that sets to expose current metrics values from the [system.asynchronous_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) table.
- `metrics` – Expose metrics from the [system.metrics](../../operations/system-tables/metrics.md#system_tables-metrics) table.
- `events` – Expose metrics from the [system.events](../../operations/system-tables/events.md#system_tables-events) table.
- `asynchronous_metrics` – Expose current metrics values from the [system.asynchronous_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) table.
- `errors` – Expose the number of errors, by error code, that have occurred since the last server restart. This information can also be obtained from the [system.errors](../../operations/system-tables/errors.md) table.

**Example**

@@ -1855,6 +1856,7 @@ Settings:
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
<errors>true</errors>
</prometheus>
<!-- highlight-end -->
</clickhouse>
@@ -1216,6 +1216,7 @@ ClickHouse uses threads from the global pool
- `metrics` – flag for exporting current metric values from the [system.metrics](../system-tables/metrics.md#system_tables-metrics) table.
- `events` – flag for exporting current metric values from the [system.events](../system-tables/events.md#system_tables-events) table.
- `asynchronous_metrics` – flag for exporting current metric values from the [system.asynchronous_metrics](../system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) table.
- `errors` – flag for exporting the number of errors (by error code) that have occurred since the last server restart. This information can also be obtained from the [system.errors](../system-tables/errors.md) table.

**Example**

@@ -1226,6 +1227,7 @@ ClickHouse uses threads from the global pool
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
<errors>true</errors>
</prometheus>
```

23 changes: 22 additions & 1 deletion src/Server/PrometheusMetricsWriter.cpp
@@ -50,6 +50,7 @@ PrometheusMetricsWriter::PrometheusMetricsWriter(
, send_events(config.getBool(config_name + ".events", true))
, send_metrics(config.getBool(config_name + ".metrics", true))
, send_asynchronous_metrics(config.getBool(config_name + ".asynchronous_metrics", true))
, send_errors(config.getBool(config_name + ".errors", true))
{
}
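
The constructor above reads each export flag with a default of `true`, so a flag left out of the `<prometheus>` section stays enabled. A minimal sketch of that fallback behavior, assuming a hypothetical `SimpleConfig` stand-in (not the server's real configuration class) and a hypothetical `readExportFlags` helper:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the server configuration object. It models only
// the "missing key falls back to the default" behavior of getBool.
struct SimpleConfig
{
    std::map<std::string, bool> values;

    bool getBool(const std::string & key, bool default_value) const
    {
        auto it = values.find(key);
        return it == values.end() ? default_value : it->second;
    }
};

// Mirrors the four flags read in the constructor above.
struct ExportFlags
{
    bool send_events;
    bool send_metrics;
    bool send_asynchronous_metrics;
    bool send_errors;
};

// Hypothetical helper: reads every flag with a default of true.
ExportFlags readExportFlags(const SimpleConfig & config, const std::string & config_name)
{
    return ExportFlags{
        config.getBool(config_name + ".events", true),
        config.getBool(config_name + ".metrics", true),
        config.getBool(config_name + ".asynchronous_metrics", true),
        config.getBool(config_name + ".errors", true)};
}
```

Under this sketch, a configuration that sets only `prometheus.errors` to `false` turns off the error counters and leaves the other three groups exported.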

@@ -112,12 +113,32 @@ void PrometheusMetricsWriter::write(WriteBuffer & wb) const
            std::string metric_doc{value.documentation};
            convertHelpToSingleLine(metric_doc);

            // TODO: add HELP section? asynchronous_metrics contains only key and value
            writeOutLine(wb, "# HELP", key, metric_doc);
            writeOutLine(wb, "# TYPE", key, "gauge");
            writeOutLine(wb, key, value.value);
        }
    }

    if (send_errors)
    {
        for (size_t i = 0, end = ErrorCodes::end(); i < end; ++i)
        {
            const auto & error = ErrorCodes::values[i].get();
            std::string_view name = ErrorCodes::getName(static_cast<ErrorCodes::ErrorCode>(i));

            if (name.empty())
                continue;

            std::string key{error_metrics_prefix + toString(name)};
            std::string help = fmt::format("The number of {} errors since last server restart", name);

            writeOutLine(wb, "# HELP", key, help);
            writeOutLine(wb, "# TYPE", key, "counter");
            /// We are only interested in errors that happened on this server.
            writeOutLine(wb, key, error.local.count);
        }
    }

}

}
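
To make the output of the new `send_errors` branch concrete, here is a self-contained sketch of the same loop. It is an illustration, not the server code: `ErrorTable` and `writeErrorMetrics` are hypothetical names, the error table is passed in as plain pairs instead of being read from `ErrorCodes`, and the sketch assumes each output line is its fields joined by single spaces.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of (error name, local count) pairs.
// An empty name marks an unused error code, as in the loop above.
using ErrorTable = std::vector<std::pair<std::string, std::uint64_t>>;

// Renders one HELP line, one TYPE line and one sample line per named error.
std::string writeErrorMetrics(const ErrorTable & errors)
{
    static const std::string prefix = "ClickHouseErrorMetric_";
    std::ostringstream out;

    for (const auto & entry : errors)
    {
        const std::string & name = entry.first;
        if (name.empty())
            continue; // skip gaps in the error code range

        const std::string key = prefix + name;
        out << "# HELP " << key << " The number of " << name
            << " errors since last server restart\n";
        out << "# TYPE " << key << " counter\n";
        out << key << ' ' << entry.second << '\n';
    }
    return out.str();
}
```

The metric type is `counter` because the value only grows until the server restarts, which is what lets Prometheus functions such as `rate()` handle the reset correctly.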
2 changes: 2 additions & 0 deletions src/Server/PrometheusMetricsWriter.h
@@ -27,10 +27,12 @@ class PrometheusMetricsWriter
const bool send_events;
const bool send_metrics;
const bool send_asynchronous_metrics;
const bool send_errors;

static inline constexpr auto profile_events_prefix = "ClickHouseProfileEvents_";
static inline constexpr auto current_metrics_prefix = "ClickHouseMetrics_";
static inline constexpr auto asynchronous_metrics_prefix = "ClickHouseAsyncMetrics_";
static inline constexpr auto error_metrics_prefix = "ClickHouseErrorMetric_";
};

}
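
Because every error counter shares the `ClickHouseErrorMetric_` prefix, a consumer can separate them from the other metric groups by name alone. A minimal sketch of that on the reading side, assuming the scraped text has one sample per line in the form `name value` with no labels; `parseErrorMetrics` is a hypothetical helper, not part of ClickHouse:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: extracts error counters from scraped endpoint text.
// Returns a map from error name (prefix stripped) to its count.
std::map<std::string, std::uint64_t> parseErrorMetrics(const std::string & scraped)
{
    static const std::string prefix = "ClickHouseErrorMetric_";
    std::map<std::string, std::uint64_t> result;

    std::istringstream in(scraped);
    std::string line;
    while (std::getline(in, line))
    {
        // Skips "# HELP" / "# TYPE" comments and samples from other groups.
        if (line.compare(0, prefix.size(), prefix) != 0)
            continue;

        const std::size_t space = line.find(' ');
        if (space == std::string::npos)
            continue; // no value on this line

        const std::string name = line.substr(prefix.size(), space - prefix.size());
        // Assumes the value is a non-negative integer; std::stoull throws otherwise.
        result[name] = std::stoull(line.substr(space + 1));
    }
    return result;
}
```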