diff --git a/README.md b/README.md index 769f87d606a6..525c6f2f6c25 100644 --- a/README.md +++ b/README.md @@ -110,6 +110,7 @@ Case studies: * [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch) * [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern) * [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl) +* [Criteo](https://docs.victoriametrics.com/CaseStudies.html#criteo) * [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security) * [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio) * [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence) @@ -364,6 +365,8 @@ See the [example VMUI at VictoriaMetrics playground](https://play.victoriametric * queries with the biggest average execution duration; * queries that took the most summary time for execution. +This information is obtained from the `/api/v1/status/top_queries` HTTP endpoint. + ## Active queries [VMUI](#vmui) provides `active queries` tab, which shows currently execute queries. @@ -373,6 +376,8 @@ It provides the following information per each query: - The duration of the query execution. - The client address, who initiated the query execution. +This information is obtained from the `/api/v1/status/active_queries` HTTP endpoint. + ## Metrics explorer [VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way: @@ -404,14 +409,16 @@ matching the specified [series selector](https://prometheus.io/docs/prometheus/l Cardinality explorer is built on top of [/api/v1/status/tsdb](#tsdb-stats). +See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality). +See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/). + +## Cardinality explorer statistic inaccuracy + In [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) each vmstorage tracks the stored time series individually. vmselect requests stats via [/api/v1/status/tsdb](#tsdb-stats) API from each vmstorage node and merges the results by summing per-series stats. This may lead to inflated values when samples for the same time series are spread across multiple vmstorage nodes due to [replication](#replication) or [rerouting](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html?highlight=re-routes#cluster-availability). -See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality). -See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/). - ## How to apply new config to VictoriaMetrics VictoriaMetrics is configured via command-line flags, so it must be restarted when new command-line flags should be applied: @@ -616,6 +623,28 @@ Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plu or [Juniper/jitmon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response. Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag. 
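+For example, if VictoriaMetrics listens on `localhost:8428` and is started with `-influx.databaseNames=telegraf`
+(both values here are illustrative), such plugins should see the expected database in the response to:
+
+```console
+curl 'http://localhost:8428/query?q=SHOW+DATABASES'
+```
+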
+### How to send data in InfluxDB v2 format
+
+VictoriaMetrics exposes an endpoint for the InfluxDB v2 HTTP API at `/influx/api/v2/write` and `/api/v2/write`.
+
+In order to write data with the InfluxDB line protocol to local VictoriaMetrics using `curl`:
+
+ +```console +curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'http://localhost:8428/api/v2/write' +``` + +
+
+The `/api/v1/export` endpoint should return the following response:
+
+```json
+{"metric":{"__name__":"measurement_field1","tag1":"value1","tag2":"value2"},"values":[123],"timestamps":[1695902762311]}
+{"metric":{"__name__":"measurement_field2","tag1":"value1","tag2":"value2"},"values":[1.23],"timestamps":[1695902762311]}
+```
+
 ## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)

 Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
@@ -830,7 +859,7 @@ Additionally, VictoriaMetrics provides the following handlers:
 * `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
   * the handler scans all the inverted index, so it can be slow if the database contains tens of millions of time series;
   * the handler may count [deleted time series](#how-to-delete-time-series) additionally to normal time series due to internal implementation restrictions;
-* `/api/v1/status/active_queries` - returns a list of currently running queries.
+* `/api/v1/status/active_queries` - returns the list of currently running queries. This list is also available at the [`active queries` page at VMUI](#active-queries).
 * `/api/v1/status/top_queries` - returns the following query lists:
   * the most frequently executed queries - `topByCount`
   * queries with the biggest average execution duration - `topByAvgDuration`
@@ -840,6 +869,8 @@ Additionally, VictoriaMetrics provides the following handlers:
   For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds.
   VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`.

+  See also the [`top queries` page at VMUI](#top-queries).
+
 ### Timestamp formats

 VictoriaMetrics accepts the following formats for `time`, `start` and `end` query args
@@ -1790,9 +1821,9 @@ Graphs on the dashboards contain useful hints - hover the `i` icon in the top le

 We recommend setting up [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts)
 via [vmalert](https://docs.victoriametrics.com/vmalert.html) or via Prometheus.

-VictoriaMetrics exposes currently running queries and their execution times at `/api/v1/status/active_queries` page.
+VictoriaMetrics exposes currently running queries and their execution times at the [`active queries` page](#active-queries).

-VictoriaMetrics exposes queries, which take the most time to execute, at `/api/v1/status/top_queries` page.
+VictoriaMetrics exposes queries that take the most time to execute at the [`top queries` page](#top-queries).

 See also [VictoriaMetrics Monitoring](https://victoriametrics.com/blog/victoriametrics-monitoring/)
 and [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html).
@@ -1937,9 +1968,6 @@ and [cardinality explorer docs](#cardinality-explorer).
   has at least 20% of free space. The remaining amount of free space
   can be [monitored](#monitoring) via `vm_free_disk_space_bytes` metric.
   The total size of data stored on the disk can be monitored via sum of `vm_data_size_bytes` metrics.
-  See also `vm_merge_need_free_disk_space` metrics, which are set to values higher than 0
-  if background merge cannot be initiated due to free disk space shortage. The value shows the number of per-month partitions,
-  which would start background merge if they had more free disk space.
* VictoriaMetrics buffers incoming data in memory for up to a few seconds before flushing it to persistent storage.
  This may lead to the following "issues":
diff --git a/app/vlinsert/elasticsearch/elasticsearch.go b/app/vlinsert/elasticsearch/elasticsearch.go
index a3ae68f19ea9..d511d3729579 100644
--- a/app/vlinsert/elasticsearch/elasticsearch.go
+++ b/app/vlinsert/elasticsearch/elasticsearch.go
@@ -12,6 +12,8 @@ import (
 	"strings"
 	"time"

+	"github.com/VictoriaMetrics/metrics"
+
 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vlinsert/insertutils"
 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/bufferedwriter"
@@ -22,7 +24,6 @@ import (
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
-	"github.com/VictoriaMetrics/metrics"
 )

 var (
@@ -101,8 +102,12 @@ func RequestHandler(path string, w http.ResponseWriter, r *http.Request) bool {
 			logger.Warnf("cannot decode log message #%d in /_bulk request: %s", n, err)
 			return true
 		}
-		vlstorage.MustAddRows(lr)
+		err = vlstorage.AddRows(lr)
 		logstorage.PutLogRows(lr)
+		if err != nil {
+			httpserver.Errorf(w, r, "cannot insert rows: %s", err)
+			return true
+		}

 	tookMs := time.Since(startTime).Milliseconds()
 	bw := bufferedwriter.Get(w)
@@ -128,7 +133,7 @@ var (
 )

 func readBulkRequest(r io.Reader, isGzip bool, timeField, msgField string,
-	processLogMessage func(timestamp int64, fields []logstorage.Field),
+	processLogMessage func(timestamp int64, fields []logstorage.Field) error,
 ) (int, error) {
 	// See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

@@ -171,7 +176,7 @@ func readBulkRequest(r io.Reader, isGzip bool, timeField, msgField string,
 var lineBufferPool bytesutil.ByteBufferPool

 func readBulkLine(sc *bufio.Scanner, timeField, msgField string,
-	processLogMessage func(timestamp int64, fields []logstorage.Field),
+	processLogMessage func(timestamp int64, fields []logstorage.Field) error,
 ) (bool, error) {
 	var line []byte

@@ -218,8 +223,12 @@ func readBulkLine(sc *bufio.Scanner, timeField, msgField string,
 		ts = time.Now().UnixNano()
 	}
 	p.RenameField(msgField, "_msg")
-	processLogMessage(ts, p.Fields)
+	err = processLogMessage(ts, p.Fields)
 	logjson.PutParser(p)
+	if err != nil {
+		return false, err
+	}
+
 	return true, nil
 }
diff --git a/app/vlinsert/elasticsearch/elasticsearch_test.go b/app/vlinsert/elasticsearch/elasticsearch_test.go
index 09d1bf770cf3..3935e0dee6c8 100644
--- a/app/vlinsert/elasticsearch/elasticsearch_test.go
+++ b/app/vlinsert/elasticsearch/elasticsearch_test.go
@@ -15,8 +15,9 @@ func TestReadBulkRequestFailure(t *testing.T) {
 	f := func(data string) {
 		t.Helper()

-		processLogMessage := func(timestamp int64, fields []logstorage.Field) {
+		processLogMessage := func(timestamp int64, fields []logstorage.Field) error {
 			t.Fatalf("unexpected call to processLogMessage with timestamp=%d, fields=%s", timestamp, fields)
+			return nil
 		}

 		r := bytes.NewBufferString(data)
@@ -43,7 +44,7 @@ func TestReadBulkRequestSuccess(t *testing.T) {
 		var timestamps []int64
 		var result string
-		processLogMessage := func(timestamp int64, fields []logstorage.Field) {
+		processLogMessage := func(timestamp int64, fields []logstorage.Field) error {
 			timestamps = append(timestamps, timestamp)

 			a := make([]string, len(fields))
@@ -52,6 +53,7 @@ func TestReadBulkRequestSuccess(t *testing.T) {
 			}
 			s := "{" + strings.Join(a, ",") + "}\n"
 			result += s
+			return nil
 		}

 	// Read the request without compression
diff --git a/app/vlinsert/elasticsearch/elasticsearch_timing_test.go b/app/vlinsert/elasticsearch/elasticsearch_timing_test.go
index 9a50fe0ebef1..5d8cca1b29bd 100644
--- a/app/vlinsert/elasticsearch/elasticsearch_timing_test.go
+++ b/app/vlinsert/elasticsearch/elasticsearch_timing_test.go
@@ -33,7 +33,7 @@ func benchmarkReadBulkRequest(b *testing.B, isGzip bool) {
 	timeField := "@timestamp"
 	msgField := "message"
-	processLogMessage := func(timestmap int64, fields []logstorage.Field) {}
+	processLogMessage := func(timestamp int64, fields []logstorage.Field) error { return nil }

 	b.ReportAllocs()
 	b.SetBytes(int64(len(data)))
diff --git a/app/vlinsert/insertutils/common_params.go b/app/vlinsert/insertutils/common_params.go
index 23f100775dd1..1852f223377a 100644
--- a/app/vlinsert/insertutils/common_params.go
+++ b/app/vlinsert/insertutils/common_params.go
@@ -72,13 +72,13 @@ func GetCommonParams(r *http.Request) (*CommonParams, error) {
 }

 // GetProcessLogMessageFunc returns a function, which adds parsed log messages to lr.
-func (cp *CommonParams) GetProcessLogMessageFunc(lr *logstorage.LogRows) func(timestamp int64, fields []logstorage.Field) {
-	return func(timestamp int64, fields []logstorage.Field) {
+func (cp *CommonParams) GetProcessLogMessageFunc(lr *logstorage.LogRows) func(timestamp int64, fields []logstorage.Field) error {
+	return func(timestamp int64, fields []logstorage.Field) error {
 		if len(fields) > *MaxFieldsPerLine {
 			rf := logstorage.RowFormatter(fields)
 			logger.Warnf("dropping log line with %d fields; it exceeds -insert.maxFieldsPerLine=%d; %s", len(fields), *MaxFieldsPerLine, rf)
 			rowsDroppedTotalTooManyFields.Inc()
-			return
+			return nil
 		}

 		lr.MustAdd(cp.TenantID, timestamp, fields)
@@ -87,12 +87,14 @@ func (cp *CommonParams) GetProcessLogMessageFunc(lr *logstorage.LogRows) func(ti
 			lr.ResetKeepSettings()
 			logger.Infof("remoteAddr=%s; requestURI=%s; ignoring log entry because of `debug` query arg: %s", cp.DebugRemoteAddr, cp.DebugRequestURI, s)
 			rowsDroppedTotalDebug.Inc()
-			return
+			return nil
 		}
 		if lr.NeedFlush() {
-			vlstorage.MustAddRows(lr)
+			err := vlstorage.AddRows(lr)
 			lr.ResetKeepSettings()
+			return err
 		}
+		return nil
 	}
 }
diff --git a/app/vlinsert/jsonline/jsonline.go b/app/vlinsert/jsonline/jsonline.go
index bf8d4760ecf0..863cf20479bf 100644
--- a/app/vlinsert/jsonline/jsonline.go
+++ b/app/vlinsert/jsonline/jsonline.go
@@ -75,8 +75,12 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
 		rowsIngestedTotal.Inc()
 	}

-	vlstorage.MustAddRows(lr)
+	err = vlstorage.AddRows(lr)
 	logstorage.PutLogRows(lr)
+	if err != nil {
+		httpserver.Errorf(w, r, "cannot insert rows: %s", err)
+		return true
+	}

 	// update jsonlineRequestDuration only for successfully parsed requests.
 	// There is no need in updating jsonlineRequestDuration for request errors,
@@ -86,7 +90,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
 	return true
 }

-func readLine(sc *bufio.Scanner, timeField, msgField string, processLogMessage func(timestamp int64, fields []logstorage.Field)) (bool, error) {
+func readLine(sc *bufio.Scanner, timeField, msgField string, processLogMessage func(timestamp int64, fields []logstorage.Field) error) (bool, error) {
 	var line []byte
 	for len(line) == 0 {
 		if !sc.Scan() {
@@ -113,8 +117,12 @@ func readLine(sc *bufio.Scanner, timeField, msgField string, processLogMessage f
 		ts = time.Now().UnixNano()
 	}
 	p.RenameField(msgField, "_msg")
-	processLogMessage(ts, p.Fields)
+	err = processLogMessage(ts, p.Fields)
 	logjson.PutParser(p)
+	if err != nil {
+		return false, err
+	}
+
 	return true, nil
 }
diff --git a/app/vlinsert/jsonline/jsonline_test.go b/app/vlinsert/jsonline/jsonline_test.go
index 86a917491eb3..f6da725c32a4 100644
--- a/app/vlinsert/jsonline/jsonline_test.go
+++ b/app/vlinsert/jsonline/jsonline_test.go
@@ -16,7 +16,7 @@ func TestReadBulkRequestSuccess(t *testing.T) {
 	var timestamps []int64
 	var result string
-	processLogMessage := func(timestamp int64, fields []logstorage.Field) {
+	processLogMessage := func(timestamp int64, fields []logstorage.Field) error {
 		timestamps = append(timestamps, timestamp)

 		a := make([]string, len(fields))
@@ -25,6 +25,8 @@ func TestReadBulkRequestSuccess(t *testing.T) {
 		}
 		s := "{" + strings.Join(a, ",") + "}\n"
 		result += s
+
+		return nil
 	}

 	// Read the request without compression
diff --git a/app/vlinsert/loki/loki_json.go b/app/vlinsert/loki/loki_json.go
index 88a75df1d086..653416faf795 100644
--- a/app/vlinsert/loki/loki_json.go
+++ b/app/vlinsert/loki/loki_json.go
@@ -50,12 +50,18 @@ func handleJSON(r *http.Request, w http.ResponseWriter) bool {
 	lr := logstorage.GetLogRows(cp.StreamFields, cp.IgnoreFields)
 	processLogMessage := cp.GetProcessLogMessageFunc(lr)
 	n, err := parseJSONRequest(data, processLogMessage)
-	vlstorage.MustAddRows(lr)
-	logstorage.PutLogRows(lr)
 	if err != nil {
+		logstorage.PutLogRows(lr)
 		httpserver.Errorf(w, r, "cannot parse Loki request: %s", err)
 		return true
 	}
+
+	err = vlstorage.AddRows(lr)
+	logstorage.PutLogRows(lr)
+	if err != nil {
+		httpserver.Errorf(w, r, "cannot insert rows: %s", err)
+		return true
+	}
 	rowsIngestedJSONTotal.Add(n)

 	// update lokiRequestJSONDuration only for successfully parsed requests
@@ -72,7 +78,7 @@ var (
 	lokiRequestJSONDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="json"}`)
 )

-func parseJSONRequest(data []byte, processLogMessage func(timestamp int64, fields []logstorage.Field)) (int, error) {
+func parseJSONRequest(data []byte, processLogMessage func(timestamp int64, fields []logstorage.Field) error) (int, error) {
 	p := parserPool.Get()
 	defer parserPool.Put(p)
 	v, err := p.ParseBytes(data)
@@ -165,7 +171,10 @@ func parseJSONRequest(data []byte, processLogMessage func(timestamp int64, field
 			Name:  "_msg",
 			Value: bytesutil.ToUnsafeString(msg),
 		})
-		processLogMessage(ts, fields)
+		err = processLogMessage(ts, fields)
+		if err != nil {
+			return rowsIngested, err
+		}
 	}
 	rowsIngested += len(lines)
diff --git a/app/vlinsert/loki/loki_json_test.go b/app/vlinsert/loki/loki_json_test.go
index 93cf8652ad45..f285dd1f7cf1 100644
--- a/app/vlinsert/loki/loki_json_test.go
+++ b/app/vlinsert/loki/loki_json_test.go
@@ -11,8 +11,9 @@ import (
 func TestParseJSONRequestFailure(t *testing.T) {
 	f := func(s string) {
 		t.Helper()
-		n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) {
+		n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) error {
 			t.Fatalf("unexpected call to parseJSONRequest callback!")
+			return nil
 		})
 		if err == nil {
 			t.Fatalf("expecting non-nil error")
@@ -60,13 +61,14 @@ func TestParseJSONRequestSuccess(t *testing.T) {
 	f := func(s string, resultExpected string) {
 		t.Helper()
 		var lines []string
-		n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) {
+		n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) error {
 			var a []string
 			for _, f := range fields {
 				a = append(a, f.String())
 			}
 			line := fmt.Sprintf("_time:%d %s", timestamp, strings.Join(a, " "))
 			lines = append(lines, line)
+			return nil
 		})
 		if err != nil {
 			t.Fatalf("unexpected error: %s", err)
diff --git a/app/vlinsert/loki/loki_json_timing_test.go b/app/vlinsert/loki/loki_json_timing_test.go
index 9c51f593a1a2..37d922fc022f 100644
--- a/app/vlinsert/loki/loki_json_timing_test.go
+++ b/app/vlinsert/loki/loki_json_timing_test.go
@@ -27,7 +27,7 @@ func benchmarkParseJSONRequest(b *testing.B, streams, rows, labels int) {
 	b.RunParallel(func(pb *testing.PB) {
 		data := getJSONBody(streams, rows, labels)
 		for pb.Next() {
-			_, err := parseJSONRequest(data, func(timestamp int64, fields []logstorage.Field) {})
+			_, err := parseJSONRequest(data, func(timestamp int64, fields []logstorage.Field) error { return nil })
 			if err != nil {
 				panic(fmt.Errorf("unexpected error: %s", err))
 			}
diff --git a/app/vlinsert/loki/loki_protobuf.go b/app/vlinsert/loki/loki_protobuf.go
index aa4e6b592f25..0e7aceac7a88 100644
--- a/app/vlinsert/loki/loki_protobuf.go
+++ b/app/vlinsert/loki/loki_protobuf.go
@@ -42,10 +42,16 @@ func handleProtobuf(r *http.Request, w http.ResponseWriter) bool {
 	lr := logstorage.GetLogRows(cp.StreamFields, cp.IgnoreFields)
 	processLogMessage := cp.GetProcessLogMessageFunc(lr)
 	n, err := parseProtobufRequest(data, processLogMessage)
-	vlstorage.MustAddRows(lr)
+	if err != nil {
+		logstorage.PutLogRows(lr)
+		httpserver.Errorf(w, r, "cannot parse Loki request: %s", err)
+		return true
+	}
+
+	err = vlstorage.AddRows(lr)
 	logstorage.PutLogRows(lr)
 	if err != nil {
-		httpserver.Errorf(w, r, "cannot parse loki request: %s", err)
+		httpserver.Errorf(w, r, "cannot insert rows: %s", err)
 		return true
 	}

@@ -65,7 +71,7 @@ var (
 	lokiRequestProtobufDuration = metrics.NewHistogram(`vl_http_request_duration_seconds{path="/insert/loki/api/v1/push",format="protobuf"}`)
 )

-func parseProtobufRequest(data []byte, processLogMessage func(timestamp int64, fields []logstorage.Field)) (int, error) {
+func parseProtobufRequest(data []byte, processLogMessage func(timestamp int64, fields []logstorage.Field) error) (int, error) {
 	bb := bytesBufPool.Get()
 	defer bytesBufPool.Put(bb)

@@ -108,7 +114,10 @@ func parseProtobufRequest(data []byte, processLogMessage func(timestamp int64, f
 			if ts == 0 {
 				ts = currentTimestamp
 			}
-			processLogMessage(ts, fields)
+			err = processLogMessage(ts, fields)
+			if err != nil {
+				return rowsIngested, err
+			}
 		}
 		rowsIngested += len(stream.Entries)
 	}
diff --git a/app/vlinsert/loki/loki_protobuf_test.go b/app/vlinsert/loki/loki_protobuf_test.go
index f6eb5f0ec210..cc259bce5862 100644
--- a/app/vlinsert/loki/loki_protobuf_test.go
+++ b/app/vlinsert/loki/loki_protobuf_test.go
@@ -14,7 +14,7 @@ func TestParseProtobufRequestSuccess(t *testing.T) {
 	f := func(s string, resultExpected string) {
 		t.Helper()
 		var pr PushRequest
-		n, err :=
parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) { + n, err := parseJSONRequest([]byte(s), func(timestamp int64, fields []logstorage.Field) error { msg := "" for _, f := range fields { if f.Name == "_msg" { @@ -39,6 +39,7 @@ func TestParseProtobufRequestSuccess(t *testing.T) { }, }, }) + return nil }) if err != nil { t.Fatalf("unexpected error: %s", err) @@ -54,13 +55,14 @@ func TestParseProtobufRequestSuccess(t *testing.T) { encodedData := snappy.Encode(nil, data) var lines []string - n, err = parseProtobufRequest(encodedData, func(timestamp int64, fields []logstorage.Field) { + n, err = parseProtobufRequest(encodedData, func(timestamp int64, fields []logstorage.Field) error { var a []string for _, f := range fields { a = append(a, f.String()) } line := fmt.Sprintf("_time:%d %s", timestamp, strings.Join(a, " ")) lines = append(lines, line) + return nil }) if err != nil { t.Fatalf("unexpected error: %s", err) diff --git a/app/vlinsert/loki/loki_protobuf_timing_test.go b/app/vlinsert/loki/loki_protobuf_timing_test.go index 18f5b89ef68b..230ab7a47274 100644 --- a/app/vlinsert/loki/loki_protobuf_timing_test.go +++ b/app/vlinsert/loki/loki_protobuf_timing_test.go @@ -6,8 +6,9 @@ import ( "testing" "time" - "github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage" "github.com/golang/snappy" + + "github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage" ) func BenchmarkParseProtobufRequest(b *testing.B) { @@ -28,7 +29,7 @@ func benchmarkParseProtobufRequest(b *testing.B, streams, rows, labels int) { b.RunParallel(func(pb *testing.PB) { body := getProtobufBody(streams, rows, labels) for pb.Next() { - _, err := parseProtobufRequest(body, func(timestamp int64, fields []logstorage.Field) {}) + _, err := parseProtobufRequest(body, func(timestamp int64, fields []logstorage.Field) error { return nil }) if err != nil { panic(fmt.Errorf("unexpected error: %s", err)) } diff --git a/app/vlstorage/main.go b/app/vlstorage/main.go index 0a6c9b55a6b2..4533e7b25bfe 100644 --- a/app/vlstorage/main.go +++ b/app/vlstorage/main.go @@ -6,11 +6,12 @@ import ( "sync" "time" + "github.com/VictoriaMetrics/metrics" + "github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/fs" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage" - "github.com/VictoriaMetrics/metrics" ) var ( @@ -29,6 +30,7 @@ var ( "see https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields ; see also -logIngestedRows") logIngestedRows = flag.Bool("logIngestedRows", false, "Whether to log all the ingested log entries; this can be useful for debugging of data ingestion; "+ "see https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/ ; see also -logNewStreams") + minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 10e6, "The minimum free disk space at -storageDataPath after which the storage stops accepting new data") ) // Init initializes vlstorage. 
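Note: with this change the storage stops accepting new data (and reports it via the new `vl_storage_is_read_only` gauge below) once free disk space at `-storageDataPath` falls below `-storage.minFreeDiskSpaceBytes`. A minimal sketch of raising the threshold at startup - the binary path and the value are illustrative; byte-size flags accept suffixes such as `MB` or `GiB`:

```console
/path/to/victoria-logs -storageDataPath=victoria-logs-data -storage.minFreeDiskSpaceBytes=10GiB
```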
@@ -43,11 +45,12 @@ func Init() {
 		logger.Fatalf("-retentionPeriod cannot be smaller than a day; got %s", retentionPeriod)
 	}
 	cfg := &logstorage.StorageConfig{
-		Retention:       retentionPeriod.Duration(),
-		FlushInterval:   *inmemoryDataFlushInterval,
-		FutureRetention: futureRetention.Duration(),
-		LogNewStreams:   *logNewStreams,
-		LogIngestedRows: *logIngestedRows,
+		Retention:             retentionPeriod.Duration(),
+		FlushInterval:         *inmemoryDataFlushInterval,
+		FutureRetention:       futureRetention.Duration(),
+		LogNewStreams:         *logNewStreams,
+		LogIngestedRows:       *logIngestedRows,
+		MinFreeDiskSpaceBytes: minFreeDiskSpaceBytes.N,
 	}
 	logger.Infof("opening storage at -storageDataPath=%s", *storageDataPath)
 	startTime := time.Now()
@@ -74,9 +77,9 @@ func Stop() {
 var strg *logstorage.Storage
 var storageMetrics *metrics.Set

-// MustAddRows adds lr to vlstorage
-func MustAddRows(lr *logstorage.LogRows) {
-	strg.MustAddRows(lr)
+// AddRows adds lr to vlstorage
+func AddRows(lr *logstorage.LogRows) error {
+	return strg.AddRows(lr)
 }

 // RunQuery runs the given q and calls processBlock for the returned data blocks
@@ -107,6 +110,13 @@ func initStorageMetrics(strg *logstorage.Storage) *metrics.Set {
 	ms.NewGauge(fmt.Sprintf(`vl_free_disk_space_bytes{path=%q}`, *storageDataPath), func() float64 {
 		return float64(fs.MustGetFreeSpace(*storageDataPath))
 	})
+	ms.NewGauge(fmt.Sprintf(`vl_storage_is_read_only{path=%q}`, *storageDataPath), func() float64 {
+		if m().IsReadOnly {
+			return 1
+		}
+
+		return 0
+	})

 	ms.NewGauge(`vl_active_merges{type="inmemory"}`, func() float64 {
 		return float64(m().InmemoryActiveMerges)
diff --git a/app/vmagent/README.md b/app/vmagent/README.md
index aafd707d01a7..7f9fd0b8ac3b 100644
--- a/app/vmagent/README.md
+++ b/app/vmagent/README.md
@@ -3,7 +3,8 @@
 `vmagent` is a tiny agent which helps you collect metrics from various sources,
 [relabel and filter the collected metrics](#relabeling)
 and store them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
-or any other storage systems via Prometheus `remote_write` protocol.
+or any other storage systems via Prometheus `remote_write` protocol
+or via [VictoriaMetrics `remote_write` protocol](#victoriametrics-remote-write-protocol).

 See [Quick Start](#quick-start) for details.
diff --git a/app/vmagent/main.go b/app/vmagent/main.go
index 51fc5cf6b4f7..27db29f9965b 100644
--- a/app/vmagent/main.go
+++ b/app/vmagent/main.go
@@ -208,7 +208,7 @@ func getAuthTokenFromPath(path string) (*auth.Token, error) {
 	if p.Suffix != "opentsdb/api/put" {
 		return nil, fmt.Errorf("unsupported path requested: %q; expecting 'opentsdb/api/put'", p.Suffix)
 	}
-	return auth.NewToken(p.AuthToken)
+	return auth.NewTokenPossibleMultitenant(p.AuthToken)
 }

 func requestHandler(w http.ResponseWriter, r *http.Request) bool {
diff --git a/app/vmalert/README.md b/app/vmalert/README.md
index f68a8d3c7416..1a6693aaaef7 100644
--- a/app/vmalert/README.md
+++ b/app/vmalert/README.md
@@ -526,7 +526,7 @@ Alertmanagers.
 To avoid recording rules results and alerts state duplication in VictoriaMetrics server
 don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
-The recommended value for `-dedup.minScrapeInterval` must be multiple of vmalert's `evaluation_interval`.
+The recommended value for `-dedup.minScrapeInterval` must be a multiple of vmalert's `-evaluationInterval`.

 If you observe inconsistent or "jumping" values in series produced by vmalert, try disabling `-datasource.queryTimeAlignment`
 command line flag.
 Because of alignment, two or more vmalert HA pairs will produce results with the same timestamps.
 But due of backfilling (data delivered to the datasource with some delay) values of such results may differ,
@@ -778,7 +778,7 @@ may get empty response from the datasource and produce empty recording rules or

 Try the following recommendations to reduce the chance of hitting the data delay issue:

-* Always configure group's `evaluationInterval` to be bigger or at least equal to
+* Always configure group's `-evaluationInterval` to be bigger than or at least equal to
 [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution);
 * Ensure that `[duration]` value is at least twice bigger than
 [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
diff --git a/app/vmauth/README.md b/app/vmauth/README.md
index 610462412358..71a627e0192c 100644
--- a/app/vmauth/README.md
+++ b/app/vmauth/README.md
@@ -25,6 +25,7 @@ The auth config can be reloaded via the following ways:
   and apply new changes every 5 seconds.

 Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
+See how `vmauth` is used in the [docker-compose env](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/README.md#victoriametrics-cluster).

 Pass `-help` to `vmauth` in order to see all the supported command-line flags with their descriptions.
diff --git a/app/vmauth/auth_config.go b/app/vmauth/auth_config.go
index 43b40bc94ecb..fbb9bb414a52 100644
--- a/app/vmauth/auth_config.go
+++ b/app/vmauth/auth_config.go
@@ -1,6 +1,7 @@
 package main

 import (
+	"bytes"
 	"encoding/base64"
 	"flag"
 	"fmt"
@@ -290,6 +291,13 @@ func (sp *SrcPath) MarshalYAML() (interface{}, error) {
 	return sp.sOriginal, nil
 }

+var (
+	configReloads      = metrics.NewCounter(`vmauth_config_last_reload_total`)
+	configReloadErrors = metrics.NewCounter(`vmauth_config_last_reload_errors_total`)
+	configSuccess      = metrics.NewCounter(`vmauth_config_last_reload_successful`)
+	configTimestamp    = metrics.NewCounter(`vmauth_config_last_reload_success_timestamp_seconds`)
+)
+
 func initAuthConfig() {
 	if len(*authConfigPath) == 0 {
 		logger.Fatalf("missing required `-auth.config` command-line flag")
@@ -300,11 +308,14 @@ func initAuthConfig() {
 	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
 	sighupCh := procutil.NewSighupChan()

-	err := loadAuthConfig()
+	_, err := loadAuthConfig()
 	if err != nil {
 		logger.Fatalf("cannot load auth config: %s", err)
 	}

+	configSuccess.Set(1)
+	configTimestamp.Set(fasttime.UnixTimestamp())
+
 	stopCh = make(chan struct{})
 	authConfigWG.Add(1)
 	go func() {
@@ -327,52 +338,75 @@ func authConfigReloader(sighupCh <-chan os.Signal) {
 		refreshCh = ticker.C
 	}

+	updateFn := func() {
+		configReloads.Inc()
+		updated, err := loadAuthConfig()
+		if err != nil {
+			logger.Errorf("failed to load auth config; using the last successfully loaded config; error: %s", err)
+			configSuccess.Set(0)
+			configReloadErrors.Inc()
+			return
+		}
+		configSuccess.Set(1)
+		if updated {
+			configTimestamp.Set(fasttime.UnixTimestamp())
+		}
+	}
+
 	for {
 		select {
 		case <-stopCh:
 			return
 		case <-refreshCh:
-			procutil.SelfSIGHUP()
+			updateFn()
 		case <-sighupCh:
 			logger.Infof("SIGHUP received; loading -auth.config=%q", *authConfigPath)
-			err := loadAuthConfig()
-			if err != nil {
-				logger.Errorf("failed to load auth config; using the last successfully loaded config; error: %s", err)
-				continue
-			}
+			updateFn()
 		}
 	}
 }

+// authConfigData stores the YAML definition for this config.
+// authConfigData needs to be updated each time authConfig is updated.
+var authConfigData atomic.Pointer[[]byte]
+
 var authConfig atomic.Pointer[AuthConfig]
 var authUsers atomic.Pointer[map[string]*UserInfo]
 var authConfigWG sync.WaitGroup
 var stopCh chan struct{}

-func loadAuthConfig() error {
-	ac, err := readAuthConfig(*authConfigPath)
+// loadAuthConfig loads and applies the config from *authConfigPath.
+// It returns true if the new config was successfully applied.
+// The config is not applied if it cannot be parsed
+// or if it contains no changes compared to the current authConfig.
+func loadAuthConfig() (bool, error) {
+	data, err := fs.ReadFileOrHTTP(*authConfigPath)
+	if err != nil {
+		return false, fmt.Errorf("failed to read -auth.config=%q: %w", *authConfigPath, err)
+	}
+
+	oldData := authConfigData.Load()
+	if oldData != nil && bytes.Equal(data, *oldData) {
+		// there are no updates in the config - skip reloading.
+		return false, nil
+	}
+
+	ac, err := parseAuthConfig(data)
 	if err != nil {
-		return fmt.Errorf("failed to load -auth.config=%q: %s", *authConfigPath, err)
+		return false, fmt.Errorf("failed to parse -auth.config=%q: %w", *authConfigPath, err)
 	}
 	m, err := parseAuthConfigUsers(ac)
 	if err != nil {
-		return fmt.Errorf("failed to parse users from -auth.config=%q: %s", *authConfigPath, err)
+		return false, fmt.Errorf("failed to parse users from -auth.config=%q: %w", *authConfigPath, err)
 	}
 	logger.Infof("loaded information about %d users from -auth.config=%q", len(m), *authConfigPath)

 	authConfig.Store(ac)
+	authConfigData.Store(&data)
 	authUsers.Store(&m)

-	return nil
-}
-
-func readAuthConfig(path string) (*AuthConfig, error) {
-	data, err := fs.ReadFileOrHTTP(path)
-	if err != nil {
-		return nil, err
-	}
-	return parseAuthConfig(data)
+	return true, nil
 }

 func parseAuthConfig(data []byte) (*AuthConfig, error) {
diff --git a/app/vmbackup/README.md b/app/vmbackup/README.md
index 9daad41f1346..340c99d6b910 100644
--- a/app/vmbackup/README.md
+++ b/app/vmbackup/README.md
@@ -143,20 +143,23 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-

 ## Advanced usage

-* Obtaining credentials from a file.
-  Add flag `-credsFilePath=/etc/credentials` with the following content:
+### Providing credentials as a file

-    for s3 (aws, minio or other s3 compatible storages):
+Credentials can be obtained from a file.
+Add the flag `-credsFilePath=/etc/credentials` with the following content:
+
+- for S3 (AWS, MinIO or other S3-compatible storages):
+
     ```console
     [default]
     aws_access_key_id=theaccesskey
     aws_secret_access_key=thesecretaccesskeyvalue
     ```

-    for gce cloud storage:
-
+- for GCP cloud storage:
+
     ```json
     {
       "type": "service_account",
       "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
     }
     ```
-* Obtaining credentials from env variables.
-  - For AWS S3 compatible storages set env variable `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
-    Also you can set env variable `AWS_SHARED_CREDENTIALS_FILE` with path to credentials file.
-  - For GCE cloud storage set env variable `GOOGLE_APPLICATION_CREDENTIALS` with path to credentials file.
-  - For Azure storage either set env variables `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY`, or `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`.
-* Usage with s3 custom url endpoint. It is possible to use `vmbackup` with s3 compatible storages like minio, cloudian, etc.
-  You have to add a custom url endpoint via flag:

+### Providing credentials via env variables

-```console
-  # for minio
-  -customS3Endpoint=http://localhost:9000
+Credentials can be obtained from env variables.
+- For AWS S3 compatible storages set env variable `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
+  Also you can set env variable `AWS_SHARED_CREDENTIALS_FILE` with path to credentials file.
+- For GCE cloud storage set env variable `GOOGLE_APPLICATION_CREDENTIALS` with path to credentials file.
+- For Azure storage either set env variables `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY`, or `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`.
+
+Please note that `vmbackup` will use credentials provided by the cloud provider's metadata service [when applicable](https://docs.victoriametrics.com/vmbackup.html#using-cloud-providers-metadata-service).
+
+### Using cloud providers metadata service
+
+`vmbackup` and `vmbackupmanager` will automatically use the cloud provider's metadata service in order to obtain credentials if they are running in a cloud environment
+and credentials are not explicitly provided via flags or env variables.
+
+### Providing credentials in Kubernetes
+
+The simplest way to provide credentials in Kubernetes is to use [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
+and inject them into the pod as environment variables. For example, the following Secret can be used for AWS S3 credentials:
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: vmbackup-credentials
+stringData:
+  access_key: key
+  secret_key: secret
+```
+And then it can be injected into the pod as environment variables:
+```yaml
+...
+env:
+- name: AWS_ACCESS_KEY_ID
+  valueFrom:
+    secretKeyRef:
+      key: access_key
+      name: vmbackup-credentials
+- name: AWS_SECRET_ACCESS_KEY
+  valueFrom:
+    secretKeyRef:
+      key: secret_key
+      name: vmbackup-credentials
+...
+```

-  # for aws gov region
-  -customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
+A more secure way is to use IAM roles to provide tokens for pods instead of managing credentials manually.
+
+For AWS deployments it will be required to configure [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).
+In order to use IAM roles for service accounts with `vmbackup` or `vmbackupmanager` it is required to create a ServiceAccount with an IAM role mapping:
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: monitoring-backups
+  annotations:
+    eks.amazonaws.com/role-arn: arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}
+```
+And [configure pod to use service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
+After this `vmbackup` and `vmbackupmanager` will automatically use the IAM role of the service account in order to obtain credentials.
+
+For GCP deployments it will be required to configure [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity).
+In order to use Workload Identity with `vmbackup` or `vmbackupmanager` it is required to create a ServiceAccount with a Workload Identity annotation:
+```yaml
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: monitoring-backups
+  annotations:
+    iam.gke.io/gcp-service-account: {sa_name}@{project_name}.iam.gserviceaccount.com
+```
+And [configure pod to use service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
+After this `vmbackup` and `vmbackupmanager` will automatically use Workload Identity for the service account in order to obtain credentials.
+
+### Using custom S3 endpoint
+
+It is possible to use `vmbackup` with S3-compatible storages like MinIO, Cloudian, etc.
+You have to add a custom URL endpoint via the flag:
+
+- for MinIO
+  ```console
+    -customS3Endpoint=http://localhost:9000
+  ```
+
+- for an AWS GovCloud region
+  ```console
+    -customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
+  ```
+
+### Command-line flags

-* Run `vmbackup -help` in order to see all the available options:
+Run `vmbackup -help` in order to see all the available options:

 ```console
   -concurrency int
diff --git a/app/vmbackupmanager/README.md b/app/vmbackupmanager/README.md
index 4d0c1937adbd..f56cac23831f 100644
--- a/app/vmbackupmanager/README.md
+++ b/app/vmbackupmanager/README.md
@@ -110,6 +110,9 @@ The result on the GCS bucket

 latest folder

+Please see [vmbackup docs](https://docs.victoriametrics.com/vmbackup.html#advanced-usage) for more examples of authentication with different
+storage types.
+
 ## Backup Retention Policy

 Backup retention policy is controlled by:
diff --git a/app/vmselect/promql/exec.go b/app/vmselect/promql/exec.go
index acbaa59d5bb2..ab14cb14c1d1 100644
--- a/app/vmselect/promql/exec.go
+++ b/app/vmselect/promql/exec.go
@@ -111,6 +111,12 @@ func maySortResults(e metricsql.Expr) bool {
 			"bottomk_max", "bottomk_min", "bottomk_avg", "bottomk_median", "bottomk_last":
 			return false
 		}
+	case *metricsql.BinaryOpExpr:
+		if strings.ToLower(v.Op) == "or" {
+			// Do not sort results for `a or b` in the same way as Prometheus does.
+			// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763
+			return false
+		}
 	}
 	return true
 }
diff --git a/app/vmselect/promql/exec_test.go b/app/vmselect/promql/exec_test.go
index 3290612a7081..d92348437991 100644
--- a/app/vmselect/promql/exec_test.go
+++ b/app/vmselect/promql/exec_test.go
@@ -5,10 +5,11 @@ import (
 	"testing"
 	"time"

+	"github.com/VictoriaMetrics/metricsql"
+
 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
-	"github.com/VictoriaMetrics/metricsql"
 )

 func TestEscapeDots(t *testing.T) {
@@ -245,6 +246,23 @@ func TestExecSuccess(t *testing.T) {
 		resultExpected := []netstorage.Result{r}
 		f(q, resultExpected)
 	})
+	t.Run("bitmap_and(NaN, 1)", func(t *testing.T) {
+		t.Parallel()
+		q := `bitmap_and(NaN, 1)`
+		var resultExpected []netstorage.Result
+		f(q, resultExpected)
+	})
+	t.Run("bitmap_and(round(rand(1) > 0.5, 1), 1)", func(t *testing.T) {
+		t.Parallel()
+		q := `bitmap_and(round(rand(1) > 0.5, 1), 1)`
+		r := netstorage.Result{
+			MetricName: metricNameExpected,
+			Values:     []float64{1, 1, 1, nan, nan, 1},
+			Timestamps: timestampsExpected,
+		}
+		resultExpected := []netstorage.Result{r}
+		f(q, resultExpected)
+	})
 	t.Run("bitmap_or(0xA2, 0x11)", func(t *testing.T) {
 		t.Parallel()
 		q := `bitmap_or(0xA2, 0x11)`
@@ -267,6 +285,23 @@ func TestExecSuccess(t *testing.T) {
 		resultExpected := []netstorage.Result{r}
 		f(q, resultExpected)
 	})
+	t.Run("bitmap_or(NaN, 1)", func(t *testing.T) {
+		t.Parallel()
+		q := `bitmap_or(NaN, 1)`
+		var resultExpected []netstorage.Result
+		f(q, resultExpected)
+	})
+	t.Run("bitmap_or(round(rand(1) > 0.5, 1), 1)", func(t *testing.T) {
+		t.Parallel()
+		q := `bitmap_or(round(rand(1) > 0.5, 1), 1)`
+		r := netstorage.Result{
+			MetricName: metricNameExpected,
+
Values: []float64{1, 1, 1, nan, nan, 1}, + Timestamps: timestampsExpected, + } + resultExpected := []netstorage.Result{r} + f(q, resultExpected) + }) t.Run("bitmap_xor(0xB3, 0x11)", func(t *testing.T) { t.Parallel() q := `bitmap_xor(0xB3, 0x11)` @@ -289,6 +324,23 @@ func TestExecSuccess(t *testing.T) { resultExpected := []netstorage.Result{r} f(q, resultExpected) }) + t.Run("bitmap_xor(NaN, 1)", func(t *testing.T) { + t.Parallel() + q := `bitmap_xor(NaN, 1)` + var resultExpected []netstorage.Result + f(q, resultExpected) + }) + t.Run("bitmap_xor(round(rand(1) > 0.5, 1), 1)", func(t *testing.T) { + t.Parallel() + q := `bitmap_xor(round(rand(1) > 0.5, 1), 1)` + r := netstorage.Result{ + MetricName: metricNameExpected, + Values: []float64{0, 0, 0, nan, nan, 0}, + Timestamps: timestampsExpected, + } + resultExpected := []netstorage.Result{r} + f(q, resultExpected) + }) t.Run("timezone_offset(UTC)", func(t *testing.T) { t.Parallel() q := `timezone_offset("UTC")` @@ -7669,7 +7721,7 @@ func TestExecSuccess(t *testing.T) { }) t.Run(`aggr_over_time(multi-func)`, func(t *testing.T) { t.Parallel() - q := `sort(aggr_over_time(("min_over_time", "count_over_time", "max_over_time"), round(rand(0),0.1)[:10s]))` + q := `sort(aggr_over_time(("min_over_time", "median_over_time", "max_over_time"), round(rand(0),0.1)[:10s]))` r1 := netstorage.Result{ MetricName: metricNameExpected, Values: []float64{0, 0, 0, 0, 0, 0}, @@ -7681,21 +7733,21 @@ func TestExecSuccess(t *testing.T) { }} r2 := netstorage.Result{ MetricName: metricNameExpected, - Values: []float64{0.8, 0.9, 1, 0.9, 1, 0.9}, + Values: []float64{0.4, 0.5, 0.5, 0.75, 0.6, 0.45}, Timestamps: timestampsExpected, } r2.MetricName.Tags = []storage.Tag{{ Key: []byte("rollup"), - Value: []byte("max_over_time"), + Value: []byte("median_over_time"), }} r3 := netstorage.Result{ MetricName: metricNameExpected, - Values: []float64{20, 20, 20, 20, 20, 20}, + Values: []float64{0.8, 0.9, 1, 0.9, 1, 0.9}, Timestamps: timestampsExpected, } r3.MetricName.Tags = []storage.Tag{{ Key: []byte("rollup"), - Value: []byte("count_over_time"), + Value: []byte("max_over_time"), }} resultExpected := []netstorage.Result{r1, r2, r3} f(q, resultExpected) @@ -8479,11 +8531,11 @@ func TestExecSuccess(t *testing.T) { }) t.Run(`result sorting`, func(t *testing.T) { t.Parallel() - q := `label_set(1, "instance", "localhost:1001", "type", "free") - or label_set(1, "instance", "localhost:1001", "type", "buffers") - or label_set(1, "instance", "localhost:1000", "type", "buffers") - or label_set(1, "instance", "localhost:1000", "type", "free") -` + q := `(label_set(1, "instance", "localhost:1001", "type", "free"), + label_set(1, "instance", "localhost:1001", "type", "buffers"), + label_set(1, "instance", "localhost:1000", "type", "buffers"), + label_set(1, "instance", "localhost:1000", "type", "free"), + )` r1 := netstorage.Result{ MetricName: metricNameExpected, Values: []float64{1, 1, 1, 1, 1, 1}, @@ -8515,6 +8567,34 @@ func TestExecSuccess(t *testing.T) { resultExpected := []netstorage.Result{r1, r2, r3, r4} f(q, resultExpected) }) + t.Run(`no_sorting_for_or`, func(t *testing.T) { + t.Parallel() + q := `label_set(2, "foo", "bar") or label_set(1, "foo", "baz")` + r1 := netstorage.Result{ + MetricName: metricNameExpected, + Values: []float64{2, 2, 2, 2, 2, 2}, + Timestamps: timestampsExpected, + } + r1.MetricName.Tags = []storage.Tag{ + { + Key: []byte("foo"), + Value: []byte("bar"), + }, + } + r2 := netstorage.Result{ + MetricName: metricNameExpected, + Values: []float64{1, 1, 1, 1, 1, 1}, + 
Timestamps: timestampsExpected, + } + r2.MetricName.Tags = []storage.Tag{ + { + Key: []byte("foo"), + Value: []byte("baz"), + }, + } + resultExpected := []netstorage.Result{r1, r2} + f(q, resultExpected) + }) t.Run(`sort_by_label_numeric(multiple_labels_only_string)`, func(t *testing.T) { t.Parallel() q := `sort_by_label_numeric(( diff --git a/app/vmselect/promql/rollup.go b/app/vmselect/promql/rollup.go index 2cf82734455e..9f84f74a71d1 100644 --- a/app/vmselect/promql/rollup.go +++ b/app/vmselect/promql/rollup.go @@ -7,11 +7,12 @@ import ( "strings" "sync" + "github.com/VictoriaMetrics/metrics" + "github.com/VictoriaMetrics/metricsql" + "github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" "github.com/VictoriaMetrics/VictoriaMetrics/lib/storage" - "github.com/VictoriaMetrics/metrics" - "github.com/VictoriaMetrics/metricsql" ) var minStalenessInterval = flag.Duration("search.minStalenessInterval", 0, "The minimum interval for staleness calculations. "+ @@ -58,6 +59,7 @@ var rollupFuncs = map[string]newRollupFunc{ "lifetime": newRollupFuncOneArg(rollupLifetime), "mad_over_time": newRollupFuncOneArg(rollupMAD), "max_over_time": newRollupFuncOneArg(rollupMax), + "median_over_time": newRollupFuncOneArg(rollupMedian), "min_over_time": newRollupFuncOneArg(rollupMin), "mode_over_time": newRollupFuncOneArg(rollupModeOverTime), "predict_linear": newRollupPredictLinear, @@ -125,6 +127,7 @@ var rollupAggrFuncs = map[string]rollupFunc{ "lifetime": rollupLifetime, "mad_over_time": rollupMAD, "max_over_time": rollupMax, + "median_over_time": rollupMedian, "min_over_time": rollupMin, "mode_over_time": rollupModeOverTime, "present_over_time": rollupPresent, @@ -224,6 +227,7 @@ var rollupFuncsKeepMetricName = map[string]bool{ "holt_winters": true, "last_over_time": true, "max_over_time": true, + "median_over_time": true, "min_over_time": true, "mode_over_time": true, "predict_linear": true, @@ -1396,6 +1400,10 @@ func rollupMax(rfa *rollupFuncArg) float64 { return maxValue } +func rollupMedian(rfa *rollupFuncArg) float64 { + return quantile(0.5, rfa.values) +} + func rollupTmin(rfa *rollupFuncArg) float64 { // There is no need in handling NaNs here, since they must be cleaned up // before calling rollup funcs. 
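Note: the new `median_over_time(m[d])` rollup added above computes `quantile(0.5, ...)` over the raw samples on the given lookbehind window, i.e. it behaves like `quantile_over_time(0.5, m[d])`. A quick way to try it against a single-node setup - the address and the metric name are illustrative:

```console
curl 'http://localhost:8428/api/v1/query?query=median_over_time(process_cpu_seconds_total[5m])'
```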
diff --git a/app/vmselect/promql/transform.go b/app/vmselect/promql/transform.go index 94da2a572767..64b4f59b9d6b 100644 --- a/app/vmselect/promql/transform.go +++ b/app/vmselect/promql/transform.go @@ -11,12 +11,13 @@ import ( "strings" "time" + "github.com/VictoriaMetrics/metricsql" + "github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" "github.com/VictoriaMetrics/VictoriaMetrics/lib/storage" - "github.com/VictoriaMetrics/metricsql" ) var transformFuncs = map[string]transformFunc{ @@ -2589,6 +2590,9 @@ func newTransformBitmap(bitmapFunc func(a, b uint64) uint64) func(tfa *transform } tf := func(values []float64) { for i, v := range values { + if math.IsNaN(v) { + continue + } values[i] = float64(bitmapFunc(uint64(v), uint64(ns[i]))) } } diff --git a/app/vmstorage/main.go b/app/vmstorage/main.go index f3be38ff963a..e81e38825efa 100644 --- a/app/vmstorage/main.go +++ b/app/vmstorage/main.go @@ -585,11 +585,6 @@ func registerStorageMetrics(strg *storage.Storage) { return float64(idbm().ItemsAddedSizeBytes) }) - // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686 - metrics.NewGauge(`vm_merge_need_free_disk_space`, func() float64 { - return float64(tm().MergeNeedFreeDiskSpace) - }) - metrics.NewGauge(`vm_pending_rows{type="storage"}`, func() float64 { return float64(tm().PendingRows) }) diff --git a/app/vmui/packages/vmui/src/components/Main/Hyperlink/Hyperlink.tsx b/app/vmui/packages/vmui/src/components/Main/Hyperlink/Hyperlink.tsx index c1cd58e3fe3a..32dac0b3ab04 100644 --- a/app/vmui/packages/vmui/src/components/Main/Hyperlink/Hyperlink.tsx +++ b/app/vmui/packages/vmui/src/components/Main/Hyperlink/Hyperlink.tsx @@ -8,6 +8,7 @@ interface Hyperlink { children?: ReactNode; colored?: boolean; underlined?: boolean; + withIcon?: boolean; } const Hyperlink: FC = ({ @@ -15,14 +16,16 @@ const Hyperlink: FC = ({ href, children, colored = true, - underlined = false + underlined = false, + withIcon = false, }) => ( = ({ isPrometheus, ...props }) => { +const CardinalityConfigurator: FC = ({ isPrometheus, isCluster, ...props }) => { const { isMobile } = useDeviceDetect(); const [searchParams] = useSearchParams(); const { setSearchParamsFromKeys } = useSearchParamsFromObject(); @@ -105,19 +106,29 @@ const CardinalityConfigurator: FC = ({ isPrometheus, ...
+ {isCluster && +
+ + + Statistic inaccuracy explanation + +
+ }
- Documentation - +
diff --git a/app/vmui/packages/vmui/src/pages/CardinalityPanel/CardinalityTotals/CardinalityTotals.tsx b/app/vmui/packages/vmui/src/pages/CardinalityPanel/CardinalityTotals/CardinalityTotals.tsx index 0c1f53e1aada..63845403af25 100644 --- a/app/vmui/packages/vmui/src/pages/CardinalityPanel/CardinalityTotals/CardinalityTotals.tsx +++ b/app/vmui/packages/vmui/src/pages/CardinalityPanel/CardinalityTotals/CardinalityTotals.tsx @@ -14,6 +14,7 @@ export interface CardinalityTotalsProps { totalLabelValuePairs: number; seriesCountByMetricName: TopHeapEntry[]; isPrometheus?: boolean; + isCluster: boolean; } const CardinalityTotals: FC = ({ @@ -21,7 +22,7 @@ const CardinalityTotals: FC = ({ totalSeriesPrev = 0, totalSeriesAll = 0, seriesCountByMetricName = [], - isPrometheus + isPrometheus, }) => { const { isMobile } = useDeviceDetect(); @@ -50,7 +51,7 @@ const CardinalityTotals: FC = ({ value: isNaN(progress) ? "-" : `${progress.toFixed(2)}%`, display: isMetric, info: "The share of these series in the total number of time series." - } + }, ].filter(t => t.display); if (!totals.length) { diff --git a/app/vmui/packages/vmui/src/pages/CardinalityPanel/hooks/useCardinalityFetch.ts b/app/vmui/packages/vmui/src/pages/CardinalityPanel/hooks/useCardinalityFetch.ts index 585ebf245ab7..c1c88b6e7fb4 100644 --- a/app/vmui/packages/vmui/src/pages/CardinalityPanel/hooks/useCardinalityFetch.ts +++ b/app/vmui/packages/vmui/src/pages/CardinalityPanel/hooks/useCardinalityFetch.ts @@ -7,12 +7,14 @@ import AppConfigurator from "../appConfigurator"; import { useSearchParams } from "react-router-dom"; import dayjs from "dayjs"; import { DATE_FORMAT } from "../../../constants/date"; +import { getTenantIdFromUrl } from "../../../utils/tenants"; export const useFetchQuery = (): { fetchUrl?: string[], isLoading: boolean, error?: ErrorTypes | string appConfigurator: AppConfigurator, + isCluster: boolean, } => { const appConfigurator = new AppConfigurator(); @@ -26,6 +28,7 @@ export const useFetchQuery = (): { const [isLoading, setIsLoading] = useState(false); const [error, setError] = useState(); const [tsdbStatus, setTSDBStatus] = useState(appConfigurator.defaultTSDBStatus); + const [isCluster, setIsCluster] = useState(false); const getResponseJson = async (url: string) => { const response = await fetch(url); @@ -115,6 +118,12 @@ export const useFetchQuery = (): { } }, [error]); + useEffect(() => { + const id = getTenantIdFromUrl(serverUrl); + setIsCluster(!!id); + }, [serverUrl]); + + appConfigurator.tsdbStatusData = tsdbStatus; - return { isLoading, appConfigurator: appConfigurator, error }; + return { isLoading, appConfigurator: appConfigurator, error, isCluster }; }; diff --git a/app/vmui/packages/vmui/src/pages/CardinalityPanel/index.tsx b/app/vmui/packages/vmui/src/pages/CardinalityPanel/index.tsx index 63144104167d..777b2f39897b 100644 --- a/app/vmui/packages/vmui/src/pages/CardinalityPanel/index.tsx +++ b/app/vmui/packages/vmui/src/pages/CardinalityPanel/index.tsx @@ -31,7 +31,7 @@ const CardinalityPanel: FC = () => { const match = searchParams.get("match") || ""; const focusLabel = searchParams.get("focusLabel") || ""; - const { isLoading, appConfigurator, error } = useFetchQuery(); + const { isLoading, appConfigurator, error, isCluster } = useFetchQuery(); const { tsdbStatusData, getDefaultState, tablesHeaders, sectionsTips } = appConfigurator; const defaultState = getDefaultState(match, focusLabel); @@ -62,6 +62,7 @@ const CardinalityPanel: FC = () => { totalSeriesAll={tsdbStatusData.totalSeriesByAll} 
totalLabelValuePairs={tsdbStatusData.totalLabelValuePairs} seriesCountByMetricName={tsdbStatusData.seriesCountByMetricName} + isCluster={isCluster} /> {showTips && ( @@ -69,7 +70,7 @@ const CardinalityPanel: FC = () => { {!match && !focusLabel && } {match && !focusLabel && } {!match && !focusLabel && } - {focusLabel && } + {focusLabel && }
)} diff --git a/dashboards/victoriametrics.json b/dashboards/victoriametrics.json index 69a0d93b5872..2d631352389b 100644 --- a/dashboards/victoriametrics.json +++ b/dashboards/victoriametrics.json @@ -76,7 +76,7 @@ "uid": "$ds" }, "enable": true, - "expr": "sum(vm_app_version{job=~\"$job\"}) by(short_version) unless (sum(vm_app_version{job=~\"$job\"} offset 20m) by(short_version))", + "expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(short_version) unless (sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"} offset 20m) by(short_version))", "hide": true, "iconColor": "dark-blue", "name": "version", diff --git a/dashboards/vm/victoriametrics.json b/dashboards/vm/victoriametrics.json index 52c04c949378..161308adebd3 100644 --- a/dashboards/vm/victoriametrics.json +++ b/dashboards/vm/victoriametrics.json @@ -77,7 +77,7 @@ "uid": "$ds" }, "enable": true, - "expr": "sum(vm_app_version{job=~\"$job\"}) by(short_version) unless (sum(vm_app_version{job=~\"$job\"} offset 20m) by(short_version))", + "expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(short_version) unless (sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"} offset 20m) by(short_version))", "hide": true, "iconColor": "dark-blue", "name": "version", diff --git a/dashboards/vm/vmagent.json b/dashboards/vm/vmagent.json index adfb37735a7d..3928c3248313 100644 --- a/dashboards/vm/vmagent.json +++ b/dashboards/vm/vmagent.json @@ -2373,7 +2373,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2389,7 +2390,7 @@ "h": 8, "w": 12, "x": 0, - "y": 4 + "y": 36 }, "id": 92, "options": { @@ -2475,7 +2476,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2491,7 +2493,7 @@ "h": 8, "w": 12, "x": 12, - "y": 4 + "y": 36 }, "id": 95, "options": { @@ -2580,7 +2582,8 @@ "mode": "absolute", "steps": [ { - "color": "transparent" + "color": "transparent", + "value": null }, { "color": "red", @@ -2596,7 +2599,7 @@ "h": 8, "w": 12, "x": 0, - "y": 12 + "y": 44 }, "id": 98, "options": { @@ -2685,7 +2688,8 @@ "mode": "absolute", "steps": [ { - "color": "transparent" + "color": "transparent", + "value": null }, { "color": "red", @@ -2701,7 +2705,7 @@ "h": 8, "w": 12, "x": 12, - "y": 12 + "y": 44 }, "id": 99, "options": { @@ -2789,7 +2793,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2805,7 +2810,7 @@ "h": 8, "w": 12, "x": 0, - "y": 20 + "y": 52 }, "id": 79, "links": [], @@ -2894,7 +2899,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2910,7 +2916,7 @@ "h": 8, "w": 12, "x": 12, - "y": 20 + "y": 52 }, "id": 18, "links": [ @@ -3004,7 +3010,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3020,7 +3027,7 @@ "h": 8, "w": 12, "x": 0, - "y": 28 + "y": 60 }, "id": 127, "links": [], @@ -3107,7 +3114,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3123,7 +3131,7 @@ "h": 8, "w": 12, "x": 12, - "y": 28 + "y": 60 }, "id": 50, "options": { @@ -3161,6 +3169,123 @@ "title": "Invalid datapoints rate ($instance)", "type": "timeseries" }, + { + "datasource": { + "type": "victoriametrics-datasource", + "uid": "$ds" + }, + "description": "Shows how many concurrent inserts (parsing and processing of scraped or ingested data) 
are taking place.\n\nIf the number of concurrent inserts hits the `limit` or is close to the `limit` constantly - it might be a sign of a resource shortage.\n\nIf vmagent's CPU usage and remote write connection saturation are at normal level, it might be that `-maxConcurrentInserts` cmd-line flag needs to be increased.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "links": [], + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "short" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 68 + }, + "id": 130, + "links": [], + "options": { + "legend": { + "calcs": [ + "mean", + "lastNotNull", + "max" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "9.2.6", + "targets": [ + { + "datasource": { + "type": "victoriametrics-datasource", + "uid": "$ds" + }, + "editorMode": "code", + "exemplar": true, + "expr": "max_over_time(vm_concurrent_insert_current{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])", + "interval": "", + "legendFormat": "{{instance}} ({{job}})", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "victoriametrics-datasource", + "uid": "$ds" + }, + "editorMode": "code", + "exemplar": true, + "expr": "min(vm_concurrent_insert_capacity{job=~\"$job\", instance=~\"$instance\"}) by(job)", + "interval": "", + "legendFormat": "limit ({{job}})", + "range": true, + "refId": "B" + } + ], + "title": "Concurrent inserts ($instance)", + "type": "timeseries" + }, { "datasource": { "type": "victoriametrics-datasource", @@ -3181,7 +3306,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3221,7 +3347,7 @@ "h": 7, "w": 24, "x": 0, - "y": 36 + "y": 76 }, "id": 129, "options": { @@ -3240,7 +3366,7 @@ } ] }, - "pluginVersion": "9.2.6", + "pluginVersion": "9.2.7", "targets": [ { "datasource": { @@ -4063,7 +4189,7 @@ "h": 8, "w": 12, "x": 0, - "y": 38 + "y": 85 }, "id": 73, "links": [], @@ -4180,7 +4306,7 @@ "h": 8, "w": 12, "x": 12, - "y": 38 + "y": 85 }, "id": 131, "links": [], @@ -4219,123 +4345,6 @@ "title": "Rows rate ($instance)", "type": "timeseries" }, - { - "datasource": { - "type": "victoriametrics-datasource", - "uid": "$ds" - }, - "description": "Shows how many concurrent inserts are taking place.\n\nIf the number of concurrent inserts hitting the `limit` or is close to the `limit` constantly - it might be a sign of a resource shortage.\n\n If vmagent's CPU usage and remote write connection saturation are at normal level, it might be that `-maxConcurrentInserts` cmd-line flag need to be increased.", - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisCenteredZero": false, - "axisColorMode": "text", - 
"axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "never", - "spanNulls": false, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "links": [], - "mappings": [], - "min": 0, - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - }, - { - "color": "red", - "value": 80 - } - ] - }, - "unit": "short" - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 12, - "x": 0, - "y": 46 - }, - "id": 130, - "links": [], - "options": { - "legend": { - "calcs": [ - "mean", - "lastNotNull", - "max" - ], - "displayMode": "table", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "pluginVersion": "9.2.6", - "targets": [ - { - "datasource": { - "type": "victoriametrics-datasource", - "uid": "$ds" - }, - "editorMode": "code", - "exemplar": true, - "expr": "max_over_time(vm_concurrent_insert_current{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])", - "interval": "", - "legendFormat": "{{instance}} ({{job}})", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "victoriametrics-datasource", - "uid": "$ds" - }, - "editorMode": "code", - "exemplar": true, - "expr": "min(vm_concurrent_insert_capacity{job=~\"$job\", instance=~\"$instance\"}) by(job)", - "interval": "", - "legendFormat": "limit ({{job}})", - "range": true, - "refId": "B" - } - ], - "title": "Concurrent inserts ($instance)", - "type": "timeseries" - }, { "datasource": { "type": "victoriametrics-datasource", @@ -4400,8 +4409,8 @@ "gridPos": { "h": 8, "w": 12, - "x": 12, - "y": 46 + "x": 0, + "y": 93 }, "id": 77, "links": [], diff --git a/dashboards/vmagent.json b/dashboards/vmagent.json index 4c66b13b9d15..de05026651c3 100644 --- a/dashboards/vmagent.json +++ b/dashboards/vmagent.json @@ -2372,7 +2372,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2388,7 +2389,7 @@ "h": 8, "w": 12, "x": 0, - "y": 4 + "y": 36 }, "id": 92, "options": { @@ -2474,7 +2475,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2490,7 +2492,7 @@ "h": 8, "w": 12, "x": 12, - "y": 4 + "y": 36 }, "id": 95, "options": { @@ -2579,7 +2581,8 @@ "mode": "absolute", "steps": [ { - "color": "transparent" + "color": "transparent", + "value": null }, { "color": "red", @@ -2595,7 +2598,7 @@ "h": 8, "w": 12, "x": 0, - "y": 12 + "y": 44 }, "id": 98, "options": { @@ -2684,7 +2687,8 @@ "mode": "absolute", "steps": [ { - "color": "transparent" + "color": "transparent", + "value": null }, { "color": "red", @@ -2700,7 +2704,7 @@ "h": 8, "w": 12, "x": 12, - "y": 12 + "y": 44 }, "id": 99, "options": { @@ -2788,7 +2792,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2804,7 +2809,7 @@ "h": 8, "w": 12, "x": 0, - "y": 20 + "y": 52 }, "id": 79, "links": [], @@ -2893,7 +2898,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -2909,7 +2915,7 @@ "h": 8, "w": 12, "x": 12, - "y": 20 + "y": 52 }, "id": 18, "links": [ @@ -3003,7 +3009,8 @@ "mode": "absolute", "steps": [ 
{ - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3019,7 +3026,7 @@ "h": 8, "w": 12, "x": 0, - "y": 28 + "y": 60 }, "id": 127, "links": [], @@ -3106,7 +3113,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3122,7 +3130,7 @@ "h": 8, "w": 12, "x": 12, - "y": 28 + "y": 60 }, "id": 50, "options": { @@ -3160,6 +3168,123 @@ "title": "Invalid datapoints rate ($instance)", "type": "timeseries" }, + { + "datasource": { + "type": "prometheus", + "uid": "$ds" + }, + "description": "Shows how many concurrent inserts (parsing and processing of scraped or ingested data) are taking place.\n\nIf the number of concurrent inserts hits the `limit` or is close to the `limit` constantly - it might be a sign of a resource shortage.\n\nIf vmagent's CPU usage and remote write connection saturation are at normal level, it might be that `-maxConcurrentInserts` cmd-line flag needs to be increased.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "links": [], + "mappings": [], + "min": 0, + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + }, + "unit": "short" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 68 + }, + "id": 130, + "links": [], + "options": { + "legend": { + "calcs": [ + "mean", + "lastNotNull", + "max" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "9.2.6", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "$ds" + }, + "editorMode": "code", + "exemplar": true, + "expr": "max_over_time(vm_concurrent_insert_current{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])", + "interval": "", + "legendFormat": "{{instance}} ({{job}})", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "$ds" + }, + "editorMode": "code", + "exemplar": true, + "expr": "min(vm_concurrent_insert_capacity{job=~\"$job\", instance=~\"$instance\"}) by(job)", + "interval": "", + "legendFormat": "limit ({{job}})", + "range": true, + "refId": "B" + } + ], + "title": "Concurrent inserts ($instance)", + "type": "timeseries" + }, { "datasource": { "type": "prometheus", @@ -3180,7 +3305,8 @@ "mode": "absolute", "steps": [ { - "color": "green" + "color": "green", + "value": null }, { "color": "red", @@ -3220,7 +3346,7 @@ "h": 7, "w": 24, "x": 0, - "y": 36 + "y": 76 }, "id": 129, "options": { @@ -3239,7 +3365,7 @@ } ] }, - "pluginVersion": "9.2.6", + "pluginVersion": "9.2.7", "targets": [ { "datasource": { @@ -4062,7 +4188,7 @@ "h": 8, "w": 12, "x": 0, - "y": 38 + "y": 85 }, "id": 73, "links": [], @@ -4179,7 +4305,7 @@ "h": 8, "w": 12, "x": 12, - "y": 38 + "y": 85 }, "id": 131, "links": [], @@ -4218,123 +4344,6 @@ "title": "Rows rate ($instance)", "type": "timeseries" }, 
- { - "datasource": { - "type": "prometheus", - "uid": "$ds" - }, - "description": "Shows how many concurrent inserts are taking place.\n\nIf the number of concurrent inserts hitting the `limit` or is close to the `limit` constantly - it might be a sign of a resource shortage.\n\n If vmagent's CPU usage and remote write connection saturation are at normal level, it might be that `-maxConcurrentInserts` cmd-line flag need to be increased.", - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "never", - "spanNulls": false, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "links": [], - "mappings": [], - "min": 0, - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - }, - { - "color": "red", - "value": 80 - } - ] - }, - "unit": "short" - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 12, - "x": 0, - "y": 46 - }, - "id": 130, - "links": [], - "options": { - "legend": { - "calcs": [ - "mean", - "lastNotNull", - "max" - ], - "displayMode": "table", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "pluginVersion": "9.2.6", - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "$ds" - }, - "editorMode": "code", - "exemplar": true, - "expr": "max_over_time(vm_concurrent_insert_current{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])", - "interval": "", - "legendFormat": "{{instance}} ({{job}})", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "prometheus", - "uid": "$ds" - }, - "editorMode": "code", - "exemplar": true, - "expr": "min(vm_concurrent_insert_capacity{job=~\"$job\", instance=~\"$instance\"}) by(job)", - "interval": "", - "legendFormat": "limit ({{job}})", - "range": true, - "refId": "B" - } - ], - "title": "Concurrent inserts ($instance)", - "type": "timeseries" - }, { "datasource": { "type": "prometheus", @@ -4399,8 +4408,8 @@ "gridPos": { "h": 8, "w": 12, - "x": 12, - "y": 46 + "x": 0, + "y": 93 }, "id": 77, "links": [], diff --git a/deployment/docker/README.md b/deployment/docker/README.md index ed936455750b..fe368e88039e 100644 --- a/deployment/docker/README.md +++ b/deployment/docker/README.md @@ -42,30 +42,36 @@ The communication scheme between components is the following: and recording rules back to it; * [alertmanager](#alertmanager) is configured to receive notifications from `vmalert`. -To access `vmalert` use link [http://localhost:8428/vmalert](http://localhost:8428/vmalert/). +To access Grafana use link [http://localhost:3000](http://localhost:3000). To access [vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) use link [http://localhost:8428/vmui](http://localhost:8428/vmui). +To access `vmalert` use link [http://localhost:8428/vmalert](http://localhost:8428/vmalert/). + + ## VictoriaMetrics cluster VictoriaMetrics cluster environment consists of `vminsert`, `vmstorage` and `vmselect` components. 
-`vmselect` has exposed port `:8481`, `vminsert` has exposed port `:8480` and the rest of components
-are available only inside the environment.
+`vminsert` has exposed port `:8480`, access to `vmselect` components goes through `vmauth` on port `:8427`,
+and the rest of the components are available only inside the environment.
The communication scheme between components is the following:
* [vmagent](#vmagent) sends scraped metrics to `vminsert`;
* `vminsert` forwards data to `vmstorage`;
-* `vmselect` is connected to `vmstorage` for querying data;
-* [grafana](#grafana) is configured with datasource pointing to `vmselect`;
-* [vmalert](#vmalert) is configured to query `vmselect` and send alerts state
+* `vmselect`s are connected to `vmstorage` for querying data;
+* [vmauth](#vmauth) balances incoming read requests among `vmselect`s;
+* [grafana](#grafana) is configured with a datasource pointing to `vmauth`;
+* [vmalert](#vmalert) is configured to query `vmselect`s via `vmauth` and send alerts state
and recording rules to `vminsert`;
* [alertmanager](#alertmanager) is configured to receive notifications from `vmalert`.
-To access `vmalert` use link [http://localhost:8481/select/0/prometheus/vmalert](http://localhost:8481/select/0/prometheus/vmalert/).
+To access Grafana use link [http://localhost:3000](http://localhost:3000).
-To access [vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui)
-use link [http://localhost:8481/select/0/prometheus/vmui](http://localhost:8481/select/0/prometheus/vmui).
+To access [vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui)
+use link [http://localhost:8427/select/0/prometheus/vmui/](http://localhost:8427/select/0/prometheus/vmui/).
+
+To access `vmalert` use link [http://localhost:8427/select/0/prometheus/vmalert/](http://localhost:8427/select/0/prometheus/vmalert/).

## vmagent

@@ -75,6 +81,13 @@ with listed targets for scraping.

[Web interface link](http://localhost:8429/).

+## vmauth
+
+[vmauth](https://docs.victoriametrics.com/vmauth.html) acts as a [balancer](https://docs.victoriametrics.com/vmauth.html#load-balancing)
+to spread the load across `vmselect` components. [Grafana](#grafana) and [vmalert](#vmalert) use vmauth for read queries.
+vmauth config is available [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/auth-cluster.yml).
+
+
## vmalert

vmalert evaluates alerting rules [alerts.yml](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts.yml)
diff --git a/deployment/docker/auth-cluster.yml b/deployment/docker/auth-cluster.yml
new file mode 100644
index 000000000000..820fa0f40173
--- /dev/null
+++ b/deployment/docker/auth-cluster.yml
@@ -0,0 +1,6 @@
+# balance load among vmselects
+# see https://docs.victoriametrics.com/vmauth.html#load-balancing
+unauthorized_user:
+  url_prefix:
+    - http://vmselect-1:8481
+    - http://vmselect-2:8481
\ No newline at end of file
diff --git a/deployment/docker/docker-compose-cluster.yml b/deployment/docker/docker-compose-cluster.yml
index 307765cad47f..ec9c3052c11e 100644
--- a/deployment/docker/docker-compose-cluster.yml
+++ b/deployment/docker/docker-compose-cluster.yml
@@ -2,7 +2,7 @@ version: '3.5'
services:
  vmagent:
    container_name: vmagent
-    image: victoriametrics/vmagent:v1.93.4
+    image: victoriametrics/vmagent:v1.93.5
    depends_on:
      - "vminsert"
    ports:
@@ -19,7 +19,7 @@ services:
    container_name: grafana
    image: grafana/grafana:9.2.7
    depends_on:
-      - "vmselect"
+      - "vmauth"
    ports:
      - 3000:3000
    restart: always
@@ -32,7 +32,7 @@ services:
  vmstorage-1:
    container_name: vmstorage-1
-    image: victoriametrics/vmstorage:v1.93.4-cluster
+    image: victoriametrics/vmstorage:v1.93.5-cluster
    ports:
      - 8482
      - 8400
@@ -44,7 +44,7 @@ services:
    restart: always
  vmstorage-2:
    container_name: vmstorage-2
-    image: victoriametrics/vmstorage:v1.93.4-cluster
+    image: victoriametrics/vmstorage:v1.93.5-cluster
    ports:
      - 8482
      - 8400
@@ -54,9 +54,10 @@ services:
    command:
      - '--storageDataPath=/storage'
    restart: always
+
  vminsert:
    container_name: vminsert
-    image: victoriametrics/vminsert:v1.93.4-cluster
+    image: victoriametrics/vminsert:v1.93.5-cluster
    depends_on:
      - "vmstorage-1"
      - "vmstorage-2"
@@ -66,9 +67,24 @@ services:
    ports:
      - 8480:8480
    restart: always
-  vmselect:
-    container_name: vmselect
-    image: victoriametrics/vmselect:v1.93.4-cluster
+
+  vmselect-1:
+    container_name: vmselect-1
+    image: victoriametrics/vmselect:v1.93.5-cluster
+    depends_on:
+      - "vmstorage-1"
+      - "vmstorage-2"
+    command:
+      - '--storageNode=vmstorage-1:8401'
+      - '--storageNode=vmstorage-2:8401'
+      - '--vmalert.proxyURL=http://vmalert:8880'
+    ports:
+      - 8481
+    restart: always
+
+  vmselect-2:
+    container_name: vmselect-2
+    image: victoriametrics/vmselect:v1.93.5-cluster
    depends_on:
      - "vmstorage-1"
      - "vmstorage-2"
@@ -77,14 +93,29 @@ services:
      - '--storageNode=vmstorage-1:8401'
      - '--storageNode=vmstorage-2:8401'
      - '--vmalert.proxyURL=http://vmalert:8880'
    ports:
-      - 8481:8481
+      - 8481
+    restart: always
+
+  vmauth:
+    container_name: vmauth
+    image: victoriametrics/vmauth:v1.93.5
+    depends_on:
+      - "vmselect-1"
+      - "vmselect-2"
+    volumes:
+      - ./auth-cluster.yml:/etc/auth.yml
+#      - /var/run/docker.sock:/var/run/docker.sock
+    command:
+      - '--auth.config=/etc/auth.yml'
+    ports:
+      - 8427:8427
    restart: always
  vmalert:
    container_name: vmalert
-    image: victoriametrics/vmalert:v1.93.4
+    image: victoriametrics/vmalert:v1.93.5
    depends_on:
-      - "vmselect"
+      - "vmauth"
    ports:
      - 8880:8880
    volumes:
@@ -93,8 +124,8 @@
      - ./alerts.yml:/etc/alerts/alerts.yml
      - ./alerts-vmagent.yml:/etc/alerts/alerts-vmagent.yml
      - ./alerts-vmalert.yml:/etc/alerts/alerts-vmalert.yml
    command:
-      - '--datasource.url=http://vmselect:8481/select/0/prometheus'
-      - '--remoteRead.url=http://vmselect:8481/select/0/prometheus'
+      - '--datasource.url=http://vmauth:8427/select/0/prometheus'
+      - 
'--remoteRead.url=http://vmauth:8427/select/0/prometheus' - '--remoteWrite.url=http://vminsert:8480/insert/0/prometheus' - '--notifier.url=http://alertmanager:9093/' - '--rule=/etc/alerts/*.yml' diff --git a/deployment/docker/docker-compose.yml b/deployment/docker/docker-compose.yml index 0a615325ca77..f2969f5750d8 100644 --- a/deployment/docker/docker-compose.yml +++ b/deployment/docker/docker-compose.yml @@ -2,7 +2,7 @@ version: "3.5" services: vmagent: container_name: vmagent - image: victoriametrics/vmagent:v1.93.4 + image: victoriametrics/vmagent:v1.93.5 depends_on: - "victoriametrics" ports: @@ -18,7 +18,7 @@ services: restart: always victoriametrics: container_name: victoriametrics - image: victoriametrics/victoria-metrics:v1.93.4 + image: victoriametrics/victoria-metrics:v1.93.5 ports: - 8428:8428 - 8089:8089 @@ -57,7 +57,7 @@ services: restart: always vmalert: container_name: vmalert - image: victoriametrics/vmalert:v1.93.4 + image: victoriametrics/vmalert:v1.93.5 depends_on: - "victoriametrics" - "alertmanager" diff --git a/deployment/docker/prometheus-cluster.yml b/deployment/docker/prometheus-cluster.yml index 32336929be1a..e765b0860375 100644 --- a/deployment/docker/prometheus-cluster.yml +++ b/deployment/docker/prometheus-cluster.yml @@ -13,7 +13,7 @@ scrape_configs: - targets: ['vminsert:8480'] - job_name: 'vmselect' static_configs: - - targets: ['vmselect:8481'] + - targets: ['vmselect-1:8481', 'vmselect-2:8481'] - job_name: 'vmstorage' static_configs: - targets: ['vmstorage-1:8482', 'vmstorage-2:8482'] \ No newline at end of file diff --git a/deployment/docker/provisioning/datasources/datasource.yml b/deployment/docker/provisioning/datasources/datasource.yml index e16c273c4ae5..c0a7a20c97c7 100644 --- a/deployment/docker/provisioning/datasources/datasource.yml +++ b/deployment/docker/provisioning/datasources/datasource.yml @@ -10,5 +10,5 @@ datasources: - name: VictoriaMetrics - cluster type: prometheus access: proxy - url: http://vmselect:8481/select/0/prometheus + url: http://vmauth:8427/select/0/prometheus isDefault: false \ No newline at end of file diff --git a/deployment/logs-benchmark/docker-compose.yml b/deployment/logs-benchmark/docker-compose.yml index dd3ea91f0e26..8e131c3bb800 100644 --- a/deployment/logs-benchmark/docker-compose.yml +++ b/deployment/logs-benchmark/docker-compose.yml @@ -105,7 +105,7 @@ services: - '--config=/config.yml' vmsingle: - image: victoriametrics/victoria-metrics:v1.93.4 + image: victoriametrics/victoria-metrics:v1.93.5 ports: - '8428:8428' command: diff --git a/deployment/marketplace/digitialocean/one-click-droplet/RELEASE_GUIDE.md b/deployment/marketplace/digitialocean/one-click-droplet/RELEASE_GUIDE.md index 1e558c8ef077..b91d1f83d396 100644 --- a/deployment/marketplace/digitialocean/one-click-droplet/RELEASE_GUIDE.md +++ b/deployment/marketplace/digitialocean/one-click-droplet/RELEASE_GUIDE.md @@ -8,7 +8,7 @@ 4. 
Set variables `DIGITALOCEAN_API_TOKEN` with `VM_VERSION` for `packer` environment and run make from example below:

```console
-make release-victoria-metrics-digitalocean-oneclick-droplet DIGITALOCEAN_API_TOKEN="dop_v23_2e46f4759ceeeba0d0248" VM_VERSION="1.93.4"
+make release-victoria-metrics-digitalocean-oneclick-droplet DIGITALOCEAN_API_TOKEN="dop_v23_2e46f4759ceeeba0d0248" VM_VERSION="1.93.5"
```

diff --git a/deployment/marketplace/digitialocean/one-click-droplet/files/etc/update-motd.d/99-one-click b/deployment/marketplace/digitialocean/one-click-droplet/files/etc/update-motd.d/99-one-click
index eb45bfab1c9f..c6179e002b2d 100755
--- a/deployment/marketplace/digitialocean/one-click-droplet/files/etc/update-motd.d/99-one-click
+++ b/deployment/marketplace/digitialocean/one-click-droplet/files/etc/update-motd.d/99-one-click
@@ -19,8 +19,8 @@ On the server:
* VictoriaMetrics is running on ports: 8428, 8089, 4242, 2003 and they are bound to the local interface.
********************************************************************************
-  # This image includes 1.93.4 version of VictoriaMetrics.
-  # See Release notes https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.4
+  # This image includes 1.93.5 version of VictoriaMetrics.
+  # See Release notes https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.5
# Welcome to VictoriaMetrics droplet!
diff --git a/docs/Articles.md b/docs/Articles.md
index 8cb903c6eba0..96e9c7a9cb59 100644
--- a/docs/Articles.md
+++ b/docs/Articles.md
@@ -137,3 +137,4 @@ See also [case studies](https://docs.victoriametrics.com/CaseStudies.html).
* [VictoriaMetrics Meetup December 2022](https://www.youtube.com/watch?v=Mesc6JBFNhQ). See also [slides for "VictoriaMetrics 2022: new features" talk](https://docs.google.com/presentation/d/1jI7XZoodmuzLymdu4MToG9onAKQjzCNwMO2NDupyUkQ/edit?usp=sharing).
* [Comparing Thanos to VictoriaMetrics cluster](https://faun.pub/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
* [Evaluation performance and correctness: VictoriaMetrics response](https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87)
+* [How to reduce expenses on monitoring (slides)](https://www.slideshare.net/RomanKhavronenko/how-to-reduce-expenses-on-monitoringpdf)
\ No newline at end of file
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 1291ab99c379..fc9db8799a04 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -21,6 +21,10 @@ The following `tip` changes can be tested by building VictoriaMetrics components
* [How to build vmauth](https://docs.victoriametrics.com/vmauth.html#how-to-build-from-sources)
* [How to build vmctl](https://docs.victoriametrics.com/vmctl.html#how-to-build)
+Metrics of the latest version of VictoriaMetrics cluster are available for viewing at our
+[sandbox](https://play-grafana.victoriametrics.com/d/oS7Bi_0Wz_vm/victoriametrics-cluster-vm).
+The sandbox cluster installation runs under constant load generated by
+[prometheus-benchmark](https://github.com/VictoriaMetrics/prometheus-benchmark) and is used for testing the latest releases.

## tip

@@ -37,18 +41,29 @@ The following `tip` changes can be tested by building VictoriaMetrics components
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add button for auto-formatting PromQL/MetricsQL queries. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4681).
Thanks to @aramattamara for the [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4694). * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve accessibility score to 100 according to [Google's Lighthouse](https://developer.chrome.com/docs/lighthouse/accessibility/) tests. * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): organize `min`, `max`, `median` values on the chart legend and tooltips for better visibility. +* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add explanation about [cardinality explorer](https://docs.victoriametrics.com/#cardinality-explorer) statistic inaccuracy in VictoriaMetrics cluster. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3070). * FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add storage of query history in `localStorage`. See [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5022). * FEATURE: dashboards: provide copies of Grafana dashboards alternated with VictoriaMetrics datasource at [dashboards/vm](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/dashboards/vm). -* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): added ability to set, override and clear request and response headers on a per-user and per-path basis. See [this i -ssue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4825) and [these docs](https://docs.victoriametrics.com/vmauth.html#auth-config) for details. +* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): added ability to set, override and clear request and response headers on a per-user and per-path basis. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4825) and [these docs](https://docs.victoriametrics.com/vmauth.html#auth-config) for details. * FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): add ability to retry requests to the [remaining backends](https://docs.victoriametrics.com/vmauth.html#load-balancing) if they return response status codes specified in the `retry_status_codes` list. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893). +* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): expose metrics `vmauth_config_last_reload_*` for tracking the state of config reloads, similarly to vmagent/vmalert components. +* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): do not print logs like `SIGHUP received...` once per configured `-configCheckInterval` cmd-line flag. This log will be printed only if config reload was invoked manually. * FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `eval_offset` attribute for [Groups](https://docs.victoriametrics.com/vmalert.html#groups). If specified, Group will be evaluated at the exact time offset on the range of [0...evaluationInterval]. The setting might be useful for cron-like rules which must be evaluated at specific moments of time. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3409) for details. * FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): validate [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) function names in alerting and recording rules when `vmalert` runs with `-dryRun` command-line flag. Previously it was allowed to use unknown (aka invalid) MetricsQL function names there. For example, `foo()` was counted as a valid query. 
See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4933).
* FEATURE: limit the length of string params in log messages to 500 chars. Longer string params are replaced with the `first_250_chars..last_250_chars`. This prevents too long log lines, which can be emitted by VictoriaMetrics components.
+* FEATURE: [docker compose environment](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker): add `vmauth` component to cluster's docker-compose example for balancing load among multiple `vmselect` components.
+* FEATURE: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): make sure that `q2` series are returned after `q1` series in the results of `q1 or q2` query, in the same way as Prometheus does. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763).
+* FEATURE: stop exposing `vm_merge_need_free_disk_space` metric, since it turned out that it confuses users while not bringing any useful information. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128).
* BUGFIX: [Official Grafana dashboards for VictoriaMetrics](https://grafana.com/orgs/victoriametrics): fix display of ingested rows rate for `Samples ingested/s` and `Samples rate` panels for vmagent's dashboard. Previously, not all ingested protocols were accounted for in these panels. An extra panel `Rows rate` was added to `Ingestion` section to display the split for rows ingested rate by protocol.
+* BUGFIX: [Official Grafana dashboards for VictoriaMetrics](https://grafana.com/orgs/victoriametrics): move vmagent's `Concurrent inserts` panel from `Ingestion` section to Troubleshooting section because this panel is related to both scraped and ingested data. Before, it could have given a misleading impression that it is related to ingested metrics only.
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix the bug causing render looping when switching to heatmap.
* BUGFIX: [VictoriaMetrics enterprise](https://docs.victoriametrics.com/enterprise.html) validate `-dedup.minScrapeInterval` value and `-downsampling.period` intervals are multiples of each other. See [these docs](https://docs.victoriametrics.com/#downsampling).
+* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): fix `bitmap_*()` functions behavior: these functions now return `NaN` if the time series has no value for the given timestamp. Previously these functions returned `0`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996).
+* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): properly copy `appliedRetention.txt` files inside `<-storageDataPath>/{data}` folders during [incremental backups](https://docs.victoriametrics.com/vmbackup.html#incremental-backups). Previously the new `appliedRetention.txt` could be skipped during incremental backups, which could lead to increased load on storage after restoring from backup. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005).
+* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): suppress `context canceled` error messages in logs when `vmagent` is reloading service discovery config. This error could appear starting from [v1.93.5](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.5). See [this PR](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5048).
+* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): allow passing [median_over_time](https://docs.victoriametrics.com/MetricsQL.html#median_over_time) to [aggr_over_time](https://docs.victoriametrics.com/MetricsQL.html#aggr_over_time). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034).
+* BUGFIX: [vminsert](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): fix ingestion via [multitenant url](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels) for opentsdbhttp. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5061). The bug was introduced in [v1.93.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.2).

## [v1.93.5](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.5)

@@ -61,6 +76,7 @@ The v1.93.x line will be supported for at least 12 months since [v1.93.0](https:
* BUGFIX: [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage): correctly return `null` instead of `Inf` in JSON query responses. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3783).
* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup.html): properly copy `parts.json` files inside `<-storageDataPath>/{data,indexdb}` folders during [incremental backups](https://docs.victoriametrics.com/vmbackup.html#incremental-backups). Previously the new `parts.json` could be skipped during incremental backups, which could lead to inability to restore from the backup. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005). This issue has been introduced in [v1.90.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.90.0).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly close connections to Kubernetes API server after the change in `selectors` or `namespaces` sections of [kubernetes_sd_configs](https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs). Previously `vmagent` could continue polling Kubernetes API server with the old `selectors` or `namespaces` configs additionally to polling new configs. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).
+* BUGFIX: [vmauth](https://docs.victoriametrics.com/vmauth.html): prevent configuration reloading if there were no changes in the config. This improves memory usage when the `-configCheckInterval` cmd-line flag is set and the config contains an extensive list of regexp expressions, which require additional memory for parsing.
## [v1.93.4](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.93.4) diff --git a/docs/CaseStudies.md b/docs/CaseStudies.md index 3ab0d32799f8..0e94fdd2b2e3 100644 --- a/docs/CaseStudies.md +++ b/docs/CaseStudies.md @@ -23,6 +23,7 @@ where you can chat with VictoriaMetrics users to get additional references, revi - [Brandwatch](#brandwatch) - [CERN](#cern) - [COLOPL](#colopl) + - [Criteo](#criteo) - [Dig Security](#dig-security) - [Fly.io](#flyio) - [German Research Center for Artificial Intelligence](#german-research-center-for-artificial-intelligence) @@ -242,6 +243,13 @@ after evaulating the following remote storage solutions for Prometheus: See [slides](https://speakerdeck.com/inletorder/monitoring-platform-with-victoria-metrics) and [video](https://www.youtube.com/watch?v=hUpHIluxw80) from `Large-scale, super-load system monitoring platform built with VictoriaMetrics` talk at [Prometheus Meetup Tokyo #3](https://prometheus.connpass.com/event/157721/). +## Criteo + +[Criteo](https://www.criteo.com/) is a global technology company that helps marketers and media owners reach their goals through the world’s leading Commerce Media Platform. + +See [this blog post](https://medium.com/criteo-engineering/victoriametrics-a-prometheus-remote-storage-solution-57081a3d8e61) on how Criteo started using VictoriaMetrics +and why they prefer VictoriaMetrics over competing solutions. + ## Dig Security [Dig Security](https://www.dig.security) is a cloud data security startup with 50+ employees that provides real-time visibility, control, and protection of data assets. diff --git a/docs/Cluster-VictoriaMetrics.md b/docs/Cluster-VictoriaMetrics.md index c69ba299d36c..4c24ca0448b6 100644 --- a/docs/Cluster-VictoriaMetrics.md +++ b/docs/Cluster-VictoriaMetrics.md @@ -821,15 +821,15 @@ Below is the output for `/path/to/vminsert -help`: -cacheExpireDuration duration Items are removed from in-memory caches after they aren't accessed for this duration. Lower values may reduce memory usage at the cost of higher CPU usage. See also -prevCacheRemovalPercent (default 30m0s) -cluster.tls - Whether to use TLS for connections to -storageNode. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Whether to use TLS for connections to -storageNode. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsCAFile string - Path to TLS CA file to use for verifying certificates provided by -storageNode if -cluster.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Path to TLS CA file to use for verifying certificates provided by -storageNode if -cluster.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsCertFile string - Path to client-side TLS certificate file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . 
This flag is available only in enterprise version of VictoriaMetrics + Path to client-side TLS certificate file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsInsecureSkipVerify - Whether to skip verification of TLS certificates provided by -storageNode nodes if -cluster.tls flag is set. Note that disabled TLS certificate verification breaks security. This flag is available only in enterprise version of VictoriaMetrics + Whether to skip verification of TLS certificates provided by -storageNode nodes if -cluster.tls flag is set. Note that disabled TLS certificate verification breaks security. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsKeyFile string - Path to client-side TLS key file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Path to client-side TLS key file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternativeListenAddr string TCP address to listen for data from other vminsert nodes in multi-level cluster setup. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup . Usually :8400 should be set to match default vmstorage port for vminsert. Disabled work if empty -csvTrimTimestamp duration @@ -937,7 +937,7 @@ Below is the output for `/path/to/vminsert -help`: -loggerWarnsPerSecondLimit int Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit -maxConcurrentInserts int - The maximum number of concurrent insert requests. The default value should work for most cases, since it minimizes memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8) + The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8) -maxInsertRequestSize size The maximum size in bytes of a single Prometheus remote_write API request Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432) @@ -1025,15 +1025,15 @@ Below is the output for `/path/to/vmselect -help`: -cacheExpireDuration duration Items are removed from in-memory caches after they aren't accessed for this duration. Lower values may reduce memory usage at the cost of higher CPU usage. See also -prevCacheRemovalPercent (default 30m0s) -cluster.tls - Whether to use TLS for connections to -storageNode. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Whether to use TLS for connections to -storageNode. 
See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsCAFile string - Path to TLS CA file to use for verifying certificates provided by -storageNode if -cluster.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Path to TLS CA file to use for verifying certificates provided by -storageNode if -cluster.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsCertFile string - Path to client-side TLS certificate file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Path to client-side TLS certificate file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsInsecureSkipVerify - Whether to skip verification of TLS certificates provided by -storageNode nodes if -cluster.tls flag is set. Note that disabled TLS certificate verification breaks security. This flag is available only in enterprise version of VictoriaMetrics + Whether to skip verification of TLS certificates provided by -storageNode nodes if -cluster.tls flag is set. Note that disabled TLS certificate verification breaks security. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -cluster.tlsKeyFile string - Path to client-side TLS key file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in enterprise version of VictoriaMetrics + Path to client-side TLS key file to use when connecting to -storageNode if -cluster.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternative.disableCompression Whether to disable compression of the data sent to vmselect via -clusternativeListenAddr. This reduces CPU usage at the cost of higher network bandwidth usage -clusternative.maxConcurrentRequests int @@ -1047,18 +1047,18 @@ Below is the output for `/path/to/vmselect -help`: -clusternative.maxTagValues int The maximum number of tag values returned per search at -clusternativeListenAddr (default 100000) -clusternative.tls - Whether to use TLS when accepting connections at -clusternativeListenAddr. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection + Whether to use TLS when accepting connections at -clusternativeListenAddr. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. 
See https://docs.victoriametrics.com/enterprise.html -clusternative.tlsCAFile string - Path to TLS CA file to use for verifying certificates provided by vmselect, which connects at -clusternativeListenAddr if -clusternative.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection + Path to TLS CA file to use for verifying certificates provided by vmselect, which connects at -clusternativeListenAddr if -clusternative.tls flag is set. By default system CA is used. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternative.tlsCertFile string - Path to server-side TLS certificate file to use when accepting connections at -clusternativeListenAddr if -clusternative.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection + Path to server-side TLS certificate file to use when accepting connections at -clusternativeListenAddr if -clusternative.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternative.tlsCipherSuites array - Optional list of TLS cipher suites used for connections at -clusternativeListenAddr if -clusternative.tls flag is set. See the list of supported cipher suites at https://pkg.go.dev/crypto/tls#pkg-constants + Optional list of TLS cipher suites used for connections at -clusternativeListenAddr if -clusternative.tls flag is set. See the list of supported cipher suites at https://pkg.go.dev/crypto/tls#pkg-constants . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html Supports an array of values separated by comma or specified via multiple flags. -clusternative.tlsInsecureSkipVerify - Whether to skip verification of TLS certificates provided by vmselect, which connects to -clusternativeListenAddr if -clusternative.tls flag is set. Note that disabled TLS certificate verification breaks security + Whether to skip verification of TLS certificates provided by vmselect, which connects to -clusternativeListenAddr if -clusternative.tls flag is set. Note that disabled TLS certificate verification breaks security. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternative.tlsKeyFile string - Path to server-side TLS key file to use when accepting connections at -clusternativeListenAddr if -clusternative.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection + Path to server-side TLS key file to use when accepting connections at -clusternativeListenAddr if -clusternative.tls flag is set. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#mtls-protection . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html -clusternativeListenAddr string TCP address to listen for requests from other vmselect nodes in multi-level cluster setup. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup . Usually :8401 should be set to match default vmstorage port for vmselect. 
Disabled work if empty -dedup.minScrapeInterval duration @@ -1367,7 +1367,7 @@ Below is the output for `/path/to/vmstorage -help`: -loggerWarnsPerSecondLimit int Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit -maxConcurrentInserts int - The maximum number of concurrent insert requests. The default value should work for most cases, since it minimizes memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8) + The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8) -memory.allowedBytes size Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache resulting in higher disk IO usage Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0) diff --git a/docs/README.md b/docs/README.md index 169780f44c3b..4563dcaaf90c 100644 --- a/docs/README.md +++ b/docs/README.md @@ -113,6 +113,7 @@ Case studies: * [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch) * [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern) * [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl) +* [Criteo](https://docs.victoriametrics.com/CaseStudies.html#criteo) * [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security) * [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio) * [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence) @@ -367,6 +368,8 @@ See the [example VMUI at VictoriaMetrics playground](https://play.victoriametric * queries with the biggest average execution duration; * queries that took the most summary time for execution. +This information is obtained from the `/api/v1/status/top_queries` HTTP endpoint. + ## Active queries [VMUI](#vmui) provides `active queries` tab, which shows currently execute queries. @@ -376,6 +379,8 @@ It provides the following information per each query: - The duration of the query execution. - The client address, who initiated the query execution. +This information is obtained from the `/api/v1/status/active_queries` HTTP endpoint. + ## Metrics explorer [VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way: @@ -407,14 +412,16 @@ matching the specified [series selector](https://prometheus.io/docs/prometheus/l Cardinality explorer is built on top of [/api/v1/status/tsdb](#tsdb-stats). +See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality). +See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/). 
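+
+For instance, the underlying stats can be queried directly with `curl` (a quick sketch assuming a local single-node VictoriaMetrics at `localhost:8428`; the optional `topN` arg limits the number of entries returned per stats list):
+
+```console
+curl 'http://localhost:8428/api/v1/status/tsdb?topN=5'
+```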
+
+## Cardinality explorer statistic inaccuracy
+
In [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) each vmstorage tracks the stored time series individually.
vmselect requests stats via [/api/v1/status/tsdb](#tsdb-stats) API from each vmstorage node and merges the results by summing per-series stats.
This may lead to inflated values when samples for the same time series are spread across multiple vmstorage nodes
due to [replication](#replication) or [rerouting](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html?highlight=re-routes#cluster-availability).
-See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality).
-See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/).
-
## How to apply new config to VictoriaMetrics

VictoriaMetrics is configured via command-line flags, so it must be restarted when new command-line flags should be applied:
@@ -619,6 +626,28 @@ Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plu
or [Juniper/jitmon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.

+### How to send data in InfluxDB v2 format
+
+VictoriaMetrics exposes endpoints for the InfluxDB v2 HTTP API at `/influx/api/v2/write` and `/api/v2/write`.
+
+
+In order to write data with the InfluxDB line protocol to a local VictoriaMetrics instance using `curl`:
+
+<div class="with-copy" markdown="1">
+ +```console +curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'http://localhost:8428/api/v2/write' +``` + +
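+After that the written data can be read back via the `/api/v1/export` endpoint, e.g. with `curl` (note how the metric names are formed as `<measurement>_<field>`):
+
+<div class="with-copy" markdown="1">
+
+```console
+curl http://localhost:8428/api/v1/export -d 'match[]=measurement_field1' -d 'match[]=measurement_field2'
+```
+
+</div>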
+ +The `/api/v1/export` endpoint should return the following response: + +```json +{"metric":{"__name__":"measurement_field1","tag1":"value1","tag2":"value2"},"values":[123],"timestamps":[1695902762311]} +{"metric":{"__name__":"measurement_field2","tag1":"value1","tag2":"value2"},"values":[1.23],"timestamps":[1695902762311]} +``` + ## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd) Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance, @@ -833,7 +862,7 @@ Additionally, VictoriaMetrics provides the following handlers: * `/api/v1/series/count` - returns the total number of time series in the database. Some notes: * the handler scans all the inverted index, so it can be slow if the database contains tens of millions of time series; * the handler may count [deleted time series](#how-to-delete-time-series) additionally to normal time series due to internal implementation restrictions; -* `/api/v1/status/active_queries` - returns a list of currently running queries. +* `/api/v1/status/active_queries` - returns the list of currently running queries. This list is also available at [`active queries` page at VMUI](#active-queries). * `/api/v1/status/top_queries` - returns the following query lists: * the most frequently executed queries - `topByCount` * queries with the biggest average execution duration - `topByAvgDuration` @@ -843,6 +872,8 @@ Additionally, VictoriaMetrics provides the following handlers: For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds. VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`. + See also [`top queries` page at VMUI](#top-queries). + ### Timestamp formats VictoriaMetrics accepts the following formats for `time`, `start` and `end` query args @@ -1793,9 +1824,9 @@ Graphs on the dashboards contain useful hints - hover the `i` icon in the top le We recommend setting up [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts) via [vmalert](https://docs.victoriametrics.com/vmalert.html) or via Prometheus. -VictoriaMetrics exposes currently running queries and their execution times at `/api/v1/status/active_queries` page. +VictoriaMetrics exposes currently running queries and their execution times at [`active queries` page](#active-queries). -VictoriaMetrics exposes queries, which take the most time to execute, at `/api/v1/status/top_queries` page. +VictoriaMetrics exposes queries, which take the most time to execute, at [`top queries` page](#top-queries). See also [VictoriaMetrics Monitoring](https://victoriametrics.com/blog/victoriametrics-monitoring/) and [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html). @@ -1940,9 +1971,6 @@ and [cardinality explorer docs](#cardinality-explorer). has at least 20% of free space. The remaining amount of free space can be [monitored](#monitoring) via `vm_free_disk_space_bytes` metric. The total size of data stored on the disk can be monitored via sum of `vm_data_size_bytes` metrics. - See also `vm_merge_need_free_disk_space` metrics, which are set to values higher than 0 - if background merge cannot be initiated due to free disk space shortage. The value shows the number of per-month partitions, - which would start background merge if they had more free disk space. 
* VictoriaMetrics buffers incoming data in memory for up to a few seconds before flushing it to persistent storage. This may lead to the following "issues": diff --git a/docs/Single-server-VictoriaMetrics.md b/docs/Single-server-VictoriaMetrics.md index 3a9179479bb6..ffa2fa2afdfd 100644 --- a/docs/Single-server-VictoriaMetrics.md +++ b/docs/Single-server-VictoriaMetrics.md @@ -121,6 +121,7 @@ Case studies: * [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch) * [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern) * [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl) +* [Criteo](https://docs.victoriametrics.com/CaseStudies.html#criteo) * [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security) * [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio) * [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence) @@ -375,6 +376,8 @@ See the [example VMUI at VictoriaMetrics playground](https://play.victoriametric * queries with the biggest average execution duration; * queries that took the most summary time for execution. +This information is obtained from the `/api/v1/status/top_queries` HTTP endpoint. + ## Active queries [VMUI](#vmui) provides `active queries` tab, which shows currently execute queries. @@ -384,6 +387,8 @@ It provides the following information per each query: - The duration of the query execution. - The client address, who initiated the query execution. +This information is obtained from the `/api/v1/status/active_queries` HTTP endpoint. + ## Metrics explorer [VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way: @@ -415,13 +420,16 @@ matching the specified [series selector](https://prometheus.io/docs/prometheus/l Cardinality explorer is built on top of [/api/v1/status/tsdb](#tsdb-stats). +See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality). +See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/). + +## Cardinality explorer statistic inaccuracy + In [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) each vmstorage tracks the stored time series individually. vmselect requests stats via [/api/v1/status/tsdb](#tsdb-stats) API from each vmstorage node and merges the results by summing per-series stats. This may lead to inflated values when samples for the same time series are spread across multiple vmstorage nodes due to [replication](#replication) or [rerouting](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html?highlight=re-routes#cluster-availability). -See [cardinality explorer playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/cardinality). -See the example of using the cardinality explorer [here](https://victoriametrics.com/blog/cardinality-explorer/). ## How to apply new config to VictoriaMetrics @@ -627,6 +635,28 @@ Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plu or [Juniper/jitmon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response. 
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.

+### How to send data in InfluxDB v2 format
+
+VictoriaMetrics exposes endpoints for the InfluxDB v2 HTTP API at `/influx/api/v2/write` and `/api/v2/write`.
+
+
+In order to write data with the InfluxDB line protocol to a local VictoriaMetrics instance using `curl`:
+
+ +```console +curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'http://localhost:8428/api/v2/write' +``` + +
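For programmatic verification of the write, a minimal Go sketch along the same lines can read the data back via `/api/v1/export` (the address and regexp selector are assumptions matching the example above; the helper name is illustrative):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

// exportSeries fetches raw samples for series matching the given selector
// from the /api/v1/export endpoint and returns the JSON-lines response body.
func exportSeries(vmAddr, selector string) (string, error) {
	resp, err := http.Get(vmAddr + "/api/v1/export?match[]=" + url.QueryEscape(selector))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// The regexp selector matches both metrics created from the line-protocol write above.
	data, err := exportSeries("http://localhost:8428", `{__name__=~"measurement_field.*"}`)
	if err != nil {
		log.Fatalf("cannot export series: %s", err)
	}
	fmt.Print(data)
}
```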
+

The `/api/v1/export` endpoint should return the following response:

```json
{"metric":{"__name__":"measurement_field1","tag1":"value1","tag2":"value2"},"values":[123],"timestamps":[1695902762311]}
{"metric":{"__name__":"measurement_field2","tag1":"value1","tag2":"value2"},"values":[1.23],"timestamps":[1695902762311]}
```

## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)

Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,

@@ -841,7 +871,7 @@ Additionally, VictoriaMetrics provides the following handlers:

* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
* the handler scans all the inverted index, so it can be slow if the database contains tens of millions of time series;
* the handler may count [deleted time series](#how-to-delete-time-series) additionally to normal time series due to internal implementation restrictions;
-* `/api/v1/status/active_queries` - returns a list of currently running queries.
+* `/api/v1/status/active_queries` - returns the list of currently running queries. This list is also available at the [`active queries` page at VMUI](#active-queries).
* `/api/v1/status/top_queries` - returns the following query lists:
* the most frequently executed queries - `topByCount`
* queries with the biggest average execution duration - `topByAvgDuration`
@@ -851,6 +881,8 @@ Additionally, VictoriaMetrics provides the following handlers:
For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds.
VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`.
+  See also the [`top queries` page at VMUI](#top-queries).
+
### Timestamp formats

VictoriaMetrics accepts the following formats for `time`, `start` and `end` query args
@@ -1801,9 +1833,9 @@ Graphs on the dashboards contain useful hints - hover the `i` icon in the top le

We recommend setting up [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts) via [vmalert](https://docs.victoriametrics.com/vmalert.html) or via Prometheus.

-VictoriaMetrics exposes currently running queries and their execution times at `/api/v1/status/active_queries` page.
+VictoriaMetrics exposes currently running queries and their execution times at the [`active queries` page](#active-queries).

-VictoriaMetrics exposes queries, which take the most time to execute, at `/api/v1/status/top_queries` page.
+VictoriaMetrics exposes queries that take the most time to execute at the [`top queries` page](#top-queries).

See also [VictoriaMetrics Monitoring](https://victoriametrics.com/blog/victoriametrics-monitoring/) and [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html).

@@ -1948,9 +1980,6 @@ and [cardinality explorer docs](#cardinality-explorer).
has at least 20% of free space. The remaining amount of free space can be [monitored](#monitoring) via `vm_free_disk_space_bytes` metric. The total size of data stored on the disk can be monitored via sum of `vm_data_size_bytes` metrics.
- See also `vm_merge_need_free_disk_space` metrics, which are set to values higher than 0
- if background merge cannot be initiated due to free disk space shortage. The value shows the number of per-month partitions,
- which would start background merge if they had more free disk space.
* VictoriaMetrics buffers incoming data in memory for up to a few seconds before flushing it to persistent storage.
  This may lead to the following "issues":
diff --git a/docs/Troubleshooting.md b/docs/Troubleshooting.md
index 8204f08771f5..e05bf977035c 100644
--- a/docs/Troubleshooting.md
+++ b/docs/Troubleshooting.md
@@ -296,29 +296,40 @@ There are the following most commons reasons for slow data ingestion in Victoria

Some queries may take more time and resources (CPU, RAM, network bandwidth) than others.
VictoriaMetrics logs slow queries if their execution time exceeds the duration passed
to `-search.logSlowQueryDuration` command-line flag (5s by default).
-VictoriaMetrics also provides `/api/v1/status/top_queries` endpoint, which returns
+
+VictoriaMetrics provides the [`top queries` page at VMUI](https://docs.victoriametrics.com/#top-queries), which shows
queries that took the most time to execute.
-See [these docs](https://docs.victoriametrics.com/#prometheus-querying-api-enhancements) for details.

-There are the following solutions exist for slow queries:
+The following solutions exist for improving the performance of slow queries:

- Adding more CPU and memory to VictoriaMetrics, so it may perform the slow query faster.
-  If you use cluster version of VictoriaMetrics, then migration of `vmselect` nodes to machines
+  If you use the cluster version of VictoriaMetrics, then migrating `vmselect` nodes to machines
   with more CPU and RAM should help improving speed for slow queries. Query performance
-  is always limited by resources of one vmselect which processes the query. For example, if 2vCPU cores on `vmselect`
+  is always limited by resources of a single `vmselect` which processes the query. For example, if 2vCPU cores on `vmselect`
   isn't enough to process query fast enough, then migrating `vmselect` to a machine with 4vCPU cores should increase heavy query performance by up to 2x.
-  If the line on `Concurrent select` graph form the [official Grafana dashboard for VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
+  If the line on the `concurrent select` graph from the [official Grafana dashboard for VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
   is close to the limit, then prefer adding more `vmselect` nodes to the cluster.
   Sometimes adding more `vmstorage` nodes also can help improving the speed for slow queries.

- Rewriting slow queries, so they become faster. Unfortunately it is hard determining
  whether the given query is slow by just looking at it.
-  VictoriaMetrics provides [query tracing](https://docs.victoriametrics.com/#query-tracing) feature,
-  which can help determine the source of slow query.
-  See also [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986),
-  which explains how to determine and optimize slow queries.
-  In practice many slow queries are generated because of improper use of [subqueries](https://docs.victoriametrics.com/MetricsQL.html#subqueries).
+  The main source of slow queries in practice is [alerting and recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
+  with long lookbehind windows in square brackets. These queries are frequently used in SLI/SLO calculations such as [Sloth](https://github.com/slok/sloth).
+
+  For example, `avg_over_time(up[30d]) > 0.99` needs to read and process
+  all the [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
+  for `up` [time series](https://docs.victoriametrics.com/keyConcepts.html#time-series) over the last 30 days
+  each time it executes. If this query is executed frequently, then it can take a significant share of CPU, disk read IO, network bandwidth and RAM.
+  Such queries can be optimized in the following ways:
+
+  - Reduce the lookbehind window in square brackets. For example, `avg_over_time(up[10d])` takes up to 3x fewer compute resources
+    than `avg_over_time(up[30d])` in VictoriaMetrics.
+  - Increase the evaluation interval for alerting and recording rules, so they are executed less frequently.
+    For example, increasing the `-evaluationInterval` command-line flag value at [vmalert](https://docs.victoriametrics.com/vmalert.html)
+    from `1m` to `2m` should reduce compute resource usage at VictoriaMetrics by 2x.
+
+  Another source of slow queries is improper use of [subqueries](https://docs.victoriametrics.com/MetricsQL.html#subqueries).
   It is recommended avoiding subqueries if you don't understand clearly how they work.
   It is easy to create a subquery without knowing about it.
   For example, `rate(sum(some_metric))` is implicitly transformed into the following subquery
@@ -335,6 +346,11 @@ There are the following solutions exist for slow queries:
   It is likely this query won't return the expected results. Instead, `sum(rate(some_metric))` must be used instead.
   See [this article](https://www.robustperception.io/rate-then-sum-never-sum-then-rate/) for more details.

+  VictoriaMetrics provides the [query tracing](https://docs.victoriametrics.com/#query-tracing) feature,
+  which can help determine the source of a slow query.
+  See also [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986),
+  which explains how to determine and optimize slow queries.
+

## Out of memory errors

diff --git a/docs/VictoriaLogs/CHANGELOG.md b/docs/VictoriaLogs/CHANGELOG.md
index f2bb6d29aa2f..a945cc2f7952 100644
--- a/docs/VictoriaLogs/CHANGELOG.md
+++ b/docs/VictoriaLogs/CHANGELOG.md
@@ -11,9 +11,11 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta
* `vl_data_size_bytes{type="indexdb"}` - on-disk size for [log stream](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) indexes.
* FEATURE: add `-insert.maxFieldsPerLine` command-line flag, which can be used for limiting the number of fields per line in logs sent to VictoriaLogs via ingestion protocols. This helps to avoid issues like [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762).
* FEATURE: expose `vl_http_request_duration_seconds` histogram at the [/metrics](https://docs.victoriametrics.com/VictoriaLogs/#monitoring) page. Thanks to @crossoverJie for [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4934).
+* FEATURE: add support for the `-storage.minFreeDiskSpaceBytes` command-line flag, which allows switching to read-only mode when running out of disk space at `-storageDataPath`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737).
* BUGFIX: fix possible panic when no data is written to VictoriaLogs for a long time. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4895). Thanks to @crossoverJie for filing and fixing the issue.
* BUGFIX: add `/insert/loky/ready` endpoint, which is used by Promtail for healthchecks.
This should remove `unsupported path requested: /insert/loki/ready` warning logs. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762#issuecomment-1690966722).
+* BUGFIX: prevent panic during background merge when the number of columns in the resulting block exceeds the maximum number of columns per block. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762).

## [v0.3.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v0.3.0-victorialogs)

diff --git a/docs/VictoriaLogs/README.md b/docs/VictoriaLogs/README.md
index 22c722cc1d08..ba8e2c59836f 100644
--- a/docs/VictoriaLogs/README.md
+++ b/docs/VictoriaLogs/README.md
@@ -239,6 +239,9 @@ Pass `-help` to VictoriaLogs in order to see the list of supported command-line
     Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1048576)
  -storageDataPath string
     Path to directory with the VictoriaLogs data; see https://docs.victoriametrics.com/VictoriaLogs/#storage (default "victoria-logs-data")
+  -storage.minFreeDiskSpaceBytes size
+     The minimum free disk space at -storageDataPath after which the storage stops accepting new data
+     Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 10000000)
  -tls
     Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
  -tlsCertFile string
diff --git a/docs/operator/CHANGELOG.md b/docs/operator/CHANGELOG.md
index 6341ddcbc52e..c9d2022ca3f3 100644
--- a/docs/operator/CHANGELOG.md
+++ b/docs/operator/CHANGELOG.md
@@ -9,6 +9,8 @@

### Fixes

- [vmcluster](https://docs.victoriametrics.com/operator/api.html#vmcluster): remove redundant annotation `operator.victoriametrics/last-applied-spec` from created workloads like vmstorage statefulset.
+- [vmoperator](https://docs.victoriametrics.com/operator/): properly resize a statefulset's multiple PVCs when needed and allowed; previously they could be updated with the wrong size.
+- [vmoperator](https://docs.victoriametrics.com/operator/): fix the wrong API group for EndpointSlices; previously vmagent was unable to access EndpointSlice resources with the default RBAC rule.
## [v0.38.0](https://github.com/VictoriaMetrics/operator/releases/tag/v0.38.0) - 11 Sep 2023 diff --git a/docs/operator/vars.md b/docs/operator/vars.md index dad003d18e07..5142a05d1142 100644 --- a/docs/operator/vars.md +++ b/docs/operator/vars.md @@ -10,7 +10,7 @@ aliases: - /operator/vars.html --- # Auto Generated vars for package config - updated at Wed Sep 13 14:05:24 UTC 2023 + updated at Wed Sep 27 00:09:29 UTC 2023 | varible name | variable default value | variable required | variable description | @@ -20,7 +20,7 @@ aliases: | VM_CUSTOMCONFIGRELOADERIMAGE | victoriametrics/operator:config-reloader-v0.32.0 | false | - | | VM_PSPAUTOCREATEENABLED | false | false | - | | VM_VMALERTDEFAULT_IMAGE | victoriametrics/vmalert | false | - | -| VM_VMALERTDEFAULT_VERSION | v1.93.4 | false | - | +| VM_VMALERTDEFAULT_VERSION | v1.93.5 | false | - | | VM_VMALERTDEFAULT_PORT | 8080 | false | - | | VM_VMALERTDEFAULT_USEDEFAULTRESOURCES | true | false | - | | VM_VMALERTDEFAULT_RESOURCE_LIMIT_MEM | 500Mi | false | - | @@ -31,7 +31,7 @@ aliases: | VM_VMALERTDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - | | VM_VMALERTDEFAULT_CONFIGRELOADIMAGE | jimmidyson/configmap-reload:v0.3.0 | false | - | | VM_VMAGENTDEFAULT_IMAGE | victoriametrics/vmagent | false | - | -| VM_VMAGENTDEFAULT_VERSION | v1.93.4 | false | - | +| VM_VMAGENTDEFAULT_VERSION | v1.93.5 | false | - | | VM_VMAGENTDEFAULT_CONFIGRELOADIMAGE | quay.io/prometheus-operator/prometheus-config-reloader:v0.68.0 | false | - | | VM_VMAGENTDEFAULT_PORT | 8429 | false | - | | VM_VMAGENTDEFAULT_USEDEFAULTRESOURCES | true | false | - | @@ -42,7 +42,7 @@ aliases: | VM_VMAGENTDEFAULT_CONFIGRELOADERCPU | 100m | false | - | | VM_VMAGENTDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - | | VM_VMSINGLEDEFAULT_IMAGE | victoriametrics/victoria-metrics | false | - | -| VM_VMSINGLEDEFAULT_VERSION | v1.93.4 | false | - | +| VM_VMSINGLEDEFAULT_VERSION | v1.93.5 | false | - | | VM_VMSINGLEDEFAULT_PORT | 8429 | false | - | | VM_VMSINGLEDEFAULT_USEDEFAULTRESOURCES | true | false | - | | VM_VMSINGLEDEFAULT_RESOURCE_LIMIT_MEM | 1500Mi | false | - | @@ -53,14 +53,14 @@ aliases: | VM_VMSINGLEDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - | | VM_VMCLUSTERDEFAULT_USEDEFAULTRESOURCES | true | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_IMAGE | victoriametrics/vmselect | false | - | -| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_VERSION | v1.93.4-cluster | false | - | +| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_VERSION | v1.93.5-cluster | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_PORT | 8481 | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_LIMIT_MEM | 1000Mi | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_LIMIT_CPU | 500m | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_REQUEST_MEM | 500Mi | false | - | | VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_REQUEST_CPU | 100m | false | - | | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_IMAGE | victoriametrics/vmstorage | false | - | -| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VERSION | v1.93.4-cluster | false | - | +| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VERSION | v1.93.5-cluster | false | - | | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VMINSERTPORT | 8400 | false | - | | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VMSELECTPORT | 8401 | false | - | | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_PORT | 8482 | false | - | @@ -69,7 +69,7 @@ aliases: | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_REQUEST_MEM | 500Mi | false | - | | VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_REQUEST_CPU | 250m | false | - | | 
VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_IMAGE | victoriametrics/vminsert | false | - |
-| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_VERSION | v1.93.4-cluster | false | - |
+| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_VERSION | v1.93.5-cluster | false | - |
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_PORT | 8480 | false | - |
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_LIMIT_MEM | 500Mi | false | - |
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_LIMIT_CPU | 500m | false | - |
@@ -88,7 +88,7 @@
| VM_VMALERTMANAGER_RESOURCE_REQUEST_CPU | 30m | false | - |
| VM_DISABLESELFSERVICESCRAPECREATION | false | false | - |
| VM_VMBACKUP_IMAGE | victoriametrics/vmbackupmanager | false | - |
-| VM_VMBACKUP_VERSION | v1.93.4-enterprise | false | - |
+| VM_VMBACKUP_VERSION | v1.93.5-enterprise | false | - |
| VM_VMBACKUP_PORT | 8300 | false | - |
| VM_VMBACKUP_USEDEFAULTRESOURCES | true | false | - |
| VM_VMBACKUP_RESOURCE_LIMIT_MEM | 500Mi | false | - |
@@ -97,7 +97,7 @@
| VM_VMBACKUP_RESOURCE_REQUEST_CPU | 150m | false | - |
| VM_VMBACKUP_LOGLEVEL | INFO | false | - |
| VM_VMAUTHDEFAULT_IMAGE | victoriametrics/vmauth | false | - |
-| VM_VMAUTHDEFAULT_VERSION | v1.93.4 | false | - |
+| VM_VMAUTHDEFAULT_VERSION | v1.93.5 | false | - |
| VM_VMAUTHDEFAULT_CONFIGRELOADIMAGE | quay.io/prometheus-operator/prometheus-config-reloader:v0.68.0 | false | - |
| VM_VMAUTHDEFAULT_PORT | 8427 | false | - |
| VM_VMAUTHDEFAULT_USEDEFAULTRESOURCES | true | false | - |
diff --git a/docs/vmagent.md b/docs/vmagent.md
index 0e918cc381ce..0ab5d9514500 100644
--- a/docs/vmagent.md
+++ b/docs/vmagent.md
@@ -14,7 +14,8 @@ aliases:

`vmagent` is a tiny agent which helps you collect metrics from various sources,
[relabel and filter the collected metrics](#relabeling)
and store them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
-or any other storage systems via Prometheus `remote_write` protocol.
+or any other storage system via the Prometheus `remote_write` protocol
+or via the [VictoriaMetrics `remote_write` protocol](#victoriametrics-remote-write-protocol).

See [Quick Start](#quick-start) for details.

diff --git a/docs/vmalert.md b/docs/vmalert.md
index 03e6f49b9408..b8a3ac4312e5 100644
--- a/docs/vmalert.md
+++ b/docs/vmalert.md
@@ -537,7 +537,7 @@ Alertmanagers.

To avoid recording rules results and alerts state duplication in VictoriaMetrics server
don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
-The recommended value for `-dedup.minScrapeInterval` must be multiple of vmalert's `evaluation_interval`.
+The recommended value for `-dedup.minScrapeInterval` must be a multiple of vmalert's `-evaluationInterval`.
If you observe inconsistent or "jumping" values in series produced by vmalert, try disabling `-datasource.queryTimeAlignment`
command line flag. Because of alignment, two or more vmalert HA pairs will produce results with the same timestamps.
But due of backfilling (data delivered to the datasource with some delay) values of such results may differ,
@@ -789,7 +789,7 @@ may get empty response from the datasource and produce empty recording rules or

Try the following recommendations to reduce the chance of hitting the data delay issue:

-* Always configure group's `evaluationInterval` to be bigger or at least equal to
+* Always configure group's `-evaluationInterval` to be bigger than or at least equal to
  [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution);
* Ensure that `[duration]` value is at least twice bigger than
  [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
diff --git a/docs/vmauth.md b/docs/vmauth.md
index d6e631a7fdbf..65e8d929c5e9 100644
--- a/docs/vmauth.md
+++ b/docs/vmauth.md
@@ -36,6 +36,7 @@ The auth config can be reloaded via the following ways:
  and apply new changes every 5 seconds.

Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
+See how `vmauth` is used in the [docker-compose env](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/README.md#victoriametrics-cluster).

Pass `-help` to `vmauth` in order to see all the supported command-line flags with their descriptions.

diff --git a/docs/vmbackup.md b/docs/vmbackup.md
index 39c4b3e95e9d..bff38459ac19 100644
--- a/docs/vmbackup.md
+++ b/docs/vmbackup.md
@@ -154,20 +154,23 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-

## Advanced usage

-* Obtaining credentials from a file.
-  Add flag `-credsFilePath=/etc/credentials` with the following content:
+### Providing credentials as a file

-    - for s3 (aws, minio or other s3 compatible storages):
+Credentials can be obtained from a file.
+Add the flag `-credsFilePath=/etc/credentials` with the following content:
+
+- for S3 (AWS, MinIO or other S3 compatible storages):
+
    ```console
    [default]
    aws_access_key_id=theaccesskey
    aws_secret_access_key=thesecretaccesskeyvalue
    ```

-    - for gce cloud storage:
-
+- for GCP cloud storage:
+
    ```json
    {
           "type": "service_account",
@@ -182,24 +185,99 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
           "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    }
    ```
-* Obtaining credentials from env variables.
-  - For AWS S3 compatible storages set env variable `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
-    Also you can set env variable `AWS_SHARED_CREDENTIALS_FILE` with path to credentials file.
-  - For GCE cloud storage set env variable `GOOGLE_APPLICATION_CREDENTIALS` with path to credentials file.
-  - For Azure storage either set env variables `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY`, or `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`.
-* Usage with s3 custom url endpoint. It is possible to use `vmbackup` with s3 compatible storages like minio, cloudian, etc.
-  You have to add a custom url endpoint via flag:

+### Providing credentials via env variables

-```console
-  # for minio
-  -customS3Endpoint=http://localhost:9000
+Credentials can be obtained from env variables.
+- For AWS S3 compatible storages set the env variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
+  You can also set the env variable `AWS_SHARED_CREDENTIALS_FILE` with the path to a credentials file.
+- For GCE cloud storage set the env variable `GOOGLE_APPLICATION_CREDENTIALS` with the path to a credentials file.
+- For Azure storage either set the env variables `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY`, or `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`.
+
+Please note that `vmbackup` will use credentials provided by the cloud provider's metadata service [when applicable](https://docs.victoriametrics.com/vmbackup.html#using-cloud-providers-metadata-service).
+
+### Using cloud providers metadata service
+
+`vmbackup` and `vmbackupmanager` will automatically use the cloud provider's metadata service in order to obtain credentials if they are running in a cloud environment
+and credentials are not explicitly provided via flags or env variables.
+
+### Providing credentials in Kubernetes
+
+The simplest way to provide credentials in Kubernetes is to use [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
+and inject them into the pod as environment variables. For example, the following secret can be used for AWS S3 credentials:
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: vmbackup-credentials
+data:
+  access_key: key
+  secret_key: secret
+```
+And then it can be injected into the pod as environment variables:
+```yaml
+...
+env:
+- name: AWS_ACCESS_KEY_ID
+  valueFrom:
+    secretKeyRef:
+      key: access_key
+      name: vmbackup-credentials
+- name: AWS_SECRET_ACCESS_KEY
+  valueFrom:
+    secretKeyRef:
+      key: secret_key
+      name: vmbackup-credentials
+...
+```
-  # for aws gov region
-  -customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
+A more secure way is to use IAM roles to provide tokens for pods instead of managing credentials manually.
+
+For AWS deployments it will be required to configure [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).
+In order to use IAM roles for service accounts with `vmbackup` or `vmbackupmanager` it is required to create a ServiceAccount with an IAM role mapping:
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: monitoring-backups
+  annotations:
+    eks.amazonaws.com/role-arn: arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}
+```
+And [configure the pod to use the service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
+After this `vmbackup` and `vmbackupmanager` will automatically use the IAM role for the service account in order to obtain credentials.
+
+For GCP deployments it will be required to configure [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity).
+In order to use Workload Identity with `vmbackup` or `vmbackupmanager` it is required to create a ServiceAccount with the Workload Identity annotation:
+```yaml
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: monitoring-backups
+  annotations:
+    iam.gke.io/gcp-service-account: {sa_name}@{project_name}.iam.gserviceaccount.com
+```
+And [configure the pod to use the service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
+After this `vmbackup` and `vmbackupmanager` will automatically use Workload Identity for the service account in order to obtain credentials.
+
+### Using custom S3 endpoint
+
+It is possible to use `vmbackup` with S3-compatible storages like MinIO, Cloudian, etc.
+You have to add a custom URL endpoint via the following flag:
+
+- for MinIO
+  ```console
+    -customS3Endpoint=http://localhost:9000
+  ```
+
+- for the AWS GovCloud region
+  ```console
+    -customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
+  ```
+
+### Command-line flags

-* Run `vmbackup -help` in order to see all the available options:
+Run `vmbackup -help` in order to see all the available options:

```console
  -concurrency int
diff --git a/docs/vmbackupmanager.md b/docs/vmbackupmanager.md
index 7f912ff9e4c9..051c8765693e 100644
--- a/docs/vmbackupmanager.md
+++ b/docs/vmbackupmanager.md
@@ -121,6 +121,9 @@ The result on the GCS bucket

latest folder

+Please see [vmbackup docs](https://docs.victoriametrics.com/vmbackup.html#advanced-usage) for more examples of authentication with different
+storage types.
+
## Backup Retention Policy

Backup retention policy is controlled by:
diff --git a/go.mod b/go.mod
index a6ee13e10e78..6d1a47169fda 100644
--- a/go.mod
+++ b/go.mod
@@ -12,7 +12,7 @@ require (
	// like https://github.com/valyala/fasthttp/commit/996610f021ff45fdc98c2ce7884d5fa4e7f9199b
	github.com/VictoriaMetrics/fasthttp v1.2.0
	github.com/VictoriaMetrics/metrics v1.24.0
-	github.com/VictoriaMetrics/metricsql v0.65.0
+	github.com/VictoriaMetrics/metricsql v0.66.0
	github.com/aws/aws-sdk-go-v2 v1.21.0
	github.com/aws/aws-sdk-go-v2/config v1.18.39
	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.11.83
diff --git a/go.sum b/go.sum
index 9bfe01c62cb1..aebec4c96980 100644
--- a/go.sum
+++ b/go.sum
@@ -70,8 +70,8 @@ github.com/VictoriaMetrics/fasthttp v1.2.0 h1:nd9Wng4DlNtaI27WlYh5mGXCJOmee/2c2b
github.com/VictoriaMetrics/fasthttp v1.2.0/go.mod h1:zv5YSmasAoSyv8sBVexfArzFDIGGTN4TfCKAtAw7IfE=
github.com/VictoriaMetrics/metrics v1.24.0 h1:ILavebReOjYctAGY5QU2F9X0MYvkcrG3aEn2RKa1Zkw=
github.com/VictoriaMetrics/metrics v1.24.0/go.mod h1:eFT25kvsTidQFHb6U0oa0rTrDRdz4xTYjpL8+UPohys=
-github.com/VictoriaMetrics/metricsql v0.65.0 h1:+/Oit3QycM8z/NbMHy4KENSUDS5q9QRx8h2x6cvoQOk=
-github.com/VictoriaMetrics/metricsql v0.65.0/go.mod h1:k4UaP/+CjuZslIjd+kCigNG9TQmUqh5v0TP/nMEy90I=
+github.com/VictoriaMetrics/metricsql v0.66.0 h1:2TaBEM7L5L67Ho65FdJVZ/qvjWmC/+f17nujL6dgtmE=
+github.com/VictoriaMetrics/metricsql v0.66.0/go.mod h1:k4UaP/+CjuZslIjd+kCigNG9TQmUqh5v0TP/nMEy90I=
github.com/VividCortex/ewma v1.2.0 h1:f58SaIzcDXrSy3kWaHNvuJgJ3Nmz59Zji6XoJR/q1ow=
github.com/VividCortex/ewma v1.2.0/go.mod h1:nz4BbCtbLyFDeC9SUHbtcT5644juEuWfUAUnGx7j5l4=
github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
diff --git a/lib/backup/common/part.go b/lib/backup/common/part.go
index d7f9aa239bc3..8b9c68e31c0a 100644
--- a/lib/backup/common/part.go
+++ b/lib/backup/common/part.go
@@ -40,9 +40,11 @@ type Part struct {

// key returns a string, which uniquely identifies p.
func (p *Part) key() string {
-	if strings.HasSuffix(p.Path, "/parts.json") {
-		// parts.json file contents changes over time, so it must have an unique key in order
-		// to always copy it during backup, restore and server-side copy.
+	if strings.HasSuffix(p.Path, "/parts.json") ||
+		strings.HasSuffix(p.Path, "/appliedRetention.txt") {
+		// The contents of parts.json and appliedRetention.txt files change over time,
+		// so they must have unique keys in order to be always copied during
+		// backup, restore and server-side copy.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 id := atomic.AddUint64(&uniqueKeyID, 1) return fmt.Sprintf("unique-%016X", id) diff --git a/lib/logstorage/block.go b/lib/logstorage/block.go index 83834c895c4b..7e73c563dd99 100644 --- a/lib/logstorage/block.go +++ b/lib/logstorage/block.go @@ -505,6 +505,7 @@ func (b *block) appendRows(dst *rows) { dst.rows = append(dst.rows, fieldsBuf[fieldsLen:]) } dst.fieldsBuf = fieldsBuf + dst.uniqueFields += len(ccs) + len(cs) } func areSameFieldsInRows(rows [][]Field) bool { diff --git a/lib/logstorage/block_stream_merger.go b/lib/logstorage/block_stream_merger.go index 6137c2406ccc..fc08343774b5 100644 --- a/lib/logstorage/block_stream_merger.go +++ b/lib/logstorage/block_stream_merger.go @@ -5,6 +5,7 @@ import ( "fmt" "strings" "sync" + "time" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" ) @@ -117,6 +118,14 @@ func (bsm *blockStreamMerger) mustInit(bsw *blockStreamWriter, bsrs []*blockStre heap.Init(&bsm.readersHeap) } +var mergeStreamsExceedLogger = logger.WithThrottler("mergeStreamsExceed", 10*time.Second) + +func (bsm *blockStreamMerger) mergeStreamsLimitWarn(bd *blockData) { + attempted := bsm.rows.uniqueFields + len(bd.columnsData) + len(bd.constColumns) + mergeStreamsExceedLogger.Warnf("cannot perform background merge: too many columns for block after merge: %d, max columns: %d; "+ + "check ingestion configuration; see: https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting", attempted, maxColumnsPerBlock) +} + // mustWriteBlock writes bd to bsm func (bsm *blockStreamMerger) mustWriteBlock(bd *blockData, bsw *blockStreamWriter) { bsm.checkNextBlock(bd) @@ -133,6 +142,12 @@ func (bsm *blockStreamMerger) mustWriteBlock(bd *blockData, bsw *blockStreamWrit // Slow path - copy the bd to the curr bd. bsm.bd.copyFrom(bd) } + case !bsm.rows.hasCapacityFor(bd): + // Cannot merge bd with bsm.rows as too many columns will be created. + // Flush bsm.rows and write bd as is. + bsm.mergeStreamsLimitWarn(bd) + bsm.mustFlushRows() + bsw.MustWriteBlockData(bd) case bd.uncompressedSizeBytes >= maxUncompressedBlockSize: // The bd contains the same streamID and it is full, // so it can be written next after the current log entries @@ -199,6 +214,15 @@ func (bsm *blockStreamMerger) mustMergeRows(bd *blockData) { bsm.bd.reset() } + if !bsm.rows.hasCapacityFor(bd) { + // Cannot merge bd with bsm.rows as too many columns will be created. + // Flush bsm.rows and write bd as is. 
+ bsm.mergeStreamsLimitWarn(bd) + bsm.mustFlushRows() + bsm.bsw.MustWriteBlockData(bd) + return + } + // Unmarshal log entries from bd rowsLen := len(bsm.rows.timestamps) bsm.mustUnmarshalRows(bd) @@ -208,6 +232,7 @@ func (bsm *blockStreamMerger) mustMergeRows(bd *blockData) { rows := bsm.rows.rows bsm.rowsTmp.mergeRows(timestamps[:rowsLen], timestamps[rowsLen:], rows[:rowsLen], rows[rowsLen:]) bsm.rows, bsm.rowsTmp = bsm.rowsTmp, bsm.rows + bsm.rows.uniqueFields = bsm.rowsTmp.uniqueFields bsm.rowsTmp.reset() if bsm.uncompressedRowsSizeBytes >= maxUncompressedBlockSize { diff --git a/lib/logstorage/datadb.go b/lib/logstorage/datadb.go index 41f9c31a613b..81d3ff322ede 100644 --- a/lib/logstorage/datadb.go +++ b/lib/logstorage/datadb.go @@ -2,6 +2,7 @@ package logstorage import ( "encoding/json" + "errors" "fmt" "os" "path/filepath" @@ -69,9 +70,6 @@ type datadb struct { // stopCh is used for notifying background workers to stop stopCh chan struct{} - // mergeDoneCond is used for pace-limiting the data ingestion rate - mergeDoneCond *sync.Cond - // inmemoryPartsFlushersCount is the number of currently running in-memory parts flushers // // This variable must be accessed under partsLock. @@ -81,6 +79,9 @@ type datadb struct { // // This variable must be accessed under partsLock. mergeWorkersCount int + + // isReadOnly indicates whether the storage is in read-only mode. + isReadOnly *uint32 } // partWrapper is a wrapper for opened part. @@ -140,7 +141,7 @@ func mustCreateDatadb(path string) { } // mustOpenDatadb opens datadb at the given path with the given flushInterval for in-memory data. -func mustOpenDatadb(pt *partition, path string, flushInterval time.Duration) *datadb { +func mustOpenDatadb(pt *partition, path string, flushInterval time.Duration, isReadOnly *uint32) *datadb { // Remove temporary directories, which may be left after unclean shutdown. fs.MustRemoveTemporaryDirs(path) @@ -172,8 +173,8 @@ func mustOpenDatadb(pt *partition, path string, flushInterval time.Duration) *da path: path, fileParts: pws, stopCh: make(chan struct{}), + isReadOnly: isReadOnly, } - ddb.mergeDoneCond = sync.NewCond(&ddb.partsLock) // Start merge workers in the hope they'll merge the remaining parts ddb.partsLock.Lock() @@ -225,7 +226,10 @@ func (ddb *datadb) flushInmemoryParts() { // There are no in-memory parts, so stop the flusher. return } - ddb.mustMergePartsFinal(partsToFlush) + err := ddb.mergePartsFinal(partsToFlush) + if err != nil { + logger.Panicf("FATAL: cannot flush inmemory parts to disk: %s", err) + } select { case <-ddb.stopCh: @@ -239,6 +243,9 @@ func (ddb *datadb) flushInmemoryParts() { // // This function must be called under locked partsLock. func (ddb *datadb) startMergeWorkerLocked() { + if ddb.IsReadOnly() { + return + } if ddb.mergeWorkersCount >= getMergeWorkersCount() { return } @@ -246,8 +253,11 @@ func (ddb *datadb) startMergeWorkerLocked() { ddb.wg.Add(1) go func() { globalMergeLimitCh <- struct{}{} - ddb.mustMergeExistingParts() + err := ddb.mergeExistingParts() <-globalMergeLimitCh + if err != nil && !errors.Is(err, errReadOnly) { + logger.Panicf("FATAL: background merge failed: %s", err) + } ddb.wg.Done() }() } @@ -267,7 +277,7 @@ func getMergeWorkersCount() int { return n } -func (ddb *datadb) mustMergeExistingParts() { +func (ddb *datadb) mergeExistingParts() error { for !needStop(ddb.stopCh) { maxOutBytes := ddb.availableDiskSpace() @@ -284,7 +294,7 @@ func (ddb *datadb) mustMergeExistingParts() { if len(pws) == 0 { // Nothing to merge at the moment. 
- return + return nil } partsSize := getCompressedSize(pws) @@ -295,9 +305,14 @@ func (ddb *datadb) mustMergeExistingParts() { ddb.releasePartsToMerge(pws) continue } - ddb.mustMergeParts(pws, false) + err := ddb.mergeParts(pws, false) ddb.releaseDiskSpace(partsSize) + if err != nil { + return err + } } + + return nil } // appendNotInMergePartsLocked appends src parts with isInMerge=false to dst and returns the result. @@ -332,17 +347,24 @@ func assertIsInMerge(pws []*partWrapper) { } } -// mustMergeParts merges pws to a single resulting part. +var errReadOnly = errors.New("the storage is in read-only mode") + +// mergeParts merges pws to a single resulting part. // // if isFinal is set, then the resulting part will be saved to disk. // // All the parts inside pws must have isInMerge field set to true. -func (ddb *datadb) mustMergeParts(pws []*partWrapper, isFinal bool) { +func (ddb *datadb) mergeParts(pws []*partWrapper, isFinal bool) error { if len(pws) == 0 { // Nothing to merge. - return + return nil + } + + if ddb.IsReadOnly() { + return errReadOnly } assertIsInMerge(pws) + defer ddb.releasePartsToMerge(pws) startTime := time.Now() @@ -367,7 +389,7 @@ func (ddb *datadb) mustMergeParts(pws []*partWrapper, isFinal bool) { mp.MustStoreToDisk(dstPartPath) pwNew := ddb.openCreatedPart(&mp.ph, pws, nil, dstPartPath) ddb.swapSrcWithDstParts(pws, pwNew, dstPartType) - return + return nil } // Prepare blockStreamReaders for source parts. @@ -414,13 +436,11 @@ func (ddb *datadb) mustMergeParts(pws []*partWrapper, isFinal bool) { fs.MustSyncPath(dstPartPath) } if needStop(stopCh) { - ddb.releasePartsToMerge(pws) - ddb.mergeDoneCond.Broadcast() // Remove incomplete destination part if dstPartType == partFile { fs.MustRemoveAll(dstPartPath) } - return + return nil } // Atomically swap the source parts with the newly created part. @@ -440,7 +460,7 @@ func (ddb *datadb) mustMergeParts(pws []*partWrapper, isFinal bool) { d := time.Since(startTime) if d <= 30*time.Second { - return + return nil } // Log stats for long merges. 
@@ -448,6 +468,7 @@ func (ddb *datadb) mustMergeParts(pws []*partWrapper, isFinal bool) { rowsPerSec := int(float64(srcRowsCount) / durationSecs) logger.Infof("merged (%d parts, %d rows, %d blocks, %d bytes) into (1 part, %d rows, %d blocks, %d bytes) in %.3f seconds at %d rows/sec to %q", len(pws), srcRowsCount, srcBlocksCount, srcSize, dstRowsCount, dstBlocksCount, dstSize, durationSecs, rowsPerSec, dstPartPath) + return nil } func (ddb *datadb) nextMergeIdx() uint64 { @@ -526,11 +547,43 @@ func (ddb *datadb) mustAddRows(lr *LogRows) { if len(ddb.inmemoryParts) > defaultPartsToMerge { ddb.startMergeWorkerLocked() } - for len(ddb.inmemoryParts) > maxInmemoryPartsPerPartition { - // limit the pace for data ingestion if too many inmemory parts are created - ddb.mergeDoneCond.Wait() + needAssistedMerge := ddb.needAssistedMergeForInmemoryPartsLocked() + ddb.partsLock.Unlock() + + if needAssistedMerge { + ddb.assistedMergeForInmemoryParts() + } +} + +func (ddb *datadb) needAssistedMergeForInmemoryPartsLocked() bool { + if ddb.IsReadOnly() { + return false + } + if len(ddb.inmemoryParts) < maxInmemoryPartsPerPartition { + return false + } + n := 0 + for _, pw := range ddb.inmemoryParts { + if pw.isInMerge { + n++ + } } + return n >= defaultPartsToMerge +} + +func (ddb *datadb) assistedMergeForInmemoryParts() { + ddb.partsLock.Lock() + parts := make([]*partWrapper, 0, len(ddb.inmemoryParts)) + parts = appendNotInMergePartsLocked(parts, ddb.inmemoryParts) + pws := appendPartsToMerge(nil, parts, (1<<64)-1) + setInMergeLocked(pws) ddb.partsLock.Unlock() + + err := ddb.mergeParts(pws, false) + if err == nil || errors.Is(err, errReadOnly) { + return + } + logger.Panicf("FATAL: cannot perform assisted merge for in-memory parts: %s", err) } // DatadbStats contains various stats for datadb. @@ -619,7 +672,7 @@ func (ddb *datadb) debugFlush() { // Nothing to do, since all the ingested data is available for search via ddb.inmemoryParts. } -func (ddb *datadb) mustMergePartsFinal(pws []*partWrapper) { +func (ddb *datadb) mergePartsFinal(pws []*partWrapper) error { assertIsInMerge(pws) var pwsChunk []*partWrapper @@ -628,15 +681,20 @@ func (ddb *datadb) mustMergePartsFinal(pws []*partWrapper) { if len(pwsChunk) == 0 { pwsChunk = append(pwsChunk[:0], pws...) } - ddb.mustMergeParts(pwsChunk, true) - partsToRemove := partsToMap(pwsChunk) removedParts := 0 pws, removedParts = removeParts(pws, partsToRemove) if removedParts != len(pwsChunk) { logger.Panicf("BUG: unexpected number of parts removed; got %d; want %d", removedParts, len(pwsChunk)) } + + err := ddb.mergeParts(pwsChunk, true) + if err != nil { + ddb.releasePartsToMerge(pws) + return err + } } + return nil } func partsToMap(pws []*partWrapper) map[*partWrapper]struct{} { @@ -696,8 +754,6 @@ func (ddb *datadb) swapSrcWithDstParts(pws []*partWrapper, pwNew *partWrapper, d atomic.StoreUint32(&pw.mustBeDeleted, 1) pw.decRef() } - - ddb.mergeDoneCond.Broadcast() } func removeParts(pws []*partWrapper, partsToRemove map[*partWrapper]struct{}) ([]*partWrapper, int) { @@ -804,6 +860,10 @@ func (ddb *datadb) releaseDiskSpace(n uint64) { atomic.AddUint64(&reservedDiskSpace, -n) } +func (ddb *datadb) IsReadOnly() bool { + return atomic.LoadUint32(ddb.isReadOnly) == 1 +} + // reservedDiskSpace tracks global reserved disk space for currently executed // background merges across all the partitions. // @@ -828,7 +888,10 @@ func mustCloseDatadb(ddb *datadb) { // flush in-memory data to disk pws := append([]*partWrapper{}, ddb.inmemoryParts...) 
setInMergeLocked(pws) - ddb.mustMergePartsFinal(pws) + err := ddb.mergePartsFinal(pws) + if err != nil { + logger.Fatalf("FATAL: cannot merge inmemory parts: %s", err) + } // There is no need in using ddb.partsLock here, since nobody should acces ddb now. for _, pw := range ddb.inmemoryParts { diff --git a/lib/logstorage/filters_test.go b/lib/logstorage/filters_test.go index cf7d6e7827d0..289492045d72 100644 --- a/lib/logstorage/filters_test.go +++ b/lib/logstorage/filters_test.go @@ -9277,7 +9277,7 @@ func generateRowsFromColumns(s *Storage, tenantID TenantID, columns []column) { timestamp := int64(i) * 1e9 lr.MustAdd(tenantID, timestamp, fields) } - s.MustAddRows(lr) + _ = s.AddRows(lr) PutLogRows(lr) } @@ -9291,6 +9291,6 @@ func generateRowsFromTimestamps(s *Storage, tenantID TenantID, timestamps []int6 }) lr.MustAdd(tenantID, timestamp, fields) } - s.MustAddRows(lr) + _ = s.AddRows(lr) PutLogRows(lr) } diff --git a/lib/logstorage/log_rows.go b/lib/logstorage/log_rows.go index ce759b85a872..789dab6a344f 100644 --- a/lib/logstorage/log_rows.go +++ b/lib/logstorage/log_rows.go @@ -7,7 +7,7 @@ import ( "github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil" ) -// LogRows holds a set of rows needed for Storage.MustAddRows +// LogRows holds a set of rows needed for Storage.AddRows // // LogRows must be obtained via GetLogRows() type LogRows struct { diff --git a/lib/logstorage/partition.go b/lib/logstorage/partition.go index 64465de209df..df609c3aa367 100644 --- a/lib/logstorage/partition.go +++ b/lib/logstorage/partition.go @@ -77,7 +77,7 @@ func mustOpenPartition(s *Storage, path string) *partition { // Open datadb datadbPath := filepath.Join(path, datadbDirname) - pt.ddb = mustOpenDatadb(pt, datadbPath, s.flushInterval) + pt.ddb = mustOpenDatadb(pt, datadbPath, s.flushInterval, &s.isReadOnly) return pt } diff --git a/lib/logstorage/rows.go b/lib/logstorage/rows.go index 76516bc8bac3..efb3bf9959ce 100644 --- a/lib/logstorage/rows.go +++ b/lib/logstorage/rows.go @@ -65,6 +65,10 @@ func (f *Field) unmarshal(src []byte) ([]byte, error) { type rows struct { fieldsBuf []Field + // uniqueFields is the maximum estimated number of unique fields which are currently stored in fieldsBuf. + // it is used to perform worst case estimation when merging rows. + uniqueFields int + timestamps []int64 rows [][]Field @@ -121,3 +125,9 @@ func (rs *rows) mergeRows(timestampsA, timestampsB []int64, fieldsA, fieldsB [][ rs.appendRows(timestampsA, fieldsA) } } + +// hasCapacityFor returns true if merging bd with rs won't create too many columns +// for creating a new block. +func (rs *rows) hasCapacityFor(bd *blockData) bool { + return rs.uniqueFields+len(bd.columnsData)+len(bd.constColumns) < maxColumnsPerBlock +} diff --git a/lib/logstorage/storage.go b/lib/logstorage/storage.go index 9d840fb5bc8e..341ec46c1879 100644 --- a/lib/logstorage/storage.go +++ b/lib/logstorage/storage.go @@ -26,6 +26,9 @@ type StorageStats struct { PartitionsCount uint64 PartitionStats + + // IsReadOnly indicates whether the storage is read-only. + IsReadOnly bool } // Reset resets s. @@ -58,6 +61,9 @@ type StorageConfig struct { // // This can be useful for debugging of data ingestion. LogIngestedRows bool + + // MinFreeDiskSpaceBytes is the minimum free disk space at -storageDataPath after which the storage stops accepting new data + MinFreeDiskSpaceBytes int64 } // Storage is the storage for log entries. 
@@ -126,6 +132,10 @@ type Storage struct {
	//
	// It reduces the load on persistent storage during querying by _stream:{...} filter.
	streamFilterCache *workingsetcache.Cache
+
+	isReadOnly uint32
+
+	freeDiskSpaceWatcherWG sync.WaitGroup
}

type partitionWrapper struct {
@@ -288,6 +298,7 @@ func MustOpenStorage(path string, cfg *StorageConfig) *Storage {
	s.partitions = ptws

	s.runRetentionWatcher()
+	s.startFreeDiskSpaceWatcher(uint64(cfg.MinFreeDiskSpaceBytes))
	return s
}
@@ -357,6 +368,7 @@ func (s *Storage) MustClose() {
	// Stop background workers
	close(s.stopCh)
	s.wg.Wait()
+	s.freeDiskSpaceWatcherWG.Wait()

	// Close partitions
	for _, pw := range s.partitions {
@@ -389,8 +401,12 @@ func (s *Storage) MustClose() {
	s.path = ""
}

-// MustAddRows adds lr to s.
-func (s *Storage) MustAddRows(lr *LogRows) {
+// AddRows adds lr to s.
+func (s *Storage) AddRows(lr *LogRows) error {
+	if s.IsReadOnly() {
+		return errReadOnly
+	}
+
	// Fast path - try adding all the rows to the hot partition
	s.partitionsLock.Lock()
	ptwHot := s.ptwHot
@@ -403,7 +419,7 @@
		if ptwHot.canAddAllRows(lr) {
			ptwHot.pt.mustAddRows(lr)
			ptwHot.decRef()
-			return
+			return nil
		}
		ptwHot.decRef()
	}
@@ -447,6 +463,7 @@
		ptw.decRef()
		PutLogRows(lrPart)
	}
+	return nil
}

var tooSmallTimestampLogger = logger.WithThrottler("too_small_timestamp", 5*time.Second)
@@ -515,6 +532,44 @@ func (s *Storage) UpdateStats(ss *StorageStats) {
		ptw.pt.updateStats(&ss.PartitionStats)
	}
	s.partitionsLock.Unlock()
+	ss.IsReadOnly = s.IsReadOnly()
+}
+
+// IsReadOnly returns true if the storage is in read-only mode.
+func (s *Storage) IsReadOnly() bool {
+	return atomic.LoadUint32(&s.isReadOnly) == 1
+}
+
+func (s *Storage) startFreeDiskSpaceWatcher(freeDiskSpaceLimitBytes uint64) {
+	f := func() {
+		freeSpaceBytes := fs.MustGetFreeSpace(s.path)
+		if freeSpaceBytes < freeDiskSpaceLimitBytes {
+			// Switch the storage to read-only mode if there is not enough free space left at s.path
+			logger.Warnf("switching the storage at %s to read-only mode, since it has less than -storage.minFreeDiskSpaceBytes=%d of free space: %d bytes left",
+				s.path, freeDiskSpaceLimitBytes, freeSpaceBytes)
+			atomic.StoreUint32(&s.isReadOnly, 1)
+			return
+		}
+		if atomic.CompareAndSwapUint32(&s.isReadOnly, 1, 0) {
+			logger.Warnf("enabling writing to the storage at %s, since it has more than -storage.minFreeDiskSpaceBytes=%d of free space: %d bytes left",
+				s.path, freeDiskSpaceLimitBytes, freeSpaceBytes)
+		}
+	}
+	f()
+	s.freeDiskSpaceWatcherWG.Add(1)
+	go func() {
+		defer s.freeDiskSpaceWatcherWG.Done()
+		ticker := time.NewTicker(time.Second)
+		defer ticker.Stop()
+		for {
+			select {
+			case <-s.stopCh:
+				return
+			case <-ticker.C:
+				f()
+			}
+		}
+	}()
}

func (s *Storage) debugFlush() {
diff --git a/lib/logstorage/storage_search_test.go b/lib/logstorage/storage_search_test.go
index 63404838ceb5..d61035a5dcf0 100644
--- a/lib/logstorage/storage_search_test.go
+++ b/lib/logstorage/storage_search_test.go
@@ -70,7 +70,7 @@ func TestStorageRunQuery(t *testing.T) {
		})
		lr.MustAdd(tenantID, timestamp, fields)
	}
-	s.MustAddRows(lr)
+	_ = s.AddRows(lr)
	PutLogRows(lr)
}
}
@@ -366,7 +366,7 @@ func TestStorageSearch(t *testing.T) {
		})
		lr.MustAdd(tenantID, timestamp, fields)
	}
-	s.MustAddRows(lr)
+	_ = s.AddRows(lr)
	PutLogRows(lr)
}
}
diff --git a/lib/logstorage/storage_test.go b/lib/logstorage/storage_test.go
index 193179bb1784..9951a6a4c58e 100644
--- a/lib/logstorage/storage_test.go
+++
b/lib/logstorage/storage_test.go @@ -32,7 +32,7 @@ func TestStorageMustAddRows(t *testing.T) { lr := newTestLogRows(1, 1, 0) lr.timestamps[0] = time.Now().UTC().UnixNano() totalRowsCount += uint64(len(lr.timestamps)) - s.MustAddRows(lr) + _ = s.AddRows(lr) sStats.Reset() s.UpdateStats(&sStats) if n := sStats.RowsCount(); n != totalRowsCount { @@ -56,7 +56,7 @@ func TestStorageMustAddRows(t *testing.T) { lr.timestamps[i] = time.Now().UTC().UnixNano() } totalRowsCount += uint64(len(lr.timestamps)) - s.MustAddRows(lr) + _ = s.AddRows(lr) sStats.Reset() s.UpdateStats(&sStats) if n := sStats.RowsCount(); n != totalRowsCount { @@ -80,7 +80,7 @@ func TestStorageMustAddRows(t *testing.T) { now += nsecPerDay } totalRowsCount += uint64(len(lr.timestamps)) - s.MustAddRows(lr) + _ = s.AddRows(lr) sStats.Reset() s.UpdateStats(&sStats) if n := sStats.RowsCount(); n != totalRowsCount { diff --git a/lib/mergeset/table.go b/lib/mergeset/table.go index bf7274a8eed3..a5420df3c347 100644 --- a/lib/mergeset/table.go +++ b/lib/mergeset/table.go @@ -773,45 +773,41 @@ func needAssistedMerge(pws []*partWrapper, maxParts int) bool { } func (tb *Table) assistedMergeForInmemoryParts() { - for { - tb.partsLock.Lock() - needMerge := needAssistedMerge(tb.inmemoryParts, maxInmemoryParts) - tb.partsLock.Unlock() - if !needMerge { - return - } + tb.partsLock.Lock() + needMerge := needAssistedMerge(tb.inmemoryParts, maxInmemoryParts) + tb.partsLock.Unlock() + if !needMerge { + return + } - atomic.AddUint64(&tb.inmemoryAssistedMerges, 1) - err := tb.mergeInmemoryParts() - if err == nil { - continue - } - if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) { - return - } - logger.Panicf("FATAL: cannot assist with merging inmemory parts: %s", err) + atomic.AddUint64(&tb.inmemoryAssistedMerges, 1) + err := tb.mergeInmemoryParts() + if err == nil { + return + } + if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) { + return } + logger.Panicf("FATAL: cannot assist with merging inmemory parts: %s", err) } func (tb *Table) assistedMergeForFileParts() { - for { - tb.partsLock.Lock() - needMerge := needAssistedMerge(tb.fileParts, maxFileParts) - tb.partsLock.Unlock() - if !needMerge { - return - } + tb.partsLock.Lock() + needMerge := needAssistedMerge(tb.fileParts, maxFileParts) + tb.partsLock.Unlock() + if !needMerge { + return + } - atomic.AddUint64(&tb.fileAssistedMerges, 1) - err := tb.mergeExistingParts(false) - if err == nil { - continue - } - if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) || errors.Is(err, errReadOnlyMode) { - return - } - logger.Panicf("FATAL: cannot assist with merging file parts: %s", err) + atomic.AddUint64(&tb.fileAssistedMerges, 1) + err := tb.mergeExistingParts(false) + if err == nil { + return } + if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) || errors.Is(err, errReadOnlyMode) { + return + } + logger.Panicf("FATAL: cannot assist with merging file parts: %s", err) } func getNotInMergePartsCount(pws []*partWrapper) int { @@ -1022,6 +1018,14 @@ func SetFinalMergeDelay(delay time.Duration) { var errNothingToMerge = fmt.Errorf("nothing to merge") +func assertIsInMerge(pws []*partWrapper) { + for _, pw := range pws { + if !pw.isInMerge { + logger.Panicf("BUG: partWrapper.isInMerge unexpectedly set to false") + } + } +} + func (tb *Table) releasePartsToMerge(pws []*partWrapper) { tb.partsLock.Lock() for _, pw := range pws { @@ -1040,12 +1044,16 @@ func (tb *Table) releasePartsToMerge(pws []*partWrapper) { 
// If isFinal is set, then the resulting part will be stored to disk. // // All the parts inside pws must have isInMerge field set to true. +// The isInMerge field inside pws parts is set to false before returning from the function. func (tb *Table) mergeParts(pws []*partWrapper, stopCh <-chan struct{}, isFinal bool) error { if len(pws) == 0 { // Nothing to merge. return errNothingToMerge } + assertIsInMerge(pws) + defer tb.releasePartsToMerge(pws) + startTime := time.Now() // Initialize destination paths. @@ -1095,7 +1103,6 @@ func (tb *Table) mergeParts(pws []*partWrapper, stopCh <-chan struct{}, isFinal putBlockStreamReader(bsr) } if err != nil { - tb.releasePartsToMerge(pws) return err } if mpNew != nil { diff --git a/lib/promscrape/discovery/kubernetes/api_watcher.go b/lib/promscrape/discovery/kubernetes/api_watcher.go index 3d4f0c8069bb..b677bd241553 100644 --- a/lib/promscrape/discovery/kubernetes/api_watcher.go +++ b/lib/promscrape/discovery/kubernetes/api_watcher.go @@ -771,7 +771,7 @@ func (uw *urlWatcher) watchForUpdates() { err = uw.readObjectUpdateStream(resp.Body) _ = resp.Body.Close() if err != nil { - if !errors.Is(err, io.EOF) { + if !(errors.Is(err, io.EOF) || errors.Is(err, context.Canceled)) { logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err) uw.resourceVersion = "" } diff --git a/lib/storage/index_db.go b/lib/storage/index_db.go index dd4c54b7508c..62777d7fad07 100644 --- a/lib/storage/index_db.go +++ b/lib/storage/index_db.go @@ -668,7 +668,8 @@ func (is *indexSearch) searchLabelNamesWithFiltersOnDate(qt *querytracer.Tracer, // This would help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978 metricIDs := filter.AppendTo(nil) qt.Printf("sort %d metricIDs", len(metricIDs)) - return is.getLabelNamesForMetricIDs(qt, metricIDs, lns, maxLabelNames) + is.getLabelNamesForMetricIDs(qt, metricIDs, lns, maxLabelNames) + return nil } var prevLabelName []byte ts := &is.ts @@ -732,39 +733,34 @@ func (is *indexSearch) searchLabelNamesWithFiltersOnDate(qt *querytracer.Tracer, return nil } -func (is *indexSearch) getLabelNamesForMetricIDs(qt *querytracer.Tracer, metricIDs []uint64, lns map[string]struct{}, maxLabelNames int) error { +func (is *indexSearch) getLabelNamesForMetricIDs(qt *querytracer.Tracer, metricIDs []uint64, lns map[string]struct{}, maxLabelNames int) { lns["__name__"] = struct{}{} var mn MetricName foundLabelNames := 0 var buf []byte for _, metricID := range metricIDs { - var err error - buf, err = is.searchMetricNameWithCache(buf[:0], metricID) - if err != nil { - if err == io.EOF { - // It is likely the metricID->metricName entry didn't propagate to inverted index yet. - // Skip this metricID for now. - continue - } - return fmt.Errorf("cannot find metricName by metricID %d: %w", metricID, err) + var ok bool + buf, ok = is.searchMetricNameWithCache(buf[:0], metricID) + if !ok { + // It is likely the metricID->metricName entry didn't propagate to inverted index yet. + // Skip this metricID for now. 
diff --git a/lib/promscrape/discovery/kubernetes/api_watcher.go b/lib/promscrape/discovery/kubernetes/api_watcher.go
index 3d4f0c8069bb..b677bd241553 100644
--- a/lib/promscrape/discovery/kubernetes/api_watcher.go
+++ b/lib/promscrape/discovery/kubernetes/api_watcher.go
@@ -771,7 +771,7 @@ func (uw *urlWatcher) watchForUpdates() {
        err = uw.readObjectUpdateStream(resp.Body)
        _ = resp.Body.Close()
        if err != nil {
-            if !errors.Is(err, io.EOF) {
+            if !(errors.Is(err, io.EOF) || errors.Is(err, context.Canceled)) {
                logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err)
                uw.resourceVersion = ""
            }
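Previously only `io.EOF` counted as a clean end of the watch stream; a canceled context during shutdown was logged as an error and reset `resourceVersion`, forcing a full re-list. The updated check treats both as benign. A self-contained sketch of that classification (the `isBenignStreamEnd` helper is illustrative, not a function from the real watcher):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// isBenignStreamEnd mirrors the updated check in watchForUpdates: both a
// clean end-of-stream and a canceled context mean the watch was shut down
// on purpose rather than failing.
func isBenignStreamEnd(err error) bool {
	return errors.Is(err, io.EOF) || errors.Is(err, context.Canceled)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	for _, err := range []error{io.EOF, ctx.Err(), errors.New("connection reset")} {
		if isBenignStreamEnd(err) {
			fmt.Printf("%v: shut down quietly\n", err)
		} else {
			fmt.Printf("%v: log it and reset resourceVersion\n", err)
		}
	}
}
```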
diff --git a/lib/storage/index_db.go b/lib/storage/index_db.go
index dd4c54b7508c..62777d7fad07 100644
--- a/lib/storage/index_db.go
+++ b/lib/storage/index_db.go
@@ -668,7 +668,8 @@ func (is *indexSearch) searchLabelNamesWithFiltersOnDate(qt *querytracer.Tracer,
        // This would help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
        metricIDs := filter.AppendTo(nil)
        qt.Printf("sort %d metricIDs", len(metricIDs))
-        return is.getLabelNamesForMetricIDs(qt, metricIDs, lns, maxLabelNames)
+        is.getLabelNamesForMetricIDs(qt, metricIDs, lns, maxLabelNames)
+        return nil
    }
    var prevLabelName []byte
    ts := &is.ts
@@ -732,39 +733,34 @@ func (is *indexSearch) searchLabelNamesWithFiltersOnDate(qt *querytracer.Tracer,
    return nil
}

-func (is *indexSearch) getLabelNamesForMetricIDs(qt *querytracer.Tracer, metricIDs []uint64, lns map[string]struct{}, maxLabelNames int) error {
+func (is *indexSearch) getLabelNamesForMetricIDs(qt *querytracer.Tracer, metricIDs []uint64, lns map[string]struct{}, maxLabelNames int) {
    lns["__name__"] = struct{}{}
    var mn MetricName
    foundLabelNames := 0
    var buf []byte
    for _, metricID := range metricIDs {
-        var err error
-        buf, err = is.searchMetricNameWithCache(buf[:0], metricID)
-        if err != nil {
-            if err == io.EOF {
-                // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
-                // Skip this metricID for now.
-                continue
-            }
-            return fmt.Errorf("cannot find metricName by metricID %d: %w", metricID, err)
+        var ok bool
+        buf, ok = is.searchMetricNameWithCache(buf[:0], metricID)
+        if !ok {
+            // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
+            // Skip this metricID for now.
+            continue
        }
        if err := mn.Unmarshal(buf); err != nil {
-            return fmt.Errorf("cannot unmarshal metricName %q: %w", buf, err)
+            logger.Panicf("FATAL: cannot unmarshal metricName %q: %s", buf, err)
        }
        for _, tag := range mn.Tags {
-            _, ok := lns[string(tag.Key)]
-            if !ok {
+            if _, ok := lns[string(tag.Key)]; !ok {
                foundLabelNames++
                lns[string(tag.Key)] = struct{}{}
                if len(lns) >= maxLabelNames {
                    qt.Printf("hit the limit on the number of unique label names: %d", maxLabelNames)
-                    return nil
+                    return
                }
            }
        }
    }
    qt.Printf("get %d distinct label names from %d metricIDs", foundLabelNames, len(metricIDs))
-    return nil
}

// SearchLabelValuesWithFiltersOnTimeRange returns label values for the given labelName, tfss and tr.
@@ -868,7 +864,8 @@ func (is *indexSearch) searchLabelValuesWithFiltersOnDate(qt *querytracer.Tracer
        // This would help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2978
        metricIDs := filter.AppendTo(nil)
        qt.Printf("sort %d metricIDs", len(metricIDs))
-        return is.getLabelValuesForMetricIDs(qt, lvs, labelName, metricIDs, maxLabelValues)
+        is.getLabelValuesForMetricIDs(qt, lvs, labelName, metricIDs, maxLabelValues)
+        return nil
    }
    if labelName == "__name__" {
        // __name__ label is encoded as empty string in indexdb.
@@ -927,7 +924,7 @@ func (is *indexSearch) searchLabelValuesWithFiltersOnDate(qt *querytracer.Tracer
    return nil
}

-func (is *indexSearch) getLabelValuesForMetricIDs(qt *querytracer.Tracer, lvs map[string]struct{}, labelName string, metricIDs []uint64, maxLabelValues int) error {
+func (is *indexSearch) getLabelValuesForMetricIDs(qt *querytracer.Tracer, lvs map[string]struct{}, labelName string, metricIDs []uint64, maxLabelValues int) {
    if labelName == "" {
        labelName = "__name__"
    }
@@ -935,32 +932,27 @@ func (is *indexSearch) getLabelValuesForMetricIDs(qt *querytracer.Tracer, lvs ma
    foundLabelValues := 0
    var buf []byte
    for _, metricID := range metricIDs {
-        var err error
-        buf, err = is.searchMetricNameWithCache(buf[:0], metricID)
-        if err != nil {
-            if err == io.EOF {
-                // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
-                // Skip this metricID for now.
-                continue
-            }
-            return fmt.Errorf("cannot find metricName by metricID %d: %w", metricID, err)
+        var ok bool
+        buf, ok = is.searchMetricNameWithCache(buf[:0], metricID)
+        if !ok {
+            // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
+            // Skip this metricID for now.
+            continue
        }
        if err := mn.Unmarshal(buf); err != nil {
-            return fmt.Errorf("cannot unmarshal metricName %q: %w", buf, err)
+            logger.Panicf("FATAL: cannot unmarshal metricName %q: %s", buf, err)
        }
        tagValue := mn.GetTagValue(labelName)
-        _, ok := lvs[string(tagValue)]
-        if !ok {
+        if _, ok := lvs[string(tagValue)]; !ok {
            foundLabelValues++
            lvs[string(tagValue)] = struct{}{}
            if len(lvs) >= maxLabelValues {
                qt.Printf("hit the limit on the number of unique label values for label %q: %d", labelName, maxLabelValues)
-                return nil
+                return
            }
        }
    }
    qt.Printf("get %d distinct values for label %q from %d metricIDs", foundLabelValues, labelName, len(metricIDs))
-    return nil
}

// SearchTagValueSuffixes returns all the tag value suffixes for the given tagKey and tagValuePrefix on the given tr.
@@ -1442,38 +1434,35 @@ func (th *topHeap) Pop() interface{} {

// searchMetricNameWithCache appends metric name for the given metricID to dst
// and returns the result.
-func (db *indexDB) searchMetricNameWithCache(dst []byte, metricID uint64) ([]byte, error) {
+func (db *indexDB) searchMetricNameWithCache(dst []byte, metricID uint64) ([]byte, bool) {
    metricName := db.getMetricNameFromCache(dst, metricID)
    if len(metricName) > len(dst) {
-        return metricName, nil
+        return metricName, true
    }
    is := db.getIndexSearch(noDeadline)
-    var err error
-    dst, err = is.searchMetricName(dst, metricID)
+    var ok bool
+    dst, ok = is.searchMetricName(dst, metricID)
    db.putIndexSearch(is)
-    if err == nil {
+    if ok {
        // There is no need in verifying whether the given metricID is deleted,
        // since the filtering must be performed before calling this func.
        db.putMetricNameToCache(metricID, dst)
-        return dst, nil
-    }
-    if err != io.EOF {
-        return dst, err
+        return dst, true
    }

    // Try searching in the external indexDB.
    if db.doExtDB(func(extDB *indexDB) {
        is := extDB.getIndexSearch(noDeadline)
-        dst, err = is.searchMetricName(dst, metricID)
+        dst, ok = is.searchMetricName(dst, metricID)
        extDB.putIndexSearch(is)
-        if err == nil {
+        if ok {
            // There is no need in verifying whether the given metricID is deleted,
            // since the filtering must be performed before calling this func.
            extDB.putMetricNameToCache(metricID, dst)
        }
-    }) {
-        return dst, err
+    }) && ok {
+        return dst, true
    }

    // Cannot find MetricName for the given metricID. This may be the case
@@ -1484,7 +1473,7 @@ func (db *indexDB) searchMetricNameWithCache(dst []byte, metricID uint64) ([]byt
    // Mark the metricID as deleted, so it will be created again when new data point
    // for the given time series will arrive.
    db.deleteMetricIDs([]uint64{metricID})
-    return dst, io.EOF
+    return dst, false
}

// DeleteTSIDs marks as deleted all the TSIDs matching the given tfss.
@@ -1820,36 +1809,36 @@ func (is *indexSearch) getTSIDByMetricNameNoExtDB(dst *TSID, metricName []byte,
    return false
}

-func (is *indexSearch) searchMetricNameWithCache(dst []byte, metricID uint64) ([]byte, error) {
+func (is *indexSearch) searchMetricNameWithCache(dst []byte, metricID uint64) ([]byte, bool) {
    metricName := is.db.getMetricNameFromCache(dst, metricID)
    if len(metricName) > len(dst) {
-        return metricName, nil
+        return metricName, true
    }
-    var err error
-    dst, err = is.searchMetricName(dst, metricID)
-    if err == nil {
+    var ok bool
+    dst, ok = is.searchMetricName(dst, metricID)
+    if ok {
        // There is no need in verifying whether the given metricID is deleted,
        // since the filtering must be performed before calling this func.
        is.db.putMetricNameToCache(metricID, dst)
-        return dst, nil
+        return dst, true
    }
-    return dst, err
+    return dst, false
}

-func (is *indexSearch) searchMetricName(dst []byte, metricID uint64) ([]byte, error) {
+func (is *indexSearch) searchMetricName(dst []byte, metricID uint64) ([]byte, bool) {
    ts := &is.ts
    kb := &is.kb
    kb.B = is.marshalCommonPrefix(kb.B[:0], nsPrefixMetricIDToMetricName)
    kb.B = encoding.MarshalUint64(kb.B, metricID)
    if err := ts.FirstItemWithPrefix(kb.B); err != nil {
        if err == io.EOF {
-            return dst, err
+            return dst, false
        }
-        return dst, fmt.Errorf("error when searching metricName by metricID; searchPrefix %q: %w", kb.B, err)
+        logger.Panicf("FATAL: error when searching metricName by metricID; searchPrefix %q: %s", kb.B, err)
    }
    v := ts.Item[len(kb.B):]
    dst = append(dst, v...)
-    return dst, nil
+    return dst, true
}

func (is *indexSearch) containsTimeRange(tr TimeRange) (bool, error) {
@@ -1928,18 +1917,15 @@ func (is *indexSearch) updateMetricIDsByMetricNameMatch(qt *querytracer.Tracer,
                return err
            }
        }
-        var err error
-        metricName.B, err = is.searchMetricNameWithCache(metricName.B[:0], metricID)
-        if err != nil {
-            if err == io.EOF {
-                // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
-                // Skip this metricID for now.
-                continue
-            }
-            return fmt.Errorf("cannot find metricName by metricID %d: %w", metricID, err)
+        var ok bool
+        metricName.B, ok = is.searchMetricNameWithCache(metricName.B[:0], metricID)
+        if !ok {
+            // It is likely the metricID->metricName entry didn't propagate to inverted index yet.
+            // Skip this metricID for now.
+            continue
        }
        if err := mn.Unmarshal(metricName.B); err != nil {
-            return fmt.Errorf("cannot unmarshal metricName %q: %w", metricName.B, err)
+            logger.Panicf("FATAL: cannot unmarshal metricName %q: %s", metricName.B, err)
        }

        // Match the mn against tfs.
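The thread running through the `index_db.go` hunks is an API migration: `searchMetricName` and `searchMetricNameWithCache` used `io.EOF` to mean "not found" and an `error` for everything else; they now return a plain `bool`, while genuinely unrecoverable states (an unmarshalable metric name, an unexpected index error) terminate via `logger.Panicf`. That collapses every call site's three-way branch into a simple skip. A minimal sketch of the caller-side effect, with a hypothetical lookup function standing in for the real one:

```go
package main

import "fmt"

// lookupMetricName is a hypothetical stand-in for searchMetricNameWithCache
// after the migration: a missing entry is an ordinary, expected outcome
// reported via ok=false, not an error value.
func lookupMetricName(dst []byte, metricID uint64) ([]byte, bool) {
	names := map[uint64]string{1: "http_requests_total"}
	name, ok := names[metricID]
	if !ok {
		// Likely the metricID->metricName entry didn't propagate to the
		// inverted index yet; the caller is expected to skip it.
		return dst, false
	}
	return append(dst, name...), true
}

func main() {
	var buf []byte
	for _, metricID := range []uint64{1, 2} {
		var ok bool
		buf, ok = lookupMetricName(buf[:0], metricID)
		if !ok {
			// Old code: if err == io.EOF { continue } else { return err }.
			// New code: just skip — there is no error branch left.
			continue
		}
		fmt.Printf("metricID %d -> %s\n", metricID, buf)
	}
}
```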
diff --git a/lib/storage/index_db_test.go b/lib/storage/index_db_test.go
index 28c01b72eb0a..16f5f2dec640 100644
--- a/lib/storage/index_db_test.go
+++ b/lib/storage/index_db_test.go
@@ -3,7 +3,6 @@ package storage
import (
    "bytes"
    "fmt"
-    "io"
    "math/rand"
    "os"
    "reflect"
@@ -655,19 +654,19 @@ func testIndexDBCheckTSIDByName(db *indexDB, mns []MetricName, tsids []TSID, isC
        }

        // Search for metric name for the given metricID.
-        var err error
-        metricNameCopy, err = db.searchMetricNameWithCache(metricNameCopy[:0], genTSID.TSID.MetricID)
-        if err != nil {
-            return fmt.Errorf("error in searchMetricNameWithCache for metricID=%d; i=%d: %w", genTSID.TSID.MetricID, i, err)
+        var ok bool
+        metricNameCopy, ok = db.searchMetricNameWithCache(metricNameCopy[:0], genTSID.TSID.MetricID)
+        if !ok {
+            return fmt.Errorf("cannot find metricName for metricID=%d; i=%d", genTSID.TSID.MetricID, i)
        }
        if !bytes.Equal(metricName, metricNameCopy) {
            return fmt.Errorf("unexpected mn for metricID=%d;\ngot\n%q\nwant\n%q", genTSID.TSID.MetricID, metricNameCopy, metricName)
        }

        // Try searching metric name for non-existent MetricID.
-        buf, err := db.searchMetricNameWithCache(nil, 1)
-        if err != io.EOF {
-            return fmt.Errorf("expecting io.EOF error when searching for non-existing metricID; got %v", err)
+        buf, found := db.searchMetricNameWithCache(nil, 1)
+        if found {
+            return fmt.Errorf("unexpected metricName found for non-existing metricID; got %X", buf)
        }
        if len(buf) > 0 {
            return fmt.Errorf("expecting empty buf when searching for non-existent metricID; got %X", buf)
diff --git a/lib/storage/partition.go b/lib/storage/partition.go
index ddad2b67d0d2..ca91d71e3d60 100644
--- a/lib/storage/partition.go
+++ b/lib/storage/partition.go
@@ -119,8 +119,6 @@ type partition struct {
    inmemoryAssistedMerges uint64
    smallAssistedMerges    uint64

-    mergeNeedFreeDiskSpace uint64
-
    mergeIdx uint64

    smallPartsPath string
@@ -354,8 +352,6 @@ type partitionMetrics struct {
    InmemoryAssistedMerges uint64
    SmallAssistedMerges    uint64
-
-    MergeNeedFreeDiskSpace uint64
}

// TotalRowsCount returns total number of rows in tm.
@@ -421,8 +417,6 @@ func (pt *partition) UpdateMetrics(m *partitionMetrics) {
    m.InmemoryAssistedMerges += atomic.LoadUint64(&pt.inmemoryAssistedMerges)
    m.SmallAssistedMerges += atomic.LoadUint64(&pt.smallAssistedMerges)
-
-    m.MergeNeedFreeDiskSpace += atomic.LoadUint64(&pt.mergeNeedFreeDiskSpace)
}

// AddRows adds the given rows to the partition pt.
@@ -640,45 +634,41 @@ func needAssistedMerge(pws []*partWrapper, maxParts int) bool {
}

func (pt *partition) assistedMergeForInmemoryParts() {
-    for {
-        pt.partsLock.Lock()
-        needMerge := needAssistedMerge(pt.inmemoryParts, maxInmemoryPartsPerPartition)
-        pt.partsLock.Unlock()
-        if !needMerge {
-            return
-        }
+    pt.partsLock.Lock()
+    needMerge := needAssistedMerge(pt.inmemoryParts, maxInmemoryPartsPerPartition)
+    pt.partsLock.Unlock()
+    if !needMerge {
+        return
+    }

-        atomic.AddUint64(&pt.inmemoryAssistedMerges, 1)
-        err := pt.mergeInmemoryParts()
-        if err == nil {
-            continue
-        }
-        if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) {
-            return
-        }
-        logger.Panicf("FATAL: cannot merge inmemory parts: %s", err)
+    atomic.AddUint64(&pt.inmemoryAssistedMerges, 1)
+    err := pt.mergeInmemoryParts()
+    if err == nil {
+        return
    }
+    if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) {
+        return
+    }
+    logger.Panicf("FATAL: cannot merge inmemory parts: %s", err)
}

func (pt *partition) assistedMergeForSmallParts() {
-    for {
-        pt.partsLock.Lock()
-        needMerge := needAssistedMerge(pt.smallParts, maxSmallPartsPerPartition)
-        pt.partsLock.Unlock()
-        if !needMerge {
-            return
-        }
+    pt.partsLock.Lock()
+    needMerge := needAssistedMerge(pt.smallParts, maxSmallPartsPerPartition)
+    pt.partsLock.Unlock()
+    if !needMerge {
+        return
+    }

-        atomic.AddUint64(&pt.smallAssistedMerges, 1)
-        err := pt.mergeExistingParts(false)
-        if err == nil {
-            continue
-        }
-        if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) || errors.Is(err, errReadOnlyMode) {
-            return
-        }
-        logger.Panicf("FATAL: cannot merge small parts: %s", err)
+    atomic.AddUint64(&pt.smallAssistedMerges, 1)
+    err := pt.mergeExistingParts(false)
+    if err == nil {
+        return
    }
+    if errors.Is(err, errNothingToMerge) || errors.Is(err, errForciblyStopped) || errors.Is(err, errReadOnlyMode) {
+        return
+    }
+    logger.Panicf("FATAL: cannot merge small parts: %s", err)
}

func getNotInMergePartsCount(pws []*partWrapper) int {
@@ -1119,7 +1109,9 @@ func (pt *partition) getMaxSmallPartSize() uint64 {
}

func (pt *partition) getMaxBigPartSize() uint64 {
-    workersCount := getDefaultMergeConcurrency(4)
+    // Always use 4 workers for big merges due to historical reasons.
+    // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830
+    workersCount := 4
    return getMaxOutBytes(pt.bigPartsPath, workersCount)
}

@@ -1146,10 +1138,9 @@ func (pt *partition) mergeInmemoryParts() error {
    maxOutBytes := pt.getMaxBigPartSize()

    pt.partsLock.Lock()
-    pws, needFreeSpace := getPartsToMerge(pt.inmemoryParts, maxOutBytes, false)
+    pws := getPartsToMerge(pt.inmemoryParts, maxOutBytes, false)
    pt.partsLock.Unlock()

-    atomicSetBool(&pt.mergeNeedFreeDiskSpace, needFreeSpace)
    return pt.mergeParts(pws, pt.stopCh, false)
}

@@ -1166,13 +1157,20 @@ func (pt *partition) mergeExistingParts(isFinal bool) error {
    dst = append(dst, pt.inmemoryParts...)
    dst = append(dst, pt.smallParts...)
    dst = append(dst, pt.bigParts...)
-    pws, needFreeSpace := getPartsToMerge(dst, maxOutBytes, isFinal)
+    pws := getPartsToMerge(dst, maxOutBytes, isFinal)
    pt.partsLock.Unlock()

-    atomicSetBool(&pt.mergeNeedFreeDiskSpace, needFreeSpace)
    return pt.mergeParts(pws, pt.stopCh, isFinal)
}

+func assertIsInMerge(pws []*partWrapper) {
+    for _, pw := range pws {
+        if !pw.isInMerge {
+            logger.Panicf("BUG: partWrapper.isInMerge unexpectedly set to false")
+        }
+    }
+}
+
func (pt *partition) releasePartsToMerge(pws []*partWrapper) {
    pt.partsLock.Lock()
    for _, pw := range pws {
@@ -1186,14 +1184,6 @@ func (pt *partition) releasePartsToMerge(pws []*partWrapper) {

var errNothingToMerge = fmt.Errorf("nothing to merge")

-func atomicSetBool(p *uint64, b bool) {
-    v := uint64(0)
-    if b {
-        v = 1
-    }
-    atomic.StoreUint64(p, v)
-}
-
func (pt *partition) runFinalDedup() error {
    requiredDedupInterval, actualDedupInterval := pt.getRequiredDedupInterval()
    t := time.Now()
@@ -1240,12 +1230,16 @@ func getMinDedupInterval(pws []*partWrapper) int64 {
// if isFinal is set, then the resulting part will be saved to disk.
//
// All the parts inside pws must have isInMerge field set to true.
+// The isInMerge field inside pws parts is set to false before returning from the function.
func (pt *partition) mergeParts(pws []*partWrapper, stopCh <-chan struct{}, isFinal bool) error {
    if len(pws) == 0 {
        // Nothing to merge.
        return errNothingToMerge
    }

+    assertIsInMerge(pws)
+    defer pt.releasePartsToMerge(pws)
+
    startTime := time.Now()

    // Initialize destination paths.
@@ -1296,7 +1290,6 @@ func (pt *partition) mergeParts(pws []*partWrapper, stopCh <-chan struct{}, isFi
        putBlockStreamReader(bsr)
    }
    if err != nil {
-        pt.releasePartsToMerge(pws)
        return err
    }
    if mpNew != nil {
@@ -1629,8 +1622,7 @@ func (pt *partition) removeStaleParts() {

// getPartsToMerge returns optimal parts to merge from pws.
//
// The summary size of the returned parts must be smaller than maxOutBytes.
-// The function returns true if pws contains parts, which cannot be merged because of maxOutBytes limit.
-func getPartsToMerge(pws []*partWrapper, maxOutBytes uint64, isFinal bool) ([]*partWrapper, bool) {
+func getPartsToMerge(pws []*partWrapper, maxOutBytes uint64, isFinal bool) []*partWrapper {
    pwsRemaining := make([]*partWrapper, 0, len(pws))
    for _, pw := range pws {
        if !pw.isInMerge {
@@ -1639,14 +1631,13 @@ func getPartsToMerge(pws []*partWrapper, maxOutBytes uint64, isFinal bool) ([]*p
    }
    maxPartsToMerge := defaultPartsToMerge
    var pms []*partWrapper
-    needFreeSpace := false
    if isFinal {
        for len(pms) == 0 && maxPartsToMerge >= finalPartsToMerge {
-            pms, needFreeSpace = appendPartsToMerge(pms[:0], pwsRemaining, maxPartsToMerge, maxOutBytes)
+            pms = appendPartsToMerge(pms[:0], pwsRemaining, maxPartsToMerge, maxOutBytes)
            maxPartsToMerge--
        }
    } else {
-        pms, needFreeSpace = appendPartsToMerge(pms[:0], pwsRemaining, maxPartsToMerge, maxOutBytes)
+        pms = appendPartsToMerge(pms[:0], pwsRemaining, maxPartsToMerge, maxOutBytes)
    }
    for _, pw := range pms {
        if pw.isInMerge {
@@ -1654,7 +1645,7 @@
        }
        pw.isInMerge = true
    }
-    return pms, needFreeSpace
+    return pms
}

// minMergeMultiplier is the minimum multiplier for the size of the output part
@@ -1665,13 +1656,11 @@
// The 1.7 is good enough for production workloads.
const minMergeMultiplier = 1.7

-// appendPartsToMerge finds optimal parts to merge from src, appends
-// them to dst and returns the result.
-// The function returns true if src contains parts, which cannot be merged because of maxOutBytes limit.
-func appendPartsToMerge(dst, src []*partWrapper, maxPartsToMerge int, maxOutBytes uint64) ([]*partWrapper, bool) {
+// appendPartsToMerge finds optimal parts to merge from src, appends them to dst and returns the result.
+func appendPartsToMerge(dst, src []*partWrapper, maxPartsToMerge int, maxOutBytes uint64) []*partWrapper {
    if len(src) < 2 {
        // There is no need in merging zero or one part :)
-        return dst, false
+        return dst
    }
    if maxPartsToMerge < 2 {
        logger.Panicf("BUG: maxPartsToMerge cannot be smaller than 2; got %d", maxPartsToMerge)
@@ -1679,18 +1668,15 @@ func appendPartsToMerge(dst, src []*partWrapper, maxPartsToMerge int, maxOutByte

    // Filter out too big parts.
    // This should reduce N for O(N^2) algorithm below.
-    skippedBigParts := 0
    maxInPartBytes := uint64(float64(maxOutBytes) / minMergeMultiplier)
    tmp := make([]*partWrapper, 0, len(src))
    for _, pw := range src {
        if pw.p.size > maxInPartBytes {
-            skippedBigParts++
            continue
        }
        tmp = append(tmp, pw)
    }
    src = tmp
-    needFreeSpace := skippedBigParts > 1

    sortPartsForOptimalMerge(src)

@@ -1709,15 +1695,12 @@ func appendPartsToMerge(dst, src []*partWrapper, maxPartsToMerge int, maxOutByte
    for i := minSrcParts; i <= maxSrcParts; i++ {
        for j := 0; j <= len(src)-i; j++ {
            a := src[j : j+i]
-            outSize := getPartsSize(a)
-            if outSize > maxOutBytes {
-                needFreeSpace = true
-            }
            if a[0].p.size*uint64(len(a)) < a[len(a)-1].p.size {
                // Do not merge parts with too big difference in size,
                // since this results in unbalanced merges.
                continue
            }
+            outSize := getPartsSize(a)
            if outSize > maxOutBytes {
                // There is no need in verifying remaining parts with bigger sizes.
                break
@@ -1738,9 +1721,9 @@ func appendPartsToMerge(dst, src []*partWrapper, maxPartsToMerge int, maxOutByte
    if maxM < minM {
        // There is no sense in merging parts with too small m,
        // since this leads to high disk write IO.
-        return dst, needFreeSpace
+        return dst
    }
-    return append(dst, pws...), needFreeSpace
+    return append(dst, pws...)
}

func sortPartsForOptimalMerge(pws []*partWrapper) {
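With the `needFreeSpace` plumbing removed, `appendPartsToMerge` has a single job again: pick a balanced window of parts whose merged size fits `maxOutBytes`. Note the reordering in the inner loop — the balance check (smallest size times window length against the largest size) now runs before `getPartsSize`, so the output size is only computed for windows that survive it. A simplified sketch of that window scan over sorted sizes; plain `uint64` sizes stand in for `*partWrapper`, and the first fitting window is returned instead of the real code's multiplier-based scoring:

```go
package main

import (
	"fmt"
	"sort"
)

// pickPartsToMerge scans contiguous windows of sorted part sizes and returns
// the first window that is balanced and fits maxOutBytes. It sketches the
// shape of appendPartsToMerge, not its real scoring, which weighs merge
// multipliers across all window lengths before choosing.
func pickPartsToMerge(sizes []uint64, maxParts int, maxOutBytes uint64) []uint64 {
	if len(sizes) < 2 {
		return nil // no need to merge zero or one part
	}
	sort.Slice(sizes, func(i, j int) bool { return sizes[i] < sizes[j] })
	for n := maxParts; n >= 2; n-- {
		for j := 0; j+n <= len(sizes); j++ {
			w := sizes[j : j+n]
			// Skip unbalanced windows first: merging parts with too big a
			// size difference produces lopsided merges.
			if w[0]*uint64(len(w)) < w[len(w)-1] {
				continue
			}
			// Only now pay for the size computation.
			var outSize uint64
			for _, s := range w {
				outSize += s
			}
			if outSize > maxOutBytes {
				break // later windows at this length only get bigger
			}
			return w
		}
	}
	return nil
}

func main() {
	sizes := []uint64{10, 10, 11, 100, 1000}
	fmt.Println(pickPartsToMerge(sizes, 3, 50)) // [10 10 11]
}
```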
diff --git a/lib/storage/partition_test.go b/lib/storage/partition_test.go
index 931ada012cf8..2643347a50cb 100644
--- a/lib/storage/partition_test.go
+++ b/lib/storage/partition_test.go
@@ -34,24 +34,6 @@ func TestAppendPartsToMerge(t *testing.T) {
    testAppendPartsToMerge(t, 3, []uint64{11, 1, 10, 100, 10}, []uint64{10, 10, 11})
}

-func TestAppendPartsToMergeNeedFreeSpace(t *testing.T) {
-    f := func(sizes []uint64, maxOutBytes int, expectedNeedFreeSpace bool) {
-        t.Helper()
-        pws := newTestPartWrappersForSizes(sizes)
-        _, needFreeSpace := appendPartsToMerge(nil, pws, defaultPartsToMerge, uint64(maxOutBytes))
-        if needFreeSpace != expectedNeedFreeSpace {
-            t.Fatalf("unexpected needFreeSpace; got %v; want %v", needFreeSpace, expectedNeedFreeSpace)
-        }
-    }
-    f(nil, 1000, false)
-    f([]uint64{1000}, 100, false)
-    f([]uint64{1000}, 1100, false)
-    f([]uint64{120, 200}, 180, true)
-    f([]uint64{100, 200}, 310, false)
-    f([]uint64{100, 110, 109, 1}, 300, true)
-    f([]uint64{100, 110, 109, 1}, 330, false)
-}
-
func TestAppendPartsToMergeManyParts(t *testing.T) {
    // Verify that big number of parts are merged into minimal number of parts
    // using minimum merges.
@@ -69,7 +51,7 @@ func TestAppendPartsToMergeManyParts(t *testing.T) {
    iterationsCount := 0
    sizeMergedTotal := uint64(0)
    for {
-        pms, _ := appendPartsToMerge(nil, pws, defaultPartsToMerge, maxOutSize)
+        pms := appendPartsToMerge(nil, pws, defaultPartsToMerge, maxOutSize)
        if len(pms) == 0 {
            break
        }
@@ -118,7 +100,7 @@ func testAppendPartsToMerge(t *testing.T, maxPartsToMerge int, initialSizes, exp
    pws := newTestPartWrappersForSizes(initialSizes)

    // Verify appending to nil.
-    pms, _ := appendPartsToMerge(nil, pws, maxPartsToMerge, 1e9)
+    pms := appendPartsToMerge(nil, pws, maxPartsToMerge, 1e9)
    sizes := newTestSizesFromPartWrappers(pms)
    if !reflect.DeepEqual(sizes, expectedSizes) {
        t.Fatalf("unexpected size for maxPartsToMerge=%d, initialSizes=%d; got\n%d; want\n%d",
@@ -135,7 +117,7 @@ func testAppendPartsToMerge(t *testing.T, maxPartsToMerge int, initialSizes, exp
        {},
        {},
    }
-    pms, _ = appendPartsToMerge(prefix, pws, maxPartsToMerge, 1e9)
+    pms = appendPartsToMerge(prefix, pws, maxPartsToMerge, 1e9)
    if !reflect.DeepEqual(pms[:len(prefix)], prefix) {
        t.Fatalf("unexpected prefix for maxPartsToMerge=%d, initialSizes=%d; got\n%+v; want\n%+v",
            maxPartsToMerge, initialSizes, pms[:len(prefix)], prefix)
diff --git a/lib/storage/search.go b/lib/storage/search.go
index 1873939f5534..f5b4ab7f894a 100644
--- a/lib/storage/search.go
+++ b/lib/storage/search.go
@@ -211,16 +211,12 @@ func (s *Search) NextMetricBlock() bool {
                // Skip the block, since it contains only data outside the configured retention.
                continue
            }
-            var err error
-            s.MetricBlockRef.MetricName, err = s.idb.searchMetricNameWithCache(s.MetricBlockRef.MetricName[:0], tsid.MetricID)
-            if err != nil {
-                if err == io.EOF {
-                    // Skip missing metricName for tsid.MetricID.
-                    // It should be automatically fixed. See indexDB.searchMetricNameWithCache for details.
-                    continue
-                }
-                s.err = err
-                return false
+            var ok bool
+            s.MetricBlockRef.MetricName, ok = s.idb.searchMetricNameWithCache(s.MetricBlockRef.MetricName[:0], tsid.MetricID)
+            if !ok {
+                // Skip missing metricName for tsid.MetricID.
+                // It should be automatically fixed. See indexDB.searchMetricNameWithCache for details.
+                continue
            }
            s.prevMetricID = tsid.MetricID
        }
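`NextMetricBlock` keeps its iterator contract — return `false` on exhaustion or a recorded error — but a missing metric name is no longer treated as an error: the block is skipped and iteration continues. A minimal sketch of the caller pattern this preserves; the `search` type here is a hypothetical stand-in, and it assumes the real `Search` exposes the recorded error through an `Error` method, as the `s.err = err` assignment in the removed lines suggests:

```go
package main

import "fmt"

// search is a hypothetical stand-in for storage.Search: an iterator that
// yields metric blocks and records the first fatal error internally.
type search struct {
	blocks []string // metric names; "" marks a block whose name is missing
	i      int
	cur    string
	err    error
}

func (s *search) NextMetricBlock() bool {
	for s.i < len(s.blocks) {
		b := s.blocks[s.i]
		s.i++
		if b == "" {
			// Missing metricName for this block: skip it and keep going,
			// matching the new behavior, instead of setting s.err.
			continue
		}
		s.cur = b
		return true
	}
	return false
}

func (s *search) Error() error { return s.err }

func main() {
	s := &search{blocks: []string{"up", "", "process_cpu_seconds_total"}}
	for s.NextMetricBlock() {
		fmt.Println("block for", s.cur)
	}
	if err := s.Error(); err != nil {
		fmt.Println("search failed:", err)
	}
}
```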
diff --git a/lib/storage/storage.go b/lib/storage/storage.go
index fff06b435eef..7d88d36c5182 100644
--- a/lib/storage/storage.go
+++ b/lib/storage/storage.go
@@ -1114,15 +1114,12 @@ func (s *Storage) SearchMetricNames(qt *querytracer.Tracer, tfss []*TagFilters,
                return nil, err
            }
        }
-        var err error
-        metricName, err = idb.searchMetricNameWithCache(metricName[:0], metricID)
-        if err != nil {
-            if err == io.EOF {
-                // Skip missing metricName for metricID.
-                // It should be automatically fixed. See indexDB.searchMetricName for details.
-                continue
-            }
-            return nil, fmt.Errorf("error when searching metricName for metricID=%d: %w", metricID, err)
+        var ok bool
+        metricName, ok = idb.searchMetricNameWithCache(metricName[:0], metricID)
+        if !ok {
+            // Skip missing metricName for metricID.
+            // It should be automatically fixed. See indexDB.searchMetricName for details.
+            continue
        }
        if _, ok := metricNamesSeen[string(metricName)]; ok {
            // The given metric name was already seen; skip it
@@ -1175,13 +1172,11 @@ func (s *Storage) prefetchMetricNames(qt *querytracer.Tracer, srcMetricIDs []uin
                return err
            }
        }
-        metricName, err = is.searchMetricNameWithCache(metricName[:0], metricID)
-        if err != nil {
-            if err == io.EOF {
-                missingMetricIDs = append(missingMetricIDs, metricID)
-                continue
-            }
-            return fmt.Errorf("error in pre-fetching metricName for metricID=%d: %w", metricID, err)
+        var ok bool
+        metricName, ok = is.searchMetricNameWithCache(metricName[:0], metricID)
+        if !ok {
+            missingMetricIDs = append(missingMetricIDs, metricID)
+            continue
        }
    }
    idb.doExtDB(func(extDB *indexDB) {
@@ -1193,11 +1188,7 @@ func (s *Storage) prefetchMetricNames(qt *querytracer.Tracer, srcMetricIDs []uin
                    return
                }
            }
-            metricName, err = is.searchMetricNameWithCache(metricName[:0], metricID)
-            if err != nil && err != io.EOF {
-                err = fmt.Errorf("error in pre-fetching metricName for metricID=%d in extDB: %w", metricID, err)
-                return
-            }
+            metricName, _ = is.searchMetricNameWithCache(metricName[:0], metricID)
        }
    })
    if err != nil && err != io.EOF {
diff --git a/vendor/github.com/VictoriaMetrics/metricsql/parser.go b/vendor/github.com/VictoriaMetrics/metricsql/parser.go
index c408768577a3..8458e5f71661 100644
--- a/vendor/github.com/VictoriaMetrics/metricsql/parser.go
+++ b/vendor/github.com/VictoriaMetrics/metricsql/parser.go
@@ -55,7 +55,6 @@ func getDefaultWithArgExprs() []*withArgExpr {
            clamp_max(step()/300, 1)
        )`,
-        `median_over_time(m) = quantile_over_time(0.5, m)`,
        `range_median(q) = range_quantile(0.5, q)`,
        `alias(q, name) = label_set(q, "__name__", name)`,
    })
diff --git a/vendor/github.com/VictoriaMetrics/metricsql/rollup.go b/vendor/github.com/VictoriaMetrics/metricsql/rollup.go
index da3204adc129..99d8f56bc598 100644
--- a/vendor/github.com/VictoriaMetrics/metricsql/rollup.go
+++ b/vendor/github.com/VictoriaMetrics/metricsql/rollup.go
@@ -44,6 +44,7 @@ var rollupFuncs = map[string]bool{
    "lifetime":         true,
    "mad_over_time":    true,
    "max_over_time":    true,
+    "median_over_time": true,
    "min_over_time":    true,
    "mode_over_time":   true,
    "predict_linear":   true,
diff --git a/vendor/modules.txt b/vendor/modules.txt
index 7c406cb47a50..37d0db292aeb 100644
--- a/vendor/modules.txt
+++ b/vendor/modules.txt
@@ -99,7 +99,7 @@ github.com/VictoriaMetrics/fasthttp/stackless
# github.com/VictoriaMetrics/metrics v1.24.0
## explicit; go 1.20
github.com/VictoriaMetrics/metrics
-# github.com/VictoriaMetrics/metricsql v0.65.0
+# github.com/VictoriaMetrics/metricsql v0.66.0
## explicit; go 1.13
github.com/VictoriaMetrics/metricsql
github.com/VictoriaMetrics/metricsql/binaryop
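The vendored metricsql bump (v0.65.0 to v0.66.0) promotes `median_over_time` from a default WITH-template expansion to a first-class rollup function, which is why the parser's default WITH list shrinks while `rollupFuncs` grows. A small sketch against the library's public `Parse` API; the equivalence between the two queries follows from the removed template `median_over_time(m) = quantile_over_time(0.5, m)`:

```go
package main

import (
	"fmt"

	"github.com/VictoriaMetrics/metricsql"
)

func main() {
	// median_over_time is now parsed as a native rollup function;
	// the explicit quantile form stays equivalent per the removed
	// WITH template `median_over_time(m) = quantile_over_time(0.5, m)`.
	for _, q := range []string{
		`median_over_time(m[5m])`,
		`quantile_over_time(0.5, m[5m])`,
	} {
		expr, err := metricsql.Parse(q)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", q, expr.AppendString(nil))
	}
}
```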