vmcluster: rerouting enhancement #4922

Closed
2 of 3 tasks
hagen1778 opened this issue Aug 30, 2023 · 13 comments
Labels
enhancement New feature or request

Comments

@hagen1778
Collaborator

Is your question related to a specific component?

VictoriaMetrics cluster is a distributed system. Its "heart" is its storage. In the cluster, storage is represented by the vmstorage component - the only stateful component according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview.

It is expected for distributed system components to be temporarily offline due to network issues, maintenance, hardware failure, etc. Still, the distributed system is expected to remain available during such events.

The VictoriaMetrics approach to remaining available when a vmstorage component goes offline is rerouting. Rerouting means that if vminsert is unable to push data to a vmstorage, it redirects/reroutes the payload to another alive vmstorage node. In practice, this means that if the cluster has 5 vmstorage nodes and one of them "dies", vminserts will spread its traffic across the remaining 4 storage nodes. And if with all 5 vmstorage nodes alive each of them was processing 20% of the traffic, after one storage dies each of the remaining vmstorage nodes should be able to absorb an extra 5% of the total traffic (going from 20% to 25% each).

The approach to rerouting is super simple. It doesn't require coordination, extra components or manual actions from the user. Everything happens automatically. However, it also means that the load increase on the remaining nodes may result in cluster instability. This ticket is supposed to aggregate thoughts on the problem of VM cluster instability during rerouting events.

Describe the question in detail

What do you mean by instability?

In this specific issue, I am mostly concerned about data ingestion slow-down. We've received reports that highly loaded VM installations experience data ingestion slow-down during vmstorage restarts.

Why does the cluster become unstable during rerouting?

In short, because of sharding.

Each vmstorage in the cluster setup is represented as a shard. A shard holds only a fraction of the data. During ingestion, the vminsert component consistently shards data across the available vmstorage nodes, making sure that each unique time series ends up on the same shard/vmstorage. For example, series foo and bar ingested into a 2-shard cluster will end up on separate vmstorages, no matter how many vminserts do the ingestion. Such an approach provides the benefit of better data locality and improves search speed, compression ratio, memory usage for caches, page cache usage, etc.
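For illustration only, here is a minimal Go sketch of the idea (hypothetical names and a plain FNV hash; the real vminsert uses consistent hashing over the series labels and a more involved node-selection algorithm):

package main

import (
	"fmt"
	"hash/fnv"
)

// pickStorageNode is a toy stand-in for vminsert sharding: the same series key
// always hashes to the same node index, so each vmstorage ends up holding its
// own stable subset of the series.
func pickStorageNode(seriesKey string, nodeCount int) int {
	h := fnv.New64a()
	h.Write([]byte(seriesKey))
	return int(h.Sum64() % uint64(nodeCount))
}

// rerouteTarget is a toy stand-in for rerouting: if the chosen node is
// unhealthy, the payload goes to the next healthy node instead.
func rerouteTarget(idx int, healthy []bool) int {
	for i := 1; i < len(healthy); i++ {
		next := (idx + i) % len(healthy)
		if healthy[next] {
			return next
		}
	}
	return idx // no healthy nodes left
}

func main() {
	healthy := []bool{true, true, false, true, true} // node 2 is down
	for _, series := range []string{"foo", "bar"} {
		idx := pickStorageNode(series, len(healthy))
		if !healthy[idx] {
			idx = rerouteTarget(idx, healthy)
		}
		fmt.Printf("%s -> vmstorage-%d\n", series, idx)
	}
}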

However, this also means that during rerouting, vmstorages start to receive metrics that are new to them. These rerouted metrics aren't present in the vmstorage caches, indexes or page cache. Registering new metrics or missing the cache is orders of magnitude slower than accepting already-seen metrics.

What can be done?

We assume two pain points lead to the ingestion slow-down:

  1. The moment when re-routing starts
  2. The moment when restarted vmstorage recovers

The moment when re-routing starts

When one vmstorage goes offline, its traffic gets rerouted to the rest of the vmstorage nodes. As described above, this hits the problem that the other vmstorage nodes had no knowledge of the rerouted metrics and start to lag. But we should remember that the load here is spread across all vmstorage nodes. So with enough resources, the "hiccup" on ingestion should be unnoticeable.

The moment when restarted vmstorage recovers

A restarted vmstorage usually gets back online after a delay of a few minutes. In high-churn environments, or environments where the vmstorage node could have been re-scheduled to another instance, it may be missing some caches on startup. So when it starts, it goes through the expensive path of registering previously unseen metrics. And in contrast to p1, the ingestion speed of the whole cluster is limited by the capabilities of this specific vmstorage node.

Experiment

To verify which pain point is the most problematic, I did an experiment:

  1. I have a testing VM cluster running with the following config: 2 vmselects (2CPU, 1Gi), 3 vminserts (1CPU, 1Gi), 5 vmstorages (1CPU, 4Gi)
  2. The cluster receives a data stream of 316K samples/s, with 4.4M active time series and a churn rate of 10M/day.
  3. I shut down one vmstorage and observe recovery speed for ingestion. This verifies p1.
  4. I bring the stopped vmstorage back and observe recovery speed for ingestion. This verifies p2.
[screenshot]

The moment when rerouting started is when I stopped the vmstorage.
The moment when rerouting ended is when I started it again.

The charts below show memory and CPU usage, as well as the SlowInserts (cache misses) and PendingDatapoints panels:
[screenshot]

From this data it looks like p1 is more harmful:

  1. The ingestion was slowed down for a longer period of time
  2. The CPU and memory usage increase was bigger
  3. The amount of pending datapoints (a sign of a bottleneck) was higher

The data stream into the cluster is generated by vmagent. From the vmagent perspective, the remote-write situation over the same time interval looked like the following:
[screenshot]

We see that saturation increased significantly when the vmstorage was shut down, and wasn't affected by its startup.

Experiment summary

My conclusion is that p1 is a more harmful situation than p2 and should be optimized first.
However, I should note that the experiment was running in an environment with a low churn rate. It could be that in environments with a much higher churn rate, starting the vmstorage after a few minutes of downtime could result in a lot of cache misses and slow down the ingestion.

Troubleshooting docs

hagen1778 added the question label on Aug 30, 2023
@valyala
Collaborator

valyala commented Aug 30, 2023

@hagen1778 , thanks for the very detailed analysis!

The following additional interesting details can be extracted from this analysis:

  • Every vmstorage node contains around 4.4M / 5 = 0.9M active time series in steady state when all 5 vmstorage nodes are available in the cluster.
  • The 0.9M time series are registered on the remaining 4 vmstorage nodes in ~1 minute according to the data ingestion graph. This means that every vmstorage node can register new time series at the rate of 0.9M / 4 = 220K series per minute per CPU core on this workload, since the re-routed series are evenly spread among the remaining vmstorage nodes. This allows estimating the duration needed for registering new time series in the remaining vmstorage nodes when one vmstorage node goes offline:
durationMinutes = (activeSeries / N) / (N - 1) / cpuCores / 220K

Where:

  • N is the number of storage nodes in the cluster
  • cpuCores is the number of CPU cores per each vmstorage node

For example, if the cluster contains 10 vmstorage nodes with 4 CPU cores each and it handles 100M active time series, then the duration of the data ingestion slowdown when one vmstorage node goes offline would be (100M / 10) / 9 / 4 / 220K = 1.3 minutes.
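The same estimate in code, for convenience (a sketch; the 220K series/minute/core registration rate comes from the measurement above and will differ per workload):

package main

import "fmt"

// slowdownMinutes estimates how long the remaining vmstorage nodes need to
// register the series re-routed from one offline node, assuming ~220K new
// series per minute per CPU core (measured above; workload-dependent).
func slowdownMinutes(activeSeries float64, storageNodes, cpuCoresPerNode int) float64 {
	const registrationRatePerCore = 220e3 // series per minute per CPU core
	seriesPerNode := activeSeries / float64(storageNodes)
	return seriesPerNode / float64(storageNodes-1) / float64(cpuCoresPerNode) / registrationRatePerCore
}

func main() {
	// The example above: 100M active series, 10 nodes, 4 CPU cores each.
	fmt.Printf("%.1f minutes\n", slowdownMinutes(100e6, 10, 4)) // ~1.3 minutes
}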
The durationMinutes formula suggests approaches for reducing the duration of the data ingestion slowdown when a vmstorage node goes offline:

  • By increasing the number of vmstorage nodes in the cluster. For example, increasing the number of vmstorage nodes from 10 to 20 should reduce the duration of the data ingestion slowdown by roughly 4x, to 100M / 20 / 19 / 4 / 220K = 0.3 minutes.
  • By increasing the number of CPU cores per vmstorage node. For example, increasing the number of CPU cores per vmstorage node from 4 to 8 should reduce the duration of the data ingestion slowdown by 2x, to 100M / 10 / 9 / 8 / 220K = 0.6 minutes.

The calculations above assume that every vmstorage node has enough free CPU and RAM for handling the temporary increase in the number of active time series during a rolling restart. If resources aren't enough, then the cluster may experience data ingestion slowdown, instability and OOM crashes for an indefinite duration.

The duration of the data ingestion slowdown can be reduced if the restarted vmstorage node returns to the cluster in less than durationMinutes. In this case vminsert re-routes time series back to the returned vmstorage node without registering the remaining time series on the other vmstorage nodes.

@aierui

aierui commented Sep 6, 2023

@hagen1778 , thanks for the very detailed analysis!

VictoriaMetrics is an outstanding time-series product. It has effectively addressed the storage requirements for a significant volume of monitoring data in our operational environment.

During our usage, we have encountered similar challenging issues as described above. Furthermore, we have noticed that the timeout duration for vminsert to send data to vmstorage is determined by the packet size. The shortest timeout is set to 60 seconds, and the longest is approximately 120 seconds.

When a vmstorage instance experiences a failure, it currently takes at least 60 seconds for vminsert to detect the faulty instance by itself. In network terms, 60 seconds is a remarkably long duration. Is there a parameter option available here to support different values?

timeoutSeconds := len(buf) / 3e5
if timeoutSeconds < 60 {
	timeoutSeconds = 60
}
timeout := time.Duration(timeoutSeconds) * time.Second
deadline := time.Now().Add(timeout)
if err := bc.SetWriteDeadline(deadline); err != nil {
	return fmt.Errorf("cannot set write deadline to %s: %w", deadline, err)
}
// sizeBuf guarantees that the rows batch will be either fully
// read or fully discarded on the vmstorage side.
// sizeBuf is used for read optimization in vmstorage.
sizeBuf := sizeBufPool.Get()
defer sizeBufPool.Put(sizeBuf)
sizeBuf.B = encoding.MarshalUint64(sizeBuf.B[:0], uint64(len(buf)))
if _, err := bc.Write(sizeBuf.B); err != nil {
	return fmt.Errorf("cannot write data size %d: %w", len(buf), err)
}
if _, err := bc.Write(buf); err != nil {
	return fmt.Errorf("cannot write data with size %d: %w", len(buf), err)
}
if err := bc.Flush(); err != nil {
	return fmt.Errorf("cannot flush data with size %d: %w", len(buf), err)
}
// Wait for `ack` from vmstorage.
// This guarantees that the message has been fully received by vmstorage.
deadline = time.Now().Add(timeout)
if err := bc.SetReadDeadline(deadline); err != nil {
	return fmt.Errorf("cannot set read deadline for reading `ack` to vmstorage: %w", err)
}

error message:

2023-09-03T12:39:25.687+0800	warn	VictoriaMetrics/app/vminsert/netstorage/netstorage.go:303	cannot send 280520 bytes with 973 rows to -storageNode="vmstorage-1:8400": cannot read `ack` from vmstorage: cannot read data in 60.000 seconds: read tcp4 10.171.192.149:46136->10.170.108.235:8400: i/o timeout; closing the connection to storageNode and re-routing this data to healthy storage nodes
2023-09-03T12:39:30.689+0800	warn	VictoriaMetrics/app/vminsert/netstorage/netstorage.go:261	cannot dial storageNode "vmstorage-1:8400": dial tcp4 10.170.108.235:8400: i/o timeout

@valyala
Collaborator

valyala commented Sep 6, 2023

@aierui , this issue has been addressed in pull request #4423 , which will be included in the next release. This pull request reduces the network timeout for an unavailable vmstorage instance from 60 seconds to 3 seconds by default. Additionally, this timeout can be configured when needed via the -vmstorageUserTimeout command-line flag.
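For example, a usage sketch (the node addresses are placeholders, and I'm assuming the flag takes a Go duration value like the other timeout flags):

vminsert -storageNode=vmstorage-1:8400,vmstorage-2:8400 -vmstorageUserTimeout=3s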

In the meantime you can build vminsert and vmselect from the latest commit in the cluster branch (this is e0923f9 right now) according to these docs and verify whether it reduces the time needed to start re-routing away from the disappeared vmstorage node from 60 seconds to 3 seconds.

@aierui

aierui commented Sep 6, 2023

Thank you very much for your reply! @valyala

It looks great👍.

Upon my initial review of the code changes in pull request #4423 , I have a question: why did we not use the SetReadDeadline() or SetWriteDeadline() methods provided by the standard library's net package, and instead opt for the TCP_USER_TIMEOUT socket option?

@wjordan

wjordan commented Sep 7, 2023

Why did we not use the SetReadDeadline() or SetWriteDeadline() methods provided by the standard library's net package, and instead opt for the TCP_USER_TIMEOUT socket option?

@aierui as you already noted, SetReadDeadline() and SetWriteDeadline() are already being used, only with a minimum of 60 seconds. These deadlines set a timeout for the entire Read()/Write() call, which can be quite large (maxInsertRequestSize defaults to 32MB), so even if this were configurable, it would still need to be extremely conservative. TCP_USER_TIMEOUT sets a timeout for each low-level TCP packet transmission, so it can be set much lower based on the network round-trip time. (The 3-second default allows for ~3 RTO retransmissions with the Linux TCP_RTO_MIN of ~200 ms.)
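To make the distinction concrete, here is a Go sketch of setting the socket option (an illustration, not the actual #4423 change; it relies on golang.org/x/sys/unix and is Linux-only):

package netutil

import (
	"net"
	"time"

	"golang.org/x/sys/unix"
)

// dialWithUserTimeout dials a TCP connection and sets TCP_USER_TIMEOUT, so the
// kernel aborts the connection once transmitted data stays unacknowledged for
// longer than userTimeout - regardless of how large the pending Write() is.
func dialWithUserTimeout(addr string, userTimeout time.Duration) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	rawConn, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		conn.Close()
		return nil, err
	}
	var sockErr error
	if err := rawConn.Control(func(fd uintptr) {
		// The value is in milliseconds on Linux.
		sockErr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_USER_TIMEOUT, int(userTimeout.Milliseconds()))
	}); err != nil {
		conn.Close()
		return nil, err
	}
	if sockErr != nil {
		conn.Close()
		return nil, sockErr
	}
	// In contrast, SetWriteDeadline bounds the whole Write() call, which may
	// legitimately take a long time for a large batch on a slow but alive link.
	return conn, nil
}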

@aierui

aierui commented Sep 8, 2023

Thank you very much @wjordan for providing a detailed explanation.

@wjordan

wjordan commented Oct 13, 2023

One approach to minimize the impact of the slowdown / increased resource usage caused by rerouting metrics to alternate storage nodes would be to buffer pending data for unhealthy storage nodes instead of immediately rerouting. A file buffer could be used (as in vmagent) to keep worst-case memory usage low once pending data exceeds the 30MiB max vminsert packet size. A rough sketch of this idea follows at the end of this comment.

Some reasons against this solution were raised in #791 (comment); here are some ideas on how to address them:

  • The buffer may grow to unlimited sizes if the corresponding vmstorage node remains unavailable for long periods of time.
  • A buffer could be used to delay rerouting by an adjustable timeout, instead of allowing indefinitely-long unavailability. ElasticSearch has delayed shard allocation (the index.unassigned.node_left.delayed_timeout dynamic setting, which defaults to 1m), which works quite well to minimize the impact of brief/intermittent network issues as well as planned rolling-restart cluster upgrades.

vmagent solves this issue by dropping the oldest data if the buffer grows beyond -remoteWrite.maxDiskUsagePerURL command-line value. Such approach cannot be used by vminsert, since this will mean incoming data loss when it could be re-routed to the remaining vmstorage nodes.

  • Instead of dropping data when the buffer is full, you could reroute the data at that point.

  • When replicationFactor > 1 and if dropSamplesOnOverload could be made to work with replicationFactor (Take in account replicationFactor with dropSamplesOnOverload=true #4798), it could be possible to drop redundant data for up to replicationFactor-1 unhealthy storage nodes while ensuring no data loss. In this case it could make sense to introduce a configuration option to prefer dropping samples rather than rerouting them.

  • The data buffered at vminsert is invisible in queries. This means that vmselect may return incomplete responses when a vmstorage node is temporarily unavailable.
  • When replicationFactor > 1 and skipSlowReplicas = false (default), data for up to replicationFactor-1 unhealthy storage nodes could be buffered without any impact on query availability.

  • There is some tradeoff between query availability and the storage-node instability caused by rerouting, but I would argue the cluster instability is worse, or at least there should be a configurable choice to prefer one over the other. The impact of registering new time series on rerouted-to storage nodes often leads to massive resource-usage spikes, cascading storage-node failures and even potential data loss. This impact can be much more severe and longer-lasting than the impact of a brief period of incomplete responses due to buffered data while a storage node is temporarily unhealthy.
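A rough Go sketch of the buffering idea (hypothetical types and names, not existing VictoriaMetrics code): hold rows destined for an unhealthy vmstorage node for up to a configurable delay, and only fall back to rerouting once the delay expires or the buffer is full, similar to Elasticsearch's delayed shard allocation.

package pending

import "time"

// pendingBuffer accumulates rows for a temporarily unhealthy vmstorage node.
type pendingBuffer struct {
	rows         [][]byte
	bufferedSize int
	maxSize      int           // in practice this could be a disk-backed buffer, as in vmagent
	downSince    time.Time     // when the target node became unhealthy
	rerouteDelay time.Duration // analogous to index.unassigned.node_left.delayed_timeout
}

// add returns false when the caller should fall back to rerouting instead of
// buffering: either the delay has expired or the buffer is full.
func (pb *pendingBuffer) add(row []byte, now time.Time) bool {
	if now.Sub(pb.downSince) > pb.rerouteDelay || pb.bufferedSize+len(row) > pb.maxSize {
		return false
	}
	pb.rows = append(pb.rows, row)
	pb.bufferedSize += len(row)
	return true
}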

@hagen1778
Collaborator Author

A buffer could be used to delay rerouting by an adjustable timeout, instead of allowing indefinitely-long unavailability.

The problem with the buffer is that it would only play well for low-loaded installations. But everything already works well for such installations, as they don't experience that much pressure. They might not even notice rerouting storms, just like our playground doesn't.
For big installations, the in-memory buffer can be filled in seconds. This would increase the complexity and memory usage of vminsert and give no benefits.

When replicationFactor > 1 and if dropSamplesOnOverload could be made to work with replicationFactor

That's one of the options, yes. But it results in data loss. With the current rerouting issues, even if the cluster is unstable, the ingested data will be buffered on the client (e.g. vmagent) and will still be delivered eventually.

This impact can be much more severe and longer-lasting than the impact of a brief period of incomplete responses due to buffered data while a storage node is temporarily unhealthy.

Afaik, the impact of this is that recording or alerting rules may misbehave for some period of time. On the other hand, data won't be lost. I agree that rules misbehaving is severe, but dropping data on the floor has the same or even higher severity.

@hagen1778
Collaborator Author

hagen1778 commented Oct 18, 2023

One of the simplest ideas for gradually rerouting data is to make vmstorage close accepted connections one by one over a specified time interval. Let's say we have 6 vminserts sending data to 10 storage nodes. If we want to reboot 1 vmstorage gracefully, we send it SIGTERM as usual, but instead of closing all the connections at once, vmstorage could spread this action over a configured gracefulShutdownInterval. This would make vminserts start rerouting one by one, not all together. A rough sketch of the idea is below.
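A minimal Go sketch of the vmstorage-side behavior (hypothetical helper, not the actual implementation): spread the closes evenly over the configured interval so vminserts start rerouting one by one.

package shutdownsketch

import (
	"log"
	"net"
	"time"
)

// closeConnsGradually closes accepted vminsert connections one by one,
// spreading the closes over shutdownDuration instead of dropping them all at
// once on SIGTERM.
func closeConnsGradually(conns []net.Conn, shutdownDuration time.Duration) {
	if len(conns) == 0 {
		return
	}
	step := shutdownDuration / time.Duration(len(conns))
	for i, c := range conns {
		if err := c.Close(); err != nil {
			log.Printf("cannot close vminsert connection #%d: %s", i, err)
		}
		if i < len(conns)-1 {
			time.Sleep(step)
		}
	}
}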

This feature should be easy to implement and test, since it doesn't require changes to vminsert or the intercommunication protocol. I hope @zekker6 will try to test it and report his findings.

The downside of this feature is that it won't work well for a low number of vminserts. Since each vminsert establishes only 1 connection to a vmstorage, the minimum fraction of data rerouted in one step equals 1/len(vminserts).

@zekker6
Contributor

zekker6 commented Oct 23, 2023

Tested an option suggested by @hagen1778:

One of the simplest ideas for gradually rerouting data is to make vmstorage close accepted connections one by one over a specified time interval. Let's say we have 6 vminserts sending data to 10 storage nodes. If we want to reboot 1 vmstorage gracefully, we send it SIGTERM as usual, but instead of closing all the connections at once, vmstorage could spread this action over a configured gracefulShutdownInterval. This would make vminserts start rerouting one by one, not all together.

Test setup:

  • 5x vmstorage - 1 CPU / 4 GB ram
  • 3x vminsert - 0.6 CPU / 0.5 GB
  • ingestion rate - 452K

Here are results for v1.94.0:

  • Ingestion rate restored to the prior rate in 1m45s
  • CPU spiked up to 100%
  • vmstorage connection saturation

Version with gradual drop of vminsert connections:

  • Ingestion rate restored in ~1m, overall spike seems smoother
  • CPU spike is also lower
  • As well as vmstorage connection saturation

The only downside I could see is that a rolling restart will now either be harder or will take more time to perform a full restart of all 5 nodes.

@hagen1778
Collaborator Author

hagen1778 commented Nov 2, 2023

@zekker6 shared with me another round of tests:

  1. the number of vminserts was increased to 8
  2. the graceful shutdown period was increased to 2min

@zekker6 also changed the way we test the changes: now we run two clusters (patched and unpatched) concurrently and can see how both are affected by the vmstorage restart.

[screenshot]

On the screenshot we see two vmstorage jobs: patched and base. Patched is the one implementing the idea from this comment. Base is the original version.
It is clear that the load on the patched version grows more gradually. In fact, with 8 vminserts, each storage node (5 vmstorages in total) has 8 connections established. With an ingestion rate of 280k samples/s, each vmstorage receives 280k/5 = 56k samples/s via 8 connections. Hence, each connection is supposed to transfer 56k/8 = 7k samples/s. Closing all connections at once (a regular shutdown of the vmstorage node) means vminserts need to re-route 56k samples/s immediately, increasing the load on each remaining vmstorage by 56k/4 = 14k samples/s (a 25% increase) at once.
Closing the 8 connections gradually over a 2m interval means we aim to re-route 7k samples/s every 120s/8 = 15s. This should smooth resource usage for vmstorages and vminserts.

According to the screenshot above, the load was indeed smoother. CPU usage and ingestion speed were less anomalous for the patched version. But what is also very interesting is the vminserts' behavior:
[screenshot]

It is clear that vminserts writing to the patched storage nodes experienced lower connection saturation and memory usage during the vmstorage restart. Lower saturation also means smaller queue build-up, which results in higher data freshness for read queries.

I personally think we should proceed with adding this feature upstream.

zekker6 added a commit that referenced this issue Nov 3, 2023
…torage

Implements graceful shutdown approach suggested here - #4922 (comment)

Test results for this can be found here - #4922 (comment)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
valyala added a commit that referenced this issue Nov 14, 2023
* app/vmstorage: close vminsert connections gradually before stopping storage

Implements graceful shutdown approach suggested here - #4922 (comment)

Test results for this can be found here - #4922 (comment)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update graceful shutdown logic

- close connections from vminsert in deterministic order
- update flag description
- lower default timeout to 25 seconds. The 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s (default value in Kubernetes and ansible-playbooks).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add information about re-routing enhancement during restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add entry for new command-line flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* {app/vmstorage,lib/ingestserver}: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add note to update workload scheduler timeout

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
valyala added a commit that referenced this issue Nov 14, 2023
valyala added the enhancement label and removed the question label on Nov 14, 2023
@valyala
Collaborator

valyala commented Nov 14, 2023

The commit f783476 enables gradual closing of vminsert connections during vmstorage graceful shutdown. By default vminsert connections are closed over a period of 25 seconds. This duration can be tuned with the -storage.vminsertConnsShutdownDuration command-line flag at vmstorage. See these docs for more details.
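For example (a sketch; the data path is a placeholder, and a value above the 25s default is shown):

vmstorage -storageDataPath=/vmstorage-data -storage.vminsertConnsShutdownDuration=60s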

This commit will be included in the next release of VictoriaMetrics. In the meantime it is possible to build vmstorage from this commit according to these docs and verify whether the patched vmstorage helps reduce data ingestion slowdown during rolling restarts.

valyala added a commit that referenced this issue Nov 14, 2023
valyala added a commit that referenced this issue Nov 14, 2023
valyala added a commit that referenced this issue Nov 14, 2023
Previously there was an off-by-one error, which resulted in logging len(conns)-1 connections instead of len(conns)

Updates #4922
valyala added a commit that referenced this issue Nov 14, 2023
AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
AndrewChubatiuk pushed a commit to AndrewChubatiuk/VictoriaMetrics that referenced this issue Nov 15, 2023
@valyala
Collaborator

valyala commented Nov 15, 2023

Starting from v1.95.0, vmstorage improves the re-routing handling for incoming data during graceful shutdown according to these docs.

Closing this feature request as done.
