vmselect: query over multiple availability zones #4792
Comments
The top-level vmauth config:

```yaml
unauthorized_user:
  url_prefix:
  - http://vmauth-zone-A/?deny_partial_response=1
  - http://vmauth-zone-B/?deny_partial_response=1
```

The lower-level vmauth config in each zone:

```yaml
unauthorized_user:
  url_map:
  - src_paths:
    - /api/v1/query
    - /api/v1/query_range
    url_prefix:
    - http://vmselect-1:8481/select/0/prometheus
    # ...
    - http://vmselect-N:8481/select/0/prometheus
```
…d and some of the vmstorage nodes are temporarily unavailable. This should help detect this case and automatically retry the query at a healthy cluster replica in another availability zone. This commit is needed as a preparation for automatic query retry at another backend in vmauth on 5xx errors, as described at #4792 (comment)
…odes This should allow implementing the high-availability scheme described at #4792 (comment). See also #4893
The related issue - #5197
…load balancing policy. vmauth in `hot standby` mode sends requests to the first url_prefix while it is available. If the first url_prefix becomes unavailable, then vmauth falls back to the next url_prefix. This allows building a highly available setup as described at https://docs.victoriametrics.com/vmauth.html#high-availability Updates #4893 Updates #4792
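The `hot standby` fallback described above can be sketched in a few lines of Python. This is a simplified illustration, not vmauth's actual implementation; the backend names and the availability map are hypothetical:

```python
# Simplified sketch (not vmauth's actual code) of the hot-standby policy:
# requests go to the first available url_prefix, falling back to the next
# one only when the current backend is unavailable.

def pick_backend(url_prefixes, is_available):
    """Return the first available url_prefix, or None if all are down."""
    for url in url_prefixes:
        if is_available(url):
            return url
    return None

# Hypothetical availability state for illustration:
backends = ["http://vmauth-zone-A", "http://vmauth-zone-B"]
up = {"http://vmauth-zone-A": False, "http://vmauth-zone-B": True}
print(pick_backend(backends, lambda u: up[u]))  # prints http://vmauth-zone-B
```

The real vmauth additionally has to detect backend failure at request time (via connection errors or retryable status codes) rather than from a static map.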
FYI, the upcoming release of vmauth will provide the ability to send all requests to the closest AZ and to fall back to other AZs only when the closest AZ is unavailable. For example, with the following config vmauth sends all requests to vmselect-az1 and retries them at vmselect-az2 on 500, 502 or 503 responses:

```yaml
unauthorized_user:
  url_prefix:
  - 'https://vmselect-az1/?deny_partial_response=1'
  - 'https://vmselect-az2/?deny_partial_response=1'
  retry_status_codes: [500, 502, 503]
  load_balancing_policy: first_available
```

This allows building highly available setups as described in these docs.
```yaml
unauthorized_user:
  url_prefix:
  - http://vmselect-1:8481/
  ...
  - http://vmselect-N:8481/
```

See these docs for details. This functionality can be tested by building …
Link to the related issue - #4792. Fix heading for the `Modifying HTTP headers` chapter at docs/vmagent.md
But what if vmauth runs in az2 and gets a full response from az1 vmselect…

I understand that not all are running their VM clusters on k8s, yet if they are I…

Update: tried it and it does not seem to work at all. References: …

We are very interested in something similar, yet a bit different: we want to make…

While writing the comment I checked the docs, and it seems like vminsert spreads…

```
/vminsert-prod \
  --storageNode=vmcluster-victoria-metrics-cluster-vmstorage-0.vmcluster-victoria-metrics-cluster-vmstorage.vmcluster.svc.cluster.local:8400 \
  --storageNode=vmcluster-victoria-metrics-cluster-vmstorage-1.vmcluster-victoria-metrics-cluster-vmstorage.vmcluster.svc.cluster.local:8400 \
  --envflag.enable=true --envflag.prefix=VM_ --loggerFormat=json \
  --maxLabelsPerTimeseries=50
```

It's hardly possible we could make it balance, at least with this configuration. Also it… What do you think?
It isn't recommended to spread …

You need to replicate incoming data among multiple completely independent VictoriaMetrics clusters located in different AZs in order to achieve real HA. If a single AZ becomes unavailable, all the data remains available for querying in the other AZs, while new data continues flowing into them. The data can be replicated among multiple AZs with …
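One way to do such replication (a sketch with assumed hostnames and paths) is to put vmagent in front of the clusters: vmagent replicates all incoming data to every configured `-remoteWrite.url`, so each AZ receives an identical copy:

```sh
# Sketch: vmagent replicates all incoming samples to every -remoteWrite.url,
# so the independent clusters in az1 and az2 end up with identical data.
# The vminsert hostnames below are examples, not real endpoints.
/vmagent-prod \
  -remoteWrite.url=http://vminsert-az1:8480/insert/0/prometheus/api/v1/write \
  -remoteWrite.url=http://vminsert-az2:8480/insert/0/prometheus/api/v1/write
```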
Sorry to post on a closed issue, but while the context here is fresh I wanted to clarify. Do I understand correctly that, practically, with this setup:

I can basically run two single-node servers in different zones, with each service pushing metrics only to its own AZ (I will need to configure dynamic write URLs for the services/vmagent in different AZs). Meaning every VictoriaMetrics instance will only have metrics from the services in its own AZ.
@ivankovnatsky, you need to write the same data to VictoriaMetrics instances in every AZ. This can be done by specifying multiple …
Is your question related to a specific component?
vmselect
Describe the question in detail
It is a common approach to run a separate VictoriaMetrics cluster in each availability zone (AZ) for reliability purposes. Usually each AZ contains identical data, so that if one AZ fails, another AZ continues returning complete results.

The recommended multi-AZ setup assumes that the user has more than one AZ, and that each AZ has a cluster of vmstorage and vminsert nodes. The data stored in each AZ is supposed to be identical. For querying the data, the user is expected to choose one of the following options:
Both cases protect the user when one of the AZs becomes unreachable or returns partial results. But each of them suffers from the increased latency produced by the slowest AZ, since vmselect waits for responses from all vmstorage nodes in all AZs.

One way to solve this would be to make vmselect smarter by introducing storage groups. With this additional logic, vmselect could query only the fastest AZ and fall back to other AZs if the fastest AZ is unreachable or returns incomplete data.

However, such an approach could also be implemented without adding extra logic to vmselect. Using nginx or any other reverse proxy would suffice. For example, HTTP load balancing in nginx already provides such features as:
In the picture above we have two VM clusters, in zone A and zone B. These are independent clusters with identical data.

To read from both clusters we use two layers of Nginx:
with the `-search.denyPartialResponse` setting.
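The nginx-based fallback described above might look roughly like this. This is only a sketch: the upstream names and ports are assumptions, and the exact status code returned for a denied partial response should be checked against your vmselect version:

```nginx
# Sketch of the upper nginx layer: prefer the local zone and retry the
# request in the other zone on connection errors or 5xx responses.
# With -search.denyPartialResponse, a partial response becomes an error,
# which (assuming it maps to a 5xx status) triggers the fallback here.
upstream vmselect_zones {
    server vmauth-zone-A:8427;         # closest AZ, tried first
    server vmauth-zone-B:8427 backup;  # used only when zone A fails
}

server {
    listen 8080;
    location / {
        proxy_pass http://vmselect_zones;
        # conditions under which the next upstream is tried
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }
}
```

Since `/api/v1/query` requests are GETs, nginx treats them as idempotent and retries them at the backup upstream without extra configuration.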