
vmctl stops cluster to cluster migration when it finds a tenant with zero metrics. #4796

Closed
ThomasADavis opened this issue Aug 7, 2023 · 5 comments
Assignees
Labels
bug Something isn't working vmctl

Comments

@ThomasADavis

Describe the bug

vmctl vm-native \
  --vm-native-src-addr=http://omni-192-168-85-56.nip.io \
  --vm-native-dst-addr=http://ov2-192-168-85-56.nip.io/ \
  --vm-native-filter-time-start='2023-08-01T00:00:00Z' \
  --vm-native-step-interval=day \
  --vm-intercluster

gives this result:

VictoriaMetrics Native import mode
2023/08/07 12:31:40 Discovering tenants...
The following tenants were discovered: [0:0 100000:0 10000:0 1000:0 100:0 1100:0 1200:0 1300:0 200:0 20100:0 300:0 400:0 500:0 600:0 800:0 900:0].
 Continue? [Y/n] y

2023/08/07 12:31:42 Initing import process from "http://omni-192-168-85-56.nip.io/select/0:0/prometheus/api/v1/export/native" to "http://ov2-192-168-85-56.nip.io/insert/0:0/prometheus/api/v1/import/native" with filter 
	filter: match[]={__name__!=""}
	start: 2023-07-21T00:00:00Z
	end: 2023-07-21T02:00:00Z for tenant 0:0
2023/08/07 12:31:42 Exploring metrics...
2023/08/07 12:31:42 Found 1134 metrics to import
2023/08/07 12:31:42 Requests to make: 1134
Requests to make for tenant 0:0: 1134 / 1134 [██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%

2023/08/07 12:32:13 Initing import process from "http://omni-192-168-85-56.nip.io/select/100000:0/prometheus/api/v1/export/native" to "http://ov2-192-168-85-56.nip.io/insert/100000:0/prometheus/api/v1/import/native" with filter 
	filter: match[]={__name__!=""}
	start: 2023-07-21T00:00:00Z
	end: 2023-07-21T02:00:00Z for tenant 100000:0
2023/08/07 12:32:13 Exploring metrics...
2023/08/07 12:32:14 Found 3237 metrics to import
2023/08/07 12:32:14 Requests to make: 3237
Requests to make for tenant 100000:0: 3237 / 3237 [█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%

2023/08/07 12:32:21 Initing import process from "http://omni-192-168-85-56.nip.io/select/10000:0/prometheus/api/v1/export/native" to "http://ov2-192-168-85-56.nip.io/insert/10000:0/prometheus/api/v1/import/native" with filter 
	filter: match[]={__name__!=""}
	start: 2023-07-21T00:00:00Z
	end: 2023-07-21T02:00:00Z for tenant 10000:0
2023/08/07 12:32:21 Exploring metrics...
2023/08/07 12:32:21 migration failed: no metrics found

To Reproduce

1. Create a cluster.
2. Add tenants with data.
3. Pick a tenant and delete that tenant's data.
4. Try to migrate to another vmcluster using vmctl with the `--vm-intercluster` flag.

vmctl will stop when it hits the empty tenant.
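The delete step above can be done through the cluster delete API. A minimal sketch, assuming placeholder vmselect host/port and tenant (the curl command is printed rather than executed, so it can be reviewed before running against a live cluster):

```shell
#!/bin/sh
# Hypothetical host and tenant; the /delete/<tenant>/prometheus path follows
# the standard VictoriaMetrics cluster URL layout (vmselect listens on 8481).
VMSELECT="http://vmselect:8481"
TENANT="10000:0"

DELETE_URL="${VMSELECT}/delete/${TENANT}/prometheus/api/v1/admin/tsdb/delete_series"

# match[]={__name__!=""} matches every series, so this empties the tenant.
echo curl -s "${DELETE_URL}?match[]={__name__!=\"\"}"
```

After the tenant's retention window passes, the tenant remains discoverable but holds no data, which is the state that trips vmctl.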

Version

clusterVersion: v1.92.1-cluster
[root@vmetric-k3s-1 omni-2023]# vmctl --version
vmctl version vmctl-20230630-132322-tags-v1.91.3-0-g7226242070
2023/08/07 13:41:35 Total time: 364.19µs
[root@vmetric-k3s-1 omni-2023]# 

Logs

No response

Screenshots

No response

Used command-line flags

No response

Additional information

Cluster has been online for a long time:

[root@vmetric-k3s-1 omni-2023]# kubectl get ingress -n omni
NAME               CLASS    HOSTS                       ADDRESS                                     PORTS   AGE
omni-vmetric-nip   <none>   omni-192-168-85-56.nip.io   192.168.85.58,192.168.85.59,192.168.85.60   80      2y65d
[root@vmetric-k3s-1 omni-2023]# 

Tenants have come and gone, and

    retentionPeriod: "12"

means no data exists for several tenants in the cluster.

What version this cluster started at is unknown.

The current workaround is:

#!/bin/bash
#
TENANTS="100 200 300 400 500 600 700 800 900 1000 1100 1200 1300"

for TENANT in $TENANTS
do
	echo "$TENANT"
	vmctl vm-native -s \
	  --vm-native-src-addr="http://omni-192-168-85-56.nip.io/select/$TENANT/prometheus" \
	  --vm-native-dst-addr="http://ov2-192-168-85-56.nip.io/insert/$TENANT/prometheus" \
	  --vm-native-filter-time-start='2023-08-01T00:00:00Z' \
	  --vm-native-step-interval=day \
	  --vm-concurrency 8
done
@ThomasADavis ThomasADavis added the bug Something isn't working label Aug 7, 2023
@ThomasADavis
Author

Also, running the curl command to delete all of a tenant's data doesn't work in this cluster - the tenant that caused vmctl to stop already has no data.

We don't care if the tenant ID itself isn't transferred, especially when the tenant has no data.

@hagen1778
Collaborator

hagen1778 commented Aug 8, 2023

Thanks for the report!

2023/08/07 12:32:21 migration failed: no metrics found

Indeed, vmctl shouldn't stop here. It should continue with the rest of the work.
@dmitryk-dk would you mind taking this over?
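The intended behavior can be sketched as a loop that treats an empty tenant as a skip rather than a fatal error. This is a standalone illustration with stubbed discovery, not vmctl's actual source; `errNoMetrics` and `exploreTenant` are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMetrics mirrors the "no metrics found" condition from the logs above.
var errNoMetrics = errors.New("no metrics found")

// exploreTenant is a stub: tenant "10000:0" is the empty one from the report.
func exploreTenant(tenant string) (int, error) {
	if tenant == "10000:0" {
		return 0, errNoMetrics
	}
	return 1134, nil
}

// migrateAll processes every tenant, skipping empty ones instead of aborting.
func migrateAll(tenants []string) []string {
	var migrated []string
	for _, t := range tenants {
		n, err := exploreTenant(t)
		if errors.Is(err, errNoMetrics) {
			// Previously the whole migration aborted here; the fix
			// logs the condition and moves on to the next tenant.
			fmt.Printf("tenant %s: no metrics found, skipping\n", t)
			continue
		}
		if err != nil {
			fmt.Printf("tenant %s: migration failed: %v\n", t, err)
			return migrated
		}
		fmt.Printf("tenant %s: importing %d metrics\n", t, n)
		migrated = append(migrated, t)
	}
	return migrated
}

func main() {
	got := migrateAll([]string{"0:0", "10000:0", "100:0"})
	fmt.Println(len(got)) // 2: the empty tenant is skipped, not fatal
}
```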

@dmitryk-dk dmitryk-dk self-assigned this Aug 8, 2023
@dmitryk-dk
Contributor

Hi @hagen1778, @ThomasADavis! Yes, I will take a look and fix this problem.

@hagen1778
Collaborator

@ThomasADavis commit 39623ae has been merged to address the issue. It will be available in the next VM release.

@valyala valyala added the vmctl label Aug 17, 2023
@valyala
Collaborator

valyala commented Aug 24, 2023

Commit 39623ae has been included in v1.93.1, so closing the issue as fixed.

@valyala valyala closed this as completed Aug 24, 2023