Questions about migrating data from cluster to cluster #4624
Comments
Addresses #4624 Signed-off-by: hagen1778 <roman@victoriametrics.com>
Yes, vmctl will migrate all the data. As a source you specify the vmselect address. vmselect has access to all vmstorage nodes, so it will return data from all of them.
It will migrate all the data from the old cluster, including replicated data. You're right, it will produce duplicates that need to be removed. And to keep the same replicationFactor (if needed), one should configure vminsert on the destination with the corresponding `-replicationFactor` flag.
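For reference, the destination-side setup described above could be sketched like this (a hedged example, not from the thread: node addresses, ports, and the dedup interval are placeholders; `-replicationFactor` and `-dedup.minScrapeInterval` are the relevant VictoriaMetrics cluster flags):

```shell
# Sketch only: addresses and values are placeholders.
# vminsert on the destination cluster with the same replication factor as the source:
/path/to/vminsert -replicationFactor=2 \
  -storageNode=vmstorage-1:8400,vmstorage-2:8400,vmstorage-3:8400

# vmselect configured to deduplicate the replicated samples migrated from the old cluster:
/path/to/vmselect -dedup.minScrapeInterval=30s \
  -storageNode=vmstorage-1:8401,vmstorage-2:8401,vmstorage-3:8401
```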
hi @hagen1778, thanks a lot! I have another question. The data is very large (more than 3 months, more than 20T). I tried from my side and the migration seems very slow, for example: could you give me some advice on how to make it faster? Also, after it's done, it pops up this error message: and vmselect seems to get killed automatically because of the error message below: how to avoid that? And another quick question: what if I just copy the source data files of vmstorage from the old cluster to the new cluster, like: thanks again!!
See https://docs.victoriametrics.com/vmctl.html#migrating-data-from-victoriametrics
I'd also suggest changing the interval from hours to days or weeks.
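The suggestion above could look like this (a sketch only: addresses are placeholders, and a week-sized step is just one reasonable choice):

```shell
# Sketch: split the migration into week-sized chunks instead of hour-sized ones.
# Fewer, larger chunks mean less per-request overhead.
./vmctl vm-native \
  --vm-native-src-addr=http://<vmselect>:8481/ \
  --vm-native-dst-addr=http://<vminsert>:8480/ \
  --vm-native-filter-time-start='2023-02-01T00:00:00Z' \
  --vm-native-step-interval=week \
  --vm-intercluster
```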
follow the recommendation from the error message and bump
vmselect persists temporary results from search queries on disk if there is not enough memory to hold them. It is recommended to configure vmselect with a PVC that has a couple of GBs of space.
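Assuming a persistent volume is mounted, pointing vmselect at it could be sketched like this (the mount path and node addresses here are assumptions; `-cacheDataPath` is the vmselect flag for on-disk temporary data):

```shell
# Sketch: direct vmselect's temporary/cache files at a persistent volume
# with a few GiB of free space (mounted at /cache here; path is an assumption).
/path/to/vmselect -cacheDataPath=/cache \
  -storageNode=vmstorage-1:8401,vmstorage-2:8401
```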
It does. See https://docs.victoriametrics.com/#from-victoriametrics
thx again!! your answer helped me a lot!!! And another quick question: what if I just copy the source data files of vmstorage from the old cluster to the new cluster, like:
No, you can't merge data from two or more storage nodes by simple copying. It can only be done with a 1-to-1 match.
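A 1-to-1 copy could be sketched as below, assuming the old and new clusters have the same number of vmstorage nodes (hostnames, data paths, and the use of rsync are all assumptions, not from the thread):

```shell
# Sketch of a 1-to-1 copy: each old vmstorage node maps to exactly one new node.
# Stop (or snapshot) the source vmstorage first so the files are consistent.
rsync -a old-vmstorage-1:/vmstorage-data/ new-vmstorage-1:/vmstorage-data/
rsync -a old-vmstorage-2:/vmstorage-data/ new-vmstorage-2:/vmstorage-data/
```

Merging two source nodes into one destination node this way is not possible, which is why the node counts must match.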
hi @hagen1778, sorry to bother you. I already added the `--vm-concurrency` flag, but it doesn't seem to work, as below: before adding the flag: log: after adding the flag: log: any suggestions?
If increasing concurrency doesn't help, you likely have a bottleneck elsewhere:
90MB/s migration speed is pretty decent. Are you sure the network between components isn't maxed out?
If increasing concurrency doesn't help, you likely have a bottleneck elsewhere: the source database is maxed out on resources, and there are many error messages when I try to migrate about 1 month of data.
What do these error messages mean? Sorry, I have many questions, because what I'm doing now is a migration of huge data (over 20T), and there are some gaps between the old and new cluster (12 instances -> 6 instances, and the CPU/memory/disk are different too). This is the first time I've done this, so I don't have any experience T-T
Hm, the error doesn't look related to VM. It is returned by the network stack and suggests that there are not enough ports to establish new TCP connections. Do you have
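If port exhaustion is suspected, a quick check on the client host could look like this (a diagnostic sketch for Linux, not from the thread; paths and the TIME_WAIT interpretation are standard kernel details):

```shell
# Show the ephemeral port range available for outgoing connections.
cat /proc/sys/net/ipv4/ip_local_port_range
# Count sockets stuck in TIME_WAIT (state code 06 in /proc/net/tcp),
# which can exhaust local ports under high migration concurrency.
grep -c ' 06 ' /proc/net/tcp || true
```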
@hagen1778 hello again LOL. I checked with my peers, and it doesn't seem to be a network limit, because the network specification of our instances is 12 Gbps. The `--vm-concurrency=16` flag doesn't seem to work, any advice? command: And another question: if we can't decrease the migration time, will incremental data be migrated after I execute the command? For example: the migration will last for 60 hours or more. Will the data from those 60 hours be migrated too? Or does it take a snapshot at 00:00:00 28 Jul and only migrate the data before that snapshot? thx a lot.
And btw, I tested the bandwidth between the source and destination instances with iperf3. The bandwidth seems to be about 5 Gbit/s, and the transfer speed about 600 MB/s. Here is the test result:
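As a sanity check on those numbers (assuming the 600 figure is megabytes per second, as iperf3 typically reports for the transfer column), the two measurements are actually consistent with each other:

```shell
# Convert the measured transfer rate to bits to compare with the link speed:
# 600 MB/s * 8 bits/byte = 4800 Mbit/s, i.e. ~4.8 Gbit/s.
MBYTES_PER_SEC=600
MBITS_PER_SEC=$((MBYTES_PER_SEC * 8))
echo "${MBITS_PER_SEC} Mbit/s"  # close to the ~5 Gbit/s iperf3 bandwidth
```

So the iperf3 transfer rate is already near the measured link capacity; the vmctl migration at 90 MB/s is well below it.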
You're right! Created a feature request #4738
The network limit is one of the 3 points I listed here: #4624 (comment). Have you verified the rest?
vmctl has no checkpoints for now, so it will re-import everything again. If this happens, it is recommended to configure
@hagen1778 , I think I'll get the key point soon, thx for your help..
For the example I mentioned before, maybe I didn't make it clear: the new cluster has only finished deployment and hasn't started receiving new monitoring data from vmagent yet. I executed the command on the new cluster at 00:00:00 28 Jul:
and it will run for 60 hours (done on 30 Jul). Since the old cluster is still receiving new monitoring data, will this execution also migrate the data of 29 Jul / 30 Jul from the old cluster to the new one? The new cluster only receives the data transferred by vmctl, not from vmagent.
I tested from my side, and it seems it doesn't include the data from the execution time period... here is the command:
and here is the log:
And I exported the data of one metric; the latest timestamp is 1690866043 = 2023-08-01T05:00:43.000Z, almost the same as the timestamp when I executed the command. So if I want to ensure as little data loss as possible, the only thing I can do is increase the transfer speed. Do you have any other suggestions for this situation? @hagen1778
hello, need your help, bro. @hagen1778 TOT
Yes, the end timestamp when omitted is set to now(). You can manually set it to whatever you want.
For minimal data loss it is recommended to start writing data to two destinations: the old and the new installations. Then use vmctl to migrate data from the old installation to the new one, starting from whenever you want and ending at the moment you started writing to both destinations. Once the migration is done, the new installation will have complete data: ingested in real time and migrated via vmctl.
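The approach above could be sketched as follows, once vmagent is writing to both clusters (addresses and the cut-over timestamp are placeholders; `--vm-native-filter-time-end` is the vmctl flag that bounds the migration):

```shell
# Sketch: migrate history up to the exact moment dual writing began,
# so migrated data and real-time ingestion meet without a gap.
./vmctl vm-native \
  --vm-native-src-addr=http://<old-vmselect>:8481/ \
  --vm-native-dst-addr=http://<new-vminsert>:8480/ \
  --vm-native-filter-time-start='2023-05-01T00:00:00Z' \
  --vm-native-filter-time-end='2023-08-08T00:00:00Z' \
  --vm-native-step-interval=day \
  --vm-intercluster
```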
@hagen1778 thx a lot, I have tried from my side. This is the command: this is the log:
And I exported the data; the latest timestamp is 1691464723 = 2023-08-08T03:18:43.000Z. It seems it still doesn't include the latest data..
Is it for all metrics like this?
If migration takes 10 min for 10 series, it is likely that each series takes 1 min to migrate. Hence, the first time series in the list will contain data only up to the moment it was migrated, not up to the end of the whole run. In two words, you shouldn't expect vmctl to migrate all the data including the last minutes. If you want to migrate from cluster A to cluster B without losing the most recent data, follow the advice described here: #4624 (comment)
thx, I see. you helped me a lot. great community! @hagen1778 |
Is your question related to a specific component?
vmctl
Describe the question in detail
I want to migrate the monitoring data from one cluster to another. After checking the docs, I found this part:
I have some questions:
./vmctl vm-native --vm-native-src-addr=http://127.0.0.1:8481/ \
  --vm-native-dst-addr=http://127.0.0.1:8480/ \
  --vm-native-filter-match='{__name__="vm_app_uptime_seconds"}' \
  --vm-native-filter-time-start='2023-02-01T00:00:00Z' \
  --vm-native-step-interval=day \
  --vm-intercluster
Or do I need to use individual IPs, like:
./vmctl vm-native --vm-native-src-addr=http://source1:8481/ \
  --vm-native-dst-addr=http://desA:8480/

./vmctl vm-native --vm-native-src-addr=http://source2:8481/ \
  --vm-native-dst-addr=http://desB:8480/
Kindly give me some advice, thx a lot.
Troubleshooting docs