Multiple Connections/Streams #6794
Comments
You can run multiple borg processes in parallel, backing up to one repo per process.
Hi Thomas, we use it to back up data from a data warehouse. We can't split the data across multiple repos without losing consistency, I'm afraid.
No. Not being able to saturate your connection with one borg process likely comes from internal processing being single-threaded and not internally queued. I'm not sure how you ensure consistency, though: if you used a snapshot to get consistency, you could also run multiple borg processes to save that snapshot.
Is this the first backup you are doing, or is there already data in the repo from previous backups?
It is not the first backup; we just got to the point where backups can't complete within a day anymore. When we use iperf3 to measure the bandwidth, a single connection only gets 100-200 Mbit/s while multiple connections get > 900 Mbit/s. For data centers that are not on the other side of the world, we get higher bandwidth on a single connection, so I doubt it is borg directly. Btw, borg CPU usage sits at 10-20% of one core while uploading; only when saving the files cache does it go to 100% while bandwidth drops to 0. The files are also quite large (multiple GB). We do have a hardlink-based snapshot. How would we run multiple borg processes and ensure that they are not cannibalizing each other, and that we end up with a consistent backup?
borg manages caching, indexes, and locking based on the repo id (which is unique and random), so you can run multiple borg processes on the same machine, as the same user, at the same time IF you use different repos. So you could partition your input data set and give each part to a different borg process.
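A minimal sketch of that pattern, with a hypothetical split of the data into two parts and hypothetical repo URLs (written as a dry run via `echo`; drop the `echo` to actually execute):

```shell
# Sketch: back up two disjoint parts of the data set concurrently, each
# to its own repository. borg locks, caches, and indexes per repo id,
# so concurrent runs against *different* repos do not conflict.
# The repo URLs and /data/part-* paths are hypothetical examples.
BORG="echo borg"                     # dry run; set BORG=borg to execute
ARCHIVE="$(date +%Y-%m-%d)"

$BORG create "ssh://backup-host/./repo-a::$ARCHIVE" /data/part-a &
$BORG create "ssh://backup-host/./repo-b::$ARCHIVE" /data/part-b &
wait   # both processes upload in parallel, one connection each
```

Restores and pruning then have to be done per repo, which is the main price of this workaround.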
Also wondering why a not-first backup takes that long: does the dedup not work, or is it really lots of NEW data?
There is more new data than a 100 Mbit/s connection can handle.
Unfortunately, partitioning is not possible with the way the data is stored: 90% is under the same directory, spread across around one million files.
OK. IIRC you use some fast compression (the default is lz4; zstd,1 .. zstd,3 would also work, I guess)?
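For reference, the compression algorithm is chosen per `borg create` invocation; a dry-run sketch of the suggestion (repo URL and source path are placeholders):

```shell
# Example of the suggestion above: use zstd level 3 instead of the
# default lz4. The repo URL and source path are hypothetical.
CMD='borg create --compression zstd,3 ssh://backup-host/./repo::{now} /data'
echo "would run: $CMD"
```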
Another idea is to use different repos not for partitions of the data, but for different times. Not pretty, but it would work: use a different repo depending on the weekday.
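That rotation could be sketched like this (the base repo URL is an assumption; shown as a dry run):

```shell
# Sketch of the weekday-rotation idea: derive the repo name from the
# ISO weekday (1 = Monday ... 7 = Sunday), so each weekday gets its
# own independent repository. The base URL is a hypothetical example.
DAY="$(date +%u)"
REPO="ssh://backup-host/./repo-day-$DAY"
echo "would run: borg create $REPO::$(date +%Y-%m-%d) /data"
```

Dedup then only works within each weekday's repo, so total storage use grows compared to a single repo.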
The data is already compressed, hence we don't use any compression. Are there any plans to support multi-connection uploads? Would it be a major change or something simple?
The majority of the new data is from the last 24 hours :( and it is all in the same place, so it can't really be split.
--upload-buffer is about buffering, not compression.
Sorry, I quoted the wrong line. ;)
Unfortunately, changing the buffer does not help. Restic added parallel uploads not too long ago; if borg had something similar, it would be great.
With the current backend structure, multi-connection uploads are not sensibly possible: the log-structured store is not concurrent, and the encryption scheme is also not yet prepared for such a scenario. I would imagine that a major refactor would be necessary to support them.
I see, thank you.
Are there any plans for supporting multiple concurrent connections for the data transfer? Or is this already possible somehow?
Doing backups across > 1 Gbit/s networks is quite slow, due to the bottleneck of a single connection.
For cross-continent backups it can be even worse: a single connection won't be able to utilize a 1 Gbit/s link and sits at 200 Mbit/s max.