Multiple Connections/Streams #6794

Open
Slind14 opened this issue Jun 26, 2022 · 18 comments

@Slind14

Slind14 commented Jun 26, 2022

Are there any plans for supporting multiple concurrent connections for the data transfer? Or is this already possible somehow?

Doing backups across > 1G networks is quite slow, due to the bottleneck of a single connection.
For cross-continent backups it can be even worse: a single connection won't be able to utilize a 1G link and sits at 200M max.

@ThomasWaldmann
Member

You can run multiple borg processes in parallel, backing up to one repo per process.
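
For illustration, a minimal sketch of that approach, assuming the input can be split into two independent parts (repo URLs, source paths and passphrase handling are placeholders):

```bash
# Hypothetical: two independent borg runs, each writing to its own repository,
# so their caches and locks (keyed by repo id) do not collide.
export BORG_PASSPHRASE='...'

borg create ssh://backup@example.org/./repo-part1::'{hostname}-{now}' /data/part1 &
borg create ssh://backup@example.org/./repo-part2::'{hostname}-{now}' /data/part2 &
wait  # wait for both backups to finish
```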

@Slind14
Author

Slind14 commented Jun 26, 2022

Hi Thomas,

we use it to back up data from a data warehouse. We can't split the data across multiple repos without losing consistency, I'm afraid.
Is there another option?

@ThomasWaldmann
Member

no. not being able to saturate your connection with 1 borg likely comes from internal processing being single-threaded and not internally queued.

but not sure how you ensure consistency. if you used a snapshot to get consistency, you could also run multiple borg to save the snapshot.

@ThomasWaldmann
Member

Is this the first backup you are doing or is there already data in the repo from previous backups?

@Slind14
Author

Slind14 commented Jun 26, 2022

it is not the first backup; we just got to the point where they can't complete within a day anymore.

When we use iperf3 to measure the bandwidth, we can see that a single connection only gets 100-200M while multiple connections get > 900M.

For data centers that are not on the other side of the world, we get a higher bandwidth for a single connection. So I doubt it is borg directly. Btw. borg CPU usage is always sitting at 10-20% of one core while uploading. Only when saving the file cache does it go to 100% and bandwidth to 0. The files are also quite large (multiple GB).


We do have a hardlink-based snapshot. How would we run multiple borg processes and ensure that they are not cannibalizing each other and also that we end up with a consistent backup?
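
(For reference, the kind of measurement described above can be reproduced with iperf3's parallel-stream option; the host name is a placeholder:)

```bash
# Single TCP stream - often limited by latency and TCP window size on long-haul links:
iperf3 -c backup.example.org

# Eight parallel streams - typically come much closer to saturating the link:
iperf3 -c backup.example.org -P 8
```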

@ThomasWaldmann
Member

borg manages caching, indexes and locking based on the repo id (which is unique and random). so you can run borg on the same machine, as the same user, at the same time IF you use different repos.

so you could partition your input data set and give each part to another borg.

@ThomasWaldmann
Member

also wondering why a not-first backup takes that long. does the dedup not work or is it really lots of NEW data?

@Slind14
Author

Slind14 commented Jun 26, 2022

also wondering why a not-first backup takes that long. does the dedup not work or is it really lots of NEW data?

There is more new data than 100 MBit/s can handle.

@Slind14
Author

Slind14 commented Jun 26, 2022

borg manages caching, indexes and locking based on the repo id (which is unique and random). so you can run borg on the same machine, as the same user, at the same time IF you use different repos.

so you could partition your input data set and give each part to another borg.

Unfortunately, partitioning is not possible with the way the data is stored. 90% of it is under the same directory, spread across around one million files.

@ThomasWaldmann
Member

ok.

iirc there is some --upload-buffer (or so) option, maybe you can try using that to speed it up.

you use some fast compression (default is lz4, zstd,1 .. zstd,3 would also work i guess)?
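
A hedged sketch of combining those two knobs (repo URL, archive name and the buffer size are placeholders; --upload-buffer takes a size in MiB in recent borg versions, if I recall correctly):

```bash
# Hypothetical invocation: fast zstd compression plus a larger upload buffer
# for a remote repository over ssh.
borg create \
    --compression zstd,3 \
    --upload-buffer 100 \
    ssh://backup@example.org/./repo::'{hostname}-{now}' \
    /data
```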

@ThomasWaldmann
Member

another idea is not to use different repos for partitions of the data, but for different times.

not pretty, but would work: use a different repo depending on weekday.
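
A minimal sketch of that idea (the repo base URL is a placeholder): select the repository from the day of the week before invoking borg.

```bash
# Hypothetical: one repository per weekday (1 = Monday ... 7 = Sunday).
DOW=$(date +%u)
borg create "ssh://backup@example.org/./repo-weekday-${DOW}::{hostname}-{now}" /data
```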

@Slind14
Author

Slind14 commented Jun 26, 2022

iirc there is some --upload-buffer (or so) option, maybe you can try using that to speed it up.

the data is already compressed, hence we don't use any compression


Are there any plans to support multi-connection uploads? Would it be a major change or something simple?

@Slind14
Author

Slind14 commented Jun 26, 2022

another idea is not to use different repo for partitions of the data, but for different times.

the majority of the new data is from the last 24 hours :( it is all in the same place - not really possible to split it.

@ThomasWaldmann
Member

--upload-buffer is about buffering, not compression.

@Slind14
Author

Slind14 commented Jun 26, 2022

--upload-buffer is about buffering, not compression.

Sorry I quoted the wrong line. ;)

@Slind14
Author

Slind14 commented Jun 26, 2022

Unfortunately, changing the buffer does not help.

Restic added parallel uploads not too long ago; if borg had something similar, it would be great.

restic/restic#3593
restic/restic#3513

@RonnyPfannschmidt
Contributor

with the current backend structure, multi-connection uploads are not sensibly possible, as the log-structured store is not concurrent and the encryption scheme is also not yet prepared for such a scenario

i would imagine that a major refactor would be necessary to support them

@Slind14
Author

Slind14 commented Jun 27, 2022

I see, thank you.
