document pull-like operation #900

Open
ThomasWaldmann opened this Issue Apr 13, 2016 · 42 comments

@ThomasWaldmann
Member

ThomasWaldmann commented Apr 13, 2016

this is a FAQ (by people who have firewalls or want it for other reasons) and some people are evaluating setups with ssh -R (see some posts in #36).

this issue is to collect such setups and if evaluated successfully, add it to the documentation.

note: the debian/ubuntu package description says borg only supports push, maybe that can be removed after this ticket is closed.

so, if you successfully run a pull-like setup, the best thing you can do is to make a pull request that closes this ticket.


💰 there is a bounty for this

Note: to collect the bounty you need to run a reliable pull-like setup, do a pull request for our documentation, documenting the pull-related parts of the setup.

@ThomasWaldmann

ThomasWaldmann May 20, 2016

Member

A pull setup that does not involve ssh is to just mount the source filesystems on the machine that runs borg.
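
A rough sketch of that approach, assuming sshfs as the network filesystem (any other mount would do). Host, mount point and repository path are hypothetical placeholders, and the `run` helper only prints the commands when `DRY_RUN=1` is set. borg then reads every file over the network and does all chunking on the backup server, so this is simple but can be slow.

```shell
#!/bin/sh
# Sketch only: pull-style backup by mounting the source host's filesystem
# on the machine that runs borg. All names are hypothetical placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# usage: pull_via_mount SOURCE_HOST MOUNTPOINT REPO ARCHIVE
pull_via_mount() {
    src_host=$1 mnt=$2 repo=$3 archive=$4
    # Mount the remote root read-only (sshfs is an assumption, see above).
    run sshfs -o ro "root@${src_host}:/" "$mnt" || return 1
    # borg runs locally and reads the source files through the mount.
    run borg create "${repo}::${archive}" "$mnt"
    status=$?
    # Always unmount, even if borg failed.
    run fusermount -u "$mnt"
    return "$status"
}
```

Calling `DRY_RUN=1 pull_via_mount web1 /mnt/web1 /srv/borg/web1 nightly` shows the three commands without running them. Note that paths inside the archive will carry the mount point as a prefix.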

@textshell

textshell Sep 9, 2016

Member

This is for the use case where the normal push approach is problematic because of firewalls etc.

From axion on irc (slightly edited and simplified, so all errors are likely mine):

repo=ssh://${USER}@localhost:${PULL_PORT}${REPO_PATH}/${host}
ssh -R ${PULL_PORT}:localhost:22 ${host}         \
  BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes \
  borg create ${repo}::${archive} /some/path

This approach tunnels one ssh connection through another ssh connection, so it has some additional overhead.

Another way would be to use BORG_RSH and a pair of socat instances to avoid one layer of ssh encryption.

@ThomasWaldmann

ThomasWaldmann Sep 9, 2016

Member

@textshell I don't think that BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes should be in there permanently, right?

Also, the repo=...{REPO_PATH}/${host} can be done that way, but is unrelated to pull-mode. Also, REPO_PATH there is rather the path that has the repos as subdirs, not the path of the repository itself.

@textshell

textshell Sep 10, 2016

Member

@UNKNOWN_UNENCRYPTED it depends on whether PULL_PORT is always the same. If that can be arranged, then yes, it should not be in there.

@repo yes that can be simplified. I only did limited editing from axion’s pastebin, but i didn’t want this to get lost again.

I think the BORG_RSH + socat way would be nicer anyway (no ssh in ssh, no dependency on sshd running on the backup server, etc.), but it needs a slightly more complex bash script.
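
A sketch of the simplified ssh-in-ssh form discussed above, assuming PULL_PORT can be fixed and the repository path is passed directly. The port number, the `borg` account name and the function name are placeholders, and nothing here has been tested beyond printing the command.

```shell
#!/bin/sh
# Sketch only: ssh-in-ssh pull with a fixed forwarded port.
PULL_PORT=${PULL_PORT:-2222}       # hypothetical fixed port on the source host
BACKUP_USER=${BACKUP_USER:-borg}   # hypothetical account on the backup server
SSH=${SSH:-ssh}                    # set SSH="echo ssh" to preview the command

# usage: pull_via_reverse_ssh SOURCE_HOST REPO_PATH ARCHIVE PATH...
pull_via_reverse_ssh() {
    host=$1 repo_path=$2 archive=$3
    shift 3
    # Repo URL as seen from the source host: the forwarded port on its own
    # localhost leads back to sshd (port 22) on the backup server.
    repo="ssh://${BACKUP_USER}@localhost:${PULL_PORT}${repo_path}"
    # ssh joins the remote arguments with spaces and hands them to the
    # remote shell, so paths containing spaces would need extra quoting.
    $SSH -R "${PULL_PORT}:localhost:22" "$host" \
        borg create "${repo}::${archive}" "$@"
}
```

With a stable port the repository location no longer changes between runs, which is the condition named above for dropping BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK from the permanent setup.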

@sudoman

sudoman Oct 29, 2016

Contributor

While having documentation for this workaround is great, wouldn't it be better to add this functionality to borg itself? This kind of syntax would be awesome:

$ borg create /path/to/repo::example.com-now user@example.com:/
@cwebber

cwebber Nov 19, 2016

I agree with @sudoman; it would be useful to know, are there architectural reasons this would be difficult? It feels like this would dramatically increase the number of scenarios for which Borg is a recommendable solution.

@textshell

textshell Nov 19, 2016

Member

We first need to agree on a plan. For example, i don’t like overloading the directory argument of borg create with additional magic. I personally lean toward a new subcommand.

(Note: server is where the repo is, client is the remote system that holds the data to be backed up.)

Also, as this is not the main use case for borg, i think the design should minimize the changes needed. Thus i think the stdin, stdout and stderr of the ssh session should be used for the UI of the borg client on the remote system, not for data transfer, so that all interaction still works as expected. The repository communication would therefore need to be tunneled through an additional unix socket forward. I’m not sure what to do about borg serve’s stderr. Maybe it’s ok to just (implicitly) splice that in on the server side.

One way to implement borg pull would be to create a unix socket and listen on it, then ssh to the system to be backed up and run borg create there with special options that tell it which unix socket to connect to. The server would then wait for a connection on the unix socket, dup2 it to stdin and stdout, and invoke the RepositoryServer.

Still open is how key management is even supposed to work in this scenario. Maybe mandate a keyfile on the client in the default location?
We also need to ensure that there is a good way to secure this with the usual forced command stuff.

This would need minimal changes to the main borg code:

  • RemoteRepository would need to be able to connect to a unix domain socket, selected via a new option in create.
  • A new pull command needs to be implemented that does the initial setup and then chains into borg serve.

Of course this still requires a borg executable on the client.

So it doesn’t need any architectural changes, but it is a lot of fiddling with external ssh process interaction and the os module. So in the end i think it’s a task that is doable for anyone with sufficient motivation and decent Python skills.

@enkore

enkore Nov 19, 2016

Contributor

It's pretty much what I do over at borgcube. I don't mean that as advertisement (wouldn't make any sense for a project that doesn't really work yet, does it?), rather, if someone wants to implement it in their system they can draw inspiration from there - the basics (pulling an archive from a client) work rather well.

You can also gauge how many changes this would likely need; if it is integrated even more tightly into Borg itself, it would probably mean many more changes than those present in borgcube.

Which is the reason why I chose to put that into a separate project; however, if someone wants to work toward integrating it into Borg itself I won't interfere of course, since I'm obviously inherently biased here.

@cwebber

cwebber Nov 20, 2016

I'm reading the ~solution posted by @textshell again, and I'm realizing... this reintroduces to the pull model one of the issues (well, an issue depending on your setup) that I was hoping to avoid from the push model.

Consider a scenario where I have a backup machine running on, say, my local LAN. I have a lot of backups on it. I don't want the machines I'm backing up from some remote VM hosting server to have access to this machine... the trust is in the backup machine accessing the other machines, not the reverse.

In the scenario being described, it sounds like both machines will have to have access to each other.

@textshell

textshell Nov 20, 2016

Member

@cwebber The client only has access to the borg repository, not the whole backup server, in the scenario i posted (that is comparable to the access it would have in push mode with a correctly set up forced command and assuming sshd is not buggy). At least if RepositoryServer is started with restrictions to only allow access to the right borg repository and only via the socket that borg pull creates. (So no direct network or ssh access is needed.)
I think doing the chunking and deduplication (and encryption) locally on the client is one of the core parts of borg. On the other hand, it would be possible to have a pull script that creates an sshfs tunnel and does the chunking, deduplication and encryption on the server side. But i don’t think that really needs support in borg; that’s just an easy script to write, but it loses quite some of borg’s performance.
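
For reference, the forced command mentioned here is normally an authorized_keys entry on the backup server. A small sketch that prints such an entry; the repository path and the key are placeholders, the function name is made up, and the `restrict` keyword assumes a reasonably recent OpenSSH (older versions need the individual no-pty, no-port-forwarding etc. options instead).

```shell
#!/bin/sh
# Sketch only: print an authorized_keys line that limits a client key to
# "borg serve" on a single repository path.

# usage: borg_authorized_keys_line REPO_PATH PUBLIC_KEY
borg_authorized_keys_line() {
    repo_path=$1 pubkey=$2
    # command="..." forces borg serve regardless of what the client asks for;
    # --restrict-to-path keeps it inside the given repository directory.
    printf 'command="borg serve --restrict-to-path %s",restrict %s\n' \
        "$repo_path" "$pubkey"
}
```

The output would be appended to the authorized_keys file of the account that the client connects to, one line per client key.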

@horazont

horazont Dec 1, 2016

FWIW, I have made a small hack which works with socat, thus saving the SSH-in-SSH overhead and obliterating the need for the remote machine to have an account on the local machine. Using --append-only and --restrict-to-path, this should be as safe as Borg is, but I’d like any feedback on that.

First, we create socat-wrap.sh, which we will use as BORG_RSH:

#!/bin/bash
exec socat STDIO TCP-CONNECT:localhost:12345

Locally, we run socat to offer the borg service:

socat TCP-LISTEN:12345,fork \
    "EXEC:borg serve --append-only --restrict-to-path $PATH_TO_REPOSITORIES --umask 077"

(omit the ,fork if you want to allow only exactly one borg command to be run)

Now we invoke borg on the remote using ssh, forwarding the port:

ssh -R 12345:localhost:12345 sourcehost \
    BORG_RSH="/home/horazont/socat-wrap.sh" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

foo is completely arbitrary; one could substitute anything here, because the socat-wrap.sh ignores its arguments.


Of course, it’s also possible to do the same with UNIX sockets, providing more isolation.

socat-wrap.sh:

#!/bin/bash
exec socat STDIO UNIX-CONNECT:/home/horazont/borg-remote.sock

socat UNIX-LISTEN:/home/horazont/borg-local.sock,fork \
    "EXEC:borg serve --append-only --restrict-to-path $PATH_TO_REPOSITORIES --umask 077"

ssh -R /home/horazont/borg-local.sock:/home/horazont/borg-remote.sock sourcehost \
    BORG_RSH="/home/horazont/socat-wrap.sh" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

ssh is friendly enough to automatically set very strict permissions on the socket on the remote side.

@ThomasWaldmann

ThomasWaldmann Dec 1, 2016

Member

@horazont looks good. did you compare performance ssh vs. socat?

Is the socat-wrap.sh needed or could the socat command be used directly in BORG_RSH?

@horazont

horazont Dec 1, 2016

@ThomasWaldmann socat doesn’t like the additional arguments borg is attempting to add. Not sure how to circumvent that.

re performance, I haven’t checked. My main motivation for finding this solution was that I didn’t want to set up an account for the remote to SSH into (even though that should be pretty safe with authorized_keys command restrictions). The appeal is that it works out of the box, with no configuration needed on either side (socat-wrap.sh can be scp’d on demand).

@ThomasWaldmann

ThomasWaldmann Dec 1, 2016

Member

ah, of course. yeah, then such a script is the easiest way.

if you have that setup working ok, could you add a section to our docs about it and do a PR against 1.0-maint?

@fake666

fake666 Jan 14, 2017

i set up the socat-based solution from @horazont mentioned above, running nightly backups from various locations. i noticed that with larger backup targets, after a couple of days, i reproducibly get this error:

Traceback (most recent call last):
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 72, in __del__
   self.close()
 File "/opt/lib/python3.5/site-packages/borg/repository.py", line 192, in close
   self.lock.release()
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 298, in release
   self._roster.modify(EXCLUSIVE, REMOVE)
 File "/opt/lib/python3.5/site-packages/borg/locking.py", line 216, in modify
   elements.remove(self.id)
KeyError: (('storage', 31273, 0),)
$LOG ERROR Remote: Received SIGTERM.

after this happens once, the lockfile not having been deleted properly prohibits further backups...

i guess it has to do with one of the connections being closed prematurely?

edit: this error is reported on the server that "pulls" the backup from the client (i can only tell by the /opt/lib/... location - this setup is pretty confusing to debug).

@enkore

enkore Jan 14, 2017

Contributor

Maybe socat times out?

-T<timeout>
    Total inactivity timeout: when socat is already in the transfer loop and nothing has happened for <timeout> [timeval] seconds (no data arrived, no interrupt occurred...) then it terminates. Useful with protocols like UDP that cannot transfer EOF. 

Not sure if that's on by default.

@fake666

fake666 Jan 14, 2017

hm, i just realized i used kill $SOCAT_PID after the ssh command finished (i'm doing borg prune right after the backup finishes) - i replaced that with wait $SOCAT_PID now, i guess that should fix it...

thanks for the timeout hint, i now enabled socat logging with -lf and -d -d. if it happens again, we'll know for sure if there was a timeout!
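
Putting the pieces from the last few comments together, a one-shot variant of the TCP setup might look like the sketch below. It assumes socat is started without ,fork (so it exits after one connection and wait can return) and that socat-wrap.sh already exists on the source host. The function name, the variables and the log path are made up for illustration, and this has not been run against a real repository.

```shell
#!/bin/sh
# Sketch only: one-shot socat pull over a forwarded TCP port.
PORT=${PORT:-12345}
WRAPPER=${WRAPPER:-/home/horazont/socat-wrap.sh}   # BORG_RSH script on the source host
SOCAT=${SOCAT:-socat}   # set SOCAT="echo socat" / SSH="echo ssh" to preview
SSH=${SSH:-ssh}

# usage: pull_once SOURCE_HOST REPOS_DIR SOCAT_LOG REMOTE_BORG_COMMAND...
pull_once() {
    host=$1 repos=$2 log=$3
    shift 3
    # No ",fork": socat serves exactly one connection and then exits.
    # -d -d -lf writes socat's own messages to the log for later debugging.
    $SOCAT -d -d -lf "$log" "TCP-LISTEN:${PORT}" \
        "EXEC:borg serve --append-only --restrict-to-path ${repos} --umask 077" &
    socat_pid=$!
    $SSH -R "${PORT}:localhost:${PORT}" "$host" \
        "BORG_RSH=${WRAPPER}" "$@"
    status=$?
    # Wait for socat instead of killing it, so borg serve can finish and
    # release its lock. Caveat: if ssh fails before borg connects, socat
    # keeps listening and this blocks; a real script would need a timeout.
    wait "$socat_pid"
    return "$status"
}
```

A follow-up command such as borg prune would need its own pull_once call, since each call serves exactly one connection.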

@thatguystone

thatguystone Feb 8, 2017

@horazont @ThomasWaldmann

You can invoke bash directly, instead of socat-wrap.sh:

ssh -R 12345:localhost:12345 sourcehost \
    BORG_RSH="'bash -c \"exec socat STDIO TCP-CONNECT:localhost:12345\"'" \
    borg init -e none ssh://foo/$PATH_TO_REPOSITORIES/some_repository

@fake666

fake666 Mar 23, 2017

i am still having the above symptoms, so i am now trying the unix domain socket version @horazont added above. i did notice that - as far as i can tell - the order of the sockets is reversed in the "-R" ssh option - right? also, ssh leaves the socket file on the backup source host (why?), so i had to add an ssh command before the borg call w/ the forward to just remove the lingering socket file.. i'll report in a month or so if that fixes my issue ;)

@kosli

kosli Jul 23, 2017

@horazont: great solutions, thanks! works so far.
have you found a good way to run your solution with encrypted backups? am i right that the remote host would need to know the passphrase?

@kosli

kosli Oct 22, 2017

When using the remote socket I always had the problem that the remote socket file doesn't get removed automatically, and the next connection cannot create a new socket for forwarding. The behaviour can be changed in the sshd_config of the server by setting the StreamLocalBindUnlink option to yes: https://man.openbsd.org/sshd_config#StreamLocalBindUnlink
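
For the UNIX socket variant, the two observations above (the argument order of -R and the leftover socket file) could be combined roughly like this. A sketch only: ssh documents the socket form as -R remote_socket:local_socket, the function name is hypothetical, and removing the stale socket by hand is an alternative to setting StreamLocalBindUnlink yes in sshd_config on the host where the forwarded socket is created.

```shell
#!/bin/sh
# Sketch only: forward the local borg socket to the source host, cleaning
# up a socket file left over from a previous session first.
SSH=${SSH:-ssh}   # set SSH="echo ssh" to preview the commands

# usage: forward_borg_socket SOURCE_HOST REMOTE_SOCK LOCAL_SOCK REMOTE_COMMAND...
forward_borg_socket() {
    host=$1 remote_sock=$2 local_sock=$3
    shift 3
    # Without this (or StreamLocalBindUnlink), sshd cannot bind the socket
    # again if the file from the last run still exists.
    $SSH "$host" rm -f "$remote_sock" || return 1
    # Remote socket first, then the local socket that socat listens on.
    $SSH -R "${remote_sock}:${local_sock}" "$host" "$@"
}
```

The remote command would be the same as before, for example BORG_RSH=/home/horazont/socat-wrap.sh followed by the borg invocation.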

@zyro23

zyro23 Feb 5, 2018

is this issue still just to document "workarounds"?

would love to see this as a feature, perhaps like @sudoman suggested above (#900 (comment)) - or similar.

i am currently testing borg as a replacement candidate for rdiff-backup.

but just had to give up trying to do a root partition backup of a small (~1.5GB) linux server installation over sshfs. speed of sshfs using a remote "high" latency connection (25ms, 100Mbit) is just too slow. transfer rate went down to a few kB/s for directories with lots of small files (no cpu or disk-io bottlenecks) 😞

@ThomasWaldmann

ThomasWaldmann Feb 6, 2018

Member

@zyro23 maybe one can't expect good performance for lots of little files on a network filesystem. but if you have reasons to believe that sshfs is unusually slow for this and should be faster, file a bug at sshfs project.

the best and fastest way to run borg is client/server, when borg on the client reads source files locally and then talks via ssh using borg's rpc protocol to a borg process on the backup server, which manages the repo.

@zyro23

zyro23 Feb 6, 2018

yes, that's my current understanding. i commented here because we are doing pull-style backups of "external" hosts to a machine within an internal network (firewalled to allow only outbound connections). thanks for your feedback!

@Alex131089

Alex131089 Mar 6, 2018

Hello,
based on @horazont's scripts, I wrote https://github.com/Alex131089/bbbs.
While I'm really enthusiastic about @marcpope's BBS announced in #2960 (comment) (which seems to use this mode), I needed a solution before my server eventually crashes, so I wrote this.
Maybe it can be useful to someone else.

@marcpope

marcpope Mar 6, 2018

@anarcat

anarcat May 29, 2018

Contributor

what's the latest and greatest here? I see @marcpope bbbs wrapper and @enkore's borgcube (although that seems to do much more than just pull...) both seem to be based on the same socat hack... how brittle is that setup?

@marcpope

marcpope May 29, 2018

@enkore

enkore May 29, 2018

Contributor

Actually borgcube's approach was to implement the Borg wire protocol in a reverse proxy of sorts; the proxy transparently re-encrypted data uploaded (so the client only got the ID key) and transparently created a fake manifest containing no archives (which worked because the server pushed a Borg cache matching that state exactly to the client) and meticulously verified that the client was only doing what it was supposed to be doing (creating an archive, possibly creating checkpoint archives and deleting those).

That's how borgcube managed to tick all these boxes:

  • Lightweight, untrusted clients:
    • They can't manipulate the repository
    • They don't know where the repository is
    • They don't get encryption keys
    • They can't read backups (of other clients and also themselves)
    • They don't need to maintain a Borg cache
    • They don't have to encrypt (if any of the BLAKE2 key types are used)

anarcat May 29, 2018

Contributor

enkore May 29, 2018

Contributor

You speak in the past: is the project still on? :)

No. I just checked and it tells me I didn't even upload everything I did back then (timetrace, secrets, collective-service, opub, newcli, security, tls, schedxhr, modjob, ...); I don't remember which of these I actually meant to merge going forward or were just probes to test the territory.

Interesting approach, yes, sound idea, also yes, but futile if you want to build it based on the Borg package, because the Python API is incredibly unstable. Instead the relevant structures would need to be isolated / rewritten and packaged in a stable way, however, there is the very real risk then that you get breakage whenever Borg adds a new feature and bolts on another workaround/hack relying on minute implementation details of Borg versions past and present (as I did myself many times: while Borg has version fields mostly everywhere, there is usually no real provision for extending without breaking; you'll notice this every- and anywhere you look in Borg).

Borg itself already doesn't manage to take full advantage of a gigabit connection, less so with a proxy (written in Python) in-between: The proxy handles effectively the same IO load as the repository itself, but has to replicate approximately the same work as borg check (checking all MACs, possibly re-compressing, checking that the structure is sane etc.) at the same time. It's possible to parallelize some of this, especially the heavy-lifting, even in Python, but MP in Python is just one big, unnecessary nuisance.

If I were motivated to start a new revision of this (which I'm not), I would likely start building a better foundation first.

anarcat May 29, 2018

Contributor

you mean the Borg Python API here? that's unfortunate, to say the least...

what would you suggest as the best foundation for the pull model? socat really seems like an ugly hack...

enkore May 29, 2018

Contributor

Tearing the core data structures and their tests out of Borg and stuffing them in a new package, specifically meant and maintained to provide a sane and stable API. But as I mentioned above, this is easy to do from a "take things and put them someplace else" perspective, but due to Borg development it would be difficult / dangerous to stray too far from the actual, literal implementation of Borg.

I see @marcpope bbbs

The link goes to a different project (b b b s) - the one by marcpope was bbs (borg backup server), but as far as I can see no public repository exists any more.

anarcat May 29, 2018

Contributor

ah yes, that was not @marcpope but @Alex131089.

marcpope May 29, 2018

ET-Bent Jul 23, 2018

@marcpope looks promising. Will your software be OpenSource in the end?

ThomasWaldmann Jul 23, 2018

Member

@marcpope please create a new issue for your stuff now and move all interesting content there.

m-osmani Sep 25, 2018

hello everybody,

i tried to solve the pull mode like this: https://github.com/m-osmani/borgbackup-pull.

Maybe this could be a solution for somebody.

regards

ThomasWaldmann Sep 25, 2018

Member

@m-osmani if you like, you could write some docs for pull-like operations. Then you would have some docs, documenting what you do with these scripts and we (borg) would have some docs (and could close this ticket) for people not wanting to use ansible.

m-osmani Sep 25, 2018

no problem, my idea is to write a script called "borg-backup-ctl.sh" which will produce server side client scripts which will be triggered by cron. And of course the doc. This will be more ansible independent and would work in a standalone manner.

@ThomasWaldmann can you please give me a short answer in #4085

thx

regards
