
Usage with cloud storage like Amazon S3 or Glacier #123

Open
vote539 opened this issue Jan 9, 2017 · 13 comments

@vote539

vote539 commented Jan 9, 2017

I would like to set up a BTRFS filesystem with backups, and was happy to find this project. However, I would like the backups sent to a cloud storage solution rather than to a hard drive or an SSH server. Most of these cloud storage solutions expose RESTful APIs, and you have no control over the storage medium they use on their end.

Does btrbk support sending backups to an arbitrary REST interface?

@digint
Owner

digint commented Jan 9, 2017

Does btrbk support sending backups to an arbitrary REST interface?

No, this is neither implemented nor planned. If you want to push "target raw" backups to your Amazon S3 storage, you need to mount it locally somehow. You could use s3fs for this, which should do exactly that. So your setup could be something like this:

  1. Mount Amazon S3 to /mnt/mys3drive using s3fs.
  2. Configure target raw /mnt/mys3drive/btrbk_backups/... in btrbk.conf (rough sketch below).
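Untested, but a rough sketch of that setup (bucket name, mount point and credentials file are just examples):

# mount the bucket with s3fs (bucket name and credentials file are examples)
s3fs my-bucket /mnt/mys3drive -o passwd_file=/etc/passwd-s3fs

# btrbk.conf
volume /mnt/btr_pool
  subvolume home
    target raw /mnt/mys3drive/btrbk_backups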

If you get this working, please post a note here so that I can add a section on this to the FAQ.

digint added the question label on Jan 9, 2017
@vote539
Author

vote539 commented Jan 17, 2017

Thanks for the reply! Here's what ended up working for me. AWS block storage is in the same price range per gigabyte as S3, so I created a block storage volume and formatted it as BTRFS. I attached the volume to a "nano" head node whose only job is to run btrbk. This setup gives me 500 GB of backup storage for about US$20/mo.
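Roughly, the setup looks like this (device name, mount point and hostname below are illustrative, not my exact values):

# on the backup node: format the attached block storage device as BTRFS and mount it
mkfs.btrfs /dev/xvdf
mount /dev/xvdf /mnt/backup

# on the source machine, btrbk.conf pushes snapshots to the node over SSH
volume /mnt/btr_pool
  subvolume home
    target send-receive ssh://backup-node/mnt/backup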

@Witiko
Contributor

Witiko commented May 24, 2017

Amazon S3 is quite pricey if you are looking only for long-term archival. However, services such as Amazon Glacier don't seem to be easily mountable. It would be convenient if btrbk provided a target type for piping incremental backups into arbitrary commands. Think

volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -

where {} would expand to the name of the file being passed on stdin, and where the /usr/bin/glacier command comes from basak/glacier-cli. It seems trivial to just add

btrbk run && btrfs send -p `find snapshot_dir/ -mindepth 1 -maxdepth 1 | sort | tail -2` |
  (insert a compression and encryption pipeline) |
  glacier archive upload my_vault --name=`ls snapshot_dir | tail -1`.btrfs -

to one's crontab and be done with it, but then you also need to keep a journal of unsuccessful uploads (due to the machine being offline, for example), so that everything gets backed up eventually. This is not an insurmountable task, but direct support for this kind of usage in btrbk would definitely be welcome.
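A very rough sketch of such a journal-keeping wrapper (paths, vault name and the compression step are illustrative, and it assumes at least one older snapshot exists to serve as the parent):

#!/bin/sh
# retry pending uploads, then queue and upload the newest snapshot
snapdir=/mnt/btr_pool/.snapshots
journal=/var/lib/btrbk/glacier.pending

btrbk run || exit 1

# remember the newest snapshot so that a failed upload is retried on the next run
latest=$(ls "$snapdir" | tail -1)
grep -qxF "$latest" "$journal" 2>/dev/null || echo "$latest" >> "$journal"

# try to upload every snapshot that is still pending
for snap in $(cat "$journal"); do
  parent=$(ls "$snapdir" | grep -x -B1 "$snap" | head -1)
  if btrfs send -p "$snapdir/$parent" "$snapdir/$snap" \
       | gzip \
       | glacier archive upload my_vault --name="$snap.btrfs.gz" -
  then
    sed -i "\|^$snap\$|d" "$journal"   # drop the entry once the upload succeeded
  fi
done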

@digint
Owner

digint commented May 31, 2017

This is a nice idea, but it's incomplete: as btrbk is stateless, it always needs information about which subvolumes are already present on the target side. For target send-receive, this information is fetched via btrfs subvolume list; for target raw, the UUIDs are encoded in the filenames.

In order to complete this, we should define some data structure: timestamp, UUID, received-UUID, parent-UUID (similar to btrfs subvolume list), and also have a user-defined command which generates it. btrbk would then parse this data and figure out which subvolumes need to be sent to the target according to the configured target_preserve policy, and which parents to pick for incremental send.
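Purely as an illustration (field names and format are just a sketch, not anything btrbk understands today), the user-defined command could print one record per subvolume present on the target:

name=home.20170524  timestamp=20170524T0000  uuid=...  received_uuid=...  parent_uuid=...
name=home.20170531  timestamp=20170531T0000  uuid=...  received_uuid=...  parent_uuid=...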

PS: sorry for the late reply, I'm really busy with other things at the moment...

@Witiko
Contributor

Witiko commented May 31, 2017

My original idea was that btrbk would keep tabs on the successful invocations to automatically infer which volumes need sending. If /usr/bin/glacier archive upload my_vault --name={} - from my example returned with a zero exit code, btrbk would add {} to a list. Note that the user could specify where they want this list stored:

volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
    journal /var/lib/btrbk/glacier

Deleted subvolumes could be removed from the list, so that it does not grow ad infinitum.
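The journal itself could be as simple as one successfully uploaded snapshot name per line, for example:

home.20170524
home.20170531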

@digint
Owner

digint commented May 31, 2017

Yeah well, but then people start deleting files on the target by hand, and the mess with the journal starts...

I guess Glacier also provides some sort of directory listing, so if btrbk generated the filenames the same way as it does for target raw, we could always fetch and parse them in the same way.

volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
      list_cmd /usr/bin/glacier <insert list command here> my_vault

@Witiko
Contributor

Witiko commented May 31, 2017

That would be /usr/bin/glacier archive list my_vault in this case. However, my idea was that the pipe target would be a fire-and-forget kind of thing. If the user wants to start deleting data from the target, that is not our problem. Suppose I am just piping the data to a mail transfer agent over SMTP, or to a remote shell; I may well not be able to report on what is stored on “the other side”. I find this concept more flexible than what you propose.

P.S.: I guess target pipe is a slightly confusing name, as it implies that the target is a named pipe. Both target command and target pipeline resolve this ambiguity.

@digint
Owner

digint commented Jun 1, 2017

However, my idea was that the pipe target would be a fire-and-forget kind of thing

Yes, I understand, and I see the benefit in this, but that's not how btrbk works. Maybe we could introduce a new sub-command for this kind of thing, something like btrbk oneshot, which would simply create a new snapshot and transfer it (always non-incremental) to the target. The main problem here would be to keep the config consistent and non-confusing. Maybe something like this:

volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
      target_type oneshot

@Witiko
Contributor

Witiko commented Jun 1, 2017

and transfer it (always non-incremental) to the target.

Note that keeping a journal would make it possible to transfer incremental backups even in this setting.
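For example (journal path, snapshot directory and naming are illustrative), the parent for the incremental send could simply be the last entry known to have been uploaded:

snapdir=/mnt/btr_pool/.snapshots
latest=$(ls "$snapdir" | tail -1)
parent=$(tail -1 /var/lib/btrbk/glacier)   # last snapshot known to be on the target
btrfs send -p "$snapdir/$parent" "$snapdir/$latest" |
  glacier archive upload my_vault --name="$latest.btrfs" -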

@sbrudenell
Contributor

s3fs

I've been trying to get this to work. There are a number of issues.

  • FUSE is an operational burden, and Docker doesn't help.
  • s3fs is not production-quality
    • After weeks of testing, I haven't been able to use it to upload large files.
    • s3fs' cache options do not play well with btrbk
      • s3fs has a metadata cache, but cat /s3/file; cat /s3/file will still issue two HeadObject requests. This is bad with btrbk as it reads all the *.info files on a raw target on every run.
      • s3fs will cache huge amounts of data to disk during file uploads, rather than streaming them
    • It's not clear s3fs issues will be resolved. Its codebase is undocumented, has heavy copy-paste duplication, uses non-meaningful naming schemes, and interlaces high-level business logic with utility functions. A large portion of it is dedicated to complex ad-hoc manipulations of a userspace cache. The design of this cache is questionable, and I certainly can't get it to perform well. Its user documentation is incoherent.

It would be a huge win if btrbk could use S3 APIs directly. Dozens of cloud providers expose an S3 API now.

The S3 API is a large surface, though. Minimal S3 support probably still requires multiple signature versions, autodetection of multipart uploads, and likely other things.

In the meantime, I suggest btrbk.conf should offer a set of command endpoints, something like:

target pipe
  pipe_target_list_files /usr/local/bin/list_files_from_s3.sh my_bucket
  pipe_target_read_file /usr/local/bin/read_file_from_s3.sh my_bucket
  pipe_target_write_file /usr/local/bin/write_file_to_s3.sh my_bucket

The expected interactions would then mirror target raw: the scripts would be used to read and write the *.info files in the same patterns that are currently used.
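As a strawman, those endpoints could be thin wrappers around the official AWS CLI, which already handles request signing and multipart uploads (script names, bucket and key prefix are illustrative; assume btrbk appends the file name as the last argument):

# list_files_from_s3.sh <bucket> -- print one object key per line
aws s3 ls "s3://$1/btrbk/" | awk '{print $4}'

# read_file_from_s3.sh <bucket> <file> -- stream an object to stdout
aws s3 cp "s3://$1/btrbk/$2" -

# write_file_to_s3.sh <bucket> <file> -- stream stdin into an object
aws s3 cp - "s3://$1/btrbk/$2"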

@lpyparmentier

lpyparmentier commented Jun 6, 2023

Looking for a similar solution: I just want to push an encrypted archive of a snapshot into S3-compatible long-term storage such as https://www.ovhcloud.com/en-ca/public-cloud/cold-archive/. At $2/month/TB it's worth it! I guess I can do it another way, but having it directly integrated with btrbk is a must.

@bojidar-bg

Hmm, instead of implementing the whole S3 API ourselves or jumping the gun with custom scripts, how about adding rclone support for uploading and managing files? It seems to have all the necessary commands, e.g.:

  • rclone rcat -- can be used to pipe directly into storage.
  • rclone lsf -- can be used to list current archives in storage.
  • rclone cat -- can be used to pipe directly out of storage.

The only downside is that rclone has its own config format, which might make it messier than just allowing custom scripts.
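For example, a raw-style backup round-trip could look roughly like this (remote name, bucket, path and snapshot names are placeholders):

# push one snapshot as a compressed stream
btrfs send /mnt/btr_pool/.snapshots/home.20230606 \
  | zstd \
  | rclone rcat s3remote:my-bucket/btrbk/home.20230606.btrfs.zst

# list what is already on the remote
rclone lsf s3remote:my-bucket/btrbk/

# restore: stream an archive back and receive it
rclone cat s3remote:my-bucket/btrbk/home.20230606.btrfs.zst \
  | zstd -d \
  | btrfs receive /mnt/btr_pool/restore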

@kubrickfr

Shameless plug: my simple solution to this problem, https://github.com/kubrickfr/btrfs-send-to-s3
