Like zfs-auto-snapshot or syncoid, except that it
- does a recursive snapshot (so all datasets get the same timestamp)
- does an incremental replication push (so no arguments about who expired what)
- does timestamp-based snapshot names (YYYY-MM-DD...), not rotation-based snapshot names (daily.N).
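The first two points above can be sketched in a few lines of Python. This is a hypothetical illustration, not the tool's actual code; the function names are this example's own:

```python
# Hypothetical sketch: one recursive snapshot gives every dataset in the
# tree the same timestamp-based (RFC 3339-style) name.
import datetime
import subprocess


def snapshot_name(now=None):
    """Build a timestamp snapshot name, e.g. 2024-01-15T13:00:00."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S")


def snapshot_recursive(dataset, dry_run=True):
    """Run 'zfs snapshot -r DATASET@NAME'; -r makes all child datasets
    get the identical snapshot name (and thus the same timestamp)."""
    cmd = ["zfs", "snapshot", "-r", f"{dataset}@{snapshot_name()}"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```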
Limitations (permanent):
- because of `zfs send --replicate`, native encryption must be done right at the start. You can't encrypt only on the `zfs receive` host.
- because of `zfs send --replicate`, you cannot have different retention policies on the backup server.
- because of `zfs snapshot -r` (recursive), you cannot "opt out" of snapshots for a boring dataset (`com.sun:auto-snapshot=false`, e.g. zippy/zippy/var/tmp or zippy/zippy/var/cache).
- strongly encourages you not to start your datasets at the root of your pool (i.e. it wants `-o mountpoint=/` on "zippy/zippy", not on "zippy").
- it's "some rando's crappy script", whereas zfs-auto-snapshot is a first-party OpenZFS thing with more mindshare.
Limitations (to be fixed):
- Retention policy is hard-coded (days, weeks, months = 31, 12, 36)
- No pre/post commands (e.g. for mariadb quiescence)
- Can't `--action=push` without `--action=snapshot`, because it will try to push a snapshot that doesn't exist. (push.py should just send "whatever the latest snapshot is", I think, i.e. `zfs list -Hp -rd1 -S creation -t snapshot -o name | head -1`)
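A sketch of that last fix (hypothetical code, not in push.py yet): instead of assuming `--action=snapshot` just created a snapshot, ask zfs what the newest snapshot of the dataset actually is.

```python
# Hypothetical sketch of the proposed push.py fix: push whatever the
# newest snapshot is, rather than one we assume was just created.
import subprocess


def parse_latest_snapshot(zfs_list_output):
    """'zfs list -Hp -rd1 -S creation -t snapshot -o name' sorts
    newest-first (-S creation), so the first line is the latest."""
    lines = [l for l in zfs_list_output.splitlines() if l.strip()]
    return lines[0] if lines else None


def latest_snapshot(dataset):
    out = subprocess.run(
        ["zfs", "list", "-Hp", "-rd1", "-S", "creation",
         "-t", "snapshot", "-o", "name", dataset],
        check=True, capture_output=True, text=True).stdout
    return parse_latest_snapshot(out)
```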
Run it by hand from the git clone:
$ python3 -m cyber_zfs_backup --help
Or make it into a deb package:
$ apt-get build-dep ./
$ debuild
# apt install ../python3-cyber-zfs-backup_…_all.deb
# cyber-zfs-backup --help
The .deb provides a systemd timer (cron job).
To change when the job runs:
# systemctl edit cyber-zfs-backup.timer
[Timer]
OnCalendar=
OnCalendar=13:00
To change what the job runs:
# systemctl edit cyber-zfs-backup
[Service]
ExecStart=
ExecStart=cyber-zfs-backup --dataset=morpheus/my-funny-dataset-name
If you don't need push support, add "--action snapshot expire".
If you need push support, set up SSH:
The deb includes some examples in /etc/ssh/. To use them, add "--ssh-config=/etc/ssh/cyber-zfs-backup.ssh_config".
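The shipped examples are authoritative; as a rough idea only, a stanza in that ssh_config might look something like this (hostname and filenames here are guesses, check the installed files):

```
Host offsite
    HostName offsite.example.com
    User root
    IdentityFile /etc/ssh/cyber-zfs-backup.id_ed25519
    UserKnownHostsFile /etc/ssh/cyber-zfs-backup.known_hosts
    BatchMode yes
```

`BatchMode yes` makes unattended runs fail fast instead of prompting; the interactive host-key check below overrides it with `-o BatchMode=no`.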
Make sure the remote host trusts /etc/ssh/cyber-zfs-backup.id_ed25519. ("ssh-copy-id -i", or edit authorized_keys by hand).
FIXME: work out something like "restrict,command=rrsync -rw /".
Make sure the local host trusts the remote host's host keys. Something like this:
# ssh -F /etc/ssh/cyber-zfs-backup.ssh_config -o BatchMode=no offsite
The authenticity of host 'offsite.example.com (172.16.17.18)' can't be established.
ECDSA key fingerprint is SHA256:deadbeefbabedeadbeefbabedeafbeefbabedeadbee.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '203.7.155.208' (ECDSA) to the list of known hosts.
Let each host have a hostname. Let each host have a ZFS pool. Let each host have a subset of that pool for its own datasets. By default assume these names all match.
For example, on the host "zippy" we have a pool "zippy" and a tree within that "zippy".
root@zippy:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zippy  5.45T   797G  4.68T        -         -     0%    14%  1.00x  ONLINE  -
root@zippy:~# zfs get mountpoint zippy/zippy zippy/zippy/home zippy/zippy/root
NAME              PROPERTY    VALUE  SOURCE
zippy/zippy       mountpoint  /      local
zippy/zippy/home  mountpoint  /home  local
zippy/zippy/root  mountpoint  /root  local
Each host shall make daily snapshots of its own dataset (A/A) with RFC 3339 names. Each host shall expire those snapshots according to its own expiry preferences (e.g. 7 dailies, 4 weeklies, 12 monthlies, and infinite yearlies).
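One way such an expiry policy could look in Python (a hedged sketch; the real tool currently hard-codes different numbers, and this helper is this example's own invention):

```python
# Sketch of a daily/weekly/monthly/yearly keep policy over RFC 3339
# snapshot names. Counts are illustrative, not the tool's defaults.
import datetime


def keep_set(names, dailies=7, weeklies=4, monthlies=12):
    """Given names like '2024-01-15T13:00:00', return the subset to KEEP:
    the newest snapshot per day/ISO-week/month for the most recent N
    buckets, plus the newest per year forever. The rest may be expired."""
    dated = sorted((datetime.datetime.strptime(n[:19], "%Y-%m-%dT%H:%M:%S"), n)
                   for n in names)
    keep = set()
    for key, limit in ((lambda d: d.date(), dailies),
                       (lambda d: tuple(d.isocalendar()[:2]), weeklies),
                       (lambda d: (d.year, d.month), monthlies),
                       (lambda d: d.year, None)):   # yearlies: no limit
        newest = {}
        for when, name in dated:      # oldest-first, so later overwrites
            newest[key(when)] = name  # leave the newest name per bucket
        buckets = sorted(newest)
        if limit is not None:
            buckets = buckets[-limit:]  # only the most recent buckets
        keep.update(newest[b] for b in buckets)
    return keep
```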
No host shall expire snapshots from backups of another host (A/B).
If host A backs up host B, it logically ends up in A's A/B tree. For example, zippy is backing up these hosts:
root@zippy:~# zfs get refer -t filesystem
NAME             PROPERTY    VALUE  SOURCE
zippy/host02     referenced  24.0G  -
zippy/host03     referenced  26.3G  -
zippy/host05     referenced  38.9G  -
zippy/host06     referenced  12.2G  -
zippy/mdhcp      referenced  96K    -
zippy/storage01  referenced  16.6G  -
[...]
Typically A.example.com backs up B.example.com, so for laziness we omit all the domains. If we back up a host from a "foreign" domain, include it? Or should we use FQDNs throughout? (Can pools and datasets have "aliases", so that both FQDN and unqualified names work?)
All backups shall be "replication" backups, i.e. if A/A has twelve snapshots, then the backup on B/A must also have exactly those twelve snapshots.
Backups shall always be made over ssh. For now, backups shall be push-based (not pull-based).
Backups shall be incremental except for the initial backup. To compute the latest shared snapshot, the sender shall SSH into the receiver and ask "what snapshots do you have?" It SHALL NOT simply guess. If the sender and receiver both have data (i.e. the initial backup has finished) AND have no snapshots in common, the backup process should abort noisily rather than send a non-incremental (full) stream.
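The rule above can be sketched as a small helper (hypothetical code, mirroring the stated policy rather than the tool's actual implementation):

```python
# Sketch: pick the incremental base from the receiver's snapshot list,
# aborting noisily instead of falling back to a full send.
def incremental_base(sender_snaps, receiver_snaps):
    """Both arguments are lists of snapshot names, oldest-first.
    Returns the newest common snapshot to use as the -i base, None if
    this is the initial backup, or raises if the receiver has data but
    shares no snapshot with the sender."""
    common = set(sender_snaps) & set(receiver_snaps)
    if common:
        # newest common snapshot = the one latest in the sender's order
        return max(common, key=sender_snaps.index)
    if receiver_snaps:
        raise RuntimeError("receiver has snapshots but none in common; "
                           "refusing to send a non-incremental stream")
    return None  # initial backup: a full send is expected
```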
We use an easy-to-parse timestamp format in the snapshot name. Why don't we just parse `zfs list -t snapshot -o creation`? Because that outputs a timestamp format that is GARBAGE and impossible to parse safely.
UPDATE: just pass "-p" to zfs list, and you get epoch time.
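To illustrate the UPDATE: with `-p`, the creation property comes out as epoch seconds, which parses trivially (the sample output in the test is made up):

```python
# Illustration: parse 'zfs list -Hp -t snapshot -o name,creation'
# output, where -H gives tab-separated fields and -p gives epoch time.
import datetime


def parse_creations(zfs_list_output):
    """Return a list of (snapshot name, UTC datetime) pairs."""
    result = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        name, epoch = line.split("\t")
        result.append((name, datetime.datetime.fromtimestamp(
            int(epoch), tz=datetime.timezone.utc)))
    return result
```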
Currently we just run zfs and parse the output, like savages. We should use ZCP instead and get more atomicity.
- https://openzfs.org/wiki/Projects/ZFS_Channel_Programs
- https://www.delphix.com/blog/delphix-engineering/zfs-channel-programs
- https://zfsonlinux.org/manpages/0.8.4/man8/zfs-program.8.html
- https://github.com/openzfs/zfs/blob/master/contrib/zcp/autosnap.lua cf. https://github.com/zfsonlinux/zfs-auto-snapshot/blob/master/src/zfs-auto-snapshot.sh
UPDATE: this is a non-starter. There is no access to date/time functions, so there is no viable way to implement a retention policy inside a ZFS channel program:
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
1	EPERM
10	ECHILD
11	EAGAIN
12	ENOMEM
122	EDQUOT
125	ECANCELED
13	EACCES
14	EFAULT
15	ENOTBLK
16	EBUSY
17	EEXIST
18	EXDEV
19	ENODEV
2	ENOENT
20	ENOTDIR
21	EISDIR
22	EINVAL
23	ENFILE
24	EMFILE
25	ENOTTY
26	ETXTBSY
27	EFBIG
28	ENOSPC
29	ESPIPE
3	ESRCH
30	EROFS
31	EMLINK
32	EPIPE
33	EDOM
34	ERANGE
35	EDEADLK
36	ENAMETOOLONG
37	ENOLCK
4	EINTR
5	EIO
6	ENXIO
7	E2BIG
8	ENOEXEC
9	EBADF
95	ENOTSUP
Lua 5.2	_VERSION
function: 00000000019d218e	select
function: 00000000080fb5bf	rawequal
function: 0000000009baa4e1	getmetatable
function: 000000001d6dda9e	rawlen
function: 0000000023b33d74	error
function: 000000002ab2ecbf	ipairs
function: 0000000039f68134	collectgarbage
function: 000000004021bc73	type
function: 000000004d872795	pairs
function: 000000006d63bf09	tostring
function: 00000000bcbba06f	rawset
function: 00000000c3939123	rawget
function: 00000000cf76f1f1	tonumber
function: 00000000def8c887	assert
function: 00000000f6551542	next
function: 00000000f6f338ae	setmetatable
table: 00000000210f0f67	_G
table: 0000000024f294a7	coroutine
table: 0000000054256033	string
table: 000000006ad4255f	zfs
table: 00000000c49f1579	table
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.coroutine) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 0000000061b2c387	create
function: 00000000661ce1c8	resume
function: 000000006ba739ac	running
function: 00000000abd16109	status
function: 00000000adc4bf6c	yield
function: 00000000dae48116	wrap
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.string) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 0000000036b575b7	reverse
function: 0000000043205ae5	len
function: 000000005b799fc2	gmatch
function: 0000000060623f9f	lower
function: 000000007ea57532	format
function: 000000009f43d105	char
function: 00000000b53b8e9f	upper
function: 00000000ca8fc3f6	sub
function: 00000000cae83a1e	byte
function: 00000000d56ed26c	gsub
function: 00000000e022c71e	rep
function: 00000000ed52ab72	find
function: 00000000f603fa1f	match
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.table) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 0000000025c2df24	concat
function: 00000000469d5d8d	insert
function: 00000000bb53dc13	sort
function: 00000000bca821b7	unpack
function: 00000000eb0870da	remove
function: 00000000fe629b2f	pack
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.zfs) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 00000000489d3d0f	exists
function: 0000000081600378	debug
function: 00000000b872b118	get_prop
table: 000000009a0e61fa	list
table: 00000000a5fe73ef	sync
table: 00000000aadedb25	check
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.zfs.list) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 000000005ddfc3ad	clones
function: 00000000974d946f	properties
function: 000000009c0bdb8f	children
function: 000000009e00ec0f	system_properties
function: 00000000c4414610	snapshots
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.zfs.sync) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 000000006dddd4e0	destroy
function: 00000000c2f864fa	promote
function: 00000000d039fee1	rollback
function: 00000000e27c65ce	snapshot
# zfs program -j -n omega /dev/stdin <<< 'local s="" for k,v in pairs(_G.zfs.check) do s = s .. tostring(v) .. "\t" .. tostring(k) .. "\n" end return s' | jq --raw-output .return | sort
function: 000000000ff6dc42	snapshot
function: 00000000ea051e3e	rollback
function: 00000000ebbecb73	destroy
function: 00000000ef0734c9	promote
Also (aside): if you blow the stack, "zfs program" segfaults (meh), you get errors in dmesg, and all subsequent "zfs" commands block in D state until you reboot!
## DO NOT RUN THIS DANGEROUS CODE!
# <RhodiumToad> but I bet they didn't know about how gsub allocates a ton of stack
# <RhodiumToad> there are three problematic functions that can do this, gsub is one of them
# <RhodiumToad> hm, the table.concat one might not work on 5.2
local function f(s) s:gsub(".", f) return "x" end f("foo")
return tostring(setmetatable({},{__tostring=function(t) string.format("%s",t) end}))
FIXME: more discussion here.