
Send stream could be sent compressed if the dataset is compressed #60

Closed
Erisa opened this issue Mar 22, 2022 · 4 comments · Fixed by #63


Erisa (Contributor) commented Mar 22, 2022

If the dataset being backed up has a compression property set to anything other than off, the default behaviour of zfs send is to decompress on the fly and send the full uncompressed dataset.

Simply adding the -c, --compressed flag to zfs send causes the stream to be sent compressed instead, which takes up significantly less space on the remote. In my case this reduced a full backup of a PostgreSQL database from 56 GB to 24 GB.

I added this flag to my personal fork in Erisa@c192333 and noticed no regressions or repercussions. However, since users may not have compression enabled on their datasets, or may not want this behaviour to change across versions, I believe the best way forward would be to add a zfs_uploader config variable that enables the compressed flag.
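
To illustrate the proposal, here is a minimal sketch of how the opt-in could be wired, assuming a hypothetical send_compressed boolean in the zfs_uploader config (the real option name and plumbing would be decided in a PR):

```python
def build_send_cmd(filesystem, snapshot_name, send_compressed=False):
    """Build a `zfs send` command, optionally adding -c/--compressed.

    `send_compressed` stands in for a hypothetical config option; it is
    not an existing zfs_uploader setting.
    """
    cmd = ['zfs', 'send']
    if send_compressed:
        cmd.append('--compressed')  # send blocks as they are stored on disk
    cmd.append(f'{filesystem}@{snapshot_name}')
    return cmd
```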

ddebeau (Owner) commented Mar 25, 2022

Thanks for the interest! We set the raw flag (-w) with zfs send, which is equivalent to -Lec for unencrypted datasets. The raw flag is required for sending encrypted datasets.

https://openzfs.github.io/openzfs-docs/man/8/zfs-send.8.html#w

cmd = ['zfs', 'send', '-w', f'{filesystem}@{snapshot_name}']

I'm not sure why your dataset would be sent uncompressed. Which version of ZFS are you using?
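
For what it's worth, one quick way to see whether the raw flag changes the stream size is to compare dry-run estimates with and without -w. A minimal sketch (the dataset name is a placeholder):

```python
import subprocess

def estimate(flags, snapshot='mypool/data@snap'):
    """Return the estimated stream size in bytes from a zfs send dry run."""
    cmd = ['zfs', 'send', '--parsable', '--dryrun', *flags, snapshot]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # With --parsable, the final line is "size <bytes>".
    return int(out.stdout.splitlines()[-1].split()[1])

print('plain:', estimate([]))      # estimate for an uncompressed stream
print('raw  :', estimate(['-w']))  # should reflect the on-disk (compressed) size
```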

Erisa (Contributor, Author) commented Mar 25, 2022

Interesting, I had looked at the raw flag but didn't realise it also affects compression for unencrypted datasets.

The ZFS version I was/is using for that is admittedly a little old since it's from the Ubuntu 20.04 repos:

zfs-0.8.3-1ubuntu12.13
zfs-kmod-0.8.3-1ubuntu12.13

Perhaps a newer version handles it better? The dataset in question is unencrypted and has compression=lz4.

When I ran it without the -c flag, it attempted to send the full uncompressed size:

time=2022-03-19T05:53:28.621 level=INFO filesystem=rpool/synapse snapshot_name=20220319_055300 s3_key=rpool/synapse/20220319_055300.full progress=1% speed="29 MBps" transferred="309/56196 MB" time_elapsed=0m

That size went down to 24837 MB after adding -c to the code, which lines up with the compressed size of the dataset at the time.
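
As a rough cross-check, the compressed and uncompressed footprints of the dataset can be read from its properties. A small sketch using the dataset from the logs above (note that used covers the whole dataset including snapshots, so it is only an approximation of a single stream's size):

```python
import subprocess

# Print the compressed (used), uncompressed (logicalused), and ratio
# (compressratio) properties for the dataset discussed above.
out = subprocess.run(
    ['zfs', 'get', '-Hp', '-o', 'property,value',
     'used,logicalused,compressratio', 'rpool/synapse'],
    capture_output=True, text=True, check=True)
for line in out.stdout.splitlines():
    prop, value = line.split('\t')
    print(prop, value)
```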

ddebeau (Owner) commented Mar 26, 2022

Could you check the file size in the S3 bucket? The file size should not change when adding -c.

I think the problem is that we're not setting -w when we're calculating the snapshot size:

def get_snapshot_send_size(filesystem, snapshot_name):
    cmd = ['zfs', 'send', '--parsable', '--dryrun',
           f'{filesystem}@{snapshot_name}']
    out = subprocess.run(cmd, **SUBPROCESS_KWARGS)
    return out.stdout.splitlines()[1].split()[1]

def get_snapshot_send_size_inc(filesystem, snapshot_name_1, snapshot_name_2):
    cmd = ['zfs', 'send', '--parsable', '--dryrun', '-i',
           f'{filesystem}@{snapshot_name_1}',
           f'{filesystem}@{snapshot_name_2}']
    out = subprocess.run(cmd, **SUBPROCESS_KWARGS)
    return out.stdout.splitlines()[1].split()[1]
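
A minimal sketch of the kind of change this suggests, namely passing the same raw flag to the dry run so the estimate matches the stream that is actually sent (the actual change is in #63; SUBPROCESS_KWARGS below is just a stand-in for the repo's own settings):

```python
import subprocess

# Stand-in for the repo's SUBPROCESS_KWARGS; the real values live in zfs_uploader.
SUBPROCESS_KWARGS = dict(capture_output=True, text=True, check=True)

def get_snapshot_send_size(filesystem, snapshot_name):
    """Estimate the size of a full raw send, matching the -w used for the real send."""
    cmd = ['zfs', 'send', '--parsable', '--dryrun', '-w',
           f'{filesystem}@{snapshot_name}']
    out = subprocess.run(cmd, **SUBPROCESS_KWARGS)
    # With --parsable, the "size <bytes>" line carries the estimate.
    return out.stdout.splitlines()[1].split()[1]

# The incremental variant would add -w the same way.
```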

Erisa (Contributor, Author) commented Mar 26, 2022

You're right. In the past I was cancelling the job before it could finish uploading, which is why I never saw the difference; my bad.

Running it with my latest incremental snapshot and leaving it to complete shows that it is indeed only the estimate that's incorrect, and the compressed dataset is what's sent:

time=2022-03-26T15:04:01.669 level=INFO filesystem=rpool/synapse snapshot_name=20220326_150200 s3_key=rpool/synapse/20220326_150200.inc progress=44% speed="30 MBps" transferred="3613/8200 MB" time_elapsed=2m
time=2022-03-26T15:04:06.673 level=INFO filesystem=rpool/synapse snapshot_name=20220326_150200 s3_key=rpool/synapse/20220326_150200.inc progress=46% speed="30 MBps" transferred="3764/8200 MB" time_elapsed=2m
time=2022-03-26T15:04:10.644 level=INFO filesystem=rpool/synapse snapshot_name=20220326_150200 s3_key=rpool/synapse/20220326_150200.inc msg="Finished incremental backup."
time=2022-03-26T15:04:10.644 level=INFO filesystem=rpool/synapse msg="Finished job."

It estimated the size at 8200 MB, but the last progress report shows only 3764 MB transferred.

And the object is ~4 GB:
[screenshot: S3 object listing showing an object of ~4 GB]
