
Lvm plus rsync #314

Open
AboBuchanan opened this issue Sep 24, 2020 · 14 comments

Comments

@AboBuchanan

Hi,
I'm using LVM with tar at the moment. A lot of my databases are quite static and contain large blobs. Running tar with 0 (effectively not tar) backs up 122 GB in 10 minutes, while with tar compression it takes an hour.

Ideally I would like to create a new method, lvm+rsync:

Create snapshot
InnoDB recovery
rsync to the target directory (set in the config, I guess)
Finish and drop the snapshot

This could also be achieved with a command run before the snapshot is dropped, combined with excluding everything else.

My first thoughts are to expand the existing LVM plugin; a rough sketch of the flow follows.
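Roughly, in shell terms, the flow would look something like this (the volume names, mount point, and target path are only placeholders for illustration):

# create a snapshot of the MySQL logical volume
lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql
mount /dev/vg0/mysql-snap /mnt/mysql-snap
# (optionally perform InnoDB recovery against the snapshot, as the existing lvm plugin can)
# copy only changed files into the target directory
rsync -a --delete /mnt/mysql-snap/ /backups/mysql/
# finish and drop the snapshot
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap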

@soulen3
Contributor

soulen3 commented Sep 24, 2020

This could be worked into the mysql-lvm module. Here's a list of things we would need to consider.

  • Holland doesn't really have a concept of incremental backups, so you would need to put the backup somewhere besides /var/spool/holland/. (New config option like you mentioned)
  • This will mess with purging failed backups.
  • Using rsync in this way means the plugin would keep only one active copy of the data. By default, Holland completes a new backup before purging the old one.
  • I'm concerned that a corrupt backup directory would look similar to a successful one.

None of these things are deal breakers, but they should be considered.

I'm assuming your goal is to reduce backup time, is that correct? Have you tested how long it takes for rsync to run after the data has been seeded? If it only takes 10 minutes to copy all the data using tar, you're really not going to save that much time and it's going to be CPU intensive.

@AboBuchanan
Author

AboBuchanan commented Sep 24, 2020 via email

@mikegriffin
Contributor

Hello,

I'm curious if you could clarify, as I'm not sure I understood this:

effectively not tar backs up 122 GB in 10 minutes, while with tar compression it takes an hour

Do you mean to say that your backup takes 10 minutes with:

archive-method=dir

But that it takes an hour with both of these defaults:

archive-method=tar

[compression]
method = gzip
options = ""
inline = yes
split = no
level = 1

Using archive-method=tar with the default gzip compression can be quite slow, which is why Holland also supports pigz and zstd. If your "large blobs" are binary data, they are effectively already compressed, and you would not want to use any compression method besides none.
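For example, a sketch of that [compression] section using parallel gzip (pigz), assuming pigz is installed; whether your Holland version also accepts a method of none for already-compressed data is worth checking:

[compression]
method = pigz
options = ""
inline = yes
split = no
level = 1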

My question is whether you mean, as you said, that the backup is slow due to tar itself, or whether the backup is fast whenever compression is off, with either the tar or dir archive-method.

Respectfully,
Mike

@AboBuchanan
Author

AboBuchanan commented Sep 24, 2020 via email

@soulen3
Contributor

soulen3 commented Sep 24, 2020

To be honest, I forgot archive-method was an option. Does that solve your issue? I'm still not sure I understand what you're trying to accomplish. I'm assuming you're trying to reduce the amount of time the snapshot needs to stay open. Is that correct?

@mikegriffin
Contributor

Please let us know if (and maybe even how) the dir method is useful for you.

Adding a new method like a hypothetical rsync-partial is a heavy hammer, and we try to avoid confusing configuration options or footguns (understanding the data-integrity implications is a very niche concern).

I would not expect dir to be much faster than tar when the compression method is none. It is mostly useful when you want some other process (external to Holland) to avoid copying one giant file and the split option doesn't quite solve your dilemma. To be honest, a ten-minute backup sounds pretty good, and I would not expect an rsync in default mode (without -P --append) to be faster.

@AboBuchanan
Author

AboBuchanan commented Sep 24, 2020 via email

@AboBuchanan
Author

AboBuchanan commented Sep 24, 2020 via email

@soulen3
Contributor

soulen3 commented Sep 24, 2020

https://docs.hollandbackup.org/docs/provider_configs/mysql-lvm.html
Last option under mysql-lvm

archive-method = tar | dir (default: tar)

Create a tar file of the datadir, or just copy it.

After the snapshot is complete, you can run a command using after-backup-command:
https://docs.hollandbackup.org/docs/config.html#backup-set-configs
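A minimal backupset sketch combining the two (section and option names follow the docs above; the plugin name assumes the mysql-lvm provider, and the script path is just a placeholder):

[holland:backup]
plugin = mysql-lvm
after-backup-command = /usr/local/bin/sync-backup.sh

[mysql-lvm]
archive-method = dir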

@AboBuchanan
Author

AboBuchanan commented Sep 24, 2020 via email

@mikegriffin
Contributor

There are many problems with an rsync command that writes outside of the backupdir (for example, checking free disk space is no longer viable, and rsync options that avoid expensive checksums of those files are risky). There has also been a goal, whenever a snapshot is open, to close it as quickly as possible, because of the "negative scalability" of read performance as the snapshot grows.

Allowing arbitrary hooks while the snapshot is open is obviously possible, but perhaps not something that should be encouraged.

Did the archive-method=dir generally solve your issue (not wanting to untar the resulting backup during restore) with acceptable performance during the backup?

If you really do think you have some large files in the MySQL data dir that are 100% never written to, I would encourage you to test rsync performance outside of Holland, using two backups, if you have the space. Something like the following (a rough shell sketch is at the end of this comment):

  • Set Holland's lvm plugin to use dir copy instead of tar

  • When a backup is complete, copy it out of your Holland backupdir to some other location on the same mount point, or one with similar performance

  • Wait a couple of days and, when Holland is not running, rsync from the newest backup dir to the copy you preserved and see whether it is significantly faster (I think this is unlikely)

If you don't have space for three backups, or whatever such a test requires, make sure you are reading the freshest backup from the same mount point when you run the rsync test outside of Holland, so that you have the same read pressure.
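A rough shell sketch of that test (the backupset name, backup directory names, and paths in angle brackets are placeholders; adjust to your spool layout):

# right after a dir-method backup completes, seed a second copy on the same mount point
cp -a /var/spool/holland/<backupset>/<latest-backup> /var/spool/holland/rsync-test
# a couple of days later, while Holland is idle, time an rsync from the then-newest backup into the seeded copy
time rsync -a --delete /var/spool/holland/<backupset>/<newest-backup>/ /var/spool/holland/rsync-test/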

@mikegriffin
Contributor

By the way, nothing is stopping you right now from using rsync to a remote host in after-backup-command, once you switch to the dir method.

If many (or large) data files haven't changed, you should see a significant speed-up there compared to the 50 minutes you currently measure.

Whether Holland proper uses rsync or not has no impact on your external copy (unless you meant that you wouldn't store a local copy at all).
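For instance, after-backup-command could call a small wrapper script; the script name, backupset, host, and paths below are only placeholders:

#!/bin/sh
# push-backup.sh: copy the freshest on-disk backup to a remote host
rsync -a /var/spool/holland/<backupset>/<newest-backup>/ backuphost:/srv/backups/mysql/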

@AboBuchanan
Author

AboBuchanan commented Sep 25, 2020 via email

@soulen3
Contributor

soulen3 commented Sep 28, 2020

It looks like there's a 'defaults-file' option defined in the 'mysql:client' section of your backupset configuration file. 'defaults-file' isn't a valid option for that section of the config; it expects 'defaults-extra-file' if you're trying to point at a .my.cnf file.

That warning shouldn't be causing any issues, though.
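In other words, the relevant section should look roughly like this (the .my.cnf path is just an example):

[mysql:client]
defaults-extra-file = /root/.my.cnf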
