
Create backup file per db schema #64

Closed
Kusig opened this issue Nov 18, 2018 · 24 comments

@Kusig

Kusig commented Nov 18, 2018

The container currently creates one big dump file of all schemas found on the DB server.
It would be helpful to have an option to create one dump file per schema, with the schema name as part of the dump file name.

e.g. db_backup_schemaname_20181118181008.gz

@deitch
Collaborator

deitch commented Nov 19, 2018

No arguments here. What would be the API? How would you indicate you wanted "file per schema"?

I imagine it would be more work to write the tests for this than the actual implementation itself.

@Kusig
Author

Kusig commented Nov 19, 2018

A simple boolean argument should be sufficient here, e.g.:

DB_DUMP_BY_SCHEME=true
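For illustration, the flag would then just be passed to the container like any other option (DB_DUMP_BY_SCHEME is the proposed, not-yet-existing variable; the rest of the command is only an example):

# Hypothetical usage: DB_DUMP_BY_SCHEME does not exist yet; other values are examples.
docker run -d \
  -e DB_SERVER=my-db-host \
  -e DB_DUMP_TARGET=/db \
  -e DB_DUMP_BY_SCHEME=true \
  -v /local/backups:/db \
  deitch/mysql-backup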

@patakijv

patakijv commented Dec 1, 2018

It may also be good to have the following, for backing up all databases without needing to use DB_NAMES as an explicit list:

  • default exclusions for databases that should be ignored, such as 'mysql', 'information_schema', 'performance_schema', etc.
  • ability to provide an exclusion list or pattern above and beyond the default
  • ability to provide a pattern for DB_NAMES instead of just a list (e.g. for dynamically created databases, as in multi-tenant scenarios)
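For illustration, enumerating schemas with default system-database exclusions plus an optional extra pattern could look roughly like this (DB_EXCLUDE_PATTERN is a hypothetical variable; the connection variables stand in for whatever the script already uses):

# Sketch only: list all schemas except the usual system databases,
# then drop anything matching an optional user-supplied exclusion pattern.
DEFAULT_EXCLUDE='^(information_schema|performance_schema|mysql|sys)$'
EXTRA_EXCLUDE="${DB_EXCLUDE_PATTERN:-^$}"   # hypothetical variable; matches nothing by default

DB_LIST=$(mysql -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" -N -B -e 'SHOW DATABASES;' \
  | grep -Ev "$DEFAULT_EXCLUDE" \
  | grep -Ev "$EXTRA_EXCLUDE")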

@deitch
Collaborator

deitch commented Dec 14, 2018

Have you had a chance to look at the source code and API that consume the file? This is non-trivial to implement:

  • Scripts in post-backup expect to receive the location of the single dump file in DUMPFILE=<path>. Now there would be multiple, which would require changing the API (hopefully without breaking anything)
  • Scripts in pre-backup expect to receive the (future) location of the single dump file in DUMPFILE=<path>. Now there would be multiple, which would require changing the API (hopefully without breaking anything)
  • source processing (source.sh), e.g. adding other data such as WordPress files or metadata, also expects the location of the single file (which it may modify) as DUMPFILE
  • target processing (target.sh), e.g. changing the name of the upload file to something else, also expects a single file path as DUMPFILE

We would break every one of those external contract APIs, some of which would struggle with it, e.g. target rewriting. In addition, it would require some complex rewriting of how we do the final upload.
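For context, the contract today is just that single path: each pre/post-backup script is handed the single dump archive's location in DUMPFILE. A minimal post-backup script under that contract might be nothing more than this (the comment about where such scripts live is an assumption, not taken from the docs):

#!/bin/sh
# Illustrative post-backup script (placed wherever the image picks such scripts up,
# e.g. a mounted scripts directory; the exact path is an assumption here).
# The tool exports DUMPFILE pointing at the single compressed dump.
echo "backup finished: ${DUMPFILE} ($(wc -c < "${DUMPFILE}") bytes)"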

I think this goes against the grain here, which is to have a single file backup, one nice .tgz file. It may have things added via source processing, it may be manipulated via post-processing or prepared via pre-processing, it may change its final target upload name, it may limit which schemas it backs up via DB_NAMES, but it all comes down to a single file (which is eminently more manageable).

The more I think about it, the more I struggle to understand the use case:

  • if you will not be manipulating backup files after upload, why does it matter if it is one or several?
  • if you will be manipulating backup files after upload, why not separate it then?

If you really want to separate it for future processing, why not just use post-backup processing to take the single dump file, separate it into files per schema, and zip it up again?
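To sketch what that post-backup splitting could look like (an illustration only, assuming the dump was taken with --databases/--all-databases so mysqldump prefixes each schema with a "-- Current Database: `name`" comment, and that GNU csplit is available in the image):

#!/bin/sh
# Sketch: split the single gzipped dump into one .sql per schema, then repackage.
set -e
workdir=$(mktemp -d)
gunzip -c "$DUMPFILE" > "$workdir/full.sql"

# One piece per "-- Current Database:" marker; piece 00 is the global header.
( cd "$workdir" && csplit -s -z -f schema_ full.sql '/^-- Current Database:/' '{*}' )

for piece in "$workdir"/schema_*; do
  name=$(sed -n 's/^-- Current Database: `\(.*\)`$/\1/p' "$piece" | head -n 1)
  if [ -n "$name" ]; then mv "$piece" "$workdir/${name}.sql"; fi
done

# Replace the original single dump with an archive of the per-schema files.
rm -f "$workdir/full.sql" "$workdir"/schema_*
tar -C "$workdir" -czf "$DUMPFILE" .
rm -rf "$workdir"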

@Kusig
Author

Kusig commented Dec 16, 2018

Your comment is understandable; having more than one file to process sounds like a breaking change and hard to implement.

On the other hand, extracting a single schema out of the single file will not be that easy for everyone either, and that is what would be helpful in my case. I sometimes have to deal with the backups of the schemas differently, and that would make it easy.

BTW, another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file. That way it would stay one file, but it would just contain one file per schema. This would make the post-backup processing much easier.

@deitch
Collaborator

deitch commented Dec 17, 2018

I sometimes have to deal with the backups of the schemas differently, and that would make it easy.

Definitely understand. There is a difference between "single file for multiple schemas zipped up" and "multiple files, one per schema, zipped up".

another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file

This sounds like it might be a good path. Either way, you get a single .gz file: by default it contains a single backup, and if you enable the flag, it contains one file per schema, all inside that same single .gz.

It looks like it is doable entirely within the scope of https://github.com/deitch/mysql-backup/blob/master/entrypoint#L181-L188

You want to take a crack at it?

@michabbb

another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file

I personally support this, too.
One of my self-written backup scripts (Perl) does exactly the same, and it is very convenient:
you deal with only one file (all backups in one), but you are able to extract just the database (a single bzip/gzip) you want to restore 😏

@deitch
Collaborator

deitch commented Jan 10, 2019

Yeah, that should be easy enough.

As far as I know, mysqldump outputs a single file, whether for one database, two, or all of them. It is possible to get separate files per table with --tab=<dir>, but not per schema. Do you know differently?
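For reference, --tab looks roughly like this; it writes one <table>.sql and one <table>.txt per table, on the server side, and needs the FILE privilege plus a server-writable directory (host, credentials, and paths below are placeholders):

# Per-table output only; not a per-schema option.
mysqldump -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" --tab=/var/lib/mysql-files mydb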

@michabbb

@deitch I guess you have to do separate mysqldump calls, one call for each database, so you have to loop. The other solution would be to split the big dump file into pieces, but to me that doesn't seem like a good idea at all 😏

@deitch
Collaborator

deitch commented Jan 10, 2019

I do not like the idea of trying to tease apart the structure of the file, and then having it fail or require rework for a newer version at some point. If MySQL or one of its tools supported it, sure. Not hacking it, though.

@michabbb

@deitch do you see a problem with looping through the databases, creating a tmp dir, putting all the dumps into it, compressing all the files into a single one, and then going on with the unmodified rest of your script? 🤔
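To make the idea concrete, a rough sketch of such a loop (illustrative only, not the actual change that went into the project; the target path is a placeholder):

# Dump each database into its own file in a temp dir, then compress them together.
workdir=$(mktemp -d)
now=$(date -u +%Y%m%d%H%M%S)

for onedb in $DB_NAMES; do
  mysqldump -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" --databases "$onedb" \
    > "$workdir/${onedb}.sql"
done

tar -C "$workdir" -czf "/backups/db_backup_${now}.tar.gz" .
rm -rf "$workdir"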

@deitch
Collaborator

deitch commented Jan 10, 2019

You mean like this?

@michabbb

@deitch yeah, I am happy to test this 😄 👍

@michabbb

@deitch one little thing I would like to recommend:

https://github.com/deitch/mysql-backup/blob/9df111989f5bc46c397f12936975ed8260b915c2/functions.sh#L121

It would be nice to have the timestamp in each of the per-schema dump file names as well, so maybe you could change

$workdir/$onedb.sql

to

$workdir/${onedb}_${now}.sql

🙏 😏
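(The ${now} variable here is assumed to be the same per-run timestamp already used for the archive name; if it were not in scope at that point, it could simply be captured once per run, for example:)

# Assumption: one timestamp per backup run, reused in every per-schema file name.
now=$(date -u +%Y%m%d%H%M%S)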

@deitch
Collaborator

deitch commented Jan 10, 2019

Why? The timestamp is already on the zipped file, and adding it inside would make extracting a bit harder.

This should go in a new issue anyway.

@michabbb

michabbb commented Jan 10, 2019

@deitch pardon, I cannot agree. A restore is something I usually do manually, because it is a critical act, so for that I don't care about the filename. The main reason is: if I extract one single database and copy it somewhere else, give it to someone else, or someone finds that single file, then that person (or I myself) has absolutely no clue when that backup was created. That simple piece of information can easily be stored in the filename, that's all. Just a very simple, practical thing. If this sounds reasonable to you, I would be happy to create a new issue for it 😏

Or maybe you meant it is harder for your script; then sorry, I didn't consider that 🙈 I only looked at this from the manual point of view.

@Kusig
Author

Kusig commented Jan 10, 2019

BTW, the completion date/time of the export is clearly declared by mysqldump itself at the end of each dump, so there's no great need to have it in the file name, I assume.

@deitch
Collaborator

deitch commented Jan 11, 2019

so there's no great need to have it in the file name, I assume

OK

@michabbb

@Kusig not everyone is happy to open files with several GB of data just to find a timestamp somewhere 😔 Not everybody knows how to do that, but everybody, really everybody, is able to read a filename, even without any technical skills. Some things in life become very hard when you work with other people who don't have your knowledge. Things like a timestamp in the filename can save you hours of your life, because you don't have to explain to someone what "tail" is, or even worse, tell a Windows user: "you want to know when the dump was made? Sure, no problem, just open the 10 GB file in Notepad and find a timestamp somewhere." That person will ask me: "are you kidding me?"

So, this is just a real-life example from my past.
Cheers

@deitch
Collaborator

deitch commented Jan 11, 2019

@michabbb is the timestamp on the .tar.gz or .tar.bz2 file insufficient? The backup itself has it.

I don't greatly object to it, but I want to understand the reasons first.

@michabbb

@deitch thanks for your interest 👍 My little story from real life was only meant to be a good example of why it can be very frustrating for someone who, unlike you or me, does not know how to handle very large files, or has never even seen a MySQL dump in his life. Don't ask why such a person has a dump in his hands, don't ask 😂

The reason is super, super simple: why should I open a large (multi-GB) file just to find a timestamp somewhere (and again: not everyone knows how to do a grep, tail or less like you and me), if that timestamp could be part of the filename itself?

You think: what's the problem? The main zipped file has the timestamp. But please understand, in real life you sometimes extract only a single database from the main file and send that file to someone else; that's nothing unusual if you work with other people 😏
Sometimes people need to do something for you because you are not at the keyboard. Then you call someone and say (just another real-life example): "hey dude, please go to this URL and send the backup from 2019-01-05 to my colleague", and this Mac or Windows user will say: "sure, but how do I know when this backup was made?" I will say: "oh, sorry, you have to open the file." He will say: "really? It is 10 GB!" I will say: "sorry, I told the project owner it would make sense to put the timestamp into each file, but the people there thought it wasn't necessary." He will say: "damn!" 😂

Maybe this sounds stupid to you all, but these things have already happened to me, and they are very frustrating, just because super simple key information has to be hunted for somewhere, even though it could simply be placed in the filename.

@deitch
Collaborator

deitch commented Jan 11, 2019

Ha! The world of non-automated systems administration. I started my career doing financial services IT, where we were obsessed with automation. Long before the term DevOps was coined, we were using that mindset: every manual task was just another reason to remove humans from the equation.

@deitch
Collaborator

deitch commented Jan 11, 2019

Hmm... beforehand, the .gz file was just a single compressed dump, not even wrapped in a tar archive, so that information was right there in the file name. I wonder if we can recreate that here...

@deitch
Collaborator

deitch commented Jan 11, 2019

I will have to do something about the restore, but it works for now...
