
Create backup file per db schema #64

Closed
Kusig opened this issue Nov 18, 2018 · 24 comments

@Kusig

Kusig commented Nov 18, 2018

The container currently creates one big dump file of all schemas found on the DB server.
It would be helpful to have an option to create one dump file per schema, with the schema name as part of the dump file name.

e.g. db_backup_schemaname_20181118181008.gz

@deitch
Collaborator

deitch commented Nov 19, 2018

No arguments here. What would be the API? How would you indicate you wanted "file per schema"?

I imagine it would be more work to write the tests for this than the actual implementation itself.

@Kusig
Author

Kusig commented Nov 19, 2018

A simple boolean argument should be sufficient here, e.g.:

DB_DUMP_BY_SCHEME=true
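For illustration, the flag would then just be passed to the container like any other option (DB_DUMP_BY_SCHEME is the proposed, not-yet-existing variable; the rest of the command is only an example):

# Hypothetical usage: DB_DUMP_BY_SCHEME does not exist yet; other values are examples.
docker run -d \
  -e DB_SERVER=my-db-host \
  -e DB_DUMP_TARGET=/db \
  -e DB_DUMP_BY_SCHEME=true \
  -v /local/backups:/db \
  deitch/mysql-backup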

@patakijv

patakijv commented Dec 1, 2018

It may also be good to have the following, for backing up all databases without needing to use DB_NAMES as an explicit list:

  • default exclusions for databases that should be ignored, such as 'mysql', 'information_schema', 'performance_schema', etc.
  • ability to provide an exclusion list or pattern above and beyond the default
  • ability to provide a pattern for DB_NAMES instead of just a list (e.g. for dynamically created databases, as in multi-tenant scenarios)
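For illustration, enumerating schemas with default system-database exclusions plus an optional extra pattern could look roughly like this (DB_EXCLUDE_PATTERN is a hypothetical variable; the connection variables stand in for whatever the script already uses):

# Sketch only: list all schemas except the usual system databases,
# then drop anything matching an optional user-supplied exclusion pattern.
DEFAULT_EXCLUDE='^(information_schema|performance_schema|mysql|sys)$'
EXTRA_EXCLUDE="${DB_EXCLUDE_PATTERN:-^$}"   # hypothetical variable; matches nothing by default

DB_LIST=$(mysql -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" -N -B -e 'SHOW DATABASES;' \
  | grep -Ev "$DEFAULT_EXCLUDE" \
  | grep -Ev "$EXTRA_EXCLUDE")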

@deitch
Collaborator

deitch commented Dec 14, 2018

Have you had a chance to look at the source code and API that consume the file? This is non-trivial to implement:

  • Scripts in post-backup expect to receive the location of the single dump file in DUMPFILE=<path>. Now there would be multiple, which would require changing the API (hopefully without breaking anything)
  • Scripts in pre-backup expect to receive the (future) location of the single dump file in DUMPFILE=<path>. Now there would be multiple, which would require changing the API (hopefully without breaking anything)
  • source processing (source.sh), e.g. adding other data such as WordPress files or metadata, also expects the location of the single file (which it may modify) as DUMPFILE
  • target processing (target.sh), e.g. changing the name of the upload file to something else, also expects a single file path as DUMPFILE

We would break every one of those external contract APIs, some of which would struggle with it, e.g. target rewriting. In addition, it would require some complex rewriting of how we do the final upload.
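For context, the contract today is just that single path: each pre/post-backup script is handed the single dump archive's location in DUMPFILE. A minimal post-backup script under that contract might be nothing more than this (the comment about where such scripts live is an assumption, not taken from the docs):

#!/bin/sh
# Illustrative post-backup script (placed wherever the image picks such scripts up,
# e.g. a mounted scripts directory; the exact path is an assumption here).
# The tool exports DUMPFILE pointing at the single compressed dump.
echo "backup finished: ${DUMPFILE} ($(wc -c < "${DUMPFILE}") bytes)"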

I think this goes against the grain here, which is to have a single file backup, one nice .tgz file. It may have things added via source processing, it may be manipulated via post-processing or prepared via pre-processing, it may change its final target upload name, it may limit which schemas it backs up via DB_NAMES, but it all comes down to a single file (which is eminently more manageable).

The more I think about it, the more I struggle to understand the use case:

  • if you will not be manipulating backup files after upload, why does it matter if it is one or several?
  • if you will be manipulating backup files after upload, why not separate it then?

If you really want to separate it for future processing, why not just use post-backup processing to take the single dump file, separate it into files per schema, and zip it up again?
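To sketch what that post-backup splitting could look like (an illustration only, assuming the dump was taken with --databases/--all-databases so mysqldump prefixes each schema with a "-- Current Database: `name`" comment, and that GNU csplit is available in the image):

#!/bin/sh
# Sketch: split the single gzipped dump into one .sql per schema, then repackage.
set -e
workdir=$(mktemp -d)
gunzip -c "$DUMPFILE" > "$workdir/full.sql"

# One piece per "-- Current Database:" marker; piece 00 is the global header.
( cd "$workdir" && csplit -s -z -f schema_ full.sql '/^-- Current Database:/' '{*}' )

for piece in "$workdir"/schema_*; do
  name=$(sed -n 's/^-- Current Database: `\(.*\)`$/\1/p' "$piece" | head -n 1)
  if [ -n "$name" ]; then mv "$piece" "$workdir/${name}.sql"; fi
done

# Replace the original single dump with an archive of the per-schema files.
rm -f "$workdir/full.sql" "$workdir"/schema_*
tar -C "$workdir" -czf "$DUMPFILE" .
rm -rf "$workdir"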

@Kusig
Author

Kusig commented Dec 16, 2018

Your comment is understandable; having more than one file to process sounds like a breaking change and hard to implement.

On the other hand, extracting a single schema out of the single file will not be that easy for everyone either, and that is what would be helpful in my case. I sometimes have to deal with the backups of the schemas differently, and that would make it easy.

BTW, another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file. That way it would stay one file, but it would just contain one file per schema. This would make the post-backup processing much easier.

@deitch
Collaborator

deitch commented Dec 17, 2018

I sometimes have to deal with the backups of the schemas differently, and that would make it easy.

Definitely understand. There is a difference between "single file for multiple schemas zipped up" and "multiple files, one per schema, zipped up".

another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file

This sounds like it might be a good path. Either way, you get a single .gz file: by default it contains a single backup, and if you enable the flag, it contains one file per schema, all inside that same single .gz.

It looks like it is doable entirely within the scope of https://github.com/deitch/mysql-backup/blob/master/entrypoint#L181-L188

You want to take a crack at it?

@michabbb

another way to achieve this would be to keep the schemas as separate files within the single resulting compressed file

I personally support this, too.
One of my self-written backup scripts (Perl) does exactly the same, and it is very convenient:
you deal with only one file (all backups in one), but you are able to extract just the database (a single bzip/gzip) you want to restore 😏

@deitch
Collaborator

deitch commented Jan 10, 2019

Yeah, that should be easy enough.

As far as I know, mysqldump outputs a single file, whether for one database, two, or all of them. It is possible to get separate files per table with --tab=<dir>, but not per schema. Do you know differently?
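For reference, --tab looks roughly like this; it writes one <table>.sql and one <table>.txt per table, on the server side, and needs the FILE privilege plus a server-writable directory (host, credentials, and paths below are placeholders):

# Per-table output only; not a per-schema option.
mysqldump -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" --tab=/var/lib/mysql-files mydb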

@michabbb

@deitch I guess you have to do separate mysqldump calls, one call for each database, so you have to loop. The other solution would be to split the big dump file into pieces, but to me that doesn't seem like a good idea at all 😏

@deitch
Collaborator

deitch commented Jan 10, 2019

I do not like the idea of trying to tease apart the structure of the file, and then having it fail or require rework for a newer version at some point. If MySQL or one of its tools supported it, sure. Not hacking it, though.

@michabbb

@deitch do you see a problem with looping through the databases, creating a tmp dir, putting all the dumps into it, compressing all the files into a single one, and then going on with the unmodified rest of your script? 🤔
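To make the idea concrete, a rough sketch of such a loop (illustrative only, not the actual change that went into the project; the target path is a placeholder):

# Dump each database into its own file in a temp dir, then compress them together.
workdir=$(mktemp -d)
now=$(date -u +%Y%m%d%H%M%S)

for onedb in $DB_NAMES; do
  mysqldump -h "$DB_SERVER" -u "$DB_USER" -p"$DB_PASS" --databases "$onedb" \
    > "$workdir/${onedb}.sql"
done

tar -C "$workdir" -czf "/backups/db_backup_${now}.tar.gz" .
rm -rf "$workdir"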

@deitch
Collaborator

deitch commented Jan 10, 2019

You mean like this?

@michabbb

@deitch yeah, I am happy to test this 😄 👍

@michabbb

@deitch one little thing I would like to recommend:

https://github.com/deitch/mysql-backup/blob/9df111989f5bc46c397f12936975ed8260b915c2/functions.sh#L121

It would be nice to have the timestamp in each of the per-schema dump file names as well, so maybe you could change

$workdir/$onedb.sql

to

$workdir/${onedb}_${now}.sql

🙏 😏
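(The ${now} variable here is assumed to be the same per-run timestamp already used for the archive name; if it were not in scope at that point, it could simply be captured once per run, for example:)

# Assumption: one timestamp per backup run, reused in every per-schema file name.
now=$(date -u +%Y%m%d%H%M%S)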

@deitch
Collaborator

deitch commented Jan 10, 2019

Why? The timestamp is already on the zipped file, and adding it inside would make extracting a bit harder.

This should go in a new issue anyway.

@michabbb

michabbb commented Jan 10, 2019

@deitch pardon, I cannot agree. A restore is something I usually do manually, because it is a critical act, so for that I don't care about the filename. The main reason is: if I extract one single database and copy it somewhere else, give it to someone else, or someone finds that single file, then that person (or I myself) has absolutely no clue when that backup was created. That simple piece of information can easily be stored in the filename, that's all. Just a very simple, practical thing. If this sounds reasonable to you, I would be happy to create a new issue for it 😏

Or maybe you meant it is harder for your script; then sorry, I didn't consider that 🙈 I only looked at this from the manual point of view.

@Kusig
Author

Kusig commented Jan 10, 2019

BTW, the completion date/time of the export is clearly declared by mysqldump itself at the end of each dump, so there's no great need to have it in the file name, I assume.

@deitch
Collaborator

deitch commented Jan 11, 2019

so there's no great need to have it in the file name, I assume

OK

@michabbb

@Kusig not everyone is happy to open files with several GB of data just to find a timestamp somewhere 😔 Not everybody knows how to do that, but everybody, really everybody, is able to read a filename, even without any technical skills. Some things in life become very hard when you work with other people who don't have your knowledge. Things like a timestamp in the filename can save you hours of your life, because you don't have to explain to someone what "tail" is, or even worse, tell a Windows user: "you want to know when the dump was made? Sure, no problem, just open the 10 GB file in Notepad and find a timestamp somewhere." That person will ask me: "are you kidding me?"

So, this is just a real-life example from my past.
Cheers

@deitch
Collaborator

deitch commented Jan 11, 2019

@michabbb is the timestamp on the .tar.gz or .tar.bz2 file insufficient? The backup itself has it.

I don't greatly object to it, but I want to understand the reasons first.

@michabbb

@deitch thanks for your interest 👍 My little story from real life was only meant to be a good example of why it can be very frustrating for someone who, unlike you or me, does not know how to handle very large files, or has never even seen a MySQL dump in his life. Don't ask why such a person has a dump in his hands, don't ask 😂

The reason is super, super simple: why should I open a large (multi-GB) file just to find a timestamp somewhere (and again: not everyone knows how to do a grep, tail or less like you and me), if that timestamp could be part of the filename itself?

You think: what's the problem? The main zipped file has the timestamp. But please understand, in real life you sometimes extract only a single database from the main file and send that file to someone else; that's nothing unusual if you work with other people 😏
Sometimes people need to do something for you because you are not at the keyboard. Then you call someone and say (just another real-life example): "hey dude, please go to this URL and send the backup from 2019-01-05 to my colleague", and this Mac or Windows user will say: "sure, but how do I know when this backup was made?" I will say: "oh, sorry, you have to open the file." He will say: "really? It is 10 GB!" I will say: "sorry, I told the project owner it would make sense to put the timestamp into each file, but the people there thought it wasn't necessary." He will say: "damn!" 😂

Maybe this sounds stupid to you all, but these things have already happened to me, and they are very frustrating, just because super simple key information has to be hunted for somewhere, even though it could simply be placed in the filename.

@deitch
Collaborator

deitch commented Jan 11, 2019

Ha! The world of non-automated systems administration. I started my career doing financial services IT, where we were obsessed with automation. Long before the term DevOps was coined, we were using that mindset: every manual task was just another reason to remove humans from the equation.

@deitch
Collaborator

deitch commented Jan 11, 2019

Hmm... beforehand, the .gz file was just a single compressed dump, not even wrapped in a tar archive, so that information was right there in the file name. I wonder if we can recreate that here...

@deitch
Collaborator

deitch commented Jan 11, 2019

I will have to do something about the restore, but it works for now...
