Incremental backups #21

Closed · secwall opened this issue Dec 30, 2015 · 23 comments

@secwall commented Dec 30, 2015

Hello.
I would like to discuss page-level incremental backups.
I've created a proof-of-concept fork of Barman here.
There are no docs or unit tests right now, but this will be fixed in the near future.

Motivation:
We have a large number of databases with a pgdata size of about 3 terabytes, where about 1% of the data changes per 24h.
Unfortunately, Barman backups with hardlinks give us only about a 45% deduplication ratio (many data files receive small changes, so most data files differ between backups, even though the ratio of changed pages is only about 2%).

The solution to this problem seems simple: back up only the changed pages.
I've created a simple script named barman-incr (it is in the bin dir of the source code). It handles both backup and restore operations. Barman runs it on the database host and passes the LSN, timestamp, and list of files from the previous backup. The script then opens each data file and reads every page in it (if an opened file turns out not to be a data file, we take it whole). If a page's LSN is >= the provided LSN, we take that page into the backup.
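
For illustration, a minimal sketch of that page scan (my own code, not from the fork; it assumes the default 8 KiB block size and native byte order, and omits the detection of non-data files):

    import struct

    BLOCK_SIZE = 8192  # PostgreSQL default block size

    def changed_pages(path, threshold_lsn):
        """Yield (block_number, page) for pages changed since threshold_lsn."""
        with open(path, 'rb') as datafile:
            blkno = 0
            while True:
                page = datafile.read(BLOCK_SIZE)
                if not page:
                    break
                if len(page) < BLOCK_SIZE:
                    # Partial tail page: take it whole to be safe.
                    yield blkno, page
                    break
                # pd_lsn is the first page-header field: two unsigned
                # 32-bit ints (xlogid, xrecoff) in native byte order.
                xlogid, xrecoff = struct.unpack_from('=II', page, 0)
                if (xlogid << 32) | xrecoff >= threshold_lsn:
                    yield blkno, page
                blkno += 1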

Some tests:
A database with a pgdata size of 2.7T and 120G of WALs per 24h.
Full backup size is 537G (compressed with gzip -3); time to take the backup: 7h.
Incremental backup size is 14G (also compressed with gzip -3); time to take the backup: 30m.

I've also tested restore consistency (restored the database to a point in time and compared the pg_dump result with a paused replica).

Implementing block change tracking (Oracle DBAs should be familiar with this; here is a white paper about it) will require some changes in the WAL archiving process. I'll present some thoughts and test results on this in Q1 2016.

@man-brain

Any thoughts, guys?

@secwall (Author) commented Jan 13, 2016

Hmm. It seems that there is no discussion. Let's move on to specific questions:

  1. Is the issue of many data files changing while only a few pages change common enough (i.e. do we need page-level incremental backups in Barman)?
  2. Running a script over SSH on the PostgreSQL database host may not be such a good idea. Are there other ways of making page-level incremental backups possible?
  3. If the current approach is OK, what should be fixed in my fork before merging? (Code style in barman-incr, unit tests and docs, anything else?)

@gbartolini (Contributor)

Hi,

first of all, thanks for your contribution. We are currently 100% focused on Barman 1.6.0 with streaming replication support, hence we apologise for not responding any earlier.

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup) rather than having it as part of Barman - you can see our previous attempts at this on the PostgreSQL hackers list.

However, having said this, we were discussing your patch over lunch just yesterday, and one idea that came up was to add a function to pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.
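
Purely as a hypothetical illustration of that idea (pgespresso has no such function today; the name and signature below are invented):

    # Hypothetical only: assumes an invented server-side function
    #   pgespresso_read_block(relpath text, blkno int) RETURNS bytea
    import psycopg2

    def fetch_block(conn, relpath, blkno):
        """Fetch one block of a relation file over a normal libpq connection."""
        with conn.cursor() as cur:
            cur.execute("SELECT pgespresso_read_block(%s, %s)", (relpath, blkno))
            return cur.fetchone()[0]

    conn = psycopg2.connect("dbname=postgres")
    page = fetch_block(conn, 'base/16384/16385', 0)  # illustrative relpath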

Please bear with us; we will do our best to evaluate your code, but it won't be any time soon.

Thanks,
Gabriele

@man-brain

> We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

No problem, guys, although we are doing lots of rebasing :) You are doing the right work, thanks!

> As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it part of Barman - you can see our previous attempts at this in the hackers list of PostgreSQL.

Yep, we've seen that, but it seems you gave up on it after you didn't manage to push it into 9.5. Having it in core PostgreSQL would be really great, but our change brings not only increments: it also brings parallelism and compression. These two changes are really important for quite big databases. Rsync and pg_basebackup support compression, but right now you hit either the network bandwidth (no compression) or the speed of one CPU core (with compression). We launch several processes so that we can utilize all resources with maximum efficiency and flexibility - roughly like the sketch below.
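
As a rough sketch of that parallelism argument (mine, not the fork's actual code), compressing one file per worker process avoids the single-core gzip bottleneck:

    import gzip
    import shutil
    from multiprocessing import Pool

    def compress_file(path):
        """gzip one data file at level 3, writing path + '.gz'."""
        with open(path, 'rb') as src, \
                gzip.open(path + '.gz', 'wb', compresslevel=3) as dst:
            shutil.copyfileobj(src, dst)
        return path

    def compress_all(paths, workers=8):
        # One gzip stream per worker: N cores give roughly N times
        # the throughput of a single compressed stream.
        with Pool(processes=workers) as pool:
            return pool.map(compress_file, paths)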

> ... one idea that came up could be to add a function in pgespresso that returns the content a requested block in a file (or a list of blocks). This would avoid installing an agent on the Postgres server.

Yes, we really want to avoid the need to install anything else on database servers, but implementing such a thing in pgespresso (or in another extension using libpq) may not be a good decision. It would be quite difficult (though possible) to preserve parallelism, and it would make restore much more complicated. Actually, most of the restore logic (decompression and merging of increments) would run on the backup server rather than on the database host, which seems a bit odd.

@secwall (Author) commented Feb 29, 2016

Hello, guys.
I see the 1.6.0 release. So could we continue our discussion?
As @Dev1ant mentioned, moving the logic into pgespresso would make recovery more complex.
Also, in our environment the database hosts have more CPU power and faster disks, so it's better to perform heavy operations on them (in our tests, a recover operation with barman-incr on the db host is about 3 times faster than on the barman host). And this seems to be quite a common case.

@man-brain

Any chance you will take a look at it, guys?

@secwall (Author) commented Mar 30, 2016

We have started using the fork with incremental backups in production.
Here are some numbers.
Our typical database looks like this (pgdata is about 5 TiB):

root@xdb2011g ~ # df -h | grep pgsql
/dev/md4         14T  5.0T  8.1T  39% /var/lib/pgsql/9.4/data
/dev/md3        189G   82G   98G  46% /var/lib/pgsql/9.4/data/pg_xlog

Its backups look like this (we use gzip -3 for backup compression and gzip -6 for WAL compression):

root@pg-backup05i ~ # barman list-backup xdb2011
xdb2011 20160330T020103 - Wed Mar 30 03:53:47 2016 - Size: 51.0 GiB - WAL Size: 60.8 GiB
xdb2011 20160329T020103 - Tue Mar 29 03:51:44 2016 - Size: 50.3 GiB - WAL Size: 114.8 GiB
xdb2011 20160328T020103 - Mon Mar 28 03:45:12 2016 - Size: 52.3 GiB - WAL Size: 112.8 GiB
xdb2011 20160327T020103 - Sun Mar 27 09:50:25 2016 - Size: 1.0 TiB - WAL Size: 88.7 GiB
xdb2011 20160326T020102 - Sat Mar 26 04:52:37 2016 - Size: 58.4 GiB - WAL Size: 122.1 GiB
xdb2011 20160325T020102 - Fri Mar 25 03:42:46 2016 - Size: 58.9 GiB - WAL Size: 122.6 GiB
xdb2011 20160324T020103 - Thu Mar 24 03:38:19 2016 - Size: 39.0 GiB - WAL Size: 126.5 GiB
xdb2011 20160323T020103 - Wed Mar 23 04:39:37 2016 - Size: 33.5 GiB - WAL Size: 82.2 GiB
xdb2011 20160322T020103 - Tue Mar 22 04:51:06 2016 - Size: 33.0 GiB - WAL Size: 76.1 GiB - OBSOLETE*
xdb2011 20160321T020103 - Mon Mar 21 04:20:11 2016 - Size: 28.2 GiB - WAL Size: 74.2 GiB - OBSOLETE*
xdb2011 20160320T020106 - Sun Mar 20 09:22:48 2016 - Size: 971.3 GiB - WAL Size: 48.4 GiB - OBSOLETE*

We start backups at 02:00, so a full backup takes about 7-8 hours and an incremental backup takes about 3 hours (we could get a speed-up here by using block change tracking, but it is not ready yet). Backups plus WALs for a recovery window of 1 week consume about 3.3 TB for this database.

@gbartolini (Contributor)

Hi guys,

I have to apologise again, but as you might have noticed, adding streaming replication support has taken longer than just 1.6.0! We have just released 1.6.1 and are working on 1.6.2/1.7.0 which will hopefully bring full pg_basebackup support and streaming-only backup solutions (suitable for PostgreSQL on Docker and Windows environments too).

Your patch is definitely very interesting, but until we have completed support for streaming-only backups we have to postpone the review and the integration (mainly for testing purposes).

However, I thank you again for your interest and your efforts.

Ciao,
Gabriele

@gbartolini (Contributor)

While looking at your patch, I have been thinking about two possible ideas:

  1. Do you think you can isolate the lzma patch so that we can include it separately in Barman's core?
  2. I'd suggest keeping the remote 'barman-incr' as a separate script - it could even be a more generic barman-agent script that is executed on the Postgres server via SSH.

Thanks again,
Gabriele

@secwall (Author) commented Jun 5, 2016

Hello.

  1. Maybe we could import lzma only if lzma compression was requested by the user (and return an error if the module is unavailable) - is this approach OK? (lzma is currently used only in barman-incr; I didn't change the WAL compression part.) A minimal sketch follows the list.
  2. It seems that I don't understand this part. barman-incr is actually in a separate package (https://github.com/secwall/barman/blob/master/rpm/barman.spec#L49-55), and it really is executed on the PostgreSQL server via SSH (https://github.com/secwall/barman/blob/master/barman/backup_executor.py#L774-783).
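
A minimal sketch of point 1 (the get_compressor helper name is mine, for illustration):

    try:
        import lzma
    except ImportError:
        lzma = None  # optional: only needed when lzma compression is requested

    def get_compressor(name):
        if name == 'lzma':
            if lzma is None:
                raise RuntimeError("lzma compression requested, but the "
                                   "lzma module is not available")
            return lzma.LZMACompressor()
        # ... other compressors (gzip, etc.) unchanged
        raise ValueError('unknown compressor: %s' % name)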

@man-brain

Any success here, guys? Very soon it would be a one year open PR...

@gbartolini (Contributor)

In this period we have been extremely busy releasing version 2.0, with all the new features you are aware of. In order to include this patch in Barman, we drafted a plan with secwall that included several code contributions, some of which have already been implemented.

The difficulty with this patch is, as we have said in the past, integrating it with every existing use case of Barman without breaking backward compatibility. Also, our approach is to reach the goal through an incremental process.

The next step will be to add parallel backup (v3?) - which should be quite straightforward now with the CopyController infrastructure - and then to integrate secwall's work on a remote PostgreSQL server, with an agent (for this reason too we have created the barman-cli package).

I hope that with this message you can clearly see our commitment and our efforts towards this goal. Of course, having a stakeholder willing to fund the development of such a feature would raise its priority and allow us to develop it in a shorter timeframe.

@kamikaze

This project is going to die at this speed and with these priorities; don't waste your time, guys. Fork it.

@secwall (Author) commented Feb 14, 2017

@kamikaze @soshnikov, could you kindly stop the blaming? @gbartolini has explained why incremental backups are not merged yet. We want this feature in mainline because we lack the resources to support our own fork.

@RealLord

Hm... It's extremely strange that the one feature that can provide more backup performance than any other Postgres backup software is still a work in progress and not in production.

@FractalizeR commented Feb 17, 2017

Yandex guys said here that the Barman authors are asking for money to merge this feature.

Уже почти год прошел, как мы их просим запилить эту киллер-фичу, а они просят с нас денег, чтобы замержить её.

Almost a year has passed since we started asking [the Barman team] to merge this killer feature, and they are asking us for money to merge it.

Translation into English is mine.

Can someone elaborate on what the problem is? Where did this money question come from? I think this is just a misunderstanding, right?

@man-brain commented Feb 19, 2017 via email

@FractalizeR

Yep, sure, I can see that now. Sorry for the late reply.

@s200999900 commented May 9, 2017

Hi!

Sorry for my basic English :)

This would be a very helpful feature!

I suggest looking at the "borg backup" project as a storage backend:
https://github.com/borgbackup/borg
It has a lot of good backup functionality: encryption, compression, deduplication, SSH as transport...

There is no Python API for now, but it is possible to run it via a wrapper script for the create, restore, check, and list backup operations - see the sketch below.
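
For example, a minimal wrapper sketch (the repository path and archive name below are illustrative):

    import subprocess

    def borg(*args):
        """Run one borg subcommand, failing loudly on a non-zero exit."""
        subprocess.run(('borg',) + args, check=True)

    # The documented CLI verbs cover the needed operations:
    borg('create', '/backups/pg::{hostname}-{now}', '/var/lib/pgsql/9.4/data')
    borg('list', '/backups/pg')
    borg('check', '/backups/pg')
    # restore would use: borg('extract', '/backups/pg::<archive>')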

I can help with testing in that way, but I need some help with the right instructions to do that.

@AntonBushmelev

Hello guys, any news about implementing this killer feature?

@man-brain

I suppose this issue should be closed, since nothing has been done for 2.5 years. We have merged all these features into wal-g upstream, and we won't be supporting our Barman fork any more.

@kamikaze commented Sep 6, 2018

I suppose this project should be closed since nothing has been done for 2.5 years

@amenonsen (Contributor)

It's a pity this feature was not merged, especially because the patch (even today) looks really nicely done.

That said, with the benefit of several years of hindsight: scanning page headers to detect updates based on LSN is a lot faster than rsync, but still too expensive for very large data directories. We know there are extensions like ptrack that take a more proactive approach to recording changes, and that seems like the right approach going forward.

Meanwhile, this project is now under active maintenance again. But I'll close this issue now because there's no point leaving it open. I do hope to support incremental backups, but we (still!) hope that core Postgres will eventually provide a feature that Barman can use to do so.
