Incremental backups #21

Closed · secwall opened this issue Dec 30, 2015 · 23 comments

@secwall commented Dec 30, 2015

Hello.
I would like to discuss page-level incremental backups.
I've created a proof-of-concept fork of Barman here.
There are no docs or unit tests right now, but this will be fixed in the near future.

Motivation:
We have a large number of databases with a pgdata size of about 3 terabytes, where about 1% of the data changes per 24h.
Unfortunately, Barman backups with hardlinks give us only about a 45% deduplication ratio (many data files receive small changes, so most data files differ between backups, even though the ratio of changed pages is only about 2%).

The solution to this problem seems simple: back up only the changed pages.
I've created a simple script named barman-incr (it is in the bin dir of the source code). It handles both backup and restore operations. Barman runs it on the database host and passes the LSN, timestamp, and list of files from the previous backup. The script then opens each data file and reads every page in it (if an opened file turns out not to be a data file, we take it whole). If a page's LSN is >= the provided LSN, we take that page into the backup.
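
For illustration, a minimal sketch of that page scan (my own code, not from the fork; it assumes the default 8 KiB block size and native byte order, and omits the detection of non-data files):

    import struct

    BLOCK_SIZE = 8192  # PostgreSQL default block size

    def changed_pages(path, threshold_lsn):
        """Yield (block_number, page) for pages changed since threshold_lsn."""
        with open(path, 'rb') as datafile:
            blkno = 0
            while True:
                page = datafile.read(BLOCK_SIZE)
                if not page:
                    break
                if len(page) < BLOCK_SIZE:
                    # Partial tail page: take it whole to be safe.
                    yield blkno, page
                    break
                # pd_lsn is the first page-header field: two unsigned
                # 32-bit ints (xlogid, xrecoff) in native byte order.
                xlogid, xrecoff = struct.unpack_from('=II', page, 0)
                if (xlogid << 32) | xrecoff >= threshold_lsn:
                    yield blkno, page
                blkno += 1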

Some tests:
A database with a pgdata size of 2.7T and 120G of WALs per 24h.
Full backup size is 537G (compressed with gzip -3); time to take the backup: 7h.
Incremental backup size is 14G (also compressed with gzip -3); time to take the backup: 30m.

I've also tested restore consistency (restored the database to a point in time and compared the pg_dump result with a paused replica).

Implementing block change tracking (Oracle DBAs should be familiar with this; here is a white paper about it) will require some changes in the WAL archiving process. I'll present some thoughts and test results on this in Q1 2016.

@man-brain

Any thoughts, guys?

@secwall (Author) commented Jan 13, 2016

Hmm. It seems that there is no discussion. Let's move on to specific questions:

  1. Is the issue of many data files changing while only a few pages change common enough (i.e. do we need page-level incremental backups in Barman)?
  2. Running a script over SSH on the PostgreSQL database host may not be such a good idea. Are there other ways of making page-level incremental backups possible?
  3. If the current approach is OK, what should be fixed in my fork before merging? (Code style in barman-incr, unit tests and docs, anything else?)

@gbartolini (Contributor)

Hi,

first of all, thanks for your contribution. We are currently 100% focused on Barman 1.6.0 with streaming replication support, hence we apologise for not responding any earlier.

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup) rather than having it as part of Barman - you can see our previous attempts at this on the PostgreSQL hackers list.

However, having said this, we were discussing your patch over lunch just yesterday, and one idea that came up was to add a function to pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.
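
Purely as a hypothetical illustration of that idea (pgespresso has no such function today; the name and signature below are invented):

    # Hypothetical only: assumes an invented server-side function
    #   pgespresso_read_block(relpath text, blkno int) RETURNS bytea
    import psycopg2

    def fetch_block(conn, relpath, blkno):
        """Fetch one block of a relation file over a normal libpq connection."""
        with conn.cursor() as cur:
            cur.execute("SELECT pgespresso_read_block(%s, %s)", (relpath, blkno))
            return cur.fetchone()[0]

    conn = psycopg2.connect("dbname=postgres")
    page = fetch_block(conn, 'base/16384/16385', 0)  # illustrative relpath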

Please bear with us; we will do our best to evaluate your code, but it won't be any time soon.

Thanks,
Gabriele

@man-brain

> We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

No problem, guys, although we are doing lots of rebasing :) You are doing the right work, thanks!

> As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it part of Barman - you can see our previous attempts at this in the hackers list of PostgreSQL.

Yep, we've seen that, but it seems you gave up on it after you didn't manage to push it into 9.5. Having it in core PostgreSQL would be really great, but our change brings not only increments: it also brings parallelism and compression. These two changes are really important for quite big databases. Rsync and pg_basebackup support compression, but right now you hit either the network bandwidth (no compression) or the speed of one CPU core (with compression). We launch several processes so that we can utilize all resources with maximum efficiency and flexibility - roughly like the sketch below.
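
As a rough sketch of that parallelism argument (mine, not the fork's actual code), compressing one file per worker process avoids the single-core gzip bottleneck:

    import gzip
    import shutil
    from multiprocessing import Pool

    def compress_file(path):
        """gzip one data file at level 3, writing path + '.gz'."""
        with open(path, 'rb') as src, \
                gzip.open(path + '.gz', 'wb', compresslevel=3) as dst:
            shutil.copyfileobj(src, dst)
        return path

    def compress_all(paths, workers=8):
        # One gzip stream per worker: N cores give roughly N times
        # the throughput of a single compressed stream.
        with Pool(processes=workers) as pool:
            return pool.map(compress_file, paths)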

> ... one idea that came up could be to add a function in pgespresso that returns the content a requested block in a file (or a list of blocks). This would avoid installing an agent on the Postgres server.

Yes, we really want to avoid the need to install anything else on database servers, but implementing such a thing in pgespresso (or in another extension using libpq) may not be a good decision. It would be quite difficult (though possible) to preserve parallelism, and it would make restore much more complicated. Actually, most of the restore logic (decompression and merging of increments) would run on the backup server rather than on the database host, which seems a bit odd.

@secwall (Author) commented Feb 29, 2016

Hello, guys.
I see the 1.6.0 release. So could we continue our discussion?
As @Dev1ant mentioned, moving the logic into pgespresso would make recovery more complex.
Also, in our environment the database hosts have more CPU power and faster disks, so it's better to perform heavy operations on them (in our tests, a recover operation with barman-incr on the db host is about 3 times faster than on the barman host). And this seems to be quite a common case.

@man-brain

Any chance you will take a look at it, guys?

@secwall (Author) commented Mar 30, 2016

We have started using the fork with incremental backups in production.
Here are some numbers.
Our typical database looks like this (pgdata is about 5 TiB):

root@xdb2011g ~ # df -h | grep pgsql
/dev/md4         14T  5.0T  8.1T  39% /var/lib/pgsql/9.4/data
/dev/md3        189G   82G   98G  46% /var/lib/pgsql/9.4/data/pg_xlog

Its backups look like this (we use gzip -3 for backup compression and gzip -6 for WAL compression):

root@pg-backup05i ~ # barman list-backup xdb2011
xdb2011 20160330T020103 - Wed Mar 30 03:53:47 2016 - Size: 51.0 GiB - WAL Size: 60.8 GiB
xdb2011 20160329T020103 - Tue Mar 29 03:51:44 2016 - Size: 50.3 GiB - WAL Size: 114.8 GiB
xdb2011 20160328T020103 - Mon Mar 28 03:45:12 2016 - Size: 52.3 GiB - WAL Size: 112.8 GiB
xdb2011 20160327T020103 - Sun Mar 27 09:50:25 2016 - Size: 1.0 TiB - WAL Size: 88.7 GiB
xdb2011 20160326T020102 - Sat Mar 26 04:52:37 2016 - Size: 58.4 GiB - WAL Size: 122.1 GiB
xdb2011 20160325T020102 - Fri Mar 25 03:42:46 2016 - Size: 58.9 GiB - WAL Size: 122.6 GiB
xdb2011 20160324T020103 - Thu Mar 24 03:38:19 2016 - Size: 39.0 GiB - WAL Size: 126.5 GiB
xdb2011 20160323T020103 - Wed Mar 23 04:39:37 2016 - Size: 33.5 GiB - WAL Size: 82.2 GiB
xdb2011 20160322T020103 - Tue Mar 22 04:51:06 2016 - Size: 33.0 GiB - WAL Size: 76.1 GiB - OBSOLETE*
xdb2011 20160321T020103 - Mon Mar 21 04:20:11 2016 - Size: 28.2 GiB - WAL Size: 74.2 GiB - OBSOLETE*
xdb2011 20160320T020106 - Sun Mar 20 09:22:48 2016 - Size: 971.3 GiB - WAL Size: 48.4 GiB - OBSOLETE*

We start backups at 02:00, so a full backup takes about 7-8 hours and an incremental backup takes about 3 hours (we could get a speed-up here by using block change tracking, but it is not ready yet). Backups plus WALs for a recovery window of 1 week consume about 3.3 TB for this database.

@gbartolini (Contributor)

Hi guys,

I have to apologise again, but as you might have noticed, adding streaming replication support has taken longer than just 1.6.0! We have just released 1.6.1 and are working on 1.6.2/1.7.0 which will hopefully bring full pg_basebackup support and streaming-only backup solutions (suitable for PostgreSQL on Docker and Windows environments too).

Your patch is definitely very interesting, but until we have completed support for streaming-only backups we have to postpone the review and the integration (mainly for testing purposes).

However, I thank you again for your interest and your efforts.

Ciao,
Gabriele

@gbartolini (Contributor)

While looking at your patch, I have been thinking about two possible ideas:

  1. Do you think you can isolate the lzma patch so that we can include it separately in Barman's core?
  2. I'd suggest keeping the remote 'barman-incr' as a separate script - it could even be a more generic barman-agent script that is executed on the Postgres server via SSH.

Thanks again,
Gabriele

@secwall (Author) commented Jun 5, 2016

Hello.

  1. Maybe we could import lzma only if lzma compression was requested by the user (and return an error if the module is unavailable) - is this approach OK? (lzma is currently used only in barman-incr; I didn't change the WAL compression part.) A minimal sketch follows the list.
  2. It seems that I don't understand this part. barman-incr is actually in a separate package (https://github.com/secwall/barman/blob/master/rpm/barman.spec#L49-55), and it really is executed on the PostgreSQL server via SSH (https://github.com/secwall/barman/blob/master/barman/backup_executor.py#L774-783).
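
A minimal sketch of point 1 (the get_compressor helper name is mine, for illustration):

    try:
        import lzma
    except ImportError:
        lzma = None  # optional: only needed when lzma compression is requested

    def get_compressor(name):
        if name == 'lzma':
            if lzma is None:
                raise RuntimeError("lzma compression requested, but the "
                                   "lzma module is not available")
            return lzma.LZMACompressor()
        # ... other compressors (gzip, etc.) unchanged
        raise ValueError('unknown compressor: %s' % name)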

@man-brain

Any success here, guys? Very soon it would be a one year open PR...

@gbartolini (Contributor)

In this period we have been extremely busy releasing version 2.0, with all the new features you are aware of. In order to include this patch in Barman, we drafted a plan with secwall that included several code contributions, some of which have already been implemented.

The difficulty with this patch is, as we have said in the past, integrating it with every existing use case of Barman without breaking backward compatibility. Also, our approach is to reach the goal through an incremental process.

The next step will be to add parallel backup (v3?) - which should be quite straightforward now with the CopyController infrastructure - and then to integrate secwall's work on a remote PostgreSQL server, with an agent (for this reason too we have created the barman-cli package).

I hope that with this message you can clearly see our commitment and our efforts towards this goal. Of course, having a stakeholder willing to fund the development of such a feature would raise its priority and allow us to develop it in a shorter timeframe.

@kamikaze

This project is going to die at this speed and with these priorities; don't waste your time, guys. Fork it.

@secwall (Author) commented Feb 14, 2017

@kamikaze @soshnikov, could you kindly stop the blaming? @gbartolini has explained why incremental backups are not merged yet. We want this feature in mainline because we lack the resources to support our own fork.

@RealLord

Hm... It's extremely strange that the one feature that can provide more backup performance than any other Postgres backup software is still a work in progress and not in production.

@FractalizeR commented Feb 17, 2017

Yandex guys said here that the Barman authors are asking for money to merge this feature.

Уже почти год прошел, как мы их просим запилить эту киллер-фичу, а они просят с нас денег, чтобы замержить её.

Almost a year has passed since we started asking [the Barman team] to merge this killer feature, and they are asking us for money to merge it.

Translation into English is mine.

Can someone elaborate on what the problem is? Where did this money question come from? I think this is just a misunderstanding, right?

@man-brain commented Feb 19, 2017 via email

@FractalizeR

Yep, sure, I can see that now. Sorry for the late reply.

@s200999900 commented May 9, 2017

Hi!

Sorry for my basic English :)

This would be a very helpful feature!

I suggest looking at the "borg backup" project as a storage backend:
https://github.com/borgbackup/borg
It has a lot of good backup functionality: encryption, compression, deduplication, SSH as transport...

There is no Python API for now, but it is possible to run it via a wrapper script for the create, restore, check, and list backup operations - see the sketch below.
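
For example, a minimal wrapper sketch (the repository path and archive name below are illustrative):

    import subprocess

    def borg(*args):
        """Run one borg subcommand, failing loudly on a non-zero exit."""
        subprocess.run(('borg',) + args, check=True)

    # The documented CLI verbs cover the needed operations:
    borg('create', '/backups/pg::{hostname}-{now}', '/var/lib/pgsql/9.4/data')
    borg('list', '/backups/pg')
    borg('check', '/backups/pg')
    # restore would use: borg('extract', '/backups/pg::<archive>')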

I can help with testing in that way, but I need some help with the right instructions to do that.

@AntonBushmelev

Hello guys, any news about implementing this killer feature?

@man-brain

I suppose this issue should be closed, since nothing has been done for 2.5 years. We have merged all these features into wal-g upstream, and we won't be supporting our Barman fork any more.

@kamikaze commented Sep 6, 2018

I suppose this project should be closed since nothing has been done for 2.5 years

@amenonsen (Contributor)

It's a pity this feature was not merged, especially because the patch (even today) looks really nicely done.

That said, with the benefit of several years of hindsight: scanning page headers to detect updates based on LSN is a lot faster than rsync, but still too expensive for very large data directories. We know there are extensions like ptrack that take a more proactive approach to recording changes, and that seems like the right approach going forward.

Meanwhile, this project is now under active maintenance again. But I'll close this issue now because there's no point leaving it open. I do hope to support incremental backups, but we (still!) hope that core Postgres will eventually provide a feature that Barman can use to do so.
