Some imporvements on barman documentation #559

Closed
elhananjair opened this issue Apr 24, 2022 · 27 comments

@elhananjair

elhananjair commented Apr 24, 2022

Hello there, I am trying to set up Barman for the first time and I find the documentation a little confusing. I have created a user called barman on the host called 'pg', as the docs say, and then got to this point:

Make sure you test the following command before proceeding:
barman@backup$ psql -c 'SELECT version()' -U barman -h pg postgres

This is where I got lost: the prompt is barman@backup, but I thought I should run this test on the pg instance, not on the backup host as the documentation says. I also thought the user had to be postgres to run the above command. In fact, I tested it as postgres@pg and it seems to work that way. Here is the output in my case:

PostgreSQL 13.4 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7), 64-bit
(1 row)

I just want to suggest a correction, if my understanding is right, and a few improvements to the documentation so that the concepts are easier to grasp.
Thank you.

@mikewallace1979
Contributor

The idea behind running psql -c 'SELECT version()' -U barman -h pg postgres is to verify that the barman user on the barman host can successfully connect to the "postgres" database on the PostgreSQL host. It therefore should be run as the barman user on the barman host, not as the postgres user on the PostgreSQL host.

We'll update the documentation to make this clearer.

@elhananjair
Author

@mikewallace1979 Thank you so much for clarifying things for me. I have seen the commit and it's perfect. I am still a little confused, though, and I'm sorry to reply to a closed issue.

On this line,

You need to make sure that the backup server can connect to the PostgreSQL server on pg as superuser or, from PostgreSQL 10 or higher, that the correct set of privileges are granted to the user that connects to the database.

You can create a specific superuser in PostgreSQL, named barman, as follows:

postgres@pg$ createuser -s -P barman


Based on the above, I am creating a user 'barman' on pg, yet I am running the check command as 'barman' on the 'backup' host. That's the part I don't get. Do I have to create a user with the same name on both the pg and backup hosts? My second question: by default, installing PostgreSQL creates a 'postgres' user for accessing the database. Does the doc refer to that default PostgreSQL user, or is it just a Linux user?

Thank you.

@mikewallace1979
Contributor

You don't need to create a user with the same name on both hosts - what is happening is the PostgreSQL user named "barman" is being used to connect to PostgreSQL from the backup host. This connection is made using psql which is run as the Linux user "barman" on the backup host.

Stepping through the whole process, we start after PostgreSQL is installed on the pg host and Barman is installed on the backup host - at this point there will be two new Linux user accounts:

  • The postgres user on the pg host.
  • The barman user on the backup host.

When you run createuser -s -P barman on the pg host you create the PostgreSQL superuser named "barman" - this uses the PostgreSQL user named "postgres" to make the DB connection and runs under the Linux user named "postgres" which was configured by the OS package which installed PostgreSQL.

This PostgreSQL user named "barman" is used by Barman to connect to PostgreSQL - because Barman runs under the "barman" Linux user we must therefore make sure the "barman" Linux user on the backup host is able to connect to PostgreSQL using the "barman" PostgreSQL user.

If you follow the docs you'll add the password entered during the createuser command to a .pgpass file in the home directory of the "barman" Linux user on the backup host - alternatively you may configure one of the other authentication methods available.

Whatever authentication method you use, a PostgreSQL client running under the "barman" Linux account should be able to authenticate to the PostgreSQL server on the pg host as the "barman" PostgreSQL user - this is what you are testing when running the psql command as the "barman" Linux user.

Does this help clear things up?

@elhananjair
Author

elhananjair commented Apr 25, 2022

Again, thank you so much for taking the time to explain everything. Things are clearer now, but may I add a couple of questions and suggestions for the docs?

  1. You mentioned it in your last paragraph, and I think it would be best if this ('installing the PostgreSQL client package is necessary to run psql on the Barman/backup host') were noted in the documentation. It might help someone like me.
  2. Are the PostgreSQL user named 'barman' on the pg host and the Linux user named 'barman' intentionally given the same name, or is that mandatory? If it's not mandatory, why not use different names to remove the ambiguity when following the documentation?

@elhananjair
Author

Hello @mikewallace1979, I also ran into a problem with the command the Barman docs give for checking that a streaming connection works: barman@backup$ psql -U streaming_barman -h pg -c "IDENTIFY_SYSTEM" replication=1. It produces this error:

ERROR: syntax error at or near "IDENTIFY_SYSTEM"
LINE 1: IDENTIFY_SYSTEM;
^

I checked the (Streaming Replication Protocol) link and tried to fix it with a command I saw in that documentation, psql -h pg -U streaming_barman "replication=1" -c "IDENTIFY_SYSTEM;". It shows the following output, which I think means the streaming connection is working:
      systemid       | timeline |  xlogpos   | dbname
---------------------+----------+------------+--------
 7047857649439319016 |        1 | D/15D762B8 |
(1 row)
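
For readers who want to repeat this check, here is a minimal sketch of the variant that worked above, with the replication flag folded into a single libpq connection string. The helper name streaming_conninfo is hypothetical and not part of Barman or PostgreSQL; the host and user names are the ones used in this thread. The function only builds the string, so nothing contacts a server until you run the commented-out psql line yourself.

```shell
# Build a libpq connection string that asks for a physical replication
# connection. Nothing is executed here; the function only prints the string.
# "streaming_conninfo" is a hypothetical helper name used for illustration.
streaming_conninfo() {
    local host="$1" user="$2"
    printf 'host=%s user=%s replication=1\n' "$host" "$user"
}

# Example (needs psql and a reachable server, so it is left commented out):
#   psql "$(streaming_conninfo pg streaming_barman)" -c 'IDENTIFY_SYSTEM;'
```

Passing the whole string as psql's database argument puts replication=1 in the connection parameters, which is the form that succeeded in the comment above.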

@elhananjair
Author

@mikewallace1979 I am trying my best to follow the docs and finish the whole Barman setup.
I have chosen streaming backup as the backup method, with WAL streaming plus barman-wal-archive as an additional way of securing the WAL backup. I am currently on the "WAL archiving via barman-wal-archive" section. I have installed barman-cli on the PostgreSQL (pg) server and then edited postgresql.conf:

archive_mode = on
wal_level = 'replica'
archive_command = 'barman-wal-archive backup pg %p'

After restarting PostgreSQL, I switched to the postgres user and ran barman-wal-archive --test backup pg DUMMY.
In my case I am trying to back up the Nextcloud database, so my Nextcloud hostname is Nextcloud-Server and the backup server hostname is Nextcloud-Backup. I therefore ran barman-wal-archive --test Nextcloud-Backup Nextcloud-Server DUMMY, and I see this error after running it:
ERROR: Unknown server 'Nextcloud-Server'
Can you please suggest something? I am asking here because I was expecting the documentation to show the expected output of that command.
Thank you!

@mikewallace1979
Contributor

@elhananjair The PostgreSQL server name in the barman-wal-archive command should be the name of the server as configured in Barman. What is currently happening in your case is that barman is checking its configuration on the Barman server for a server named Nextcloud-Server and failing to find it.

If the test command is successful you should see:

$ barman-wal-archive --test BACKUP_HOST SERVER_NAME DUMMY
Ready to accept WAL files for the server SERVER_NAME

We'll add this to the docs when the rest of the issues documented here are addressed.
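
As a rough illustration of the point above, the sketch below checks whether a Barman configuration file contains a section header with a given server name. The helper name has_server_section and the example file path are hypothetical. The exact-match comparison is a simplification made for this sketch and is not a description of how Barman itself parses its configuration.

```shell
# Return success if FILE contains a line that is exactly "[NAME]".
# "has_server_section" is a hypothetical helper; the exact, case-sensitive
# match is an assumption of this sketch, not Barman's own lookup logic.
has_server_section() {
    local file="$1" name="$2"
    grep -qxF "[$name]" "$file"
}

# Example (the path is illustrative; use your own server config file):
#   has_server_section /etc/barman/conf.d/nextcloud-server.conf nextcloud-server \
#       && echo "section found"
```

The name that passes this check is the one to use as SERVER_NAME in barman-wal-archive, regardless of the PostgreSQL machine's hostname.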

@elhananjair
Author

elhananjair commented May 3, 2022

@mikewallace1979 Thank you so so much. I could find the problem thanks to you,
image

I thought [pg] was a predefined section, but now it's clear. Thanks again.

@elhananjair
Author

elhananjair commented May 6, 2022

@mikewallace1979 I am really sorry, it's me again 😢. As you mentioned above, I can finally see

Ready to accept WAL files for the server nextcloud-server
when I test whether the PostgreSQL server is configured in Barman to accept incoming WAL files...

I skipped the "WAL archiving via rsync/SSH" topic since I had already chosen WAL archiving via barman-wal-archive. The next step is "Verification of WAL archiving configuration", so I executed barman switch-wal --force --archive nextcloud-server and got this error:

ERROR: Unable to perform pg_switch_wal for server 'nextcloud-server'.

After that, I ran barman check nextcloud-server to check whether WAL archiving has been configured correctly, as the documentation states. Here is the output:
image

Can I add one general question too: do I need to set up a cron job for both streaming backup and WAL archiving? I saw cron mentioned in the WAL streaming section, but it doesn't say much about it there.

@mikewallace1979
Contributor

There are a few reasons why you might see ERROR: Unable to perform pg_switch_wal for server ... however the most likely one is that your PostgreSQL connection is not configured correctly. Take a look at the logs on your PostgreSQL server around the time you are running barman switch-wal --force --archive nextcloud-server - if Barman is able to connect you should see an error explaining why the command failed. This could be because the user doesn't exist, or because the user does not have the necessary permissions. If you see no related errors in the PostgreSQL logs then Barman is unable to connect to the host at all.

Alternatively, try running psql -d 'CONNINFO' on the Barman server where CONNINFO is the value of conninfo in the Barman configuration. If this fails then it is almost certainly the same reason why your switch wal command is failing and the output should give you a hint as to what the problem is.

As far as cron goes, barman cron needs to be running in a cron job - this command will automatically start any receive-wal processes which are required and carry out the archiving of incoming WAL files, there's no need to add specific cron jobs for wal streaming or wal archiving.
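
To make the cron point concrete, here is a small sketch that prints a system crontab entry (the /etc/cron.d format, which has a user column) running barman cron once a minute. The helper name barman_cron_entry, the default user, and the /usr/bin/barman path are assumptions for illustration. OS packages of Barman normally ship an equivalent entry already, so check for an existing one before adding your own.

```shell
# Print a crontab line in /etc/cron.d format (note the user column) that runs
# "barman cron" every minute. The defaults below are assumptions: adjust the
# user and the binary path to match your installation.
barman_cron_entry() {
    local user="${1:-barman}" bin="${2:-/usr/bin/barman}"
    printf '* * * * * %s [ -x %s ] && %s -q cron\n' "$user" "$bin" "$bin"
}

# Example: review the line before installing it anywhere.
#   barman_cron_entry
```

A single entry like this is enough: as described above, barman cron itself starts any receive-wal processes and archives incoming WAL files.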

@elhananjair
Author

elhananjair commented May 6, 2022

psql -d 'CONNINFO'

@mikewallace1979 thanks again, I checked PostgreSQL log file and I found this:

ERROR: Error executing ssh: [Errno 13] Permission denied: 'ssh'
2022-05-06 21:20:14.326 EAT [2810] LOG: archive command failed with exit code 2
2022-05-06 21:20:14.326 EAT [2810] DETAIL: The failed archive command was: barman-wal-archive backup nextcloud-server pg_wal/000000010000000D00000046
2022-05-06 21:20:14.326 EAT [2810] WARNING: archiving write-ahead log file "000000010000000D00000046" failed too many times, will try again later

I followed the documentation all along: I configured SSH and tested it from both machines as the documentation specifies, and it was working. In addition, as you suggested, I ran psql -d 'CONNINFO' and it works fine:

psql -d 'host=nextcloud-server user=barman dbname=postgres'
Password for user barman: 
psql (13.4)
Type "help" for help.

postgres=# 

I am not sure what I am missing here.

@mikewallace1979
Contributor

Ok - so that eliminates the PostgreSQL connection itself as a possible cause.

It's possible the barman user permissions aren't quite right - could you please run the following in the psql shell resulting from psql -d 'host=nextcloud-server user=barman dbname=postgres':

CHECKPOINT;
select pg_walfile_name(pg_current_wal_insert_lsn());
select pg_walfile_name(pg_switch_wal());

The errors in the PostgreSQL log tell us that archive_command is failing - this will also be a problem since the --archive option for barman switch-wal causes it to fail if a WAL is not archived successfully within 30 seconds (though you would see an error like ERROR: The WAL file 000000010000000000000073 has not been received in 30 seconds if barman switch-wal was making it that far). What happens if you manually run barman-wal-archive backup nextcloud-server pg_wal/000000010000000D00000046 as the postgres UNIX user?

@elhananjair
Author

elhananjair commented May 9, 2022

@mikewallace1979 Thank you.

It's possible the barman user permissions aren't quite right...

But I followed the documentation when creating the barman user... this is why I find the documentation somewhat confusing, @mikewallace1979.
Here is the output for this:

It's possible the barman user permissions aren't quite right - could you please run the following in the psql shell resulting from psql -d 'host=nextcloud-server user=barman dbname=postgres':

CHECKPOINT;
select pg_walfile_name(pg_current_wal_insert_lsn());
select pg_walfile_name(pg_switch_wal());

image

and

What happens if you manually run barman-wal-archive backup nextcloud-server pg_wal/000000010000000D00000046 as the postgres UNIX user?

Does '000000010000000D00000046' refer to something specific, or did you use it as an example? In my case I just used the output of select pg_walfile_name(pg_switch_wal());. Here is the output of barman-wal-archive nextcloud-backup nextcloud-server pg_wal/000000010000000000000001:
image


The screenshot I uploaded here shows directories: FAILED (/var/lib/barman: permission denied). Is this where WAL files will be archived? Nothing in the Barman docs mentions creating the /var/lib/barman directory, so I created it myself and set its ownership to the barman user. Correct me if I am doing it wrong.

@mikewallace1979
Contributor

Could you also run CHECKPOINT; in the psql shell? That's the command which would confirm barman has the necessary permissions for the --force option on the barman switch-wal command.

000000010000000D00000046 came from your log messages - it was a specific failed archive_command logged by PostgreSQL so I thought the WAL would still be around. Unfortunately I gave you the wrong path in the test command because the WAL path in the PostgreSQL logs needs to be prefixed with the path to the PGDATA directory - could you try your command again but with the full path to the WAL file, e.g. if PGDATA was /opt/postgres/data:

barman-wal-archive nextcloud-backup nextcloud-server /opt/postgres/data/pg_wal/000000010000000000000001

I didn't notice the permission denied error on /var/lib/barman - that's definitely going to be a problem. This is usually created by the OS package which installs Barman - if you installed Barman some other way, either from source or via pip, then you would indeed need to create it and set ownership to the UNIX barman user.
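
The path correction above can be sketched as a tiny helper that joins the PGDATA directory with the relative WAL path PostgreSQL writes to its log. The relative form works inside archive_command because the archiver runs with the data directory as its working directory. A manual run from another directory needs the absolute path. The helper name wal_full_path is hypothetical, and the data directory in the example is the one that appears later in this thread.

```shell
# Join a PGDATA directory and a relative WAL path as logged by PostgreSQL
# (for example "pg_wal/000000010000000D00000046") into an absolute path.
# "wal_full_path" is a hypothetical helper name used for illustration.
wal_full_path() {
    local pgdata="${1%/}" rel="$2"   # drop one trailing slash from PGDATA
    printf '%s/%s\n' "$pgdata" "$rel"
}

# Example (run as the postgres UNIX user on the PostgreSQL host):
#   barman-wal-archive nextcloud-backup nextcloud-server \
#       "$(wal_full_path /var/lib/pgsql/data pg_wal/000000010000000D000000DA)"
```

If you are unsure of the data directory, SHOW data_directory; in psql reports it.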

@elhananjair
Author

elhananjair commented May 10, 2022

Thanks again for taking the time to help me, @mikewallace1979.

Could you also run CHECKPOINT; in the psql shell?

image

could you try your command again but with the full path to the WAL file...

Here is the output from the above command. I think it worked this time: I checked /var/lib/barman/nextcloud-server/incoming/ and found the WAL file that I sent from nextcloud-server using barman-wal-archive.
image

I didn't notice the permission denied error on /var/lib/barman - that's definitely going to be a problem. This is usually created by the OS package which installs Barman - if you installed Barman some other way, either from source or via pip, then you would indeed need to create it and set ownership to the UNIX barman user.

I did create /var/lib/barman, and the FAILED (/var/lib/barman: permission denied) error is gone now. The barman directory now contains some files...
image


Inside the pg_wal folder I see a bunch of WAL files. Since they change rapidly, I tested the above command with the most recent WAL file.

image

@mikewallace1979
Contributor

@elhananjair Great - so WAL archiving looks like it's working (barman check should hopefully report that it is ok now) but barman switch-wal --force --archive nextcloud-server is still failing.

We've established that the connection string in conninfo works from the Barman server and we've established that the barman PostgreSQL user has the necessary permissions to run all operations required to execute the WAL switch. I don't have any good ideas of what else could be going wrong - can you set the log_level in the barman section of the barman.conf file to DEBUG if you haven't already done so and check the barman log file while the switch-wal command is running?

@elhananjair
Author

@elhananjair Great - so WAL archiving looks like it's working (barman check should hopefully report that it is ok now)

Unfortunately, it is still being shown as FAILED
image

can you set the log_level in the barman section of the barman.conf file to DEBUG if you haven't already done so and check the barman log file while the switch-wal command is running?

I set log level to debug, and here is the log.

2022-05-10 15:27:43,817 [77293] barman.cli DEBUG: Initialised Barman version 2.17 (config: /etc/barman/barman.conf, args: {'color': 'auto', 'quiet': False, 'debug': F>
2022-05-10 15:27:43,829 [77293] barman.backup_executor DEBUG: The default backup strategy for postgres backup_method is: concurrent_backup
2022-05-10 15:27:43,829 [77293] barman.server INFO: Force a CHECKPOINT before pg_switch_wal()
2022-05-10 15:27:43,836 [77293] barman.postgres DEBUG: Error issuing CHECKPOINT: fe_sendauth: no password supplied
2022-05-10 15:27:43,849 [77293] barman.postgres DEBUG: Impossible to detect the PostgreSQL version, name_map will return names from latest version
2022-05-10 15:27:43,849 [77293] barman.postgres DEBUG: Error issuing pg_switch_wal() command: fe_sendauth: no password supplied
2022-05-10 15:27:43,850 [77293] barman.server ERROR: Unable to perform pg_switch_wal for server 'nextcloud-server'.


I also tried barman cron to start the WAL archiving operations, as the docs state. The CLI output is this:

Starting WAL archiving for server nextcloud-server
Starting streaming archiver for server nextcloud-server
Starting WAL archiving for server streaming
Starting streaming archiver for server streaming

But I checked the Barman log and an error shows up every minute. By the way, I tried to stop this process using
barman receive-wal --stop nextcloud-server and it shows me an error:
ERROR: Termination of receive-wal failed: no such process for server nextcloud-server

I am still having a hard time fully understanding barman from the documentation 😢

@mikewallace1979
Contributor

It looks like barman is unable to find the password it needs to make the connection successfully - you need to create a .pgpass file in the barman home directory. The format and access requirements are described in the PostgreSQL docs but something like the following should work as the contents:

*:*:*:barman:BARMAN_PG_PASSWORD

where BARMAN_PG_PASSWORD is the password for the Barman PostgreSQL user. The file must only be readable by the barman UNIX user.
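
A minimal sketch of creating such a file follows. It assumes it is run as the barman UNIX user on the backup host. The helper name add_pgpass_entry is hypothetical. The port and database fields are written as wildcards, and a password containing a colon or backslash would need escaping, which this sketch does not do.

```shell
# Append one entry to a .pgpass-style file and make it readable by its owner
# only. Line format: hostname:port:database:username:password
# "add_pgpass_entry" is a hypothetical helper; port and database are written
# as "*" wildcards, and the password is assumed to contain no ':' or '\'.
add_pgpass_entry() {
    local file="$1" host="$2" user="$3" password="$4"
    printf '%s:*:*:%s:%s\n' "$host" "$user" "$password" >> "$file"
    chmod 0600 "$file"
}

# Example (placeholders, not real credentials):
#   add_pgpass_entry "$HOME/.pgpass" '*' barman 'BARMAN_PG_PASSWORD'
```

The chmod matters as much as the contents: libpq ignores a password file that is readable by group or others. If a separate PostgreSQL user is used for streaming (streaming_barman in this thread), it needs its own line as well.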

@elhananjair
Author

I have already done that, following the PostgreSQL connection section of the documentation:
image

I still don't know what I am missing. I ran barman check nextcloud-server and I still have a number of FAILED results:

[barman@nextcloud-backup ~]$ barman check nextcloud-server
Server nextcloud-server:
        WAL archive: FAILED (please make sure WAL shipping is setup)
        empty incoming directory: FAILED ('/var/lib/barman/nextcloud-server/incoming' must be empty when archiver=off)
        PostgreSQL: OK
        superuser or standard user with backup privileges: OK
        PostgreSQL streaming: FAILED (fe_sendauth: no password supplied)
        wal_level: OK
        replication slot: FAILED (replication slot 'barman' doesn't exist. Please execute 'barman receive-wal --create-slot nextcloud-server')
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        backup minimum size: OK (0 B)
        wal maximum age: OK (no last_wal_maximum_age provided)
        wal size: OK (0 B)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)
        pg_basebackup: OK
        pg_basebackup compatible: FAILED (PostgreSQL version: None, pg_basebackup version: 13.4)
        pg_basebackup supports tablespaces mapping: OK
        systemid coherence: OK (no system Id stored on disk)
        pg_receivexlog: OK
        pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 13.4)
        receive-wal running: FAILED (See the Barman log file for more details)
        archiver errors: OK
  1. PostgreSQL streaming: FAILED (fe_sendauth: no password supplied)
  2. WAL archive: FAILED (please make sure WAL shipping is setup)
  3. replication slot: FAILED (replication slot 'barman' doesn't exist. Please execute 'barman receive-wal --create-slot nextcloud-server')
    • I tried to run barman receive-wal --create-slot nextcloud-server as the error suggests and I am seeing an error... I read the following in the documentation, which is why I didn't try to create the slot manually before.

    Starting with Barman 2.10, you can configure Barman to automatically create the replication slot by setting:
    create_slot = auto

@elhananjair
Author

elhananjair commented May 10, 2022

Here is another unusual thing I saw:
[postgres@Nextcloud-Server ~]$ barman-wal-archive nextcloud-backup nextcloud-server /var/lib/pgsql/data/pg_wal/000000010000000D000000DA
This works fine and I could find this WAL file inside /var/lib/barman/nextcloud-server/incoming/.
And this is what I get on the nextcloud-backup machine:

[barman@nextcloud-backup ~]$ barman switch-wal --force --archive nextcloud-server 
The WAL file 000000010000000D000000DA has been closed on server 'nextcloud-server'
Waiting for the WAL file 000000010000000D000000DA from server 'nextcloud-server' (max: 30 seconds)
ERROR: The WAL file 000000010000000D000000DA has not been received in 30 seconds

I think I am commenting too much on this issue, so I should write more comments I guess @mikewallace1979, can you please just suggest to me in what way I should go forward having this issue... and thank you so much for helping me out.
This is where I am now following the documentation...
image

@mikewallace1979
Contributor

Firstly, can you check whether crond is running on your system? On the latest Fedora (35) cron is not installed by default and although it is installed as a barman dependency it is not automatically started, so try:

systemctl status crond

and if you see output like this:

systemctl status crond
○ crond.service - Command Scheduler
     Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

then enable and start crond:

systemctl enable crond
systemctl start crond

Secondly, it looks like the streaming_barman connection is broken - do you have an entry in your .pgpass file for the streaming_barman PostgreSQL user? This would be in addition to the barman PostgreSQL user which you've confirmed does already exist.

Thirdly, the message empty incoming directory: FAILED ('/var/lib/barman/nextcloud-server/incoming' must be empty when archiver=off) is because PostgreSQL is configured for WAL archiving and Barman is configured for WAL streaming (via streaming_archiver = on). You can either enable WAL archiving in Barman by explicitly setting archiver = on or disable WAL archiving in PostgreSQL by setting archive_mode = off so that it stops rsyncing WALs into /var/lib/barman/nextcloud-server/incoming. If you decide to disable WAL archiving (it is not required if you are using WAL streaming with any PostgreSQL version > 9.4) then you will need to remove any files currently in /var/lib/barman/nextcloud-server/incoming.

Hopefully these three things should fix WAL streaming and you should see files being written to /var/lib/barman/nextcloud-server/streaming. When barman cron is run (every 60s by the crond service) it should now start the receive-wal process (creating the slot if necessary) and copy any complete WALs (those without a .partial suffix) into /var/lib/barman/nextcloud-server/wals and barman check should be happy because there will be WALs in the archive.

One thing I do not understand is that barman check seems to be unable to determine the PostgreSQL version - given the standard barman connection appears to work I do not have any good ideas why this would be the case. If you post your full barman config file (removing any sensitive info) and the output of barman diagnose (again removing any sensitive info) then I can take a look and see if anything jumps out.
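
When working through a long barman check report like the one above, it can help to list only the failing checks. The filter below is a small sketch: failed_checks is a hypothetical helper name, and it assumes the plain-text layout shown earlier in this thread, with one "name: STATUS (detail)" line per check.

```shell
# Read "barman check" output on stdin and print the name of every check whose
# status is FAILED, one per line. "failed_checks" is a hypothetical helper
# that assumes the "name: FAILED (detail)" layout shown in this thread.
failed_checks() {
    sed -n 's/^[[:space:]]*\([^:]*\): FAILED.*$/\1/p'
}

# Example (on the Barman host, as the barman user):
#   barman check nextcloud-server | failed_checks
```

The name is taken up to the first colon, so details that contain colons themselves, such as "fe_sendauth: no password supplied", do not confuse the match.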

@elhananjair
Author

elhananjair commented May 11, 2022

I think I am commenting too much on this issue, so I should write more comments I guess @mikewallace1979 ...
sorry @mikewallace1979 😨😨😨, I didn't mean to say that, I wanted to say that I am commenting too much on this issue and I don't think that's the right thing to do since it will be difficult to follow the point of this issue. Thank you

@elhananjair
Author

elhananjair commented May 11, 2022

[EDITED]

Firstly, can you check whether the crond is running on your system? On the latest Fedora (35) cron is not installed by default and although it is installed as a barman dependency it is not automatically started, so try:

systemctl status crond

yes, I have checked that and it's active.
image

Secondly, it looks like the streaming_barman connection is broken - do you have an entry in your .pgpass file for the streaming_barman PostgreSQL user? This would be in addition to the barman PostgreSQL user which you've confirmed does already exist.

I hadn't added streaming_barman. I have now added it for the barman user on the nextcloud-backup server and restarted PostgreSQL. When I run barman check nextcloud-server that error is gone, and I am happy.
image

Thirdly, the message empty incoming directory: FAILED ('/var/lib/barman/nextcloud-server/incoming' must be empty when archiver=off)...

I was trying to implement this scenario:
image

But now I have set archive_mode to off and deleted all files inside the ...incoming folder, and the empty incoming directory: FAILED error is gone thanks to you. The WAL archive: FAILED error is still there.
image

Hopefully these three things should fix WAL streaming and you should see files being written to /var/lib/barman/nextcloud-server/streaming. When barman cron is run (every 60s by the crond service) it should now start the receive-wal process (creating the slot if necessary)...

I ran barman cron and noticed that a file with a .partial suffix was written. I am so happy. Only one error is left to fix: WAL archive: FAILED (please make sure WAL shipping is setup)

And here is the output of barman diagnose:

[barman@nextcloud-backup ~]$ barman diagnose 
{
    "global": {
        "config": {
            "barman_home": "/var/lib/barman",
            "barman_user": "barman",
            "compression": "gzip",
            "configuration_files_directory": "/etc/barman/conf.d",
            "errors_list": [],
            "log_file": "/var/log/barman/barman.log",
            "log_level": "DEBUG"
        },
        "system_info": {
            "barman_ver": "2.17",
            "kernel_ver": "Linux nextcloud-backup 5.17.6-200.fc35.x86_64 #1 SMP PREEMPT Mon May 9 14:22:05 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux",
            "python_ver": "Python 3.10.4",
            "release": "RedHat Linux Fedora release 35 (Thirty Five)",
            "rsync_ver": "rsync  version 3.2.3  protocol version 31",
            "ssh_ver": "",
            "timestamp": "Thu May 12 16:28:33 2022"
        }
    },
    "servers": {
        "nextcloud-server": {
            "backups": {},
            "config": {
                "active": true,
                "archiver": false,
                "archiver_batch_size": 0,
                "backup_directory": "/var/lib/barman/nextcloud-server",
                "backup_method": "postgres",
                "backup_options": "concurrent_backup",
                "bandwidth_limit": null,
                "barman_home": "/var/lib/barman",
                "barman_lock_directory": "/var/lib/barman",
                "basebackup_retry_sleep": 30,
                "basebackup_retry_times": 0,
                "basebackups_directory": "/var/lib/barman/nextcloud-server/base",
                "check_timeout": 30,
                "compression": "gzip",
                "conninfo": "host=nextcloud-server user=barman dbname=postgres",
                "create_slot": "auto",
                "custom_compression_filter": null,
                "custom_compression_magic": null,
                "custom_decompression_filter": null,
                "description": "Our main PostgreSQL server",
                "disabled": false,
                "errors_directory": "/var/lib/barman/nextcloud-server/errors",
                "forward_config_path": false,
                "immediate_checkpoint": false,
                "incoming_wals_directory": "/var/lib/barman/nextcloud-server/incoming",
                "last_backup_maximum_age": null,
                "last_backup_minimum_size": null,
                "last_wal_maximum_age": null,
                "max_incoming_wals_queue": null,
                "minimum_redundancy": 0,
                "msg_list": [],
                "name": "nextcloud-server",
                "network_compression": false,
                "parallel_jobs": 1,
                "path_prefix": null,
                "post_archive_retry_script": null,
                "post_archive_script": null,
                "post_backup_retry_script": null,
                "post_backup_script": null,
                "post_delete_retry_script": null,
                "post_delete_script": null,
                "post_recovery_retry_script": null,
                "post_recovery_script": null,
                "post_wal_delete_retry_script": null,
                "post_wal_delete_script": null,
                "pre_archive_retry_script": null,
                "pre_archive_script": null,
                "pre_backup_retry_script": null,
                "pre_backup_script": null,
                "pre_delete_retry_script": null,
                "pre_delete_script": null,
                "pre_recovery_retry_script": null,
                "pre_recovery_script": null,
                "pre_wal_delete_retry_script": null,
                "pre_wal_delete_script": null,
                "primary_ssh_command": null,
                "recovery_options": "",
                "retention_policy": null,
                "retention_policy_mode": "auto",
                "reuse_backup": null,
                "slot_name": "barman",
                "ssh_command": null,
                "streaming_archiver": true,
                "streaming_archiver_batch_size": 0,
                "streaming_archiver_name": "barman_receive_wal",
                "streaming_backup_name": "barman_streaming_backup",
                "streaming_conninfo": "host=nextcloud-server user=streaming_barman dbname=postgres",
                "streaming_wals_directory": "/var/lib/barman/nextcloud-server/streaming",
                "tablespace_bandwidth_limit": null,
                "wal_retention_policy": "main",
                "wals_directory": "/var/lib/barman/nextcloud-server/wals"
            },
            "status": {
                "archive_timeout": 0,
                "checkpoint_timeout": 300,
                "config_file": "/var/lib/pgsql/data/postgresql.conf",
                "connection_error": null,
                "current_lsn": "E/5C60340",
                "current_size": 378086610.0,
                "current_xlog": "000000010000000E00000005",
                "data_checksums": "off",
                "data_directory": "/var/lib/pgsql/data",
                "has_backup_privileges": true,
                "hba_file": "/var/lib/pgsql/data/pg_hba.conf",
                "hot_standby": "on",
                "ident_file": "/var/lib/pgsql/data/pg_ident.conf",
                "included_files": [
                    "/var/lib/pgsql/data/postgresql.auto.conf"
                ],
                "is_in_recovery": false,
                "is_superuser": true,
                "max_replication_slots": "10",
                "max_wal_senders": "5",
                "pg_basebackup_bwlimit": true,
                "pg_basebackup_compatible": true,
                "pg_basebackup_installed": true,
                "pg_basebackup_path": "/usr/bin/pg_basebackup",
                "pg_basebackup_tbls_mapping": true,
                "pg_basebackup_version": "13.4",
                "pg_receivexlog_compatible": true,
                "pg_receivexlog_installed": true,
                "pg_receivexlog_path": "/usr/bin/pg_receivewal",
                "pg_receivexlog_supports_slots": true,
                "pg_receivexlog_synchronous": false,
                "pg_receivexlog_version": "13.4",
                "pgespresso_installed": false,
                "postgres_systemid": "7047857649439319016",
                "replication_slot": [
                    "barman",
                    true,
                    "E/5000000"
                ],
                "replication_slot_support": true,
                "server_txt_version": "13.4",
                "streaming": true,
                "streaming_supported": true,
                "streaming_systemid": "7047857649439319016",
                "synchronous_standby_names": [
                    ""
                ],
                "timeline": 1,
                "wal_compression": "off",
                "wal_keep_size": "0",
                "wal_level": "replica",
                "xlog_segment_size": 16777216,
                "xlogpos": "E/5C60340"
            },
            "wals": {
                "last_archived_wal_per_timeline": {}
            }
        },
        "streaming": {
            "backups": {},
            "config": {
                "active": true,
                "archiver": false,
                "archiver_batch_size": 0,
                "backup_directory": "/var/lib/barman/streaming",
                "backup_method": "postgres",
                "backup_options": "concurrent_backup",
                "bandwidth_limit": null,
                "barman_home": "/var/lib/barman",
                "barman_lock_directory": "/var/lib/barman",
                "basebackup_retry_sleep": 30,
                "basebackup_retry_times": 0,
                "basebackups_directory": "/var/lib/barman/streaming/base",
                "check_timeout": 30,
                "compression": "gzip",
                "conninfo": "host=nextcloud-server user=barman dbname=postgres",
                "create_slot": "manual",
                "custom_compression_filter": null,
                "custom_compression_magic": null,
                "custom_decompression_filter": null,
                "description": "WAL Streaming method for Nextcloud PostgreSQL Database",
                "disabled": false,
                "errors_directory": "/var/lib/barman/streaming/errors",
                "forward_config_path": false,
                "immediate_checkpoint": false,
                "incoming_wals_directory": "/var/lib/barman/streaming/incoming",
                "last_backup_maximum_age": null,
                "last_backup_minimum_size": null,
                "last_wal_maximum_age": null,
                "max_incoming_wals_queue": null,
                "minimum_redundancy": 0,
                "msg_list": [],
                "name": "streaming",
                "network_compression": false,
                "parallel_jobs": 1,
                "path_prefix": null,
                "post_archive_retry_script": null,
                "post_archive_script": null,
                "post_backup_retry_script": null,
                "post_backup_script": null,
                "post_delete_retry_script": null,
                "post_delete_script": null,
                "post_recovery_retry_script": null,
                "post_recovery_script": null,
                "post_wal_delete_retry_script": null,
                "post_wal_delete_script": null,
                "pre_archive_retry_script": null,
                "pre_archive_script": null,
                "pre_backup_retry_script": null,
                "pre_backup_script": null,
                "pre_delete_retry_script": null,
                "pre_delete_script": null,
                "pre_recovery_retry_script": null,
                "pre_recovery_script": null,
                "pre_wal_delete_retry_script": null,
                "pre_wal_delete_script": null,
                "primary_ssh_command": null,
                "recovery_options": "",
                "retention_policy": null,
                "retention_policy_mode": "auto",
                "reuse_backup": null,
                "slot_name": "barman",
                "ssh_command": null,
                "streaming_archiver": true,
                "streaming_archiver_batch_size": 0,
                "streaming_archiver_name": "barman_receive_wal",
                "streaming_backup_name": "barman_streaming_backup",
                "streaming_conninfo": "host=nextcloud-server user=streaming_barman",
                "streaming_wals_directory": "/var/lib/barman/streaming/streaming",
                "tablespace_bandwidth_limit": null,
                "wal_retention_policy": "main",
                "wals_directory": "/var/lib/barman/streaming/wals"
            },
            "status": {
                "archive_timeout": 0,
                "checkpoint_timeout": 300,
                "config_file": "/var/lib/pgsql/data/postgresql.conf",
                "connection_error": null,
                "current_lsn": "E/5C60340",
                "current_size": 378086610.0,
                "current_xlog": "000000010000000E00000005",
                "data_checksums": "off",
                "data_directory": "/var/lib/pgsql/data",
                "has_backup_privileges": true,
                "hba_file": "/var/lib/pgsql/data/pg_hba.conf",
                "hot_standby": "on",
                "ident_file": "/var/lib/pgsql/data/pg_ident.conf",
                "included_files": [
                    "/var/lib/pgsql/data/postgresql.auto.conf"
                ],
                "is_in_recovery": false,
                "is_superuser": true,
                "max_replication_slots": "10",
                "max_wal_senders": "5",
                "pg_basebackup_bwlimit": true,
                "pg_basebackup_compatible": true,
                "pg_basebackup_installed": true,
                "pg_basebackup_path": "/usr/bin/pg_basebackup",
                "pg_basebackup_tbls_mapping": true,
                "pg_basebackup_version": "13.4",
                "pg_receivexlog_compatible": true,
                "pg_receivexlog_installed": true,
                "pg_receivexlog_path": "/usr/bin/pg_receivewal",
                "pg_receivexlog_supports_slots": true,
                "pg_receivexlog_synchronous": false,
                "pg_receivexlog_version": "13.4",
                "pgespresso_installed": false,
                "postgres_systemid": "7047857649439319016",
                "replication_slot": [
                    "barman",
                    true,
                    "E/5000000"
                ],
                "replication_slot_support": true,
                "server_txt_version": "13.4",
                "streaming": true,
                "streaming_supported": true,
                "streaming_systemid": "7047857649439319016",
                "synchronous_standby_names": [
                    ""
                ],
                "timeline": 1,
                "wal_compression": "off",
                "wal_keep_size": "0",
                "wal_level": "replica",
                "xlog_segment_size": 16777216,
                "xlogpos": "E/5C60340"
            },
            "wals": {
                "last_archived_wal_per_timeline": {}
            }
        }
    }
}


@elhananjair
Author

elhananjair commented May 12, 2022

Hello @mikewallace1979, I was updating my last comment but the update did not go through, so here is a new update. Thanks to you, I added the streaming_barman user to .pgpass and that error is gone now.

  • crond is active as I checked with systemctl status crond
  • As you said, I set archive_mode = off. I had enabled archiving because of this sentence in the barman doc: However, as mentioned before, you can configure standard archiving as well and implement a more robust architecture... But as you said, I will use only WAL streaming.
    • Now the empty incoming directory: FAILED ('/var/lib/barman/nextcloud-server/incoming' must be empty when archiver=off) error is gone, and I am happy.

Now everything is perfect and no error is showing up.
image

Both /var/lib/barman/nextcloud-server/streaming and /var/lib/barman/nextcloud-server/wals are receiving files (.partial incomplete WAL segments and completed WAL files).
image

Finally, do I have to do something about these errors from barman.log, or can I leave them as they are?

2022-05-12 21:35:02,378 [14639] barman.wal_archiver DEBUG: Look for 'barman_receive_wal' in 'synchronous_standby_names': ['']
2022-05-12 21:35:02,378 [14639] barman.wal_archiver DEBUG: Synchronous WAL streaming for barman_receive_wal: False
2022-05-12 21:35:02,381 [14639] barman.server ERROR: ArchiverFailure:replication slot 'barman' is already in use

I can't thank you enough @mikewallace1979, you helped me a lot. Thank you so much!

@mikewallace1979
Contributor

@elhananjair You're welcome - glad you got it working in the end :)

I think those barman.server ERROR: ArchiverFailure:replication slot 'barman' is already in use errors are because your barman config has a second server named "streaming" configured which is trying to use the same replication slot as "nextcloud-server" - in the diagnose output the slot_name fields both have a value of barman. So when barman cron runs, it tries to start WAL streaming for this second server and it fails because a replication slot with the name barman is already being used by nextcloud-server.

You can probably just remove the [streaming] section from the barman config or, if you do need the server named "streaming", then change its slot_name to something else.
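As a rough way to spot this kind of clash, here is a minimal Python sketch that reads barman diagnose output and reports any slot_name used by more than one server with streaming_archiver enabled. The function name find_shared_slots and the trimmed sample are made up for illustration, and the check is only a heuristic: it does not verify that the servers actually point at the same PostgreSQL instance.

```python
import json
from collections import defaultdict


def find_shared_slots(diagnose):
    """Return {slot_name: [server, ...]} for slots claimed by more than one streaming server."""
    servers_by_slot = defaultdict(list)
    for server_name, server in diagnose.get("servers", {}).items():
        config = server.get("config", {})
        slot = config.get("slot_name")
        # Only servers that stream WALs through a replication slot can collide.
        if slot and config.get("streaming_archiver"):
            servers_by_slot[slot].append(server_name)
    return {
        slot: sorted(names)
        for slot, names in servers_by_slot.items()
        if len(names) > 1
    }


# Trimmed-down sample with the same shape as the diagnose output in this thread.
sample = json.loads("""
{
  "servers": {
    "nextcloud-server": {"config": {"slot_name": "barman", "streaming_archiver": true}},
    "streaming": {"config": {"slot_name": "barman", "streaming_archiver": true}}
  }
}
""")

print(find_shared_slots(sample))  # {'barman': ['nextcloud-server', 'streaming']}
```

To run it against a real setup, one option is to save the output of barman diagnose to a file and load that with json.load instead of the inline sample.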

@elhananjair
Author

I had configured the same server in two different files: the nextcloud-server.conf I created, and a streaming-server.conf based on the template config file streaming-server.conf-template. I have now removed streaming-server.conf and the error is gone. Thanks. Final question: does barman cron take base backups, or do I need to set up an automation script to run barman backup <server_name>?

Thanks @mikewallace1979, I guess this issue can be closed now. This whole thread started as a suggestion to improve the documentation, so I hope the docs will be updated soon.

@mikewallace1979
Contributor

barman cron won't take backups - you'll need to schedule a specific barman backup job via your preferred automation.

I'll keep the issue open until we've updated the relevant points in the docs :)
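
Since barman cron does not take backups, the scheduled job has to call barman backup itself. Below is a minimal Python sketch of such a wrapper, assuming it is run as the barman user by whatever scheduler you prefer. The function names build_backup_command and run_backup are hypothetical, and the server name nextcloud-server is taken from this thread. It only runs the plain barman backup <server_name> command and passes no extra options.

```python
import subprocess


def build_backup_command(server_name):
    """Build the argument list for `barman backup <server_name>`."""
    if not server_name or server_name.startswith("-"):
        raise ValueError("server_name must be a barman server name, not an option")
    return ["barman", "backup", server_name]


def run_backup(server_name, runner=subprocess.run):
    """Run the backup and return its exit code.

    `runner` defaults to subprocess.run and is injectable so the wrapper
    can be exercised without barman installed.
    """
    result = runner(build_backup_command(server_name), check=False)
    return result.returncode


# Show the command that would be executed for the server from this thread.
print(build_backup_command("nextcloud-server"))  # ['barman', 'backup', 'nextcloud-server']
```

A scheduled job could then call run_backup("nextcloud-server") and treat a non-zero return value as a failed backup. A plain scheduler entry that runs barman backup directly would work just as well; the wrapper is only useful if you want to add logging or alerting around the call.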

@mikewallace1979 mikewallace1979 modified the milestones: 3.0.0, 3.1.0 Jun 21, 2022
mikewallace1979 added a commit that referenced this issue Sep 5, 2022
Adds a note about the psql binary in the `PostgreSQL client/server
binaries` section of the manual.

Relates to #559.
mikewallace1979 added a commit that referenced this issue Sep 5, 2022
Adds an example response for the psql command used to verify that the
streaming connection is working.

Relates to #559.
mikewallace1979 added a commit that referenced this issue Sep 5, 2022
Adds example responses for the `barman-wal-archive --test` and
`barman-wal-restore --test` commands.

Relates to #559.
mikewallace1979 added a commit that referenced this issue Sep 5, 2022
Adds a note which clarifies that the `barman cron` cron entry handles
background maintenance tasks and does not perform regularly scheduled
backups.

Relates to #559.
toniczh pushed a commit to toniczh/barman that referenced this issue Oct 11, 2022
Adds a note about the psql binary in the `PostgreSQL client/server
binaries` section of the manual.

Relates to EnterpriseDB#559.
toniczh pushed a commit to toniczh/barman that referenced this issue Oct 11, 2022
Adds an example response for the psql command used to verify that the
streaming connection is working.

Relates to EnterpriseDB#559.
toniczh pushed a commit to toniczh/barman that referenced this issue Oct 11, 2022
Adds example responses for the `barman-wal-archive --test` and
`barman-wal-restore --test` commands.

Relates to EnterpriseDB#559.
toniczh pushed a commit to toniczh/barman that referenced this issue Oct 11, 2022
Adds a note which clarifies that the `barman cron` cron entry handles
background maintenance tasks and does not perform regularly scheduled
backups.

Relates to EnterpriseDB#559.