
Fix failure to send RestoreObject for Jobs #119

Closed · wants to merge 1 commit into base: master

Conversation

@gnif
Contributor

gnif commented Sep 22, 2018

When using plugins such as Percona XtraBackup, restoration fails because the RestoreObjects for the related jobs are never sent. This patch corrects that.

Fix failure to send RestoreObject for Jobs
@aussendorf

Member

aussendorf commented Sep 25, 2018

Hi,
I do not really understand what this patch aims to fix. Restore with plugins like Percona XtraBackup works in our environment and in customers' environments. The restore object is transmitted for each job processed during the restore, and it basically contains the from and to LSN of the related job.
Can you explain what problem it fixes, and under which circumstances? Job/debug logs of a failed restore without the patch, and of a successful restore with a patched director, would help us understand.
Thanks - Maik

@gnif

Contributor

gnif commented Sep 25, 2018

When using the WebUI to restore individual files, the director uses the method InsertFileIntoFindexList (via InsertOneFileOrDir) to resolve the file path to the JobId that contains the files to restore; however, it never appends the resolved JobId to the rx.JobIds list. Because of this, the restore objects are not sent to the client. See:

/*
* Transfer jobids to jcr to for picking up restore objects
*/
jcr->JobIds = rx.JobIds;
rx.JobIds = NULL;

This can be verified by turning on debugging on the director and checking for SQL queries against the RestoreObject table, which never occur without this patch.
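
For example, with something like the following in bconsole (the exact debug level needed to surface the SQL statements is a guess; the trace file is written to the director's working directory):

*setdebug level=200 trace=1 dir

With the patch applied, the trace shows SELECT statements against the RestoreObject table for each resolved JobId; without it, none appear.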

Here is an unsuccessful restore before the patch:

2018-09-19 17:55:06 | bareos-dir JobId 1896: Error: Bareos bareos-dir 18.2.3 (01Jun18):
Build OS: Linux-4.9.0-7-amd64 debian Debian GNU/Linux 9.5 (stretch)
JobId: 1896
Job: RestoreFiles.2018-09-19_17.55.02_04
Restore Client: redacted-fd
Start time: 19-Sep-2018 17:55:04
End time: 19-Sep-2018 17:55:06
Elapsed time: 2 secs
Files Expected: 3
Files Restored: 1
Bytes Restored: 14,656,956
Rate: 7328.5 KB/s
FD Errors: 2
FD termination status: Fatal Error
SD termination status: Fatal Error
Termination: *** Restore Error ***
2018-09-19 17:55:05 | bareos-sd JobId 1896: Ready to read from volume "Differential-0020" on device "FileStorage" (/var/lib/bareos/storage/hosting).
2018-09-19 17:55:05 | bareos-sd JobId 1896: Error: lib/bsock_tcp.cc:458 Socket has errors=1 on call to File Daemon:redacted:9102
2018-09-19 17:55:05 | bareos-sd JobId 1896: Fatal error: stored/read.cc:161 Error sending to File daemon. ERR=Connection reset by peer
2018-09-19 17:55:05 | bareos-sd JobId 1896: Error: lib/bsock_tcp.cc:418 Wrote 19793 bytes to File Daemon:redacted:9102, but only 0 accepted.
2018-09-19 17:55:05 | bareos-fd JobId 1896: Fatal error: python-fd: Restore with xbstream needs empty directoy: /tmp/bareos-restores//_percona/1896/
2018-09-19 17:55:05 | bareos-fd JobId 1896: Error: python-fd: No lsn information found in restore object for file /tmp/bareos-restores//_percona/xbstream.0000000004 from job 4
2018-09-19 17:55:05 | bareos-sd JobId 1896: End of Volume at file 0 on device "FileStorage" (/var/lib/bareos/storage/hosting), Volume "Full-0002"
2018-09-19 17:55:05 | bareos-sd JobId 1896: Forward spacing Volume "Differential-0020" to file:block 0:3148806328.
2018-09-19 17:55:05 | bareos-fd JobId 1896: Error: python-fd: No lsn information found in restore object for file /tmp/bareos-restores//_percona/xbstream.0000000355 from job 355
2018-09-19 17:55:04 | bareos-dir JobId 1896: Secure connection to Storage daemon at bareos.vpn:9103 with cipher ECDHE-PSK-CHACHA20-POLY1305 established
2018-09-19 17:55:04 | bareos-sd JobId 1896: Forward spacing Volume "Full-0002" to file:block 0:826560851.
2018-09-19 17:55:04 | bareos-sd JobId 1896: Ready to read from volume "Full-0002" on device "FileStorage" (/var/lib/bareos/storage/hosting).
2018-09-19 17:55:04 | bareos-sd JobId 1896: Secure connection to File Daemon at redacted:9102 with cipher ECDHE-RSA-AES256-GCM-SHA384 established
2018-09-19 17:55:04 | bareos-dir JobId 1896: Using Device "FileStorage" to read.
2018-09-19 17:55:04 | bareos-dir JobId 1896: Secure connection to Client: redacted-fd at redacted:9102 with cipher ECDHE-RSA-AES256-GCM-SHA384 established
2018-09-19 17:55:04 | bareos-dir JobId 1896: Start Restore Job RestoreFiles.2018-09-19_17.55.02_04

And here is the result after applying this patch:

2018-09-19 17:57:37 | bareos-dir JobId 1897: Bareos bareos-dir 18.2.3 (01Jun18):
Build OS: Linux-4.9.0-7-amd64 debian Debian GNU/Linux 9.5 (stretch)
JobId: 1897
Job: RestoreFiles.2018-09-19_17.57.33_04
Restore Client: redacted-fd
Start time: 19-Sep-2018 17:57:35
End time: 19-Sep-2018 17:57:37
Elapsed time: 2 secs
Files Expected: 3
Files Restored: 3
Bytes Restored: 18,347,542
Rate: 9173.8 KB/s
FD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Restore OK
2018-09-19 17:57:37 | bareos-sd JobId 1897: End of Volume at file 0 on device "FileStorage" (/var/lib/bareos/storage/hosting), Volume "Full-0002"
2018-09-19 17:57:37 | bareos-sd JobId 1897: Ready to read from volume "Differential-0020" on device "FileStorage" (/var/lib/bareos/storage/hosting).
2018-09-19 17:57:37 | bareos-sd JobId 1897: Forward spacing Volume "Differential-0020" to file:block 0:3148806328.
2018-09-19 17:57:37 | bareos-sd JobId 1897: End of Volume at file 0 on device "FileStorage" (/var/lib/bareos/storage/hosting), Volume "Differential-0020"
2018-09-19 17:57:37 | bareos-sd JobId 1897: Ready to read from volume "Incremental-0045" on device "FileStorage" (/var/lib/bareos/storage/hosting).
2018-09-19 17:57:37 | bareos-sd JobId 1897: Forward spacing Volume "Incremental-0045" to file:block 0:955071687.
2018-09-19 17:57:36 | bareos-sd JobId 1897: Ready to read from volume "Full-0002" on device "FileStorage" (/var/lib/bareos/storage/hosting).
2018-09-19 17:57:36 | bareos-dir JobId 1897: Secure connection to Client: redacted-fd at redacted:9102 with cipher ECDHE-RSA-AES256-GCM-SHA384 established
2018-09-19 17:57:36 | bareos-sd JobId 1897: Forward spacing Volume "Full-0002" to file:block 0:826560851.
2018-09-19 17:57:36 | bareos-dir JobId 1897: Using Device "FileStorage" to read.
2018-09-19 17:57:36 | bareos-fd JobId 1897: python-fd: Got to_lsn 1616879 from restore object of job 4
2018-09-19 17:57:36 | bareos-fd JobId 1897: python-fd: Got to_lsn 1616879 from restore object of job 355
2018-09-19 17:57:36 | bareos-fd JobId 1897: python-fd: Got to_lsn 1616879 from restore object of job 640
2018-09-19 17:57:36 | bareos-sd JobId 1897: Secure connection to File Daemon at redacted:9102 with cipher ECDHE-RSA-AES256-GCM-SHA384 established
2018-09-19 17:57:35 | bareos-dir JobId 1897: Secure connection to Storage daemon at redacted:9103 with cipher ECDHE-PSK-CHACHA20-POLY1305 established
2018-09-19 17:57:35 | bareos-dir JobId 1897: Start Restore Job RestoreFiles.2018-09-19_17.57.33_04
@aussendorf

Member

aussendorf commented Oct 11, 2018

Hi,
thanks for the explanation. It's the same with the VMware plugin, and it is related to https://bugs.bareos.org/view.php?id=805
We will test the patch (probably next week) to make sure it does not have any side effects.
Regards
Maik

@aussendorf aussendorf assigned fbergkemper and unassigned aussendorf Oct 11, 2018

@pstorz

Member

pstorz commented Oct 12, 2018

Hello,

thanks for your contribution.

I have looked into what you did, and it seems that you have copied the content of
JobidHandler() into JobidFileindexHandler().

I would like to suggest not repeating yourself, and doing the following instead:


diff --git a/core/src/dird/ua_restore.cc b/core/src/dird/ua_restore.cc
index 23946071b..457a924ae 100644
--- a/core/src/dird/ua_restore.cc
+++ b/core/src/dird/ua_restore.cc
@@ -1556,6 +1556,9 @@ static int JobidFileindexHandler(void *ctx, int num_fields, char **row)
    AddFindex(rx->bsr, rx->JobId, str_to_int64(row[1]));
    rx->found = true;
    rx->selected_files++;
+
+   JobidHandler(ctx, num_fields, row);
+
    return 0;
 }
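
For context, what this delegation buys you: JobidHandler accumulates each resolved JobId into the comma-separated rx->JobIds list that is later transferred to jcr->JobIds, so calling it adds the missing bookkeeping without duplicating it. A minimal sketch of that accumulation (the behaviour implied by this thread, not the actual Bareos source; std::string stands in for the pooled-memory strings the real RestoreContext uses):

#include <string>

// Sketch only: illustrative stand-ins, not the actual Bareos types.
struct RestoreContextSketch {
   std::string JobIds;  // comma-separated list, later handed to jcr->JobIds
};

static int JobidHandlerSketch(void *ctx, int num_fields, char **row)
{
   RestoreContextSketch *rx = static_cast<RestoreContextSketch *>(ctx);

   // Append this row's JobId (column 0) to the accumulated list.
   if (!rx->JobIds.empty()) {
      rx->JobIds += ",";
   }
   rx->JobIds += row[0];

   return 0;
}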

What do you think?

Best regards,

Philipp

@gnif

Contributor

gnif commented Nov 9, 2018

Sure, though IMO either approach is fine since it's such a small amount of code. I will try to find time to update this PR, but my schedule is very full at the moment.

@pstorz

Member

pstorz commented Nov 15, 2018

I have applied your PR to master and also added your name to the AUTHORS file.

Thank you very much!

Best regards,

Philipp

@pstorz pstorz closed this Nov 15, 2018
