
Merge bareos-16.2-droplet into bareos-17.2 #91

Conversation

joergsteffens (Member)

Merges the droplet changes from bareos-16.2-droplet into bareos-17.2

joergsteffens and others added 10 commits May 3, 2018 14:28
The previous version did not always return an error if data could not be written.
Especially load_chunk ignored EIO errors, probably because of a typo.

As the droplet_device in iothread mode relies on asynchronous write-backs,
the new device method flush() has been introduced.
If a droplet_device is configured to use iothreads and unlimited retries,
flush() does busy waiting until all data is written to the droplet backend.
In case of connection problems to the droplet backend, this waiting lasts forever.

Note that a bconsole "status storage=..." command will inform about "Pending IO flush requests".

Fixes bareos#892: bareos-storage-droplet: if configured with unreachable S3 system, backup will terminate with OK
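The flush()/busy-wait behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Bareos droplet_device code; the class and method names are invented for the example:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Sketch: with iothreads, chunk writes are queued and written back
// asynchronously, so flush() busy-waits until the pending-chunk counter
// drops to zero. With unlimited retries and an unreachable backend,
// the counter never drops and this loop never terminates.
class ChunkedDeviceSketch {
 public:
  void QueueWrite() { pending_io_.fetch_add(1); }    // producer: job thread
  void WriteBackDone() { pending_io_.fetch_sub(1); } // consumer: iothread

  // Busy-wait until all queued chunks have been written to the backend.
  bool Flush()
  {
    while (pending_io_.load() > 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return true;
  }

  // What "status storage=..." would report as "Pending IO flush requests".
  int PendingIoFlushRequests() const { return pending_io_.load(); }

 private:
  std::atomic<int> pending_io_{0};
};
```

This also shows why the "Pending IO flush requests" counter in `status storage=...` is useful: it is the only visible sign of how far the busy wait has progressed.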
…rector and storage daemons fails

this results in failed jobs instead of jobs terminating with warnings.
This results in a more accurate time period.
EIO (I/O error) is normally permanent, so a retry will not help.
When doing a retry on an EIO on the droplet_device,
this results in lost data (because of the chunked_device caching).
This could be fixed; however, this is the quicker solution.
We need to reset the JobStatus value to its previous value after each storage or client status call,
otherwise if we receive a JS_Error for one storage or client, all the following status calls
will fail as well.

This could happen, for example, if a client or storage is offline while more status calls to other
clients and storages follow in the loop.

(cherry picked from commit 751787a)
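The save/restore idea from this commit can be sketched like this. All names here are illustrative, not the actual Bareos status-loop code:

```cpp
#include <cassert>
#include <vector>

// Sketch: one offline resource sets an error status on the job context;
// unless the previous status is restored, every later status call in the
// loop inherits the error and "fails" as well.
enum JobStatus { JS_Running = 'R', JS_ErrorTerminated = 'E' };

struct Jcr {
  JobStatus JobStatus_ = JS_Running;
};

// A status call against an offline resource marks the JCR as errored.
// An already-errored JCR makes even calls to online resources fail.
bool DoStatusCall(Jcr* jcr, bool resource_online)
{
  if (!resource_online) {
    jcr->JobStatus_ = JS_ErrorTerminated;
    return false;
  }
  return jcr->JobStatus_ != JS_ErrorTerminated;
}

// Loop over resources, restoring JobStatus after each call so that one
// offline resource does not poison the following calls.
int CountReachable(Jcr* jcr, const std::vector<bool>& online)
{
  int ok = 0;
  for (bool o : online) {
    JobStatus previous = jcr->JobStatus_;  // save
    if (DoStatusCall(jcr, o)) { ok++; }
    jcr->JobStatus_ = previous;            // restore after each call
  }
  return ok;
}
```

Without the restore step, an offline resource in the middle of the list would make every subsequent status call return false.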
@pstorz pstorz (Member) left a comment


  • autoconf/configure.in now contains the ceph_statx.h check twice. We only need this check once

  • msgchan.c: why was the error changed to M_FATAL?
    If we are sure that M_FATAL is correct, we should remove the above comment.

  • acquire.c: the job flush should only be called if the device really needs a flush, otherwise people will be confused by the job messages.

  • block.c: why was errno = EIO removed?

@joergsteffens joergsteffens self-assigned this Jul 5, 2018
@joergsteffens (Member, Author)

  • autoconf/configure.in now contains the ceph_statx.h check twice. We only need this check once

I'll fix this.

  • msgchan.c: why was the error changed to M_FATAL?
    If we are sure that M_FATAL is correct, we should remove the above comment.

Well, I'm not 100% sure; that's why I requested the review. However, I can confirm that this is required for the droplet backend. If not, a job can finish with status JS_Warnings without all data (or any data at all) having been written to the Storage Daemon.

  • acquire.c: the job flush should only be called if the device really needs a flush, otherwise people will be confused by the job messages.

Is it okay to rephrase this to "releasing device"? That is what is done on all backends. On some this will be fast, on others (droplet) it can take longer.

  • block.c: why was errno = EIO removed?

See commit 346907f:

write_block_to_dev: don't retry on EIO, only on EBUSY

EIO (I/O error) is normally permanent, so a retry will not help.
When doing a retry on an EIO on the droplet_device,
this results in lost data (because of the chunked_device caching).
This could be fixed; however, this is the quicker solution.

Retrying on EIO will only work as expected when NO data has been written. If part of the data has been written successfully and the "cursor" in the backend has already moved on, rewriting the full data block results in the already-written data being duplicated, and therefore a corrupted Bareos block.
I have not found out whether an EIO guarantees that no data has been written. At least for the droplet backend, this is not the case.
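A minimal, hypothetical model of this failure mode (the backend behaviour and all names are assumptions for illustration, not the actual chunked_device code): a backend that commits part of a block before returning EIO, combined with a retry loop that also retries on EIO, ends up storing the committed prefix twice:

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// A toy backend that accepts fail_after bytes of the first write, then
// returns EIO. The "cursor" has already moved past the committed prefix.
struct Backend {
  std::string stored;  // what the backend has accepted so far
  int fail_after;      // commit this many bytes, then return EIO (-1 = never)

  int write_block(const std::string& block)
  {
    if (fail_after >= 0 && fail_after < (int)block.size()) {
      stored += block.substr(0, fail_after);  // partial commit
      fail_after = -1;                        // subsequent writes succeed
      return EIO;
    }
    stored += block;
    return 0;
  }
};

// Naive loop that retries the whole block on EBUSY *and* (wrongly) on EIO.
// After a partial commit, the retry duplicates the committed prefix.
void write_with_retry(Backend& b, const std::string& block)
{
  while (true) {
    int err = b.write_block(block);
    if (err == 0) { return; }
    if (err != EBUSY && err != EIO) { return; }  // anything else: give up
  }
}
```

Writing "ABCD" to a backend that commits 2 bytes before the EIO leaves "ABABCD" in storage, i.e. a corrupted block, which is exactly why the retry-on-EIO branch was removed.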

As the original author was already unsure whether this code segment is useful ("feeble attempt"), I opted for removing it.

@joergsteffens joergsteffens force-pushed the dev/joergs/bareos-17.2/merge-16.2-droplet branch from c46bfae to 38aebda Compare July 9, 2018 12:29
@pstorz pstorz merged commit 0aadb37 into bareos:bareos-17.2 Jul 9, 2018
@joergsteffens joergsteffens deleted the dev/joergs/bareos-17.2/merge-16.2-droplet branch May 11, 2020 16:06