New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge bareos-16.2-droplet into bareos-17.2 #91
Merge bareos-16.2-droplet into bareos-17.2 #91
Conversation
Previos version did not always return an error, if data could not be written. Especially the load_chunk ignored EIO errors, properly because of a typo. As the droplet_device in iothread mode relies on asynchronious write-backs, the new device method flush() has been introduced. If a droplet_device is configured to use iothreads and unlimited retries, this will do busy waiting until all data is written to the droplet backend. In case of a connection problems to the droplet_device, this will be forever. Note that a bconsole "status storage=..." command will inform about "Pending IO flush requests". Fixes bareos#892: bareos-storage-droplet: if configured with unreachable S3 system, backup will terminate with OK
…rector and storage daemons fails this results to failed jobs, instead of terminated with warnings.
This results in a more accurate time period.
EIO (io error) is normally permanent, so a retry will not help. When doing a retry on a EIO on the droplet_device, this results in lost data (because of the chunked_device caching). This could be fixed, however this is the quicker solution.
We need to reset the JobStatus value to its previous after each storage or client status call, otherwise if we receive an JS_Error for one storage or client, all the following status calls will fail as well. This could happen for example if a client or storage is offline but more status calls to other clients and storages will follow in the loop. (cherry picked from commit 751787a)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
autoconf/configure.in has the ceph_statx.h check now two times. We only need this check once
-
msgchan.c: why was the error changed to M_FATAL?
If we are sure that M_FATAL is correct, we should remove the above comment. -
acquire.c: the job flush should only be called if the device really needs a flushing, otherwise people will be confused of the job messages.
-
block.c: why was errno = EIO removed?
I'll fix this.
Well, I'm not 100% sure, that why I requested the review. However, I can confirm, that this is required for the droplet backend. If not, a Job can finish with status JS_Warnings, without having all data (or none att all) written to the Storage Daemon.
Is it okay, to rephrase this to "releasing device"? Because that is what is done on all backends. On some, this will be fast, on others (droplet) this can take longer.
See commit 346907f:
Retrying on EIO will only work as expected, when NO data has been written. If part of the data has been written successfully and the "cursor" in the backend has already moved on, rewriting the full data block will result in having the already data duplicated and therefore a corrupted Bareos block . As the original author have already been unsure if this code segment is useful ("feeble attempt"), I opt for removing it. |
c46bfae
to
38aebda
Compare
Merges the droplet changes from bareos-16.2-droplet into bareos-17.2