
Merge bareos-16.2-droplet into bareos-17.2 #91

Conversation

joergsteffens (Member)

Merges the droplet changes from bareos-16.2-droplet into bareos-17.2

joergsteffens and others added 10 commits May 3, 2018 14:28
The previous version did not always return an error if data could not be written.
Especially load_chunk ignored EIO errors, probably because of a typo.

As the droplet_device in iothread mode relies on asynchronous write-backs,
the new device method flush() has been introduced.
If a droplet_device is configured to use iothreads and unlimited retries,
flush() does busy waiting until all data is written to the droplet backend.
In case of connection problems to the droplet backend, this waiting lasts forever.

Note that a bconsole "status storage=..." command will inform about "Pending IO flush requests".

Fixes bareos#892: bareos-storage-droplet: if configured with unreachable S3 system, backup will terminate with OK
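The flush()/busy-wait behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Bareos droplet_device code; the class and method names are invented for the example:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Sketch: with iothreads, chunk writes are queued and written back
// asynchronously, so flush() busy-waits until the pending-chunk counter
// drops to zero. With unlimited retries and an unreachable backend,
// the counter never drops and this loop never terminates.
class ChunkedDeviceSketch {
 public:
  void QueueWrite() { pending_io_.fetch_add(1); }    // producer: job thread
  void WriteBackDone() { pending_io_.fetch_sub(1); } // consumer: iothread

  // Busy-wait until all queued chunks have been written to the backend.
  bool Flush()
  {
    while (pending_io_.load() > 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return true;
  }

  // What "status storage=..." would report as "Pending IO flush requests".
  int PendingIoFlushRequests() const { return pending_io_.load(); }

 private:
  std::atomic<int> pending_io_{0};
};
```

This also shows why the "Pending IO flush requests" counter in `status storage=...` is useful: it is the only visible sign of how far the busy wait has progressed.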
…rector and storage daemons fails

this results in failed jobs instead of jobs terminating with warnings.
This results in a more accurate time period.
EIO (I/O error) is normally permanent, so a retry will not help.
When doing a retry on an EIO on the droplet_device,
this results in lost data (because of the chunked_device caching).
This could be fixed; however, this is the quicker solution.
We need to reset the JobStatus value to its previous value after each storage or client status call,
otherwise if we receive a JS_Error for one storage or client, all the following status calls
will fail as well.

This could happen, for example, if a client or storage is offline while more status calls to other
clients and storages follow in the loop.

(cherry picked from commit 751787a)
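The save/restore idea from this commit can be sketched like this. All names here are illustrative, not the actual Bareos status-loop code:

```cpp
#include <cassert>
#include <vector>

// Sketch: one offline resource sets an error status on the job context;
// unless the previous status is restored, every later status call in the
// loop inherits the error and "fails" as well.
enum JobStatus { JS_Running = 'R', JS_ErrorTerminated = 'E' };

struct Jcr {
  JobStatus JobStatus_ = JS_Running;
};

// A status call against an offline resource marks the JCR as errored.
// An already-errored JCR makes even calls to online resources fail.
bool DoStatusCall(Jcr* jcr, bool resource_online)
{
  if (!resource_online) {
    jcr->JobStatus_ = JS_ErrorTerminated;
    return false;
  }
  return jcr->JobStatus_ != JS_ErrorTerminated;
}

// Loop over resources, restoring JobStatus after each call so that one
// offline resource does not poison the following calls.
int CountReachable(Jcr* jcr, const std::vector<bool>& online)
{
  int ok = 0;
  for (bool o : online) {
    JobStatus previous = jcr->JobStatus_;  // save
    if (DoStatusCall(jcr, o)) { ok++; }
    jcr->JobStatus_ = previous;            // restore after each call
  }
  return ok;
}
```

Without the restore step, an offline resource in the middle of the list would make every subsequent status call return false.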
@pstorz pstorz (Member) left a comment


  • autoconf/configure.in now contains the ceph_statx.h check twice. We only need this check once

  • msgchan.c: why was the error changed to M_FATAL?
    If we are sure that M_FATAL is correct, we should remove the above comment.

  • acquire.c: the job flush should only be called if the device really needs a flush, otherwise people will be confused by the job messages.

  • block.c: why was errno = EIO removed?

@joergsteffens joergsteffens self-assigned this Jul 5, 2018
@joergsteffens (Member, Author)

  • autoconf/configure.in now contains the ceph_statx.h check twice. We only need this check once

I'll fix this.

  • msgchan.c: why was the error changed to M_FATAL?
    If we are sure that M_FATAL is correct, we should remove the above comment.

Well, I'm not 100% sure; that's why I requested the review. However, I can confirm that this is required for the droplet backend. If not, a job can finish with status JS_Warnings without all data (or any data at all) having been written to the Storage Daemon.

  • acquire.c: the job flush should only be called if the device really needs a flush, otherwise people will be confused by the job messages.

Is it okay to rephrase this to "releasing device"? That is what is done on all backends. On some this will be fast, on others (droplet) it can take longer.

  • block.c: why was errno = EIO removed?

See commit 346907f:

write_block_to_dev: don't retry on EIO, only on EBUSY

EIO (I/O error) is normally permanent, so a retry will not help.
When doing a retry on an EIO on the droplet_device,
this results in lost data (because of the chunked_device caching).
This could be fixed; however, this is the quicker solution.

Retrying on EIO will only work as expected when NO data has been written. If part of the data has been written successfully and the "cursor" in the backend has already moved on, rewriting the full data block results in the already-written data being duplicated, and therefore a corrupted Bareos block.
I have not found out whether an EIO guarantees that no data has been written. At least for the droplet backend, this is not the case.
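A minimal, hypothetical model of this failure mode (the backend behaviour and all names are assumptions for illustration, not the actual chunked_device code): a backend that commits part of a block before returning EIO, combined with a retry loop that also retries on EIO, ends up storing the committed prefix twice:

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// A toy backend that accepts fail_after bytes of the first write, then
// returns EIO. The "cursor" has already moved past the committed prefix.
struct Backend {
  std::string stored;  // what the backend has accepted so far
  int fail_after;      // commit this many bytes, then return EIO (-1 = never)

  int write_block(const std::string& block)
  {
    if (fail_after >= 0 && fail_after < (int)block.size()) {
      stored += block.substr(0, fail_after);  // partial commit
      fail_after = -1;                        // subsequent writes succeed
      return EIO;
    }
    stored += block;
    return 0;
  }
};

// Naive loop that retries the whole block on EBUSY *and* (wrongly) on EIO.
// After a partial commit, the retry duplicates the committed prefix.
void write_with_retry(Backend& b, const std::string& block)
{
  while (true) {
    int err = b.write_block(block);
    if (err == 0) { return; }
    if (err != EBUSY && err != EIO) { return; }  // anything else: give up
  }
}
```

Writing "ABCD" to a backend that commits 2 bytes before the EIO leaves "ABABCD" in storage, i.e. a corrupted block, which is exactly why the retry-on-EIO branch was removed.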

As the original author was already unsure whether this code segment is useful ("feeble attempt"), I opted for removing it.

@joergsteffens joergsteffens force-pushed the dev/joergs/bareos-17.2/merge-16.2-droplet branch from c46bfae to 38aebda Compare July 9, 2018 12:29
@pstorz pstorz merged commit 0aadb37 into bareos:bareos-17.2 Jul 9, 2018
@joergsteffens joergsteffens deleted the dev/joergs/bareos-17.2/merge-16.2-droplet branch May 11, 2020 16:06