doc: update Blacklisting and OSD epoch barrier #19701

Merged
merged 2 commits into from Dec 29, 2017
47 changes: 46 additions & 1 deletion doc/cephfs/eviction.rst
@@ -133,6 +133,51 @@
Note that if blacklisting is disabled, then evicting a client will
only have an effect on the MDS you send the command to. On a system
with multiple active MDS daemons, you would need to send an
eviction command to each active daemon. When blacklisting is enabled
-(the default), sending an eviction to command to just a single
+(the default), sending an eviction command to just a single
MDS is sufficient, because the blacklist propagates it to the others.

.. _background_blacklisting_and_osd_epoch_barrier:

Background: Blacklisting and OSD epoch barrier
==============================================

After a client is blacklisted, it is necessary to make sure that
other clients and MDS daemons have the latest OSDMap (including
the blacklist entry) before they try to access any data objects
that the blacklisted client might have been accessing.

This is ensured using an internal "osdmap epoch barrier" mechanism.

The purpose of the barrier is to ensure that whenever the MDS hands out
capabilities that might allow touching the same RADOS objects, the
recipients have a sufficiently recent OSD map to avoid racing with
cancelled operations (from ENOSPC) or blacklisted clients (from evictions).

More specifically, the cases where an epoch barrier is set are:

* Client eviction (where the client is blacklisted and other clients
must wait for a post-blacklist epoch to touch the same objects).
* OSD map full flag handling in the client (where the client may
cancel some OSD ops from a pre-full epoch, so other clients must
wait until the full epoch or later before touching the same objects).
* MDS startup: because the barrier epoch is not persisted, the MDS must
assume that the latest OSD map is required after a restart.
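
The three cases above can be sketched as updates to one shared value. This is a minimal illustrative model, not the MDS implementation: the ``EpochBarrier`` class, its ``raise_to`` method, the epoch numbers, and the rule that the barrier only ever moves forward are all assumptions made for the example.

```python
class EpochBarrier:
    """Hypothetical model of a single, global OSD map epoch barrier."""

    def __init__(self, current_osdmap_epoch: int) -> None:
        # MDS startup: the barrier is not persisted, so start from the
        # latest OSD map epoch known at startup.
        self.epoch = current_osdmap_epoch

    def raise_to(self, epoch: int) -> int:
        # Assumption for this sketch: the barrier never moves backwards,
        # so an older epoch reported late has no effect.
        self.epoch = max(self.epoch, epoch)
        return self.epoch


# MDS starts while the latest OSD map is at (made-up) epoch 100.
barrier = EpochBarrier(current_osdmap_epoch=100)

# Client eviction: the blacklist entry appears in epoch 105.
barrier.raise_to(105)

# Full flag handling: a client reports cancelled ops from an older epoch.
barrier.raise_to(103)
```

After these calls ``barrier.epoch`` is 105, because the later report of epoch 103 does not lower the barrier.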

Note that the barrier is a single global value, for simplicity. It could be
maintained on a per-inode basis, but it is not, because:

* It would be more complicated.
* It would use an extra 4 bytes of memory for every inode.
* It would not be much more efficient: almost always everyone has the latest
OSD map anyway, so in most cases everyone will breeze through this barrier
rather than waiting.
* The barrier is needed only in very rare cases, so any benefit from per-inode
granularity would only very rarely be seen.

The epoch barrier is transmitted along with all capability messages, and
instructs the receiver of the message to avoid sending any more RADOS
operations to OSDs until it has seen this OSD epoch. This mainly applies
to clients (doing their data writes directly to files), but also applies
to the MDS because things like file size probing and file deletion are
done directly from the MDS.
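
The receiver-side behaviour described above can be sketched roughly as follows. This is a hypothetical model only: ``CapMessage``, ``Receiver``, and their methods are invented for illustration and do not correspond to actual Ceph client or MDS code. It assumes the receiver queues RADOS operations while its OSD map epoch is older than the barrier and sends them once it catches up.

```python
from dataclasses import dataclass, field


@dataclass
class CapMessage:
    """Hypothetical capability message carrying the epoch barrier."""
    inode: int
    osd_epoch_barrier: int


@dataclass
class Receiver:
    """Hypothetical receiver of capability messages (client or MDS)."""
    osdmap_epoch: int                      # newest OSD map epoch seen so far
    barrier: int = 0                       # highest barrier received so far
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def handle_cap_message(self, msg: CapMessage) -> None:
        # Remember the highest barrier seen on any capability message.
        self.barrier = max(self.barrier, msg.osd_epoch_barrier)

    def submit(self, op: str) -> None:
        # Queue the operation, then send it only if the barrier allows.
        self.pending.append(op)
        self._flush()

    def handle_osdmap(self, epoch: int) -> None:
        # A newer OSD map may release operations held by the barrier.
        self.osdmap_epoch = max(self.osdmap_epoch, epoch)
        self._flush()

    def _flush(self) -> None:
        # Send queued operations once our map is at least the barrier epoch.
        if self.osdmap_epoch >= self.barrier:
            self.sent.extend(self.pending)
            self.pending.clear()


receiver = Receiver(osdmap_epoch=100)
receiver.handle_cap_message(CapMessage(inode=1, osd_epoch_barrier=105))
receiver.submit("write object A")          # held: map epoch 100 < barrier 105
held_before_map_update = list(receiver.pending)
receiver.handle_osdmap(105)                # barrier satisfied, op is released
```

In this sketch the write stays in ``pending`` until the receiver sees epoch 105, which mirrors the common case noted earlier: a receiver that already has the latest OSD map passes straight through without waiting.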
2 changes: 1 addition & 1 deletion doc/cephfs/full.rst
@@ -40,7 +40,7 @@
time the OSD full flag is sent. Clients update the ``osd_epoch_barrier``
when releasing capabilities on files affected by cancelled operations, in
order to ensure that these cancelled operations do not interfere with
subsequent access to the data objects by the MDS or other clients. For
-more on the epoch barrier mechanism, see :doc:`eviction`.
+more on the epoch barrier mechanism, see :ref:`background_blacklisting_and_osd_epoch_barrier`.

Legacy (pre-hammer) behavior
----------------------------