cephfs: improve usability of growing/shrinking a multimds cluster #16608

Merged
merged 17 commits on Apr 18, 2018
32 changes: 26 additions & 6 deletions PendingReleaseNotes
@@ -13,10 +13,10 @@
(even as standby). Operators may ignore the error messages and continue
upgrading/restarting or follow this upgrade sequence:

Reduce the number of ranks to 1 (`ceph fs set <fs_name> max_mds 1`),
deactivate all other ranks (`ceph mds deactivate <fs_name>:<n>`), shutdown
standbys leaving the one active MDS, upgrade the single active MDS, then
upgrade/start standbys. Finally, restore the previous max_mds.
Reduce the number of ranks to 1 (`ceph fs set <fs_name> max_mds 1`), wait
for all other MDS daemons to deactivate, leaving the one active MDS, upgrade
the single active MDS, then upgrade/start the standbys. Finally, restore the
previous max_mds.
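
A minimal sketch of that sequence (the systemd commands and the restored
max_mds value are illustrative; see doc/cephfs/upgrading.rst for details)::

    ceph fs set <fs_name> max_mds 1
    ceph status                      # wait until only rank 0 remains active
    systemctl stop ceph-mds.target   # on standby hosts
    # upgrade and restart the single active MDS, then upgrade/start standbys
    ceph fs set <fs_name> max_mds <previous_max_mds>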

See also: https://tracker.ceph.com/issues/23172

@@ -27,12 +27,32 @@
- mds stop -> mds deactivate
- mds set_max_mds -> fs set max_mds
- mds set -> fs set
- mds cluster_down -> fs set cluster_down true
- mds cluster_up -> fs set cluster_down false
- mds cluster_down -> fs set joinable false
- mds cluster_up -> fs set joinable true
- mds add_data_pool -> fs add_data_pool
- mds remove_data_pool -> fs rm_data_pool
- mds rm_data_pool -> fs rm_data_pool

* As the multiple MDS feature is now standard, it is now enabled by
default. ceph fs set allow_multimds is now deprecated and will be
removed in a future release.

* As the directory fragmentation feature is now standard, it is now
enabled by default. ceph fs set allow_dirfrags is now deprecated and
will be removed in a future release.

* MDS daemons now activate and deactivate based on the value of
max_mds. Accordingly, ceph mds deactivate has been deprecated as it
is now redundant.
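
  For example, instead of deactivating ranks explicitly, an operator now just
  lowers max_mds and the cluster deactivates the extra ranks (a sketch)::

    ceph fs set <fs_name> max_mds 1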

* Taking a CephFS cluster down is now done by setting the down flag which
deactivates all MDS. For example: `ceph fs set cephfs down true`.

* Preventing standbys from joining as new actives (formerly the now
deprecated cluster_down flag) on a file system is now accomplished by
setting the joinable flag. This is useful mostly for testing so that a
file system may be quickly brought down and deleted.
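
  For example (the file system name is illustrative)::

    ceph fs set cephfs joinable false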

* New CephFS file system attributes session_timeout and session_autoclose
are configurable via `ceph fs set`. The MDS config options
mds_session_timeout, mds_session_autoclose, and mds_max_file_size are now
80 changes: 50 additions & 30 deletions doc/cephfs/administration.rst
@@ -77,23 +77,59 @@ to enumerate the objects during operations like stats or deletes.
Taking the cluster down
-----------------------

Taking a CephFS cluster down is done by reducing the number of ranks to 1,
setting the cluster_down flag, and then failing the last rank. For example:
Taking a CephFS cluster down is done by setting the down flag:

::

fs set <fs_name> down true

To bring the cluster back online:

::

fs set <fs_name> down false

This will also restore the previous value of max_mds. MDS daemons are brought
down in a way such that journals are flushed to the metadata pool and all
client I/O is stopped.
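
For example, assuming a file system named ``cephfs`` (the name is
illustrative)::

    ceph fs set cephfs down true
    ceph status                    # ranks flush their journals and stop
    ceph fs set cephfs down false  # bring the cluster back online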


Taking the cluster down rapidly for deletion or disaster recovery
-----------------------------------------------------------------

To allow rapidly deleting a file system (for testing) or to quickly bring MDS
daemons down, the operator may also set a flag to prevent standbys from
activating on the file system. This is done using the ``joinable`` flag:

::

fs set <fs_name> joinable false

Then the operator can fail all of the ranks which causes the MDS daemons to
respawn as standbys. The file system will be left in a degraded state.

::
ceph fs set <fs_name> max_mds 1
ceph mds deactivate <fs_name>:1 # rank 2 of 2
ceph status # wait for rank 1 to finish stopping
ceph fs set <fs_name> cluster_down true
ceph mds fail <fs_name>:0

Setting the ``cluster_down`` flag prevents standbys from taking over the failed
rank.
# For all ranks, 0-N:
mds fail <fs_name>:<n>

Once all ranks are inactive, the file system may also be deleted or left in
this state for other purposes (perhaps disaster recovery).
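
For example, a rapid teardown of a two-rank file system might look like this
(the file system name, rank count, and final deletion are illustrative)::

    ceph fs set cephfs joinable false
    ceph mds fail cephfs:0
    ceph mds fail cephfs:1
    ceph fs rm cephfs --yes-i-really-mean-it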


Daemons
-------

These commands act on specific mds daemons or ranks.
Most commands manipulating MDSs take a ``<role>`` argument which can take one
of three forms:

::

<fs_name>:<rank>
<fs_id>:<rank>
<rank>

Commands to manipulate MDS daemons:


::

@@ -108,29 +144,13 @@ If the MDS daemon was in reality still running, then using ``mds fail``
will cause the daemon to restart. If it was active and a standby was
available, then the "failed" daemon will return as a standby.
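
For example, each of the following fails rank 0 of the same file system using
one of the three ``<role>`` forms (the file system name and id are
illustrative)::

    ceph mds fail cephfs:0    # <fs_name>:<rank>
    ceph mds fail 1:0         # <fs_id>:<rank>
    ceph mds fail 0           # <rank>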

::

mds deactivate <role>

Deactivate an MDS, causing it to flush its entire journal to
backing RADOS objects and close all open client sessions. Deactivating an MDS
is primarily intended for bringing down a rank after reducing the number of
active MDS (max_mds). Once the rank is deactivated, the MDS daemon will rejoin the
cluster as a standby.
``<role>`` can take one of three forms:

::

<fs_name>:<rank>
<fs_id>:<rank>
<rank>

Use ``mds deactivate`` in conjunction with adjustments to ``max_mds`` to
shrink an MDS cluster. See :doc:`/cephfs/multimds`

::
tell mds.<daemon name> command ...

tell mds.<daemon name>
Send a command to the MDS daemon(s). Use ``mds.*`` to send a command to all
daemons. Use ``ceph tell mds.* help`` to learn available commands.

::

@@ -208,5 +228,5 @@ These legacy commands are obsolete and no longer usable post-Luminous.
mds remove_data_pool # replaced by "fs rm_data_pool"
mds set # replaced by "fs set"
mds set_max_mds # replaced by "fs set max_mds"
mds stop # replaced by "mds deactivate"
mds stop # obsolete

4 changes: 0 additions & 4 deletions doc/cephfs/dirfrags.rst
@@ -25,10 +25,6 @@ fragments may be *merged* to reduce the number of fragments in the directory.
Splitting and merging
=====================

An MDS will only consider doing splits if the allow_dirfrags setting is true in
the file system map (set on the mons). This setting is true by default since
the *Luminous* release (12.2.X).

When an MDS identifies a directory fragment to be split, it does not
do the split immediately. Because splitting interrupts metadata IO,
a short delay is used to allow short bursts of client IO to complete
68 changes: 25 additions & 43 deletions doc/cephfs/multimds.rst
@@ -27,17 +27,13 @@ are those with many clients, perhaps working on many separate directories.
Increasing the MDS active cluster size
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each CephFS filesystem has a *max_mds* setting, which controls
how many ranks will be created. The actual number of ranks
in the filesystem will only be increased if a spare daemon is
available to take on the new rank. For example, if there is only one MDS daemon running, and max_mds is set to two, no second rank will be created.

Before ``max_mds`` can be increased, the ``allow_multimds`` flag must be set.
The following command sets this flag for a filesystem instance.

::

# ceph fs set <fs_name> allow_multimds true --yes-i-really-mean-it
Each CephFS filesystem has a *max_mds* setting, which controls how many ranks
will be created. The actual number of ranks in the filesystem will only be
increased if a spare daemon is available to take on the new rank. For example,
if there is only one MDS daemon running, and max_mds is set to two, no second
rank will be created. (Note that such a configuration is not Highly Available
(HA) because no standby is available to take over for a failed rank. The
cluster will complain via health warnings when configured this way.)
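
For example, growing a file system to two active ranks is a single setting
(a minimal sketch; the file system name is a placeholder)::

    ceph fs set <fs_name> max_mds 2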

Set ``max_mds`` to the desired number of ranks. In the following examples
the "fsmap" line of "ceph status" is shown to illustrate the expected
@@ -63,7 +59,7 @@ requires standby daemons** to take over if any of the servers running
an active daemon fail.

Consequently, the practical maximum of ``max_mds`` for highly available systems
is one less than the total number of MDS servers in your system.
is at most one less than the total number of MDS servers in your system.

To remain available in the event of multiple server failures, increase the
number of standby daemons in the system to match the number of server failures
@@ -72,49 +68,35 @@ you wish to withstand.
Decreasing the number of ranks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All ranks, including the rank(s) to be removed must first be active. This
means that you must have at least max_mds MDS daemons available.

First, set max_mds to a lower number, for example we might go back to
having just a single active MDS:
Reducing the number of ranks is as simple as reducing ``max_mds``:

::

# fsmap e9: 2/2/2 up {0=a=up:active,1=c=up:active}, 1 up:standby
ceph fs set <fs_name> max_mds 1
# fsmap e10: 2/2/1 up {0=a=up:active,1=c=up:active}, 1 up:standby

Note that we still have two active MDSs: the ranks still exist even though
we have decreased max_mds, because max_mds only restricts creation
of new ranks.

Next, use the ``ceph mds deactivate <role>`` command to remove the
unneeded rank:

::

ceph mds deactivate cephfs_a:1
telling mds.1:1 172.21.9.34:6806/837679928 to deactivate
# fsmap e10: 2/2/1 up {0=a=up:active,1=c=up:stopping}, 1 up:standby
# fsmap e10: 2/2/1 up {0=a=up:active,1=c=up:stopping}, 1 up:standby
...
# fsmap e10: 1/1/1 up {0=a=up:active}, 2 up:standby

# fsmap e11: 2/2/1 up {0=a=up:active,1=c=up:stopping}, 1 up:standby
# fsmap e12: 1/1/1 up {0=a=up:active}, 1 up:standby
# fsmap e13: 1/1/1 up {0=a=up:active}, 2 up:standby
The cluster will automatically deactivate extra ranks incrementally until
``max_mds`` is reached.

See :doc:`/cephfs/administration` for more details on which forms ``<role>``
can take.

The deactivated rank will first enter the stopping state for a period
of time while it hands off its share of the metadata to the remaining
active daemons. This phase can take from seconds to minutes. If the
MDS appears to be stuck in the stopping state then that should be investigated
as a possible bug.
Note: deactivated ranks will first enter the stopping state for a period of
time while they hand off their share of the metadata to the remaining active
daemons. This phase can take from seconds to minutes. If an MDS appears to
be stuck in the stopping state, that should be investigated as a possible
bug.

If an MDS daemon crashes or is killed while in the 'stopping' state, a
standby will take over and the rank will go back to 'active'. You can
try to deactivate it again once it has come back up.
If an MDS daemon crashes or is killed while in the ``up:stopping`` state, a
standby will take over and the cluster monitors will again try to deactivate
the daemon.

When a daemon finishes stopping, it will respawn itself and go
back to being a standby.
When a daemon finishes stopping, it will respawn itself and go back to being a
standby.


Manually pinning directory trees to a particular rank
22 changes: 16 additions & 6 deletions doc/cephfs/upgrading.rst
@@ -18,29 +18,39 @@ The proper sequence for upgrading the MDS cluster is:

ceph fs set <fs_name> max_mds 1

2. Deactivate all non-zero ranks, from the highest rank to the lowest, while waiting for each MDS to finish stopping:
2. Wait for the cluster to deactivate the non-zero ranks, so that only rank 0 is active and the rest are standbys.

::

ceph mds deactivate <fs_name>:<n>
ceph status # wait for MDS to finish stopping

3. Take all standbys offline, e.g. using systemctl:

::

systemctl stop ceph-mds.target
ceph status # confirm only one MDS is online and is active

4. Upgrade the single active MDS, e.g. using systemctl:
4. Confirm only one MDS is online and is rank 0 for your FS:

::

ceph status

5. Upgrade the single active MDS, e.g. using systemctl:

::

# use package manager to update cluster
systemctl restart ceph-mds.target

5. Upgrade/start the standby daemons.
6. Upgrade/start the standby daemons.

::

# use package manager to update cluster
systemctl restart ceph-mds.target

6. Restore the previous max_mds for your cluster:
7. Restore the previous max_mds for your cluster:

::

2 changes: 2 additions & 0 deletions qa/cephfs/overrides/whitelist_health.yaml
@@ -7,3 +7,5 @@ overrides:
- \(MDS_DEGRADED\)
- \(FS_WITH_FAILED_MDS\)
- \(MDS_DAMAGE\)
- \(MDS_ALL_DOWN\)
- \(MDS_UP_LESS_THAN_MAX\)
2 changes: 2 additions & 0 deletions qa/suites/fs/multifs/tasks/failover.yaml
@@ -3,6 +3,8 @@ overrides:
log-whitelist:
- not responding, replacing
- \(MDS_INSUFFICIENT_STANDBY\)
- \(MDS_ALL_DOWN\)
- \(MDS_UP_LESS_THAN_MAX\)
ceph-fuse:
disabled: true
tasks:
2 changes: 2 additions & 0 deletions qa/suites/kcephfs/recovery/tasks/failover.yaml
@@ -3,6 +3,8 @@ overrides:
log-whitelist:
- not responding, replacing
- \(MDS_INSUFFICIENT_STANDBY\)
- \(MDS_ALL_DOWN\)
- \(MDS_UP_LESS_THAN_MAX\)
tasks:
- cephfs_test_runner:
fail_on_skip: false
1 change: 0 additions & 1 deletion qa/tasks/ceph.py
@@ -377,7 +377,6 @@ def cephfs_setup(ctx, config):
num_active = len([r for r in all_roles if is_active_mds(r)])

fs.set_max_mds(num_active)
fs.set_allow_dirfrags(True)

yield
