<?xml version="1.0"?>
<?xml-stylesheet href="urn:x-suse:xslt:profiling:docbook51-profile.xsl"
type="text/xml"
title="Profiling step"?>
<!DOCTYPE section [
<!ENTITY % entities SYSTEM "entity-decl.ent"> %entities;
]>
<section xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="pit-database-recovery">
<title>Point-in-Time &mariadb; Database Recovery</title>
<para>
In this scenario, everything is still running (&clm;, cloud controller nodes,
and compute nodes) but you want to restore the &mariadb; database to a
previous state.
</para>
<section>
<title>Restore &mariadb; manually</title>
<para>
Follow this procedure to manually restore &mariadb;:
</para>
<procedure>
<step>
<para>
Log in to the &clm;.
</para>
</step>
<step>
<para>
Stop the &mariadb; cluster:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts percona-stop.yml</screen>
</step>
<step>
<para>
On all of the nodes running the &mariadb; service, which should be all of
your controller nodes, run the following command to purge the old
database:
</para>
<screen>&prompt.ardana;sudo rm -r /var/lib/mysql/*</screen>
</step>
<step>
<para>
On the first node running the &mariadb; service, restore the backup with
the command below. If you have already extracted the backup to a temporary
directory, this command copies those files back into place.
</para>
<screen>&prompt.ardana;sudo cp -pr /tmp/mysql_restore/* /var/lib/mysql</screen>
</step>
<step>
<para>
If you need to restore the files manually over SSH, follow these steps:
</para>
<substeps>
<step>
<para>
Log in to the first node running the &mariadb; service.
</para>
</step>
<step>
<para>
Retrieve the &mariadb; backup that was created with <xref
linkend="mariadb-database-backup"/>.
</para>
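<para>
For example, assuming the backup archive is reachable over SSH, you
could copy it to the node with <command>scp</command>. The host name
and path below are placeholders for your actual backup location:
</para>
<screen>&prompt.ardana;scp ardana@<replaceable>BACKUP_HOST</replaceable>:<replaceable>/path/to/</replaceable>mydb.tar.gz .</screen>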
</step>
<step>
<para>
Create a temporary directory and extract the TAR archive (for example,
<filename>mydb.tar.gz</filename>).
</para>
<screen>&prompt.ardana;mkdir /tmp/mysql_restore; sudo tar -z --incremental \
--extract --ignore-zeros --warning=none --overwrite --directory /tmp/mysql_restore/ \
-f mydb.tar.gz</screen>
</step>
<step>
<para>
Verify that the files have been restored on the controller.
</para>
<screen>&prompt.ardana;sudo du -shx /tmp/mysql_restore/*
16K /tmp/mysql_restore/aria_log.00000001
4.0K /tmp/mysql_restore/aria_log_control
3.4M /tmp/mysql_restore/barbican
8.0K /tmp/mysql_restore/ceilometer
4.2M /tmp/mysql_restore/cinder
2.9M /tmp/mysql_restore/designate
129M /tmp/mysql_restore/galera.cache
2.1M /tmp/mysql_restore/glance
4.0K /tmp/mysql_restore/grastate.dat
4.0K /tmp/mysql_restore/gvwstate.dat
2.6M /tmp/mysql_restore/heat
752K /tmp/mysql_restore/horizon
4.0K /tmp/mysql_restore/ib_buffer_pool
76M /tmp/mysql_restore/ibdata1
128M /tmp/mysql_restore/ib_logfile0
128M /tmp/mysql_restore/ib_logfile1
12M /tmp/mysql_restore/ibtmp1
16K /tmp/mysql_restore/innobackup.backup.log
313M /tmp/mysql_restore/keystone
716K /tmp/mysql_restore/magnum
12M /tmp/mysql_restore/mon
8.3M /tmp/mysql_restore/monasca_transform
0 /tmp/mysql_restore/multi-master.info
11M /tmp/mysql_restore/mysql
4.0K /tmp/mysql_restore/mysql_upgrade_info
14M /tmp/mysql_restore/nova
4.4M /tmp/mysql_restore/nova_api
14M /tmp/mysql_restore/nova_cell0
3.6M /tmp/mysql_restore/octavia
208K /tmp/mysql_restore/opsconsole
38M /tmp/mysql_restore/ovs_neutron
8.0K /tmp/mysql_restore/performance_schema
24K /tmp/mysql_restore/tc.log
4.0K /tmp/mysql_restore/test
8.0K /tmp/mysql_restore/winchester
4.0K /tmp/mysql_restore/xtrabackup_galera_info</screen>
</step>
</substeps>
</step>
<step>
<para>
Log back in to the &clm;.
</para>
</step>
<step>
<para>
Start the &mariadb; service:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts galera-bootstrap.yml</screen>
</step>
<step>
<para>
Check the &mariadb; cluster status with the
<literal>percona-status.yml</literal> playbook. After approximately
10 to 15 minutes, its output should show all of the &mariadb; nodes in
sync:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts percona-status.yml</screen>
<para>
The output should look similar to the following example:
</para>
<screen>TASK: [FND-MDB | status | Report status of "{{ mysql_service }}"] *************
ok: [ardana-cp1-c1-m1-mgmt] => {
"msg": "mysql is synced."
}
ok: [ardana-cp1-c1-m2-mgmt] => {
"msg": "mysql is synced."
}
ok: [ardana-cp1-c1-m3-mgmt] => {
"msg": "mysql is synced."
}</screen>
</step>
</procedure>
</section>
<section>
<title>Point-in-Time &cassandra; Recovery</title>
<para>
A node may have been removed either due to an intentional action in the
&clm; Admin UI or as a result of a fatal hardware event that requires a
server to be replaced. In either case, the entry for the failed or deleted
node should be removed from &cassandra; before a new node is brought up.
</para>
<para>
Perform the following steps before enabling and deploying the
replacement node.
</para>
<procedure>
<step>
<para>
Determine the IP address of the node that was removed or is being replaced.
</para>
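<para>
One way to look up the address, assuming the standard input model
layout on the &clm;, is to search the server definitions. The file
path and <replaceable>NODE_ID</replaceable> below may differ in your
deployment:
</para>
<screen>&prompt.ardana;grep -A5 <replaceable>NODE_ID</replaceable> ~/openstack/my_cloud/definition/data/servers.yml</screen>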
</step>
<step>
<para>
On one of the functional &cassandra; control plane nodes, log in as the
<literal>ardana</literal> user.
</para>
</step>
<step>
<para>
Run the command <command>nodetool status</command> to display a list of
&cassandra; nodes.
</para>
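<para>
The output looks similar to the following example; the addresses, load
figures, and host IDs shown here are illustrative only. A node that is
gone is typically listed with a <literal>DN</literal> (Down/Normal)
status:
</para>
<screen>&prompt.ardana;nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns  Host ID                               Rack
UN  192.168.24.1  1.2 MiB   256     ?     0f3ab21c-5d2e-4b6a-9c1f-111111111111  rack1
UN  192.168.24.2  1.1 MiB   256     ?     7e4d9a02-8c3b-4f5d-a2e6-222222222222  rack1
DN  192.168.24.3  1.3 MiB   256     ?     53f3e2c2-6b1a-4d7c-b8d9-333333333333  rack1</screen>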
</step>
<step>
<para>
If the removed node is not in the list (no IP address in the list
matches that of the removed node), skip the next step.
</para>
</step>
<step>
<para>
If the node that was removed is still in the list, copy its host
<replaceable>ID</replaceable> (shown in the <literal>Host ID</literal>
column of the <command>nodetool status</command> output).
</para>
</step>
<step>
<para>
Run the command <command>nodetool removenode
<replaceable>ID</replaceable></command>.
</para>
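<para>
For example, using the host ID copied in the previous step (the ID
shown here is illustrative). Progress can be checked with
<command>nodetool removenode status</command>:
</para>
<screen>&prompt.ardana;nodetool removenode 53f3e2c2-6b1a-4d7c-b8d9-333333333333
&prompt.ardana;nodetool removenode status
RemovalStatus: No token removals in process.</screen>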
</step>
</procedure>
<para>
After any obsolete node entries have been removed, the replacement node can
be deployed as usual (for more information, see <xref
linkend="cont-planned"/>). The new &cassandra; node will be able to join the
cluster and replicate data.
</para>
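<para>
Once the replacement node has been deployed, you can run
<command>nodetool status</command> again on one of the &cassandra;
nodes; the new node should appear with a <literal>UN</literal>
(Up/Normal) status after it has joined the cluster and finished
replicating data.
</para>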
<para>
For more information, please consult <link
xlink:href="http://cassandra.apache.org/doc/latest/operating/topo_changes.html">the
&cassandra; documentation</link>.
</para>
</section>
</section>