<?xml version="1.0"?>
<?xml-stylesheet href="urn:x-suse:xslt:profiling:docbook51-profile.xsl"
type="text/xml"
title="Profiling step"?>
<!DOCTYPE section [
<!ENTITY % entities SYSTEM "entity-decl.ent"> %entities;
]>
<section xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="pit-database-recovery">
<title>Point-in-Time &mariadb; Database Recovery</title>
<para>
In this scenario, everything is still running (&clm;, cloud controller nodes,
and compute nodes) but you want to restore the &mariadb; database to a
previous state.
</para>
<section>
<title>Restore &mariadb; manually</title>
<para>
Follow this procedure to manually restore &mariadb;:
</para>
<procedure>
<step>
<para>
Log in to the &clm;.
</para>
</step>
<step>
<para>
Stop the &mariadb; cluster:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts percona-stop.yml</screen>
</step>
<step>
<para>
On all of the nodes running the &mariadb; service, which should be all of
your controller nodes, run the following command to purge the old
database:
</para>
<screen>&prompt.ardana;sudo rm -r /var/lib/mysql/*</screen>
</step>
<step>
<para>
On the first node running the &mariadb; service, restore the backup with
the command below. If you have already extracted the backup to a temporary
directory, this command copies those files back into place.
</para>
<screen>&prompt.ardana;sudo cp -pr /tmp/mysql_restore/* /var/lib/mysql</screen>
</step>
<step>
<para>
If you need to restore the files manually over SSH, follow these steps:
</para>
<substeps>
<step>
<para>
Log in to the first node running the &mariadb; service.
</para>
</step>
<step>
<para>
Retrieve the &mariadb; backup that was created with <xref
linkend="mariadb-database-backup"/>.
</para>
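<para>
For example, assuming the backup archive is reachable over SSH, you
could copy it to the node with <command>scp</command>. The host name
and path below are placeholders for your actual backup location:
</para>
<screen>&prompt.ardana;scp ardana@<replaceable>BACKUP_HOST</replaceable>:<replaceable>/path/to/</replaceable>mydb.tar.gz .</screen>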
</step>
<step>
<para>
Create a temporary directory and extract the TAR archive (for example,
<filename>mydb.tar.gz</filename>).
</para>
<screen>&prompt.ardana;mkdir /tmp/mysql_restore; sudo tar -z --incremental \
--extract --ignore-zeros --warning=none --overwrite --directory /tmp/mysql_restore/ \
-f mydb.tar.gz</screen>
</step>
<step>
<para>
Verify that the files have been restored on the controller.
</para>
<screen>&prompt.ardana;sudo du -shx /tmp/mysql_restore/*
16K /tmp/mysql_restore/aria_log.00000001
4.0K /tmp/mysql_restore/aria_log_control
3.4M /tmp/mysql_restore/barbican
8.0K /tmp/mysql_restore/ceilometer
4.2M /tmp/mysql_restore/cinder
2.9M /tmp/mysql_restore/designate
129M /tmp/mysql_restore/galera.cache
2.1M /tmp/mysql_restore/glance
4.0K /tmp/mysql_restore/grastate.dat
4.0K /tmp/mysql_restore/gvwstate.dat
2.6M /tmp/mysql_restore/heat
752K /tmp/mysql_restore/horizon
4.0K /tmp/mysql_restore/ib_buffer_pool
76M /tmp/mysql_restore/ibdata1
128M /tmp/mysql_restore/ib_logfile0
128M /tmp/mysql_restore/ib_logfile1
12M /tmp/mysql_restore/ibtmp1
16K /tmp/mysql_restore/innobackup.backup.log
313M /tmp/mysql_restore/keystone
716K /tmp/mysql_restore/magnum
12M /tmp/mysql_restore/mon
8.3M /tmp/mysql_restore/monasca_transform
0 /tmp/mysql_restore/multi-master.info
11M /tmp/mysql_restore/mysql
4.0K /tmp/mysql_restore/mysql_upgrade_info
14M /tmp/mysql_restore/nova
4.4M /tmp/mysql_restore/nova_api
14M /tmp/mysql_restore/nova_cell0
3.6M /tmp/mysql_restore/octavia
208K /tmp/mysql_restore/opsconsole
38M /tmp/mysql_restore/ovs_neutron
8.0K /tmp/mysql_restore/performance_schema
24K /tmp/mysql_restore/tc.log
4.0K /tmp/mysql_restore/test
8.0K /tmp/mysql_restore/winchester
4.0K /tmp/mysql_restore/xtrabackup_galera_info</screen>
</step>
</substeps>
</step>
<step>
<para>
Log back in to the &clm;.
</para>
</step>
<step>
<para>
Start the &mariadb; service:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts galera-bootstrap.yml</screen>
</step>
<step>
<para>
Check the &mariadb; cluster status with the
<literal>percona-status.yml</literal> playbook. After approximately
10 to 15 minutes, its output should show all of the &mariadb; nodes in
sync:
</para>
<screen>&prompt.ardana;cd ~/scratch/ansible/next/ardana/ansible
&prompt.ardana;ansible-playbook -i hosts/verb_hosts percona-status.yml</screen>
<para>
The output should look similar to the following example:
</para>
<screen>TASK: [FND-MDB | status | Report status of "{{ mysql_service }}"] *************
ok: [ardana-cp1-c1-m1-mgmt] => {
"msg": "mysql is synced."
}
ok: [ardana-cp1-c1-m2-mgmt] => {
"msg": "mysql is synced."
}
ok: [ardana-cp1-c1-m3-mgmt] => {
"msg": "mysql is synced."
}</screen>
</step>
</procedure>
</section>
<section>
<title>Point-in-Time &cassandra; Recovery</title>
<para>
A node may have been removed either due to an intentional action in the
&clm; Admin UI or as a result of a fatal hardware event that requires a
server to be replaced. In either case, the entry for the failed or deleted
node should be removed from &cassandra; before a new node is brought up.
</para>
<para>
Perform the following steps before enabling and deploying the
replacement node.
</para>
<procedure>
<step>
<para>
Determine the IP address of the node that was removed or is being replaced.
</para>
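<para>
One way to look up the address, assuming the standard input model
layout on the &clm;, is to search the server definitions. The file
path and <replaceable>NODE_ID</replaceable> below may differ in your
deployment:
</para>
<screen>&prompt.ardana;grep -A5 <replaceable>NODE_ID</replaceable> ~/openstack/my_cloud/definition/data/servers.yml</screen>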
</step>
<step>
<para>
On one of the functional &cassandra; control plane nodes, log in as the
<literal>ardana</literal> user.
</para>
</step>
<step>
<para>
Run the command <command>nodetool status</command> to display a list of
&cassandra; nodes.
</para>
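<para>
The output looks similar to the following example; the addresses, load
figures, and host IDs shown here are illustrative only. A node that is
gone is typically listed with a <literal>DN</literal> (Down/Normal)
status:
</para>
<screen>&prompt.ardana;nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns  Host ID                               Rack
UN  192.168.24.1  1.2 MiB   256     ?     0f3ab21c-5d2e-4b6a-9c1f-111111111111  rack1
UN  192.168.24.2  1.1 MiB   256     ?     7e4d9a02-8c3b-4f5d-a2e6-222222222222  rack1
DN  192.168.24.3  1.3 MiB   256     ?     53f3e2c2-6b1a-4d7c-b8d9-333333333333  rack1</screen>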
</step>
<step>
<para>
If the removed node is not in the list (no IP address in the list
matches that of the removed node), skip the next step.
</para>
</step>
<step>
<para>
If the node that was removed is still in the list, copy its host
<replaceable>ID</replaceable> (shown in the <literal>Host ID</literal>
column of the <command>nodetool status</command> output).
</para>
</step>
<step>
<para>
Run the command <command>nodetool removenode
<replaceable>ID</replaceable></command>.
</para>
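<para>
For example, using the host ID copied in the previous step (the ID
shown here is illustrative). Progress can be checked with
<command>nodetool removenode status</command>:
</para>
<screen>&prompt.ardana;nodetool removenode 53f3e2c2-6b1a-4d7c-b8d9-333333333333
&prompt.ardana;nodetool removenode status
RemovalStatus: No token removals in process.</screen>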
</step>
</procedure>
<para>
After any obsolete node entries have been removed, the replacement node can
be deployed as usual (for more information, see <xref
linkend="cont-planned"/>). The new &cassandra; node will be able to join the
cluster and replicate data.
</para>
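<para>
Once the replacement node has been deployed, you can run
<command>nodetool status</command> again on one of the &cassandra;
nodes; the new node should appear with a <literal>UN</literal>
(Up/Normal) status after it has joined the cluster and finished
replicating data.
</para>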
<para>
For more information, please consult <link
xlink:href="http://cassandra.apache.org/doc/latest/operating/topo_changes.html">the
&cassandra; documentation</link>.
</para>
</section>
</section>