Contents:
README: this file
RUNBOOK: a list of state descriptions, validations, and remedial actions
replicate.sh: the all-singing, all-dancing HA (re)activator
this installs and sets up the HA function for a controller pair.
appdcontroller.sh: a file intended to be placed into /etc/init.d to control
the controller, watchdog, and assassin
appdcontroller-db.sh: a file intended to be placed into /etc/init.d to control
the mysql database
appdynamics-machine-agent.sh: a file to start the machine agent
assassin.sh: a script run on a failed-over primary to kill the old primary
failover.sh: a script run on a secondary to become the new primary
install-init.sh: an installer for the appdcontroller.sh
uninstall-init.sh: an uninstaller for the appdcontroller.sh
watchdog.sh: run on a secondary to watch the primary and maybe failover
watchdog.settings.template: copy this to watchdog.settings to override defaults
appdservice-root.sh: a null privilege escalation wrapper
appdservice-pbrun.sh: a privilege escalation wrapper around pbrun
appdservice.c: a privilege escalation c program
numa.settings.template: a template file containing numa static node assignments
numa-patch-controller.sh: a script to edit numa hooks into controller.sh
appdcontroller-db.sysconfig: source files for system configuration
appdcontroller.sysconfig
appdynamics-machine-agent.sysconfig
save_mysql_passwd.sh: a script used to obfuscate and save the mysql root password
getaccess.sh: a script to extract the access key from a database to set
up monitoring
setmonitor.sh: a script to patch various files to set up controller
monitoring
appdstatus.sh: a script to replace 'service appdcontroller status' on
systemd machines
mysqlclient.sh: a script that uses the built-in authentication mechanism to
allow the user to execute mysql commands
failover_pre_hook.sh
failover_hook.sh: these files may be created to hold site-specific commands
that are executed before and after a secondary becomes the new primary.
this would be, for example, a place to make a REST call to change a CNAME
record in a DNS server if that mechanism is used to route traffic to the active
node.
Installation notes:
This software is intended to connect the appdynamics controller into linux's
service machinery. This optionally includes a watchdog process running on the
secondary HA node that will initiate a failover if a failure is detected in
the primary controller or database.
Permissions:
If the controller is to be run as a non-root user, part of the
installation cannot be directly automated, as it involves installing a
system service into /etc/init.d and ancillary directories using install-init.sh.
Prerequisites:
--------------
1) ssh must be installed in such a way that the user the controller is to
be run as has symmetrical passwordless ssh access. This is done by generating
a key pair on each node and placing the other node's public key into the appropriate
authorized_keys file. In detail, assuming user appduser and nodes node1 and node2:
on node1:
su - appduser
mkdir -p .ssh
ssh-keygen -t rsa -N "" -f .ssh/id_rsa
scp .ssh/id_rsa.pub node2:/tmp
on node2:
su - appduser
mkdir -p .ssh
ssh-keygen -t rsa -N "" -f .ssh/id_rsa
cat /tmp/id_rsa.pub >> .ssh/authorized_keys
scp .ssh/id_rsa.pub node1:/tmp
on node1:
cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys
Not all of the above commands may be needed, and some of them may prompt for a
password.
Permissions need to be as below:
chmod 700 .ssh
chmod 644 .ssh/id_rsa.pub
chmod 600 .ssh/id_rsa
chmod 600 .ssh/authorized_keys
To check that passwordless ssh works, test it with the command below.
ssh -oNumberOfPasswordPrompts=0 other_node "echo success"
2) reliable symmetrical reverse host lookup must be configured. the best
way is to place the host names into each /etc/hosts file. reverse DNS adds
an additional point of failure.
a) /etc/nsswitch.conf should have files placed before dns. example:
hosts: files dns
b) /etc/hosts:
192.168.144.128 host1
192.168.144.137 host2
3) each machine must have the root and data directory writable by the
appropriate appdynamics user:
ls -lad /opt/AppDynamics/Controller
drwxr-xr-x. 18 appduser users 4096 Jan 26 18:18 /opt/AppDynamics/Controller
4) the primary controller should be installed as a standalone controller;
the secondary should not be installed at all.
Installation:
-------------
On the primary, unpack the shar file using bash into a directory HA under the
controller root install directory.
cd /opt/AppDynamics/Controller
mkdir -p HA
chmod +w *
bash HA.shar
Mysql Password:
---------------
newer controllers remove the db/.rootpw file from the controller installation for
security reasons, plaintext passwords in data files being a known vulnerability.
as the HA package requires frequent database access, it is impractical to prompt
for the password every time the database is used. accordingly, we decrypt the
password from a data file at each required access. this data file must be written
by the save_mysql_passwd.sh script before running any component of the HA toolkit.
cd HA
./save_mysql_passwd.sh
this will prompt for the mysql root password
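once the password has been saved, database access can be verified with the
mysqlclient.sh script listed in the contents above. a minimal sketch, assuming
mysqlclient.sh reads SQL on standard input:
cd /opt/AppDynamics/Controller/HA
echo "select 1;" | ./mysqlclient.sh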
Activation:
-----------
The key script to replicate the primary database to the secondary, make all the
appropriate state changes, and activate the HA pair is the replicate.sh script.
it is run on an active controller. Attempts to run it on a passive controller
will be rejected. it has a few specialized options, but it has reasonable
defaults and extracts a lot of configuration information from the existing
installation. the simplest usage is to activate an HA pair immediately.
run the following as the same user that appdynamics runs as.
since the controller is taken down, the command will prompt for confirmation.
./replicate.sh -s node2 -f -w -e proxy
when it has completed, the HA pair will be running and replicating.
If running as non-root, the command asks that some commands be run manually as
root to complete the installation.
Incremental Activation:
-----------------------
Runs of the replicate script without the -f option will perform an imperfect
copy of the primary controller to the secondary without taking the primary down.
This can be used to minimize the downtime needed for the initial
installation. if the data volume to replicate is large, several runs without
the -f option will approach a perfect copy over a period of days. the final
activation with -f during a maintenance window then only copies those data files
that differ from the last copy.
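for example, a minimal sketch of an incremental rollout, reusing the node name
and proxy option from the Activation section above:
# run one or more times while the primary stays up; each pass narrows the gap
./replicate.sh -s node2 -e proxy
# final activation during a maintenance window; -f takes the primary down,
# -w enables the watchdog
./replicate.sh -s node2 -f -w -e proxy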
Privilege Escalation:
---------------------
the install-init.sh script is used to install the init scripts and to set
up a controlled privilege escalation. this can take the form of sudo settings
or one of three flavors of /sbin/appdservice. run install-init.sh for usage.
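for example, the sudo flavor can be selected as shown in the Machine Agent
section below; the -s flag is taken from that example, and the flags for the
other flavors are printed by the usage message:
sudo ./install-init.sh -s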
Sudo:
----
if sudo is used, the following commands need to be executable by the appd user,
and should be added to the sudoers file or LDAP resource. note that they
need to run without entering a password, so the NOPASSWD: flag
must be used. a sketch of a sudoers entry follows the list below.
service appdcontroller *
service appdcontroller-db *
service appdynamics-machine-agent *
chkconfig appdcontroller *
chkconfig appdcontroller-db *
chkconfig appdynamics-machine-agent *
update-rc.d appdcontroller *
update-rc.d appdcontroller-db *
update-rc.d appdynamics-machine-agent *
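a sudoers fragment granting these might look like the following sketch. the
user name appduser and the absolute paths are assumptions and must match your
system; on Debian-family systems substitute update-rc.d for chkconfig.
# /etc/sudoers.d/appdynamics -- sketch only; verify paths with 'which service' etc.
appduser ALL=(root) NOPASSWD: /sbin/service appdcontroller *
appduser ALL=(root) NOPASSWD: /sbin/service appdcontroller-db *
appduser ALL=(root) NOPASSWD: /sbin/service appdynamics-machine-agent *
appduser ALL=(root) NOPASSWD: /sbin/chkconfig appdcontroller *
appduser ALL=(root) NOPASSWD: /sbin/chkconfig appdcontroller-db *
appduser ALL=(root) NOPASSWD: /sbin/chkconfig appdynamics-machine-agent *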
Service Control:
----------------
After activation, the controller service and HA facility can be controlled
using the linux service command. these commands must be executed as root.
The default installation will automatically shut down the controller when
the system is halted, and automatically start it at boot time.
service appdcontroller start
service appdcontroller stop
an additional service, appdcontroller-db, is used to manage the database.
a sensible dependency between the two services is implemented
Status:
-------
Once installed as a service, the linux service utility can be run on either
node to report the current state of the replication, background processes, and
the controller itself.
service appdcontroller status
Watchdog:
---------
If enabled, this background process running on the secondary will monitor the
primary controller and database, and if it detects a failure, will initiate a
failover automatically. The failure detection timings are defined in watchdog.sh
and may be overridden in watchdog.settings (copied from watchdog.settings.template).
The watchdog is only enabled if the file <controller root>/HA/WATCHDOG_ENABLE
exists. Removing the file causes the watchdog to exit.
to enable the watchdog, as root:
touch <controller root>/HA/WATCHDOG_ENABLE
chmod 777 <controller root>/HA/WATCHDOG_ENABLE
service appdcontroller start
running the replicate.sh script with the -w option at final activation will
create the watchdog control file automatically.
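to override the default timings, copy the settings template into place and
edit it as needed:
cd <controller root>/HA
cp watchdog.settings.template watchdog.settings
vi watchdog.settings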
Assassin:
---------
After a failover, it is possible that the old primary will come back online. If this
occurs, the load balancer may send load to the old primary. To prevent this,
the new primary runs an assassin process that continually polls the old primary and,
if it becomes accessible, kills it and inhibits it from starting again.
Failover:
---------
A manual failover can be triggered by running failover.sh on the secondary.
This will kill the watchdog and activate the database; it will also try to
assassinate the old primary. This full failover only happens if replication is
broken. if replication is good, we just deactivate the other appserver and
activate this one, while leaving the db up; in that case the assassin is not
started.
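a minimal sketch of a manual failover, run as the appdynamics user on the
secondary (using the controller root path from the examples above):
cd /opt/AppDynamics/Controller/HA
./failover.sh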
Logging:
--------
the logs directory contains several status and progress logs of the various components.
Remote controller monitoring
----------------------------
If desired, it is possible to have the controller's internal Java app agent report to
another controller. This is most often useful if two or more controllers have been
deployed on-premises. Having them all report their health to a controller monitor
simplifies the monitoring of them all, as common health rules and notification policies are
more easily re-used.
At least four pieces of information are needed to configure remote controller
monitoring:
- controller monitor's hostname
- controller monitor's port
- account name within controller monitor
- controller monitor's access key for that account
- [optional] application name to report under
The controller monitor's account names and access keys can be determined with:
cd <controller install dir>
echo "select access_key,name,id from account\G"| bin/controller.sh login-db
this has been put into a script:
./getaccess.sh -p password -h monitorhost:3388
this will output the access key. an account name can also be specified;
see the usage message.
You can send a controller's app agent output to another controller with hostname
"cmonitor", access_key "ac-ce-ss-key", account name "customer1", application name
'Prod HA pair' with:
./replicate.sh -s <secondary> -m url=http://cmonitor:8090,access_key="ac-ce-ss-key",account_name=customer1,app_name='Prod HA pair' -f
Machine Agent
-------------
Having a machine agent on both primary and secondary servers is a prerequisite
for simple monitoring and warning of critical health issues affecting the stability
of the HA controller pair. Getting to this state involves:
1. downloading and installing the machine agent on both primary and
secondary servers from download.appdynamics.com. For compatibility see
docs.appdynamics.com for your version of the controller.
Ensure that the machine agent install directory is the *same* for both
primary and secondary servers.
2. Ensure that the same version of the HA Toolkit is available on both
primary and secondary servers. Use scp or replicate.sh -s <other>
3. As root (re)run HA Toolkit install on both primary and secondary servers
including '-a <agent install dir>' parameter. For example:
sudo ./install-init.sh -s -a /opt/appdyn/machine-agent/4.1.5.1
if the machine agent was extracted into the parent of the appdynamics
controller, or the controller directory itself, the -a may be omitted.
4. As the regular AppD user, (re)run replicate.sh .. -f to shut down the controller and
configure all remaining files, with an extra parameter referring to the machine
agent install directory. For example:
replicate.sh -s <secondary> -e https://proxy -a /opt/appdyn/machine-agent/3.9.0.0 -t 0 -z -f
If a remote controller monitor has been configured, include that '-m' option in the
replicate.sh command to ensure the machine agents report there also. For example:
./replicate.sh -s <secondary>
-m url=http://cmonitor:8090,access_key="ac-ce-ss-key",account_name=customer1,app_name='Prod HA pair'
-a /opt/appdyn/machine-agent/3.9.0.0 -f
5. please note that the machine agent will be run as the same user as
the mysql database.
NUMA
----
on a numa machine, it may be useful, for performance reasons, to statically
partition the machine to run mysqld on one set of nodes and the java appserver on
another set of nodes. this can be easily done by running numa-patch-controller.sh
from the HA directory, and copying the numa.settings.template to numa.settings.
edit numa.settings as needed.
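for example, using the controller root path from the examples above:
cd /opt/AppDynamics/Controller/HA
./numa-patch-controller.sh
cp numa.settings.template numa.settings
vi numa.settings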
Best Practices:
---------------
If possible, a dedicated network connection should be provisioned between the
HA pair. this set of interfaces should be the ones placed into the /etc/hosts
files, and used as the argument for the -s option to the replicate.sh script.
Backups are best done by stopping the appdcontroller service on the secondary
and performing a file-level copy of the appdynamics directories. these can
be incremental or complete, depending on the reliability of your solution.
when the backup is done, simply start the service; replication will catch up
and guarantee the integrity of your data.
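a minimal sketch of such a backup on the secondary, run as root; the use of
rsync and the destination path are assumptions, any file-level copy tool will do:
service appdcontroller stop
rsync -a /opt/AppDynamics/Controller/ /backup/AppDynamics/Controller/
service appdcontroller start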
A load balancer should do a GET to
http://<controller>:<port>/controller/rest/serverstatus
to determine which of the two controllers is active. the active node will
return an HTTP 200, and the response will contain <available>true</available>.
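for example, the check can be exercised by hand with curl (host and port are
placeholders):
curl -i http://<controller>:<port>/controller/rest/serverstatus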
should it be necessary to have a hook in the failover process, for example to update
a dynamic DNS service or to notify a load balancer or proxy, the failover.sh script calls
failover_pre_hook.sh and/or failover_hook.sh if they exist and are executable.
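a minimal sketch of such a hook; the contents are entirely site-specific and
the REST call shown is only a placeholder. the file must be executable.
#!/bin/bash
# <controller root>/HA/failover_hook.sh
# runs after a secondary becomes the new primary; put site-specific actions here,
# for example a REST call to re-point a DNS CNAME at the new primary
curl -s -X PUT https://dns.example.invalid/api/cname/controller -d "target=$(hostname)"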
Version and Copyright
---------------------
$Id: README.txt 3.46 2019-03-12 07:01:50 cmayer Exp $
Copyright 2016 AppDynamics, Inc
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.