Patch for single-machine celerybeat testing (referenced under Celerybeat Failover Testing):
diff --git a/server/pulp/server/async/scheduler.py b/server/pulp/server/async/scheduler.py
index d745406..b158386 100644
--- a/server/pulp/server/async/scheduler.py
+++ b/server/pulp/server/async/scheduler.py
@@ -28,8 +28,9 @@ import pulp.server.logs # noqa
 _logger = logging.getLogger(__name__)
+import random
 # setting the celerybeat name
-CELERYBEAT_NAME = constants.SCHEDULER_WORKER_NAME + "@" + platform.node()
+CELERYBEAT_NAME = constants.SCHEDULER_WORKER_NAME + "@" + str(random.randint(1,10000))
 class EventMonitor(threading.Thread):
This is to test a rather large story: https://pulp.plan.io/issues/2509
Sanity Checking
Start Pulp normally and perform a basic sync to sanity check Pulp.
Start Pulp normally. Allow the system to idle for 40 seconds and observe the logs. Verify that no errors are reported during those 40 seconds.
Upgrade test
This is a one-time manual test on an RPM-based installation on EL7.
This is not a dev test and likely should not be automated.
Upgrade to this release from an earlier release of Pulp.
Verify that /usr/lib/systemd/system/pulp_resource_manager.service contains the line: --heartbeat-interval=5
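The unit file check above can also be scripted. The following is a minimal sketch, not part of Pulp: the function name `has_heartbeat_interval` is hypothetical, and it assumes the option appears as a whitespace-separated token somewhere in the unit file (for example on the ExecStart line).

```python
def has_heartbeat_interval(unit_text, seconds=5):
    """Return True if a non-comment line carries --heartbeat-interval=<seconds>.

    Hypothetical helper: it only looks for the option as a whole token, so
    --heartbeat-interval=50 does not count as a match for seconds=5.
    """
    wanted = "--heartbeat-interval=%d" % seconds
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Skip commented-out lines in the unit file.
            continue
        if wanted in stripped.split():
            return True
    return False


# Example use on the upgraded box (path taken from the step above):
# with open("/usr/lib/systemd/system/pulp_resource_manager.service") as f:
#     print(has_heartbeat_interval(f.read()))
```

Matching whole tokens rather than substrings avoids a false pass when a different interval value merely starts with 5.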
Worker Failure Testing
With Pulp started normally, look at the output of pulp-admin status and verify that all expected workers are present. Then kill -9 a specific worker, for example reserved_resource_worker-0, with sudo pkill -9 -f reserved_resource_worker-0.
After 30 seconds have passed, check pulp-admin status.
Verify that the worker is no longer shown. It is expected that the status API will not show a killed worker 30 seconds after the kill occurs.
Verify that the logs show errors reporting that the worker has gone missing.
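The "wait up to 30 seconds, then check the status" pattern in this test can be sketched as a small polling helper. This is only an illustration: `wait_until` and `worker_listed` are hypothetical names, and the check treats the pulp-admin status output as plain text because its exact format is not specified here.

```python
import time


def wait_until(predicate, timeout, interval=1.0):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def worker_listed(status_text, worker_name):
    """Return True if worker_name appears anywhere in the status output.

    Assumption: a plain substring search is enough. Note that a name such
    as reserved_resource_worker-1 would also match reserved_resource_worker-10.
    """
    return worker_name in status_text


# Example use after killing the worker (assumes pulp-admin is on PATH):
# import subprocess
# gone = wait_until(
#     lambda: not worker_listed(
#         subprocess.check_output(["pulp-admin", "status"]).decode(),
#         "reserved_resource_worker-0"),
#     timeout=30)
```

Polling returns as soon as the worker disappears, while the 30 second timeout still marks the point at which the test should be considered failed.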
Resource Manager Failover Testing
Testing normal concurrent operation
Start Pulp normally, but keep the resource manager stopped. Then in one terminal on the box run resource manager A with:
sudo -u apache /bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@boxA -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_managerA.pid --heartbeat-interval=5
On a second terminal run resource manager B with:
sudo -u apache /bin/python /usr/bin/celery worker -A pulp.server.async.app -n resource_manager@boxB -Q resource_manager -c 1 --events --umask 18 --pidfile=/var/run/pulp/resource_managerB.pid --heartbeat-interval=5
Verify that you see the full Celery "banner" with stars in resource manager A.
Verify that you see a log statement that resource manager A has acquired the lock.
Verify that you do not see any output from resource manager B.
Verify that you see a log statement that resource manager B is a hot spare.
Verify that both resource managers are reported via pulp-admin status.
Failover due to graceful shutdown
Start both resource manager processes as described above.
Press Ctrl+C in the terminal running resource manager A.
Verify that within 5 seconds the logs emit a statement like new lock acquired by 'resource_manager@yyyyyyy'
Verify that you see a log statement stating that failover has occurred, and resource manager B is the new primary resource manager
Verify that resource manager B displays the Celery banner with stars within 5 seconds of the Ctrl+C.
Verify that only resource manager B is shown in pulp-admin status.
Failover due to killing
Start both resource manager processes as described above.
kill -9 the resource_manager that has acquired the lock
Verify that resource manager B displays the celery banner with stars within 30 seconds of the kill
Verify that within 5 seconds the logs emit a statement like new lock acquired by 'resource_manager@yyyyyyy'
Verify that you see a log statement stating that failover has occurred, and resource manager B is the new primary resource manager
Verify that only resource manager B is shown in pulp-admin status.
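The lock-acquisition checks in the failover tests can be sketched as a log scan. This is a rough illustration only: the regular expression is an assumption based on the "new lock acquired by 'resource_manager@yyyyyyy'" wording quoted in the steps, the real log format may differ, and both function names are hypothetical.

```python
import re

# Assumed message shape, taken from the example statements in this test plan.
LOCK_ACQUIRED = re.compile(
    r"new lock acquired by '?(?P<name>[\w.-]+@[\w.-]+)'?",
    re.IGNORECASE,
)


def lock_holders(log_lines):
    """Return the worker names that acquired the lock, in log order."""
    holders = []
    for line in log_lines:
        match = LOCK_ACQUIRED.search(line)
        if match:
            holders.append(match.group("name"))
    return holders


def failed_over(log_lines, old_holder, new_holder):
    """Return True if old_holder held the lock and new_holder took it last."""
    holders = lock_holders(log_lines)
    return (
        old_holder != new_holder
        and old_holder in holders
        and holders[-1] == new_holder
    )
```

With the names from the commands above, `failed_over(lines, "resource_manager@boxA", "resource_manager@boxB")` would be the check after stopping or killing resource manager A.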
Celerybeat Failover Testing
This can't be done on one machine without changing Pulp code, because both celerybeats would have the same name. You can run the same test on two machines, with one celerybeat on each. I'm going to apply the scheduler.py diff included in this issue so that I can do it on one machine in two separate terminals.
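To illustrate why two celerybeats on one machine clash, here is a minimal sketch of the two naming schemes. It is not Pulp code: the value "scheduler" for `SCHEDULER_WORKER_NAME` is an assumption inferred from the scheduler@ names in the log statements, and the function names are hypothetical.

```python
import platform
import random

# Assumed value of constants.SCHEDULER_WORKER_NAME (inferred, not confirmed).
SCHEDULER_WORKER_NAME = "scheduler"


def hostname_based_name():
    """Name built from the hostname: identical for every process on a box."""
    return SCHEDULER_WORKER_NAME + "@" + platform.node()


def randomized_name(rng=random):
    """Name built from a random suffix, as in the testing-only patch."""
    return SCHEDULER_WORKER_NAME + "@" + str(rng.randint(1, 10000))


# Two processes on the same host compute the same hostname-based name,
# so they cannot be told apart:
# hostname_based_name() == hostname_based_name()
```

The random suffix makes a collision unlikely but not impossible (two processes can still draw the same number), which is acceptable for a manual test but is one reason the patch is for testing only.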
Testing normal concurrent operation
Start Pulp normally, but keep celerybeat stopped. Then in one terminal on the box run celerybeat A with:
sudo -u apache /bin/python /usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler --pidfile=/var/run/pulp/celerybeatA.pid
On a second terminal run celerybeat B with:
sudo -u apache /bin/python /usr/bin/celery beat --app=pulp.server.async.celery_instance.celery --scheduler=pulp.server.async.scheduler.Scheduler --pidfile=/var/run/pulp/celerybeatB.pid
Verify that pulp-admin status shows both scheduler@ entries.
Verify that the logs contain a statement like 'New lock acquired by scheduler@xxxxxxxx'
Verify that a log statement exists identifying the extra celerybeat as a hot spare.
Failover due to graceful shutdown
Start both celerybeat processes as described above.
Ctrl-C the celerybeat that has acquired the lock.
Verify that within 5 seconds the logs emit a statement like New lock acquired by 'scheduler@yyyyyyy'
Verify that you see a log statement stating that failover has occurred, and that the other celerybeat is the new primary celerybeat instance
Verify that pulp-admin status shows only one scheduler@ entry.
Verify that no errors are shown in the logs.
Failover due to killing
Start both celerybeat processes as described above.
kill -9 the pid of the celerybeat that has the lock. If you start celerybeat A first, it will get the lock.
Verify that within 30 seconds the logs emit a statement like New lock acquired by 'scheduler@yyyyyyy'
Verify that you see a log statement stating that failover has occurred, and that the other celerybeat is the new primary celerybeat instance
Verify that an error is emitted saying something like: Worker 'scheduler@xxxx' has gone missing
Verify that pulp-admin status shows only one scheduler@ entry.