
Support for draining workers #12178

Merged — 1 commit merged into apache:master on Oct 5, 2021
Conversation

@kaushik-develop (Contributor) commented Sep 24, 2021


Fixes #


Master Issue: #

Motivation

Add support for draining function workers, so that a cluster can be scaled in.


Modifications


  • Added new endpoints to drain function assignments off a worker
  • Added new endpoints to get the status of a drain operation (a hypothetical example of invoking these endpoints is sketched below)
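
A purely hypothetical sketch of how an external orchestrator might invoke such endpoints. Beyond the "admin/v2/worker/" prefix mentioned in the checklist further down, the exact paths, HTTP verbs, query parameters, and worker address are assumptions, not the REST API this PR actually adds:

```java
// Hypothetical invocation sketch; paths and parameters are assumed, not taken from the PR.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DrainWorkerSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String workerUrl = "http://localhost:6750"; // assumed function-worker service address

        // Trigger a drain of a given worker (hypothetical path and parameter name).
        HttpRequest drain = HttpRequest.newBuilder()
                .uri(URI.create(workerUrl + "/admin/v2/worker/drain?workerId=worker-1"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(client.send(drain, HttpResponse.BodyHandlers.ofString()).statusCode());

        // Poll the status of the drain operation (hypothetical path).
        HttpRequest status = HttpRequest.newBuilder()
                .uri(URI.create(workerUrl + "/admin/v2/worker/drain?workerId=worker-1"))
                .GET()
                .build();
        System.out.println(client.send(status, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```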

Verifying this change

  • Make sure that the change passes the CI checks.


This change added tests and can be verified as follows:

  • Added the following to SchedulerManagerTest: testDrain, testGetDrainStatus, testDrainExceptions, testUpdateWorkerDrainMap


Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (yes)
    [new endpoints were added under "admin/v2/worker/" for draining workers and getting the status of a drain operation]
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Need to update docs?

  • doc-required
  • no-need-doc
  • doc

@kaushik-develop (Contributor Author)
/pulsarbot run-failure-checks

@kaushik-develop (Contributor Author)
/pulsarbot run-failure-checks

@Anonymitaet added the doc label (Your PR contains doc changes, no matter whether the changes are in markdown or code files.) on Sep 26, 2021
Review thread on the following hunk:

```java
long startTime = System.nanoTime();
int numRemovedWorkerIds = 0;

if (drainOpStatusMap.size() > 0) {
```
@jerrypeng (Contributor) commented Sep 27, 2021

I know drainOpStatusMap is a concurrent map, but we are reading and writing from different threads. This makes me nervous. Is there a reason we should synchronize this with the drain operation?

@kaushik-develop (Contributor Author) commented Sep 28, 2021

I didn't quite understand the question: "Is there a reason we should synchronize this with the drain operation?"

updateWorkerDrainMap() does a periodic cleanup of stale information about drained workers. We need to do the cleanup some time after the drain has finished, when the drained worker is removed from the cluster. Since there is currently no hook (that I know of) into the SchedulerManager when a worker is added to, or removed from, the cluster, the cleanup is done through a periodic poll (updateWorkerDrainMap).

The drain operation adds a record to the concurrent map [drainOpStatusMap] when a worker is drained. An implicit assumption in the system is that the drained worker will soon be removed from the system by an external orchestrator. When the drained worker is seen to have been removed, updateWorkerDrainMap() cleans the stale entry out of drainOpStatusMap.
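
For illustration only, a minimal sketch of the pattern described above (placeholder names, not the PR's actual code): the drain operation writes entries into a concurrent map keyed by worker id, and a single-threaded scheduled task periodically prunes entries for workers that have left the cluster.

```java
// Minimal sketch of the drain-tracking pattern; names are placeholders, not the PR's code.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class DrainTrackerSketch {
    enum DrainOpStatus { IN_PROGRESS, COMPLETED }

    // Written by the drain operation, read and pruned by the periodic cleanup task.
    private final ConcurrentHashMap<String, DrainOpStatus> drainOpStatusMap = new ConcurrentHashMap<>();
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Triggered when the external orchestrator asks to drain a worker.
    void drainWorker(String workerId) {
        drainOpStatusMap.put(workerId, DrainOpStatus.IN_PROGRESS);
        // ... move the worker's function assignments to other workers, then:
        drainOpStatusMap.put(workerId, DrainOpStatus.COMPLETED);
    }

    DrainOpStatus getDrainStatus(String workerId) {
        return drainOpStatusMap.get(workerId);
    }

    // Periodic poll standing in for updateWorkerDrainMap(): once a drained worker
    // has been removed from cluster membership, its stale entry is dropped.
    void startPeriodicCleanup(Supplier<Set<String>> currentWorkerIds, long periodSeconds) {
        executor.scheduleAtFixedRate(() -> {
            Set<String> alive = currentWorkerIds.get();
            drainOpStatusMap.keySet().removeIf(workerId -> !alive.contains(workerId));
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```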

Please let me know if I misunderstood the question.

@jerrypeng (Contributor)

@kaushik-develop there are two independent threads reading and updating drainOpStatusMap. One is a scheduled periodic task:

https://github.com/apache/pulsar/pull/12178/files#diff-343b3460561c5e794ce7351d663880e37784d37e8c0a877c9e4845b5209a8c84R568

The other is when a drain operation is triggered. Two independent actors can be reading and modifying the map concurrently.

@kaushik-develop (Contributor Author) commented Sep 30, 2021

That is by design. The external orchestrator is expected to trigger the draining of a worker and check for drain status. It is also expected to remove the worker after a drain operation completes, and not to re-add a worker with the same name as the drained/removed worker for one period (configurable, nominally 60 seconds) after the worker removal. This ensures that the two actors work on different entries of the concurrent map. But please let me know if you foresee problems in any specific scenario.
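
To make that contract concrete, here is a hypothetical sketch of the sequence the external orchestrator is expected to follow. None of these types or methods come from the PR; they stand in for whatever mechanism the orchestrator actually uses to talk to the cluster.

```java
// Hypothetical orchestrator-side sequence; all types and methods are placeholders.
public class ScaleInSketch {
    enum DrainStatus { IN_PROGRESS, COMPLETED }

    interface WorkerAdmin {
        void drainWorker(String workerId);
        DrainStatus getDrainStatus(String workerId);
    }

    interface ClusterOrchestrator {
        void removeWorker(String workerId);
    }

    static void scaleInWorker(WorkerAdmin admin, ClusterOrchestrator cluster, String workerId)
            throws InterruptedException {
        admin.drainWorker(workerId);                              // 1. trigger the drain

        while (admin.getDrainStatus(workerId) != DrainStatus.COMPLETED) {
            Thread.sleep(5_000);                                  // 2. poll until the drain completes
        }

        cluster.removeWorker(workerId);                           // 3. remove the drained worker

        // 4. Do not re-add a worker with the same name for at least one cleanup
        //    period (configurable, nominally 60 seconds), so the periodic cleanup
        //    can purge the stale drainOpStatusMap entry first.
        Thread.sleep(60_000);
    }
}
```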

@kaushik-develop (Contributor Author)
/pulsarbot run-failure-checks

@kaushik-develop (Contributor Author)
/pulsarbot run-failure-checks

@jerrypeng merged commit 7d9e9f3 into apache:master on Oct 5, 2021
jerrypeng pushed a commit to jerrypeng/incubator-pulsar that referenced this pull request Nov 4, 2021
Co-authored-by: Kaushik Ghosh <kaushikg@splunk.com>
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Co-authored-by: Kaushik Ghosh <kaushikg@splunk.com>
Labels
area/function · doc (Your PR contains doc changes, no matter whether the changes are in markdown or code files.) · release/2.9.0