
[BEAM-3418] Send worker_id in all grpc channels to runner harness#4587

Merged
aaltay merged 1 commit into apache:master from angoenka:multiprocess_new
Mar 22, 2018

Conversation

@angoenka
Contributor

@angoenka angoenka commented Feb 2, 2018

DESCRIPTION HERE


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand:
    • What the pull request does
    • Why it does it
    • How it does it
    • Why this approach
  • Each commit in the pull request should have a meaningful subject line and body.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@angoenka
Contributor Author

angoenka commented Feb 2, 2018

@aaltay @lukecwik @robertwb Can you please take a look!

'avro>=1.8.1,<2.0.0',
'crcmod>=1.7,<2.0',
'dill==0.2.6',
'grpcio>=1.0,<2',
Member

What is the reason for this change? AFAIK, some runners (Dataflow) require grpcio 1.3? Also, it becomes more restrictive for users each time we reduce the list of allowed versions for a dependency.

Member

grpcio 1.8 is the first Python version that allows passing/receiving client headers.

Contributor Author

We use header to send worker_id in all channels and GRPC headers are only supported after 1.8.

Member

Sounds good. Just try running dataflow runner test before finishing this PR.

Contributor Author

Sure, I tried running apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.

Member

I meant running a job on dataflow using: https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PostCommit_Python_Verify.groovy

Commenting "Run Python PostCommit" on the PR should trigger one.

class WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):

# Unique worker Id for this worker.
_worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
Contributor

Contributor Author

@angoenka angoenka Feb 3, 2018

Yes, for testing I am modifying the internal version of boot.go. Currently it's a bit of a hack, as I artificially generate the ID in boot.go for testing. The hack will be removed once we have the multi-container setup.
I was using worker_id to be more explicit, but we can change it to just id.

Contributor

Sorry if I was unclear: this PR should also change the referenced boot.go code to set the WORKER_ID. The code here should also fail if that ID is not found -- the ID is set by the runner so that the gRPC servers can know who's who; they are not just unique IDs.

Contributor Author

Added the relevant boot.go changes

Member

Do we want to fail if WORKER_ID is not found?

Contributor Author

For backward compatibility of containers, I would like to assign a UUID if worker_id is not provided.
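That fallback might be sketched as follows, assuming the WORKER_ID environment variable from the snippet above (this is an illustrative sketch, not the actual PR code):

```python
import os
import uuid

# Sketch of the fallback described above: prefer the runner-provided
# WORKER_ID, and fall back to a generated UUID for older containers
# that do not set it. (Tracked for removal in BEAM-3904.)
worker_id = os.environ.get('WORKER_ID') or str(uuid.uuid4())
```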

Member

Do we really need to be backward compatible? This is mostly new code with no production usage. I would prefer to not have it succeed like this. But if you think this is necessary in the interim, we can add a TODO to remove the UUID generation.

Contributor Author

Created a JIRA issue, BEAM-3904, to clean it up.
I want to keep it around to decouple SDK changes from internal container changes.

@angoenka
Contributor Author

angoenka commented Feb 9, 2018

@herohde Please have a look. I have updated the PR to include the boot.go changes. However, I am not sure how the actual ID is passed to boot.go as a flag. I made a similar change in the internal version of boot.go and am using the new worker image.

@angoenka
Contributor Author

retest this please

# the flag if 'NO_MULTIPLE_SDK_CONTAINERS' is present.
# TODO: Cleanup MULTIPLE_SDK_CONTAINERS once we depricate Python SDK till
# version 2.4.
if ('MULTIPLE_SDK_CONTAINERS' not in self.proto.experiments and
Contributor

How confident are we to make this a default behavior for 2.5?

Contributor Author

I expect this CL to get into 2.5.
In a way, this flag is required to help the router distinguish between old SDKs (up to 2.4) and new SDKs (2.5 onward). Once no SDK older than 2.5 remains, we don't need to distinguish between SDKs, at least for the multi-SDK functionality, and it automatically becomes the default behavior.

Contributor Author

@robertwb @aaltay I am planning to make this feature opt-out for new SDKs. Should we keep it opt-in instead?

Contributor

@herohde herohde left a comment

LGTM for the boot code

self.proto.experiments.append(experiment)
# Add MULTIPLE_SDK_CONTAINERS flag if its not already present. Do not add
# the flag if 'NO_MULTIPLE_SDK_CONTAINERS' is present.
# TODO: Cleanup MULTIPLE_SDK_CONTAINERS once we depricate Python SDK till
Member

depricate -> deprecate

@angoenka force-pushed the multiprocess_new branch 2 times, most recently from bd777cf to 8c92cce on March 16, 2018 at 23:53
@angoenka
Contributor Author

@aaltay Can you please take a look?

#
"""Client Interceptor to inject worker_id"""
from __future__ import absolute_import
from __future__ import division
Member

Do we need print_function and division imports?

Contributor Author

No, we don't need these imports.
I added them to resolve compatibility issues between Python 2 and 3, based on https://docs.python.org/3/howto/pyporting.html

Should I remove them?

Member

Let's remove them, if they are not needed now. I do not see print() or / being used here.

Contributor Author

Sure!

metadata = []
if client_call_details.metadata is not None:
metadata = list(client_call_details.metadata)
metadata.append(('worker_id', self._worker_id))
Member

Would it be an error (or expected) for client_call_details to already have worker_id?

Contributor Author

It should be an error.
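A minimal, grpcio-independent sketch of that agreement (the helper name append_worker_id is illustrative, not from the PR):

```python
def append_worker_id(metadata, worker_id):
    # Copy the existing call metadata and append ('worker_id', ...),
    # treating a pre-existing worker_id entry as an error rather than
    # silently adding a duplicate, as discussed in this thread.
    metadata = list(metadata or [])
    if any(key == 'worker_id' for key, _ in metadata):
        raise ValueError('worker_id is already present in call metadata')
    metadata.append(('worker_id', worker_id))
    return metadata
```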

self._worker_count = worker_count
self._worker_index = 0
self._control_channel = grpc.insecure_channel(control_address)
self._control_channel = grpc.intercept_channel(self._control_channel,
Member

Should we simplify this as:

self._control_channel = grpc.intercept_channel(grpc.insecure_channel(control_address), WorkerIdInterceptor())

Contributor Author

Sure!

def __init__(self, log_service_descriptor):
super(FnApiLogRecordHandler, self).__init__()
self._log_channel = grpc.insecure_channel(log_service_descriptor.url)
self._log_channel = grpc.intercept_channel(self._log_channel,
Member

(Same simplification comment applies here.)

Contributor Author

Sure!

if (job_type.startswith('FNAPI_') and
'use_multiple_sdk_containers' not in self.proto.experiments and
'no_use_multiple_sdk_containers' not in self.proto.experiments):
self.proto.experiments.append('use_multiple_sdk_containers')
Member

It is preferable to modify debug_options.experiments (as done above for runner_harness_override). This also properly updates the user-visible pipeline options in the UI, and it will be automatically added to the proto by the loop above.

It would also help combine things related to if job_type.startswith('FNAPI_'): in a single place.
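The suggested restructuring might look roughly like this (the function and parameter names are illustrative sketches of the discussion, not the actual Beam code):

```python
def apply_fnapi_experiments(job_type, experiments):
    # Sketch: mutate the user-visible experiments list (as is done for
    # runner_harness_override) instead of the proto directly, so the flag
    # also shows up in the UI and gets copied to the proto by the
    # existing loop.
    if (job_type.startswith('FNAPI_')
            and 'use_multiple_sdk_containers' not in experiments
            and 'no_use_multiple_sdk_containers' not in experiments):
        experiments.append('use_multiple_sdk_containers')
    return experiments
```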

Contributor Author

Makes sense!

options=[("grpc.max_receive_message_length", -1),
("grpc.max_send_message_length", -1)])
# Add workerId to the grpc channel
grpc_channel = grpc.intercept_channel(grpc_channel,
Contributor Author

Not simplifying this one, to keep readability.

@aaltay
Member

aaltay commented Mar 22, 2018

Could you squash your commits?

@angoenka
Contributor Author

Sure, I will squash them.

Adding use_multiple_sdk_containers flag for FNAPI pipelines.
@aaltay aaltay merged commit 6740ead into apache:master Mar 22, 2018
