[BEAM-8626] Implement status fn api handler in python sdk #10598

y1chi · 2020-01-15T18:04:30Z

Please add a meaningful description for your change here

y1chi · 2020-01-16T17:53:06Z

R: @angoenka

angoenka

Thanks @y1chi !

sdks/python/apache_beam/runners/test/utils.py

angoenka · 2020-01-17T02:44:00Z

sdks/python/apache_beam/runners/worker/sdk_worker.py

@@ -110,6 +112,15 @@ def __init__(self,
        data_channel_factory=self._data_channel_factory,
        fns=self._fns)

+    if status_address:


Let's move this into sdk_worker_main, where we keep the other reporting-related code.

I need to do the actual initialization inside sdk_worker, because I want to pass in the SDK worker's active bundle processor cache so that the handler can report dangling operations.

Sounds good

angoenka · 2020-01-17T02:55:40Z

sdks/python/apache_beam/runners/worker/worker_status.py

+from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor
+
+
+def thread_dump():


We already have thread dump code in sdk_worker_main.py, under get_thread_dump.
Shall we reuse it, or move it here?

I made a few changes to the thread dump format. I'll reuse the function; I think we eventually won't need the status HTTP server.

I agree, we can get rid of the status HTTP server.

As mentioned above, let's add a JIRA to clean up StatusServer in sdk_worker_main.

angoenka · 2020-01-17T03:13:40Z

sdks/python/apache_beam/runners/worker/worker_status.py

+            beam_fn_api_pb2.WorkerStatusResponse(id=request.id,
+                                                 status_info=response))
+
+  def generate_status_response(self):


We could also expose it over an HTTP server on a dynamic port.

It's possible, but since we'll eventually be able to query the runner with something like localhost:port/sdk_status?id=<sdk_id>, that has the same effect as exposing it on each worker individually.

Sounds reasonable.
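
For reference, a minimal sketch of the dynamic-port alternative discussed above, using only the standard library. `start_status_server` and `get_status` are hypothetical names, not part of the Beam SDK. Binding to port 0 lets the OS pick a free port, which the caller then reads back from the server.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_status_server(get_status):
  """Serves get_status() as plain text on an OS-assigned port.

  Hypothetical helper for illustration; returns (server, port).
  """

  class Handler(BaseHTTPRequestHandler):
    def do_GET(self):  # pylint: disable=invalid-name
      body = get_status().encode('utf-8')
      self.send_response(200)
      self.send_header('Content-Type', 'text/plain')
      self.send_header('Content-Length', str(len(body)))
      self.end_headers()
      self.wfile.write(body)

    def log_message(self, format, *args):
      pass  # Keep the sketch quiet; a real server would log requests.

  # Port 0 asks the OS for any free port.
  server = HTTPServer(('127.0.0.1', 0), Handler)
  thread = threading.Thread(target=server.serve_forever)
  thread.daemon = True
  thread.start()
  # The port actually assigned is available once the socket is bound.
  return server, server.server_address[1]
```

The trade-off raised in the thread still applies: each worker would need its port discovered separately, whereas querying through the runner needs only the SDK id.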

y1chi · 2020-01-21T19:02:37Z

retest this please

angoenka

LGTM.
A few minor comments.

angoenka · 2020-01-21T19:19:17Z

sdks/python/apache_beam/runners/worker/sdk_worker.py

+            status_address, self._bundle_processor_cache)
+      except Exception:
+        traceback_string = traceback.format_exc()
+        _LOGGER.info('Error creating worker status request handler, skipping '


Suggested change

-        _LOGGER.info('Error creating worker status request handler, skipping '
+        _LOGGER.warn('Error creating worker status request handler, skipping '
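
A minimal sketch of the pattern in this hunk: creating the status handler is best-effort, so a failure is logged at warning level and the worker carries on without it. `create_status_handler` and `factory` are hypothetical stand-ins, not the PR's actual code, and the log message is illustrative. The sketch calls `_LOGGER.warning`, since `warn` is a deprecated alias in the standard `logging` module.

```python
import logging
import traceback

_LOGGER = logging.getLogger(__name__)


def create_status_handler(status_address, bundle_processor_cache, factory):
  """Returns a status handler, or None if it is disabled or cannot be built.

  `factory` is a hypothetical callable standing in for whatever constructs
  the real handler from an address and a bundle processor cache.
  """
  if not status_address:
    return None  # Status reporting was not requested.
  try:
    return factory(status_address, bundle_processor_cache)
  except Exception:  # pylint: disable=broad-except
    # Status reporting is optional, so log the failure and keep going.
    _LOGGER.warning(
        'Could not create the status handler; continuing without it: %s',
        traceback.format_exc())
    return None
```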

angoenka · 2020-01-21T19:20:47Z

sdks/python/apache_beam/runners/worker/sdk_worker_main.py

@@ -78,8 +67,7 @@ def do_GET(self):  # pylint: disable=invalid-name
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()

-        for line in StatusServer.get_thread_dump():
-          self.wfile.write(line.encode('utf-8'))
+        self.wfile.write(thread_dump())


Let's add a JIRA to remove StatusServer from here completely once Debug capture has rolled out.

Created https://issues.apache.org/jira/browse/BEAM-9165

angoenka · 2020-01-21T19:24:47Z

sdks/python/apache_beam/runners/worker/worker_status.py

+    stack_traces[stack_trace].append(thread_ident_name)
+
+  all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10]
+  for stack, identity in stack_traces.items():


Let's add a JIRA to group threads that share the same stack trace, for easier analysis and a smaller dump.
This could be a starter task for new Beam contributors.

Isn't that already included in this PR?

You are right, it's already in this PR.
We can print the names of all the threads along with the count, so that we don't miss any information.
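
A minimal sketch of grouping threads by identical stack trace and printing the count plus every thread's ident and name, as discussed above. This is not the PR's implementation; `group_threads_by_stack` and `thread_dump_text` are hypothetical names, and `sys._current_frames()` is a CPython-specific API. Idents are formatted with `%s` because `threading` idents are integers.

```python
import collections
import sys
import threading
import traceback


def group_threads_by_stack():
  """Maps each distinct stack trace to the (ident, name) pairs sharing it."""
  names = {t.ident: t.name for t in threading.enumerate()}
  groups = collections.defaultdict(list)
  # sys._current_frames() returns {thread ident: topmost frame}.
  for ident, frame in sys._current_frames().items():
    stack = ''.join(traceback.format_stack(frame))
    groups[stack].append((ident, names.get(ident, 'unknown')))
  return groups


def thread_dump_text():
  """Renders one section per distinct stack, listing all threads in it."""
  lines = ['=' * 10 + ' THREAD DUMP ' + '=' * 10]
  for stack, threads in group_threads_by_stack().items():
    labels = ', '.join('%s:%s' % (ident, name) for ident, name in threads)
    lines.append('--- Threads (%d): %s ---' % (len(threads), labels))
    lines.append(stack)
  return '\n'.join(lines)
```

The result is a `str`; a caller writing to a socket or `wfile` would encode it once at that boundary.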

angoenka · 2020-01-21T19:27:07Z

sdks/python/apache_beam/runners/worker/worker_status.py

+  return '\n'.join(x.encode('utf-8') for x in all_traces)
+
+
+def active_processing_bundles_state(bundle_process_cache):


This can be a private method.

angoenka · 2020-01-21T21:41:05Z

sdks/python/apache_beam/runners/worker/worker_status.py

+  all_traces = ['=' * 10 + 'THREAD DUMP' + '=' * 10]
+  for stack, identity in stack_traces.items():
+    ident, name = identity[0]
+    trace = '--- Thread #%s name: %s %s---\n' % (


Suggested change

-    trace = '--- Thread #%s name: %s %s---\n' % (
+    trace = '--- Threads (%d) %s --- \n' % (len(identity), [ident+':'+name for (ident, name) in identity])

This is already printed on a separate line below.

angoenka

Thanks!
LGTM

angoenka · 2020-01-22T00:31:03Z

Retest this please

angoenka · 2020-01-22T00:31:48Z

Retest this please

angoenka · 2020-01-22T00:32:15Z

Run Python PreCommit

y1chi · 2020-01-23T18:52:27Z

Retest this please

angoenka · 2020-01-23T19:24:59Z

Retest this please

y1chi · 2020-01-23T21:57:39Z

retest this please

angoenka · 2020-01-23T22:25:44Z

Retest this please

angoenka · 2020-01-23T22:29:23Z

Run Dataflow ValidatesRunner

angoenka · 2020-01-23T22:30:16Z

I am not sure why the tests are not running on this PR.

y1chi · 2020-01-23T23:38:41Z

retest this please

angoenka · 2020-01-24T18:54:11Z

retest this please

angoenka · 2020-01-24T19:18:57Z

Test link: https://builds.apache.org/job/beam_PreCommit_Python_Commit/10886/

y1chi · 2020-01-24T21:34:51Z

retest this please

angoenka · 2020-01-24T21:52:34Z

retest this please

angoenka · 2020-01-27T22:27:19Z

retest this please

y1chi · 2020-01-28T01:27:32Z

Run PythonLint PreCommit

angoenka · 2020-01-28T01:29:35Z

Run PythonLint PreCommit

angoenka · 2020-01-28T01:42:02Z

Run PythonLint PreCommit

y1chi requested a review from angoenka January 15, 2020 18:04

y1chi force-pushed the BEAM-8626 branch 2 times, most recently from 93665a9 to 7cc2797 on January 15, 2020 21:11

angoenka reviewed Jan 17, 2020

angoenka reviewed Jan 21, 2020

angoenka approved these changes Jan 22, 2020

y1chi force-pushed the BEAM-8626 branch from af30777 to 04c14a1 on January 23, 2020 21:51

y1chi added 3 commits January 23, 2020 13:52

[BEAM-8626] Implement status fn api handler in python sdk

747cd9c

Address comments

549d7c1

fixup

d753736

y1chi force-pushed the BEAM-8626 branch from 04c14a1 to d753736 on January 23, 2020 21:55

y1chi force-pushed the BEAM-8626 branch from 9116e0a to 2e02479 on January 24, 2020 21:21

fix test

b1bd466

y1chi force-pushed the BEAM-8626 branch from 2e02479 to b1bd466 on January 24, 2020 21:32

angoenka merged commit d1b70d6 into apache:master Jan 28, 2020
