BEAM-7141: add key value timer callback #8739

rakeshcusat · 2019-06-01T03:20:47Z

The key parameter was missing from the timer callback method. This makes debugging harder because the developer does not have enough context about the callback and its associated data. With the key and window parameters available in the callback, the developer has enough information to debug issues.

merging from upstream

rakeshcusat · 2019-06-01T03:24:42Z

@pabloem @robertwb Can you review this?

rakeshcusat · 2019-06-01T06:28:06Z

@tweise

aaltay · 2019-06-03T20:49:49Z

sdks/python/apache_beam/transforms/core.py

  StateParam = _StateDoFnParam
  TimerParam = _TimerDoFnParam
+  KeyParam = _DoFnParam('KeyParam')


Do we need to add all these args to the DoFnProcessParams list? Its only use so far is to check that these args are not used for bundle methods.

sounds good.
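For readers unfamiliar with the check mentioned above, here is a minimal standalone sketch of the idea: parameter markers are sentinel default values, and a list of process-only markers is used to reject them on bundle methods. The names `_Marker`, `PROCESS_ONLY_PARAMS`, and `validate_bundle_method` are hypothetical and chosen for illustration; this does not reproduce Beam's actual code.

```python
import inspect


class _Marker(object):
  """Sentinel used as a default value to request a runtime-provided argument."""

  def __init__(self, name):
    self.name = name

  def __repr__(self):
    return self.name


# Hypothetical stand-ins for markers such as DoFn.KeyParam and DoFn.WindowParam.
KEY_PARAM = _Marker('KeyParam')
WINDOW_PARAM = _Marker('WindowParam')

# Markers that only make sense where an element or a timer is in scope.
PROCESS_ONLY_PARAMS = [KEY_PARAM, WINDOW_PARAM]


def validate_bundle_method(method):
  """Raises ValueError if a bundle method asks for a process-only marker."""
  for name, param in inspect.signature(method).parameters.items():
    if any(param.default is marker for marker in PROCESS_ONLY_PARAMS):
      raise ValueError(
          '%s cannot use %r (argument %r)'
          % (method.__name__, param.default, name))


def start_bundle_ok():
  pass


def start_bundle_bad(key=KEY_PARAM):
  pass


validate_bundle_method(start_bundle_ok)  # no marker requested, passes silently
try:
  validate_bundle_method(start_bundle_bad)
except ValueError as e:
  print(e)
```

Under this assumption, any new marker that should be rejected on bundle methods has to be added to the list, which is why the question about list membership matters.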

aaltay · 2019-06-03T20:50:27Z

sdks/python/apache_beam/runners/common.py

@@ -183,6 +184,8 @@ def __init__(self, obj_to_invoke, method_name):
        self.timestamp_arg_name = kw
      elif v == core.DoFn.WindowParam:
        self.window_arg_name = kw
+      elif v == core.DoFn.KeyParam:


Could you also update the PerWindowInvoker (for process methods) (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L597)? (If possible please also update for window param that was added in the previous PR.)

@aaltay
I may not have the complete background, but my understanding is that these parameters are mainly for the timer callback method and may not be exactly the same as the process parameters. Do we still need to update _invoke_process_per_window? Am I missing something?

You have a valid point. There may not always be a key that can be passed to the process() method, although sometimes there might be.

Now that we have a KeyParam, a user might write a process method that uses it, e.g. process(mykey=KeyParam). What would be the reasonable thing to do in this case?

1. We can do nothing, and mykey will literally have the value KeyParam.

2. We can fail with an error saying that KeyParam is not a valid parameter for the process method.

3. We can try to pass the key (e.g. k, v = element) and set mykey = k. If that fails (i.e. the element is not a K, V pair), we can fail.

I was leaning towards the third option. I think the second option would also be fine, and the first option would be confusing.

What do you think?
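To make the third option concrete, here is a rough standalone sketch, assuming the marker is detected by inspecting default values. `KEY_PARAM`, `find_key_arg_name`, and `invoke_process` are hypothetical names for illustration and are not the invoker code in runners/common.py.

```python
import inspect


class _Marker(object):
  def __init__(self, name):
    self.name = name


KEY_PARAM = _Marker('KeyParam')  # hypothetical stand-in for DoFn.KeyParam


def find_key_arg_name(fn):
  """Returns the name of the argument whose default is KEY_PARAM, or None."""
  for name, param in inspect.signature(fn).parameters.items():
    if param.default is KEY_PARAM:
      return name
  return None


def invoke_process(fn, element):
  """Calls fn(element), filling in the key argument if fn requested one."""
  key_arg_name = find_key_arg_name(fn)
  if key_arg_name is None:
    return fn(element)
  try:
    key, _ = element  # option 3: the element must unpack as (key, value)
  except (TypeError, ValueError):
    raise ValueError(
        'KeyParam requested but element %r is not a (key, value) pair'
        % (element,))
  return fn(element, **{key_arg_name: key})


def process(element, mykey=KEY_PARAM):
  return 'key=%s value=%s' % (mykey, element[1])


print(invoke_process(process, ('a', 1)))
```

Note that plain unpacking is a loose test: any two-item iterable, including a two-character string, would pass, so a real implementation would probably want a stricter check.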

Is there any way to know at pipeline construction time that the user is using a KV as the input type, so that we don't have to defer to pipeline execution time to get an error? If not, I like Ahmet's 3rd suggestion as well.

It is possible to do this at pipeline construction time. This will be similar to how type hints detection happens. (It will be inspection at execution time vs inspection at construction time.) However, we do not have the machinery in place, and I suspect it would make this PR more complicated than intended.

We can start with an execution-time error and later improve it to be a construction-time error. We can do this backward compatibly by failing at construction time only when input type hints are not in the [K, V] form, preserving the same functionality.

I am also leaning towards option 3. If the implementation scope is bigger, then I would fall back to option 2 and complete the 3rd option in a separate PR.

The type inference machinery can tell us whether we certainly have a KV type, whether we certainly do not have a KV type, or that it doesn't know whether we have a KV type (e.g. all it knows is that it's a Python object).

I also like option 3, and don't think it would be that hard.
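The three-way answer described above, combined with the earlier suggestion to fail at construction time only when the input is certainly not a KV, could look roughly like the sketch below. It uses Python's standard `typing` module rather than Beam's own type hint machinery, and `IsKV`, `classify_input_type`, and `check_key_param_usage` are hypothetical names, so treat it as an illustration of the logic only.

```python
import enum
import typing


class IsKV(enum.Enum):
  YES = 'certainly a KV type'
  NO = 'certainly not a KV type'
  UNKNOWN = 'cannot tell'


def classify_input_type(hint):
  """Classifies a type hint as KV, not KV, or unknown."""
  if hint is None or hint is typing.Any or hint is object or hint is tuple:
    return IsKV.UNKNOWN
  if typing.get_origin(hint) is tuple:
    args = typing.get_args(hint)
    if not args or Ellipsis in args:
      # Bare or variable-length tuple: it may or may not have two items.
      return IsKV.UNKNOWN
    return IsKV.YES if len(args) == 2 else IsKV.NO
  return IsKV.NO


def check_key_param_usage(hint):
  """Construction-time check: fail only when the input is certainly not a KV."""
  if classify_input_type(hint) is IsKV.NO:
    raise TypeError(
        'KeyParam requires a (key, value) input, got %r' % (hint,))


check_key_param_usage(typing.Tuple[str, int])  # certainly KV: accepted
check_key_param_usage(typing.Any)  # unknown: deferred to execution time
```

In this sketch the unknown case is deliberately accepted, so pipelines without input type hints keep working and would only fail at execution time if an element does not unpack as a pair.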

Why? Key parameter was missing in the timer callback so it makes the debugging harder.

rakeshcusat · 2019-06-23T00:16:23Z

R: @aaltay
I think I have taken care of all required comments. Let me know if I have missed anything.

aaltay

Thank you!

rakeshcusat and others added 2 commits April 9, 2019 15:55

Merge pull request #1 from apache/master

665570d

merging from upstream

Merge remote-tracking branch 'upstream/master'

f70a239

aaltay requested changes Jun 3, 2019

rakeshcusat added 2 commits June 8, 2019 16:34

BEAM-7141: Add key parameter in timer callback

2fe8a68

Why? Key parameter was missing in the timer callback so it makes the debugging harder.

fix lint warnings

c7ebfa6

rakeshcusat force-pushed the BEAM-7141-add-key-value-timer-callback branch from a3f4406 to c7ebfa6 Compare June 8, 2019 23:50

rakeshcusat added 7 commits June 8, 2019 16:54

minor change

5993c1e

minor fix

c3fad39

add one more line for consistent code format

2778175

Merge branch 'master' into BEAM-7141-add-key-value-timer-callback

3d75359

fix lint

d535f93

minor changes

9e18b97

Include StateParam and TimerParam in DoFnProcessParams

b05fb7a

aaltay approved these changes Jun 25, 2019

aaltay merged commit fb8efe3 into apache:master Jun 25, 2019

davidcavazos pushed a commit to davidcavazos/beam that referenced this pull request Jul 1, 2019

BEAM-7141: add key value timer callback (apache#8739)

8472374


Review comments, as listed on the page: aaltay (Jun 3, 2019); rakeshcusat (Jun 8, 2019); aaltay (Jun 3, 2019); rakeshcusat (Jun 4, 2019); aaltay (Jun 4, 2019, edited); lukecwik (Jun 5, 2019); aaltay (Jun 6, 2019); rakeshcusat (Jun 8, 2019); robertwb (Jun 10, 2019)
