Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task][Go SDK] Disallow iterators with initial timestamp values (ET,V) and (ET, K,V) -> not in the beam model. #22404

Closed
lostluck opened this issue Jul 21, 2022 · 1 comment · Fixed by #22435
Assignees
Labels
done & done Issue has been reviewed after it was closed for verification, followups, etc. go P1 task

Comments

@lostluck
Copy link
Contributor

lostluck commented Jul 21, 2022

What needs to happen?

Timestamp metadata for individual elements is lost after sending values through a GroupByKey. This is not an issue, and is WAI as a consequence of the post GBK coder (WindowedValueHeader+KV<K, Iter>), as the runner has combined timestamps.

We need to disallow usage of those iterators: They don't work because they're not in the beam model.

After a GBK, the whole <K, V*> element has a single timestamp, and while side inputs are placed into the corresponding windows, they do not preserve their timestamps.

As such, it's just a painpoint waiting to happen.

The best path forward is to have a clear error when we detect these forms instructing users that this is decomissioned and not part of the model and instruct them on the model instead. The existing code that allows them here can return this error instead.

See next comment for other places that need updating as a result.

This is separate from whether the SDK supports specifying timestamp combine strategies for a window. Presently it does not, and should default to "End Of Window" for a given Post GBK element.

Issue Priority

Priority: 2

Issue Component

Component: sdk-go

@lostluck
Copy link
Contributor Author

lostluck commented Jul 22, 2022

Discussing this with @robertwb and @reuvenlax indicates that this is something incorrect in the Go SDK, and shouldn't be included.

By the Model and the Coders, the timestamps for individual values are either combined (in the GBK case) or dropped after being used for window assignment (for Side inputs).

So ultimately, instead we need to simply fail these pipelines at construction time (with an error referring to this issue and explain this part of the model).

There are also some bad tests & documentation that need correction.

// func (*typex.EventTime, *int) bool returns {typex.EventTime, int}

{func(*typex.EventTime) bool { return false }, false}, // no values

Reflective Iterators should have the handling code removed:

case t == typex.EventTimeType:

legacy template:

exec.RegisterInput(reflect.TypeOf((*func (*typex.EventTime, *{{$x.Type}})bool)(nil)).Elem(), iterMakerET{{$x.Name}})

@lostluck lostluck changed the title [Task][Go SDK] Define behavior of timestamp reading iterables for GBK value streams [Task][Go SDK] Disallow iterators with initial timestamp values (ET,V) and (ET, K,V) -> not in the beam model. Jul 22, 2022
@lostluck lostluck added P1 and removed P2 labels Jul 22, 2022
@jrmccluskey jrmccluskey self-assigned this Jul 25, 2022
@chamikaramj chamikaramj added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
done & done Issue has been reviewed after it was closed for verification, followups, etc. go P1 task
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants