python CombineGlobally().with_fanout() cause duplicate combine results for sliding windows #20528
Comments
If this is still true, then it should be very easy to reproduce. @robertwb or @tvalentyn or @pabloem do you know? |
@BjornPrime can you try to reproduce this? |
I've successfully reproduced the issue. Compare with fanout:
and without:
|
Thanks, I'll add it to our interrupts tracker. |
I feel like the labels |
it may be neither actually. |
It sounds like data loss, which should certainly be P1 |
Sure, we can investigate as P1. P2 was based on the suspicion that this behavior has been like this from initial implementation. |
If we cannot fix this quickly, we should remove with_fanout until we can, or at least warn somehow, and add a note to the release notes for released SDKs. |
started working on this today, no PR yet. |
Does not seem to repro with sessions. Passing test:
|
Yea, it probably has to do with any windowfn that duplicates an element into multiple windows |
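To make the "duplicates an element into multiple windows" point concrete, here is a minimal pure-Python sketch of window assignment. It is a toy model, not Beam's implementation: the helper names `sliding_windows` and `fixed_window` are made up for illustration, and timestamps are assumed to be plain integers.

```python
def sliding_windows(ts, size, period):
    """Return every [start, end) window of length `size`, starting on a
    multiple of `period`, that contains the integer timestamp `ts`.
    Hypothetical helper for illustration only."""
    last_start = ts - ts % period
    # Walk backwards one period at a time while the window still covers ts.
    return [(s, s + size) for s in range(last_start, ts - size, -period)]


def fixed_window(ts, size):
    """Return the single [start, end) fixed window containing `ts`."""
    start = ts - ts % size
    return [(start, start + size)]


# One element lands in size / period sliding windows...
print(sliding_windows(7, size=10, period=5))  # [(5, 15), (0, 10)]
# ...but in exactly one fixed window.
print(fixed_window(7, size=10))               # [(0, 10)]
```

Any step that assigns windows a second time would repeat this duplication for sliding windows, while fixed windows would be unaffected.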
Can we re-window only if we detect accumulating mode? See: #23828. Alternatively, we could skip re-windowing if we detect sliding windows. Disabling sliding windows is straightforward too if we want to go that route for now. |
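The re-windowing hypothesis can be illustrated with a toy count, again in plain Python rather than Beam. Everything here is an assumption made for the sketch: the function names are hypothetical, timestamps are integers, and each pre-combined partial is assumed to carry the last instant of its window (`end - 1`) when it is assigned to windows a second time. It shows one way a second window assignment could inflate results, not a confirmed root cause.

```python
from collections import Counter


def sliding_windows(ts, size, period):
    """All [start, end) sliding windows containing integer timestamp `ts`."""
    last_start = ts - ts % period
    return [(s, s + size) for s in range(last_start, ts - size, -period)]


def count_direct(timestamps, size, period):
    """Count elements per window with a single window assignment."""
    counts = Counter()
    for ts in timestamps:
        for window in sliding_windows(ts, size, period):
            counts[window] += 1
    return counts


def count_with_rewindow(timestamps, size, period):
    """Pre-combine per window, then assign each partial to windows again.

    ASSUMPTION: a partial is stamped with its window's last instant.
    """
    partials = count_direct(timestamps, size, period)
    final = Counter()
    for (start, end), n in partials.items():
        for window in sliding_windows(end - 1, size, period):
            final[window] += n
    return final


events = [1, 2, 3]
print(count_direct(events, size=10, period=5))
print(count_with_rewindow(events, size=10, period=5))
```

In this model the window `(0, 10)` reports 6 instead of 3 after the second assignment, and a window `(5, 15)` that never received an element reports 3. With `period == size` (effectively fixed windows) the two functions agree.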
This is the only issue currently blocking the 2.43.0 release without an existing cherry-pick request. Can we push this to the next release? |
I discussed the fix yesterday w/ Robert and we will pursue a slightly different fix, which should be ready shortly. however, this bug has been there from the very first commit implementing |
Ack. Removing this from the milestone for now. |
The current fix caused an OOM in a customer pipeline; planning to revert and investigate further. |
Is it possible that the OOM is unavoidable because of the fix? This is a pretty serious data corruption problem, no? I suppose it is not a regression but I would very much like 2.44.0 to have correct results in this situation. |
it's unlikely
That's correct. It has been there since with_fanout was added.
I am planning to come back to it next week when I am on a rotational duty again. |
OK given this context I am going to remove the release milestone. We can document the range of releases for which it does not work. |
Actually simply disabling it would be a reasonable way to protect user data and we should do that in the immediate term. |
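A "disable it" mitigation amounts to failing fast when the pipeline is built instead of emitting wrong results at run time. The sketch below shows the shape of such a guard in plain Python; `WindowSpec` and `check_fanout_supported` are hypothetical names invented for this example and are not Beam APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WindowSpec:
    """Hypothetical stand-in for a pipeline's windowing strategy."""
    kind: str                     # e.g. "fixed", "sliding", "sessions"
    size: int
    period: Optional[int] = None  # only meaningful for sliding windows


def check_fanout_supported(spec: WindowSpec, fanout: Optional[int]) -> None:
    """Raise at construction time if fanout is combined with sliding windows."""
    if fanout and spec.kind == "sliding":
        raise ValueError(
            "fanout is disabled for sliding windows; see issue #20528")


# Fixed windows with fanout pass the check.
check_fanout_supported(WindowSpec("fixed", size=60), fanout=16)

# Sliding windows with fanout are rejected before any data is processed.
try:
    check_fanout_supported(WindowSpec("sliding", size=60, period=10), fanout=16)
except ValueError as err:
    print(f"rejected: {err}")
```

The trade-off is that affected users lose the fanout optimization entirely until a proper fix lands, but they cannot silently get duplicated results.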
The mitigation has been cherrypicked, so I am removing it from the 2.44.0 milestone. The bug remains open. |
If this is now disabled, is it perhaps P2? |
If it isn't actively being worked on, I suggest downgrading and unassigning. |
Any progress on this? |
we had a tentative fix but it has caused a performance regression and was rolled back; since then it remains in the backlog. |
I need the combiner because group-by is ineffective, and I need fanout on top of it because the input is a Kafka stream arriving at quite a high rate. Are there any possible workarounds? |
Ack. This feedback would help w/ prioritization. I'm not sure I can answer based on limited information about the use case, but for experimentation purposes you could apply #23828 locally. These changes only matter at job submission. |
Not only is there more than one result per window, the results for each window are duplicated as well.
Here is some code I made to reproduce the issue; just run it with and without `.with_fanout`. If running with the Dataflow runner, add an appropriate `gs://path/` in `WriteToText`.
Imported from Jira BEAM-10617. Original Jira may contain additional context.
Reported by: leiyiz.