[SPARK-20441][SPARK-20432][SS] Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation #17735
Test build #76080 has finished for PR 17735 at commit
      .collect { case ser: StreamingExecutionRelation => ser }
    assert(executionRelations.size == 2)
    assert(executionRelations.distinct.size == 1)
    query.stop()
Can you please wrap this in a `finally`?
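A minimal sketch of what this request amounts to, written in plain Scala so it runs without Spark. `FakeQuery` and `checkThenStop` are hypothetical names introduced only for illustration; they stand in for the real streaming query and test body, which are not reproduced here.

```scala
object FinallySketch {
  // Hypothetical stand-in for a streaming query handle; this is not a Spark class.
  final class FakeQuery {
    @volatile var stopped: Boolean = false
    def stop(): Unit = { stopped = true }
  }

  // Runs the checks and stops the query afterwards, even if a check throws.
  def checkThenStop(query: FakeQuery)(checks: => Unit): Unit = {
    try {
      checks
    } finally {
      query.stop()
    }
  }
}
```

With this shape, a failing assertion still propagates to the test framework, but the query no longer keeps running in the background after the test has failed.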
Test build #76208 has finished for PR 17735 at commit
Test build #76211 has finished for PR 17735 at commit
@brkyvz please take another look
Jenkins retest this please
Test build #76347 has finished for PR 17735 at commit
Jenkins retest this please
Test build #76363 has finished for PR 17735 at commit
      .streamingQuery
    val executionRelations =
      query
        .logicalPlan
You need to call `query.awaitInitialization` before accessing `logicalPlan`. Otherwise this test will be flaky.
ah, i see.
fixed. thanks!
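The ordering issue raised in this thread can be illustrated with a small, Spark-free sketch. Everything here is an assumption made for illustration: `FakeQuery` is a hypothetical class whose plan is set by a background thread, and its `awaitInitialization` returns a `Boolean`, which need not match the signature of the real method.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

object AwaitInitSketch {
  // Hypothetical stand-in: the plan becomes available only after a background
  // thread has finished its start-up work.
  final class FakeQuery(startupDelayMs: Long) {
    private val initialized = new CountDownLatch(1)
    @volatile private var plan: Option[String] = None

    private val worker = new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(startupDelayMs) // simulate slow start-up
        plan = Some("resolved-plan")
        initialized.countDown()
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Blocks until start-up has finished or the timeout expires.
    def awaitInitialization(timeoutMs: Long): Boolean =
      initialized.await(timeoutMs, TimeUnit.MILLISECONDS)

    // Reading this before start-up has finished may observe None.
    def logicalPlan: Option[String] = plan
  }
}
```

A test that reads `logicalPlan` right after construction races with the worker thread, so it may pass or fail from run to run. Waiting on `awaitInitialization` first removes that race.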
Looks pretty good except one minor issue in tests.
Test build #76398 has finished for PR 17735 at commit
LGTM. Merging to master/2.2. Thanks for the PR!
…treamingRelation should only be transformed to one StreamingExecutionRelation

## What changes were proposed in this pull request?

Within the same streaming query, when one `StreamingRelation` is referred multiple times – e.g. `df.union(df)` – we should transform it only to one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of source, source logs, ...).

## How was this patch tested?

Added two test cases, each of which would fail without this patch.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #17735 from lw-lin/SPARK-20441.

(cherry picked from commit 27f543b)
Signed-off-by: Burak Yavuz <brkyvz@gmail.com>
        // "df.logicalPlan" has already used attributes of the previous `output`.
        StreamingExecutionRelation(source, output)
      case streamingRelation@StreamingRelation(dataSource, _, output) =>
        toExecutionRelationMap.getOrElseUpdate(streamingRelation, {
## What changes were proposed in this pull request?

Within the same streaming query, when one `StreamingRelation` is referred multiple times – e.g. `df.union(df)` – we should transform it only to one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of source, source logs, ...).

## How was this patch tested?

Added two test cases, each of which would fail without this patch.
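The deduplication described above can be sketched in plain Scala, using the same `getOrElseUpdate` idiom that appears in the diff. The types `Relation` and `ExecRelation` and both transform functions are hypothetical stand-ins introduced for illustration; they are not the Spark classes, and a flat `Seq` stands in for a real plan tree.

```scala
import scala.collection.mutable

object DedupSketch {
  // Hypothetical stand-ins for the logical and execution relations.
  final case class Relation(name: String)
  // Plain class on purpose: two instances are equal only if they are the same object.
  final class ExecRelation(val relation: Relation)

  // Without memoization, every occurrence gets its own execution relation.
  def transformNaive(plan: Seq[Relation]): Seq[ExecRelation] =
    plan.map(r => new ExecRelation(r))

  // With memoization, repeated occurrences of one relation share one execution relation.
  def transformMemoized(plan: Seq[Relation]): Seq[ExecRelation] = {
    val cache = mutable.HashMap.empty[Relation, ExecRelation]
    plan.map(r => cache.getOrElseUpdate(r, new ExecRelation(r)))
  }
}
```

For a plan that mentions the same relation twice, as `df.union(df)` would, the memoized version returns two references to a single `ExecRelation`, which mirrors the `size == 2` and `distinct.size == 1` assertions in the test discussed in this PR.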