Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23788][SS] Fix race in StreamingQuerySuite #20896

Closed
wants to merge 1 commit into from

Conversation

jose-torres
Copy link
Contributor

What changes were proposed in this pull request?

The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries ask it to commit before the others have run, the rest will see empty dataframes. This can fail the test if q3 is affected.

We should use one instance per query instead.

How was this patch tested?

Existing unit test. If I move q2.processAllAvailable() before starting q3, the test always fails without the fix.

@jose-torres
Copy link
Contributor Author

@zsxwing

@SparkQA
Copy link

SparkQA commented Mar 24, 2018

Test build #88555 has finished for PR 20896 at commit e257b69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val input = MemoryStream[Int]
val q1 = startQuery(input.toDS, "stream_serializable_test_1")
val q2 = startQuery(input.toDS.map { i =>
val input = MemoryStream[Int] :: MemoryStream[Int] :: MemoryStream[Int] :: Nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why build a list and not use 3 separate variables?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just to save several lines.

@zsxwing
Copy link
Member

zsxwing commented Mar 25, 2018

LGTM. Merging to master and 2.3. Thanks!

asfgit pushed a commit that referenced this pull request Mar 25, 2018
## What changes were proposed in this pull request?

The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries ask it to commit before the others have run, the rest will see empty dataframes. This can fail the test if q3 is affected.

We should use one instance per query instead.

## How was this patch tested?

Existing unit test. If I move q2.processAllAvailable() before starting q3, the test always fails without the fix.

Author: Jose Torres <torres.joseph.f+github@gmail.com>

Closes #20896 from jose-torres/fixrace.

(cherry picked from commit 816a549)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
@asfgit asfgit closed this in 816a549 Mar 25, 2018
asfgit pushed a commit that referenced this pull request Mar 25, 2018
## What changes were proposed in this pull request?

The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries ask it to commit before the others have run, the rest will see empty dataframes. This can fail the test if q3 is affected.

We should use one instance per query instead.

## How was this patch tested?

Existing unit test. If I move q2.processAllAvailable() before starting q3, the test always fails without the fix.

Author: Jose Torres <torres.joseph.f+github@gmail.com>

Closes #20896 from jose-torres/fixrace.
@zsxwing
Copy link
Member

zsxwing commented Mar 25, 2018

Also merged to 2.2.

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
## What changes were proposed in this pull request?

The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries ask it to commit before the others have run, the rest will see empty dataframes. This can fail the test if q3 is affected.

We should use one instance per query instead.

## How was this patch tested?

Existing unit test. If I move q2.processAllAvailable() before starting q3, the test always fails without the fix.

Author: Jose Torres <torres.joseph.f+github@gmail.com>

Closes apache#20896 from jose-torres/fixrace.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries ask it to commit before the others have run, the rest will see empty dataframes. This can fail the test if q3 is affected.

We should use one instance per query instead.

Existing unit test. If I move q2.processAllAvailable() before starting q3, the test always fails without the fix.

Author: Jose Torres <torres.joseph.f+github@gmail.com>

Closes apache#20896 from jose-torres/fixrace.

(cherry picked from commit 816a549)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>

Change-Id: I6e757c20fd3e214dfeba6ac21628fe772afc4926
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants