[SPARK-15112][SQL] Disables EmbedSerializerInFilter for plan fragments that change schema #13362

Closed
liancheng wants to merge 2 commits into apache:master from liancheng:spark-15112-corrupted-filter

liancheng commented May 27, 2016

What changes were proposed in this pull request?

EmbedSerializerInFilter implicitly assumes that the plan fragment being optimized does not change the plan schema. That assumption is reasonable, because Dataset.filter should never change the schema.

However, due to another issue involving DeserializeToObject and SerializeFromObject, a typed filter does change the plan schema (see SPARK-15632). This breaks EmbedSerializerInFilter and corrupts data.

This PR disables EmbedSerializerInFilter whenever the fragment changes the schema, which avoids the data corruption. The underlying schema change issue should be addressed in follow-up PRs.
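
To make the failure mode concrete, here is a toy model of the guard in plain Python. It is not Spark code and not the patch itself: all names (`typed_filter`, `embedded_filter`, `optimized_typed_filter`) are hypothetical, rows are plain tuples, and the schema change is assumed, purely for illustration, to be a difference in field order. The sketch only shows why forwarding input rows unchanged is safe when the input and output schemas match and unsafe otherwise.

```python
from typing import Callable, Dict, List, Tuple

Schema = Tuple[str, ...]   # ordered field names (toy stand-in for a plan schema)
Row = Tuple[object, ...]   # values laid out in schema order


def deserialize(row: Row, schema: Schema) -> Dict[str, object]:
    """Turn a row into an 'object' keyed by field name."""
    return dict(zip(schema, row))


def serialize(obj: Dict[str, object], schema: Schema) -> Row:
    """Lay an object's fields out in the order the schema asks for."""
    return tuple(obj[name] for name in schema)


def typed_filter(rows: List[Row], in_schema: Schema, out_schema: Schema,
                 pred: Callable[[Dict[str, object]], bool]) -> List[Row]:
    """Reference path: deserialize, test, then serialize each surviving object."""
    result = []
    for row in rows:
        obj = deserialize(row, in_schema)
        if pred(obj):
            result.append(serialize(obj, out_schema))
    return result


def embedded_filter(rows: List[Row], in_schema: Schema,
                    pred: Callable[[Dict[str, object]], bool]) -> List[Row]:
    """Optimized path: evaluate the predicate on the deserialized object,
    but forward the input row untouched (no serialization step)."""
    return [row for row in rows if pred(deserialize(row, in_schema))]


def optimized_typed_filter(rows: List[Row], in_schema: Schema, out_schema: Schema,
                           pred: Callable[[Dict[str, object]], bool]) -> List[Row]:
    """Guarded rule: take the optimized path only when the schema is unchanged."""
    if in_schema == out_schema:
        return embedded_filter(rows, in_schema, pred)
    return typed_filter(rows, in_schema, out_schema, pred)


rows = [(1, "x"), (2, "y")]
keep_large_b = lambda obj: obj["b"] > 1

# Input rows are laid out as (b, a); the output is expected as (a, b).
reference = typed_filter(rows, ("b", "a"), ("a", "b"), keep_large_b)   # [("y", 2)]
unguarded = embedded_filter(rows, ("b", "a"), keep_large_b)            # [(2, "y")]
guarded = optimized_typed_filter(rows, ("b", "a"), ("a", "b"), keep_large_b)
```

In this toy, `unguarded` keeps the (b, a) layout, so a consumer reading it as (a, b) sees the values in the wrong columns, while `guarded` falls back to the reference path and agrees with `reference`.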

How was this patch tested?

New test case added in DatasetSuite.

@liancheng liancheng force-pushed the spark-15112-corrupted-filter branch from 7091e65 to 5b8362a Compare May 27, 2016 20:11

liancheng commented May 27, 2016

cc @cloud-fan

@liancheng liancheng changed the title from "[SPARK-15112][SQL] Disables EmbedDeserializerInFilter for plan fragments that change schema" to "[SPARK-15112][SQL] Disables EmbedSerializerInFilter for plan fragments that change schema" May 27, 2016
@cloud-fan

LGTM

SparkQA commented May 27, 2016

Test build #59520 has finished for PR 13362 at commit 5b8362a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 28, 2016

Test build #59556 has finished for PR 13362 at commit edce7a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the spark-15112-corrupted-filter branch from edce7a6 to 4099d69 Compare May 28, 2016 15:25

SparkQA commented May 28, 2016

Test build #59563 has finished for PR 13362 at commit 4099d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng

Merging to master and branch-2.0. Thanks for the review!

asfgit pushed a commit that referenced this pull request May 30, 2016
…s that change schema

## What changes were proposed in this pull request?

`EmbedSerializerInFilter` implicitly assumes that the plan fragment being optimized does not change the plan schema. That assumption is reasonable, because `Dataset.filter` should never change the schema.

However, due to another issue involving `DeserializeToObject` and `SerializeFromObject`, a typed filter *does* change the plan schema (see [SPARK-15632][1]). This breaks `EmbedSerializerInFilter` and corrupts data.

This PR disables `EmbedSerializerInFilter` whenever the fragment changes the schema, which avoids the data corruption. The underlying schema change issue should be addressed in follow-up PRs.

## How was this patch tested?

New test case added in `DatasetSuite`.

[1]: https://issues.apache.org/jira/browse/SPARK-15632

Author: Cheng Lian <lian@databricks.com>

Closes #13362 from liancheng/spark-15112-corrupted-filter.

(cherry picked from commit 1360a6d)
Signed-off-by: Cheng Lian <lian@databricks.com>
@asfgit asfgit closed this in 1360a6d May 30, 2016
@liancheng liancheng deleted the spark-15112-corrupted-filter branch May 30, 2016 06:21