Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17640][SQL]Avoid using -1 as the default batchId for FileStreamSource.FileEntry #15206

Closed
wants to merge 1 commit into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Sep 22, 2016

What changes were proposed in this pull request?

Avoid using -1 as the default batchId for FileStreamSource.FileEntry so that we can make sure not writing any FileEntry(..., batchId = -1) into the log. This also avoids people misusing it in future (#15203 is an example).

How was this patch tested?

Jenkins.

@zsxwing
Copy link
Member Author

zsxwing commented Sep 22, 2016

/cc @jerryshao

@brkyvz
Copy link
Contributor

brkyvz commented Sep 22, 2016

@zsxwing Will this break streaming jobs if someone upgrades their Spark version, since they won't be able to deserialize the class correctly?

@zsxwing
Copy link
Member Author

zsxwing commented Sep 23, 2016

@zsxwing Will this break streaming jobs if someone upgrades their Spark version, since they won't be able to deserialize the class correctly?

@brkyvz No. This doesn't change the file format. FileEntry(..., batchId = -1) is only used in SeenFilesMap which is only in memory.

By the way, people cannot just upgrade the Spark version since FileStreamSource's format has been changed in #13513.

@zsxwing
Copy link
Member Author

zsxwing commented Sep 23, 2016

@brkyvz right now we cannot support upgrading anyway since the execution metadata uses Java serialization.

@jerryshao
Copy link
Contributor

LGTM, thanks for the fix.

@SparkQA
Copy link

SparkQA commented Sep 23, 2016

Test build #65801 has finished for PR 15206 at commit d9177e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FileEntry(path: String, timestamp: Timestamp, batchId: Long) extends Serializable

@SparkQA
Copy link

SparkQA commented Sep 23, 2016

Test build #65803 has finished for PR 15206 at commit f506a43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FileEntry(path: String, timestamp: Timestamp, batchId: Long) extends Serializable

@zsxwing
Copy link
Member Author

zsxwing commented Sep 23, 2016

Thanks. Merging to master and 2.0.

@asfgit asfgit closed this in 62ccf27 Sep 23, 2016
asfgit pushed a commit that referenced this pull request Sep 23, 2016
…amSource.FileEntry

## What changes were proposed in this pull request?

Avoid using -1 as the default batchId for FileStreamSource.FileEntry so that we can make sure not writing any FileEntry(..., batchId = -1) into the log. This also avoids people misusing it in future (#15203 is an example).

## How was this patch tested?

Jenkins.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #15206 from zsxwing/cleanup.

(cherry picked from commit 62ccf27)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@zsxwing zsxwing deleted the cleanup branch September 23, 2016 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants