[SPARK-4026][Streaming] Write ahead log management #2882

tdas · 2014-10-21T20:06:05Z

As part of the effort to avoid data loss on Spark Streaming driver failure, we want to implement a write ahead log that can write received data to HDFS. This allows the received data to persist across driver failures, so when the streaming driver is restarted, it can find and reprocess all the data that was received but not processed.

This was primarily implemented by @harishreedharan. This is still WIP, as he is going to improve the unit tests by using an HDFS mini cluster.
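
For readers following along, here is a minimal sketch of the write-then-recover idea described above. It is not the code in this PR: `SimpleLogWriter` and `SimpleLogReader` are hypothetical names, the length-prefixed record layout is an assumption, and a local file stands in for HDFS so the example runs without a cluster.

```scala
import java.io.{Closeable, DataInputStream, DataOutputStream, EOFException, File, FileInputStream, FileOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical writer: appends each record as [4-byte length][payload].
class SimpleLogWriter(file: File) extends Closeable {
  private val out = new DataOutputStream(new FileOutputStream(file, true))

  def write(record: Array[Byte]): Unit = {
    out.writeInt(record.length)
    out.write(record)
    // Push buffered bytes down to the file; on HDFS the analogous call
    // would be hflush (or sync on older Hadoop versions).
    out.flush()
  }

  override def close(): Unit = out.close()
}

// Hypothetical reader: what a restarted driver would use to replay the log.
object SimpleLogReader {
  def readAll(file: File): List[Array[Byte]] = {
    val in = new DataInputStream(new FileInputStream(file))
    val records = ArrayBuffer.empty[Array[Byte]]
    try {
      while (true) {
        val length = in.readInt()
        val payload = new Array[Byte](length)
        in.readFully(payload)
        records += payload
      }
    } catch {
      case _: EOFException => // reached the end of the log
    } finally {
      in.close()
    }
    records.toList
  }
}
```

After a simulated restart, calling `SimpleLogReader.readAll` on the same file returns the records in the order they were written, which is the property the driver relies on to reprocess received data.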


tdas · 2014-10-21T20:06:31Z

Please review this @JoshRosen

SparkQA · 2014-10-21T20:09:51Z

QA tests have started for PR 2882 at commit 5182ffb.

This patch merges cleanly.

SparkQA · 2014-10-21T20:10:04Z

QA tests have finished for PR 2882 at commit 5182ffb.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-21T20:10:05Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22002/
Test FAILed.

SparkQA · 2014-10-21T20:29:47Z

QA tests have started for PR 2882 at commit 4ab602a.

This patch merges cleanly.

SparkQA · 2014-10-21T21:43:35Z

QA tests have finished for PR 2882 at commit 4ab602a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-21T21:43:38Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22006/
Test PASSed.

tdas · 2014-10-21T22:46:47Z

streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogWriter.scala

+ */
+private[streaming] class WriteAheadLogWriter(path: String, hadoopConf: Configuration)
+  extends Closeable {
+  private val underlyingStream: Either[DataOutputStream, FSDataOutputStream] = {


WIP: this file is going to be updated by @harishreedharan to get rid of the local file customizations.

Ah, that makes sense. I guess you can still use the HDFS API to write to local files for testing purposes.

Yep. And for all tests, we are just going to use Hadoop Minicluster anyway.

JoshRosen · 2014-10-22T05:33:07Z

streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogWriter.scala

+
+  private lazy val hadoopFlushMethod = {
+    val cls = classOf[FSDataOutputStream]
+    Try(cls.getMethod("hflush")).orElse(Try(cls.getMethod("sync"))).toOption


Nice Scala one-liner :)

Why do we need this reflection, though? Is this necessary to support multiple Hadoop versions? If so, could you add a one-line comment to explain this?

Actually we do, since Spark currently supports Hadoop 1 through Hadoop 2.5.0. In Hadoop 1.x, the "sync" method did the same thing "hflush" does in 2.5.0, so in short, yes.

Credit goes to Colin McCabe who wrote this line.
https://github.com/apache/spark/blame/master/core/src/main/scala/org/apache/spark/util/FileLogger.scala#L106
Stole from there.
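
To illustrate the fallback lookup discussed in this thread without pulling in Hadoop, here is a small sketch that applies the same `Try(...).orElse(...)` pattern to two stand-in classes. `OldStream`, `NewStream`, and `FlushCompat` are hypothetical names invented for this example; only the reflection pattern itself comes from the diff above.

```scala
import java.lang.reflect.Method
import scala.util.Try

// Hypothetical stand-in for an older API that only offers sync().
class OldStream {
  var synced = 0
  def sync(): Unit = synced += 1
}

// Hypothetical stand-in for a newer API that offers hflush() as well.
class NewStream {
  var flushed = 0
  def hflush(): Unit = flushed += 1
  def sync(): Unit = ()
}

object FlushCompat {
  // Prefer "hflush"; fall back to "sync"; None if the class has neither.
  def flushMethod(cls: Class[_]): Option[Method] =
    Try(cls.getMethod("hflush")).orElse(Try(cls.getMethod("sync"))).toOption

  // Invoke whichever flush method was found; do nothing if none exists.
  def flush(stream: AnyRef): Unit =
    flushMethod(stream.getClass).foreach(_.invoke(stream))
}
```

Because the method is resolved by name at runtime, the same compiled code works against either class, which is the reason given above for using reflection across Hadoop versions.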

SparkQA · 2014-10-23T08:37:33Z

QA tests have finished for PR 2882 at commit 3881706.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-23T08:37:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22067/
Test PASSed.

SparkQA · 2014-10-23T09:29:49Z

QA tests have started for PR 2882 at commit 9514dc8.

This patch merges cleanly.

tdas · 2014-10-23T09:30:06Z

@JoshRosen
@harishreedharan addressed all your comments, and also simplified the writer code
I did some further cleanups, and also added two new unit tests that test the writer and manager with corrupted writes.
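
As a rough picture of what a corrupted-write test can check, the sketch below encodes a few records, chops bytes off the end to mimic a write interrupted by a crash, and confirms that only the complete records come back. This is an assumption-laden illustration, not the tests added in this PR: `TruncatedLog`, `encode`, and `decodeValid` are hypothetical names, and the length-prefixed layout is assumed. It only handles a truncated tail; a corrupted length field in the middle of the log would need extra validation.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, EOFException}
import scala.collection.mutable.ArrayBuffer

object TruncatedLog {
  // Encode records as [4-byte length][UTF-8 payload] (assumed layout).
  def encode(records: Seq[String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    records.foreach { r =>
      val payload = r.getBytes("UTF-8")
      out.writeInt(payload.length)
      out.write(payload)
    }
    out.flush()
    bytes.toByteArray
  }

  // Return every complete record, stopping quietly at a truncated tail.
  def decodeValid(data: Array[Byte]): List[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val records = ArrayBuffer.empty[String]
    try {
      while (true) {
        val length = in.readInt()
        val payload = new Array[Byte](length)
        in.readFully(payload)
        records += new String(payload, "UTF-8")
      }
    } catch {
      // A partial length header or partial payload is treated as end of log.
      case _: EOFException =>
    }
    records.toList
  }
}
```

For example, `TruncatedLog.decodeValid(TruncatedLog.encode(Seq("a", "bb", "ccc")).dropRight(2))` drops the half-written last record and keeps the first two.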

SparkQA · 2014-10-23T10:22:31Z

QA tests have finished for PR 2882 at commit 9514dc8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-23T10:22:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22068/
Test FAILed.

Directory deletion should not fail tests

SparkQA · 2014-10-23T17:19:55Z

QA tests have started for PR 2882 at commit d29fddd.

This patch merges cleanly.

SparkQA · 2014-10-23T18:30:56Z

QA tests have finished for PR 2882 at commit d29fddd.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-23T18:30:59Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22075/
Test PASSed.

harishreedharan · 2014-10-23T18:31:53Z

Yay, finally!

tdas · 2014-10-23T22:58:17Z

@JoshRosen whenever you get a chance. :)

JoshRosen · 2014-10-24T00:36:53Z

streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala

+private[streaming] object HdfsUtils {
+
+  def getOutputStream(path: String, conf: Configuration): FSDataOutputStream = {
+    // HDFS is not thread-safe when getFileSystem is called, so synchronize on that


It looks like this comment is no longer relevant, or perhaps it should be moved somewhere else?

JoshRosen · 2014-10-24T01:06:17Z

This looks good to me!

tdas · 2014-10-24T01:34:00Z

Alright, thanks! I will merge when this last set of changes gets through jenkins.

SparkQA · 2014-10-24T01:35:15Z

QA tests have started for PR 2882 at commit e4bee20.

This patch merges cleanly.

SparkQA · 2014-10-24T02:45:53Z

QA tests have finished for PR 2882 at commit e4bee20.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class LogInfo(startTime: Long, endTime: Long, path: String)

AmplabJenkins · 2014-10-24T02:45:56Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22103/
Test PASSed.

harishreedharan · 2014-10-24T18:02:13Z

Let's merge this for now. I will try and find out more about the getFileSystem thread-safety without doAs (which is what we support anyway)

harishreedharan · 2014-10-24T18:32:59Z

Talked to @cmccabe, who says we should not worry about the thread-safety. If there ever was an issue, it was in a version too old for us to worry about. Let's merge this!

tdas · 2014-10-24T18:44:12Z

Cool! Thanks for checking with @cmccabe. Merging this.

tdas and others added 2 commits October 20, 2014 19:52

Pulled WriteAheadLog-related stuff from tdas/spark/tree/driver-ha-wor…

172358d

…king

Added documentation

5182ffb

Adding missing license.

b06be2b

Refactored write ahead stuff from streaming.storage to streaming.util

4ab602a

tdas changed the title ~~[SPARK-4026][Streaming] synchronously write received data to HDFS and recover on driver failure~~ [SPARK-4026][Streaming] Write ahead log management Oct 21, 2014

tdas reviewed Oct 21, 2014

Remove underlying stream from the WALWriter.

5c70d1f

JoshRosen reviewed Oct 22, 2014

Added unit tests to test reading of corrupted data and other minor edits

9514dc8

harishreedharan and others added 2 commits October 23, 2014 09:26

Directory deletion should not fail tests

a317a4d

Merge pull request #20 from harishreedharan/driver-ha-wal

d29fddd

Directory deletion should not fail tests

JoshRosen reviewed Oct 24, 2014

tdas added 2 commits October 23, 2014 18:19

Minor changes based on PR comments.

55514e2

Removed synchronized, Path.getFileSystem is threadsafe

e4bee20

asfgit closed this in 6a40a76 Oct 24, 2014
