[SPARK-21239][STREAMING] Support WAL recover in windows #18452
Conversation
When the driver fails over, it reads the WAL from HDFS by calling WriteAheadLogBackedBlockRDD.getBlockFromWriteAheadLog(). However, that method needs a dummy local path to satisfy its parameter requirements, and on Windows the path contains a colon, which is not valid for Hadoop.
Before:

    val nonExistentDirectory = new File(
      System.getProperty("java.io.tmpdir"), UUID.randomUUID().toString).getAbsolutePath

After:

    val nonExistentDirectory = new File(
      System.getProperty("java.io.tmpdir").replaceFirst("[a-zA-Z]:", ""),
      UUID.randomUUID().toString).getPath
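To illustrate what the patched line does on a concrete value, here is a minimal sketch. The Windows-style tmpdir value is hypothetical, and the example is in Java because Scala strings are java.lang.String, so String.replaceFirst behaves identically in the patched Scala code:

```java
public class StripDriveLetter {
    // Remove a leading drive letter and colon, as the patch does,
    // so the resulting dummy path contains no ':' for Hadoop.
    public static String strip(String path) {
        return path.replaceFirst("[a-zA-Z]:", "");
    }

    public static void main(String[] args) {
        // Hypothetical value; on Windows, java.io.tmpdir typically
        // begins with a drive letter such as "C:".
        String windowsTmpDir = "C:\\Users\\alice\\AppData\\Local\\Temp";
        // Prints \Users\alice\AppData\Local\Temp
        System.out.println(strip(windowsTmpDir));
    }
}
```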
Hi, @Myasuka.
I'm just wondering why you changed getAbsolutePath to getPath as well. Is it related to your fix?
Yes, getAbsolutePath would still return a path with the drive letter and colon, which is illegal for HDFS. That's why I changed getAbsolutePath to getPath.
In that case, what about this?
val nonExistentDirectory = new File(
System.getProperty("java.io.tmpdir"),
UUID.randomUUID().toString).getAbsolutePath.replaceFirst("[a-zA-Z]:", "")
I think this is equivalent; if you prefer this form, I can add another commit.
Can one of the admins verify this patch?
Doesn't SPARK-25778 / #22867 already take care of this?
@Myasuka +1 what Marcelo asked. Can you either update your PR or close it?
I'm just going to close this for now since it's pretty old anyway. It can always be reopened.
What changes were proposed in this pull request?
When the driver fails over, it reads the WAL from HDFS by calling WriteAheadLogBackedBlockRDD.getBlockFromWriteAheadLog(). However, that method needs a dummy local path to satisfy its parameter requirements, and on Windows the path contains a colon, which is not valid for Hadoop. This change removes the potential drive letter and colon.
An email thread on the spark-user list has previously discussed this bug.
How was this patch tested?
Without this fix, WAL recovery does not take effect after the driver fails over on YARN. With this PR applied, the WAL recovery mechanism works on a Windows YARN cluster.