-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3061] Fix Maven build under Windows #2165
Conversation
This fixes the Maven build on Windows.
The filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g. ${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt This caused problems under Windows because it would try to create a subdirectory with a colon in its name.
QA tests have started for PR 2165 at commit
|
<argument>build</argument> | ||
</arguments> | ||
<tasks> | ||
<unzip src="../python/lib/py4j-0.8.2.1-src.zip" dest="build" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2 spaces instead of 4
QA tests have started for PR 2165 at commit
|
QA tests have finished for PR 2165 at commit
|
QA tests have finished for PR 2165 at commit
|
test this please |
Jenkins, retest this please. |
QA tests have started for PR 2165 at commit
|
QA tests have finished for PR 2165 at commit
|
I tested this on Windows and it works there. I still have to test PySpark on Yarn with this PR because it has the potential to break things there, though from a cursory glance the changes seem fine. |
@andrewor14 Unless the new unzipping code is somehow corrupting the files or unzipping them to a different location, won't it be sufficient to check that the same files are present in the assembly JAR without having to run an actual YARN cluster? |
I suppose so. I am extra cautious about this because the whole PySpark on YARN thing is somewhat fickle. On a related note I was able to verify that the python files were actually there. |
Re: can this break something -- we have seen in recent memory how zip behavior can be inconsistent with archives of >= 65536 files, which comes up with Spark assembly jars. I assume that's not true of the .zip file being unzipped here though. In that case this does seem like something that will work everywhere if it works anywhere. |
LGTM. I tested this extensively locally on Windows and OSX, and remotely on a HDP cluster and on a standalone cluster. PySpark works as expected on all environments. I'll merge this once I discuss with @pwendell on what should go into branch-1.1 at this point. |
Hey @andrewor14 you can merge this into master and just leave the JIRA as unresolved and add targetVersion 1.1.1 and fixVersion 1.2.0 and make it a blocker. That way it will be clear it needs to be backported. |
Ok, I will add a reminder for myself to back port this after the release. Thanks Josh. |
The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory. I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping. I also fixed an issue that prevented tests from running under Windows: In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g. ``` ${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt ``` This caused problems under Windows because it would try to create a subdirectory named "c:\\". Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory. Author: Josh Rosen <joshrosen@apache.org> Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes apache#2165 from JoshRosen/windows-support and squashes the following commits: 651d210 [Josh Rosen] Unzip to python/build instead of core/build fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces e347668 [Josh Rosen] Fix Maven scalatest filereports path: 4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory. I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping. I also fixed an issue that prevented tests from running under Windows: In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g. ``` ${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt ``` This caused problems under Windows because it would try to create a subdirectory named "c:\\". Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory. Author: Josh Rosen <joshrosen@apache.org> Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #2165 from JoshRosen/windows-support and squashes the following commits: 651d210 [Josh Rosen] Unzip to python/build instead of core/build fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces e347668 [Josh Rosen] Fix Maven scalatest filereports path: 4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
The Maven build was failing on Windows because it tried to call the unix
unzip
utility to extract the Py4J files into core's build directory. I've fixed this issue by using themaven-antrun-plugin
to perform the unzipping.I also fixed an issue that prevented tests from running under Windows:
In the Maven ScalaTest plugin, the filename listed in is placed under the ; the current code places it in a subdirectory of reportsDirectory, e.g.
This caused problems under Windows because it would try to create a subdirectory named "c:".
Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.