Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3061] Fix Maven build under Windows #2165

Closed
wants to merge 4 commits into from

Conversation

JoshRosen
Copy link
Contributor

The Maven build was failing on Windows because it tried to call the unix unzip utility to extract the Py4J files into core's build directory. I've fixed this issue by using the maven-antrun-plugin to perform the unzipping.

I also fixed an issue that prevented tests from running under Windows:

In the Maven ScalaTest plugin, the filename listed in is placed under the ; the current code places it in a subdirectory of reportsDirectory, e.g.

${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt

This caused problems under Windows because it would try to create a subdirectory named "c:".

Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.

This fixes the Maven build on Windows.
The filename listed in <filereports> is placed under the <reportsDirectory>;
the current code places it in a subdirectory of reportsDirectory, e.g.

${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt

This caused problems under Windows because it would try to create
a subdirectory with a colon in its name.
@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2165 at commit e347668.

  • This patch merges cleanly.

<argument>build</argument>
</arguments>
<tasks>
<unzip src="../python/lib/py4j-0.8.2.1-src.zip" dest="build" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2 spaces instead of 4

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have started for PR 2165 at commit fbf3e61.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2165 at commit e347668.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)

@SparkQA
Copy link

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2165 at commit fbf3e61.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

test this please

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have started for PR 2165 at commit 651d210.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have finished for PR 2165 at commit 651d210.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

I tested this on Windows and it works there. I still have to test PySpark on Yarn with this PR because it has the potential to break things there, though from a cursory glance the changes seem fine.

@JoshRosen
Copy link
Contributor Author

@andrewor14 Unless the new unzipping code is somehow corrupting the files or unzipping them to a different location, won't it be sufficient to check that the same files are present in the assembly JAR without having to run an actual YARN cluster?

@andrewor14
Copy link
Contributor

I suppose so. I am extra cautious about this because the whole PySpark on YARN thing is somewhat fickle. On a related note I was able to verify that the python files were actually there.

@srowen
Copy link
Member

srowen commented Aug 28, 2014

Re: can this break something -- we have seen in recent memory how zip behavior can be inconsistent with archives of >= 65536 files, which comes up with Spark assembly jars. I assume that's not true of the .zip file being unzipped here though. In that case this does seem like something that will work everywhere if it works anywhere.

@andrewor14
Copy link
Contributor

LGTM. I tested this extensively locally on Windows and OSX, and remotely on a HDP cluster and on a standalone cluster. PySpark works as expected on all environments. I'll merge this once I discuss with @pwendell on what should go into branch-1.1 at this point.

@pwendell
Copy link
Contributor

pwendell commented Sep 2, 2014

Hey @andrewor14 you can merge this into master and just leave the JIRA as unresolved and add targetVersion 1.1.1 and fixVersion 1.2.0 and make it a blocker. That way it will be clear it needs to be backported.

@andrewor14
Copy link
Contributor

Ok, I will add a reminder for myself to back port this after the release. Thanks Josh.

@asfgit asfgit closed this in 378b231 Sep 2, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory.  I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping.

I also fixed an issue that prevented tests from running under Windows:

In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g.

```
${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt
```

This caused problems under Windows because it would try to create a subdirectory named "c:\\".

Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.

Author: Josh Rosen <joshrosen@apache.org>
Author: Josh Rosen <rosenville@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#2165 from JoshRosen/windows-support and squashes the following commits:

651d210 [Josh Rosen] Unzip to python/build instead of core/build
fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces
e347668 [Josh Rosen] Fix Maven scalatest filereports path:
4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
asfgit pushed a commit that referenced this pull request Sep 9, 2014
The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory.  I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping.

I also fixed an issue that prevented tests from running under Windows:

In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g.

```
${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt
```

This caused problems under Windows because it would try to create a subdirectory named "c:\\".

Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.

Author: Josh Rosen <joshrosen@apache.org>
Author: Josh Rosen <rosenville@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #2165 from JoshRosen/windows-support and squashes the following commits:

651d210 [Josh Rosen] Unzip to python/build instead of core/build
fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces
e347668 [Josh Rosen] Fix Maven scalatest filereports path:
4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants