
[SPARK-41377][BUILD] Fix spark-version-info.properties not found on Windows #38903

Closed


@GauthamBanasandra (Member) commented Dec 4, 2022

What changes were proposed in this pull request?

This PR enhances the Maven build configuration to automatically detect the operating system and use PowerShell on Windows and Bash on other operating systems to generate the spark-version-info.properties file.

Why are the changes needed?

While building Spark, the spark-version-info.properties file is generated using bash. In a Windows environment, if Windows Subsystem for Linux (WSL) is installed, its bash can take precedence over other bash executables on the PATH, as noted in SPARK-40739. The bash in WSL has a different mounting configuration, so the target location specified for spark-version-info.properties does not resolve to the expected directory. As a result, spark-version-info.properties is left out of the spark-core jar, which causes SparkContext initialization to fail with the error message depicted above.
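To make "different mounting configuration" concrete: with default settings, WSL exposes Windows drives under /mnt/<drive>, while Git Bash (MSYS2) exposes them as /<drive>, so the same Windows directory has a different POSIX path depending on which bash runs. The helpers below are a hypothetical illustration of that mapping, assuming default mount settings and an absolute drive-letter path. They are not part of the Spark build; in practice the real converters (wslpath inside WSL, cygpath in MSYS2/Cygwin) are the tools to use.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not part of the Spark build): how one Windows
# path is spelled by two different bash environments, assuming default
# mount settings and an absolute path such as 'C:\Users\dongj\spark'.

# Lowercase drive letter of an absolute Windows path ('C:\...' -> 'c').
drive_letter() {
  printf '%s' "${1:0:1}" | tr '[:upper:]' '[:lower:]'
}

# Remainder of the path after 'C:', with every backslash turned into '/'.
path_rest() {
  local rest="${1:2}"
  printf '%s' "${rest//\\//}"
}

# WSL's default automount root is /mnt, e.g. C:\x -> /mnt/c/x
win_to_wsl_path() {
  printf '/mnt/%s%s\n' "$(drive_letter "$1")" "$(path_rest "$1")"
}

# Git Bash (MSYS2) mounts drives at the root, e.g. C:\x -> /c/x
win_to_msys_path() {
  printf '/%s%s\n' "$(drive_letter "$1")" "$(path_rest "$1")"
}
```

For example, C:\Users\dongj\spark maps to /mnt/c/Users/dongj/spark under WSL but to /c/Users/dongj/spark under Git Bash, so a location computed for one bash does not name the same place in the other.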

This PR fixes the issue by directing the build system to use the right shell according to the platform.
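The platform switch can be illustrated as a mapping from an OS name to a generator script. This is a hypothetical shell sketch, not the PR's implementation, which makes the choice in the Maven build configuration. build/spark-build-info.ps1 is the script name mentioned in the review. Using build/spark-build-info as the path of the Bash generator is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "use the right shell according to the platform":
# map an OS name to the script that generates spark-version-info.properties.
# The actual change makes this choice inside the Maven build; this is only
# an illustration.

pick_build_info_script() {
  local os_name="$1"
  case "$os_name" in
    # JVM os.name values such as "Windows 10", plus the uname -s values
    # reported by Git Bash (MINGW64_NT-...), MSYS2 (MSYS_NT-...) and Cygwin.
    Windows*|MINGW*|MSYS*|CYGWIN*)
      echo "build/spark-build-info.ps1" ;;   # PowerShell generator
    *)
      echo "build/spark-build-info" ;;       # Bash generator (assumed path)
  esac
}
```

Note that a WSL bash reports Linux from uname -s. A check made from inside whichever bash happens to be first on the PATH would therefore still pick the Bash script under WSL. Making the choice on the JVM side instead avoids this, because Maven and Ant OS checks read the os.name system property (for example "Windows 10") and do not depend on which bash is found.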

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I tested this by building on a Windows 10 PC.

```psh
mvn -Pyarn '-Dhadoop.version=3.3.0' -DskipTests clean package
```

Once the build finished, I verified that the spark-version-info.properties file was included in the spark-core jar.
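The jar check can also be scripted. The helper below is a hypothetical sketch. It reads a jar's entry listing from standard input and succeeds if spark-version-info.properties is present at the archive root, which is assumed to be where the file lives. The listing can come from jar tf (JDK) or unzip -Z1 (Info-ZIP). The jar path in the usage line assumes a Scala 2.12 build of 3.4.0-SNAPSHOT and may differ for other builds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a jar listing on stdin contains
# spark-version-info.properties at the archive root.
listing_has_version_info() {
  # Strip carriage returns (a listing produced on Windows may use CRLF), then
  # match the whole entry name. grep without -q reads all of its input, so
  # the command producing the listing is not cut off early.
  tr -d '\r' | grep -Fx 'spark-version-info.properties' > /dev/null
}

# Report on a built jar, using the JDK's `jar tf` to list its entries.
check_jar() {
  local jar="$1"
  if jar tf "$jar" | listing_has_version_info; then
    echo "OK: spark-version-info.properties found in $jar"
  else
    echo "MISSING: spark-version-info.properties not in $jar" >&2
    return 1
  fi
}
```

Usage (path assumed): check_jar core/target/spark-core_2.12-3.4.0-SNAPSHOT.jar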


I also ran the SparkPi application and verified that it ran successfully without any errors.


@GauthamBanasandra (Member, Author)

@rxin could you please review this PR?

@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun changed the title from "[SPARK-41377] Fix spark-version-info.properties not found on Windows" to "[SPARK-41377][BUILD] Fix spark-version-info.properties not found on Windows" on Dec 6, 2022
@dongjoon-hyun (Member) left a comment


Hi, @GauthamBanasandra.

We need to register build/spark-build-info.ps1 in AppVeyor.yml, following the existing entry below. That will help you verify the change.

- dev/appveyor-install-dependencies.ps1

@github-actions bot added the INFRA label on Dec 8, 2022
@dongjoon-hyun (Member) left a comment


+1, LGTM. I verified this PR on Windows manually.

```
C:\Users\dongj\spark>type spark-version-info.properties
version=3.4.0-SNAPSHOT
user=dongj
revision=b5fc6ed1a8924cebd6632312ad7bcbd956d2171b
branch=spark-version-info-ps
date=2022-12-09T17:53:44Z
url=https://github.com/apache/spark.git
```
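A quick sanity check of a generated file is to confirm that every key in the output above is present and to read individual values back. These are hypothetical shell helpers, not part of the Spark build. The key list is taken from the sample output.

```bash
#!/usr/bin/env bash
# Hypothetical check: does a spark-version-info.properties file define every
# key shown in the sample output (version, user, revision, branch, date, url)?
check_version_info() {
  local file="$1" key status=0
  for key in version user revision branch date url; do
    # Each key is expected on its own line as "key=value".
    if ! grep -q "^${key}=" "$file"; then
      echo "missing key: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Print the value of one key (first match), dropping any trailing CR, e.g.
# prop_value spark-version-info.properties version -> 3.4.0-SNAPSHOT
prop_value() {
  grep -m1 "^$2=" "$1" | cut -d= -f2- | tr -d '\r'
}
```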

Merged to master for Apache Spark 3.4.0.

@dongjoon-hyun (Member)

I added you to the Apache Spark JIRA contributor group and assigned SPARK-41377 to you.
Welcome to the Apache Spark community, @GauthamBanasandra.

@GauthamBanasandra (Member, Author)

@dongjoon-hyun Thanks for the help and review. 😊

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…indows

### What changes were proposed in this pull request?

This PR enhances the Maven build configuration to automatically detect the operating system and use PowerShell on Windows and Bash on other operating systems to generate the `spark-version-info.properties` file.

### Why are the changes needed?

While building Spark, the `spark-version-info.properties` file [is generated using bash](https://github.com/apache/spark/blob/d62c18b7497997188ec587e1eb62e75c979c1c93/core/pom.xml#L560-L564). In a Windows environment, if Windows Subsystem for Linux (WSL) is installed, its bash can take precedence over other bash executables on the PATH, as noted in SPARK-40739. The bash in WSL has a different mounting configuration, so [the target location specified for spark-version-info.properties](https://github.com/apache/spark/blob/d62c18b7497997188ec587e1eb62e75c979c1c93/core/pom.xml#L561-L562) does not resolve to the expected directory. As a result, `spark-version-info.properties` is left out of the spark-core jar, which causes SparkContext initialization to fail with the error message depicted above.

This PR fixes the issue by directing the build system to use the right shell according to the platform.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tested this by building on a Windows 10 PC.

```psh
mvn -Pyarn '-Dhadoop.version=3.3.0' -DskipTests clean package
```

Once the build finished, I verified that the `spark-version-info.properties` file was included in the spark-core jar.

![image](https://user-images.githubusercontent.com/10280768/205497898-80e53617-c991-460e-b04a-a3bdd4f298ae.png)

I also ran the SparkPi application and verified that it ran successfully without any errors.

![image](https://user-images.githubusercontent.com/10280768/205499567-f6e8e10a-dcbb-45fb-b282-fc29ba58adee.png)

Closes apache#38903 from GauthamBanasandra/spark-version-info-ps.

Authored-by: Gautham Banasandra <gautham.bangalore@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>