
[SPARK-22495] Fix setup of SPARK_HOME variable on Windows #19807

Closed
wants to merge 1 commit into branch-2.2 from jsnowacki:fix_spark_cmds_2

Conversation

jsnowacki
Contributor

What changes were proposed in this pull request?

This is a cherry-pick of the original PR #19370 onto branch-2.2, as suggested in #19370 (comment).

This fixes the way `SPARK_HOME` is resolved on Windows. While the previous version worked with the built release download, the directory layout changed slightly for the PySpark `pip` or `conda` install. This was reflected in the Linux scripts in `bin` but not in the Windows `cmd` files.

The first fix improves the way the `jars` directory is found, as this was stopping the Windows `pip`/`conda` install from working; the JARs were not found during Session/Context setup.
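As a rough sketch of the idea (not the verbatim patch; `SPARK_SCALA_VERSION` is assumed to be set earlier by the launcher scripts), the `cmd` launcher can prefer a `jars` directory directly under `SPARK_HOME`, which is the layout produced by a pip/conda install, and fall back to the build-tree location otherwise:

```bat
rem Sketch only: locate the Spark jars for both the pip/conda layout and a source build.
if exist "%SPARK_HOME%\jars" (
  set SPARK_JARS_DIR="%SPARK_HOME%\jars"
) else (
  set SPARK_JARS_DIR="%SPARK_HOME%\assembly\target\scala-%SPARK_SCALA_VERSION%\jars"
)
```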

The second fix adds a `find-spark-home.cmd` script which, like the Linux version, uses the `find_spark_home.py` script to resolve `SPARK_HOME`. It is based on the `find-spark-home` bash script, although some operations are done in a different order due to limitations of the cmd script language. If the `SPARK_HOME` environment variable is already set, the Python script `find_spark_home.py` is not run. The process can fail if Python is not installed, but this path is mostly used when PySpark is installed via `pip`/`conda`, so Python should be present on the system.
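A minimal sketch of that logic (variable names and the fallback are illustrative, not the verbatim script): keep an already-set `SPARK_HOME`, otherwise ask `find_spark_home.py`, and fall back to the script's parent directory when Python is not available:

```bat
rem Sketch of the find-spark-home.cmd approach; not the verbatim script.
rem Respect an externally set SPARK_HOME.
if not "x%SPARK_HOME%"=="x" goto :eof

set PYTHON_RUNNER=python

rem Check whether Python is on PATH (output is discarded, only the exit code matters).
where %PYTHON_RUNNER% > nul 2>&1
if %ERRORLEVEL% neq 0 (
  rem No Python available: assume this script lives in SPARK_HOME\bin.
  set SPARK_HOME=%~dp0..
  goto :eof
)

rem For pip/conda installs, let the packaged helper locate the Spark files.
for /f "delims=" %%i in ('%PYTHON_RUNNER% "%~dp0find_spark_home.py"') do set SPARK_HOME=%%i
```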

How was this patch tested?

Tested on local installation.

@jsnowacki
Contributor Author

@HyukjinKwon I've created this cherry-picked PR onto branch-2.2. Please take a look and check whether this is what you had in mind.

@SparkQA

SparkQA commented Nov 23, 2017

Test build #84142 has finished for PR 19807 at commit bd24e47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Build started: [SparkR] ALL PR-19807
Diff: branch-2.2...spark-test:4A7B521C-83BF-4F47-84AF-94D49BCBF40E

@HyukjinKwon
Member

LGTM pending AppVeyor tests.

)

rem If there is python installed, trying to use the root dir as SPARK_HOME
where %PYTHON_RUNNER% > nul 2>$1
Member


There seems to be a typo here actually: `where %PYTHON_RUNNER% > nul 2>$1` -> `where %PYTHON_RUNNER% > nul 2>&1`.

Will fix it up in master by myself soon.
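In cmd, `2>$1` sends stderr to a file literally named `$1`, whereas `2>&1` duplicates stderr onto stdout (here `nul`), so the corrected line silently checks whether Python is on `PATH`; for example (illustrative only):

```bat
rem Corrected redirection: discard both stdout and stderr of `where`,
rem then branch on the exit code.
where %PYTHON_RUNNER% > nul 2>&1
if %ERRORLEVEL% neq 0 echo Python runner not found on PATH
```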

Member


oh yes

asfgit pushed a commit that referenced this pull request Nov 24, 2017
Author: Jakub Nowacki <j.s.nowacki@gmail.com>

Closes #19807 from jsnowacki/fix_spark_cmds_2.
@felixcheung
Member

I fixed it during merge and merged this to 2.2.
@jsnowacki thanks and please close this PR.

@jsnowacki
Contributor Author

Thanks! Closing.

@jsnowacki jsnowacki closed this Nov 24, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
Author: Jakub Nowacki <j.s.nowacki@gmail.com>

Closes apache#19807 from jsnowacki/fix_spark_cmds_2.