add comment for hack explaining why PYSPARK_PYTHON is needed in spark-submit
Ken Takagiwa authored and Ken Takagiwa committed Jul 16, 2014
1 parent 72bfc66 commit a7a0b5c
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions bin/spark-submit
@@ -37,6 +37,16 @@ done

DEPLOY_MODE=${DEPLOY_MODE:-"client"}


# This is a hack to make DStream.pyprint work.
# It will be removed once pyprint is moved into PythonDStream.
# The problem is that the print function lives in the (Scala) DStream.
# Whenever Python code is executed, we call PythonDStream, which is passed
# pythonExec (the Python executable Spark should run).
# Since pyprint is located in DStream, Spark does not know which Python to use.
# In that case, read the Python path from the PYSPARK_PYTHON environment variable.
# This fix is in progress in the print branch of my repo.

# Figure out which Python executable to use
if [[ -z "$PYSPARK_PYTHON" ]]; then
PYSPARK_PYTHON="python"
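The fallback in the hunk above can be sketched in isolation. This is a minimal illustration of the env-var default, not the full spark-submit script; the `echo` line is added here only to show the result:

```shell
# Minimal sketch of the PYSPARK_PYTHON fallback from the diff above.
# If PYSPARK_PYTHON is unset or empty, default to the "python" on PATH.
if [[ -z "$PYSPARK_PYTHON" ]]; then
  PYSPARK_PYTHON="python"
fi
echo "Using Python executable: $PYSPARK_PYTHON"
```

A caller can override the default by exporting the variable before invoking the script, e.g. `PYSPARK_PYTHON=/usr/bin/python2.7 bin/spark-submit ...`, which is why the script only assigns when the variable is empty.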
