Spark PBSPro Support
You can run Spark on the PBS cluster just by adding "--master pbs" while submitting as follows:
    # Start the Spark shell (client mode only)
    ./bin/spark-shell --master pbs

    # Submit a Spark application in client mode
    ./bin/spark-submit --master pbs --deploy-mode client --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/target/scala-2.12/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar 100

    # Submit a Spark application in cluster mode
    ./bin/spark-submit --master pbs --deploy-mode cluster --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/target/scala-2.12/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar 100
You can also append "spark.master pbs" to conf/spark-defaults.conf to avoid passing "--master pbs" on every submit.
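The append step can be scripted so that running it twice does not duplicate the entry. The following is a minimal sketch, not part of the connector: the function name add_spark_conf is hypothetical, and it assumes the whitespace-separated "key value" form of spark-defaults.conf.

```shell
# Hypothetical helper (not shipped with Spark or the connector):
# append "key value" to a Spark properties file unless the key is already set.
add_spark_conf() {
    key=$1
    value=$2
    file=$3

    # Create the file if it does not exist yet.
    touch "$file"

    # Escape dots so that "spark.master" is matched literally by grep.
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')

    # A key counts as set if it starts a line and is followed by whitespace or '='.
    if grep -Eq "^[[:space:]]*${escaped}([[:space:]]|=)" "$file"; then
        echo "$key already set in $file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
        echo "added $key to $file"
    fi
}
```

From the Spark project root, a call such as `add_spark_conf spark.master pbs conf/spark-defaults.conf` would add the line once and leave the file unchanged on later runs.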
To run Spark UI with PBS cluster:
This expects PBSPro to be installed at
Clone the Spark repository and change into the spark directory:
    git clone https://github.com/apache/spark.git
    cd spark
In the Spark project root, run these commands:
    # Clone the connector repo into the Spark tree
    git clone https://github.com/PBSPro/spark-pbspro-connector resource-managers/pbs

    # Apply the patch to Spark (in the root directory)
    git am resource-managers/pbs/*.patch

    # Build
    build/mvn -DskipTests -Ppbs package
Add executor home to your configuration:
    # In conf/spark-defaults.conf, add the line:
    spark.pbs.executor.home "SPARK INSTALLATION DIRECTORY PATH IN PBS MOMS"
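Before submitting a job, it can help to confirm that the executor home is actually present in the configuration. The sketch below is an illustration only: get_spark_conf and check_executor_home are hypothetical names, and the parsing assumes the whitespace-separated "key value" form rather than "key=value".

```shell
# Hypothetical helper: print the value of a key from a Spark properties file.
# Comment lines are skipped because their first field never equals the key.
get_spark_conf() {
    key=$1
    file=$2
    awk -v k="$key" '
        $1 == k {
            $1 = ""              # drop the key field
            sub(/^[ \t]+/, "")   # trim the separator left behind
            print
            exit
        }
    ' "$file"
}

# Hypothetical check: fail if spark.pbs.executor.home is missing or empty.
check_executor_home() {
    conf_file=$1
    home=$(get_spark_conf spark.pbs.executor.home "$conf_file")
    if [ -z "$home" ]; then
        echo "spark.pbs.executor.home is not set in $conf_file" >&2
        return 1
    fi
    echo "spark.pbs.executor.home = $home"
}
```

Running `check_executor_home conf/spark-defaults.conf` from the Spark project root returns a non-zero status when the setting is absent, so it can guard a submit script.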