# Submit to a remote Spark cluster
When you create a notebook in the Data Scientist Workbench, you have full access to a local Spark instance. Local Spark is useful for developing code and working with small datasets.

When working with large datasets, you will need the power of a __remote Spark cluster__ to finish your jobs in a reasonable time. 

This notebook is used to submit other notebooks to a remote Spark cluster. We're working on a better way for you to do this. If you have ideas for how to improve Data Scientist Workbench, please submit them to the [Idea forum](http://support.datascientistworkbench.com/forums/246775-workbench/category/109170-jupyter-notebooks).

__Warning__: Submit only one notebook at a time to the Spark cluster using this feature.

To submit your Python notebook to the Spark cluster for execution, use the *Insert Path* menu of the notebook to replace `PATH_OF_NOTEBOOK` in the cell below, as shown in the screenshot.

<img src="https://ibm.box.com/shared/static/u0eqawyqeo5y1zo4886f6dp1ee95gmbv.png"/>

## DSWB Spark Sandbox cluster

Data Scientist Workbench provides access to a shared *Spark Sandbox cluster* for you to use.

To submit your Python notebook to the Sandbox cluster, use the *Insert Path* menu of the notebook to replace `PATH_OF_NOTEBOOK` in the cell below, then run the cell.

In [None]:
# EITHER: Run this cell to submit to the DSWB Spark Sandbox cluster
!submit-notebook.sh spark://spark.datascientistworkbench.com:7077 PATH_OF_NOTEBOOK
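
If you prefer plain Python to the `!` shell escape, you can wrap the call above with the standard `subprocess` module. The following is a minimal sketch. It assumes only that `submit-notebook.sh` is on your `PATH` and takes the master URL followed by the notebook path, exactly as in the cell above. The helper names `build_submit_command` and `submit` are hypothetical. The sketch also refuses to proceed if `PATH_OF_NOTEBOOK` was left unreplaced.

```python
import subprocess
from typing import List

# Master URL of the DSWB Spark Sandbox cluster, copied from the cell above.
SANDBOX_MASTER = "spark://spark.datascientistworkbench.com:7077"
PLACEHOLDER = "PATH_OF_NOTEBOOK"


def build_submit_command(master_url: str, notebook_path: str) -> List[str]:
    """Hypothetical helper: build the argument list for submit-notebook.sh.

    Argument order mirrors the cell above: master URL first, then notebook path.
    """
    if not notebook_path or notebook_path == PLACEHOLDER:
        raise ValueError("Replace PATH_OF_NOTEBOOK with a real path (use Insert Path).")
    return ["submit-notebook.sh", master_url, notebook_path]


def submit(master_url: str, notebook_path: str) -> int:
    """Run submit-notebook.sh (assumed to be on PATH) and return its exit code."""
    cmd = build_submit_command(master_url, notebook_path)
    # check=False: return the exit code instead of raising on failure.
    return subprocess.run(cmd, check=False).returncode
```

For example, call `submit(SANDBOX_MASTER, path)`, where `path` is the value produced by *Insert Path*. Passing the arguments as a list, not a single shell string, means paths containing spaces need no extra quoting.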

## Apache Spark cluster on Bluemix

You can also submit to your own [Apache Spark cluster](https://console.ng.bluemix.net/catalog/services/apache-spark) on Bluemix.

To submit your Python notebook to the Spark cluster for execution:

1. Replace the credentials block with the credentials for your own Apache Spark cluster on Bluemix. See the screenshot at the bottom of this notebook.
1. Use the *Insert Path* menu of the notebook to replace `PATH_OF_NOTEBOOK` in the cell below.

In [None]:
# OR: Run this cell to submit to your own Bluemix Spark Service cluster

# 1. Paste in your Spark Service credentials between the triple quotes
# See screenshot below
with open('vcap.json', 'w') as f:
    f.write("""
{
  "credentials": {
    "tenant_id": "",
    "tenant_id_full": "",
    "cluster_master_url": "https://spark.bluemix.net",
    "instance_id": "",
    "tenant_secret": "",
    "plan": ""
  }
}
""")
    
# 2. Replace PATH_OF_NOTEBOOK with the actual path
# See screenshot above
!submit-notebook.sh https://spark.bluemix.net PATH_OF_NOTEBOOK
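
Pasting credentials into a triple-quoted string is easy to get wrong, because a stray comma or a missing quote produces invalid JSON. As an alternative, the following minimal sketch builds the file with Python's `json` module and checks that no credential field was left empty. It assumes the same `vcap.json` layout as the cell above. The helper name `write_vcap` is hypothetical. The field names are copied from the template above, but whether `submit-notebook.sh` actually needs every one of them to be non-empty is an assumption.

```python
import json

# Field names copied from the credentials template in the cell above.
REQUIRED_FIELDS = (
    "tenant_id",
    "tenant_id_full",
    "cluster_master_url",
    "instance_id",
    "tenant_secret",
    "plan",
)


def write_vcap(credentials: dict, path: str = "vcap.json") -> None:
    """Hypothetical helper: validate credentials, then write them as vcap.json."""
    missing = [k for k in REQUIRED_FIELDS if not credentials.get(k)]
    if missing:
        # Fail before touching the file so a half-filled vcap.json is never written.
        raise ValueError("Empty or missing credential fields: " + ", ".join(missing))
    with open(path, "w") as f:
        # json.dump always produces valid JSON with the {"credentials": {...}} nesting.
        json.dump({"credentials": credentials}, f, indent=2)
```

For example, call `write_vcap({...})` with the six values copied from __Service Credentials__. Then run `!submit-notebook.sh https://spark.bluemix.net PATH_OF_NOTEBOOK` as before.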

After execution completes, the output appears in a `stdout_111` file at the top of your *Recent Data* sidebar. You can view the contents by:

1. Selecting `PATH_OF_STDOUT` in the cell below
1. Expanding the twistie (the expand arrow) next to the stdout file in the sidebar
1. Selecting __Insert Path__ to replace `PATH_OF_STDOUT` with the actual file location
1. Running the cell

In [None]:
!cat PATH_OF_STDOUT
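
If several stdout files have accumulated, you can also find the newest one from Python. The following is a minimal sketch. It assumes that the output files share a `stdout_` prefix (as `stdout_111` above does) and sit in a single directory. The directory argument and the helper names `latest_stdout` and `show_stdout` are hypothetical, so point the directory at wherever *Insert Path* shows your file.

```python
import glob
import os
from typing import Optional


def latest_stdout(directory: str, prefix: str = "stdout_") -> Optional[str]:
    """Hypothetical helper: return the most recently modified stdout file, or None."""
    # glob.escape protects directory names that contain *, ? or [.
    pattern = os.path.join(glob.escape(directory), prefix + "*")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def show_stdout(directory: str) -> None:
    """Print the newest stdout file, a Python analogue of `!cat PATH_OF_STDOUT`."""
    path = latest_stdout(directory)
    if path is None:
        print("No stdout files found in", directory)
        return
    with open(path) as f:
        print(f.read())
```

Choosing by modification time, rather than by the number in the file name, avoids guessing how the numeric suffix is assigned.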

You can find your [Apache Spark service](https://console.ng.bluemix.net/catalog/services/apache-spark) credentials under __Service Credentials__ when you open the service in your Bluemix Dashboard.

<img src="https://ibm.app.box.com/representation/file_version_79393329462/image_2048/1.png?shared_name=na3mzdnx6j8by5k4m0ulvn1k4orf22lq" alt="How to access your Bluemix Spark Service credentials">

## Want to learn more?

<a href="http://bigdatauniversity.com/courses/spark-fundamentals/?utm_source=tutorial-flight-demo-2&utm_medium=dswb&utm_campaign=bdu"><img src = "https://ibm.box.com/shared/static/r3pj5oo2ivnzqar0poj2eexiqrnvq6vy.png"> </a>

## Authors

<article class="teacher">
<div class="teacher-image" style="    float: left;
    width: 115px;
    height: 115px;
    margin-right: 10px;
    margin-bottom: 10px;
    border: 1px solid #CCC;
    padding: 3px;
    border-radius: 3px;
    text-align: center;"><img class="alignnone wp-image-2258 " src="https://ibm.box.com/shared/static/s5g531ti3qy3bgrwwynfxuvebe99viu9.jpg" alt="Leo Wu" width="178" height="178" /></div>
<h4>Leo Wu</h4>
<p><a href="https://www.linkedin.com/in/leo-wu-a232313">Leo Wu</a> is an emerging technologies practitioner and evangelist, and he also leads Big Data University in China. He worked on IBM DB2 for Linux, UNIX, and Windows (LUW) for many years. He also participated in the design and development of the first generation of the IBM Smart Analytics System (ISAS), which is widely used by key IBM customers worldwide, such as banks and telecommunications companies. He now works closely with the local team to drive the success of Big Data University in China.</p>
</article>
<article class="teacher">
<div class="teacher-image" style="    float: left;
    width: 115px;
    height: 115px;
    margin-right: 10px;
    margin-bottom: 10px;
    border: 1px solid #CCC;
    padding: 3px;
    border-radius: 3px;
    text-align: center;"><img class="alignnone size-medium wp-image-2177" src="https://ibm.box.com/shared/static/84lm9zlz2lkbco0l570sut710858gsr7.jpg" alt="Leons Petrazickis" width="300" height="300" /></div>
<h4>Leons Petrazickis</h4>
<p>
<a href="https://www.linkedin.com/in/leonsp">Leons Petrazickis</a> is a full-stack developer at the IBM Data Science Institute. He works with Ruby, JavaScript, Python, Hadoop, and Spark, as well as web-scale DevOps with Chef and Docker.</p>
</article>

Created by: <a href="https://bigdatauniversity.com/?utm_source=bducreatedbylink&utm_medium=dswb&utm_campaign=bdu">The Cognitive Class Team</a>