# Creating standalone programs

The memory available to a Jupyter notebook in SageMathCloud is limited (which is why we have to close one notebook before opening another), so larger programs have to be run on a larger machine as standalone programs. We will continue to demonstrate small examples in the notebooks, and you can try out small bits of code in a Jupyter cell, but you should try to create the corresponding <tt>.scala</tt> programs on <tt>iceberg</tt> and elsewhere.

If you are using a Windows machine to access <tt>iceberg.shef.ac.uk</tt> and <tt>sharc.shef.ac.uk</tt>, you'll need to download the portable version of [<tt>mobaXterm</tt>](http://mobaxterm.mobatek.net/download.html). This will provide you with access to your chosen HPC and a way to copy files between your Windows system and the remote HPC.

Once you have logged into a node on <tt>iceberg</tt>, you may need to ask for more memory than the default <tt>qrsh</tt> gives you:

    qrsh -l mem=8G -l rmem=8G
    
You can access <tt>iceberg</tt> from any location, but <tt>sharc</tt> must be accessed from the University network or via VPN. Both <tt>iceberg</tt> and <tt>sharc</tt> run Linux. While you can develop your initial programs on the Windows system in a text editor of your choice, and copy over to the HPC for compiling every time, you may benefit from taking a look at some Linux text editors as the programs get more complicated.

## Differences between Sharc and Iceberg

Iceberg and Sharc are configured slightly differently. For example, the module command for `sbt` on Iceberg is:

```
module load apps/binapps/sbt/0.13.13
```

On Sharc, by contrast, you need to run two module commands:

```
module load dev/sbt/0.13.13
module load apps/java/jdk1.8.0_102/binary
```

The notes here are for Iceberg. The documentation for both HPC systems is at http://docs.hpc.shef.ac.uk/en/latest/

# Using SBT (Scala Build Tool)

1. The organization of your project should be:
```
[user@HPC project]$ find .
.
./project.sbt
./src
./src/main
./src/main/scala
./src/main/scala/Project.scala
```

This means that you need to create your project using `mkdir` (which makes a directory), for example:
```
[user@HPC]$ mkdir Hello
```

Then create the subdirectories:

```
[user@HPC]$ mkdir Hello/src
[user@HPC]$ mkdir Hello/src/main
[user@HPC]$ mkdir Hello/src/main/scala
```

Both `project.sbt` and `Project.scala` are text files (so to start with can be created on your Windows machine and copied over to the HPC).
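The separate `mkdir` calls above can also be collapsed into one. The following is a minimal sketch, assuming a project called `Hello` as in the example; the empty placeholder files are only there so that the layout can be checked before the real contents are copied over or typed in.

```shell
# Create the whole source tree in one step.
# -p creates any missing parent directories and does not complain if they exist.
mkdir -p Hello/src/main/scala

# Create empty placeholders for the two text files (to be filled in later).
touch Hello/project.sbt Hello/src/main/scala/Project.scala

# List the result so it can be compared with the expected project layout.
(cd Hello && find . | sort)
```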

2. `project.sbt` should contain any requirements of your program, such as library dependencies. For example:

```
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "2.0.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.1"
```

## To package a module using sbt on iceberg:

- Start an interactive session on a worker node
```
qrsh
```

- Enter the relevant project

```
[user@HPC]$ cd Hello
```

- Package your module

```
[user@HPC project]$ sbt package
```

sbt should be loaded for you, but if the call fails then run

```
module load apps/binapps/sbt/0.13.13
```

## Running on HPC

### Interactive mode

To run in interactive mode (an interactive session does NOT have all the resources you would like and may be killed, so it should only be used for 'playing' with a program):

- Start an interactive session on a worker node (making sure memory requirements are specified). For example:

```
qrsh -l mem=8G -l rmem=8G
```

- Load Spark

```
module load apps/gcc/4.4.7/spark/2.0
```

- Package your project as above using sbt

- Execute using

```
spark-submit --master local[n] target/scala-2.11/project_2.11-1.0.jar
```

where `n` specifies the number of threads to use.
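The jar path passed to `spark-submit` is not arbitrary: `sbt package` names the jar after the project name, the Scala binary version and the project version. The sketch below rebuilds that path in the shell. It assumes that `project.sbt` also contains a `name` setting (here the hypothetical value `pi-estimation`), which the example `project.sbt` earlier does not show; the variable names are illustrative only.

```shell
# Values that would come from project.sbt (NAME is an assumed setting).
NAME="pi-estimation"
VERSION="1.0"
SCALA_VERSION="2.11.8"

# The binary version is the Scala version without its last component: 2.11.8 -> 2.11
SCALA_BINARY="${SCALA_VERSION%.*}"

# Path of the jar that sbt package writes, relative to the project directory.
JAR="target/scala-${SCALA_BINARY}/${NAME}_${SCALA_BINARY}-${VERSION}.jar"
echo "$JAR"
```

If `spark-submit` reports that the jar cannot be found, comparing this path with the contents of `target/` is a quick first check.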

So the Pi estimation program (from Notebook 3) could be run as follows

```
export CORES=1
spark-submit --master "local[$CORES]" target/scala-2.11/pi-estimation_2.11-1.0.jar 1000000 > out$CORES.1000000.txt
```

You can only use 1 thread in interactive mode.

### Scheduler mode

To submit a job to the scheduler, create a bash script that states what you need from the scheduler (i.e. the memory you want per core and the number of cores you'd like) and then runs your program. For the Pi estimation program, the script can be called `pi.sh` and can look like this (if it is being submitted from the top-level project directory):

```
#!/bin/bash
# Load spark
module load apps/gcc/4.4.7/spark/2.0
# Request 4 cores from the HPC scheduler
#$ -pe openmp 4
# Request 4 GB of virtual and real memory PER CORE
#$ -l rmem=4G
#$ -l mem=4G

# Make sure CORES is equal to the number of openmp slots
export CORES=4
spark-submit --master local\[$CORES\] target/scala-2.11/pi-estimation_2.11-1.0.jar 1000000 > out$CORES.1000000.txt
```
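The script above relies on `CORES` being kept equal to the `-pe openmp` request by hand. A possible way to avoid a mismatch is sketched below. It assumes the scheduler is Grid Engine (as the `#$` directives and `qrsh` suggest), which sets the `NSLOTS` environment variable to the number of slots granted to the job. The fallback to 1 outside the scheduler is my own choice, and the sketch only prints the command rather than running Spark.

```shell
#!/bin/bash
# NSLOTS is set by Grid Engine inside a parallel-environment job.
# Fall back to 1 when it is unset (for example in a plain interactive shell).
CORES="${NSLOTS:-1}"
export CORES

# Quoting keeps the shell from treating the square brackets as a glob pattern.
MASTER="local[${CORES}]"
OUTFILE="out${CORES}.1000000.txt"

# Print the command that would be run; drop the echo to actually run it.
echo "spark-submit --master ${MASTER} target/scala-2.11/pi-estimation_2.11-1.0.jar 1000000 > ${OUTFILE}"
```

A job script such as `pi.sh` is submitted from the project directory with `qsub pi.sh`, the standard Grid Engine submission command.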