# Scaling up compute resources

Scaling up computational resources is a major advantage when running certain large-scale calculations on the OSPool. Consider extensive sampling for a multi-dimensional Monte Carlo integration, or a molecular dynamics simulation started from several initial conditions. These types of calculations require submitting a large number of jobs.

About a million CPU hours per day are available to OSPool users on an opportunistic basis. Learning how to scale up and control large numbers of jobs is key to realizing the full potential of distributed high throughput computing on the OSPool. In this tutorial, we will see how to scale up calculations for a simple example. 

## Background

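For this example, we will use computational methods to estimate π. First, we define a unit circle inscribed in a square and randomly sample points from the square. The fraction of the sampled points that fall inside the circle approaches π/4, so multiplying that fraction by 4 gives an estimate of π. (The script below samples only the quadrant where both coordinates lie between 0 and 1, which gives the same ratio.)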
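
The sampling argument can be written out explicitly. As a sketch, let $N$ be the number of sampled points and $N_{\text{in}}$ the number that land inside the circle of radius $r$ (these symbols are introduced here for illustration and do not appear in the script):

$$
\frac{N_{\text{in}}}{N} \;\longrightarrow\; \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4},
\qquad
\hat{\pi} = \frac{4\,N_{\text{in}}}{N}.
$$

Each point lands inside the circle with probability $p = \pi/4$, so, assuming independent samples, the standard error of the estimate is

$$
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}}.
$$

Under this assumption, each additional correct decimal digit requires roughly 100 times more samples.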

This method converges extremely slowly, which makes it great for a CPU-intensive exercise (but bad for a real estimation!).
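
The slow convergence can be seen with a quick shell sketch that repeats the same sampling at increasing trial counts. It uses `awk` rather than R purely so that it runs without a container; `estimate_pi` is a hypothetical helper and is not part of the tutorial's files.

```shell
# Hypothetical helper: estimate pi from `trials` random points in the unit square.
# Usage: estimate_pi TRIALS [SEED]
estimate_pi() {
  trials=$1
  seed=${2:-1}
  awk -v trials="$trials" -v seed="$seed" 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < trials; i++) {
      x = rand(); y = rand()
      # Count the point if it falls inside the quarter circle of radius 1
      if (x * x + y * y < 1) inside++
    }
    printf "%.6f\n", 4 * inside / trials
  }'
}

# Compare estimates as the number of trials grows
for n in 100 10000 1000000; do
  printf '%8d trials: ' "$n"
  estimate_pi "$n"
done
```

The exact values depend on the `awk` implementation and the seed, so no output is shown here.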

## Create and Test an R Script

Our code is a simple R script that does the estimation. It takes a single argument, which differentiates the jobs by setting the number of trials. The script is contained in the file `mcpi.R`.

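```
$ cat mcpi.R
#!/usr/bin/env Rscript

# Read the job's argument and use it to set the number of trials
args = commandArgs(trailingOnly = TRUE)
iternum = as.numeric(args[[1]]) + 100

# Estimate pi from the fraction of random points that land inside the unit circle
montecarloPi <- function(trials) {
  count = 0
  for(i in 1:trials) {
    # Draw a point (x, y) in the unit square; count it if x^2 + y^2 < 1
    if((runif(1,0,1)^2 + runif(1,0,1)^2)<1) {
      count = count + 1
    }
  }
  return((count*4)/trials)
}

montecarloPi(iternum)
```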


The shebang line at the top of the file (`#!/usr/bin/env Rscript`) indicates that this script is meant to be run using R.

If we were running a more intensive script, we would want to test our pipeline with a shortened test script first.

> If you want to test the script, open a separate terminal window and run the following
> two commands to start an R container and then run
> the script inside it (the shebang line invokes `Rscript`):
> 
> ```
> $ singularity shell \
>     /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-r:3.5.0
> Singularity osgvo-r:3.5.0:~> ./mcpi.R 10
> [1] 3.14
> Singularity osgvo-r:3.5.0:~> exit
> $ 
> ```



## Create a Submit File and Log Directory

Now that we have our R script written and tested, we can begin building the submit file for our job. If we want to submit several jobs, we need to track log, output, and error files for each job. An easy way to do this is to use the Cluster and Process ID values assigned by HTCondor to create unique files for each job in our overall workflow.

In this example, the submit file is called `R.container.submit`. 

```
$ cat R.container.submit
universe = container
container_image = /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-r:3.5.0

executable = mcpi.R
arguments = $(Process)

#transfer_input_files = 
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

log = logs/job.log.$(Cluster).$(Process)
error = logs/job.error.$(Cluster).$(Process)
output = output/mcpi.out.$(Cluster).$(Process)

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue 100
```


There are several items to note about this submit file:

* The `queue 100` statement in the submit file. This tells HTCondor to enqueue 100 copies of this job as one cluster, with Process IDs running from 0 through 99.
* The submit variables `$(Cluster)` and `$(Process)`. These are used to specify unique log, error, and output files. HTCondor replaces them with the Cluster and Process ID numbers of each individual process within the submission. `$(Process)` is also passed as an argument to our R script.
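
Because `log`, `error`, and `output` point into `logs/` and `output/`, those directories need to exist in the submit directory before the jobs are submitted; HTCondor generally does not create them for you. A minimal shell sketch that creates them and previews the file names the first few processes would produce (the cluster ID `1` is a placeholder, since the real value is assigned at submission):

```shell
# Create the directories named in the submit file (no error if they already exist)
mkdir -p logs output

# Preview how $(Cluster) and $(Process) expand into file names
cluster=1  # placeholder; HTCondor assigns the real cluster ID
for process in 0 1 2; do
  echo "logs/job.log.${cluster}.${process}  output/mcpi.out.${cluster}.${process}"
done
```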


## Submit the Jobs

Now it is time to submit our jobs! Submit them with the following command:

```
$ condor_submit R.container.submit
Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 1.
```


Apply your `condor_q` knowledge to see the progress of these jobs. Check your `logs` folder to see the error and HTCondor log files, and check the `output` folder to see the results of the scripts.
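
Alongside `condor_q`, one way to follow progress from the shell is to count the result files that already contain data and to list any error files that are not empty. This is a sketch using standard `find` and `wc`; the function names are hypothetical helpers, and the directory and file names are taken from the submit file above.

```shell
# Count non-empty result files (default directory: output)
count_outputs() {
  find "${1:-output}" -type f -name 'mcpi.out.*' -size +0 2>/dev/null | wc -l | tr -d ' '
}

# List non-empty error files (default directory: logs); these may point to a problem
nonempty_errors() {
  find "${1:-logs}" -type f -name 'job.error.*' -size +0 2>/dev/null
}

echo "outputs with results: $(count_outputs)"
if [ -d logs ]; then
  nonempty_errors
fi
```

A non-empty error file is not always a failure, since warnings are also written there, so it is worth reading the file before resubmitting anything.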

## Post Process

Once the jobs are completed, you can use the information in the output files 
to calculate an average of all of our computed estimates of π.

To see this, we can use the command: 

```
$ cat output/mcpi.out* | awk '{ sum += $2; print $2"   "NR} END { print "---------------\n Grand Average = " sum/NR }'
3.12   1
3.287129   2
3.418182   3
3.135135   4
3.035714   5
3.079646   6
3.052632   7
3.095652   8
3.482759   9
2.837607   10
3.254237   11
3.12605   12
3.019608   13
3.233333   14
3.140496   15
3.147541   16
3.154472   17
3.096774   18
3.008   19
3.238095   20
2.992126   21
3.25   22
3.162791   23
3.145631   24
2.953846   25
3.29771   26
2.969697   27
3.368421   28
3.313433   29
2.874074   30
3.235294   31
3.270073   32
3.130435   33
2.964029   34
3.384615   35
3.228571   36
3.29078   37
3.098592   38
2.909091   39
3.138889   40
3.089655   41
3.287671   42
3.129252   43
3.081081   44
3.355705   45
3.238095   46
3.173333   47
3.072848   48
3.368421   49
3.294118   50
3.142857   51
3.2   52
2.923077   53
3.133758   54
2.936709   55
3.144654   56
3.320755   57
3.125   58
3.229814   59
2.91358   60
3.116564   61
3.02439   62
3.030303   63
3.180723   64
3.209581   65
3.071429   66
3.12426   67
2.990654   68
3.223529   69
2.900585   70
3.069767   71
3.236994   72
3.011494   73
3.291429   7
```
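
The same pass over the output files can also report the spread of the estimates, which gives a sense of how far the grand average may be from π. A sketch, assuming each output file holds a single line in R's `[1] value` format as above; `summarize_pi` is a hypothetical helper name, and the average is unweighted even though the jobs use different trial counts.

```shell
# Hypothetical helper: mean, sample standard deviation, and standard error of the estimates
summarize_pi() {
  cat "$@" | awk '
    $1 == "[1]" { n++; sum += $2; sumsq += $2 * $2 }
    END {
      if (n < 2) { print "need at least two estimates"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                        # guard against rounding error
      sd = sqrt(var)
      printf "n = %d  mean = %.6f  sd = %.6f  stderr = %.6f\n", n, mean, sd, sd / sqrt(n)
    }'
}

# Usage: summarize_pi output/mcpi.out.*
```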

## Key Points

* Scaling up the number of jobs is crucial for taking full advantage of the computational resources of the OSPool.
* Changing the number in the `queue` statement scales the number of jobs submitted.
* The `arguments` option can be used to pass parameters to a job script.
* The submit variables `$(Cluster)` and `$(Process)` can be used to name log, error, and output files uniquely.