# Working with Python on the ACCRE Cluster

## Overview

1. Python Basics
1. Anaconda and Virtual Environments
1. Python practical examples
    1. vectorization
    1. multi-threading
    1. distributed memory computing

## Intro to Python

1. High-level (can be slow)
1. Duck typing
1. Assumes the programmer / user knows what she/he is doing!

### Duck typing

In [1]:
# Comments begin with hashes
a = 41
b = 1
a + b

42

In [5]:
a = "41"
b = "1"
a + b

'411'
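
The two cells above show the same `+` doing different things depending on the operands. As a minimal sketch of what that means for your own functions (the name `total` is made up for this example), a function written once works for any type that supports `+`, and a mismatch only shows up at run time:

```python
def total(items, start):
    """Add up items with +; works for any type that defines addition."""
    result = start
    for item in items:
        result = result + item
    return result

print(total([41, 1], 0))        # integers are added
print(total(["41", "1"], ""))   # strings are concatenated
print(total([[4], [1]], []))    # lists are concatenated

try:
    total([41, "1"], 0)         # mixing int and str fails only when executed
except TypeError as err:
    print("TypeError:", err)
```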

### Indentation and punctuation matter

In [6]:
a = 42
if a == 42: 
    print("meaning of life")
elif a == 24:
    print("life of meaning")
else:
    print("just another number")

meaning of life


## For loops exist but are not very *Pythonic*

-> Use comprehensions instead

In [7]:
# Prefer comprehensions
list0 = [i for i in range(5)]

# to for loops
list1 = []
for i in range(5):
    list1.append(i)
    
print(list0)
print(list1)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


## Comprehensions and Types

In [3]:
print(type( (i for i in range(10)) ))
print(type( [i for i in range(10)] ))
print(type( {i for i in range(10)} ))
print(type( {i: "some_value" for i in range(10)} ))

<class 'generator'>
<class 'list'>
<class 'set'>
<class 'dict'>


## Comprehensions can be filtered

In [8]:
# filter a list
print([i for i in range(10) if i % 3 != 0 ])

# nonzero ints evaluate to true
print([i for i in range(10) if i % 3])

[1, 2, 4, 5, 7, 8]
[1, 2, 4, 5, 7, 8]


## Dictionaries

The type `dict`:
1. supports very fast lookup
1. does not guarantee key order before Python 3.7 (from 3.7 on, insertion order is kept)
1. keys are unique

In [9]:
animals = {"wolverine":100, "badger": 99}
print("The variable animals is of type %s:" % type(animals))
print("The original dict:", animals)
print()

The variable animals is of type <class 'dict'>:
The original dict: {'badger': 99, 'wolverine': 100}



In [13]:
try:
    animals["puma"]
except KeyError:
    print("Couldn't get puma, but I can " \
          "specify a default value",
         animals.get("puma", 50))

Couldn't get puma, but I can specify a default value 50


In [14]:
animals.update({"puma": 85})
print("The updated dict:", animals, "\n")

The updated dict: {'badger': 99, 'wolverine': 100, 'puma': 85} 



In [15]:
print("I can iterate over dicts as key, value pairs:")
for k, v in animals.items():
    print("The %s has ferocity %d" % (k, v))

I can iterate over dicts as key, value pairs:
The badger has ferocity 99
The wolverine has ferocity 100
The puma has ferocity 85
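
When the iteration order matters, one option is to sort the items explicitly rather than rely on the order the dict happens to return. A minimal sketch, reusing the `animals` values from above (`by_name` and `by_ferocity` are names chosen for this example):

```python
animals = {"wolverine": 100, "badger": 99, "puma": 85}

# sorted() returns a new list of (key, value) pairs; the dict itself is unchanged
by_name = sorted(animals.items())
by_ferocity = sorted(animals.items(), key=lambda kv: kv[1], reverse=True)

for name, ferocity in by_ferocity:
    print("The %s has ferocity %d" % (name, ferocity))
```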


## `set`s and `dict`s are unique

In [16]:
foo = [i % 3 for i in range(10)]
print(foo)
print(set(foo))

[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
{0, 1, 2}


In [12]:
bar = dict(a=1, b=2, c=3)
bar.update({"c":5})
print(bar)

{'c': 5, 'b': 2, 'a': 1}


## A note on Python2 v. Python3

1. If you have the choice, use Python3
1. Python2 code can be transformed to Python3 code before integration into a project
1. Gotchas: 
    * `print something` v. `print(something)`
    * integer division v. floating point division
    * etc.
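
The division gotcha is easy to miss because the same expression runs without error in both versions. A small sketch of the Python 3 behavior (variable names are illustrative); in Python 2, `7 / 2` would give `3` unless the file starts with `from __future__ import division`:

```python
true_quotient = 7 / 2      # true division in Python 3
floor_quotient = 7 // 2    # floor division, same result in Python 2 and 3
negative_floor = -7 // 2   # floor division rounds toward negative infinity

print(true_quotient, floor_quotient, negative_floor)
```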

## Anaconda

### What is it?
1. free
1. package manager -> import code from others (don't reinvent the wheel)
1. environment manager -> isolate projects and their **dependencies**
1. Python distribution
1. collection of over 720 open source packages with free community support
1. install packages (and their dependencies) with `conda install [packagename]`
1. platform-agnostic (Windows, OS X and Linux)

### How do I use it on the cluster?

1. Initial Setup
    1. Set the anaconda ACCRE package 
    1. Create a conda virtual environment and source it
    1. Install additional dependencies via `conda` (preferred) or `pip`
1. Using with SLURM
    1. Set the anaconda ACCRE package
    1. Source the (existing) conda environment
    1. Execute Python code
    
    
Note that an Anaconda3 installation can also create environments that run Python2.

## Conda Environment Initial Setup

Note that `%%sh` is a Jupyter notebook *magic* command that invokes a shell environment. Normally you would run these commands from a terminal session logged into the cluster.

In [1]:
%%sh
conda -V
conda info -e

conda 4.2.11
# conda environments:
#
ipyparallel_env          /home/arnoldjr/.conda/envs/ipyparallel_env
lasagne_env              /home/arnoldjr/.conda/envs/lasagne_env
mpi4py_env               /home/arnoldjr/.conda/envs/mpi4py_env
multi-gpu                /home/arnoldjr/.conda/envs/multi-gpu
my_root                  /home/arnoldjr/.conda/envs/my_root
myenvironment            /home/arnoldjr/.conda/envs/myenvironment
nb_conda_kernels         /home/arnoldjr/.conda/envs/nb_conda_kernels
neural_nets              /home/arnoldjr/.conda/envs/neural_nets
pycuda                   /home/arnoldjr/.conda/envs/pycuda
tensorflow               /home/arnoldjr/.conda/envs/tensorflow
test_github              /home/arnoldjr/.conda/envs/test_github
theano_env               /home/arnoldjr/.conda/envs/theano_env
theano_nomkl             /home/arnoldjr/.conda/envs/theano_nomkl
threading_env            /home/arnoldjr/.conda/envs/threading_env
root                  *  /usr/local/python3/anaconda3



## Conda environments must be activated

Use `source activate <conda_env>`

```
[fido@vmps11 ~]$ setpkgs -a anaconda3
[fido@vmps11 ~]$ source activate intro_to_python_env
prepending /home/fido/.conda/envs/intro_to_python_env/bin to PATH
(intro_to_python_env) [fido@vmps11 ~]$ 
```


## List installed packages

In [2]:
%%sh
conda list -n ipyparallel_env

# packages in environment at /home/arnoldjr/.conda/envs/ipyparallel_env:
#
decorator                 4.0.10                    <pip>
ipykernel                 4.5.0                     <pip>
ipyparallel               5.2.0                     <pip>
ipython                   5.1.0                     <pip>
ipython-genutils          0.1.0                     <pip>
jupyter-client            4.4.0                     <pip>
jupyter-core              4.2.0                     <pip>
mkl                       11.3.3                        0  
mpmath                    0.19                     py35_1  
numpy                     1.11.2                   py35_0  
openssl                   1.0.2j                        0  
pexpect                   4.2.1                     <pip>
pickleshare               0.7.4                     <pip>
pip                       8.1.2                    py35_0  
prompt-toolkit            1.0.8                     <pip>
ptyprocess                0.5.1              

## Search for available packages

In [3]:
%%sh
conda search beautifulsoup

Fetching package metadata .......
beautifulsoup4               4.4.0                    py27_0  defaults        
                             4.4.0                    py34_0  defaults        
                             4.4.0                    py35_0  defaults        
                             4.4.1                    py27_0  defaults        
                             4.4.1                    py34_0  defaults        
                          *  4.4.1                    py35_0  defaults        
                          .  4.5.1                    py27_0  defaults        
                          .  4.5.1                    py34_0  defaults        
                          .  4.5.1                    py35_0  defaults        


## Install a package (into a *specific* environment)

In [4]:
%%sh
conda install beautifulsoup4 -n ipyparallel_env

Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/arnoldjr/.conda/envs/ipyparallel_env:

The following NEW packages will be INSTALLED:

    beautifulsoup4: 4.5.1-py35_0

Proceed ([y]/n)? 
Linking packages ...
[      COMPLETE      ]|#####################################################| 100%


## Conda environments should be deactivated

```bash
(intro_to_python_env) [fido@vmps11 python-job]$ source deactivate
[fido@vmps11 python-job]$
```

## Using conda with SLURM

Conda environments are created under `/home`, which is shared across the cluster. This means that an environment only has to be created once and can then be used from any node.

*Avoid creating the environment within a SLURM script!!*

## A typical SLURM batch script
```bash
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=500M

setpkgs -a anaconda3

source activate intro_to_python_env

python foo.py
```
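
The contents of `foo.py` are not shown here. As a hypothetical illustration, a script started by SLURM can read its allocation from environment variables such as `SLURM_JOB_ID` and `SLURM_NTASKS`. The function name `slurm_settings` and the fallback values are assumptions made for this sketch, so that the script also runs outside a job:

```python
import os

def slurm_settings(environ=os.environ):
    """Collect a few SLURM variables, falling back to defaults outside a job."""
    return {
        "job_id": environ.get("SLURM_JOB_ID", "not-in-a-job"),
        "ntasks": int(environ.get("SLURM_NTASKS", 1)),
        # Only set by SLURM when --cpus-per-task is requested
        "cpus_per_task": int(environ.get("SLURM_CPUS_PER_TASK", 1)),
    }

if __name__ == "__main__":
    for key, value in slurm_settings().items():
        print("%s = %s" % (key, value))
```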

## A typical Python script


### Import modules
```python
#!/usr/bin/env python

from random import random

s = "This string belongs to the global namespace, " \
    "but using globals is discouraged"

```

### Define functions, classes, etc.
``` python
def sub_fun(threshold=0.5):
    """ Prints a string determined by the input

    :param threshold: "Hello" is printed when a random draw exceeds this value; default=0.5
    :return: None
    """

    if random() > threshold:
        print("Hello")
    else:
        print("World!")

def main_fun():
    """ The primary function of the program """
    sub_fun()
```

### Define the standalone behavior
```python
if __name__ == "__main__":
    """ __name__ == "__main__" is true if the program
    is executing as a standalone """
    main_fun()

```
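
Assembled into a single file, the three parts above might look like the sketch below. Returning the printed word from `sub_fun` is an addition made here so the behavior can be checked; it is not part of the fragments above.

```python
#!/usr/bin/env python
from random import random

def sub_fun(threshold=0.5):
    """Print and return "Hello" if a random draw exceeds threshold, else "World!"."""
    word = "Hello" if random() > threshold else "World!"
    print(word)
    return word

def main_fun():
    """The primary function of the program."""
    sub_fun()

# Runs only when executed directly, not when imported as a module
if __name__ == "__main__":
    main_fun()
```

Because `random()` returns values in `[0, 1)`, a threshold of `1.0` always prints "World!" and a negative threshold always prints "Hello".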

## Jupyter (formerly IPython) notebooks

### What are they?
* Working coding document
* Integrates Python code and markdown
* Tool for presenting your work:
    1. HTML/JavaScript interactive notebook
    1. statically saved as HTML, PDF, Reveal.js slideshow (Hint: use `jupyter nbconvert`)
    1. Renders on GitHub
* What this presentation is written in!

### How do I use them?
1. Download Anaconda locally, then `jupyter notebook` from the command line
1. Use them on the cluster
    1. Include the ACCRE Anaconda package 
        * `setpkgs -a anaconda3`
    1. Use either `sbatch` or `salloc` to launch a job
    1. Launch a notebook
        * `jupyter notebook --no-browser --ip='*' --port=8888 [my_notebook.ipynb]`
    1. Make note of the node you land on, and `ssh` into it using a separate process
        * `ssh -L 9999:vmpXXX:8888 vunetid@login.accre.vanderbilt.edu`
    1. In a web browser, navigate to `localhost:9999`

## IPython magic

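Specific to IPython, primarily `jupyter console` and `jupyter notebook`. Line magics start with `%` and apply to a single line; cell magics start with `%%` and apply to the whole cell.

* `%%sh` - run shell commands
* `%%html` - render html
* `%load` - load from file
* `%save` - save lines to file
* `%timeit` / `%%timeit` - execute code and record execution time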

In [1]:
%%timeit

a = []
for i in range(100):
    a.append(i*i)

100000 loops, best of 3: 19.3 µs per loop


In [18]:
%timeit a = [i*i for i in range(100)]

100000 loops, best of 3: 9.28 µs per loop


## `timeit` module

`timeit` can also be used in Python scripts.

In [33]:
from timeit import timeit


N = 100000
loop_string = '''\
s = 0
for i in range(100):
    s += 1
'''
print("native       : %fs" % timeit(loop_string, number=N))

print("comprehension: %fs" % timeit('sum(i for i in range(100))', number=N) )

print("numpy:         %fs" % timeit('numpy.arange(0,100).sum()', number=N, setup="import numpy") )

native       : 0.540301s
comprehension: 0.701319s
numpy:         0.800476s
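
`timeit` also accepts callables, which avoids writing code inside strings. The sketch below uses `timeit.repeat` and keeps the minimum of several runs, since the minimum is usually the least affected by other activity on the machine. The function names and repetition counts are arbitrary choices for illustration:

```python
from timeit import repeat

def loop_sum():
    s = 0
    for i in range(100):
        s += i
    return s

def builtin_sum():
    return sum(range(100))

# repeat() performs the whole timing 5 times; each timing calls the function 10000 times
for label, func in [("loop", loop_sum), ("builtin", builtin_sum)]:
    best = min(repeat(func, number=10000, repeat=5))
    print("%-8s %fs" % (label, best))
```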


## Numpy

1. Analogous to Matlab
1. Supports **vectorization** using compiled C code (can be much faster)

*Allows fast development with optimization of the bottlenecks*

In [30]:
import numpy as np

a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [27]:
%timeit a = [i * i for i in range(100)]
%timeit b = np.power(np.arange(100), 2)

assert([ i * i for i in range(100)] == np.power(np.arange(100), 2).tolist())

100000 loops, best of 3: 9.44 µs per loop
The slowest run took 11.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.94 µs per loop
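
Vectorization is not limited to arithmetic. As a brief sketch, the filtered comprehension shown earlier has a NumPy counterpart in the boolean mask, which selects elements without an explicit Python loop (variable names are illustrative):

```python
import numpy as np

a = np.arange(10)

squares = a * a                      # elementwise multiply, no Python loop
not_multiple_of_3 = a[a % 3 != 0]    # boolean mask, like a filtered comprehension
total = squares.sum()                # reduction runs in compiled code

print(squares)
print(not_multiple_of_3)
print(total)
```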


## Multi-threaded jobs

* Execute on single node across multiple cores
* Memory is shared across all threads
      #SBATCH --nodes=1
      #SBATCH --tasks-per-node=1
      #SBATCH --cpus-per-task=8
* Supported by the `threading` module (also `multiprocessing`)

In [12]:
import threading
import numpy as np
import timeit

In [10]:
def worker(result, index, num_elems):
    """ thread worker function 

    Thread functions do not return values, so instead one or more of the
    input arguments must be modified.

    :param result: mutable array of floats
    :param index: integer index of the result array
    :param num_elems: number of elements to compute in this thread
    :return None
    """

    xy = np.random.rand(num_elems,2)  
    result[index] = 4 * np.mean(xy[:,0] ** 2 + xy[:,1] ** 2 < 1.0)

In [13]:
num_threads = 32  
n_per_thread = 1000000

def compute_multi():
    r_par = np.empty(num_threads)
    threads = []  # local list, so repeated calls do not accumulate old threads

    # Starts the worker threads
    for i in range(num_threads):
        t = threading.Thread(target=worker, 
                             args=(r_par, i, n_per_thread))
        threads.append(t)
        t.start()

    # Joins every worker thread to the "master" thread
    for t in threads:
        t.join()

    return r_par.mean()

In [14]:
def compute_single():
    r_seq = np.empty(1) 
    worker(r_seq, 0, num_threads * n_per_thread)

    return r_seq[0]        

In [15]:
n_repetitions = 10
t_multi = timeit.timeit(compute_multi, number=n_repetitions)
t_single = timeit.timeit(compute_single, number=n_repetitions)

print("Multi-threaded:  %fs" % (t_multi / n_repetitions))
print("Single-threaded: %fs" % (t_single / n_repetitions))

print("\nSpeedup: %f" % (t_single / t_multi))

Multi-threaded:  1.608150s
Single-threaded: 1.490850s

Speedup: 0.927059
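
A speedup below 1 means the threads did not help here. One plausible explanation, not verified by profiling in this notebook, is contention: all workers draw from NumPy's shared global random state, and Python's global interpreter lock limits how much threads can overlap. The sketch below is one variation to try, not a guaranteed fix: each task gets its own generator through `np.random.default_rng`, which requires a newer NumPy (1.17 or later) than the version listed earlier. The names `count_hits` and `estimate_pi` and the sizes are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def count_hits(seed, num_points):
    """Count random points in the unit square that fall inside the unit circle."""
    rng = np.random.default_rng(seed)   # private generator, not shared between threads
    xy = rng.random((num_points, 2))
    return int(np.count_nonzero(np.sum(xy ** 2, axis=1) < 1.0))

def estimate_pi(num_threads=4, n_per_thread=100000):
    # The executor starts the threads and waits for all of them on exit
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(count_hits, seed, n_per_thread)
                   for seed in range(num_threads)]
        hits = sum(f.result() for f in futures)
    return 4.0 * hits / (num_threads * n_per_thread)

print(estimate_pi())
```

Unlike a bare `threading.Thread`, a future returns the worker's value directly, so no shared result array is needed. Whether this runs faster than the single-threaded version should be measured on the target node.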


## Distributed Memory Jobs

* Execute on multiple cores across one or more nodes
* Memory is specific to each task
        #SBATCH --ntasks=8
        #SBATCH --cpus-per-task=1
* Supported by the `ipyparallel` module


## A note on Project Setup

Loading ACCRE packages and building or activating the conda environment can be split into separate tasks, each handled by its own file.

In [2]:
%%sh
cd ~/accre/Python/ipyparallel/
ls

batch_job.slurm
compute_pi.py
foo15.44.36.out
foo15.44.43.out
helpers.py
__init__.py
launcher.sh
Makefile
pkgs.sh
__pycache__
README.md
slurm-10910565.out
source_file.sh


* `pkgs.sh`: lists the shell commands for loading ACCRE packages

In [4]:
%%sh
cd ~/accre/Python/ipyparallel/
cat pkgs.sh

setpkgs -a anaconda3


* `Makefile`: provides rules for building the conda environment, installing additional packages, testing the install, and cleaning up the environment

In [6]:
%%sh 
cd ~/accre/Python/ipyparallel/
cat Makefile

SHELL := /bin/bash

# Creates the new conda environment
env: 
	conda create -n $(MY_CONDA_ENV) python=3.5 numpy sympy ;

# Installs any pip only packages
install:
	pip install ipyparallel

# Tests that the install proceeded correctly
test:
	python -c "import ipyparallel";

# Removes the created environment
clean:
	conda remove -n $(MY_CONDA_ENV) --all;


* `source_file.sh`: sources pkgs.sh and either creates or activates conda environment

In [7]:
%%sh
cd ~/accre/Python/ipyparallel/
cat source_file.sh

#!/bin/bash

# This file either loads or creates an appropriate conda environment 


# Checks for more than one argument and throws an error
if [ $# -gt 1 ]; then
    (>&2 echo "Error: did not expect more than one argument.")
    (>&2 echo "    (Got $@)")
    return 1
fi

# Sets a default value for the environment name if not present
if [ -z "$1" ]; then
    MY_NAME="ipyparallel_env"
else
    MY_NAME="$1"
fi

# Loads necessary ACCRE packages
source pkgs.sh

# Checks if environment name is valid and, if so, exports name
if [[ "$MY_NAME" =~ ^[0-9A-Za-z_]+$ ]]; then
    export MY_CONDA_ENV=$MY_NAME ;
else
    echo "Invalid name $MY_NAME";
    return 1
fi

# If the conda environment exists, then activates the environment
# else creates the new conda environment with the Makefile
if conda env list | grep -q "$MY_CONDA_ENV"; then
    echo "Found existing conda environment $MY_CONDA_ENV" 
    source activate $MY_CONDA_ENV 
else
    echo "Creating conda environment $MY_CONDA_ENV";
    make e

## Usage

Using `source source_file.sh [my_env_name]`
* First time, creates *my_env_name* or default
* Subsequent times, activates the conda environment

For example:
```
[fido@vmps12 ipyparallel]$ source source_file.sh 
Found existing conda environment ipyparallel_env
(ipyparallel_env) [fido@vmps12 ipyparallel]$ 
```

## Distributed Memory Jobs

* Execute on multiple cores across one or more nodes
* Memory is specific to each task
        #SBATCH --ntasks=8
        #SBATCH --cpus-per-task=1
* Supported by the `ipyparallel` module

* `batch_job.slurm`: specifies the SLURM configuration and the application

In [8]:
%%sh
cd ~/accre/Python/ipyparallel/
cat batch_job.slurm

#!/bin/bash

#SBATCH --ntasks=14
#SBATCH --time=0-03:00:00
#SBATCH --mem=10G

# Sources the appropriate packages and conda environments
source source_file.sh

# Sets the profile and starts communication processes 
source launcher.sh
echo Using profile $profile

# Creates output filename including timestamp
outfile=pi_estimate$(date +%Y%m%d_%H%M%S).txt
echo Using output file $outfile

# Runs the application
python compute_pi.py --profile ${profile} -n 1e12 -o $outfile 


* `launcher.sh`: shell script to be sourced which
    1. creates new ipython profile specific to this job
    1. launches a *controller* process for coordinating communication
    1. launches the compute engines on each available task

In [9]:
%%sh
cd ~/accre/Python/ipyparallel/
cat launcher.sh

#!/bin/bash

profile=job_${SLURM_JOB_ID}_$(hostname)

echo "Creating profile ${profile}"
ipython profile create ${profile}

echo "Launching controller"
ipcontroller --ip="*" --profile=${profile} & 
sleep 10

echo "Launching engines"
srun ipengine --profile=${profile} --location=$(hostname) &
sleep 25 


* `compute_pi.py`

```python
import argparse
from ipyparallel import Client
import numpy as np
import sympy
from helpers import stopwatch, to_numeric
import os

PI = 3.141592653589793
```

```python

def worker_fun_1(n=1000):
  """ worker function """
  from random import random
  s = 0
  for i in range(n):
    if random() ** 2 + random() ** 2 <= 1:
      s += 1
  
  return s 
```
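
The reasoning behind the worker, stated briefly: each draw is a point $(x, y)$ chosen uniformly in the unit square, and the worker counts how many points land inside the quarter of the unit circle that lies in that square. With $s$ the returned count and $n$ the number of draws:

$$
P\left(x^2 + y^2 \le 1\right) = \frac{\text{area of quarter circle}}{\text{area of unit square}} = \frac{\pi}{4},
\qquad
\pi \approx 4\,\frac{s}{n}
$$

The estimate improves slowly, with an error that shrinks roughly like $1/\sqrt{n}$, which is why the batch job requests so many iterations.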

```python
def worker_fun_2(n=1000):
  """ worker function """
  import numpy as np
 
  chunksize = 1000000
  # divmod avoids a negative remainder (and oversampling) when n < chunksize
  num_chunks, slop = divmod(n, chunksize)
  chunks = [chunksize] * num_chunks
  if slop > 0:
    chunks.append(slop)

  s = 0 
  for chunk in chunks:
    s += int(np.sum(np.sum(np.square(np.random.rand(chunk, 2)), axis=1) < 1.))
  
  return s 

```

```python
def main(profile, ntasks, niter, filename):
  rc = Client(profile=profile)
  views = rc[:]

  n = round(niter / ntasks)

  results = views.apply_sync(worker_fun_2, n)
  # Uses sympy to compute the ratio to arbitrary precision
  my_pi = 4. * sympy.Rational(sum(results), (n * ntasks)).n(20)

  with open(filename, "w") as f:
    f.write("Estimate of pi: %0.16f\n" % my_pi)
    f.write("Actual pi:      %0.16f\n" % PI) 
    f.write("Percent error:  %0.16f\n" % np.abs(100. * (PI - my_pi) / PI))
```

```python
if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.add_argument("-p", 
                      "--profile", 
                      type=str, 
                      required=True,
                      help="Name of IPython profile to use")
  parser.add_argument("-n", 
                      "--niter", 
                      type=str, 
                      required=True,
                      help="Number of stochastic iterations")
  parser.add_argument("-o", 
                      "--output", 
                      type=str, 
                      required=True,
                      help="Name of output file for writing")

  args = parser.parse_args()

  # The output filename is passed through so main() can write the results
  main(args.profile,
      to_numeric(os.environ['SLURM_NTASKS']),
      to_numeric(args.niter),
      args.output)
```
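
`helpers.py` is not shown in this notebook. Since the batch script passes `-n 1e12` as a string, `to_numeric` presumably turns strings such as `"14"` or `"1e12"` into numbers. The version below is a hypothetical stand-in written for illustration, not the repository's implementation:

```python
def to_numeric(text):
    """Convert a string such as "14" or "1e12" to an int when whole, else a float.

    Hypothetical stand-in for the helper imported from helpers.py.
    """
    try:
        return int(text)
    except ValueError:
        # int() rejects scientific notation, so fall back to float
        value = float(text)
        return int(value) if value.is_integer() else value

print(to_numeric("14"), to_numeric("1e12"), to_numeric("0.5"))
```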

## Some results


## github.com/accre/Python

![ACCRE Python github repo](images/ACCRE_Python_github.png)
