# Working with Python on the ACCRE Cluster

## Overview

1. Intro to Python
    1. types
    1. control flow
    1. syntactic sugar
1. Intro to Anaconda
    1. creating virtual environments
1. Python practical examples
    1. vectorization
    1. threading
    1. ipyparallel
    1. plotting
    1. pandas

## Intro to Python

1. High-level (can be slow)
1. Duck typing
1. Assumes the programmer / user knows what she/he is doing!
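To make "duck typing" concrete, here is a minimal sketch. The classes `Duck` and `Robot` and the function `make_it_speak` are hypothetical names invented for this illustration. The point is that Python checks for the method when it is called, not for a declared type.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No isinstance check: any object with a speak() method works.
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
print(sounds)

# An object without speak() only fails when the call is attempted.
try:
    make_it_speak(42)
except AttributeError as err:
    print("int cannot speak:", err)
```

This is also what "assumes the programmer knows what they are doing" means in practice: the mistake in the last call is reported at run time, not before.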

In [1]:
# Comments begin with hashes
a = 41
b = 1
a + b

42

In [2]:
# Indentation and punctuation matter
a = 42
if a == 42: 
    print("meaning of life")
elif a == 24:
    print("life of meaning")
else:
    print("just another number")

meaning of life


## For loops exist but are not very *pythonic*

-> When a loop only builds a collection, use a comprehension instead

In [3]:
# Prefer
list0 = [i for i in range(5)]
# to
list1 = []
for i in range(5):
    list1.append(i)
    
print(list0)
print(list1)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


The first form is an example of a comprehension (in this case, a list comprehension).

## Comprehensions and Types

In [3]:
print(type( (i for i in range(10)) ))
print(type( [i for i in range(10)] ))
print(type( {i for i in range(10)} ))
print(type( {i: "some_value" for i in range(10)} ))

<class 'generator'>
<class 'list'>
<class 'set'>
<class 'dict'>
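The first of the four types above behaves differently from the rest: a generator expression is lazy and can be consumed only once. A small sketch of that difference, using illustrative variable names:

```python
import sys

squares_list = [i * i for i in range(1000)]   # built immediately, all in memory
squares_gen = (i * i for i in range(1000))    # nothing computed yet

# The list stores 1000 references; the generator is a small fixed-size object.
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)
print("list larger than generator:", list_is_bigger)

# Values are produced one at a time as they are requested.
first = next(squares_gen)
second = next(squares_gen)
total_rest = sum(squares_gen)   # consumes the remaining 998 values

# A generator can only be walked once; it is now exhausted.
leftover = list(squares_gen)
print(first, second, total_rest, leftover)
```

Prefer the generator form when the values are only needed once, for example as the argument to `sum`.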


## Comprehensions can be filtered

In [8]:
# filter a list
print([i for i in range(10) if i % 3 != 0 ])

# nonzero ints evaluate to true
print([i for i in range(10) if i % 3])

[1, 2, 4, 5, 7, 8]
[1, 2, 4, 5, 7, 8]
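A common point of confusion is where the `if` goes. A trailing `if` filters items out, while a conditional expression before the `for` transforms every item. A short sketch with illustrative names:

```python
numbers = range(10)

# Trailing "if": a filter, so the result may be shorter than the input.
evens = [n for n in numbers if n % 2 == 0]

# Conditional expression up front: a transform, same length as the input.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

# Filters work the same way in dict and set comprehensions.
odd_squares = {n: n * n for n in numbers if n % 2}

print(evens)
print(labels)
print(odd_squares)
```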


## Dictionaries

The type `dict`:
1. supports very fast lookup
1. does not preserve order (true for the Python 3.5 used here; since Python 3.7, insertion order is guaranteed)
1. keys are unique

In [5]:
# Dictionaries

animals = {"wolverine":100, "badger": 99}
print("The variable animals is of type %s:" % type(animals))
print("The original dict:", animals)
print()

try:
    animals["puma"]
except KeyError:
    # A missing key raises KeyError; catch only that, not every exception
    print("Couldn't get puma")
print("I can specify a default value with <dict>.get:", animals.get("puma", 50))
print()

animals.update({"puma": 85})
print("The updated dict:", animals, "\n")

print("I can iterate over dicts as key, value pairs:")
for k, v in animals.items():
    print("The %s has ferocity %d" % (k, v))

The variable animals is of type <class 'dict'>:
The original dict: {'badger': 99, 'wolverine': 100}

Couldn't get puma
I can specify a default value with <dict>.get: 50

The updated dict: {'puma': 85, 'badger': 99, 'wolverine': 100} 

I can iterate over dicts as key, value pairs:
The puma has ferocity 85
The badger has ferocity 99
The wolverine has ferocity 100


## `set` elements and `dict` keys are unique

In [12]:
foo = [i % 3 for i in range(10)]
print(foo)
print(set(foo))

bar = dict(a=1, b=2, c=3)
bar.update({"c":5})
print(bar)

[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
{0, 1, 2}
{'c': 5, 'b': 2, 'a': 1}
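Uniqueness is useful for deduplicating and counting. The sketch below uses a made-up list called `readings`. Note that the order-preserving variant relies on `dict` keeping insertion order, which is only guaranteed from Python 3.7 onward.

```python
readings = [3, 1, 3, 2, 1, 3]

# A set drops duplicates but makes no promise about order.
unique_unordered = set(readings)

# dict keys are unique too; on Python 3.7+ they keep first-seen order.
unique_in_order = list(dict.fromkeys(readings))

# Counting with a dict: each key appears once, its value accumulates.
counts = {}
for r in readings:
    counts[r] = counts.get(r, 0) + 1

print(unique_unordered)
print(unique_in_order)
print(counts)
```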


## A note on Python2 v Python3

1. If you have the choice, use Python3
1. Python2 code can be transformed to Python3 code before integration into a project
1. Gotchas: 
    * `print something` (Python2) v. `print(something)` (Python3)
    * `/` on two integers: integer division in Python2 v. floating point division in Python3 (use `//` for integer division)
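The division gotcha is easy to check directly. The sketch below shows Python3 behaviour; the variable names are illustrative only.

```python
true_quotient = 7 / 2        # floating point division in Python3
floor_quotient = 7 // 2      # integer (floor) division
negative_floor = -7 // 2     # floors toward negative infinity, not toward zero
whole, remainder = divmod(7, 2)

print(true_quotient, floor_quotient, negative_floor, whole, remainder)
```

Code that must run under both versions can put `from __future__ import division, print_function` at the top of the file to get the Python3 behaviour in Python2.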

## Anaconda
### What is it?
1. free
1. package manager -> import code from others (don't reinvent the wheel)
1. environment manager -> isolate projects and their **dependencies**
1. Python distribution
1. collection of over 720 open source packages with free community support
1. install packages (and their dependencies) with `conda install [packagename]`
1. platform-agnostic (Windows, OS X and Linux)

## Anaconda
### How do I use it on the cluster?

1. Initial Setup
    1. Set the anaconda ACCRE package 
    1. Create a conda virtual environment and source it
    1. Install additional dependencies via `conda` (preferred) or `pip`
1. Using with SLURM
    1. Set the anaconda ACCRE package
    1. Source the (existing) conda environment
    1. Execute Python code
    
    
Note that Anaconda3 can also create environments that run Python2.

## Conda Environment Initial Setup

Note that `%%sh` is a Jupyter notebook *magic* command that invokes a shell environment. On the cluster, run the commands below from a terminal session logged in as usual.

```
[fido@vmps11 python-job]$ setpkgs -a anaconda3
[fido@vmps11 python-job]$ conda -V
conda 4.1.5
[fido@vmps11 python-job]$ conda info -e
# conda environments:
#
foo                      /home/fido/.conda/envs/foo
intro_to_python_env      /home/fido/.conda/envs/intro_to_python_env
ipyparallel_env          /home/fido/.conda/envs/ipyparallel_env
lasagne_env              /home/fido/.conda/envs/lasagne_env
mpi4py_env               /home/fido/.conda/envs/mpi4py_env
multi-gpu                /home/fido/.conda/envs/multi-gpu
my_root                  /home/fido/.conda/envs/my_root
myenvironment            /home/fido/.conda/envs/myenvironment
nb_conda_kernels         /home/fido/.conda/envs/nb_conda_kernels
neural_nets              /home/fido/.conda/envs/neural_nets
pycuda                   /home/fido/.conda/envs/pycuda
tensorflow               /home/fido/.conda/envs/tensorflow
test_github              /home/fido/.conda/envs/test_github
theano_env               /home/fido/.conda/envs/theano_env
theano_nomkl             /home/fido/.conda/envs/theano_nomkl
threading_env            /home/fido/.conda/envs/threading_env
root                  *  /usr/local/python3/anaconda3
```

## Conda environments must be activated

Use `source activate <conda_env>`

```
[fido@vmps11 python-job]$ setpkgs -a anaconda3
[fido@vmps11 python-job]$ source activate intro_to_python_env
prepending /home/fido/.conda/envs/intro_to_python_env/bin to PATH
(intro_to_python_env) [fido@vmps11 python-job]$ 
```

To deactivate the environment:

```bash
(intro_to_python_env) [fido@vmps11 python-job]$ source deactivate
[fido@vmps11 python-job]$
```

## List installed packages

```
(intro_to_python_env) [fido@vmps11 python-job]$ conda list
# packages in environment at /home/fido/.conda/envs/intro_to_python_env:
#
_nb_ext_conf              0.3.0                    py35_0
anaconda-client           1.5.2                    py35_0
clyent                    1.2.2                    py35_0
dbus                      1.10.10                       0
decorator                 4.0.10                   py35_0
entrypoints               0.2.2                    py35_0
expat                     2.1.0                         0
fontconfig                2.11.1                        6
freetype                  2.5.5                         1
glib                      2.43.0                        1
gst-plugins-base          1.8.0                         0
gstreamer                 1.8.0                         0
icu                       54.1                          0
ipykernel                 4.5.0                    py35_0
ipython                   5.1.0                    py35_0
simplegeneric             0.8.1                    py35_1
sip                       4.18                     py35_0
six                       1.10.0                   py35_0
sqlite                    3.13.0                        0
terminado                 0.6                      py35_0
tk                        8.5.18                        0
tornado                   4.4.2                    py35_0
traitlets                 4.3.1                    py35_0
wcwidth                   0.1.7                    py35_0
wheel                     0.29.0                   py35_0
widgetsnbextension        1.2.6                    py35_0
xz                        5.2.2                         0
yaml                      0.1.6                         0
zeromq                    4.1.5                         0
zlib                      1.2.8                         3
(intro_to_python_env) [fido@vmps11 python-job]$ 
```

## Search for available packages

```
(intro_to_python_env) [fido@vmps11 python-job]$ conda search beautifulsoup
Fetching package metadata .......
beautifulsoup4               4.4.0                    py27_0  defaults        
                             4.4.0                    py34_0  defaults        
                             4.4.0                    py35_0  defaults        
                             4.4.1                    py27_0  defaults        
                             4.4.1                    py34_0  defaults        
                          .  4.4.1                    py35_0  defaults        
                          .  4.5.1                    py27_0  defaults        
                             4.5.1                    py34_0  defaults        
                             4.5.1                    py35_0  defaults        
(intro_to_python_env) [fido@vmps11 python-job]$ 
```

## Install a package (into the *current* environment)

```
(intro_to_python_env) [fido@vmps11 python-job]$ conda install beautifulsoup4
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/fido/.conda/envs/intro_to_python_env:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.5.1       |           py35_0         125 KB

The following NEW packages will be INSTALLED:

    beautifulsoup4: 4.5.1-py35_0

Proceed ([y]/n)? y

Fetching packages ...
beautifulsoup4 100% |####################################| Time: 0:00:00   1.19 MB/s
Extracting packages ...
[      COMPLETE      ]|#######################################################| 100%
Linking packages ...
[      COMPLETE      ]|#######################################################| 100%
(intro_to_python_env) [fido@vmps11 python-job]$ 
```

## Install a package (into a *specific* environment)

```
(intro_to_python_env) [fido@vmps11 python-job]$ conda install beautifulsoup4 -n foo 
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /home/fido/.conda/envs/foo:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.5.1       |           py34_0         125 KB
    
The following NEW packages will be INSTALLED:

    beautifulsoup4: 4.5.1-py34_0
    
Proceed ([y]/n)? y

Fetching packages ...
beautifulsoup4 100% |####################################| Time: 0:00:00   1.16 MB/s
Extracting packages ...
[      COMPLETE      ]|#######################################################| 100%
Linking packages ... 
[      COMPLETE      ]|#######################################################| 100%
(intro_to_python_env) [fido@vmps11 python-job]$ 
```

## Conda environments should be deactivated

```bash
(intro_to_python_env) [fido@vmps11 python-job]$ source deactivate
[fido@vmps11 python-job]$
```

## Using conda with SLURM

Conda environments live under `/home`, which is shared across the cluster, so an environment only has to be created once and can then be used from any node.

*Avoid creating the environment within a SLURM script!!*

### A typical SLURM batch script
```bash
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=500M

setpkgs -a anaconda3

source activate intro_to_python_env

python vectorization.py
```
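When a job misbehaves, a first step is to confirm which interpreter and environment it actually ran with. The script below is a hypothetical helper (call it `check_env.py`), not part of the ACCRE repository. It assumes that conda sets `CONDA_DEFAULT_ENV` on activation and that SLURM sets `SLURM_JOB_ID` inside a job; both fall back to a placeholder when absent.

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Collect a few facts identifying the interpreter and the job."""
    return {
        "python_executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # Assumed to be set by conda when an environment is activated.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "<none>"),
        # Assumed to be set by SLURM inside a job; absent on a login node.
        "slurm_job_id": environ.get("SLURM_JOB_ID", "<none>"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%-18s %s" % (key, value))
```

Running it as the last line of the batch script in place of `vectorization.py` should report an executable path inside the activated environment's `bin` directory.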

## IPython magic

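Specific to IPython, primarily `jupyter console` and `jupyter notebook`. Magics prefixed with `%%` apply to a whole cell; those prefixed with `%` apply to a single line.

* `%%sh` - run shell commands
* `%%html` - render html
* `%load` - load code from a file
* `%save` - save lines to a file
* `%%timeit` (or `%timeit` for a single line) - execute code and record execution time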

In [17]:
%%timeit

a = []
for i in range(100):
    a.append(i*i)

100000 loops, best of 3: 13.8 µs per loop


In [18]:
%timeit a = [i*i for i in range(100)]

100000 loops, best of 3: 9.28 µs per loop


## Jupyter (formerly IPython) notebooks

### What are they?
* Working coding document
* Integrates Python code and markdown
* Tool for presenting your work:
    1. HTML/JavaScript interactive notebook
    1. statically saved as HTML, PDF, Reveal.js slideshow (Hint: use `jupyter nbconvert`)
    1. Renders on GitHub
* What this presentation is written in!

### How do I use them?
1. Download Anaconda locally, then `jupyter notebook` from the command line
1. Use them on the cluster
    1. Include the ACCRE Anaconda package 
        * `setpkgs -a anaconda3`
    1. Use either `sbatch` or `salloc` to launch a job
    1. Launch a notebook
        * `jupyter notebook --no-browser --ip='*' --port=8888 [my_notebook.ipynb]`
    1. Make note of the node you land on, and forward a local port to it with `ssh` from a separate terminal
        * `ssh -L 9999:vmpXXX:8888 vunetid@login.accre.vanderbilt.edu`
    1. In a web browser, navigate to `localhost:9999`

## `timeit` module

`timeit` can also be used in Python scripts.

In [33]:
from timeit import timeit


N = 100000
loop_string = '''\
s = 0
for i in range(100):
    s += i
'''
print("native       : %fs" % timeit(loop_string, number=N))

print("comprehension: %fs" % timeit('sum(i for i in range(100))', number=N) )

print("numpy:         %fs" % timeit('numpy.arange(0,100).sum()', number=N, setup="import numpy") )

native       : 0.540301s
comprehension: 0.701319s
numpy:         0.800476s
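`timeit` also accepts callables instead of strings, which avoids quoting code and lets you check that the variants agree before timing them. The sketch below is one way to set that up, with hypothetical function names. It uses `timeit.repeat` and keeps the minimum of several runs, which is usually the least noisy figure.

```python
from timeit import repeat


def loop_sum():
    s = 0
    for i in range(100):
        s += i
    return s


def generator_sum():
    return sum(i for i in range(100))


def builtin_sum():
    return sum(range(100))


# Make sure all variants compute the same thing before comparing speed.
assert loop_sum() == generator_sum() == builtin_sum()

N = 10000
timings = {}
for func in (loop_sum, generator_sum, builtin_sum):
    # repeat() returns one total per run; take the fastest run.
    timings[func.__name__] = min(repeat(func, number=N, repeat=3))

for name, seconds in timings.items():
    print("%-14s: %fs" % (name, seconds))
```

Timings depend on the machine and the Python version, so treat the numbers as relative, not absolute.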


## Numpy

1. Analogous to Matlab
1. Supports **vectorization** using compiled C code (can be much faster)

*Allows fast development with optimization of the bottlenecks*

In [30]:
import numpy as np

a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [27]:
%timeit a = [i * i for i in range(100)]
%timeit b = np.power(np.arange(100), 2)

assert [i * i for i in range(100)] == np.power(np.arange(100), 2).tolist()

100000 loops, best of 3: 9.44 µs per loop
The slowest run took 11.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.94 µs per loop


## github.com/accre/Python

![ACCRE Python github repo](ACCRE_Python_github.png)