#  Python for Data Analysis and HPC
## PHIT P8010 Fundamentals of High Performance Computing
Daniel Bauer (bauer@cs.columbia.edu)

February 2, 2016

# Part 3: Python and HPC

## Faster Interpreters
- Standard Python (CPython) is often much slower than Java or C for loop-heavy code (though numpy and other packages built on compiled code are reasonably fast).
- Several alternative implementations and compilers try to make Python run faster:

   * **[Cython](http://docs.cython.org/src/tutorial/cython_tutorial.html)**  
    -translates your Python code into C, which is then compiled and in theory might run faster

   *  **[PyPy](http://pypy.org/)**  
     -uses a JIT (just-in-time) compiler, as Java does  
     -in theory can do better than a static compiler  
     -uses a garbage collector instead of reference counting  
     -"Numpy support is not complete"  
     -easy to use: just run your code as usual, with no preprocessing step as in Cython  
     -widely used in practice  
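
The speed gap between interpreted loops and compiled code can be observed with the standard library alone. The sketch below is an illustration, not part of the lecture material: the function names are made up, and it compares an explicit Python loop with the built-in `sum`, whose loop runs in C in standard CPython. Timings depend on the machine and interpreter, so no expected numbers are shown.

```python
import timeit

def python_loop_sum(n):
    """Sum 0..n-1 with an explicit loop executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(n))

n = 200000
t_loop = timeit.timeit(lambda: python_loop_sum(n), number=5)
t_builtin = timeit.timeit(lambda: builtin_sum(n), number=5)
print('explicit loop: %.3f s' % t_loop)
print('built-in sum:  %.3f s' % t_builtin)
```

Running the same file under a different interpreter such as PyPy is one way to see how much of the gap a JIT can close.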


## Parallelization

* Idea behind a cluster: faster computing through parallelization.
* Different levels of independence: 
    * Just run multiple different programs in parallel: easy (PBS script).
    * Run the same program on different splits of the data: easy (PBS script).
    * Different parallel processes must exchange data: hard!
        * Needs synchronization.
* Network bottleneck: Transferring data is expensive, computation is cheap. 
* Use local resources as much as possible before you parallelize. 
    * Prefer local SMP (symmetric multiprocessing, i.e. several cores on one machine) over cluster-level parallelization.
* Two models for parallelization: 
    * Shared memory
    * Message passing (we will focus on this). 
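
The "same program on different splits of the data" case from the list above can be sketched in a few lines. This is a minimal illustration with made-up names (`split_into_chunks`, `count_words`) and toy data; the splits are processed one after another here, but because no split depends on another, each call could equally run as a separate job.

```python
from collections import Counter

def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    chunk_size, remainder = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra element.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def count_words(lines):
    """Hypothetical per-split job: count the words in a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

lines = ['a b a', 'b c', 'a c c', 'b']
chunks = split_into_chunks(lines, 2)
partial = [count_words(chunk) for chunk in chunks]  # independent per split
total = sum(partial, Counter())                     # cheap combine step
print(total)
```

Only the small partial results are combined at the end, which keeps data transfer low compared with the amount of computation per split.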
    
    

## Multithreading

The typical approach in most programming languages is *multithreading*.

Python offers: 
- a simple thread system (the `threading` module)
- threads that run under one process and share memory
- some concurrent data structures and primitives, such as queues, locks, and semaphores

In [5]:
import time
import threading

def counter(n):
    for j in range(n):
        print('count is %d' % j)
        time.sleep(1)

t = threading.Thread(target=counter, args=(3,))
t.start()

count is 0
count is 1
count is 2
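
Because threads share memory, concurrent updates to the same variable should be protected by a lock. The following is a minimal sketch using `threading.Lock`; the names `counter`, `counter_lock`, and `add_many` are chosen for illustration only.

```python
import threading

counter = 0                       # shared by all threads in this process
counter_lock = threading.Lock()

def add_many(n):
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with counter_lock:        # only one thread may update at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                      # wait for all threads to finish

print(counter)                    # 40000
```

Without the lock, the read-modify-write in `counter += 1` is not guaranteed to be atomic, so updates from different threads could be lost.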


* **Problem:** The core of Python is NOT concurrent.
* The Global Interpreter Lock (GIL) can only be acquired by ONE thread at a time.
* No matter how many threads you have, only ONE of them executes Python code at any moment, so CPU-bound code effectively uses only ONE core.
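
The GIL matters for threads that compute. Threads that mostly wait are a different case: blocking calls such as `time.sleep` release the GIL while they wait, so the waits of several threads can overlap. The sketch below uses hypothetical helper names (`wait_for_io`, `run_in_threads`) and uses sleeping as a stand-in for I/O.

```python
import threading
import time

def wait_for_io(seconds):
    """Stand-in for an I/O-bound task; sleeping releases the GIL."""
    time.sleep(seconds)

def run_in_threads(n_threads, seconds):
    """Run n_threads waiting tasks concurrently; return elapsed wall time."""
    threads = [threading.Thread(target=wait_for_io, args=(seconds,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

elapsed = run_in_threads(4, 0.5)
print('4 threads, 0.5 s each: %.2f s total' % elapsed)
```

The total should be close to 0.5 s rather than 2 s, since the four waits happen at the same time. A CPU-bound function in place of `wait_for_io` would not show this benefit.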

## multiprocessing

* The multiprocessing module offers a thread-like API, but spawns multiple processes, each with its own interpreter and its own memory.
* The operating system can then schedule different processes on different CPU cores.

In [8]:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,)) 
    p.start()
    print("This happens in parallel.")
    print(q.get())    # blocks. Prints "[42, None, 'hello']"
    p.join()

This happens in parallel.
[42, None, 'hello']
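
The same pattern extends to several workers that each send one message back to the parent. This is a sketch under assumptions: the `worker` function, the worker ids, and the toy `splits` are invented for illustration.

```python
from multiprocessing import Process, Queue

def worker(worker_id, numbers, q):
    """Compute a partial sum and send it back to the parent as a message."""
    q.put((worker_id, sum(numbers)))

if __name__ == '__main__':
    splits = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    q = Queue()
    procs = [Process(target=worker, args=(i, split, q))
             for i, split in enumerate(splits)]
    for p in procs:
        p.start()
    # Collect one message per worker; arrival order is not guaranteed.
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(sorted(results.items()))   # [(0, 6), (1, 15), (2, 24)]
```

Tagging each message with the worker id lets the parent match results to splits even though the messages may arrive in any order. The queue is drained before `join()` so that no worker is left blocked on an unread message.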


In [7]:
from multiprocessing import Pool

def square(x):
    return x*x

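# make a pool of 5 worker processes
# each square call runs in one of these separate Python processes
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(square, [1, 2, 3]))
    p.close()   # no more work will be submitted
    p.join()    # wait for the workers to exit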

[1, 4, 9]
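
For larger inputs, it is often cheaper to send each worker a whole chunk of data than one element at a time, since every call involves transferring data between processes. A minimal sketch, assuming made-up helper names (`make_chunks`, `sum_of_squares`) and an arbitrary chunk size:

```python
from multiprocessing import Pool

def sum_of_squares(chunk):
    """Work done by one worker process on its split of the data."""
    return sum(x * x for x in chunk)

def make_chunks(data, size):
    """Cut data into consecutive pieces of at most `size` elements."""
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == '__main__':
    data = list(range(1000))
    chunks = make_chunks(data, 250)          # 4 splits of 250 numbers
    pool = Pool(4)
    try:
        partial_sums = pool.map(sum_of_squares, chunks)
    finally:
        pool.close()
        pool.join()
    print(sum(partial_sums))                 # 332833500
```

The worker function is defined at module level so that the worker processes can find it by name, and the pool is created under the `__main__` guard so that the file can be imported safely by the workers.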
