## The following topics are proposed to be covered:

1. #### Python for scientific computing: an introduction to Numpy
    1. Vectorization: how to achieve higher performance in iterated operations. Numpy is a precompiled, optimized package for Python whose core is written in C. Vectorizing code pushes the loops into this precompiled layer, which gives performance close to that of hand-written C.
    2. Using Numba and Cython to write *typed* code which will run faster, and again, be more or less equivalent to the performance of C.
    
1. #### Pandas in Python for dataset analysis and visualization
    1. Why Pandas?
    2. How to read in data, perform preliminary operations to index it correctly
    3. How to clean the data, remove null values etc.
    3. Perform basic dataset operations
    3. Visualization of data
    
1. #### Jupyter notebooks - To promote reproducibility and collaboration in Science
    1. What are *Jupyter Notebooks*?
    1. How can one create and share an entire scientific workflow so that it is easy to understand and easy to reproduce?

## Let us begin with Numpy

In [7]:
import numpy as np

pArr = [i for i in range(10000)]
pArr[1] = 1.

In [9]:
print type(pArr), type(pArr[0]), type(pArr[1]), type(pArr[2])

<type 'list'> <type 'int'> <type 'float'> <type 'int'>


We have created a list in Python called *pArr*. Python uses dynamic typing, as seen above: a single list can hold both ints and floats. Whenever Python code runs, the interpreter checks the type of each element before operating on it. This imposes an overhead on every element whenever an operation is applied across a list.
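
To make the contrast concrete, here is a minimal sketch (written so that it also runs under Python 3; the names `mixed` and `homogeneous` are illustrative and not part of the notebook). It shows that a list carries a type per element, whereas a Numpy array has one dtype for the whole buffer.

```python
import numpy as np

# A Python list holds references to objects, each carrying its own type.
mixed = [i for i in range(5)]
mixed[1] = 1.0
list_types = [type(x).__name__ for x in mixed]

# A Numpy array stores raw values with a single dtype shared by all elements.
homogeneous = np.array(mixed)

print(list_types)          # ['int', 'float', 'int', 'int', 'int']
print(homogeneous.dtype)   # float64
```

Because the array's dtype is known up front, Numpy can run its loop in compiled code without a per-element type check.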

In [29]:
%%timeit -q
for i in range(10000):
    pArr[i] = pArr[i] + 1.

1000 loops, best of 3: 679 µs per loop


Now let's create a Numpy array

In [30]:
nArr = np.arange(10000)

In [31]:
nArr

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [38]:
print type(nArr), type(nArr[0])

<type 'numpy.ndarray'> <type 'numpy.int64'>


In [33]:
%%timeit -q
nArr[:] = nArr[:] + 1.

The slowest run took 7.89 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 18.2 µs per loop
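
The assignment timed above mixes an integer array with a float. The sketch below, on a 5-element array for readability, illustrates what that implies: the addition produces a temporary float64 array, and the slice assignment casts it back to the integer dtype of `nArr`. The name `tmp` is illustrative only.

```python
import numpy as np

nArr = np.arange(5)     # integer dtype (int64 on most platforms)
tmp = nArr + 1.         # int array + Python float -> new float64 array
nArr[:] = tmp           # slice assignment casts the floats back to the integer dtype

print(tmp.dtype)        # float64
print(nArr.tolist())    # [1, 2, 3, 4, 5]
```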


In [44]:
679/18.2

37.30769230769231

We created an integer array called `nArr` and added the float 1.0 to every element. The code runs about 37 times faster than the list loop. Can we make it better by ensuring that the type of the variables being operated on remains the same?

In [45]:
npfloatArr = np.linspace(0.,10000.,1.)

In [46]:
print type(npfloatArr), type(npfloatArr[0])

<type 'numpy.ndarray'> <type 'numpy.float64'>


In [43]:
%%timeit -q
npfloatArr[:] = npfloatArr[:] + 1.

The slowest run took 31.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 816 ns per loop


In [47]:
679/.816

832.1078431372549

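**Keeping the type consistent (all float64) avoids the cast back to int64 on assignment, but the 832 times figure overstates the gain: the third argument of `np.linspace` is the number of samples, so `np.linspace(0.,10000.,1.)` creates an array with a single element, and the 816 ns timing is for that one element rather than for 10000. `np.arange(0.,10000.,1.)` would create the intended array.**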
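
`np.linspace` and `np.arange` interpret their third argument differently, which matters for the cell that builds `npfloatArr`. A short sketch of the difference follows; the names `single` and `stepped` are illustrative, and the sample count is passed as the integer `1` because recent Numpy releases require an integer there.

```python
import numpy as np

single = np.linspace(0., 10000., 1)    # third argument = number of samples -> one element
stepped = np.arange(0., 10000., 1.)    # third argument = step size -> 10000 elements

print(single.shape, stepped.shape)     # (1,) (10000,)
print(stepped.dtype)                   # float64
```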

In [144]:
pArr = [i for i in range(1000000)]

In [105]:
import numba  # needed for the @numba.jit decorators used in later cells
from numba import double
from numba.decorators import jit, autojit

In [109]:
def myloop_python():
    for i in range(1000000):
        pArr[i] = pArr[i] + 1

In [112]:
myloop_numba = autojit(myloop_python)
%timeit myloop_numba()

1 loop, best of 3: 112 ms per loop


In [113]:
%timeit myloop_python()

10 loops, best of 3: 72.7 ms per loop


In [114]:
%%timeit
@numba.jit
def myloop_python():
    for i in range(1000000):
        pArr[i] = pArr[i] + 1

10000 loops, best of 3: 87.4 µs per loop


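These numbers need careful reading. The function compiled with `autojit` took 112 ms per call, which is slower than the 72.7 ms of the plain Python loop. The 87.4 µs figure comes from a `%%timeit` cell that wraps the function *definition*, so it measures the cost of applying the decorator rather than a call to the loop. What is going on here? Most likely Numba cannot infer efficient types for `pArr`, a global Python list, and falls back to treating its elements as generic Python objects. Numba does best on loops over Numpy arrays that are passed in as arguments.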
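
As a sketch of the pattern Numba is designed for, the loop below works on a Numpy array passed in as an argument rather than on a global list. The function name `add_one` and the array `data` are illustrative. The `try`/`except` fallback to plain Python is an assumption added only so that the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit          # compiles the function in nopython mode
except ImportError:                 # assumption: fall back to plain Python if Numba is absent
    def njit(func):
        return func

@njit
def add_one(arr):
    # explicit loop over a typed array received as an argument
    for i in range(arr.shape[0]):
        arr[i] = arr[i] + 1.0

data = np.zeros(1000)
add_one(data)    # first call triggers compilation, so do not time this one
add_one(data)    # later calls run the compiled loop
```

In a notebook, running `%timeit add_one(data)` after the first call measures the compiled loop alone, without the compilation cost.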

In [86]:
import cython
%load_ext Cython

In [100]:
%%timeit
%%cython
cdef myCloop():
    cdef int i=0
    cdef int cArr[1000000]
    for i in range(1000000):
        cArr[i] = 1 + 2

The slowest run took 1459.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 90.5 µs per loop


#### How does Numpy compare with Cython and Numba at this array size?

In [101]:
%%timeit
nArr = np.arange(1000000)
nArr[:] = 1 + 2

1000 loops, best of 3: 1.55 ms per loop


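Taken at face value, 90.5 µs for Cython against 1.55 ms for Numpy suggests that Cython wins once the array grows to 1,000,000 elements. The two cells do not measure the same thing, though: the Numpy cell times both the allocation by `np.arange` and the assignment, while the Cython cell times the (cached) compilation of a `cdef` function that is never called. A fair comparison would call a compiled function on an existing array. For most applications, such as the usual linear algebra operations, Scipy already provides routines that wrap compiled BLAS and LAPACK libraries, so their performance on large data is comparable to C.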
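
One way to keep such comparisons like-for-like is to build the data first and time only calls to a function. The sketch below does this with the standard `timeit` module; the function names, the array size and the repeat counts are illustrative choices, and no timings are quoted because they depend on the machine.

```python
import timeit
import numpy as np

n = 100000
py_list = [float(i) for i in range(n)]
np_arr = np.arange(n, dtype=np.float64)

def loop_add(values):
    # element-by-element update in the interpreter
    for i in range(len(values)):
        values[i] = values[i] + 1.

def vector_add(values):
    # single call into Numpy's compiled loop; updates the array in place
    values += 1.

# best of 3 repeats, 10 calls each; both functions receive pre-built data
t_loop = min(timeit.repeat(lambda: loop_add(py_list), number=10, repeat=3)) / 10
t_vec = min(timeit.repeat(lambda: vector_add(np_arr), number=10, repeat=3)) / 10
print("list loop: %.3e s per call, numpy: %.3e s per call" % (t_loop, t_vec))
```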

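### Let's look at an example: Laplace's equation (Poisson's equation with a zero source term)

$$\frac{\partial^2 \theta}{\partial x^2} + \frac{\partial^2 \theta}{\partial y^2} = 0 $$

with the boundary condition
$\theta_{x=0} = 1$
and $\theta = 0$ on the remaining boundaries, as implied by the zero-initialized array in the code.

The discretized form of the equation is: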
$$\frac{\theta_{(i-1),j} - 2\theta_{i,j} + \theta_{(i+1),j}}{\Delta x^2} + \frac{\theta_{i,(j-1)} - 2\theta_{i,j} + \theta_{i,(j+1)}}{\Delta y^2} = 0 $$
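
Assuming a uniform grid with $\Delta x = \Delta y$, which is what the update in the code implicitly uses, the discretized equation can be solved for $\theta_{i,j}$:

$$\theta_{i,j} = \frac{1}{4}\left(\theta_{(i-1),j} + \theta_{(i+1),j} + \theta_{i,(j-1)} + \theta_{i,(j+1)}\right)$$

Each sweep over the grid therefore replaces every interior value with the average of its four neighbours.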

Writing the same in the traditional way with loops would produce the following function:

In [166]:
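def compute_Poissons_looped(N=10000):
    theta = np.zeros((N,N))
    dx = 1./float(N)  # grid spacing; not used in the update below

    theta[:,0] = 1.  # boundary value on the first column
    counter = 0
    while(counter < 10):
        for i in range(1,N-1,1):
            for j in range(1,N-1, 1):
                # in-place update: neighbours already updated in this sweep are reused
                theta[i,j] = (theta[i,j+1] + theta[i,j-1] + theta[i+1,j] + theta[i-1,j]) * 0.25
            # counter advances once per row i, so the while loop ends after a single sweep
            counter+=1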
            

In [167]:
import time
start = time.time()
compute_Poissons_looped(N=10000)
print "total time elapsed = "+str(time.time() - start)

total time elapsed = 70.1255040169


In [168]:
def compute_Poissons_vectorized(N=10000):
    theta = np.zeros((N,N))
    dx = 1./float(N)
    theta[:,0] = 1.
    counter = 0
    while(counter<10):
        theta[1:-1,1:-1] = (theta[1:-1, 2:] + theta[1:-1, :-2] + theta[2:, 1:-1] + theta[:-2, 1:-1] ) * 0.25
        counter+=1

In [169]:
start = time.time()
compute_Poissons_vectorized()
print "total time elapsed = "+str(time.time() - start)

total time elapsed = 6.71180987358
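
The looped and vectorized functions above do not perform exactly the same update. The loop overwrites `theta` as it goes and reuses values already updated in the current sweep (a Gauss-Seidel-style update), while the slice expression evaluates the whole right-hand side before assigning (a Jacobi-style update). The sketch below uses a small 6 by 6 grid and illustrative function names. It shows that the vectorized sweep matches a loop that reads from a snapshot of the previous values.

```python
import numpy as np

def sweep_vectorized(theta):
    # right-hand side is evaluated in full before assignment (Jacobi-style update)
    theta[1:-1, 1:-1] = (theta[1:-1, 2:] + theta[1:-1, :-2]
                         + theta[2:, 1:-1] + theta[:-2, 1:-1]) * 0.25

def sweep_looped_from_copy(theta):
    # loop version that reads from a snapshot so it matches the vectorized update
    old = theta.copy()
    n_rows, n_cols = theta.shape
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            theta[i, j] = (old[i, j + 1] + old[i, j - 1]
                           + old[i + 1, j] + old[i - 1, j]) * 0.25

a = np.zeros((6, 6))
a[:, 0] = 1.
b = a.copy()
for _ in range(10):
    sweep_vectorized(a)
    sweep_looped_from_copy(b)

print(np.allclose(a, b))   # True
```

Both schemes converge to the same solution of the discretized equation, but their intermediate iterates differ, so results after a fixed number of sweeps should not be expected to agree between the in-place loop and the vectorized version.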


In [170]:
70.125/6.711

10.449262405006705

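The recorded times give a ratio of about 10 between the pure Python and vectorized versions. The true gap is larger: in `compute_Poissons_looped`, `counter` is incremented inside the `i` loop, so the function stops after a single sweep of the grid, while the vectorized function performs 10 sweeps. Per sweep, the speedup is therefore closer to 100 times. Let's try out the pure Python code with a Numba jit call: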

In [172]:
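@numba.jit
def compute_Poissons_numba(N=10000):
    theta = np.zeros((N,N))
    dx = 1./float(N)  # grid spacing; not used in the update below

    theta[:,0] = 1.  # boundary value on the first column
    counter = 0
    while(counter < 10):
        for i in range(1,N-1,1):
            for j in range(1,N-1, 1):
                theta[i,j] = (theta[i,j+1] + theta[i,j-1] + theta[i+1,j] + theta[i-1,j]) * 0.25
            # same loop structure as compute_Poissons_looped: counter advances once per row i,
            # so the while loop ends after a single sweep
            counter+=1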

In [173]:
start = time.time()
compute_Poissons_numba()
print "total time elapsed = "+str(time.time() - start)

total time elapsed = 0.705707788467


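Numba brings the looped code down to about 0.7 s, a 100 times speedup over pure Python (70.1/0.706 ≈ 99), and that figure includes the one-off compilation on the first call. Since the Numba function has the same loop structure and also performs a single sweep, this is on par with the roughly 0.67 s per sweep of the vectorized version. Cython usually shows performance similar to Numba.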