# Numpy

## Introduction to numpy arrays

* The numpy module is the essential workhorse of scientific computing in python
* The numpy module provides access to numpy multidimensional arrays
* Lists in python are very flexible and versatile, but they don't have a predefined type and are not generally stored in consecutive positions in memory; it is therefore quite time-consuming to iterate over them to perform mathematical operations
* Numpy implements arrays coded in C, stored in contiguous memory positions, whose elements all have the same type
* Structure of a numpy array: Header with metadata plus the real data itself
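
The metadata stored in the header can be inspected through attributes of the array. The following is a minimal sketch, assuming a small 3x4 array of 64-bit floats chosen only for illustration:

```python
import numpy as np

# a 3x4 array of 64-bit floats, stored row by row (C order)
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

print(arr.dtype)      # float64: the single type shared by all elements
print(arr.shape)      # (3, 4): size along each axis
print(arr.itemsize)   # 8: bytes used by one element
print(arr.nbytes)     # 96: bytes used by the data buffer (12 * 8)
print(arr.strides)    # (32, 8): bytes to jump to the next row / next column
print(arr.flags['C_CONTIGUOUS'])  # True: the data form one contiguous block
```

Because every element has the same size, numpy can locate any element from the strides alone, without following pointers as a python list would.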

<img src="threefundamental.jpg">

* Being coded in C, numpy arrays offer much better performance in looping
* Moreover, numpy arrays support vector-style arithmetic as in other languages (e.g. Matlab): multiplication of arrays is defined element-wise, like Matlab's `.*` operator
* Vectorized code in numpy takes full advantage of the compiled C part and can speed up python calculations enormously: iterating over arrays to perform calculations is slower than using vectorized operations


To use numpy, we first have to load the module

In [None]:
# loading numpy in abbreviated form
import numpy as np

In [None]:
# import also useful stuff
import math
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
%config  InlineBackend.figure_format="svg"

## Create numpy arrays

### From a list

In [None]:
# From a list
arr = np.array([1,3,5,7,8,9])
print(arr)

In [None]:
x = [i/10.0 for i in range(1, 100)]
y = [math.sin(float(i)) for i in x]
x = np.array(x)
y = np.array(y)

### Useful built-in array creation functions

In [None]:
# Useful built-in functions to create arrays

n = 1000

# array of ones of length n
one = np.ones(n)
print("Ones:", one[:10])

# array of zeros of length n (useful to initialize counters)
zero = np.zeros(n)
print("Zeros:", zero[:10])

# array with some starting point, end point (excluded), and step between points
arr = np.arange(5.1, 11.5, 0.3)
print("Generic:", arr[:10])
print("Generic, last element:", arr[-1])

# equi-spaced linear space with start, end (included), and number of points
lin = np.linspace(1, 50, 10)
print("Linear space:", lin)

# equi-spaced log space with start, end, and number of points
log = np.logspace(np.log10(1), np.log10(50), 10)
print("Logarithmic space:", log)
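
The difference between the creation functions lies mostly in how the end point is treated. The sketch below uses small, round values (chosen here only for illustration) so that the results can be checked by eye:

```python
import numpy as np

# arange: the end point is excluded (half-open interval, like range)
a = np.arange(0.0, 1.0, 0.25)
print(a)          # values 0, 0.25, 0.5, 0.75

# linspace: the end point is included by default
b = np.linspace(0.0, 1.0, 5)
print(b)          # values 0, 0.25, 0.5, 0.75, 1

# logspace: start and end are exponents of 10
c = np.logspace(0, 2, 3)
print(c)          # values 1, 10, 100
```

With non-integer steps, rounding errors can make the number of points returned by `np.arange` hard to predict; `np.linspace` is usually the safer choice when the number of points matters.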

## Properties of numpy arrays

In [None]:
# size
print(x.size)

In [None]:
# type

In [None]:
a1 = np.array([0,2,3])
print(a1.dtype)

In [None]:
a2 = np.array([0.0,2.0,3.0])
print(a2.dtype)

### Specify the type of an array

In [None]:
# array of integer zeros
a1 = np.zeros(100, dtype=int)
print(a1[:10])

In [None]:
# array of float ones
a2 = np.ones(100, dtype=float)
print(a2[:10])
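
Since all the elements share one type, the choice of `dtype` has practical consequences. A short sketch of the most common ones, with illustrative variable names that do not appear elsewhere in this notebook:

```python
import numpy as np

# the type is inferred from the input: one float makes the whole array float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64

# an integer array silently truncates floats assigned to it
counts = np.zeros(3, dtype=int)
counts[0] = 2.7
print(counts)                 # first element is 2, not 2.7

# astype returns a converted copy; the original is unchanged
as_float = counts.astype(float)
print(as_float.dtype)         # float64
```

If an array is meant to hold real numbers, it is therefore safer to create it with `dtype=float` from the start.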

## Multidimensional numpy arrays

Numpy arrays can be multidimensional. We can define arrays of dimension $N \times M$ (matrices) or of higher dimensionality.

In [None]:
# Create a 3x3 array
mat = np.array([[0,1,9], [2,3,9], [4,5,9]])
print(mat)

Multidimensional arrays can be accessed using standard python slicing notation, and some other useful slicing tricks. Remember, however, that the numbering of elements starts at 0, as in C, and unlike Fortran or Matlab

In [None]:
# you can access the elements using a comma notation, starting at 0
print(mat[0,0])

# or perform slices of columns and rows
# e.g. accessing the first column
print(mat[:,0])

# or directly defining a tuple of coordinates
# (a list would instead select whole rows, see fancy indexing)
pos = (0, 1)
print(mat[pos])
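
A few more slicing patterns, as a sketch on the same 3x3 array. It also illustrates a point that is easy to miss: a basic slice is a view on the original data, not a copy. The names `col` and `safe` are illustrative only.

```python
import numpy as np

mat = np.array([[0, 1, 9], [2, 3, 9], [4, 5, 9]])

print(mat[1, :])       # second row: 2, 3, 9
print(mat[:2, 1:])     # first two rows, last two columns: [[1, 9], [3, 9]]
print(mat[-1, -1])     # negative indices count from the end: 9

# a slice is a view on the same data: modifying it modifies mat
col = mat[:, 0]
col[0] = 100
print(mat[0, 0])       # 100

# use copy() to get independent data
safe = mat[:, 0].copy()
safe[0] = -1
print(mat[0, 0])       # still 100
```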

Also, you can create multidimensional arrays using convenience functions

In [None]:
mat = np.ones((4,4))  # note that the shape is passed as a tuple
print(mat)

mat = np.zeros((4,4))  # note that the shape is passed as a tuple
print(mat)

In multidimensional arrays, some of the methods can be applied only along one given axis, using the `axis=` keyword argument

Numbering of axes follows the C convention: 0 for rows, 1 for columns, etc.

<img src="axis_2.jpeg" width="200"> 

If we call a method with a given axis, its action will be performed along that axis, returning an (n-1)-dimensional array

<img src="axis.png" width="500">


In [None]:
mat = np.array([[0,1,9], [2,3,9], [4,5,9]])
print(mat)

In [None]:
mat.sum(axis=0)  # sum over axis 0 (down the rows), gives one value for each column

In [None]:
mat.sum(axis=1)  # sum over axis 1 (across the columns), gives one value for each row
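
The effect of `axis=` is easier to see on a non-square array, where the shape of the result shows which axis has been collapsed. A minimal sketch, assuming a 2x3 array built only for this purpose:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]: 2 rows, 3 columns

col_sums = arr.sum(axis=0)   # collapse axis 0 (the rows): one value per column
row_sums = arr.sum(axis=1)   # collapse axis 1 (the columns): one value per row

print(col_sums, col_sums.shape)   # values 3, 5, 7 and shape (3,)
print(row_sums, row_sums.shape)   # values 3, 12 and shape (2,)
print(arr.sum())                  # no axis: sum of all the elements, 15
```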

## Operations with arrays

Numpy arrays behave as vectors, with the caveat that operations are applied in an element-wise fashion, as you may be used to from Matlab's dotted operators (`.*`). When we multiply them by a scalar, add them, or even apply functions to them (numpy functions, of course), the operation is performed element by element

In [None]:
# create two simple numpy arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])

print(2 * x)
print(x + y)
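
The same holds for products, powers and numpy functions. The sketch below also shows why the functions must come from numpy: those in the standard `math` module expect a single number and reject an array.

```python
import math
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])

print(x * y)        # element-wise product: 6, 14, 24, 36, 50
print(x ** 2)       # element-wise power: 1, 4, 9, 16, 25
print(np.sqrt(x))   # numpy function applied to each element

# functions from the math module expect a single number, not an array
try:
    math.sqrt(x)
except TypeError as err:
    print("math.sqrt failed:", err)
```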

## Numpy functions and methods

Numpy offers many convenience functions, sometimes stored inside submodules, specially designed to deal with numpy arrays in a fast way. Numpy functions are accessed using the dot notation `np.`

Apart from these functions, numpy arrays are fully fledged objects, so they have many associated methods that can replace functions. Methods are called with the usual dot notation of object-oriented programming

     object.method(parameters)

Let us see some examples
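
As a first sketch, here is the same operation written once as a function and once as a method, on a small array chosen only for illustration:

```python
import numpy as np

arr = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# the same operation as a function and as a method
print(np.sum(arr), arr.sum())      # 14.0 both times
print(np.mean(arr), arr.mean())    # 2.8 both times
print(np.max(arr), arr.max())      # 5.0 both times

# some operations only exist as functions
print(np.median(arr))              # 3.0
```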

#### Generating random numbers, using the module numpy.random

In [None]:
# generate n random numbers distributed uniformly
n = 1000 
x_np = np.random.rand(n)
plt.plot(x_np[:100])  # plot the first 100 random numbers only

In [None]:
# generate n random numbers distributed normally
n = 1000 
x_np = np.random.randn(n)
plt.plot(x_np[:100])  # plot the first 100 random numbers only
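
When results have to be reproducible, the random numbers can be drawn from a seeded generator. The sketch below assumes a numpy version that provides `np.random.default_rng` (1.17 or later); the seed value 42 is arbitrary.

```python
import numpy as np

# a generator with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)
uniform = rng.random(1000)             # uniform in [0, 1)
normal = rng.standard_normal(1000)     # mean 0, standard deviation 1

print(uniform.min(), uniform.max())
print(normal.mean(), normal.std())

# a second generator with the same seed reproduces the sequence
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng2.random(1000)))   # True
```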

#### Reading and writing data

* We can read and write data from/to files using basic python commands. 
* Numpy offers, however, fast and efficient functions to read and write homogeneous numerical data, using the **np.loadtxt** and **np.savetxt** functions.
* They can be used to write a set of vectors (with the same length!) to a file, ordered by columns.

Get information from ipython

In [None]:
np.savetxt?

* Basic usage

        np.savetxt(filename, objects_to_save)
        
* `objects_to_save` is a tuple of objects (vectors) that the function will save
* The problem here is that it will save the objects one by one. If we give two vectors, it will save two rows of numbers
* If we have a set of vectors we want to store as columns, so as to represent arguments and values of a function, we have to "zip" them, i.e. combine them into a two-dimensional array with one vector per column


Example

In [None]:
# generate a set of x values
x = np.linspace(-25, 25, 100)

# generate a suitable y function
y = x**3 - 10*x**2

In [None]:
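# save the values
# we have to combine the vectors column-wise (the equivalent of "zipping" them)
# to create a two-column format; a bare zip(x, y) is an iterator in python 3
np.savetxt("Prova.dat", np.column_stack((x, y)))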

In [None]:
# magic command
%more Prova.dat

In [None]:
# now read the values again
# we have to unzip them, we can do it with the option unpack=True
x1, y1 = np.loadtxt("Prova.dat", unpack=True)

In [None]:
# check original and read are equal
y == y1
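
The whole round trip can be sketched as a self-contained script. The file name `example.dat`, the temporary directory and the `header` text are assumptions made for this illustration; they are not part of the example above.

```python
import os
import tempfile

import numpy as np

x = np.linspace(-25, 25, 100)
y = x**3 - 10*x**2

# hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.dat")

# column_stack builds a (100, 2) array: one row per point, one column per vector
np.savetxt(path, np.column_stack((x, y)), header="x y")

# unpack=True transposes the result, so each column comes back as one vector
x1, y1 = np.loadtxt(path, unpack=True)

print(np.allclose(x, x1), np.allclose(y, y1))   # True True
```

The header line is written with a leading `#`, which `np.loadtxt` treats as a comment and skips. Comparing with `np.allclose` rather than `==` tolerates any rounding introduced by the text format.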

## Fancy indexing

A very useful way of accessing numpy array elements is "fancy indexing", in which we can access a set of elements that need not be consecutive or regularly spaced (unlike in normal python slicing).

There are two versions:

In [None]:
# create a test array
x = 2*np.arange(10)
print(x)

In [None]:
# first way: access directly the indices you want to retrieve
indices = [0, 3, 7]
x[indices]

In [None]:
# second way: create an index array with boolean (True, False) values.
# True values will be retrieved
indices = np.array([False, True, False, True, False, False, True, True, False, True])
x[indices]

This is conveniently combined with numpy array comparisons, which yield a boolean array

In [None]:
x = np.random.rand(25)
print(x)

# look for the values of x larger than some value
cut = 0.5

# printing the explicit values
print(x[x>cut])

# calculating the number of elements fulfilling the condition;
# boolean values are treated as 0's and 1's
print(np.sum(x>cut))

In [None]:
# other forms of finding values
np.where(x>cut)[0]  # result is given as a tuple, collect only the first element

The `np.where` function provides the indices of the elements for which the condition is true
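
A small deterministic sketch of these tools, using a fixed array instead of random numbers so that the results can be checked by hand. The three-argument form of `np.where` and the combination of conditions with `&` go slightly beyond the cells above.

```python
import numpy as np

x = np.array([0.1, 0.7, 0.3, 0.9, 0.5])

idx = np.where(x > 0.5)[0]
print(idx)                     # indices 1 and 3

# with three arguments, where chooses element by element between two values
clipped = np.where(x > 0.5, 0.5, x)
print(clipped)                 # values above 0.5 replaced by 0.5

# conditions are combined with & and |, each one inside parentheses
inside = x[(x > 0.2) & (x < 0.8)]
print(inside)                  # values 0.7, 0.3, 0.5
```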

## Matrix operations

Numpy offers a `matrix` object, which is similar to a two-dimensional `array` but behaves as a matrix, in particular with respect to multiplication

In [None]:
# create a two-dimensional nested list, using a nested list comprehension

m = [[i**2 +j/2.0 + 1 for i in range(1,5)] for j in range(1,5)]

In [None]:
# transform to array
A = np.array(m)
print(A)

In [None]:
# transform to matrix
M = np.matrix(A)
print(M)

They look similar, but they are different objects: M is an actual matrix, as in linear algebra

In [None]:
print(A*A)  # element-wise multiplication

In [None]:
print(M*M)  # real matrix multiplication
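
The matrix product is also available for plain arrays, through the `@` operator (python 3.5 or later) or `np.dot`. Recent numpy documentation recommends this route over `np.matrix`. A minimal sketch on a 2x2 array chosen for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

print(A * A)         # element-wise: [[1, 4], [9, 16]]
print(A @ A)         # matrix product: [[7, 10], [15, 22]]
print(np.dot(A, A))  # same matrix product, as a function
```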

### Linear algebra with numpy matrices

We can perform linear algebra operations with numpy matrices

In [None]:
M = np.matrix([[1,3], [7,9]])
print(M)

In [None]:
print(M.T)  # transpose matrix

In [None]:
# determinant, from the submodule linalg
np.linalg.det(M)

In [None]:
# inverse, from the submodule linalg
np.linalg.inv(M)

In [None]:
# eigenvalues and eigenvectors
np.linalg.eig(M)

In [None]:
# Check that M v = lambda v for the first eigenvalue and eigenvector
evals, evecs = np.linalg.eig(M)

# eigenvectors are stored as the columns of evecs
v = evecs[:,0]

print(M*v)
print(evals[0]*v)
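
The same check can be written with plain arrays and `np.allclose`, which compares the two sides up to rounding errors. This is a sketch under the assumption that arrays and the `@` operator are used instead of `np.matrix`:

```python
import numpy as np

A = np.array([[1.0, 3.0], [7.0, 9.0]])

evals, evecs = np.linalg.eig(A)   # eigenvectors are the columns of evecs

# check A v = lambda v for every eigenpair
for lam, v in zip(evals, evecs.T):
    print(lam, np.allclose(A @ v, lam * v))   # each line should end with True

# the determinant equals the product of the eigenvalues
print(np.linalg.det(A), np.prod(evals))       # both close to -12
```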

## Numpy efficiency

Operations performed in Numpy can be orders of magnitude faster than those in pure python. This is due to the fact that numpy functions and operations are coded in C under the hood.

To take full advantage of numpy efficiency, always use vectorized operations when possible, and never mix numpy functions (e.g. `np.sum()`) with their built-in python counterparts (e.g. `sum()`)

Let us create some large lists of random numbers in python, and multiply them

In [None]:
n = 10000 # choose a reasonable number according to the speed of your computer!!
x = []; y = []
for i in range(n):
    temp1 = np.random.rand()
    temp2 = np.random.rand()
    x.append(temp1)
    y.append(temp2)

Let us first multiply them the python way

In [None]:
z = []
for i in range(n):
    temp = x[i] * y[i]
    z.append(temp)

To evaluate the time taken by a python statement we can use the line magic 
**%timeit** (when it is a single line command) or the cell magic **%%timeit** (when we have a whole cell of calculations to evaluate)

In [None]:
%%timeit
z = []
for i in range(n):
    temp = x[i] * y[i]
    z.append(temp)

Let's try a more pythonic way: a list comprehension

In [None]:
%timeit z = [x[i] * y[i] for i in range(len(x))]

Better, because we avoid the overhead of calling `z.append` at every step. In this sort of calculation, list comprehensions are to be preferred

We can now do it in numpy

In [None]:
# first transform the lists into numpy arrays
x_np = np.array(x)
y_np = np.array(y)

In [None]:
%%timeit
# let us perform the operation iterating over arrays
# create first an empty numpy array
z_np = np.zeros(n)
for i in range(n):
    z_np[i] = x_np[i]*y_np[i]

Even worse than with python lists

In [None]:
# pure numpy operation, using vectorial notation
%timeit z_np = x_np*y_np

When working with numpy arrays, avoid using them as lists; take advantage of the vectorized form of operations whenever possible

Numpy arrays behave as vectors, with the caveat that operations are applied in an element-wise fashion. When we multiply them by a scalar, add them, or even apply functions to them (numpy functions, of course), the operation is performed element by element

In [None]:
# beware of mixing numpy and python functions, a common source of error
print("Python sum")
%timeit sum(x)

print("numpy sum")
%timeit np.sum(x_np)

print("other numpy sum")
%timeit x_np.sum()
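
Outside of ipython, where the `%timeit` magic is not available, a similar comparison can be sketched with the standard `timeit` module. The variable names `t_list`, `t_mixed` and `t_numpy` are illustrative, and the timings depend on the machine, so no values are quoted here.

```python
import timeit

import numpy as np

n = 10000
x_np = np.random.rand(n)
x = x_np.tolist()   # the same numbers as a plain python list

# each statement is run `number` times; the total time in seconds is returned
t_list = timeit.timeit(lambda: sum(x), number=100)          # python sum on a list
t_mixed = timeit.timeit(lambda: sum(x_np), number=100)      # python sum on an array
t_numpy = timeit.timeit(lambda: np.sum(x_np), number=100)   # numpy sum on an array

print("sum(list):    ", t_list)
print("sum(array):   ", t_mixed)
print("np.sum(array):", t_numpy)

# all three give the same result up to rounding
print(np.isclose(sum(x), np.sum(x_np)))   # True
```

The mixed case, python's `sum` applied to a numpy array, is the one to watch for: it has to extract the elements one by one, so it is typically the slowest of the three.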