# Numpy - multidimensional data arrays

Based on J.R. Johansson's notebook (jrjohansson at gmail.com)

## Introduction

The `numpy` package (module) is used in almost all numerical computation with Python. It provides high-performance vector, matrix and higher-dimensional data structures. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use `numpy` you need to import the module, using for example:

In [0]:
from numpy import *
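
A star import is convenient in an interactive session, but it also replaces some Python built-ins (such as `sum`, `all` and `any`) with their `numpy` counterparts. A common alternative, sketched here as a suggestion rather than something the rest of this notebook relies on, is to import the package under the conventional alias `np` and prefix every call. The name `v_np` is only for illustration.

```python
import numpy as np

# the same functions are available, but behind an explicit prefix
v_np = np.array([1, 2, 3, 4])
print(type(v_np))
print(np.sum(v_np))   # 10
```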

In the `numpy` package, vectors, matrices and higher-dimensional data sets are all called *arrays*.



## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example:

* from Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* by reading data from files

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [5]:
# a vector: the argument to the array function is a Python list
v = array([1,2,3,4])

v

array([1, 2, 3, 4])

In [6]:
# a matrix: the argument to the array function is a nested Python list
M = array([[1, 2], [3, 4]])

M

array([[1, 2],
       [3, 4]])

The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

In [7]:
type(v), type(M)

(numpy.ndarray, numpy.ndarray)

The difference between the `v` and `M` arrays is only their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.

In [8]:
v.shape

(4,)

In [9]:
M.shape

(2, 2)

The number of elements in the array is available through the `ndarray.size` property:

In [10]:
M.size

4

Equivalently, we could use the functions `numpy.shape` and `numpy.size`:

In [11]:
shape(M)

(2, 2)

In [12]:
size(M)

4

So far the `numpy.ndarray` looks awfully similar to a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general. They can contain any kind of object and are dynamically typed. They do not support mathematical operations such as matrix and dot multiplication. Implementing such operations for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
* Numpy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
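
The difference in arithmetic behaviour can be seen in a small sketch. The names `values` and `arr` are only for illustration:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# for a list, * means repetition and + means concatenation
print(values * 2)        # [1, 2, 3, 1, 2, 3]
print(values + values)   # [1, 2, 3, 1, 2, 3]

# for an array, the same operators act element by element
print(arr * 2)           # [2 4 6]
print(arr + arr)         # [2 4 6]
```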

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:

In [13]:
M.dtype

dtype('int64')

We get an error if we try to assign a value that cannot be converted to the array's type, for example a string that is not a number:

In [14]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'
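
Not every mismatched assignment raises an error. As a small sketch (the array name `ints` is only for illustration), a float assigned into an integer array is converted to the array's type, which drops the fractional part without any warning:

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])   # integer dtype inferred from the input
ints[0, 0] = 3.7                    # converted to the array's integer type
print(ints[0, 0])                   # 3
print(ints.dtype)                   # an integer type; the width depends on the platform
```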

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [15]:
M = array([[1, 2], [3, 4]], dtype=complex)

M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128`. Extended-precision types such as `float128` are not available on every platform.
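
As a short illustration of sized types, the following sketch creates an `int16` array and converts it with `astype`. The names `small` and `as_float` are only for illustration:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
print(small.itemsize)                      # 2 bytes per element

# astype returns a converted copy; the original array is unchanged
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.itemsize)   # float64 8
print(small.dtype)                         # int16
```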

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

In [16]:
# create a range

x = arange(0, 10, 1) # arguments: start, stop, step

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
x = arange(-1, 1, 0.1)

x

array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])

In [18]:
# using linspace, both end points ARE included
linspace(0, 10, 25)

array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])
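
The entry `-2.22044605e-16` in the `arange` output above, where an exact zero would be expected, is floating-point rounding error from the non-integer step. When the number of points matters more than the step, `linspace` is often the more predictable choice. A minimal sketch, using the `endpoint=False` keyword to reproduce the half-open interval of `arange`:

```python
import numpy as np

a = np.arange(-1, 1, 0.1)                     # step given, count implied
b = np.linspace(-1, 1, 20, endpoint=False)    # count given, step implied

print(len(b))               # 20
print(b[1] - b[0])          # the implied step, 0.1 up to rounding error
print(np.allclose(a, b))    # comparison that tolerates rounding error
```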

#### mgrid

In [0]:
x, y = mgrid[0:5, 0:5] # similar to meshgrid in MATLAB

In [0]:
x

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [0]:
y

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
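
Coordinate arrays like these are typically used to evaluate a function on every grid point at once. The sketch below uses a smaller, non-square grid and an arbitrary function `10*x + y`, chosen only so the result is easy to read. It also checks the relation to `numpy.meshgrid` with `indexing='ij'`.

```python
import numpy as np

x, y = np.mgrid[0:3, 0:4]     # x varies along rows, y along columns
z = 10 * x + y                # evaluated at every grid point at once
print(z)

# meshgrid with indexing='ij' produces the same coordinate arrays
xi, yi = np.meshgrid(np.arange(3), np.arange(4), indexing='ij')
print(np.array_equal(x, xi) and np.array_equal(y, yi))   # True
```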

#### random data

In [0]:
from numpy import random

In [0]:
# uniform random numbers in [0,1)
random.rand(5,5)

array([[ 0.92932506,  0.19684255,  0.736434  ,  0.18125714,  0.70905038],
       [ 0.18803573,  0.9312815 ,  0.1284532 ,  0.38138008,  0.36646481],
       [ 0.53700462,  0.02361381,  0.97760688,  0.73296701,  0.23042324],
       [ 0.9024635 ,  0.20860922,  0.67729644,  0.68386687,  0.49385729],
       [ 0.95876515,  0.29341553,  0.37520629,  0.29194432,  0.64102804]])

In [0]:
# standard normal distributed random numbers
random.randn(5,5)

array([[ 0.117907  , -1.57016164,  0.78256246,  1.45386709,  0.54744436],
       [ 2.30356897, -0.28352021, -0.9087325 ,  1.2285279 , -1.00760167],
       [ 0.72216801,  0.77507299, -0.37793178, -0.31852241,  0.84493629],
       [-0.10682252,  1.15930142, -0.47291444, -0.69496967, -0.58912034],
       [ 0.34513487, -0.92389516, -0.216978  ,  0.42153272,  0.86650101]])
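
The numbers above change on every run. When reproducible results are needed, one option is a seeded generator from `numpy.random.default_rng`. This is a sketch of an alternative interface, not what the cells above use, and the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)        # the seed 42 is arbitrary

uniform = rng.random((5, 5))           # uniform in [0, 1)
normal = rng.standard_normal((5, 5))   # mean 0, standard deviation 1

# a generator created with the same seed repeats the same numbers
again = np.random.default_rng(42).random((5, 5))
print(np.array_equal(uniform, again))  # True
```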

#### zeros and ones

In [0]:
zeros((3,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [0]:
ones((3,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
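
Both functions return floats by default. A brief sketch of some variations: the `dtype` keyword changes the element type, and the related functions `full` and `eye` fill an array with a constant or build an identity matrix. The variable names are only for illustration.

```python
import numpy as np

int_zeros = np.zeros((2, 3), dtype=int)   # integer zeros instead of floats
sevens = np.full((2, 2), 7)               # every element set to 7
identity = np.eye(3)                      # 3x3 identity matrix

print(int_zeros)
print(sevens)
print(identity)
```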

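## More properties of numpy arrays

From here on, `M` is a 3x3 array of random floats, created for example with `M = random.rand(3,3)`, so the exact values differ from run to run.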

In [0]:
M.itemsize # bytes per element

8

In [0]:
M.nbytes # number of bytes

72

In [0]:
M.ndim # number of dimensions

2
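
These properties are related: `nbytes` equals `size` times `itemsize`. A short check, using a hypothetical 3x3 float array `P` in place of `M`:

```python
import numpy as np

P = np.zeros((3, 3))                    # 9 elements of type float64
print(P.itemsize)                       # 8
print(P.nbytes)                         # 72
print(P.nbytes == P.size * P.itemsize)  # True
print(P.ndim, P.shape)                  # 2 (3, 3)
```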

## Manipulating arrays

### Indexing

We can index elements in an array using square brackets and indices:

In [0]:
# v is a vector, and has only one dimension, taking one index
v[0]

1

In [0]:
# M is a matrix, or a 2 dimensional array, taking two indices 
M[1,1]

0.47913739949636192

If we omit an index of a multidimensional array, it returns the whole row (or, in general, an (N-1)-dimensional array):

In [0]:
M

array([[ 0.77872576,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])

In [0]:
M[1]

array([ 0.60410063,  0.4791374 ,  0.8237106 ])

The same thing can be achieved by using `:` instead of an index:

In [0]:
M[1,:] # row 1

array([ 0.60410063,  0.4791374 ,  0.8237106 ])

In [0]:
M[:,1] # column 1

array([ 0.40043577,  0.4791374 ,  0.15459644])

We can assign new values to elements in an array using indexing:

In [0]:
M[0,0] = 1

In [0]:
M

array([[ 1.        ,  0.40043577,  0.66254019],
       [ 0.60410063,  0.4791374 ,  0.8237106 ],
       [ 0.96856318,  0.15459644,  0.96082399]])

In [0]:
# also works for rows and columns
M[1,:] = 0
M[:,2] = -1

In [0]:
M

array([[ 1.        ,  0.40043577, -1.        ],
       [ 0.        ,  0.        , -1.        ],
       [ 0.96856318,  0.15459644, -1.        ]])
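
Because `M` holds random numbers, the same operations may be easier to follow on an array with known values. The sketch below repeats them on a hypothetical array `G` built with `arange` and `reshape`:

```python
import numpy as np

G = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]

print(G[1, 1])     # 4
print(G[1])        # [3 4 5]
print(G[:, 1])     # [1 4 7]

G[:, 2] = -1       # a scalar is assigned to every element of the column
print(G)
```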

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]`, which extracts part of an array. The element at index `upper` is not included:

In [0]:
A = array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

In [0]:
A[1:3]

array([2, 3])

Array slices are *mutable*: if they are assigned a new value, the original array from which the slice was extracted is modified:

In [0]:
A[1:3] = [-2,-3]

A

array([ 1, -2, -3,  4,  5])
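
The same applies when a slice is first stored in a variable: the slice shares memory with the original array. A minimal sketch of this behaviour, and of `copy` as a way to avoid it (the names `a`, `view` and `independent` are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:3]          # a slice shares memory with a
view[0] = -2
print(a)               # [ 1 -2  3  4  5]

independent = a[1:3].copy()   # an explicit copy has its own memory
independent[0] = 100
print(a)               # still [ 1 -2  3  4  5]
```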

Negative indices count from the end of the array (positive indices from the beginning):

In [0]:
A = array([1,2,3,4,5])

In [0]:
A[-1] # the last element in the array

5

In [0]:
A[-3:] # the last three elements

array([3, 4, 5])
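
Any of `lower`, `upper` and `step` can be left out, and the step can be negative. A few more combinations, sketched on a hypothetical array `B` with the same values as `A`:

```python
import numpy as np

B = np.array([1, 2, 3, 4, 5])

print(B[::2])     # every second element: [1 3 5]
print(B[:3])      # first three elements: [1 2 3]
print(B[::-1])    # reversed: [5 4 3 2 1]
print(B[1:-1])    # without first and last: [2 3 4]
```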

### Fancy indexing

Fancy indexing is the name for when an array or list is used in place of an index:

In [0]:
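# A is redefined as a 5x5 matrix whose element [m, n] equals 10*m + n
A = array([[n + m*10 for n in range(5)] for m in range(5)])
row_indices = [1, 2, 3]
A[row_indices]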

array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])

In [0]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]

array([11, 22, 34])
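
When two index lists are given, they are paired up element by element instead of selecting a sub-block. The sketch below rebuilds a 5x5 array with the same layout as the output above (element `[m, n]` equals `10*m + n`) and spells out the pairing. As a related form, it also shows a boolean array used as an index. The names `rows`, `cols` and `picked` are only for illustration.

```python
import numpy as np

# same 5x5 layout as in the output above: element [m, n] equals 10*m + n
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

rows = [1, 2, 3]
cols = [1, 2, -1]

# index lists are paired up: (1, 1), (2, 2), (3, -1)
picked = A[rows, cols]
print(picked)                                            # [11 22 34]
print([int(A[r, c]) for r, c in zip(rows, cols)])        # [11, 22, 34]

# a boolean array of matching shape selects elements where it is True
first_row = A[0]
print(first_row[first_row > 2])                          # [3 4]
```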

### Linear and matrix algebra

Numpy's real strength lies in optimized linear algebra operations on vectors and matrices, but those are less relevant here.

### Data processing

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays. 

For example, let's calculate some properties of a small dataset. An 8x8 array of random integers stands in for a real dataset here, so the outputs below change from run to run.

In [0]:
# stand-in dataset: random integers from 0 to 9, stored in the data variable
data = random.randint(10,size=(8,8))
shape(data)

(8, 8)

#### mean

In [0]:
mean(data[:,3])

5.25

#### standard deviations and variance

In [0]:
std(data[:,3]), var(data[:,3])

(1.6393596310755001, 2.6875)

#### min and max

In [0]:
data[:,3].min()

4

In [0]:
data[:,3].max()

9
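
Instead of selecting one column at a time, these functions also accept an `axis` argument that computes the statistic along rows or columns in one call. A minimal sketch on a small hypothetical array `table` with known values:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])    # small stand-in for the data array

print(table.mean())          # 3.5, over all elements
print(table.mean(axis=0))    # [2.5 3.5 4.5], one value per column
print(table.mean(axis=1))    # [2. 5.], one value per row
print(table.max(axis=0))     # [4 5 6]

# std is the square root of var (both divide by N by default)
print(np.isclose(table.std() ** 2, table.var()))   # True
```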

#### sum, prod, and their cumulative versions

In [0]:
d = arange(0, 10)
d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [0]:
# sum up all elements
sum(d)

45

In [0]:
# product of all elements
prod(d+1)

3628800

In [0]:
# cumulative sum
cumsum(d)

array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])

In [0]:
# cumulative product
cumprod(d+1)

array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])
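
The cumulative versions are consistent with the plain ones: the last entry of `cumsum` is the total, and differencing the running sum recovers the original values. A short check (the name `running` is only for illustration):

```python
import numpy as np

d = np.arange(0, 10)
running = np.cumsum(d)

print(running[-1] == d.sum())     # True: the last entry is the total
print(np.diff(running))           # [1 2 3 4 5 6 7 8 9], recovers d[1:]
```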

## Iterating over array elements

Generally, we want to avoid iterating over the elements of arrays whenever we can. The reason is that in an interpreted language like Python (or MATLAB), iterations are much slower than vectorized operations.

However, sometimes iterations are unavoidable. For such cases, the Python `for` loop is the most convenient way to iterate over an array:

In [0]:
v = array([1,2,3,4])

for element in v:
    print(element)

1
2
3
4


In [0]:
M = array([[1,2], [3,4]])

for row in M:
    print("row", row)
    
    for element in row:
        print(element)

row [1 2]
1
2
row [3 4]
3
4


When we need to iterate over each element of an array and modify its elements, it is convenient to use the `enumerate` function to obtain both the element and its index in the `for` loop: 

In [0]:
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
       
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2

row_idx 0 row [1 2]
col_idx 0 element 1
col_idx 1 element 2
row_idx 1 row [3 4]
col_idx 0 element 3
col_idx 1 element 4


In [0]:
# each element in M is now squared
M

array([[ 1,  4],
       [ 9, 16]])
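
For this particular task the loop is not actually needed. As a sketch of the vectorized alternative, the element-wise power operator gives the same result in one expression. When indices are needed inside a loop, `numpy.ndenumerate` yields them together with the elements. The names `Q` and `squared` are only for illustration.

```python
import numpy as np

Q = np.array([[1, 2], [3, 4]])

squared = Q ** 2        # element-wise power, no Python loop
print(squared)

# ndenumerate yields (index tuple, element) pairs when indices are needed
for (row_idx, col_idx), element in np.ndenumerate(Q):
    print(row_idx, col_idx, element)
```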