# Numpy

Author: Manuel Dalcastagnè. This work is licensed under a CC Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/).

Original material, "Numpy - multidimensional data arrays", was created by J.R. Johansson under the CC Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/) and can be found at https://github.com/jrjohansson/scientific-python-lectures.

## Introduction

Numpy (Numerical Python) is a Python library for numerical computing, built around multidimensional arrays and linear algebra operations on them.

Almost every Python data science package, such as SciPy (Scientific Python), Matplotlib (a plotting library), and Scikit-learn, depends on it. Its main features include:

* an efficient N-dimensional `array` object, used to represent vectors and matrices in N dimensions
* broadcasting functions, which apply operations element-wise to arrays, including arrays of different but compatible shapes
* basic linear algebra functions (products of arrays, matrix decompositions, eigenvalue computation, ...)
* random number capabilities (random number generation, sampling from probability distributions)

To use `numpy` you need to import the module, using for example:

In [2]:
import numpy as np

The performance-critical parts of `numpy` are implemented in C and Fortran, so vectorized calculations (formulated in terms of vectors and matrices) typically run much faster than equivalent code built on the data structures of the Python standard library, such as lists.

## Creating `numpy` arrays

There are several ways to create numpy arrays:

* from Python lists
* using dedicated functions
* reading data from files

### Creating arrays from lists

For example, to create a vector or a matrix from Python lists, use the `numpy.array` function:

In [5]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])

v

array([1, 2, 3, 4])

In [123]:
# a 2x3 matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2, 3], [4, 5, 6]])

M

array([[1, 2, 3],
       [4, 5, 6]])

We can get information about the **shape of an array** by checking the `shape` property:

In [10]:
v.shape

(4,)

In [9]:
M.shape

(2, 3)

The **number of elements in an array** is available through the `size` property:

In [8]:
M.size

6
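
As a minimal sketch of how these properties relate, the example below rebuilds the 2x3 matrix from above. The `ndim` attribute is not discussed in the text; it is shown here only as a convenient cross-check.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

# shape is a tuple with one entry per dimension
print(M.shape)   # (2, 3)
# size is the total number of elements, i.e. the product of the shape entries
print(M.size)    # 6
# ndim is the number of dimensions, i.e. the length of the shape tuple
print(M.ndim)    # 2
```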

### Creating arrays using dedicated functions

For larger arrays it is impractical to initialize them manually, so we can use dedicated `numpy` functions that generate arrays of different forms:

#### arange

In [23]:
# create an array using a range
x = np.arange(0, 10, 1) # arguments: start, stop, step
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Combining the `arange` and `reshape` functions also lets us create matrices:

In [4]:
A = np.arange(3*3).reshape(3,3)
A

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
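
A short sketch of two details of `reshape` that the example above relies on: the new shape must hold exactly the same number of elements, and one dimension can be given as `-1` so that numpy infers it. The array sizes are chosen only for illustration.

```python
import numpy as np

x = np.arange(12)          # 12 elements: 0..11

# -1 asks numpy to infer that dimension from the total size (12 / 3 = 4)
B = x.reshape(3, -1)
print(B.shape)             # (3, 4)

# a shape whose product differs from the number of elements is rejected
try:
    x.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)
```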

#### random data

In [39]:
# uniform random numbers in [0,1)
np.random.rand(5,4)

array([[0.16172757, 0.23684204, 0.82024222, 0.62128889],
       [0.4567138 , 0.30690682, 0.21821515, 0.0874465 ],
       [0.00904519, 0.64143856, 0.36682142, 0.07890752],
       [0.75268633, 0.10000581, 0.94109466, 0.39316802],
       [0.75207789, 0.05554725, 0.41800985, 0.47743048]])

In [42]:
# standard normal distributed random numbers
np.random.randn(4,5)

array([[ 1.38752873, -0.09170818, -0.34827605,  1.00971658,  0.11500774],
       [ 2.57489578, -0.69992167,  1.35077086, -0.13138451,  1.48878672],
       [-2.51320899, -0.53284792, -0.51178907, -0.97731113, -0.93748183],
       [ 0.56851753,  0.37275313,  1.61631382, -1.06288808,  0.40909852]])
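
Random arrays differ on every run, which makes results hard to reproduce. One possible way to get repeatable numbers, sketched below, is the generator interface `np.random.default_rng` with a fixed seed (available in NumPy 1.17 and later). The seed value 42 is an arbitrary choice for this illustration.

```python
import numpy as np

# two generators created with the same seed produce the same sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

u1 = rng1.random((5, 4))            # uniform samples in [0, 1)
u2 = rng2.random((5, 4))

n = rng1.standard_normal((4, 5))    # standard normal samples

print(np.array_equal(u1, u2))       # True
print(n.shape)                      # (4, 5)
```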

#### diagonal matrix

In [37]:
# a diagonal matrix
np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

#### zeros and ones

In [39]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [40]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
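
The trailing dots in the outputs above show that `zeros` and `ones` produce floating-point arrays by default. As a small sketch, the optional `dtype` argument can request another element type:

```python
import numpy as np

Z = np.zeros((3, 3))                # default element type is float64
Zi = np.zeros((3, 3), dtype=int)    # integer zeros instead
Ob = np.ones((2, 2), dtype=bool)    # an array filled with True

print(Z.dtype)    # float64
print(Zi.dtype)   # a platform-dependent integer type
print(Ob)
```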

### Creating arrays by reading data files

A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). To read data from such files into Numpy arrays we can use the `numpy.genfromtxt` function. By default it splits columns on whitespace (pass `delimiter=","` for CSV), which matches the following file:

In [17]:
!head stockholm_td_adj.dat

1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1
1800  1  4   -19.3   -19.3   -19.3 1
1800  1  5   -16.8   -16.8   -16.8 1
1800  1  6   -11.4   -11.4   -11.4 1
1800  1  7    -7.6    -7.6    -7.6 1
1800  1  8    -7.1    -7.1    -7.1 1
1800  1  9   -10.1   -10.1   -10.1 1
1800  1 10    -9.5    -9.5    -9.5 1


In [18]:
data = np.genfromtxt('stockholm_td_adj.dat')

In [43]:
data.shape

(77431, 7)

Using `numpy.savetxt`, we can store a Numpy array in a text file. Passing `delimiter=","` writes CSV (the default delimiter is a space):

In [20]:
np.savetxt("stockholm.csv", data, delimiter=",")
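
Since `stockholm_td_adj.dat` may not be at hand, here is a self-contained sketch of a write/read round trip. The small array and the temporary file name `example.csv` are made up for this illustration.

```python
import os
import tempfile
import numpy as np

# a tiny stand-in for the real data set (values are illustrative)
data = np.array([[1800, 1, 1, -6.1],
                 [1800, 1, 2, -15.4]])

# hypothetical temporary file, used only for this example
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# delimiter="," writes CSV; fmt controls how each number is formatted
np.savetxt(path, data, delimiter=",", fmt="%.1f")

# the same delimiter must be passed when reading the file back
loaded = np.genfromtxt(path, delimiter=",")
print(loaded.shape)   # (2, 4)
```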

## Manipulating arrays

### Indexing

We can index elements in an array using square brackets and indices:

In [59]:
# v is a vector, and has only one dimension, taking one index
v[0]

1

In [60]:
# M is a matrix, or a 2 dimensional array, taking two indices 
M[1,1]

5

We can obtain full rows or columns using `:` instead of an index: 

In [77]:
M[1,:] # row 1

array([4, 5, 6])

In [75]:
M[:,1] # column 1

array([2, 5])

We can assign new values to elements in an array using indexing:

In [80]:
M[0,0] = 10

In [81]:
M

array([[10,  2,  3],
       [ 4,  5,  6]])

In [82]:
# also works for full rows and columns
M[:,2] = 20

In [83]:
M

array([[10,  2, 20],
       [ 4,  5, 20]])

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]`, which extracts part of an array. The element at index `lower` is included, while the element at index `upper` is not:

In [89]:
A = np.array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

In [90]:
A[1:3]

array([2, 3])

Index slicing can be used to modify the original array:

In [92]:
A[0:2] = [10,20]

A

array([10, 20,  3,  4,  5])
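
The reason assignment through a slice changes the original array is that basic slicing returns a view on the same data rather than a copy. A minimal sketch (the variable names `view` and `independent` are illustrative):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

view = A[1:3]          # basic slicing returns a view on the same data
view[0] = 99           # so this also changes A
print(A)               # [ 1 99  3  4  5]

independent = A[1:3].copy()   # copy() gives an independent array
independent[0] = -1           # A is left untouched
print(A)               # [ 1 99  3  4  5]
```

When a slice is meant to be modified without touching the source array, calling `copy()` first is the safe choice.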

We can omit any of the three parameters in `M[lower:upper:step]`:

In [94]:
A[::] # omitting lower, upper, step: nothing changes

array([10, 20,  3,  4,  5])

In [96]:
A[::2] # using only step, lower and upper are the defaults (beginning and end of the array)

array([10,  3,  5])

Negative indices count from the end of the array:

In [46]:
A = np.array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

In [47]:
A[-1] # the last element in the array

5
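
Negative values can also be used inside slices, both as bounds and as the step. A brief sketch on the same array:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

print(A[-3:])    # the last three elements: [3 4 5]
print(A[:-1])    # everything except the last element: [1 2 3 4]
print(A[::-1])   # a negative step walks the array backwards: [5 4 3 2 1]
```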

Index slicing works the same way for multidimensional arrays, with one slice per dimension:

In [48]:
A = np.arange(12).reshape(3,4)
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [52]:
# to select the upper-left 2x3 block from the original matrix
A[0:2, 0:3]

array([[0, 1, 2],
       [4, 5, 6]])

### Fancy indexing

Fancy indexing is a technique that uses integer arrays or boolean arrays (**masks**) as indices into other arrays. If the index is a boolean mask, each element is selected where the mask is True and skipped where it is False.

This feature is very useful for conditionally selecting elements from an array, using for example comparison operators:

In [53]:
x = np.arange(0, 10, 1)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [54]:
mask = (x < 5)
mask

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])

In [29]:
x[mask]

array([0, 1, 2, 3, 4])
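
The example above covers boolean masks only. The sketch below adds the integer-array case, shows how masks can be combined, and uses a mask on the left-hand side of an assignment. The names `idx` and `y` are illustrative.

```python
import numpy as np

x = np.arange(0, 10, 1)

# integer-array indexing: pick the elements at positions 1, 3 and 8
idx = np.array([1, 3, 8])
print(x[idx])                 # [1 3 8]

# boolean masks can be combined with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence
mask = (x > 2) & (x < 7)
print(x[mask])                # [3 4 5 6]

# a mask can also be used on the left-hand side of an assignment
y = x.copy()
y[y % 2 == 1] = 0             # set the odd entries to zero
print(y)                      # [0 0 2 0 4 0 6 0 8 0]
```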

## Linear algebra

Vectorizing code is the key to writing efficient numerical calculations with Python/Numpy. This means that a program should be formulated in terms of matrix and vector operations as much as possible, rather than explicit Python loops over individual elements.
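
To make the idea concrete, here is a small sketch that computes a sum of squares twice: once with an explicit Python loop and once in vectorized form. The array length of 1000 is arbitrary. Both versions give the same result, but the vectorized one delegates the loop to numpy's compiled code.

```python
import numpy as np

x = np.arange(1000, dtype=float)

# explicit Python loop: one interpreted iteration per element
total_loop = 0.0
for value in x:
    total_loop += value * value

# vectorized form: the element-wise product and the sum run inside numpy
total_vec = np.sum(x * x)

print(np.isclose(total_loop, total_vec))   # True
```

Timing the two versions (for example with `%timeit` in a notebook) is a simple way to see the difference on your own machine.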

### Scalar operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers:

In [25]:
v1 = np.arange(0, 5)
v1

array([0, 1, 2, 3, 4])

In [33]:
# multiplication and addition examples on a 1-d array
v1 * 2, v1 + 1

(array([0, 2, 4, 6, 8]), array([1, 2, 3, 4, 5]))

In [67]:
# scalar operations apply to arrays of any dimension:
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [68]:
A + 1

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

### Element-wise operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise operations:

In [72]:
v1

array([0, 1, 2, 3, 4])

In [75]:
v1 + v1, v1 * v1

(array([0, 2, 4, 6, 8]), array([ 0,  1,  4,  9, 16]))
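
Element-wise operations also work on arrays of different shapes when those shapes are compatible; this is the broadcasting feature mentioned in the introduction. A minimal sketch, with the names `row` and `col` chosen for illustration:

```python
import numpy as np

A = np.arange(12).reshape(3, 4)          # shape (3, 4)
row = np.array([10, 20, 30, 40])         # shape (4,)
col = np.array([[100], [200], [300]])    # shape (3, 1)

# the row is added to every row of A
print(A + row)
# the column is added to every column of A
print(A + col)
```

In both cases the smaller array is conceptually stretched to shape (3, 4) before the addition, without the data actually being copied.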

### Matrix algebra

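We can cast array objects to the type `matrix`. For this type, the `*` operator performs matrix multiplication instead of element-wise multiplication (`+` and `-` remain element-wise). Note that the NumPy documentation no longer recommends the `matrix` class; regular arrays combined with the `@` operator are the preferred alternative.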

In [86]:
M = np.matrix(A)
M

matrix([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [93]:
# casting a 1-d array to the matrix type gives a 1x5 row matrix; transposing it gives a column vector
v = np.matrix(v1).T
v

matrix([[0],
        [1],
        [2],
        [3],
        [4]])

In [95]:
# inner product
v.T * v

matrix([[30]])

In [97]:
# of course, matrices can be transposed as well
M.T

matrix([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])

In [99]:
# .I gives the inverse of a square matrix; M is 3x4 here, so the result is its pseudo-inverse
M.I

matrix([[-0.3375    , -0.1       ,  0.1375    ],
        [-0.13333333, -0.03333333,  0.06666667],
        [ 0.07083333,  0.03333333, -0.00416667],
        [ 0.275     ,  0.1       , -0.075     ]])

In [109]:
# to compute the determinant of a square matrix, here the (singular) upper-left 3x3 block of A
M = np.matrix(A[0:3,0:3])
np.linalg.det(M)

0.0
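
The same computations can be done on plain arrays, without the `matrix` type, by using the `@` operator (Python 3.5 and later) and functions from `np.linalg`. The following is a sketch of that alternative, not a replacement for the cells above:

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
v1 = np.arange(0, 5)

inner = v1 @ v1                  # inner product of two 1-d arrays: 30
At = A.T                         # transpose, shape (4, 3)
G = A @ A.T                      # matrix product, shape (3, 3)
A_pinv = np.linalg.pinv(A)       # pseudo-inverse of the non-square A, shape (4, 3)
d = np.linalg.det(A[0:3, 0:3])   # determinant of the upper-left 3x3 block

print(inner, At.shape, G.shape, A_pinv.shape)
```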

The `np.linalg` package contains many useful functions for linear algebra. To check them out, run:

In [1]:
help(np.linalg)

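
As a small taste of what `np.linalg` offers, the sketch below solves a linear system, inverts a matrix and computes eigenvalues. The 2x2 system is made up for illustration.

```python
import numpy as np

# a small invertible system chosen for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)        # solves A x = b
A_inv = np.linalg.inv(A)         # inverse of a square, non-singular matrix
w, V = np.linalg.eig(A)          # eigenvalues w and eigenvectors (columns of V)

print(x)                         # solution of the system
print(np.allclose(A @ x, b))     # True
```

For solving `A x = b`, `np.linalg.solve` is generally preferable to computing `A_inv @ b`, since it avoids forming the inverse explicitly.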

## Further reading

For more information about numpy:
* https://docs.scipy.org/doc/numpy/user/quickstart.html
* https://docs.scipy.org/doc/numpy/reference/

# EXERCISE 5

Given a 1-dimensional vector v with n elements and a 2-dimensional matrix M with n columns, without using any built-in function, define functions to:
 - compute the inner product of two vectors v1 and v2
 - compute the product of M and v
 
TIP: you can use Python lists

In [124]:
v = np.array([2,1,3])
M = np.array([[1,2,3],[4,5,6],[7,8,9]])