# Short Numpy Tutorial

This is a very short introduction to the library numpy (http://www.numpy.org), focused on one of its basic data structures, the `ndarray`. Numpy is a cornerstone of the scientific Python ecosystem because it provides a common data structure on which many other packages are built.

![Python scientific ecosystem](http://luispedro.org/files/talks/2013/EuBIAS/figures/sciwheel.png)

To make this tutorial also work on Python 2, let's import some future features (on Python 3 these imports have no effect):

In [1]:
from __future__ import print_function, division

In [2]:
# np is the standard abbreviation for numpy in the code
# Even the numpy docs use it
import numpy as np

## What is an ndarray?

The `ndarray` is the biggest contribution of numpy. An ndarray is

- a regular grid of N dimensions,
- homogeneous by default (all the elements have the same type),
- stored in a contiguous block of memory, with element types that correspond to machine types (8-bit ints, 32-bit floats, 64-bit ints, ...),
- of type `float64` by default when created by functions such as `np.zeros`; `np.array` infers the type from the data it is given.
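
As a minimal sketch of these properties (the variable names `mixed` and `small` are illustrative and not part of the tutorial), the snippet below shows how `np.array` settles on a single element type and how many bytes each element occupies:

```python
import numpy as np

# Mixing ints and floats yields one common type for every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# Each element occupies a fixed number of bytes given by its dtype
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)     # 1 byte per element
print(small.nbytes)       # 3 bytes in total

# Functions such as np.zeros default to float64 when no dtype is given
print(np.zeros(3).dtype)  # float64
```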

### Building an array (inline)

We can build an array explicitly from a Python list:

In [3]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])

print(arr)

[[1.2 2.3 4. ]
 [1.2 3.4 5.2]
 [0.  1.  1.3]
 [0.  1.  0.2]]


**Exercise:** Check which parameters the `np.array()` function accepts, for example how to specify the type of the array.

Create a 2D matrix of 3 rows and 4 columns, initialized with values of your choice, and specify the type of the elements as float.

In [4]:
print(arr.dtype)

float64


**Exercise:** Check in the [numpy documentation](https://docs.scipy.org) what other ways there are to create an array.

Create a three-dimensional array of 100x100x3 elements of 32-bit integer type. How many ways of doing it can you find?

**Help**: [Array creation.](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html)
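
A few possible answers, as a sketch rather than an exhaustive list (the variable names are illustrative). All of these functions accept a `dtype` argument:

```python
import numpy as np

shape = (100, 100, 3)

# Filled with zeros, ones, or an arbitrary constant
zeros = np.zeros(shape, dtype=np.int32)
ones = np.ones(shape, dtype=np.int32)
sevens = np.full(shape, 7, dtype=np.int32)

# Built from a range of consecutive integers and then reshaped
counting = np.arange(100 * 100 * 3, dtype=np.int32).reshape(shape)

for candidate in (zeros, ones, sevens, counting):
    print(candidate.shape, candidate.dtype)
```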

### Inspecting array properties

In [5]:
print(arr.dtype)
print(arr.ndim)
print(arr.shape)

float64
2
(4, 3)


This array has type `float64` and 2 dimensions, and its shape is 4 rows by 3 columns.

When constructing an array, we can explicitly specify the type:

In [6]:
iarr = np.array([1,2,3], dtype='uint8')
print(iarr)

[1 2 3]


Arithmetic operations on an array must respect its type: an in-place operation stores the result in the existing array, so the result has to fit the array's `dtype`.

In [7]:
arr *= 2.5
iarr *= 2
print(arr)
print(iarr)

[[ 3.    5.75 10.  ]
 [ 3.    8.5  13.  ]
 [ 0.    2.5   3.25]
 [ 0.    2.5   0.5 ]]
[2 4 6]


**Exercise:** What is the problem with `iarr *= 2.5`?

In [8]:
iarr *= 2.5 

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'

Multiplying by 2.5 produces floats, which cannot be stored in a `uint8` array. The solution is to convert the array elements to float.

Check in [numpy](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html) the different types and how to convert an array to a different type.

In [9]:
iarr.astype(float)*2.5

array([ 5., 10., 15.])

Has the type of the `iarr` array changed?
Note that every numpy array has a fixed element type. If we do not take it into account, our code will not work!

In [10]:
print(iarr.dtype)
print(iarr)


uint8
[2 4 6]
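
The following sketch spells out what happened above: `astype` returns a converted copy, so the original array keeps its type unless the name is rebound. The name `farr` is illustrative.

```python
import numpy as np

iarr = np.array([2, 4, 6], dtype='uint8')

# astype returns a converted copy; iarr itself is unchanged
farr = iarr.astype(float)
farr *= 2.5
print(farr, farr.dtype)   # [ 5. 10. 15.] float64
print(iarr, iarr.dtype)   # [2 4 6] uint8

# To keep the converted values, rebind the name to the new array
iarr = iarr.astype(float)
iarr *= 2.5
print(iarr.dtype)         # float64
```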


## Indexing

### Slicing & Dicing

We can use Python's `[]` operator to slice and address the array. Below are some examples of how to read parts of a matrix:

In [11]:
print(arr) # The whole matrix
print(arr[0,0]) # row 0, column 0
print(arr[1]) # the whole row 1
print(arr[:,2]) # the whole column 2

[[ 3.    5.75 10.  ]
 [ 3.    8.5  13.  ]
 [ 0.    2.5   3.25]
 [ 0.    2.5   0.5 ]]
3.0
[ 3.   8.5 13. ]
[10.   13.    3.25  0.5 ]


### Working with slices of an array

Slices share memory with the original array! In the following code, the variable `view` corresponds to a slice of the array `arr`.

In [12]:
# arr[1] returns a view of row 1. If we modify the view, arr[1,0] is modified as well.

print("Before: {}".format(arr[1,0]))

# adding 100
view = arr[1]
view[0] += 100

print("After: {}".format(arr[1,0]))

Before: 3.0
After: 103.0


Note that plain assignment in Python never copies data: `view = arr[1]` binds a new name to a view that shares memory with `arr`.

**Exercise:** How can we avoid memory sharing between variables? We should use the `.copy()` method.

In [13]:
print("Before: {}".format(arr[1,0]))
view = arr[1].copy()  # copy only row 1; the result shares no memory with arr
view[0] += 100
print("After: {}".format(arr[1,0]))

Before: 103.0
After: 103.0
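
When it is unclear whether two arrays share memory, `np.shares_memory` can check directly. This is a small sketch with illustrative names (`row_view`, `row_copy`) and a fresh example array:

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(4, 3)

row_view = arr[1]          # basic slicing: a view onto arr's memory
row_copy = arr[1].copy()   # an independent copy of the same row

print(np.shares_memory(arr, row_view))   # True
print(np.shares_memory(arr, row_copy))   # False

# Writing through the copy leaves arr untouched; writing through the view changes arr
row_copy[0] = -1
row_view[1] = -2
print(arr[1])   # [ 3. -2.  5.]
```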


### Visual illustration of slicing

In [14]:
a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

![slicing](https://scipy-lectures.github.io/_images/numpy_indexing.png)

This image is taken from [scipy-lectures](https://scipy-lectures.github.io/intro/numpy/array_object.html), a more complete tutorial on numpy than what we have here.
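
To experiment with the array `a` defined above, here are a few slices of the kind the figure illustrates. The particular slices are chosen here for illustration:

```python
import numpy as np

a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

print(a[0, 3:5])      # row 0, columns 3 and 4: [3 4]
print(a[:, 2])        # every row, column 2: [ 2 12 22 32 42 52]
print(a[4:, 4:])      # bottom-right 2x2 block: rows [44 45] and [54 55]
print(a[2::2, ::2])   # rows 2 and 4, every second column: rows [20 22 24] and [40 42 44]
```

In `start:stop:step`, an omitted `start` or `stop` means "from the beginning" or "to the end", and `stop` itself is excluded.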

## [Boolean operations](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html)

An important subset of operations with numpy arrays concerns using logical operators to build boolean arrays. For example:


In [15]:
print(arr)
is_greater_one = (arr > 1.)
print(is_greater_one)

[[  3.     5.75  10.  ]
 [103.     8.5   13.  ]
 [  0.     2.5    3.25]
 [  0.     2.5    0.5 ]]
[[ True  True  True]
 [ True  True  True]
 [False  True  True]
 [False  True False]]


Set all elements of `arr` that are greater than 10 to -100:

In [16]:
arr[(arr>10)] = -100
print(arr)

[[   3.      5.75   10.  ]
 [-100.      8.5  -100.  ]
 [   0.      2.5     3.25]
 [   0.      2.5     0.5 ]]


Construct a second array `arr2` that contains only the values of `arr` that are strictly between 5 and 10:

In [17]:
arr2 = arr[(arr>5)&(arr<10)]
print(arr2)

[5.75 8.5 ]


Check what other [logical operations](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html) there are in numpy.
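
As a starting point, the sketch below uses a few of these routines on a small example array. The names `in_range`, `out_of_range` and `clipped` are illustrative:

```python
import numpy as np

arr = np.array([[3.0, 5.75, 10.0],
                [-100.0, 8.5, -100.0]])

# np.logical_and is the function form of the & operator on boolean arrays
in_range = np.logical_and(arr > 5, arr < 10)
print(in_range)

# ~ (or np.logical_not) inverts a boolean mask
out_of_range = ~in_range
print(out_of_range)

# np.any / np.all reduce a boolean array to a single truth value
print(np.any(arr < 0))   # True
print(np.all(arr < 0))   # False

# np.where chooses element-wise between two values; arr is not modified
clipped = np.where(arr < 0, 0.0, arr)
print(clipped)
```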

## [Basic functions on arrays](http://www.scipy-lectures.org/intro/numpy/operations.html)



In [18]:
arr.mean()

-13.666666666666666

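Also available: `max`, `min`, `sum`, `ptp` (peak-to-peak, i.e., the difference between the maximum and minimum values). In numpy 2.0 and later, `ptp` is only available as the function `np.ptp(arr)`, not as an array method.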

These functions can also work *axis-wise*:

In [19]:
arr.mean(axis=0)

array([-24.25  ,   4.8125, -21.5625])
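
The `axis` argument names the dimension that is collapsed. A small sketch with an illustrative 2x3 array `data` makes the difference between the two axes visible:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # shape (2, 3)

# axis=0 collapses the rows: one result per column
print(data.sum(axis=0))       # [5. 7. 9.]

# axis=1 collapses the columns: one result per row
print(data.sum(axis=1))       # [ 6. 15.]

# np.ptp is the function form of the peak-to-peak range (max - min)
print(np.ptp(data, axis=1))   # [2. 2.]
```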

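A useful trick for *saving* lines of code is to combine logical operations with these functions. Since `True` counts as 1 and `False` as 0, the mean of a boolean array is the fraction of elements that satisfy the condition: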

In [20]:
is_greater_one = (arr > 1)
print(is_greater_one)
print(is_greater_one.mean())

[[ True  True  True]
 [False  True False]
 [False  True  True]
 [False  True False]]
0.5833333333333334
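
The same idea gives counts as well as fractions. This is a short sketch on an illustrative array `values`:

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 0.0])
mask = values > 1

print(mask.sum())               # 2 elements are greater than 1
print(np.count_nonzero(mask))   # the same count
print(mask.mean())              # 0.5, the fraction of such elements
print(values[mask].mean())      # 2.0, the mean of only those elements
```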


## Broadcasting

Broadcasting lets numpy combine arrays of different shapes. For example, you can apply an operation with a vector to every row or every column of an array:

In [21]:
print(arr)
print("Now adding [1,1,0] to *every row*")
print()
arr += np.array([1,1,0])
print(arr)

[[   3.      5.75   10.  ]
 [-100.      8.5  -100.  ]
 [   0.      2.5     3.25]
 [   0.      2.5     0.5 ]]
Now adding [1,1,0] to *every row*

[[   4.      6.75   10.  ]
 [ -99.      9.5  -100.  ]
 [   1.      3.5     3.25]
 [   1.      3.5     0.5 ]]


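Add the vector [1,2,3,4] to each column. A vector of length 4 cannot be broadcast directly against rows of length 3, so we add it to the transposed array, which is a view that shares memory with `arr`: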

In [22]:
print(arr)
arr3 = arr.transpose()
arr3 += np.array([1,2,3,4])
arr = arr3.transpose()
print(arr)

[[   4.      6.75   10.  ]
 [ -99.      9.5  -100.  ]
 [   1.      3.5     3.25]
 [   1.      3.5     0.5 ]]
[[  5.     7.75  11.  ]
 [-97.    11.5  -98.  ]
 [  4.     6.5    6.25]
 [  5.     7.5    4.5 ]]
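
The same column-wise addition can also be sketched without transposing, by giving the vector an explicit second dimension of length 1. The names `per_row` and the zero-filled example array are illustrative:

```python
import numpy as np

arr = np.zeros((4, 3))
per_row = np.array([1, 2, 3, 4])   # shape (4,)

# Shape (4,) cannot broadcast against (4, 3): the trailing dimensions are 4 and 3.
# Indexing with np.newaxis gives shape (4, 1), whose single column is
# stretched across all 3 columns of arr.
arr += per_row[:, np.newaxis]
print(arr)
```

Here every element of row `i` ends up equal to `per_row[i]`, since the array started out as zeros.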


The exact [rules of how broadcasting works](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) are a bit complex to explain, but it generally works as expected. For example, if each row of your data is a sample and each column is a different type of measurement, you can easily remove the mean of each measurement like this:

In [23]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])
print('The original matrix is:\n ', arr)
print()
print('The average value per column is: ', arr.mean(0))
print('The average value per row is: ', arr.mean(1))
print()
# make two copies of arr, since each one is modified in place below
arr_aux1 = arr.copy()
arr_aux2 = arr.copy()

# subtract the average values from the whole matrix, first per column and then per row

# arr.mean(0) has one value per column, so it broadcasts over every row
arr_aux1 -= arr.mean(0)
print('The matrix after subtracting the average value of each column is: ')
print(arr_aux1)

# arr.mean(1) has one value per row, so we subtract it through the transposed view
arr_aux2_t = arr_aux2.transpose()
arr_aux2_t -= arr.mean(1)
print('The matrix after subtracting the average value of each row is: ')
print(arr_aux2)
print()

# The normalization is performed by dividing the matrix by its mean.
print(arr.mean())
print('The original matrix after normalizing is:\n ', arr/arr.mean())

The original matrix is:
  [[1.2 2.3 4. ]
 [1.2 3.4 5.2]
 [0.  1.  1.3]
 [0.  1.  0.2]]

The average value per column is:  [0.6   1.925 2.675]
The average value per row is:  [2.5        3.26666667 0.76666667 0.4       ]

The matrix after subtracting the average value of each column is: 
[[ 0.6    0.375  1.325]
 [ 0.6    1.475  2.525]
 [-0.6   -0.925 -1.375]
 [-0.6   -0.925 -2.475]]
The matrix after subtracting the average value of each row is: 
[[-1.3        -0.2         1.5       ]
 [-2.06666667  0.13333333  1.93333333]
 [-0.76666667  0.23333333  0.53333333]
 [-0.4         0.6        -0.2       ]]

1.7333333333333332
The original matrix after normalizing is:
  [[0.69230769 1.32692308 2.30769231]
 [0.69230769 1.96153846 3.        ]
 [0.         0.57692308 0.75      ]
 [0.         0.57692308 0.11538462]]

## Footnotes

[homogeneous]: There is a loophole to get heterogeneous arrays, namely an array of `object`. Then, you can store any Python object. This comes at the cost of decreased computational efficiency (both in terms of processing time and memory usage).