In [1]:
import numpy as np

In [2]:
# Use ? to get help (see the docstring) about commands in Jupyter notebook
?np.array

Docstring:
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    __array__ method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array.  If not given, then the type will
    be determined as the minimum type required to hold the objects in the
    sequence.
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy will
    only be made if __array__ returns a copy, if obj is a nested sequence,
    or if a copy is needed to satisfy any of the other requirements
    (`dtype`, `order`, etc.).
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be i

In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. Numpy's array object is called `ndarray`, for N-dimensional array.



In [3]:
a = [[1,2], [3,4]]
a

[[1, 2], [3, 4]]

In [4]:
import matplotlib.pyplot as plt # For making plots. Ignore this for now.

## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example from

* Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* reading data from files

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [5]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [6]:
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

The vector has 1 dimension, the matrix has 2. We learn this with `numpy.ndim`.

In [7]:
np.ndim(v), np.ndim(M)

(1, 2)

The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

In [8]:
type(v), type(M)

(numpy.ndarray, numpy.ndarray)

The difference between the `v` and `M` arrays is their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.

In [9]:
v.shape

(4,)

In [10]:
M.shape

(2, 2)

The number of elements in the array is available through the `ndarray.size` property:

In [11]:
v.size

4

In [12]:
M.size

4

Equivalently, we could use the functions `numpy.shape` and `numpy.size`:

In [13]:
np.shape(M)

(2, 2)

In [14]:
np.size(M)

4

So far the `numpy.ndarray` looks very much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general. They can contain any kind of object and are dynamically typed. They do not support mathematical operations such as matrix multiplication or dot products. Implementing such operations for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
* Numpy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
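
As a small illustration of the first and last points, the sketch below applies the same operators to a list and to an array. The variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# For a list, * means repetition and + means concatenation
list_times_two = values * 2        # [1, 2, 3, 1, 2, 3]
list_plus_list = values + values   # [1, 2, 3, 1, 2, 3]

# For an array, the same operators act element by element
arr_times_two = arr * 2            # array([2, 4, 6])
arr_plus_arr = arr + arr           # array([2, 4, 6])

print(list_times_two, arr_times_two)
```

The list result is longer than the input, while the array result has the same shape as the input with every element doubled.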

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:

In [15]:
M.dtype

dtype('int32')

We get an error if we try to assign a value of the wrong type to an element in a numpy array:

In [16]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [None]:
M = np.array([[1, 2], [3, 4]], dtype=complex)

M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128`. Note that `float128` is only available on some platforms.
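
A minimal sketch of what the bit size means in practice: `itemsize` reports the number of bytes per element, and `astype` converts an existing array to another type. The array names are illustrative.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
large = np.array([1, 2, 3], dtype=np.int64)

# itemsize is the number of bytes used to store one element
print(small.itemsize, large.itemsize)   # 2 8

# astype returns a new array converted to the requested type
as_float = small.astype(np.float64)
print(as_float.dtype)                   # float64
```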

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

#### arange

In [None]:
# create a range

x = np.arange(0, 10, 0.5) # arguments: start, stop, step. Like the function range for lists!
x

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [None]:
x = np.arange(-1, 1, 0.1) # Here we can use floats and non-integer steps. You could not do this with Python's built-in range
x

array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])

#### linspace and logspace

In [None]:
# using linspace, both end points ARE included
np.linspace(0, 10, 25)

array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])

In [None]:
np.logspace(0, 1, 5, base=10)

array([ 1.        ,  1.77827941,  3.16227766,  5.62341325, 10.        ])

In [None]:
import math
np.logspace(0, 1, 5, base=math.exp(1))

array([1.        , 1.28402542, 1.64872127, 2.11700002, 2.71828183])
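
One way to read `logspace` is as `base` raised to the values that `linspace` would return. The following sketch checks that relationship for the base-10 call above; the variable names are illustrative.

```python
import numpy as np

exponents = np.linspace(0, 1, 5)               # 0, 0.25, 0.5, 0.75, 1
by_hand = 10.0 ** exponents                    # raise the base to each exponent
with_logspace = np.logspace(0, 1, 5, base=10)

# The two arrays agree up to floating-point rounding
print(np.allclose(by_hand, with_logspace))
```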

#### Random data

In [None]:
# from numpy import random # numpy has also its set of random functions

In [None]:
# uniform random numbers in [0,1) (1 itself is excluded)
np.random.rand(5,5)

array([[0.43021793, 0.82033851, 0.89678771, 0.81017829, 0.29565113],
       [0.58191595, 0.75487943, 0.93924612, 0.86161537, 0.99752068],
       [0.40506814, 0.37168366, 0.35684993, 0.33135694, 0.22026233],
       [0.53827661, 0.89653147, 0.57683356, 0.33117457, 0.4399993 ],
       [0.9739789 , 0.62902052, 0.99634883, 0.67558085, 0.76076692]])

Standard normally distributed random numbers, with $\mu = 0$ and $\sigma^2 = 1$:

In [None]:
np.random.randn(5,5)

array([[-0.47274164,  1.4669506 , -0.74965209,  0.97001866, -1.31370017],
       [-0.98727804, -0.62631531,  0.69748105,  2.3346479 , -0.69485054],
       [-1.37791058,  2.30260866,  0.02043416, -1.33827798,  0.26637643],
       [-1.14752437, -0.67774987,  1.26661883,  0.70847312, -1.38208694],
       [-1.02617664, -1.26352223, -0.49281733,  0.11709263, -0.077243  ]])

There is a huge variety of functions that you can use: 
<img src="random1.png" width="700px"/>



and you can generate samples from all the major distributions.
Have a look at the documentation at https://docs.scipy.org/doc/numpy-1.14.1/reference/routines.random.html:
<img src="random2.png" width="700px"/>
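
Newer NumPy versions (1.17 and later) also offer a generator object created with `np.random.default_rng`. The sketch below assumes such a version is installed; the seed value 42 is arbitrary. Seeding the generator makes the random numbers reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
uniform = rng.random((2, 3))            # uniform in [0, 1)
normal = rng.standard_normal((2, 3))    # mean 0, variance 1

# A second generator with the same seed produces the same numbers
same_seed = np.random.default_rng(seed=42)
repeat = same_seed.random((2, 3))

print(np.array_equal(uniform, repeat))
```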

#### zeros and ones

You can also create arrays filled with the same element:

In [None]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

The function `numpy.empty` initializes an "empty" array.

In [None]:
np.empty((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

 <span style="color:red">It is not safe to assume that np.empty will return an array of all zeros. In many cases it will return uninitialized garbage values!</span>

### Broadcasting

The term *broadcasting* refers to operations between arrays of different shapes or even types. For example adding a scalar to each element of a matrix. Being able to do this very fast and in a human-readable way is one of the strengths of numpy.

In [None]:
np.zeros((3,3)) + 2

array([[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]])

In [None]:
np.ones((3,3)) * 4

array([[4., 4., 4.],
       [4., 4., 4.],
       [4., 4., 4.]])
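
Broadcasting also works between two arrays, as long as their shapes are compatible. In this sketch (array names are illustrative) a row of shape `(3,)` and a column of shape `(3, 1)` are combined into a `(3, 3)` grid.

```python
import numpy as np

row = np.array([0, 1, 2])            # shape (3,)
col = np.array([[0], [10], [20]])    # shape (3, 1)

# The (3,) array is treated as (1, 3); both are then stretched to (3, 3)
grid = col + row
print(grid)
# [[ 0  1  2]
#  [10 11 12]
#  [20 21 22]]
```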

## Manipulating arrays
### Indexing
We can index elements in an array using square brackets and indices:

In [None]:
v = np.array([1,2,3,4])
M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])     
# v is a vector, and has only one dimension, taking one index
M

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
# M is a matrix, or a 2 dimensional array, taking two indices ;
M[1,1]

5

<img src="indexing.png" width="400px"/>

If we omit an index of a multidimensional array, it returns the whole row (or, in general, an (N-1)-dimensional array):

In [None]:
M

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
M[0]

array([1, 2, 3])

In [None]:
M[:,0]

array([1, 4, 7])

*** 
### Quick exercise
If I have a list of lists, what is the syntax to access the first element of the first list? Pay attention to the difference with arrays!

In [None]:
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # How to select the number 1 in this list of lists?

In [None]:
a[0][0]

1

***

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:

In [None]:
A = np.array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

It works in the same way as for **lists**.

In [None]:
A[1:3]

array([2, 3])

In [None]:
A[1:3] = [-2,-3]
A

array([ 1, -2, -3,  4,  5])

In [None]:
A[::] # lower, upper, step all take the default values

array([ 1, -2, -3,  4,  5])

In [None]:
A[::2] # step is 2, lower and upper default to the beginning and end of the array

array([ 1, -3,  5])

In [None]:
A[:3] # first three elements

array([ 1, -2, -3])

In [None]:
A[3:] # elements from index 3

array([4, 5])

Negative indices count from the end of the array (positive indices from the beginning):

In [None]:
A[-1:]

array([5])

In [None]:
A[-2:]

array([4, 5])

Index slicing works exactly the same way for multidimensional arrays:

In [None]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])

A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [None]:
# a block from the original array
A[1:4, 1:4]

array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

<img src="files/slicing.png" width="400px"/>

The devil is in the details!

In [None]:
A[4:,:]


array([[40, 41, 42, 43, 44]])

In [None]:
A[4,:]

array([40, 41, 42, 43, 44])

### An index slice only creates a view!
An important distinction from lists is that array slices are *views* on the original array. This means that  <span style="color:red">the data is not copied, and any modifications to the view will be reflected in the source array!</span>

In [None]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
arr_part = arr[5:8]
arr_part

array([5, 6, 7])

In [None]:
arr_part[:] = 666
arr

array([  0,   1,   2,   3,   4, 666, 666, 666,   8,   9])

In [None]:
arr[:] = 999
arr_part

array([999, 999, 999])

`arr_part` is just a view. If we want a new object that is independent of the original, we need to make a copy with `.copy()`. Let's try again:

In [None]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
arr_slice = arr[5:8].copy()
arr_slice

array([5, 6, 7])

In [None]:
arr_slice[:] = 666
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
arr[:] = 999
arr_slice

array([666, 666, 666])

<span style="color:red">If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array; for example `arr[5:8].copy()`</span>
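
If you are unsure whether two arrays refer to the same data, `np.shares_memory` and the `base` attribute can tell you. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]            # a slice: shares data with arr
copy = arr[5:8].copy()     # an explicit copy: owns its own data

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(view.base is arr)              # True
print(copy.base is None)             # True
```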

### Fancy indexing
Fancy indexing is the name for when an array or list is used in place of an index:

In [None]:
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [None]:
row_indices = [1, 2, -1]
A[row_indices,:] # this selects the second, third and last rows of A, and all their columns

array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [40, 41, 42, 43, 44]])

In [None]:
A[row_indices] #this is equivalent to the expression above

array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [40, 41, 42, 43, 44]])

<span style="color:red">Fancy indexing, unlike slicing, always copies the data into a new array.</span>
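
The difference can be checked directly. This sketch rebuilds a 5x5 array (different values from `A` above, chosen only for illustration) and modifies the result of a fancy index and of a slice.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)

picked = A[[1, 2, -1]]        # fancy indexing returns a new array
picked[:] = -1
after_fancy = A[1, 0]         # still 5: A was not touched

sliced = A[1:3]               # slicing returns a view
sliced[:] = -1
after_slice = A[1, 0]         # now -1: A changed through the view

print(after_fancy, after_slice)
```

Note that this applies to reading with a fancy index. Assigning through one, as in `A[[1, 2]] = 0`, does modify `A` itself.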

In [None]:
row_indices = [1, 2, 3]
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [None]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]

array([11, 22, 34])

*** 
### Quiz
If this is matrix `A`:  
`[[ 0,  1,  2,  3,  4],  
  [10, 11, 12, 13, 14],  
  [20, 21, 22, 23, 24],  
  [30, 31, 32, 33, 34],  
  [40, 41, 42, 43, 44]]`

Then what is the output of `A[[1,2,3], [1,-1,2]]`?
***

In [None]:
A[[1,2,3], [1,-1,2]]

array([11, 24, 32])

We can also use index *masks*: if the index mask is a NumPy array of data type `bool`, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element:

In [None]:
B = np.array([n for n in range(5)])
B

array([0, 1, 2, 3, 4])

In [None]:
row_mask = np.array([True, False, True, False, False])
B[row_mask]

array([0, 2])

In [None]:
# same thing
row_mask = np.array([1,0,1,0,0], dtype=bool) #1 is true, 0 is false
B[row_mask]

array([0, 2])

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [None]:
x = np.arange(1, 7, 0.5)
x

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

In [None]:
mask = (3 < x) & (x < 6.5)
# Always use parentheses for mask conditions. Only then can you join them with & and | 

mask

array([False, False, False, False, False,  True,  True,  True,  True,
        True,  True, False])

In [None]:
x[mask]

array([3.5, 4. , 4.5, 5. , 5.5, 6. ])

 <span style="color:red">The Python keywords `and` and `or` do not work with boolean arrays. Use `&` and `|` instead.</span>
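
The following sketch shows what happens when `and` is used anyway, and two equivalent ways of writing the mask correctly. The names `raised`, `mask_and` and `mask_func` are illustrative.

```python
import numpy as np

x = np.arange(1, 7, 0.5)

# `and` needs a single True/False value, which an array cannot provide
try:
    mask = (3 < x) and (x < 6.5)
    raised = False
except ValueError:
    raised = True

mask_and = (3 < x) & (x < 6.5)                 # element-wise operator
mask_func = np.logical_and(3 < x, x < 6.5)     # equivalent function form

print(raised, np.array_equal(mask_and, mask_func))
```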

## Functions for extracting data from arrays and creating arrays

### where
The index mask can be converted to position indices using the `where` function:

In [None]:
x = np.arange(1, 7, 0.5)
mask = (3 < x) & (x < 6.5)
indices = np.where(mask)
indices # Note that this is a tuple with one array element

(array([ 5,  6,  7,  8,  9, 10], dtype=int64),)

In [None]:
x[indices] # this indexing is equivalent to the fancy indexing x[mask]

array([3.5, 4. , 4.5, 5. , 5.5, 6. ])

In [None]:
x

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

Setting values with boolean arrays works in a common-sense way. To set all values smaller than 3 to 0 we need only do:

In [None]:
x[x < 3] = 0
x

array([0. , 0. , 0. , 0. , 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

## File I/O

### Comma-separated values (CSV)

A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). 

In [None]:
!head stockholm_temperatures.dat # head is a Unix shell command that displays the beginning of a file (it is not available in the Windows command prompt)

'head' is not recognized as an internal or external command,
operable program or batch file.


The file `stockholm_temperatures.dat` contains the temperature in Stockholm from 1800 to 2011. The first three columns are the year, month and day, respectively, and the last column is the temperature.

To read data from such files into Numpy arrays we can use the `numpy.loadtxt` function. For example:

In [None]:
data = np.loadtxt('stockholm_temperatures.dat') 

In [None]:
data.shape

(77431, 4)

In [None]:
# Ignore this code for now - we will explain it later.
fig, ax = plt.subplots(figsize=(14,4))
ax.plot(data[:,0]+data[:,1]/12.0+data[:,2]/365, data[:,3])
ax.set_title('Temperatures in Stockholm')
ax.set_xlabel('Year')
ax.set_ylabel('Temperature (C)')

ModuleNotFoundError: No module named 'matplotlib'

Using `numpy.savetxt` we can store a Numpy array in a text file. By default the values are separated by spaces; pass `delimiter=','` to get comma-separated output:

In [None]:
M = np.random.rand(3,3)
M

array([[0.12114907, 0.82126095, 0.90076284],
       [0.98117124, 0.94774101, 0.01591459],
       [0.15731425, 0.33745763, 0.90968523]])

In [None]:
np.savetxt("random-matrix.csv", M)

In [None]:
!head random-matrix.csv

'head' is not recognized as an internal or external command,
operable program or batch file.


In [None]:
np.savetxt("random-matrix.csv", M, fmt='%.5f') # fmt specifies the format
!head random-matrix.csv
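
Since `head` is not available in every shell, the file can also be inspected from Python. This is a minimal sketch that writes a small, made-up matrix to a temporary file with commas as separators and reads it back; the file name `matrix.csv` is arbitrary.

```python
import os
import tempfile
import numpy as np

M = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.csv")
    # delimiter="," gives true comma-separated output
    np.savetxt(path, M, fmt="%.5f", delimiter=",")
    with open(path) as f:
        first_line = f.readline().strip()
    # loadtxt needs the same delimiter to parse the file
    loaded = np.loadtxt(path, delimiter=",")

print(first_line)   # 1.50000,2.50000
```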

### Numpy's native file format (uncompressed)

`np.save` and `np.load` are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension `.npy`.

In [None]:
np.save("random-matrix.npy", M)
!file random-matrix.npy # file is a shell command that displays the file type

'file' is not recognized as an internal or external command,
operable program or batch file.


In [None]:
np.load("random-matrix.npy")

array([[0.12114907, 0.82126095, 0.90076284],
       [0.98117124, 0.94774101, 0.01591459],
       [0.15731425, 0.33745763, 0.90968523]])

### Numpy's native file format (compressed)

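You can save multiple arrays in a zip archive using `np.savez`, passing the arrays as keyword arguments. The archive written by `np.savez` is not compressed; use `np.savez_compressed` if you want compression: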

In [None]:
np.savez('array_archive.npz', a=M, b=data)

When loading an .npz file, you get back a dict-like object which loads the individual arrays:

In [None]:
arch = np.load('array_archive.npz')
arch['a']
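
A self-contained sketch of the round trip, using `np.savez_compressed` and an in-memory buffer instead of a file on disk so that nothing is left behind. The arrays `a` and `b` are made up for this example.

```python
import io
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 5)

buffer = io.BytesIO()
np.savez_compressed(buffer, a=a, b=b)   # keyword names become the archive keys
buffer.seek(0)                           # rewind before reading

with np.load(buffer) as arch:
    names = sorted(arch.files)           # names of the stored arrays
    a_loaded = arch["a"]
    b_loaded = arch["b"]

print(names)   # ['a', 'b']
```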

### Special values

In [None]:
a = np.arange(4)
a = a/0 # [0/0 1/0 2/0 3/0]
a

array([nan, inf, inf, inf])

In [None]:
np.nan == np.nan # nan is not equal to anything, not even nan. Seemingly breaks the law of identity!

False

In [None]:
np.isnan(a) # nan is nan

array([ True, False, False, False])

In [None]:
np.isinf(a) # nan is not infinite

array([False,  True,  True,  True])

In [None]:
np.isfinite(a) # neither nan nor inf is finite

array([False, False, False, False])
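
Two related tools may be useful here: `np.errstate` silences the division warnings inside a block, and the `nan`-aware functions such as `np.nanmean` skip `nan` values. A minimal sketch with made-up readings:

```python
import numpy as np

a = np.arange(4)
# Suppress the divide-by-zero and invalid-value warnings for this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = a / 0                     # array([nan, inf, inf, inf])

readings = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(readings)        # nan: one nan poisons the result
nan_aware = np.nanmean(readings)      # 2.0: the nan is ignored

print(plain_mean, nan_aware)
```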

#### Selecting subsets of a real data set

We can select subsets of the data in an array using indexing, fancy indexing, and the other methods of extracting data from an array (described above), and run computations later.

For example, if we want to calculate the average temperature in 1971 only, we can create a mask in the following way:

In [None]:
# reminder, the temperature dataset is stored in the data variable:
np.shape(data)

(77431, 4)

In [None]:
mask = (data[:,0] == 1971)      # mask is a boolean array
print(mask)
data[mask,0]


[False False False ... False False False]


array([1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971., 1971.,
       1971., 1971.,

In [None]:
data[mask,3]    # <= 3 is temperature

array([ -9.2,  -6.5,  -4.7,  -0.3,  -6.8, -11.6,  -1.9,   4.3,   4.6,
         5.9,   4.8,   1.9,   0.8,  -0.9,  -1.5,  -1.5,  -3.9,   2.4,
         1.3,   2. ,   2. ,   1.8,   2.5,   3.7,   4.9,   3.7,   1.7,
        -2.1,  -5.1,  -6.3,   0.2,  -2.7,  -4. ,  -0.6,   1.6,   3.4,
         2.4,  -0.4,  -1.9,  -4.8,  -1.9,   2.5,   2.4,   3.1,   2.5,
         1.7,   1.5,   0.5,   0. ,  -0.3,   0.5,   1.1,   1.1,  -2.1,
         0.6, -11.9,  -8.2,  -8.3,  -7.9,  -7.7,  -8.5, -10.8, -13.5,
        -8.6,  -5.4,  -0.5,   2.1,   0. ,  -4.6,  -3.5,  -3. ,  -0.8,
         0.4,  -0.1,   0.8,   1.6,   1.7,   1.8,   1.3,   0. ,  -2.2,
        -4.4,  -0.3,   3.6,   3.3,  -0.8,  -1.4,   1. ,  -0.5,   0.6,
         4.2,   4. ,   3.6,  -0.7,   0.1,   0.9,   2.5,   5.6,   3.7,
         4.2,   6.1,   5.8,   2.7,   2.3,   3. ,   4.6,   5.2,   6.5,
         9. ,   9.8,   7.6,   3.3,   2.1,   0.5,  -0.3,   0.1,   2.2,
         3.6,   1.2,   3.3,   6.4,   9.9,   8.5,   7.4,   9.4,  12.2,
        14.7,  13.2,

In [None]:
print("This is the mean temperature in Stockholm in 1971: "+str(np.mean(data[mask,3])))

This is the mean temperature in Stockholm in 1971: 6.9301369863013695


If we are interested in the average temperature only in a particular month, say February, then we can create an index mask and use it to select only the data for that month:

In [None]:
np.unique(data[:,1]) # the month column takes values from 1 to 12       <= 1 is month

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

In [None]:
mask_feb = (data[:,1] == 2)

In [None]:
# the temperature data is in column 3
np.mean(data[mask_feb,3])

-3.212109570736596
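
Masks can be combined with `&` to select on several columns at once. The sketch below uses a few hypothetical rows in the same column layout (year, month, day, temperature) rather than the real data set, so the numbers are made up.

```python
import numpy as np

# Hypothetical rows: year, month, day, temperature
sample = np.array([
    [1971, 1, 31, -4.0],
    [1971, 2,  1, -2.0],
    [1971, 2,  2, -6.0],
    [1972, 2,  1,  1.0],
])

# Rows that are in 1971 AND in February
feb_1971 = (sample[:, 0] == 1971) & (sample[:, 1] == 2)
mean_feb_1971 = sample[feb_1971, 3].mean()   # (-2.0 + -6.0) / 2

print(mean_feb_1971)   # -4.0
```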

#### More functions: sum, prod, cumsum, and cumprod

In [None]:
d = np.arange(0, 4)
d

array([0, 1, 2, 3])

In [None]:
# sum up all elements
np.sum(d)

6

In [None]:
# product of all elements
d+1     #[1, 2, 3, 4]   
np.prod(d+1)

24

In [None]:
# cumulative sum
np.cumsum(d)

array([0, 1, 3, 6])

In [None]:
# cumulative product
np.cumprod(d+1)

array([ 1,  2,  6, 24])

When you have two-dimensional objects, you can specify along which dimension (axis) you want to perform the sum (or mean, or maximum, etc.):
<img src="sum_axis.png">

In [None]:
x = np.array([[1, 1], [2, 2]])
print(x.sum(axis=0))   # columns (first dimension)      [3 3]
print(x[:, 0].sum(), x[:, 1].sum())                     # 3 3
print(x.sum(axis=1))   # rows (second dimension)        [2 4]
print(x[0, :].sum(), x[1, :].sum())                     # 2 4

[3 3]
3 3
[2 4]
2 4
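
The same `axis` argument works for other reductions. A short sketch with a made-up matrix, also showing `keepdims=True`, which keeps the reduced axis with length 1 so that the result still broadcasts against the original:

```python
import numpy as np

x = np.array([[1, 5], [7, 3]])

col_max = x.max(axis=0)                   # maximum of each column: [7 5]
row_mean = x.mean(axis=1)                 # mean of each row: [3. 5.]
row_sum = x.sum(axis=1, keepdims=True)    # shape (2, 1): [[6], [10]]

print(col_max, row_mean, row_sum.shape)
```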


## Reshaping, resizing and stacking arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays.

In [None]:
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [None]:
n, m = A.shape

In [None]:
B = A.reshape((1,n*m))
B

array([[ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30,
        31, 32, 33, 34, 40, 41, 42, 43, 44]])

In [None]:
B[0,0:5] = 5 # modify the array

B

array([[ 5,  5,  5,  5,  5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30,
        31, 32, 33, 34, 40, 41, 42, 43, 44]])

In [None]:
A # and the original variable is also changed. B is only a different view of the same data

array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

We can also use the function `flatten` to make a higher-dimensional array into a vector. But this function creates a copy of the data.

In [None]:
B = A.flatten()
B

array([ 5,  5,  5,  5,  5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
       32, 33, 34, 40, 41, 42, 43, 44])

In [None]:
B[0:5] = 10
B

array([10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
       32, 33, 34, 40, 41, 42, 43, 44])

In [None]:
A # now A has not changed, because B's data is a copy of A's, not referring to the same data

array([[ 5,  5,  5,  5,  5],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
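
Two related conveniences, sketched here on a small made-up array: `ravel` flattens like `flatten` but returns a view when it can, and passing `-1` to `reshape` lets NumPy infer that dimension.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

flat_view = A.ravel()       # a view here, because A is contiguous in memory
flat_copy = A.flatten()     # always a copy
auto = A.reshape(-1, 2)     # -1 is inferred from the size: shape (3, 2)

print(np.shares_memory(A, flat_view))   # True
print(np.shares_memory(A, flat_copy))   # False
print(auto.shape)                       # (3, 2)
```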

## Adding a new dimension: newaxis

With `newaxis`, we can insert new dimensions in an array, for example converting a vector to a column or row matrix:

In [17]:
v = np.array([1,2,3])

In [18]:
v.shape

(3,)

In [19]:
# make a column matrix of the vector v
v[:,np.newaxis]

array([[1],
       [2],
       [3]])

In [20]:
# column matrix
v[:,np.newaxis].shape

(3, 1)

In [21]:
# row matrix
v[np.newaxis,:].shape

(1, 3)

## Stacking and repeating arrays

Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones:

### tile and repeat

In [22]:
a = np.array([[1, 2], [3, 4]])

In [23]:
# repeat each element 3 times
np.repeat(a, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [24]:
# tile the matrix 3 times 
np.tile(a, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])

### concatenate

In [25]:
b = np.array([[5, 6]])

In [26]:
np.concatenate((a, b), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [27]:
np.concatenate((a, b.T), axis=1)

array([[1, 2, 5],
       [3, 4, 6]])

### hstack and vstack

In [28]:
np.vstack((a,b))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [29]:
np.hstack((a,b.T))

array([[1, 2, 5],
       [3, 4, 6]])

## Using arrays in conditions

When using arrays in conditions, for example `if` statements and other boolean expressions, one needs to use `any` or `all`, which require that any or all elements in the array evaluate to `True`:

In [30]:
M

array([[1, 2],
       [3, 4]])

In [None]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

In [None]:
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")

## Using numpy on a photo

Look at each other or do normal people things. Don't look into the cam. 📷

In [31]:
from skimage import io
photo = io.imread('notintothematrix.jpg')
plt.imshow(photo);

ModuleNotFoundError: No module named 'skimage'

In [32]:
photo.shape

NameError: name 'photo' is not defined

In [None]:
np.ndim(photo)

In [None]:
plt.imshow(photo[100:520, 350:820]);

In [None]:
plt.imshow(photo[::-1]);

In [None]:
plt.imshow(photo[::2, ::2]);

In [None]:
# with three arguments, np.where(condition, a, b) takes elements from a where condition is True and from b where it is False
photo_masked = np.where(photo > 155, photo, 0) 
plt.imshow(photo_masked);

In [None]:
photo_water = np.copy(photo)
wlevel = int(photo.shape[0]*0.7)
photo_water[wlevel:, :, :-1] = 0
plt.imshow(photo_water);
io.imsave('beforeidsp.jpg', photo_water)

Advanced optional exercise for home: Try to understand what the one line of code below does.  
See also: https://tannerhelland.com/2011/10/01/grayscale-image-algorithm-vb6.html

In [None]:
photo_gs = np.dstack([photo.dot([0.2126, 0.7152, 0.0722])] * 3).astype(np.uint8)

plt.imshow(photo_gs);

### Matrixification!

<img src="matrixformula.png" width="400px"/>

Put sunglasses on! Look cool and into the cam. 📷

In [None]:
photo2 = io.imread('intothematrix.jpg')
plt.imshow(photo2);

In [None]:
photo_matrix = np.array(photo2, dtype = np.float32)

# Normalize: [0,255] -> [0,1] 
photo_matrix /= 255
# Apply the matrix formula
photo_matrix[:, :, 0] **= (3/2)
photo_matrix[:, :, 1] **= (4/5)
photo_matrix[:, :, 2] **= (3/2)

# De-normalize: [0,1] -> [0,255]
photo_matrix = (photo_matrix*255).astype(np.uint8)

f = plt.figure(figsize=(20,20))
plt.imshow(photo_matrix);

In [None]:
io.imsave('inthematrix.jpg', photo_matrix)

***
<center>Everything below: Read at home.</center>

## Linear algebra

Vectorizing code is the key to writing efficient numerical calculations with Python/Numpy. That means that as much of a program as possible should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication.

### Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [None]:
v1 = np.arange(0, 5)

In [None]:
v1 * 2

In [None]:
v1 + 2

In [None]:
A * 2

In [None]:
A + 2

### Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations:

In [None]:
A * A # element-wise multiplication

In [None]:
v1 * v1

If we multiply arrays with compatible shapes, we get an element-wise multiplication of each row:

In [None]:
A.shape, v1.shape

In [None]:
A * v1

### Matrix algebra

What about matrix multiplication? There are two ways. We can either use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:

In [None]:
np.dot(A, A)

In [None]:
np.dot(A, v1)

In [None]:
np.dot(v1, v1)
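
In Python 3.5 and later, the `@` operator gives the same results as `dot` for these cases and is often easier to read. A small sketch with a made-up 2x2 matrix and vector:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])

mat_mat = A @ A    # matrix-matrix product: [[7, 10], [15, 22]]
mat_vec = A @ v    # matrix-vector product: [3, 7]
inner = v @ v      # inner product: 2

print(np.array_equal(mat_mat, np.dot(A, A)))
```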

Alternatively, we can cast the array objects to the type `matrix`. This changes the behavior of the standard arithmetic operators `+, -, *` to use matrix algebra. Note that the NumPy documentation no longer recommends the `matrix` class for new code.

In [None]:
M = np.matrix(A)
v = np.matrix(v1).T # make it a column vector

In [None]:
v

In [None]:
M * M

In [None]:
M * v

In [None]:
# inner product
v.T * v

In [None]:
# with matrix objects, standard matrix algebra applies
v + M*v

If we try to add, subtract or multiply objects with incompatible shapes we get an error:

In [None]:
v = np.matrix([1,2,3,4,5,6]).T

In [None]:
M.shape, v.shape

In [None]:
M * v

See also the related functions: `inner`, `outer`, `cross`, `kron`, `tensordot`. Try for example `help(np.kron)`.

With the `diag` function we can also extract the diagonal and subdiagonals of an array:

In [None]:
A

In [None]:
np.diag(A)

In [None]:
np.diag(A, -1)

## Vectorizing functions

As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

In [None]:
def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))

OK, that didn't work because we didn't write the `Theta` function so that it can handle a vector input... 

To get a vectorized version of Theta we can use the Numpy function `vectorize`. In many cases it can automatically vectorize a function:

In [None]:
Theta_vec = np.vectorize(Theta)

In [None]:
Theta_vec(np.array([-3,-2,-1,0,1,2,3]))

We can also implement the function to accept a vector input from the beginning (requires more effort but might give better performance):

In [None]:
def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))

In [None]:
# still works for scalars as well
Theta(-1.2), Theta(2.6)
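
Two other ways to write the same step function with built-in NumPy tools are sketched below. The names `theta_where` and `theta_builtin` are illustrative. The second argument of `np.heaviside` is the value returned where the input is exactly 0; it is set to 1 here to match `Theta`.

```python
import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])

# np.where picks 1 where the condition holds and 0 elsewhere
theta_where = np.where(x >= 0, 1, 0)

# np.heaviside is NumPy's own step function; it returns floats
theta_builtin = np.heaviside(x, 1.0)

print(theta_where)
print(theta_builtin)
```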