# NumPy Basics

NumPy is the standard numerical library in the Python ecosystem. It enables fast computations on array-like numerical structures. The central object of the `numpy` library is the `ndarray`: a *homogeneous* $n$-dimensional array, meaning that all of its elements have the same type.

The efficiency of `ndarray` objects comes from the fact that element-wise operations are implemented in C, which avoids the overhead of Python-level loops. Loop-like operations on array-like structures can usually be translated into the corresponding `ndarray` operations. This process is called *vectorisation*; it improves efficiency and should be kept in mind whenever one does scientific programming.
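As a small illustration of vectorisation, the sketch below computes the same sum of squares twice: once with an explicit Python loop and once with a single array expression. The names `values`, `loop_result` and `vectorised_result` are illustrative choices, not part of `numpy`.

```python
import numpy as np

values = np.arange(1, 1001, dtype=np.float64)  # 1.0, 2.0, ..., 1000.0

# Loop version: one Python-level iteration per element.
loop_result = 0.0
for x in values:
    loop_result += x * x

# Vectorised version: the multiplication and the sum run in compiled code.
vectorised_result = np.sum(values * values)

print(loop_result == vectorised_result)
```

Both versions give the same number; the vectorised one simply moves the loop out of the interpreter.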

## Defining an `ndarray` object

In [1]:
import numpy as np

In [2]:
import math as m

In [3]:
np_matrix = np.array([1, 2, 4., 5])


In [4]:
np_matrix.shape

(4,)

A $2$-dimensional `ndarray` can be built from a list of lists, in which case the matrix is given row by row. An `ndarray` object comes with many attributes, a number of which appear along the way. The `shape` attribute gives the size of the array along each dimension.

In [5]:
np_matrix = np.array([[1, 2, 3, 4.], [5., -1, 2, 9]])

In [6]:
np_matrix.shape

(2, 4)
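Besides `shape`, a few other attributes describe the layout of an array. The sketch below rebuilds the same matrix and prints the attributes that are most often needed.

```python
import numpy as np

np_matrix = np.array([[1, 2, 3, 4.], [5., -1, 2, 9]])

print(np_matrix.shape)  # (2, 4): 2 rows, 4 columns
print(np_matrix.ndim)   # 2: number of axes
print(np_matrix.size)   # 8: total number of elements
print(np_matrix.dtype)  # float64: one float in the input promotes every element
```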

In many cases one has to initialize an `ndarray`, either with random coefficients or as a matrix of a specific type. Here are the standard constructors available for the latter.

In [7]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [8]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [9]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [10]:
np.diag((1, 2, 3, 4))

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

To build a random `ndarray`, one can use the random generators built into `numpy`: `rand` draws samples uniformly from $[0, 1)$ and `randn` from the standard normal distribution.

In [11]:
from numpy.random import rand

In [12]:
rand(4, 4)

array([[0.76425459, 0.98091991, 0.43173604, 0.31227521],
       [0.1528977 , 0.13749055, 0.79624266, 0.05448083],
       [0.07898931, 0.34066338, 0.79497688, 0.07414634],
       [0.37199593, 0.98227545, 0.51651654, 0.81906027]])

In [13]:
from numpy.random import randn

In [14]:
randn(200, 3)

array([[-0.87653486, -1.97073607,  0.45322584],
       [-0.50810557, -2.40395073,  0.02177317],
       [ 1.155939  , -0.2928725 ,  0.29200311],
       [-2.01951881, -1.72140583, -1.52103878],
       [ 1.02505454,  2.16196225,  1.71642139],
       [ 0.44190514, -1.0590599 , -0.31853681],
       [-1.13160313,  1.36396776, -0.46528554],
       [-0.49706232, -0.7374113 ,  1.25498042],
       [ 0.21143837,  0.14497661, -0.9378222 ],
       [ 0.79834154,  0.58532659, -1.84424415],
       [-2.01954778,  0.18962273, -0.32954326],
       [-0.16896669, -0.44638457, -1.32390144],
       [ 1.20886625, -0.31101644, -0.52632596],
       [-1.06927424, -0.37444869,  0.23857414],
       [ 1.21890267,  1.28658449,  0.10597233],
       [ 2.30960213, -1.10717406, -0.47263661],
       [-0.4538636 ,  0.09969097,  0.7682317 ],
       [-0.61444916, -0.16400036,  0.16879669],
       [ 0.41188579, -1.34780672, -0.63659488],
       [ 1.08022905,  1.10641984, -1.80468293],
       [-0.42266913, -0.82347331,  1.136
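The functions `rand` and `randn` draw from a global random state, so each run gives different numbers. As a hedged alternative, `numpy` 1.17 and later also provide `np.random.default_rng`, which returns a generator object that can be seeded for reproducible results. The seed `0` and the variable names below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator: same numbers on every run

uniform = rng.random((4, 4))            # uniform samples in [0, 1)
normal = rng.standard_normal((200, 3))  # standard normal samples

print(uniform.shape, normal.shape)
```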

A useful way of building matrices out of lists is to create the corresponding one-dimensional `numpy` array and reshape it.

In [15]:
np_matrix.reshape(4, 2)

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5., -1.],
       [ 2.,  9.]])

In [16]:
A = [1]*30
A

[1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1]

In [17]:
np_A = np.array(A)

In [18]:
np_B = np_A.reshape(3, -1)

In [19]:
np_B

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

In [20]:
np_B.shape

(3, 10)
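The `-1` used above asks `numpy` to infer the missing dimension from the total number of elements. The following sketch, with an illustrative array named `flat`, shows the inference and what happens when no valid dimension exists.

```python
import numpy as np

flat = np.arange(30)

# -1 asks numpy to infer the missing dimension: 30 / 3 = 10 columns.
print(flat.reshape(3, -1).shape)

# The total number of elements must be preserved, otherwise reshape fails.
try:
    flat.reshape(4, -1)  # 30 is not divisible by 4
except ValueError as error:
    print("reshape failed:", error)
```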

Another useful constructor is `arange`, the `numpy` counterpart of Python's `range`. It returns a one-dimensional array containing an arithmetic sequence, following the `range` syntax.

In [21]:
np.arange?

In [22]:
np_A = np.arange(-1, 10, 2)
np_A

array([-1,  1,  3,  5,  7,  9])

In [23]:
np_A = np_A.reshape(-1, 2)
np_A

array([[-1,  1],
       [ 3,  5],
       [ 7,  9]])
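To make the parallel with `range` concrete, the sketch below builds the same sequence both ways, then uses a non-integer step, which `range` does not accept.

```python
import numpy as np

# Same start/stop/step convention as range: the stop value is excluded.
print(list(range(-1, 10, 2)))
print(np.arange(-1, 10, 2))

# Unlike range, arange also accepts a non-integer step.
print(np.arange(0, 1, 0.25))
```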

In many applications one needs a sequence of evenly spaced floats sampling an interval of the real line. Such a `numpy` array can be generated with the `linspace` function.

In [24]:
np.linspace?

In [25]:
np_A = np.linspace(-10, 10, 100)
np_A

array([-10.        ,  -9.7979798 ,  -9.5959596 ,  -9.39393939,
        -9.19191919,  -8.98989899,  -8.78787879,  -8.58585859,
        -8.38383838,  -8.18181818,  -7.97979798,  -7.77777778,
        -7.57575758,  -7.37373737,  -7.17171717,  -6.96969697,
        -6.76767677,  -6.56565657,  -6.36363636,  -6.16161616,
        -5.95959596,  -5.75757576,  -5.55555556,  -5.35353535,
        -5.15151515,  -4.94949495,  -4.74747475,  -4.54545455,
        -4.34343434,  -4.14141414,  -3.93939394,  -3.73737374,
        -3.53535354,  -3.33333333,  -3.13131313,  -2.92929293,
        -2.72727273,  -2.52525253,  -2.32323232,  -2.12121212,
        -1.91919192,  -1.71717172,  -1.51515152,  -1.31313131,
        -1.11111111,  -0.90909091,  -0.70707071,  -0.50505051,
        -0.3030303 ,  -0.1010101 ,   0.1010101 ,   0.3030303 ,
         0.50505051,   0.70707071,   0.90909091,   1.11111111,
         1.31313131,   1.51515152,   1.71717172,   1.91919192,
         2.12121212,   2.32323232,   2.52525253,   2.72

In [26]:
np_A = np_A.reshape(-1, 4)
np_A.shape

(25, 4)
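Unlike `arange`, `linspace` takes the number of points rather than the step, and includes both endpoints by default. A minimal sketch, using the optional `retstep` argument to recover the spacing (the names `points` and `step` are illustrative):

```python
import numpy as np

# 100 points from -10 to 10, both endpoints included by default.
points, step = np.linspace(-10, 10, 100, retstep=True)

print(points[0], points[-1])  # first and last points are the interval bounds
print(step)                   # spacing: (10 - (-10)) / (100 - 1)
```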

## Slicing 

There are many different ways of slicing an `ndarray`. One needs to be careful: some return a view on part of the array, while others return a copy.

In [27]:
np_A[10:16:2, 1:3]

array([[-1.71717172, -1.51515152],
       [-0.1010101 ,  0.1010101 ],
       [ 1.51515152,  1.71717172]])

Standard (basic) slicing returns a view on the selected elements of the `ndarray`, so no data is copied.

In [28]:
np_A

array([[-10.        ,  -9.7979798 ,  -9.5959596 ,  -9.39393939],
       [ -9.19191919,  -8.98989899,  -8.78787879,  -8.58585859],
       [ -8.38383838,  -8.18181818,  -7.97979798,  -7.77777778],
       [ -7.57575758,  -7.37373737,  -7.17171717,  -6.96969697],
       [ -6.76767677,  -6.56565657,  -6.36363636,  -6.16161616],
       [ -5.95959596,  -5.75757576,  -5.55555556,  -5.35353535],
       [ -5.15151515,  -4.94949495,  -4.74747475,  -4.54545455],
       [ -4.34343434,  -4.14141414,  -3.93939394,  -3.73737374],
       [ -3.53535354,  -3.33333333,  -3.13131313,  -2.92929293],
       [ -2.72727273,  -2.52525253,  -2.32323232,  -2.12121212],
       [ -1.91919192,  -1.71717172,  -1.51515152,  -1.31313131],
       [ -1.11111111,  -0.90909091,  -0.70707071,  -0.50505051],
       [ -0.3030303 ,  -0.1010101 ,   0.1010101 ,   0.3030303 ],
       [  0.50505051,   0.70707071,   0.90909091,   1.11111111],
       [  1.31313131,   1.51515152,   1.71717172,   1.91919192],
       [  2.12121212,   2

In [29]:
np_B = np_A[10:, 2]

In [30]:
np_A[10:, 2]

array([-1.51515152, -0.70707071,  0.1010101 ,  0.90909091,  1.71717172,
        2.52525253,  3.33333333,  4.14141414,  4.94949495,  5.75757576,
        6.56565657,  7.37373737,  8.18181818,  8.98989899,  9.7979798 ])
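The practical consequence of getting a view is that a later change to the original array is visible through the slice. The sketch below uses a small illustrative array `base` and `np.shares_memory` to check whether two arrays use the same data.

```python
import numpy as np

base = np.arange(12.0).reshape(3, 4)

view = base[1:, 2]         # basic slicing: a view on the data of base
copy = base[1:, 2].copy()  # explicit copy: independent data

base[1, 2] = -1.0          # modify the original array

print(view)                          # the view sees the change
print(copy)                          # the copy keeps the old value
print(np.shares_memory(base, view))  # True
print(np.shares_memory(base, copy))  # False
```

Calling `.copy()` is the usual way to protect a slice from later modifications of the original array.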

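Slicing can also be done with a boolean mask: a comparison produces a boolean array, which then selects the elements where it is `True`.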

In [31]:
np_A < 2.

array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])

In [32]:
np_A[np_A <  2]

array([-10.        ,  -9.7979798 ,  -9.5959596 ,  -9.39393939,
        -9.19191919,  -8.98989899,  -8.78787879,  -8.58585859,
        -8.38383838,  -8.18181818,  -7.97979798,  -7.77777778,
        -7.57575758,  -7.37373737,  -7.17171717,  -6.96969697,
        -6.76767677,  -6.56565657,  -6.36363636,  -6.16161616,
        -5.95959596,  -5.75757576,  -5.55555556,  -5.35353535,
        -5.15151515,  -4.94949495,  -4.74747475,  -4.54545455,
        -4.34343434,  -4.14141414,  -3.93939394,  -3.73737374,
        -3.53535354,  -3.33333333,  -3.13131313,  -2.92929293,
        -2.72727273,  -2.52525253,  -2.32323232,  -2.12121212,
        -1.91919192,  -1.71717172,  -1.51515152,  -1.31313131,
        -1.11111111,  -0.90909091,  -0.70707071,  -0.50505051,
        -0.3030303 ,  -0.1010101 ,   0.1010101 ,   0.3030303 ,
         0.50505051,   0.70707071,   0.90909091,   1.11111111,
         1.31313131,   1.51515152,   1.71717172,   1.91919192])
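Two properties of boolean masks are worth checking by hand: the selection is a flattened copy rather than a view, and a mask can also appear on the left-hand side of an assignment. A sketch under these assumptions, with illustrative names `data`, `mask` and `selected`:

```python
import numpy as np

data = np.linspace(-10, 10, 100).reshape(-1, 4)

mask = data < 2.0      # boolean array with the same shape as data
selected = data[mask]  # 1-D copy of the elements where mask is True

print(selected.shape)
print(np.shares_memory(data, selected))  # False: boolean indexing copies

# A mask on the left-hand side of an assignment modifies data in place.
data[data < 0] = 0.0
print(data.min())
```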

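Behaviour of `ndarray`s within boolean conditions: an array with more than one element cannot be used directly as a condition, since its truth value is ambiguous.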
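A minimal sketch of what happens when an array comparison is used in an `if` statement: `numpy` raises a `ValueError`, and the methods `any` and `all` are the usual way to reduce the comparison to a single boolean. The array `values` is illustrative.

```python
import numpy as np

values = np.array([-1.0, 0.5, 3.0])

# The truth value of a multi-element comparison is ambiguous.
try:
    if values < 2.0:
        pass
except ValueError as error:
    print("ValueError:", error)

print((values < 2.0).any())  # True: at least one element is below 2
print((values < 2.0).all())  # False: 3.0 is not below 2
```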

## Setting Coefficient Values

In [33]:
np_A[1, 2] = 1000.

In [34]:
np_A[::, 0] = -30

In [35]:
np_A

array([[-3.00000000e+01, -9.79797980e+00, -9.59595960e+00,
        -9.39393939e+00],
       [-3.00000000e+01, -8.98989899e+00,  1.00000000e+03,
        -8.58585859e+00],
       [-3.00000000e+01, -8.18181818e+00, -7.97979798e+00,
        -7.77777778e+00],
       [-3.00000000e+01, -7.37373737e+00, -7.17171717e+00,
        -6.96969697e+00],
       [-3.00000000e+01, -6.56565657e+00, -6.36363636e+00,
        -6.16161616e+00],
       [-3.00000000e+01, -5.75757576e+00, -5.55555556e+00,
        -5.35353535e+00],
       [-3.00000000e+01, -4.94949495e+00, -4.74747475e+00,
        -4.54545455e+00],
       [-3.00000000e+01, -4.14141414e+00, -3.93939394e+00,
        -3.73737374e+00],
       [-3.00000000e+01, -3.33333333e+00, -3.13131313e+00,
        -2.92929293e+00],
       [-3.00000000e+01, -2.52525253e+00, -2.32323232e+00,
        -2.12121212e+00],
       [-3.00000000e+01, -1.71717172e+00, -1.51515152e+00,
        -1.31313131e+00],
       [-3.00000000e+01, -9.09090909e-01, -7.07070707e-01,
      

## Universal Functions

Many standard mathematical functions are reimplemented in `numpy` as universal functions, which act element-wise on arrays and run in compiled code.

In [36]:
np.exp(np_A)

array([[9.35762297e-14, 5.55637361e-05, 6.80029415e-05, 8.32269459e-05],
       [9.35762297e-14, 1.24662685e-04,            inf, 1.86727806e-04],
       [9.35762297e-14, 2.79692945e-04, 3.42308569e-04, 4.18942123e-04],
       [9.35762297e-14, 6.27518520e-04, 7.68002806e-04, 9.39937692e-04],
       [9.35762297e-14, 1.40789927e-03, 1.72308953e-03, 2.10884229e-03],
       [9.35762297e-14, 3.15875992e-03, 3.86592014e-03, 4.73139424e-03],
       [9.35762297e-14, 7.08698731e-03, 8.67357053e-03, 1.06153465e-02],
       [9.35762297e-14, 1.59003503e-02, 1.94600051e-02, 2.38165696e-02],
       [9.35762297e-14, 3.56739933e-02, 4.36604277e-02, 5.34348070e-02],
       [9.35762297e-14, 8.00380986e-02, 9.79564464e-02, 1.19886224e-01],
       [9.35762297e-14, 1.79573314e-01, 2.19774883e-01, 2.68976487e-01],
       [9.35762297e-14, 4.02890322e-01, 4.93086479e-01, 6.03475096e-01],
       [9.35762297e-14, 9.03923902e-01, 1.10628782e+00, 1.35395549e+00],
       [9.35762297e-14, 2.02804182e+00, 2.48206508e

In [37]:
m.exp(4.)

54.598150033144236
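The difference between `np.exp` and `math.exp` can be seen directly: the former accepts whole arrays, the latter only single numbers. The sketch below also shows where the `inf` in the output above comes from, since the coefficient set to `1000.` overflows the `float64` range. Using `np.errstate` to silence the warning is an optional choice made here for a clean output.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 4.0])

# np.exp is applied to every element and returns a new array.
print(np.exp(values))

# math.exp only accepts a single number.
try:
    math.exp(values)
except TypeError as error:
    print("TypeError:", error)

# exp(1000.) exceeds the float64 range: numpy returns inf with a warning.
with np.errstate(over="ignore"):  # silence the overflow warning for this block
    print(np.exp(1000.0))
```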

The standard arithmetic operators are implemented for `ndarray`s and act element-wise; the matrix product is computed with `dot`.

In [38]:
np_A * np_A

array([[9.00000000e+02, 9.60004081e+01, 9.20824406e+01, 8.82460973e+01],
       [9.00000000e+02, 8.08182838e+01, 1.00000000e+06, 7.37169677e+01],
       [9.00000000e+02, 6.69421488e+01, 6.36771758e+01, 6.04938272e+01],
       [9.00000000e+02, 5.43720029e+01, 5.14335272e+01, 4.85766758e+01],
       [9.00000000e+02, 4.31078461e+01, 4.04958678e+01, 3.79655137e+01],
       [9.00000000e+02, 3.31496786e+01, 3.08641975e+01, 2.86603408e+01],
       [9.00000000e+02, 2.44975003e+01, 2.25385165e+01, 2.06611570e+01],
       [9.00000000e+02, 1.71513111e+01, 1.55188246e+01, 1.39679625e+01],
       [9.00000000e+02, 1.11111111e+01, 9.80512193e+00, 8.58075707e+00],
       [9.00000000e+02, 6.37690032e+00, 5.39740843e+00, 4.49954086e+00],
       [9.00000000e+02, 2.94867871e+00, 2.29568411e+00, 1.72431385e+00],
       [9.00000000e+02, 8.26446281e-01, 4.99948985e-01, 2.55076013e-01],
       [9.00000000e+02, 1.02030405e-02, 1.02030405e-02, 9.18273646e-02],
       [9.00000000e+02, 4.99948985e-01, 8.26446281e

In [39]:
np_A + np_A

array([[-6.00000000e+01, -1.95959596e+01, -1.91919192e+01,
        -1.87878788e+01],
       [-6.00000000e+01, -1.79797980e+01,  2.00000000e+03,
        -1.71717172e+01],
       [-6.00000000e+01, -1.63636364e+01, -1.59595960e+01,
        -1.55555556e+01],
       [-6.00000000e+01, -1.47474747e+01, -1.43434343e+01,
        -1.39393939e+01],
       [-6.00000000e+01, -1.31313131e+01, -1.27272727e+01,
        -1.23232323e+01],
       [-6.00000000e+01, -1.15151515e+01, -1.11111111e+01,
        -1.07070707e+01],
       [-6.00000000e+01, -9.89898990e+00, -9.49494949e+00,
        -9.09090909e+00],
       [-6.00000000e+01, -8.28282828e+00, -7.87878788e+00,
        -7.47474747e+00],
       [-6.00000000e+01, -6.66666667e+00, -6.26262626e+00,
        -5.85858586e+00],
       [-6.00000000e+01, -5.05050505e+00, -4.64646465e+00,
        -4.24242424e+00],
       [-6.00000000e+01, -3.43434343e+00, -3.03030303e+00,
        -2.62626263e+00],
       [-6.00000000e+01, -1.81818182e+00, -1.41414141e+00,
      

In [43]:
np_B = np_A.dot(np_A.T)
np_B.shape

(25, 25)
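Since `*` is element-wise, it is easy to confuse with the matrix product. The following sketch contrasts the two on small illustrative matrices `A` and `B`, and also shows the `@` operator, which is equivalent to `dot` for 2-dimensional arrays.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

print(A * B)     # element-wise product: [[0, 2], [3, 0]]
print(A.dot(B))  # matrix product: [[2, 1], [4, 3]]
print(A @ B)     # same matrix product with the @ operator
```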

## Exercise

Look into saving and loading numpy arrays.

In [45]:
np.save("np_B", np_B)

In [47]:
np_C = np.load("np_B.npy")
np_C.shape

(25, 25)
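A self-contained variant of the same round trip is sketched below. It writes to a temporary directory so that no file is left behind; the file name `matrix.npy` and the array `matrix` are illustrative.

```python
import os
import tempfile
import numpy as np

matrix = np.arange(6.0).reshape(2, 3)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "matrix.npy")
    np.save(path, matrix)   # binary .npy format, keeps dtype and shape
    loaded = np.load(path)

print(np.array_equal(matrix, loaded))
```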

## Exercise

Compare the efficiency of `numpy` matrix multiplication with that of a naive function based on explicit Python loops.

In [55]:
def matrix_mult(A, B):
    # The inner dimensions must agree: (n, m) times (m, p) gives (n, p).
    if A.shape[1] != B.shape[0]:
        raise ValueError("Shapes of entries are not compatible.")
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):          # rows of A
        for j in range(B.shape[1]):      # columns of B
            for k in range(A.shape[1]):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

In [56]:
L = np.arange(10000)
A = L.reshape(-1, 100)
B = L.reshape(100, -1)

In [57]:
A.shape, B.shape

((100, 100), (100, 100))

In [59]:
%time matrix_mult(A, B)

CPU times: user 869 ms, sys: 5.92 ms, total: 875 ms
Wall time: 876 ms


array([[3.28350000e+07, 3.28399500e+07, 3.28449000e+07, ...,
        3.33151500e+07, 3.33201000e+07, 3.33250500e+07],
       [8.23350000e+07, 8.23499500e+07, 8.23649000e+07, ...,
        8.37851500e+07, 8.38001000e+07, 8.38150500e+07],
       [1.31835000e+08, 1.31859950e+08, 1.31884900e+08, ...,
        1.34255150e+08, 1.34280100e+08, 1.34305050e+08],
       ...,
       [4.83433500e+09, 4.83530995e+09, 4.83628490e+09, ...,
        4.92890515e+09, 4.92988010e+09, 4.93085505e+09],
       [4.88383500e+09, 4.88481995e+09, 4.88580490e+09, ...,
        4.97937515e+09, 4.98036010e+09, 4.98134505e+09],
       [4.93333500e+09, 4.93432995e+09, 4.93532490e+09, ...,
        5.02984515e+09, 5.03084010e+09, 5.03183505e+09]])

In [61]:
%time np.dot(A, B)

CPU times: user 2.71 ms, sys: 1.77 ms, total: 4.48 ms
Wall time: 2.38 ms


array([[  32835000,   32839950,   32844900, ...,   33315150,   33320100,
          33325050],
       [  82335000,   82349950,   82364900, ...,   83785150,   83800100,
          83815050],
       [ 131835000,  131859950,  131884900, ...,  134255150,  134280100,
         134305050],
       ...,
       [4834335000, 4835309950, 4836284900, ..., 4928905150, 4929880100,
        4930855050],
       [4883835000, 4884819950, 4885804900, ..., 4979375150, 4980360100,
        4981345050],
       [4933335000, 4934329950, 4935324900, ..., 5029845150, 5030840100,
        5031835050]])
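The `%time` magic only works inside IPython. As one possible variant that runs as a plain script, the sketch below uses the standard `timeit` module and compares `numpy` with a version working on lists of lists. The function `naive_matmul` and the size `50` are illustrative choices, and the measured times depend on the machine.

```python
import timeit
import numpy as np

def naive_matmul(A, B):
    """Triple-loop product of two matrices given as lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

size = 50
A = np.arange(size * size).reshape(size, size)
A_list = A.tolist()

naive_time = timeit.timeit(lambda: naive_matmul(A_list, A_list), number=3)
numpy_time = timeit.timeit(lambda: A @ A, number=3)

print(f"naive: {naive_time:.4f} s, numpy: {numpy_time:.4f} s")
print(np.array_equal(np.array(naive_matmul(A_list, A_list)), A @ A))
```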

## Exercise

Simulate a random walk using both `numpy` and built-in structures, then compare the two functions.

* Using the `matplotlib` documentation, write a function that plots a random walk.
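One possible starting point for this exercise is sketched below, assuming a simple one-dimensional walk with steps of $+1$ or $-1$. The function names, the `seed` parameter and the step distribution are all assumptions made for illustration; other kinds of random walk are equally valid answers.

```python
import random
import numpy as np

def random_walk_builtin(n_steps, seed=None):
    """Simple 1-D walk with +1/-1 steps, built with a Python list."""
    generator = random.Random(seed)
    position, path = 0, [0]
    for _ in range(n_steps):
        position += generator.choice((-1, 1))
        path.append(position)
    return path

def random_walk_numpy(n_steps, seed=None):
    """Same kind of walk, vectorised: draw all steps at once, then accumulate."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=n_steps)
    return np.concatenate(([0], np.cumsum(steps)))

print(random_walk_builtin(10, seed=0))
print(random_walk_numpy(10, seed=0))
```

The two functions use different generators, so they do not produce the same path for a given seed. Either result can be passed to `matplotlib.pyplot.plot` to draw the walk.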