![MLTrain logo](https://mltrain.cc/wp-content/uploads/2017/11/mltrain_logo-4.png "MLTrain logo")

-------------------------
# Introduction #

The `numpy` package is the basis for most numeric computation in Python: it provides __high-performance__ vector and matrix operations behind a compact API.  
`numpy` methods and functions are implemented in C and Fortran underneath, so when a computation is expressed purely as a sequence of numpy calls its performance is comparable to a direct C|Fortran implementation.  
A sequence of _numpy-only_ calls is generally known as '_vectorized computations_'.
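
To make the idea concrete, here is a minimal sketch (the array `values` and the variable names are illustrative, not part of the lecture) that computes the same sum of squares once with an interpreted loop and once with a single numpy call. It uses single-argument `print(...)` so that it runs under both Python 2 and Python 3:

```python
import numpy as np

values = np.arange(1, 1001, dtype=float)

# Pure-Python loop: every multiplication and addition goes through the interpreter
loop_total = 0.0
for v in values:
    loop_total += v * v

# Vectorized form: one numpy call, the loop runs inside compiled code
vectorized_total = np.dot(values, values)

print("loop: %.1f, vectorized: %.1f" % (loop_total, vectorized_total))
```

Both forms return the same number; on large arrays the vectorized form is usually much faster, although the actual gain depends on the machine and the numpy build.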

__NB:__ numpy processes in-memory data, so the library per se makes no provision for efficient reading from or writing to disk. It is the programmer's responsibility to either compact the in-memory representation of the data or handle disk paging.  
The usual practice for handling large arrays is to use scipy's _sparse array_ libraries and numpy's _memory-mapped files_, both outside the scope of this lecture.

In [10]:
import numpy as np
import random as ran
from os import linesep as endl

# Our library version 
print 'Current Numpy version is:', np.__version__

 Current Numpy version is: 1.13.3


--------------------------
# Constructors #

### The ndarray constructor ###

In [17]:
arr = np.ndarray(shape = [2, 4], dtype = int, order = 'C')
print '2x4 integer ndarray with arbitrary memory contents:', endl, arr

2x4 integer ndarray with arbitrary memory contents: 
[[              0               0 140678978722384 140678979376864]
 [140678978749216 140678979377984 140679888726176               0]]
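
The raw `np.ndarray` constructor is rarely called directly. As a hedged sketch (variable names are illustrative), `np.empty` gives the same kind of uninitialized array, and `np.zeros` is the safer choice when the initial contents matter:

```python
import numpy as np

# Uninitialized, like np.ndarray(...): the contents are whatever was in memory
uninitialized = np.empty(shape=[2, 4], dtype=int)

# Initialized: every element is guaranteed to be 0
zeroed = np.zeros(shape=[2, 4], dtype=int)

print("shapes: %s %s" % (uninitialized.shape, zeroed.shape))
```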


### The numpy.array factory function ###

In [23]:
# A numpy.ndarray object with 10 samples from the Gamma distribution
ran.seed(101)  # seed() returns None, so there is nothing to assign
data = [ran.gammavariate(.6, .8) for _ in xrange(10)]
print '10 elements from the Gamma distribution:', endl, np.array(data, dtype = float)

10 elements from the Gamma distribution: 
[ 0.45144449  0.31371191  0.08575201  0.14052511  0.01337472  0.01973911
  0.20938272  0.17227148  0.24616949  0.27341855]


### Specific ndarray constructors ###
There are special functions for constructing arrays with specific contents:

In [37]:
# range ctor:
print 'Integer range ctor with arguments <start>, <stop>, <step>:', endl, np.arange(0, 10, 1)

# linspace
print endl, '10 equispaced elements in [-1, 1)', endl, np.linspace(-1, 1, 10, endpoint = False)

# logspace: the exponents are equispaced in [-1, 1], so the values span [e**-1, e**1]
print endl, '10 logarithmic elements in [-1, 1]', endl, np.logspace(-1, 1, 10, endpoint = True, base = np.e)

# zeros
print endl, '2 by 4 zeros', endl, np.zeros(shape = [2, 4])

# ones
print endl, '2 by 4 units', endl, np.ones(shape = [2, 4])

# full
print endl, '2 by 4 of value randomly selected from the normal distribution:' 
print np.full([2, 4], fill_value = ran.normalvariate(0., .1), dtype = float)


Integer range ctor with arguments <start>, <stop>, <step>: 
[0 1 2 3 4 5 6 7 8 9]

10 equispaced elements in [-1, 1) 
[-1.  -0.8 -0.6 -0.4 -0.2  0.   0.2  0.4  0.6  0.8]

10 logarithmic elements in [-1, 1] 
[ 0.36787944  0.45942582  0.57375342  0.71653131  0.89483932  1.11751907
  1.39561243  1.742909    2.17662993  2.71828183]

2 by 4 zeros 
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]

2 by 4 units 
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]

2 by 4 of value randomly selected from the normal distribution:
[[-0.03589073 -0.03589073 -0.03589073 -0.03589073]
 [-0.03589073 -0.03589073 -0.03589073 -0.03589073]]


### meshgrid and mgrid ###

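These functions generate the coordinates of grid points in $\mathbb R^n$ (n is the dimension of the grid): `meshgrid` from a list of coordinate vectors, `mgrid` from `start:stop:step` slices.  
So if x = [1, 2] and y = [0, 1], `meshgrid` (with its default `'xy'` indexing) will generate two 2D matrices of the form below, while `mgrid` yields the same grid points in `'ij'` (transposed) order:  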

``` Python
[[1, 2],  [[0, 0],
 [1, 2]],  [1, 1]]
```

This is particularly useful for evaluating multivariate functions with vector operations, as we shall see below.
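
The small x = [1, 2], y = [0, 1] case can be checked directly. The following is a sketch with illustrative variable names. It also shows how the `'xy'` and `'ij'` layouts relate, and that `mgrid` uses the `'ij'` layout:

```python
import numpy as np

x = np.array([1, 2])
y = np.array([0, 1])

# Default indexing='xy': x varies along columns, y along rows
X, Y = np.meshgrid(x, y)

# indexing='ij': x varies along rows, y along columns (the transposed layout)
Xij, Yij = np.meshgrid(x, y, indexing='ij')

# mgrid takes slices instead of vectors and produces the 'ij' layout
G = np.mgrid[1:3, 0:2]

print("X = %s, Y = %s" % (X.tolist(), Y.tolist()))
```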

`meshgrid`

In [67]:
x = np.linspace(-1, 1, 4, endpoint = False)
y = np.logspace(-1, 1, 4, base = 10, endpoint = True)

X, Y = np.meshgrid(x, y, indexing = 'xy')

# np.set_printoptions(linewidth = 132, precision = 3)
print 'X = ', endl, X
print endl, 'Y =', endl, Y

print endl, 'Coordinate grid:', endl, [[(round(_x, 3), round(_y, 3)) for _x in x] for _y in y]

X =  
[[-1.  -0.5  0.   0.5]
 [-1.  -0.5  0.   0.5]
 [-1.  -0.5  0.   0.5]
 [-1.  -0.5  0.   0.5]]

Y = 
[[  0.1     0.1     0.1     0.1  ]
 [  0.464   0.464   0.464   0.464]
 [  2.154   2.154   2.154   2.154]
 [ 10.     10.     10.     10.   ]]

Coordinate grid: 
[[(-1.0, 0.1), (-0.5, 0.1), (0.0, 0.1), (0.5, 0.1)], [(-1.0, 0.464), (-0.5, 0.464), (0.0, 0.464), (0.5, 0.464)], [(-1.0, 2.154), (-0.5, 2.154), (0.0, 2.154), (0.5, 2.154)], [(-1.0, 10.0), (-0.5, 10.0), (0.0, 10.0), (0.5, 10.0)]]


Evaluating $\sin ({x^2} + {y^2})$ on the generated (X, Y) grid:

In [68]:
np.sin(X**2 + Y**2)

array([[ 0.847,  0.257,  0.01 ,  0.257],
       [ 0.938,  0.449,  0.214,  0.449],
       [-0.598, -0.984, -0.997, -0.984],
       [ 0.452, -0.277, -0.506, -0.277]])

`mgrid` is syntactic sugar for `start:end:step` meshgrids:

In [72]:
np.mgrid[4:0:-1, 0:8:2]

array([[[4, 4, 4, 4],
        [3, 3, 3, 3],
        [2, 2, 2, 2],
        [1, 1, 1, 1]],

       [[0, 2, 4, 6],
        [0, 2, 4, 6],
        [0, 2, 4, 6],
        [0, 2, 4, 6]]])

### Random array generators ###
Numpy has a rich set of functions for sampling from univariate and multivariate distributions.  
Ref: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

In [94]:
print 'A 4 by 4 array with samples drawn from the uniform distribution in [0., 1.):'
print np.random.rand(4, 4)

print endl, '4 floats drawn from U[0., 1.)'
print np.random.sample(size = 4)

# Create a vector of probabilities using 'toProb', then use np.random.choice
# to select 4 elements with replacement (bootstrap) from {0,...,9}:
toProb = lambda w: w / w.sum()
print endl, 'Bootstrap from data using probability distribution'
print np.random.choice(np.arange(10), size = 4, replace = True, p = toProb(np.random.sample(10)))

# 5 random bytes, each drawn uniformly from {0,...,255}
print endl, 'Random bytes', endl, repr(np.random.bytes(5))

A 4 by 4 array with samples drawn from the uniform distribution in [0., 1.):
[[ 0.182  0.959  0.373  0.843]
 [ 0.309  0.381  0.261  0.617]
 [ 0.138  0.849  0.799  0.138]
 [ 0.085  0.123  0.992  0.541]]

4 floats drawn from U[0., 1.)
[ 0.007  0.12   0.085  0.018]

Bootstrap from data using probability distribution
[9 5 9 5]

Random bytes 
'(s\x12uq'


__Distributions:__ The list of supported distributions is long. See [the reference page](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html) for the full list.

In [99]:
print 'Floats drawn from N(0, 1):'
print np.random.randn(5, 5)

print endl, 'Chi-square distribution with 5 degrees of freedom:'
print np.random.chisquare(5, 5)

print endl, '10 samples from the exponential distribution'
print np.random.exponential(size = 10)

Floats drawn from N(0, 1):
[[ 0.81  -0.505  0.682  0.834 -0.678]
 [ 1.307 -0.022 -0.116  0.915 -0.383]
 [ 0.756 -0.229  0.322  0.74  -1.479]
 [ 0.782  0.509  1.471 -0.324 -1.302]
 [-0.472  0.462 -0.469 -0.729  0.408]]

Chi-square distribution with 5 degrees of freedom:
[ 7.011  2.469  3.461  3.789  4.762]

10 samples from the exponential distribution
[ 0.36   0.195  0.118  0.168  0.033  0.224  0.688  0.728  2.78   0.462]


---------------------------------
# ndarray attributes #

In [102]:
from string import ascii_letters
sarr = np.array(list(ascii_letters), dtype = str)

print endl, 'An array of characters:', endl, sarr
print sarr.dtype, sarr.shape, sarr.size, sarr.ndim, sarr.nbytes, sarr.itemsize


An array of characters: 
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' 'A' 'B' 'C' 'D' 'E' 'F' 'G'
 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z']
|S1 (52,) 52 1 52 1


---------------------------------------------------
# Indexing and selection #

In [116]:
arr = np.arange(10)

# Element accessor:
print arr[5]

# Range and 'shard' accessors (slices)
print arr[2:5]
print arr[::-1]
print arr[:-1:2]
print arr[-1::-2]

5
[2 3 4]
[9 8 7 6 5 4 3 2 1 0]
[0 2 4 6 8]
[9 7 5 3 1]


### Fancy indexing ###

In [117]:
arr2 = np.linspace(-1, 1, 12).reshape(4, -1)
print endl, arr2[[1, 3]]
print arr2[:, [0, 1]]


[[-0.455 -0.273 -0.091]
 [ 0.636  0.818  1.   ]]
[[-1.    -0.818]
 [-0.455 -0.273]
 [ 0.091  0.273]
 [ 0.636  0.818]]


In [118]:
# Counter-intuitive: the following will NOT return a 2x2 array:
arr2[[0, 1], [0, 1]]

array([-1.   , -0.273])
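
When two index lists are given, numpy pairs them element by element, so the call above selects the elements at positions (0, 0) and (1, 1). A sketch of two ways to get the 2x2 block instead (the names `pairs`, `block` and `block_alt` are illustrative):

```python
import numpy as np

arr2 = np.linspace(-1, 1, 12).reshape(4, -1)

# Index lists are paired: this picks elements (0, 0) and (1, 1)
pairs = arr2[[0, 1], [0, 1]]

# np.ix_ builds an open mesh from the lists: rows {0, 1} x columns {0, 1}
block = arr2[np.ix_([0, 1], [0, 1])]

# Same result in two steps: select the rows first, then the columns
block_alt = arr2[[0, 1]][:, [0, 1]]

print("pairs shape: %s, block shape: %s" % (pairs.shape, block.shape))
```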

In [112]:
# Assign 3 to first row's elements. This is an application of 'broadcasting'
arr2[0] = 3
print arr2

[[ 3.     3.     3.   ]
 [-0.455 -0.273 -0.091]
 [ 0.091  0.273  0.455]
 [ 0.636  0.818  1.   ]]
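
Broadcasting in assignments is not limited to scalars. In this sketch (the array `grid` is illustrative) a scalar fills one row, and a length-3 sequence is copied into each of the remaining rows:

```python
import numpy as np

grid = np.zeros((4, 3))

# A scalar is broadcast to every element of the selected row
grid[0] = 3

# A length-3 sequence is broadcast to each of the 3 selected rows
grid[1:] = [10, 20, 30]

print(grid)
```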


In [115]:
# A fancier example of fancy indexing:
arr3 = np.linspace(-1, 1, 10)
arr3Ix = np.random.choice(10, size = [4, 4])
arr3[arr3Ix]

array([[-0.778, -0.111,  0.556,  0.556],
       [-0.556, -1.   ,  1.   ,  0.333],
       [-0.111, -1.   ,  0.556,  0.778],
       [-0.556, -1.   , -0.111,  1.   ]])

### Boolean indexing ###

In the following examples, we use hypothetical grades of 5 MLTrain instructors in 4 courses.  
The `names` vector contains the instructor names and `grades` is a matrix of course grades. Each row of `grades` refers to the instructor at the same position in `names`.

In [160]:
grades = np.random.uniform(0, 1, size = [5, 4])
names = np.array(['Alex', 'Chris', 'Nick', 'Chris', 'Chris'])

print 'A mask for Chris:', endl, names == 'Chris'

print endl, 'Select the grades of Chris:'
print grades[names == 'Chris']

print endl, 'and the grades of the others:'
print grades[names != 'Chris']

print endl, 'the grades of Chris and Alex:'
print grades[np.isin(names, ['Alex', 'Chris'])]

print endl, 'Select the grades of Chris that are > .4 (nasty):'
print grades[(grades > .4) & np.repeat(names == 'Chris', 4).reshape(-1, 4)]

A mask for Chris: 
[False  True False  True  True]

Select the grades of Chris:
[[ 0.718  0.758  0.465  0.291]
 [ 0.987  0.332  0.504  0.41 ]
 [ 0.428  0.383  0.262  0.135]]

and the grades of the others:
[[ 0.078  0.672  0.534  0.851]
 [ 0.348  0.011  0.508  0.592]]

the grades of Chris and Alex:
[[ 0.078  0.672  0.534  0.851]
 [ 0.718  0.758  0.465  0.291]
 [ 0.987  0.332  0.504  0.41 ]
 [ 0.428  0.383  0.262  0.135]]

Select the grades of Chris that are > .4 (nasty):
[ 0.718  0.758  0.465  0.987  0.504  0.41   0.428]
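
The `repeat`/`reshape` combination in the last selection can be avoided. As a sketch, with a seeded `RandomState` standing in for the notebook's random grades and illustrative variable names, the row mask can be broadcast against the element mask, or the selection can be done in two steps:

```python
import numpy as np

rng = np.random.RandomState(0)  # seeded stand-in for the notebook's random data
grades = rng.uniform(0, 1, size=[5, 4])
names = np.array(['Alex', 'Chris', 'Nick', 'Chris', 'Chris'])

is_chris = names == 'Chris'

# The selection used above: expand the row mask to the full 5x4 shape by hand
via_repeat = grades[(grades > .4) & np.repeat(is_chris, 4).reshape(-1, 4)]

# Broadcasting: a (5, 1) row mask combines with the (5, 4) element mask
via_broadcast = grades[(grades > .4) & is_chris[:, np.newaxis]]

# Two steps: pick the rows first, then filter their elements
chris_grades = grades[is_chris]
via_two_steps = chris_grades[chris_grades > .4]

print(via_broadcast)
```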


### where ###

In [162]:
# We want to grade qualitatively, so we replace grades > .5 with 'HIGH' and the rest with 'LOW'
qGrades = np.where(grades <= .5, 'LOW', 'HIGH')

print "Chris' grades:"
print qGrades[names == 'Chris']

Chris' grades:
[['HIGH' 'HIGH' 'LOW' 'LOW']
 ['HIGH' 'LOW' 'HIGH' 'LOW']
 ['LOW' 'LOW' 'LOW' 'LOW']]


### choose ###
From N arrays a1, ..., aN of the same shape and an index array ix of the same shape with values in {0, ..., N-1}, `choose` generates an output array in which each position takes its value from one of a1, ..., aN: the value k of ix at that position selects the (k+1)-th array.  
  
Let's emulate a random experiment using `choose`:  
I toss a coin 10 times and if I get heads I select from the normal distribution, otherwise from the exponential.  

For those with a probability background, I have created a __mixture model__.  

<span style = "color: purple"> Can you describe the underlying classification problem? </span>
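
Before the random experiment, a small deterministic sketch of the selection rule may help (the values of `ix`, `a1` and `a2` are made up for illustration). Note that the candidate arrays are passed to `np.choose` together, as one list:

```python
import numpy as np

ix = np.array([0, 1, 1, 0])
a1 = np.array([10, 11, 12, 13])
a2 = np.array([20, 21, 22, 23])

# Position by position: 0 takes the element of a1, 1 takes the element of a2
picked = np.choose(ix, [a1, a2])

# With only two candidate arrays, np.where expresses the same selection
same = np.where(ix == 0, a1, a2)

print(picked)  # [10 21 22 13]
```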

In [166]:
# A generalization of 'where' used for generating mixtures:
choices = np.random.choice([0, 1], 10)
arr1 = np.random.standard_normal(10)
arr2 = np.random.standard_exponential(10)

# The candidate arrays must be passed together as one sequence:
# np.choose(choices, arr1, arr2) would treat arr1 as the list of options and arr2 as the `out` buffer
np.choose(choices, [arr1, arr2])

--------------
# Matrix operations #

### Element-wise ###

In [175]:
arr1 = np.tile(np.linspace(1, 4, 4, dtype = np.float32), 4).reshape(4, -1)
arr2 = np.random.choice(np.random.randn(4), [4, 4])

print arr1
print arr2

print endl, 'Element-wise binary operations:'
print arr1 + arr2
print arr1 * arr2
print arr2 / (arr1)

print endl, 'Element-wise function application (ufunc):'
print np.log(arr1)

[[ 1.  2.  3.  4.]
 [ 1.  2.  3.  4.]
 [ 1.  2.  3.  4.]
 [ 1.  2.  3.  4.]]
[[ 3.004 -2.209 -2.209 -2.209]
 [-0.319  0.259  3.004  3.004]
 [ 3.004 -0.319 -0.319 -2.209]
 [-2.209 -2.209 -2.209 -2.209]]

Element-wise binary operations:
[[ 4.004 -0.209  0.791  1.791]
 [ 0.681  2.259  6.004  7.004]
 [ 4.004  1.681  2.681  1.791]
 [-1.209 -0.209  0.791  1.791]]
[[  3.004  -4.419  -6.628  -8.838]
 [ -0.319   0.519   9.013  12.017]
 [  3.004  -0.637  -0.956  -8.838]
 [ -2.209  -4.419  -6.628  -8.838]]
[[ 3.004 -1.105 -0.736 -0.552]
 [-0.319  0.13   1.001  0.751]
 [ 3.004 -0.159 -0.106 -0.552]
 [-2.209 -1.105 -0.736 -0.552]]

Element-wise function application (ufunc):
[[ 0.     0.693  1.099  1.386]
 [ 0.     0.693  1.099  1.386]
 [ 0.     0.693  1.099  1.386]
 [ 0.     0.693  1.099  1.386]]


### Matrix multiplication ###

In [178]:
print 'Matrix multiplication:'
print np.dot(arr1, arr2)

print endl, 'Matrix-vector multiplications:'
print np.dot(arr1, arr2[:, 0])
print np.dot(arr2[0], arr1)

Matrix multiplication:
[[  2.542 -11.484  -5.994 -11.667]
 [  2.542 -11.484  -5.994 -11.667]
 [  2.542 -11.484  -5.994 -11.667]
 [  2.542 -11.484  -5.994 -11.667]]

Matrix-vector multiplications:
[ 2.542  2.542  2.542  2.542]
[ -3.624  -7.248 -10.872 -14.496]


### Linear algebra (BLAS/LAPACK) ###
See the `numpy.linalg` reference: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html

In [187]:
import numpy.linalg as la

U, sigma, Vh = la.svd(arr2)  # the third value is the conjugate transpose of V, not V itself
print 'Singular values of arr2:'
print sigma

eigValues, eigVectors = la.eig(arr2)
print endl, 'The eigenvalues of arr2 are:'
print eigValues

Singular values of arr2:
[ 7.127  4.499  1.87   0.997]

The eigenvalues of arr2 are:
[ 2.340+1.501j  2.340-1.501j -1.973+1.961j -1.973-1.961j]
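
As a sanity check on what `svd` returns, the following sketch rebuilds the matrix from its factors and relates the singular values to an eigenvalue problem. The matrix `A` is a seeded random stand-in for `arr2`, and the variable names are illustrative:

```python
import numpy as np
import numpy.linalg as la

rng = np.random.RandomState(0)
A = rng.randn(4, 4)  # stand-in for arr2

# svd returns U, the singular values and the (conjugate) transpose of V
U, sigma, Vh = la.svd(A)

# U * sigma scales column j of U by sigma[j], which equals U.dot(diag(sigma))
reconstructed = np.dot(U * sigma, Vh)
is_same = np.allclose(A, reconstructed)

# The squared singular values are the eigenvalues of the symmetric matrix A^T A
eig_sym = la.eigvalsh(np.dot(A.T, A))  # returned in ascending order
matches = np.allclose(np.sort(sigma ** 2), eig_sym)

print("reconstruction ok: %s, eigenvalue check ok: %s" % (is_same, matches))
```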


-----------------------------------
# Restructuring arrays #

### reshape | resize ###

In [194]:
arr = np.arange(10)
print arr

print endl, 'Make a 2x5 array:'
print arr.reshape(2, 5)

print endl, 'This is the same'
print arr.reshape(2, -1)

print endl, 'Indexing with "newaxis" also reshapes, by inserting an axis of length 1:'
print arr.reshape(2, -1)[:, np.newaxis, :]

print endl, 'ndarray.resize modifies the array in place and pads with zeros:'
x = arr.copy()
x.resize(5, 3)
print x

print endl, 'numpy.resize returns a new array and pads by repeating the input from the start:'
print np.resize(arr, [5, 3])

[0 1 2 3 4 5 6 7 8 9]

Make a 2x5 array:
[[0 1 2 3 4]
 [5 6 7 8 9]]

This is the same
[[0 1 2 3 4]
 [5 6 7 8 9]]

Indexing with "newaxis" also reshapes, by inserting an axis of length 1:
[[[0 1 2 3 4]]

 [[5 6 7 8 9]]]

ndarray.resize modifies the array in place and pads with zeros:
[[0 1 2]
 [3 4 5]
 [6 7 8]
 [9 0 0]
 [0 0 0]]

numpy.resize returns a new array and pads by repeating the input from the start:
[[0 1 2]
 [3 4 5]
 [6 7 8]
 [9 0 1]
 [2 3 4]]
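
One practical difference between these functions is whether the result shares memory with the original. In this sketch (variable names are illustrative) `reshape` on a contiguous array returns a view, while `np.resize` returns an independent array:

```python
import numpy as np

arr = np.arange(10)

# reshape returns a view when it can: writing through it changes arr
view = arr.reshape(2, 5)
view[0, 0] = 99
shares_view = np.shares_memory(arr, view)

# np.resize builds a new array: writing to it leaves arr untouched
tiled = np.resize(arr, [5, 3])
tiled[0, 0] = -1
shares_tiled = np.shares_memory(arr, tiled)

print("arr[0] = %d, view shares memory: %s, resized copy shares memory: %s"
      % (arr[0], shares_view, shares_tiled))
```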


### repeating and tiling ###

In [None]:
arr = np.arange(4)
print 'Repeat each element twice:'
print np.repeat(arr, 2)

print endl, 'We can have different repeats per element:'
print np.repeat(arr, [2, 1, 0, 3])

print endl, 'We can repeat along an axis:'
print np.repeat(arr.reshape(2, 2), 2, axis = 1)

print endl, '"Tile" repeats the array as a block, broadcasting to new axes if necessary:'
print np.tile(arr, [3, 2, 2])

### stacking ###

`stack` joins a list of equally shaped arrays along a new dimension.  
For vectors this is intuitive, but for arrays of more dimensions the result is harder to picture.  
To see what actually happens, add an extra dimension to one of the arrays before stacking, as below:

In [226]:
print 'Stack along the 1st dim (rows):'
print arr[np.newaxis, :]
print np.stack([arr, -arr], axis = 0)

print endl, 'Stack along columns'
print arr[:, np.newaxis]
print np.stack([arr, -arr], axis = 1)

print endl, 'Stack 2 2D arrays one after the other:'
arr2 = arr.reshape(2, 2)
print np.stack([arr2, -arr2], axis = 0)

print endl, 'Now stack by transforming each element to a 1x2 array:'
print np.stack([arr2, -arr2], axis = 2)

Stack along the 1st dim (rows):
[[0 1 2 3]]
[[ 0  1  2  3]
 [ 0 -1 -2 -3]]

Stack along columns
[[0]
 [1]
 [2]
 [3]]
[[ 0  0]
 [ 1 -1]
 [ 2 -2]
 [ 3 -3]]

Stack 2 2D arrays one after the other:
[[[ 0  1]
  [ 2  3]]

 [[ 0 -1]
  [-2 -3]]]

Now stack by transforming each element to a 1x2 array:
[[[ 0  0]
  [ 1 -1]]

 [[ 2 -2]
  [ 3 -3]]]
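
Put differently, stacking along an axis gives the same result as inserting a length-1 axis at that position in every input and then concatenating along it. A sketch that checks this for three 2x4 arrays (the names `a`, `parts`, `shapes` and `agrees` are illustrative):

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
parts = [a, -a, 10 * a]

shapes = {}
agrees = []
for axis in (0, 1, 2):
    stacked = np.stack(parts, axis=axis)
    # Insert the new axis by hand, then join along it
    expanded = [np.expand_dims(p, axis) for p in parts]
    via_concat = np.concatenate(expanded, axis=axis)
    agrees.append(np.array_equal(stacked, via_concat))
    shapes[axis] = stacked.shape

print("result shapes per axis: %s" % shapes)
```

The new axis always has length 3, the number of stacked arrays, and its position in the result shape follows the `axis` argument.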


### Concatenate ###
To "stack" without adding a new dimension, use `concatenate`.  
Since no dimension is added, the axis must already exist in the inputs: you can only concatenate vectors on axis = 0 and 2D arrays on axis = 0 or 1.

In [237]:
arr = np.arange(4).reshape(2, 2)
print np.concatenate([arr, -arr], axis = 1)

[[ 0  1  0 -1]
 [ 2  3 -2 -3]]
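
The axis restriction can be observed directly. This sketch uses illustrative names and catches the error broadly, because the exact exception class differs between numpy versions:

```python
import numpy as np

vec = np.arange(3)
mat = np.arange(4).reshape(2, 2)

vec_axis0 = np.concatenate([vec, -vec], axis=0)  # vectors have only axis 0
mat_axis0 = np.concatenate([mat, -mat], axis=0)  # rows are appended
mat_axis1 = np.concatenate([mat, -mat], axis=1)  # columns are appended

# A vector has no axis 1, so this request is rejected
try:
    np.concatenate([vec, -vec], axis=1)
    vector_axis1_ok = True
except (ValueError, IndexError):
    vector_axis1_ok = False

print("shapes: %s %s %s" % (vec_axis0.shape, mat_axis0.shape, mat_axis1.shape))
```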


### <span style = "color: purple">Exercise</span> ###
Concatenate 2 vectors columnwise