### Ch 02: Vectors, matrices and multidimensional arrays

[NumPy manual (latest version, ReadTheDocs)](https://numpy.readthedocs.io/en/latest/index.html)

In [119]:
import numpy as np
import seaborn as sn
import pandas as pd

### NumPy arrays
* __NOT THE SAME AS PYTHON LISTS__.
* All array elements have same data type; arrays are fixed size. 
* (Need to resize an array? Create a new one. Element values can still be changed in place.)
* Attributes:
    - _shape_: tuple; contains # of elements for each axis of the array
    - _size_: total # of elements
    - _ndim_: number of dimensions (axes)
    - _nbytes_: number of bytes used for storage
    - _dtype_: datatype

In [2]:
data = np.array([[1, 2], [3, 4], [5, 6]])
type(data)

numpy.ndarray

In [3]:
data.ndim, data.shape, data.size, data.dtype, data.nbytes

(2, (3, 2), 6, dtype('int64'), 48)

In [4]:
data

array([[1, 2],
       [3, 4],
       [5, 6]])

### Data types:
* int (integer: 8b, 16b, 32b, 64b)
* uint (unsigned integer: 8b, 16b, 32b, 64b)
* bool (boolean)
* float (floating-point: 16b, 32b, 64b, 128b; 128b availability is platform dependent)
* complex (complex floating-point: 64b, 128b, 256b; 256b availability is platform dependent)

In [5]:
# integer
data = np.array([1, 2, 3], dtype=int); print(data.dtype, data)  # np.int alias was removed from NumPy
data = np.array([1, 2, 3], dtype=np.int32); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.int16); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.int8); print(data.dtype, data)

int64 [1 2 3]
int32 [1 2 3]
int16 [1 2 3]
int8 [1 2 3]


In [6]:
# integer (unsigned)
data = np.array([1, 2, 3], dtype=np.uint); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint32); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint16); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint8); print(data.dtype, data)

uint64 [1 2 3]
uint32 [1 2 3]
uint16 [1 2 3]
uint8 [1 2 3]


In [7]:
# boolean
data = np.array([True,False,1,0], dtype=bool); print(data.dtype, data)

bool [ True False  True False]


In [8]:
# floating point
data = np.array([1., 2., 3.], dtype=float); print(data.dtype, data)  # np.float alias was removed from NumPy
data = np.array([1., 2., 3.], dtype=np.float128); print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.float32); print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.float16); print(data.dtype, data)

float64 [1. 2. 3.]
float128 [1. 2. 3.]
float32 [1. 2. 3.]
float16 [1. 2. 3.]


In [9]:
# complex floating point
data = np.array([1., 2., 3.], dtype=complex)  # np.complex alias was removed from NumPy
print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.complex64)
print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.complex256)
print(data.dtype, data)

complex128 [1.+0.j 2.+0.j 3.+0.j]
complex64 [1.+0.j 2.+0.j 3.+0.j]
complex256 [1.+0.j 2.+0.j 3.+0.j]


### Typecasting
* Once created, dtype cannot be changed. Create a copy by __typecasting__ (_astype_).

In [10]:
data.astype(int)  # complex -> int discards the imaginary part (NumPy emits a warning)

array([1, 2, 3])
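
A minimal sketch of what `astype` does, using made-up values rather than the arrays from the cells above: it returns a new array and leaves the original untouched, and a float-to-int cast truncates toward zero instead of rounding.

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])   # illustrative values
i = f.astype(int)                # new array; f keeps its dtype

print(f.dtype, i.dtype.kind)     # float64 i
print(i)                         # [ 1 -1  2]
print(np.shares_memory(f, i))    # False: astype made a copy
```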

### Promoting
* Data types can get "promoted" to support math ops:

In [11]:
d1 = np.array([1, 2, 3], dtype=float)
d2 = np.array([1, 2, 3], dtype=complex)
(d1+d2).dtype

dtype('complex128')
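
The promotion rule can also be inspected without doing any arithmetic. The sketch below uses `np.result_type`, which is not part of the original notes, to ask NumPy which dtype a mixed operation would produce.

```python
import numpy as np

print(np.result_type(np.float64, np.complex128))  # complex128
print(np.result_type(np.int32, np.float32))       # float64
print(np.result_type(np.int8, np.uint8))          # int16

# The same rule applies when the arrays are actually combined
d1 = np.array([1, 2, 3], dtype=np.int32)
d2 = np.array([1, 2, 3], dtype=np.float32)
print((d1 + d2).dtype)                            # float64
```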

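* Some operations need arrays created with an appropriate data type. For example, `np.sqrt` returns `nan` (with a RuntimeWarning) for negative real inputs, and returns a complex result only when the input array has a complex dtype.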

In [12]:
# NumPy sqrt returns different datatypes depending on argument:
print(np.sqrt(np.array([-1, 0, 1]               )))
print(np.sqrt(np.array([-1, 0, 1], dtype=complex)))

[nan  0.  1.]
[0.+1.j 0.+0.j 1.+0.j]



### Real and imaginary parts
* All numpy arrays (__not just complex vals__) have real & imaginary attributes.

In [13]:
data = np.array([1, 2, 3], dtype=complex)
print(data,"\n",data.real,"\n",data.imag)

[1.+0.j 2.+0.j 3.+0.j] 
 [1. 2. 3.] 
 [0. 0. 0.]


### Array Data Order in Memory

* Multidimensional arrays are stored as contiguous data in memory, and there is more than one way to arrange the array elements in that memory segment.

- Row-major and column-major ordering are special cases of strategies for mapping an element's index using `ndarray.strides`.

- Operations that require changing `strides` result in new ndarray objects that refer to the same data as the original array. Such arrays are called `views`. For efficiency, NumPy strives to create views rather than copies when applying operations on arrays.

* Two options:
    - __Row-major__ (row-wise storage; C std, Numpy default.)
    - __Column-major__ (column-wise storage; Fortran std.)


* To specify, use `order='C'` or `order='F'`
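
A small sketch of how the two orderings show up in `strides`. The dtype is fixed to `int64` here only so that the byte counts are predictable; the variable names are illustrative.

```python
import numpy as np

# Same values, two memory layouts
c_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='C')
f_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='F')

# strides = number of bytes to step to reach the next element along each axis
print(c_arr.strides)  # (24, 8): the next row is 3 elements away
print(f_arr.strides)  # (8, 16): the next row is 1 element away

# Transposing only swaps the strides, so the result is a view
t = c_arr.T
print(t.strides, np.shares_memory(c_arr, t))  # (8, 24) True
```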

### Creating Arrays
![array-gen-funcs](pics/array-gen-funcs.png)
![array-gen-funcs2](pics/array-gen-funcs2.png)

### Arrays created from lists and other array-like objects

In [14]:
data = np.array([1, 2, 3, 4]) # 1D array
data.ndim, data.shape

(1, (4,))

In [15]:
data = np.array([[1, 2], [3, 4]]) # 2D array
data.ndim, data.shape

(2, (2, 2))

### Arrays filled with constant values

In [16]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [17]:
data = np.ones(4); data

array([1., 1., 1., 1.])

In [18]:
5.4*data

array([5.4, 5.4, 5.4, 5.4])

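* __full()__: create an array of a given shape filled with the desired value.
* __fill()__: array method that overwrites every element of an existing array in place.
* __empty()__: create an array without initializing its values (the contents are arbitrary).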

In [19]:
np.full(10, 5.4)

array([5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4])

In [20]:
x1 = np.empty(5); x1

array([4.65378523e-310, 0.00000000e+000, 1.58101007e-322, 1.50008929e+248,
       2.37151510e-322])

In [21]:
x1.fill(3.0); x1

array([3., 3., 3., 3., 3.])

### Arrays filled with sequences
- __arange(start,stop,increment)__
- __linspace(start,stop,#points)__
- __logspace(start,stop,#points)__

In [22]:
np.arange(0.0, 10, 1)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [23]:
np.linspace(0, 10, 20)

array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])

In [24]:
np.logspace(0, 2, 4)  # 4 log-spaced points from 10**0=1 to 10**2=100

array([  1.        ,   4.64158883,  21.5443469 , 100.        ])

### Mesh-grid arrays
* Given two 1D coordinate arrays, generate 2D coordinate arrays.
* Often used when plotting function over two variables (ex: contour plots).

In [25]:
x,y = np.array([-1, 0, 1]), np.array([-2, 0, 2])

X, Y = np.meshgrid(x, y); X

array([[-1,  0,  1],
       [-1,  0,  1],
       [-1,  0,  1]])

In [26]:
Y

array([[-2, -2, -2],
       [ 0,  0,  0],
       [ 2,  2,  2]])

In [27]:
(X+Y)**2

array([[9, 4, 1],
       [1, 0, 1],
       [1, 4, 9]])

* __np.mgrid__ & __np.ogrid__ can also generate coordinate arrays with slightly different syntaxes.

In [28]:
np.mgrid?

Type:        MGridClass
String form: <numpy.lib.index_tricks.MGridClass object at 0x7f587e53c4c0>
File:        ~/.local/lib/python3.8/site-packages/numpy/lib/index_tricks.py
Docstring:  
`nd_grid` instance which returns a dense multi-dimensional "meshgrid".

An instance of `numpy.lib.index_tricks.nd_grid` which returns an dense
(or fleshed out) mesh-grid when indexed, so that each returned argument
has the same shape.  The dimensions and number of the output arrays are
equal to the number of indexing dimensions.  If the step length is not a
complex number, then the stop is not inclusive.

However, if the step length is a **complex number** (e.g. 5j), then
the integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.

Returns
----------
mesh-grid `ndarrays` all of the same dimensions

See Also
--------
numpy.lib.index_tricks.nd_grid : clas

In [29]:
np.ogrid?

Type:        OGridClass
String form: <numpy.lib.index_tricks.OGridClass object at 0x7f587e53c550>
File:        ~/.local/lib/python3.8/site-packages/numpy/lib/index_tricks.py
Docstring:  
`nd_grid` instance which returns an open multi-dimensional "meshgrid".

An instance of `numpy.lib.index_tricks.nd_grid` which returns an open
(i.e. not fleshed out) mesh-grid when indexed, so that only one dimension
of each returned array is greater than 1.  The dimension and number of the
output arrays are equal to the number of indexing dimensions.  If the step
length is not a complex number, then the stop is not inclusive.

However, if the step length is a **complex number** (e.g. 5j), then
the integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.

Returns
-------
mesh-grid
    `ndarrays` with only one dimension not equal to 1

See Also
--------
np
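
The docstrings for `np.mgrid` and `np.ogrid` can be made concrete with a short sketch. It reuses the coordinate values from the `meshgrid` example (x in -1..1, y in -2..2 by 2); the variable names are illustrative.

```python
import numpy as np

# Dense grid: both outputs have the full (3, 3) shape.
# The first index expression varies along rows, the second along columns.
Y, X = np.mgrid[-2:3:2, -1:2]
print(X.shape, Y.shape)  # (3, 3) (3, 3)

# Open grid: each output has length 1 along all but one axis
y, x = np.ogrid[-2:3:2, -1:2]
print(x.shape, y.shape)  # (1, 3) (3, 1)

# Broadcasting expands the open grid to the same values as the dense one
print(np.array_equal((X + Y)**2, (x + y)**2))  # True

# A complex step means "number of points", with the stop value included
print(np.mgrid[0:1:5j])  # five points from 0 to 1
```

The open form stores far fewer numbers for large grids and relies on broadcasting to produce the full result.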

### Creating arrays with properties of other arrays

* Typical use case: a function that takes arrays of unspecified type & size as arguments & requires working arrays of the same type & size.

* __ones_like()__, __zeros_like()__, __full_like()__, __empty_like()__.

In [30]:
np.ones_like([1,2,3,4])

array([1, 1, 1, 1])

In [31]:
np.zeros_like([1,2,3,4])

array([0, 0, 0, 0])

In [32]:
np.full_like([1,2,3,4],5)

array([5, 5, 5, 5])

In [33]:
np.empty_like([1,2,3,4])

array([94193654629790,              0,             64,              0])

### Creating matrix arrays

* __np.identity()__: creates square matrix with ones on diagonal, zero elsewhere.
* __np.eye()__: ones on diagonal, optionally offset
* __diag()__: arbitrary 1D array on the diagonal of a matrix

In [34]:
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [35]:
np.eye(4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [36]:
np.eye(4, k=-1)

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [37]:
np.diag(np.arange(0, 20, 5))

array([[ 0,  0,  0,  0],
       [ 0,  5,  0,  0],
       [ 0,  0, 10,  0],
       [ 0,  0,  0, 15]])

## Indexing and slicing

### One-dimensional arrays
![array-slice-funcs](pics/array-slice-funcs.png)

In [38]:
a = np.arange(0, 11); a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [39]:
a[0], a[-1], a[4] # first, last, 5th elements

(0, 10, 4)

In [41]:
a[1:-1] # range (2nd..2nd to last)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
a[1:-1:2] # range (2nd..2nd to last, by 2)

array([1, 3, 5, 7, 9])

In [40]:
a[:5], a[-5:] # first five elements, last five elements

(array([0, 1, 2, 3, 4]), array([ 6,  7,  8,  9, 10]))

In [43]:
a[::-1] # every value, reverse order

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

In [44]:
a[::-3] # every 3rd value, reverse order

array([10,  7,  4,  1])

### Multidimensional arrays

In [47]:
f = lambda m,n: n+10*m
A = np.fromfunction(f, (6, 6), dtype=int); A

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [48]:
A[:,0], A[0,:] # 1st col, 1st row

(array([ 0, 10, 20, 30, 40, 50]), array([0, 1, 2, 3, 4, 5]))

In [50]:
A[:3,:3], A[3:,:3] # upper left 3x3, lower left 3x3

(array([[ 0,  1,  2],
        [10, 11, 12],
        [20, 21, 22]]),
 array([[30, 31, 32],
        [40, 41, 42],
        [50, 51, 52]]))

In [51]:
A[::2, ::2] # every 2nd element

array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])

In [52]:
A[1::2, 1::3]  # every (2nd,3rd) element starting from 1,1

array([[11, 14],
       [31, 34],
       [51, 54]])

### Views
* Subarray extractions using slice ops are alternative *views* of same underlying data. (They refer to same data, but using different "strides".)

- To get an independent copy instead of a view, use __np.copy()__ or __np.array(A, copy=True)__.

In [53]:
B = A[1:5, 1:5]; B

array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])

In [54]:
# modifying B (created from A) also modifies A.
B[:,:] = 0; A

array([[ 0,  1,  2,  3,  4,  5],
       [10,  0,  0,  0,  0, 15],
       [20,  0,  0,  0,  0, 25],
       [30,  0,  0,  0,  0, 35],
       [40,  0,  0,  0,  0, 45],
       [50, 51, 52, 53, 54, 55]])

In [60]:
# explicitly copy B to C (B not affected.)
C = B.copy(); C

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [61]:
C[:,:] = 1; C,B # C is a *copy* of the view B.

(array([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]),
 array([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]))
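
Whether two arrays share data can also be checked directly instead of by modifying one and looking at the other. This is a sketch using `np.shares_memory` and the `base` attribute, neither of which appears in the cells above; the arrays are rebuilt here so the block stands alone.

```python
import numpy as np

A = np.arange(36).reshape(6, 6)
B = A[1:5, 1:5]   # slice: a view into A's data
C = B.copy()      # explicit copy: owns its own data

print(np.shares_memory(A, B))  # True
print(np.shares_memory(A, C))  # False
print(C.base is None)          # True: C owns its data
print(B.base is None)          # False: B borrows data from another array
```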

### Fancy indexing and Boolean-valued indexing
* Arrays can be indexed using another array, a list, or sequence of integers.

![array-methods](pics/numpy-array-index-methods.png)

In [62]:
A = np.linspace(0, 1, 11); A

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [63]:
A[np.array([0, 2, 4])]

array([0. , 0.2, 0.4])

In [64]:
A[[0, 2, 4]]

array([0. , 0.2, 0.4])

In [68]:
# Boolean-based indexing: great for filtering!
A>0.8, A[A>0.8]

(array([False, False, False, False, False, False, False, False, False,
         True,  True]),
 array([0.9, 1. ]))

- Arrays from fancy|boolean indexing are *new, independent structures* - not just views of existing data.

In [72]:
A = np.arange(10,20); A

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [73]:
indices = [2, 4, 6]; B = A[indices]; B

array([12, 14, 16])

In [74]:
B[0] = -1; B,A # this does not affect A

(array([-1, 14, 16]), array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]))

In [75]:
A[indices] = -1; A

array([10, 11, -1, 13, -1, 15, -1, 17, 18, 19])

In [76]:
A = np.arange(10); A

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [77]:
B = A[A > 5]; B

array([6, 7, 8, 9])

In [78]:
B[0] = -1; B,A # this does not affect A

(array([-1,  7,  8,  9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

In [79]:
A[A > 5] = -1; A

array([ 0,  1,  2,  3,  4,  5, -1, -1, -1, -1])

### Reshaping and resizing ops
![reshape](pics/reshape-ops.png)
![indexing viz](pics/indexing-viz.png)
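* Reshaping usually doesn't copy the underlying data; it only changes the *strides* attribute. (NumPy falls back to a copy when the new shape can't be expressed with strides alone.)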

In [80]:
data = np.array([[1, 2], [3, 4]])
np.reshape(data, (1, 4))

array([[1, 2, 3, 4]])

In [81]:
data.reshape(4)

array([1, 2, 3, 4])

* __np.ravel()__ = special case of reshape. It collapses all array dimensions & returns a flattened 1D array with length = total number of original array elements. 
* __flatten()__ does the same thing, but always returns a copy, whereas __ravel()__ returns a view when possible.

In [82]:
data, data.flatten(), data.flatten().shape

(array([[1, 2],
        [3, 4]]),
 array([1, 2, 3, 4]),
 (4,))

In [83]:
data, data.ravel(), data.ravel().shape

(array([[1, 2],
        [3, 4]]),
 array([1, 2, 3, 4]),
 (4,))
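
The two outputs above look identical, so the view-versus-copy difference is easy to miss. A short sketch, using `np.shares_memory` as the check (an addition to these notes), makes it visible:

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

r = data.ravel()    # view when the memory layout allows it
f = data.flatten()  # always a copy

print(np.shares_memory(data, r))  # True
print(np.shares_memory(data, f))  # False

r[0] = 99           # writes through to data
print(data[0, 0])   # 99

# Raveling a transposed (non-contiguous) array forces a copy
print(np.shares_memory(data, data.T.ravel()))  # False
```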

* __np.newaxis__ = add an axis to an existing array. (It is used inside an index expression, not called as a function.)

In [84]:
data = np.arange(0, 5); data

array([0, 1, 2, 3, 4])

In [87]:
col = data[:, np.newaxis]; col

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [88]:
row = data[np.newaxis, :]; row

array([[0, 1, 2, 3, 4]])

### Merging arrays into bigger arrays

* __np.hstack()__: horizontal stacking
* __np.vstack()__:  vertical stacking rows into a matrix
* __np.concatenate()__: similar to stack, but accepts an _axis_ keyword

In [89]:
data = np.arange(5); data

array([0, 1, 2, 3, 4])

In [90]:
np.vstack((data, data, data)) # stack vertically along axis 0

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [91]:
np.hstack((data, data, data)) # stack horizontally (for 1D inputs this concatenates along axis 0)

array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

In [92]:
# to make hstack treat input arrays as columns:
data = data[:, np.newaxis]; data

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [93]:
np.hstack((data, data, data))

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

* The number of elements in a NumPy array can't be changed once it is created. __append__, __insert__, __delete__ all return a new array rather than modifying the original.
* Calling them repeatedly is __not a best practice__ due to the overhead of creating & copying the arrays. Start with preallocated arrays whenever possible to avoid resizing.
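
A minimal sketch contrasting the two approaches. The loop body (squares of the loop index) is just a stand-in for real per-element work, and the variable names are illustrative.

```python
import numpy as np

n = 5

# np.append returns a new array; the original is unchanged
a = np.arange(3)
b = np.append(a, 3)
print(a.size, b.size)  # 3 4

# Growing in a loop: every call allocates a new array and copies the old data
grown = np.array([], dtype=float)
for i in range(n):
    grown = np.append(grown, i**2)

# Preallocating: one allocation, then in-place assignment
prealloc = np.empty(n, dtype=float)
for i in range(n):
    prealloc[i] = i**2

print(np.array_equal(grown, prealloc))  # True
```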

### Vectorized expressions

* Designed to avoid the need for explicit "*for*" loops. __Broadcasting__ = NumPy virtually stretches a scalar, or any length-1 axis of a smaller array, to match the shape of the other operand, then applies the operation element by element.

![broadcasting](pics/broadcasting.png)

### Arithmetic operations
![arithmetic-ops](pics/arithmetic-ops.png)

In [94]:
# element-wise addition
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

In [95]:
x+y, x-y

(array([[ 6,  8],
        [10, 12]]),
 array([[-4, -4],
        [-4, -4]]))

In [96]:
x*y, y/x

(array([[ 5, 12],
        [21, 32]]),
 array([[5.        , 3.        ],
        [2.33333333, 2.        ]]))

In [97]:
x*2, 2**x

(array([[2, 4],
        [6, 8]]),
 array([[ 2,  4],
        [ 8, 16]]))

In [98]:
y/2, (y/2).dtype

(array([[2.5, 3. ],
        [3.5, 4. ]]),
 dtype('float64'))

* If a math operation is performed on incompatible (size or shape) arrays, a __ValueError__ is raised.

In [99]:
x = np.array([1, 2, 3, 4]).reshape(2,2); x

array([[1, 2],
       [3, 4]])

In [100]:
z = np.array([1, 2, 3, 4]); z

array([1, 2, 3, 4])

In [103]:
try:
    x / z # incompatible size/shape
except ValueError:
    print("Nope. Can't do that.")

Nope. Can't do that.


* Broadcasting to a correct shape:

In [104]:
z = np.array([[2, 4]]); z.shape

(1, 2)

In [105]:
x/z

array([[0.5, 0.5],
       [1.5, 1. ]])

In [106]:
zz = np.concatenate([z, z], axis=0); zz

array([[2, 4],
       [2, 4]])

In [107]:
x/zz

array([[0.5, 0.5],
       [1.5, 1. ]])

In [108]:
z = np.array([[2], [4]]); z.shape

(2, 1)

In [109]:
x/z

array([[0.5 , 1.  ],
       [0.75, 1.  ]])

In [110]:
zz = np.concatenate([z, z], axis=1); zz

array([[2, 2],
       [4, 4]])

In [111]:
x/zz

array([[0.5 , 1.  ],
       [0.75, 1.  ]])

In [112]:
x = np.array([[1, 3], [2, 4]])
x = x+y; x

array([[ 6,  9],
       [ 9, 12]])

In [113]:
x = np.array([[1, 3], [2, 4]])
x += y; x

array([[ 6,  9],
       [ 9, 12]])

### Elementwise math functions
![element-wise-math](pics/element-wise-math-functs.png)

In [129]:
x = np.linspace(-1, 1, 8); print(x)

[-1.         -0.71428571 -0.42857143 -0.14285714  0.14285714  0.42857143
  0.71428571  1.        ]


In [130]:
y = np.sin(np.pi*x); print(y) # sine function

[-1.22464680e-16 -7.81831482e-01 -9.74927912e-01 -4.33883739e-01
  4.33883739e-01  9.74927912e-01  7.81831482e-01  1.22464680e-16]


In [131]:
print(np.round(y, decimals=4)) # round FP numbers to 4 decimals

[-0.     -0.7818 -0.9749 -0.4339  0.4339  0.9749  0.7818  0.    ]


In [132]:
np.add(np.sin(x)**2, np.cos(x)**2) # sin^2+cos^2

array([1., 1., 1., 1., 1., 1., 1., 1.])

![element-wise-math](pics/element-wise-math.png)

* Sometimes you need to define new functions that act on NumPy arrays element by element. __vectorize()__ may help; it wraps a (usually scalar) function so that it is applied to each element of an array input.

In [134]:
def heaviside(x):
    return 1 if x > 0 else 0

heaviside(-1), heaviside(1.5)

(0, 1)

In [136]:
# won't work for Numpy arrays:
try:
    heaviside(np.linspace(-5, 5, 11))
except ValueError:
    print("Nope. Can't do that.")

Nope. Can't do that.


In [138]:
# works, but relatively slow.
# better to use boolean-valued arrays (to be discussed later)
# use as quick-n-dirty check
heaviside = np.vectorize(heaviside) 
heaviside(x)

array([0, 0, 0, 0, 1, 1, 1, 1])
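
The faster Boolean-array alternative mentioned in the comment can be sketched as follows. `heaviside_bool` is a hypothetical name chosen for this example; `np.heaviside` is NumPy's own ufunc for the same step function.

```python
import numpy as np

def heaviside_bool(x):
    # (x > 0) is a Boolean array; multiplying by 1 converts it to integers
    return 1 * (x > 0)

x = np.linspace(-1, 1, 8)
print(heaviside_bool(x))  # [0 0 0 0 1 1 1 1]

# np.heaviside(x, h0) returns floats; h0 is the value used where x == 0
print(np.heaviside(x, 0.0))
```

Both versions run as whole-array operations, so no Python-level call is made per element.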

### Aggregate functions

- Accepts array inputs, returns scalar outputs.
- Uses entire array by default - can specify an axis using `axis`.

![aggregate-funcs](pics/aggregate-funcs.png)

In [141]:
data = np.random.normal(size=(8,8)); data.round(3)

array([[-0.095, -0.961,  0.36 ,  0.238, -0.659,  0.141, -0.643,  0.304],
       [-0.347,  0.538, -1.699, -0.412, -1.663,  0.863,  0.119,  1.866],
       [ 0.226, -0.332, -0.204,  0.069,  0.15 , -0.19 ,  0.053,  0.894],
       [ 0.266, -0.5  ,  0.674, -0.125,  0.839, -0.064,  2.473, -1.724],
       [ 1.282, -1.624,  0.167, -1.827,  1.123,  0.594, -0.175,  0.241],
       [-1.073, -0.556, -0.425, -1.52 ,  0.158,  0.343, -0.525,  0.418],
       [ 1.499,  0.343, -0.27 ,  0.125,  0.476,  1.512, -1.242,  0.885],
       [-1.156,  0.768,  0.759,  0.649, -2.435, -1.02 ,  1.087,  0.545]])

In [144]:
np.mean(data), data.mean(), np.std(data), data.std()

(-0.006520115097958065,
 -0.006520115097958065,
 0.9417145011403956,
 0.9417145011403956)

In [146]:
data = np.random.normal(size=(5, 10, 15))

In [151]:
# axis keyword controls which array axis gets aggregated
data.sum(axis=0).shape, data.sum(axis=(0,2)).shape, data.sum()

((10, 15), (10,), -8.446358114124024)

![aggregate funcs illustrated](pics/aggregate-funcs-illustrated.png)

In [152]:
data = np.arange(1,10).reshape(3,3); data

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [153]:
data.sum(), data.sum(axis=0), data.sum(axis=1)

(45, array([12, 15, 18]), array([ 6, 15, 24]))

### Boolean arrays and vectorized conditional expressions
* Enables you to avoid using if statements. Winning!

In [155]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1]); a<b

array([ True,  True, False, False])

In [156]:
# aggregate booleans
np.all(a<b), np.any(a<b)

(False, True)

In [157]:
if np.all(a < b):
    print("All a's < b's")
elif np.any(a < b):
    print("Some a's < b's")
else:
    print("All b's <= a's")

Some a's < b's


In [158]:
# vectorized booleans
x = np.array([-2, -1, 0, 1, 2]); x>0

array([False, False, False,  True,  True])

In [159]:
1*(x>0)

array([0, 0, 0, 1, 1])

In [160]:
x*(x>0)

array([0, 0, 0, 1, 2])

### Conditional / Logical computing
* Example use case: defining piecewise functions.

![conditionals-logicals](pics/conditional-logical-funcs.png)

In [163]:
x = np.linspace(-5, 5, 11); print(x)

[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.  5.]


In [164]:
# expression is a multiplication of two Boolean arrays,
# so multiplication acts as an elementwise AND operator.
def pulse(x, position, height, width):
    return height * (x >= position) * (x <= (position+width))

In [165]:
pulse(x, position=-2, height=1, width=5)

array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0])

In [166]:
pulse(x, position=1, height=1, width=5)

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

In [167]:
# another implementation using logical_and:
def pulse(x, position, height, width):
    return height * np.logical_and(x >= position, x <= (position + width))

In [168]:
x = np.linspace(-4, 4, 9); x

array([-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])

In [174]:
# 1st arg = boolean; 2nd,3rd args = true,false results
print(np.where(x<0, x*10, x/10))

[-40.  -30.  -20.  -10.    0.    0.1   0.2   0.3   0.4]


In [173]:
# select value from list of conditions.
print(np.select([x < -1, x < 2, x >= 2],['bad','meh','good']))

['bad' 'bad' 'bad' 'meh' 'meh' 'meh' 'good' 'good' 'good']


In [175]:
# choose value from list of arrays.
print(np.choose([0, 0, 0, 1, 1, 1, 2, 2, 2], [x**2, x**3, x**4]))

[ 16.   9.   4.  -1.   0.   1.  16.  81. 256.]


In [182]:
# returns tuple of indices
# same result as Boolean indexing with abs(x)>2, but uses fancy indexing.
print(  np.nonzero(abs(x)>2))
print(x[np.nonzero(abs(x)>2)])
print(           x[abs(x)>2])

(array([0, 1, 7, 8]),)
[-4. -3.  3.  4.]
[-4. -3.  3.  4.]


### Set operations
* Sets are __unordered collections__ of unique objects; NumPy provides functions for treating arrays as sets.
![set-funcs](pics/set-funcs.png)

In [186]:
a = np.unique([1,2,3,3])
b = np.unique([2,3,4,4,5,6,5])

In [188]:
print(np.in1d(a,b)) # test for existence of a in b (1D)

[False  True  True]


In [189]:
1 in a, 1 in b # testing for single element presence

(True, False)

In [190]:
print(np.all(np.in1d(a,b))) # a = subset of b?

False


In [192]:
print(np.union1d(    a,b)) # presence in either or both arrays
print(np.intersect1d(a,b)) # both arrays

[1 2 3 4 5 6]
[2 3]


In [193]:
print(np.setdiff1d(a, b)) # presence in a, but not in b
print(np.setdiff1d(b, a)) # presence in b, but not in a

[1]
[4 5 6]


### Operations on arrays

Operations that act upon arrays __as a single entity__, and return transformed arrays of the same size.

![array-funcs](pics/array-funcs.png)

In [194]:
data = np.arange(9).reshape(3, 3); print(data)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [195]:
print(np.transpose(data))
print(data.T) # transpose also exists as special method "T"

[[0 3 6]
 [1 4 7]
 [2 5 8]]
[[0 3 6]
 [1 4 7]
 [2 5 8]]


In [197]:
print(np.fliplr(data)) # flip left-to-right
print(np.flipud(data)) # flip up-down

[[2 1 0]
 [5 4 3]
 [8 7 6]]
[[6 7 8]
 [3 4 5]
 [0 1 2]]


### Matrix and vector operations
![matrix-funcs](pics/matrix-funcs.png)

In [198]:
A = np.arange(1,7).reshape(2,3); print(A)

[[1 2 3]
 [4 5 6]]


In [199]:
B = np.arange(1,7).reshape(3,2); print(B)

[[1 2]
 [3 4]
 [5 6]]


In [201]:
print(np.dot(A,B),"\n\n",np.dot(B,A))

[[22 28]
 [49 64]] 

 [[ 9 12 15]
 [19 26 33]
 [29 40 51]]


In [205]:
A = np.arange(9).reshape(3, 3); print(A); print()
x = np.arange(3);               print(x)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

[0 1 2]


In [206]:
# dot also works for matrix-vector multiplication
print(np.dot(A, x)); print()
print(A.dot(x))

[ 5 14 23]

[ 5 14 23]


* Matrix multiplication expressions can quickly get VERY cumbersome. Below is an example of a __similarity transform__. 
$A'=BAB^{-1}$:

In [209]:
A = np.random.rand(3,3); print(A.round(3))
B = np.random.rand(3,3); print(B.round(3))

[[0.51  0.843 0.214]
 [0.091 0.157 0.047]
 [0.684 0.539 0.166]]
[[0.746 0.722 0.195]
 [0.988 0.547 0.1  ]
 [0.445 0.659 0.577]]


In [211]:
Ap = np.dot(B, 
            np.dot(A, 
                   np.linalg.inv(B))); print(Ap.round(3))

[[ 1.84  -0.759 -0.099]
 [ 2.225 -0.988 -0.142]
 [ 1.36  -0.33  -0.019]]


In [212]:
Ap = B.dot(A.dot(np.linalg.inv(B))); print(Ap.round(3))

[[ 1.84  -0.759 -0.099]
 [ 2.225 -0.988 -0.142]
 [ 1.36  -0.33  -0.019]]


* NumPy __matrix__ data structure = an easier-to-read alternative.

In [214]:
A = np.matrix(A); print(A.round(3))
B = np.matrix(B); print(B.round(3))

[[0.51  0.843 0.214]
 [0.091 0.157 0.047]
 [0.684 0.539 0.166]]
[[0.746 0.722 0.195]
 [0.988 0.547 0.1  ]
 [0.445 0.659 0.577]]


In [215]:
Ap = B*A*B.I; print(Ap.round(3)) # I = inverse matrix

[[ 1.84  -0.759 -0.099]
 [ 2.225 -0.988 -0.142]
 [ 1.36  -0.33  -0.019]]


* Unfortunately __matrix__ has some disadvantages & is discouraged. Expressions like A * B are context dependent, which causes readability issues.
* Consider **casting arrays to matrices** before computation, then casting the result back to ndarray instead.

In [216]:
A = np.asmatrix(A); print(A.round(3))
B = np.asmatrix(B); print(B.round(3))

[[0.51  0.843 0.214]
 [0.091 0.157 0.047]
 [0.684 0.539 0.166]]
[[0.746 0.722 0.195]
 [0.988 0.547 0.1  ]
 [0.445 0.659 0.577]]


In [217]:
Ap = B*A*B.I; Ap = np.asarray(Ap); print(Ap.round(3))

[[ 1.84  -0.759 -0.099]
 [ 2.225 -0.988 -0.142]
 [ 1.36  -0.33  -0.019]]
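
Another readable option that stays with plain ndarrays is Python's `@` matrix-multiplication operator. This is an alternative to the `matrix` approach shown above, not something the cells above use; the small fixed matrices are chosen only so the sketch is repeatable.

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])   # invertible (determinant 2)

# Similarity transform A' = B A B^-1, written with @ on ndarrays
Ap = B @ A @ np.linalg.inv(B)

# Same result as the nested np.dot form
Ap_dot = np.dot(B, np.dot(A, np.linalg.inv(B)))
print(np.allclose(Ap, Ap_dot))                 # True
print(isinstance(Ap, np.matrix))               # False: still a plain ndarray
print(np.isclose(np.trace(Ap), np.trace(A)))   # True: similarity preserves the trace
```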


* __np.inner__ expects two inputs with the same dimension.
* __np.dot__ can take input vectors of shape _1xN_ & _Nx1_ respectively.
* __np.outer__ maps two vectors to a matrix.

In [219]:
print(np.inner(x,x)) # inner product between 2 arrays
print(np.dot(  x,x))

5
5


In [220]:
y = x[:, np.newaxis]; print(y)

[[0]
 [1]
 [2]]


In [221]:
print(np.dot(y.T, y))

[[5]]


In [223]:
x = np.array([1,2,3]); print(x)
print(np.outer(x,x))
print(np.kron( x,x))

[1 2 3]
[[1 2 3]
 [2 4 6]
 [3 6 9]]
[1 2 3 2 4 6 3 6 9]


* __np.kron__ is often used to compute tensor products of arbitrary dimensions (both inputs must have same #axes).
* To obtain a result similar to `np.outer(x,x)`, input array x should be extended to shape (N,1) & (1,N) for `kron`'s 1st & 2nd arguments.

In [224]:
print(np.kron(x[:,np.newaxis], x[np.newaxis,:]))

[[1 2 3]
 [2 4 6]
 [3 6 9]]


In [225]:
# computing tensor product of two 2x2 matrices
print(np.kron(np.ones((2,2)), 
              np.identity(2)))

[[1. 0. 1. 0.]
 [0. 1. 0. 1.]
 [1. 0. 1. 0.]
 [0. 1. 0. 1.]]


In [None]:
np.kron(np.identity(2), np.ones((2,2)))

* Common array ops can be expressed using __Einstein's summation convention__ (np.einsum): a summation is assumed over each index that occurs multiple times in an expression.
* The first argument is __an index expression__ (a string with comma-separated indices), followed by an arbitrary number of arrays.
* For example: $x_n y_n$ can be represented using "n,n".

In [232]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

In [234]:
print(np.einsum("n,n",x,y))
print(np.inner(       x,y))

70
70


* Matrix multiplication $A_{mk} B_{kn}$ using "mk,kn":

In [235]:
A = np.arange(9).reshape(3, 3); print(A)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [236]:
B = A.T; print(B)

[[0 3 6]
 [1 4 7]
 [2 5 8]]


In [237]:
print(np.einsum("mk,kn",A,B))

[[  5  14  23]
 [ 14  50  86]
 [ 23  86 149]]


In [238]:
# verifying...
print(np.all(np.einsum("mk,kn",A,B) == np.dot(A,B)))  # np.alltrue was removed in NumPy 2.0

True
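
A few more index expressions, sketched with the explicit `->` form that names the output indices. The `->` notation is part of `np.einsum` but is not used in the cells above, and the arrays are rebuilt here so the block stands alone.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
x = np.arange(3)

# "->" names the output indices explicitly
print(np.array_equal(np.einsum("mk,kn->mn", A, A.T), A @ A.T))     # True: matrix product
print(np.array_equal(np.einsum("ij->ji", A), A.T))                 # True: transpose
print(np.einsum("ii", A), np.trace(A))                             # 12 12: trace
print(np.array_equal(np.einsum("mk,k->m", A, x), A @ x))           # True: matrix-vector
print(np.array_equal(np.einsum("m,n->mn", x, x), np.outer(x, x)))  # True: outer product
```

A repeated index that is missing from the output is summed over; an index that appears in the output is kept as an axis.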
