### Ch 02: Vectors, matrices and multidimensional arrays

[NumPy manual (latest version, ReadTheDocs)](https://numpy.readthedocs.io/en/latest/index.html)

In [3]:
import numpy as np
import seaborn as sn
import pandas as pd

### NumPy arrays
* __NOT THE SAME AS PYTHON LISTS__.
* All array elements have the same data type, and an array's size is fixed once it is created.
* (Need a different size? Create a new array. Element values, however, can be modified in place.)
* **Attributes**:
    - _shape_: tuple; contains # of elements for each axis of the array
    - _size_: total # of elements
    - _ndim_: number of dimensions (axes)
    - _nbytes_: number of bytes used for storage
    - _dtype_: datatype

In [4]:
data = np.array([[1, 2], [3, 4], [5, 6]])
type(data)

numpy.ndarray

In [5]:
data.ndim, data.shape, data.size, data.dtype, data.nbytes

(2, (3, 2), 6, dtype('int64'), 48)

In [6]:
data

array([[1, 2],
       [3, 4],
       [5, 6]])

### Integers (8b, 16b, 32b, 64b)
(NumPy 1.20 deprecated the `np.int` alias and NumPy 1.24 removed it: use `dtype=int` instead.)

In [8]:
data = np.array([1, 2, 3], dtype=int); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.int32); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.int16); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.int8); print(data.dtype, data)

int64 [1 2 3]
int32 [1 2 3]
int16 [1 2 3]
int8 [1 2 3]


### Unsigned Integers (8b, 16b, 32b, 64b)

In [9]:
data = np.array([1, 2, 3], dtype=np.uint); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint32); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint16); print(data.dtype, data)
data = np.array([1, 2, 3], dtype=np.uint8); print(data.dtype, data)

uint64 [1 2 3]
uint32 [1 2 3]
uint16 [1 2 3]
uint8 [1 2 3]
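A quick way to see what each bit width buys you is `np.iinfo`. The sketch below (illustrative variable names) prints the value ranges of a few integer types and shows that integer array arithmetic wraps around silently when a result overflows the type.

```python
import numpy as np

# Smallest and largest representable value for a few integer types
for t in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Array arithmetic stays in uint8 and wraps around modulo 256 on overflow
a = np.array([250, 255], dtype=np.uint8)
b = np.array([10, 1], dtype=np.uint8)
wrapped = a + b
print(wrapped, wrapped.dtype)   # [4 0] uint8
```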


### Booleans

In [10]:
data = np.array([True,False,1,0], dtype=bool); print(data.dtype, data)

bool [ True False  True False]


### Floating Point (16b,32b,64b,128b)
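- NumPy 1.20 deprecated the `np.float` alias (removed in 1.24); use `float` by itself. `np.float128` is platform-dependent: it is typically 80-bit extended precision padded to 128 bits, and it does not exist on some platforms (e.g., Windows).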

In [13]:
data = np.array([1., 2., 3.], dtype=float); print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.float128); print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.float32); print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.float16); print(data.dtype, data)

float64 [1. 2. 3.]
float128 [1. 2. 3.]
float32 [1. 2. 3.]
float16 [1. 2. 3.]
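The float types differ in precision as well as range. As a sketch, `np.finfo` reports the bit width, machine epsilon and approximate decimal precision of each type, and a single `float16` conversion shows how quickly the smaller types lose exactness.

```python
import numpy as np

# eps = gap between 1.0 and the next representable float of that type
for t in (np.float16, np.float32, np.float64):
    fi = np.finfo(t)
    print(t.__name__, fi.bits, fi.eps, fi.precision)

# float16 has an 11-bit significand: integers above 2048 are no longer all exact
h = np.float16(2049)
print(h)   # 2048.0
```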


### Complex Data (64b,128b,256b)
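- NumPy 1.20 deprecated the `np.complex` alias (removed in 1.24); use `complex` by itself. `np.complex256` (a pair of extended-precision floats) is platform-dependent and missing on some platforms (e.g., Windows).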

In [16]:
data = np.array([1., 2., 3.], dtype=complex);       print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.complex64);  print(data.dtype, data)
data = np.array([1., 2., 3.], dtype=np.complex256); print(data.dtype, data)

complex128 [1.+0.j 2.+0.j 3.+0.j]
complex64 [1.+0.j 2.+0.j 3.+0.j]
complex256 [1.+0.j 2.+0.j 3.+0.j]


### Real and imaginary parts
* All NumPy arrays (__not just complex ones__) have `real` and `imag` attributes.

In [17]:
data = np.array([1, 2, 3], dtype=complex)
print(data,"\n",data.real,"\n",data.imag)

[1.+0.j 2.+0.j 3.+0.j] 
 [1. 2. 3.] 
 [0. 0. 0.]
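To illustrate the point above, here is a small sketch (illustrative names `r` and `z`): a purely real array still exposes `imag` (all zeros), and for complex arrays `np.abs` and `np.angle` give magnitude and phase, while the parts can also be assigned in place.

```python
import numpy as np

r = np.array([1.5, -2.0])       # real-valued float64 array
print(r.real, r.imag)           # imag exists too: an array of zeros

z = np.array([3 + 4j, 2j])
print(np.abs(z))                # magnitudes: 5 and 2
print(np.angle(z))              # phases in radians: about 0.927 and pi/2

z.imag = 0                      # parts of a complex array can be assigned in place
print(z)
```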


### Typecasting
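* Once an array is created, its dtype cannot be changed. Create a converted copy by __typecasting__ (_astype_). Casting complex values to a real type discards the imaginary part (NumPy emits a `ComplexWarning`).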

In [19]:
data.astype(int) # previously: astype(np.int)



array([1, 2, 3])
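A related pitfall is float-to-int casting. In this sketch (illustrative names), `astype` truncates toward zero and always returns a new array; rounding first gives the nearest integer, with NumPy rounding exact halves to the nearest even value.

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])
i = f.astype(np.int32)            # float -> int truncates toward zero
print(i)                          # [ 1 -1  2]
print(np.shares_memory(f, i))     # False: astype returned a new array

# Round first if you want the nearest integer (halves go to even: 2.5 -> 2)
print(np.round(f).astype(np.int32))   # [ 2 -2  2]
```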

### Promoting
* Data types can get "promoted" to support math ops:

In [20]:
d1 = np.array([1, 2, 3], dtype=float)
d2 = np.array([1, 2, 3], dtype=complex)
(d1+d2).dtype

dtype('complex128')
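To check a promotion without building arrays, `np.promote_types` returns the smallest dtype that can safely hold values of both inputs. The sketch below also shows a mixed signed/unsigned case, where neither input type is wide enough on its own.

```python
import numpy as np

print(np.promote_types(np.int8, np.uint8))        # int16
print(np.promote_types(np.int64, np.float32))     # float64
print(np.promote_types(np.float64, np.complex64)) # complex128

d1 = np.array([1, 2, 3], dtype=np.int8)
d2 = np.array([1, 2, 3], dtype=np.uint8)
print((d1 + d2).dtype)                            # int16
```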

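* Sometimes arrays must be created with an appropriate data type up front. (`np.array` infers the dtype from its input, e.g. an integer type for a list of ints; factory functions such as `np.zeros` and `np.ones` default to `float64`.)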

In [24]:
# NumPy sqrt returns different datatypes depending on argument:
print(np.sqrt(np.array([-1, 0, 1]               )))
print(np.sqrt(np.array([-1, 0, 1], dtype=complex)))

[nan  0.  1.]
[0.+1.j 0.+0.j 1.+0.j]




### Array Data Order in Memory

* Multidimensional arrays are stored as contiguous data in memory, and there is freedom in how the array elements are arranged within that memory segment.

- **Row-major** and **column-major** ordering are special cases of a general scheme: `ndarray.strides` stores, for each axis, how many bytes to step in memory to reach the next element along that axis.

- Operations that require changing `strides` return "views" that refer to the same data as the original array. For efficiency, NumPy strives to create views rather than copies when applying operations on arrays.

* Two options:
    - __Row-major__ (row-wise storage; C std, Numpy default. Use `order='C'`)
    - __Column-major__ (column-wise storage; Fortran std. Use `order='F'`)
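A minimal sketch of the two layouts, assuming 8-byte integers (names `c_arr`, `f_arr` are illustrative): the logical contents are identical, only the strides differ, and a transpose simply swaps the strides of a view.

```python
import numpy as np

# The same 2x3 array of 8-byte integers in two memory layouts
c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)   # row-major (C), the default
f_arr = np.array(c_arr, order="F")                   # column-major (Fortran) copy

print(c_arr.strides)   # (24, 8): next row is 3*8 bytes away, next column 8 bytes
print(f_arr.strides)   # (8, 16): next row is 8 bytes away, next column 2*8 bytes
print(np.array_equal(c_arr, f_arr))   # True: same logical contents

# Transposing just swaps the strides and returns a view of the same buffer
t = c_arr.T
print(t.strides, np.shares_memory(t, c_arr))   # (8, 24) True
```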

### Creating Arrays
![array-gen-funcs](pics/array-gen-funcs.png)
![array-gen-funcs2](pics/array-gen-funcs2.png)

### Arrays created from lists and other array-like objects

In [25]:
data = np.array([1, 2, 3, 4]) # 1D array
data.ndim, data.shape

(1, (4,))

In [26]:
data = np.array([[1, 2], [3, 4]]) # 2D array
data.ndim, data.shape

(2, (2, 2))
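A few more behaviours of `np.array` on nested lists, as a sketch: mixed numeric types are upcast, the dtype can be forced, and the nesting depth becomes the number of axes. The ragged-list case depends on the NumPy version (a `ValueError` in recent releases, an object array with a warning in older ones), so treat that part as version-dependent.

```python
import numpy as np

print(np.array([1, 2.5]).dtype)                              # float64: the int is upcast
print(np.array([[1, 2], [3, 4]], dtype=np.float32).dtype)    # float32: dtype forced
print(np.array([[[1], [2]], [[3], [4]]]).shape)              # (2, 2, 1): nesting depth = ndim

# Nested lists must be "rectangular". Recent NumPy (1.24+) raises ValueError for
# ragged input; older versions built an object array with a warning instead.
try:
    np.array([[1, 2], [3]])
except ValueError as err:
    print("ragged input rejected:", err)
```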

### Arrays filled with constants:
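- zeros(), ones(), full(), fill(), empty(). `fill()` is an array method that sets every element in place; `empty()` allocates memory without initializing it, so its initial values are arbitrary.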

In [27]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [28]:
data = np.ones(4); data

array([1., 1., 1., 1.])

In [29]:
5.4*data

array([5.4, 5.4, 5.4, 5.4])

In [30]:
np.full(10, 5.4)

array([5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4])

In [31]:
x1 = np.empty(5); x1

array([3.67078878e-316, 0.00000000e+000, 3.70441647e-316, 3.70167420e-316,
       2.37151510e-322])

In [32]:
x1.fill(3.0); x1

array([3., 3., 3., 3., 3.])

### Arrays filled with increments
- arange(start, stop, increment): `stop` is excluded
- linspace(start, stop, #points): `stop` is included by default

In [33]:
np.arange(0.0, 10, 1)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [34]:
print(np.linspace(0, 10, 20))

[ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]
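The endpoint difference is easy to trip over, so here is a small sketch with illustrative values. For non-integer steps the NumPy documentation recommends `linspace`, because floating-point rounding can make the length of an `arange` result hard to predict.

```python
import numpy as np

# arange: half-open interval [start, stop) with a fixed step
a = np.arange(0, 10, 2.5)                        # 0, 2.5, 5, 7.5 (10 excluded)

# linspace: closed interval [start, stop] with a fixed number of points
b, step = np.linspace(0, 10, 5, retstep=True)    # 0, 2.5, 5, 7.5, 10 and step 2.5

# endpoint=False makes linspace produce the same half-open grid as arange above
c = np.linspace(0, 10, 4, endpoint=False)
print(a, b, step, c)
```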


### Arrays filled with logarithmic sequences
- logspace(start, stop, #points, base=10): points evenly spaced on a log scale, from `base**start` to `base**stop`

In [35]:
# 4 data points from 10**0=1 to 10**2=100
np.logspace(0, 2, 4)  

array([  1.        ,   4.64158883,  21.5443469 , 100.        ])

### Mesh-grid arrays
* Given two 1D coordinate arrays, generate two 2D coordinate arrays (one per axis).
* Often used when plotting function over two variables (ex: contour plots).

In [36]:
x,y = np.array([-1, 0, 1]), np.array([-2, 0, 2])

X, Y = np.meshgrid(x, y); X

array([[-1,  0,  1],
       [-1,  0,  1],
       [-1,  0,  1]])

In [37]:
Y

array([[-2, -2, -2],
       [ 0,  0,  0],
       [ 2,  2,  2]])

In [38]:
(X+Y)**2

array([[9, 4, 1],
       [1, 0, 1],
       [1, 4, 9]])

* np.mgrid & np.ogrid build coordinate arrays using slice syntax: mgrid returns dense (full-size) arrays, while ogrid returns open (sparse) arrays that rely on broadcasting.

In [39]:
np.mgrid[0:3,0:5]

array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

In [40]:
np.ogrid[0:3,0:5]

[array([[0],
        [1],
        [2]]),
 array([[0, 1, 2, 3, 4]])]
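As a sketch of how the three grid helpers relate (illustrative names): the open grid from `ogrid` reproduces the dense `mgrid` result through broadcasting, `meshgrid` with its default `'xy'` indexing puts its first argument along the columns, and a complex "step" in `mgrid` means a number of points, endpoint included.

```python
import numpy as np

rows, cols = np.mgrid[0:3, 0:5]     # dense: both arrays have shape (3, 5)
r, c = np.ogrid[0:3, 0:5]           # open: shapes (3, 1) and (1, 5)
print(r.shape, c.shape)

# The open grid gives the same result once broadcasting expands it
print(np.array_equal(rows * 10 + cols, r * 10 + c))   # True

# meshgrid's default 'xy' indexing puts the first argument along columns
X, Y = np.meshgrid(np.arange(5), np.arange(3))
print(np.array_equal(X, cols) and np.array_equal(Y, rows))   # True

# A complex step in mgrid means "number of points", endpoint included (like linspace)
print(np.mgrid[0:1:5j])
```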

### Creating arrays with properties of other arrays

* Typical use case: a function that takes arrays of unspecified type & size as arguments & requires working arrays of the same type & size.

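* ones_like(), zeros_like(), full_like(), empty_like(): the result has the same shape and dtype as the input (`empty_like` leaves the values uninitialized, hence the arbitrary numbers below).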

In [41]:
np.ones_like([1,2,3,4])

array([1, 1, 1, 1])

In [42]:
np.zeros_like([1,2,3,4])

array([0, 0, 0, 0])

In [43]:
np.full_like([1,2,3,4],5)

array([5, 5, 5, 5])

In [44]:
np.empty_like([1,2,3,4])

array([   18550,        0, 69312912, 63859472])
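Because the `*_like` functions inherit the dtype, a sketch worth keeping in mind (illustrative arrays): the dtype can be overridden with `dtype=`, and `full_like` casts the fill value to the inherited dtype, so a float fill on an integer array is truncated.

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]], dtype=np.int64)
floats = ints.astype(np.float32)

print(np.zeros_like(ints).dtype, np.zeros_like(floats).dtype)   # int64 float32
print(np.ones_like(ints, dtype=float).dtype)                    # float64: dtype overridden

# Pitfall: the fill value is cast to the inherited dtype, so 2.7 becomes 2
print(np.full_like(ints, 2.7))
```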

### Creating matrix arrays

* __np.identity()__: creates square matrix with ones on diagonal, zero elsewhere.
* __np.eye()__: ones on a diagonal, optionally offset by `k` (and optionally non-square)
* __np.diag()__: places a 1D array on the diagonal of a matrix (given a 2D array, it extracts the diagonal instead)

In [45]:
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [46]:
np.eye(4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [47]:
np.eye(4, k=-1)

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [48]:
np.diag(np.arange(0, 20, 5))

array([[ 0,  0,  0,  0],
       [ 0,  5,  0,  0],
       [ 0,  0, 10,  0],
       [ 0,  0,  0, 15]])
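A short sketch of the extra options (illustrative names): `np.eye` also takes a column count for non-square results, and `np.diag` works in both directions, building a matrix from a 1D array or reading a (possibly offset) diagonal back out of a 2D one.

```python
import numpy as np

# np.eye(N, M, k): N rows, M columns, ones on the diagonal offset by k
E = np.eye(2, 4, k=1)
print(E)

v = np.array([1, 2, 3])
D = np.diag(v)               # 1D input -> 3x3 matrix with v on the main diagonal
print(np.diag(D))            # 2D input -> 1D diagonal: [1 2 3]

U = np.diag(v, k=1)          # 4x4 matrix with v on the first superdiagonal
print(np.diag(U, k=1))       # reading back the same offset gives [1 2 3]
```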

## Indexing and slicing
- Elements and subarrays of NumPy arrays are accessed using the standard square bracket notation that is also used with Python lists.

### One-dimensional arrays
- Positive integers index elements from the beginning of the array (index starts at 0). Negative integers index elements from the end of the array.
![array-slice-funcs](pics/array-slice-funcs.png)

In [49]:
a = np.arange(0, 11); a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [50]:
a[0], a[-1], a[4] # first, last, 5th elements

(0, 10, 4)

In [51]:
a[1:-1] # range (2nd..2nd to last)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [52]:
a[1:-1:2] # range (2nd..2nd to last, by 2)

array([1, 3, 5, 7, 9])

In [53]:
a[:5], a[-5:] # first five elements, last five elements

(array([0, 1, 2, 3, 4]), array([ 6,  7,  8,  9, 10]))

### Reversed order

In [54]:
a[::-1]

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

In [55]:
a[::-3]

array([10,  7,  4,  1])

### Multidimensional arrays

In [56]:
f = lambda m,n: n+10*m
A = np.fromfunction(f, (6, 6), dtype=int); A

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [57]:
A[:,0], A[0,:] # 1st col, 1st row

(array([ 0, 10, 20, 30, 40, 50]), array([0, 1, 2, 3, 4, 5]))

In [58]:
A[:3,:3], A[3:,:3] # upper left 3x3, lower left 3x3

(array([[ 0,  1,  2],
        [10, 11, 12],
        [20, 21, 22]]),
 array([[30, 31, 32],
        [40, 41, 42],
        [50, 51, 52]]))

In [59]:
A[::2, ::2] # every 2nd row and every 2nd column

array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])

In [60]:
A[1::2, 1::3]  # every 2nd row & every 3rd column, starting from (1,1)

array([[11, 14],
       [31, 34],
       [51, 54]])

### Views
* Subarray extractions using slice ops are alternative *views* of same underlying data. (They refer to same data, but using different "strides".)
- To get an independent copy instead, use np.copy(), the .copy() method, or np.array(..., copy=True).

In [61]:
B = A[1:5, 1:5]; B

array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])

In [62]:
# modifying B (created from A) also modifies A.
B[:,:] = 0; A

array([[ 0,  1,  2,  3,  4,  5],
       [10,  0,  0,  0,  0, 15],
       [20,  0,  0,  0,  0, 25],
       [30,  0,  0,  0,  0, 35],
       [40,  0,  0,  0,  0, 45],
       [50, 51, 52, 53, 54, 55]])

In [63]:
# explicitly copy B to C (B not affected.)
C = B.copy(); C

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [64]:
C[:,:] = 1; C,B # C is a *copy* of the view B.

(array([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]),
 array([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]))
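To tell a view from a copy programmatically, `np.shares_memory` and the `OWNDATA` flag are handy. A minimal sketch with illustrative names:

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
view = A[:, 1:3]              # basic slicing -> view
copy = A[:, 1:3].copy()       # explicit copy

print(np.shares_memory(A, view), np.shares_memory(A, copy))   # True False
print(view.flags["OWNDATA"], copy.flags["OWNDATA"])           # False True

view[0, 0] = -1               # writes through to A[0, 1]
print(A[0, 1], copy[0, 0])    # -1 1
```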

### Fancy indexing
* Arrays can be indexed using another array, a list, or sequence of integers.

![array-methods](pics/numpy-array-index-methods.png)

In [65]:
A = np.linspace(0, 1, 11); A

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [66]:
A[np.array([0, 2, 4])]

array([0. , 0.2, 0.4])

In [67]:
A[[0, 2, 4]]

array([0. , 0.2, 0.4])
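On 2D arrays, fancy indexing with two integer lists pairs the indices up element by element rather than selecting a submatrix. This sketch (illustrative `M`) contrasts that with `np.ix_`, which builds the row/column cross product.

```python
import numpy as np

M = np.arange(16).reshape(4, 4)
picked = M[[0, 2], [1, 3]]          # elements (0,1) and (2,3): [ 1 11]
sub = M[np.ix_([0, 2], [1, 3])]     # rows {0,2} x columns {1,3}: [[ 1  3] [ 9 11]]
print(picked)
print(sub)
```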

### Boolean-based indexing: great for filtering!

In [68]:
A>0.8, A[A>0.8]

(array([False, False, False, False, False, False, False, False, False,
         True,  True]),
 array([0.9, 1. ]))

- Arrays created by fancy or boolean indexing are *new, independent arrays* (copies), not views of existing data. Assigning *through* a fancy or boolean index, however, modifies the original array.

In [69]:
A = np.arange(10,20); A

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [70]:
indices = [2, 4, 6]; B = A[indices]; B

array([12, 14, 16])

In [71]:
B[0] = -1; B,A # this does not affect A

(array([-1, 14, 16]), array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]))

In [72]:
A[indices] = -1; A

array([10, 11, -1, 13, -1, 15, -1, 17, 18, 19])

In [73]:
A = np.arange(10); A

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [74]:
B = A[A > 5]; B

array([6, 7, 8, 9])

In [75]:
B[0] = -1; B,A # this does not affect A

(array([-1,  7,  8,  9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

In [76]:
A[A > 5] = -1; A

array([ 0,  1,  2,  3,  4,  5, -1, -1, -1, -1])

### Reshaping and resizing ops
![reshape](pics/reshape-ops.png)
![indexing viz](pics/indexing-viz.png)
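* Reshaping doesn't modify the underlying data; it only changes the `shape` and `strides` attributes. The result is a view whenever the new shape can be expressed with strides, otherwise a copy.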

In [77]:
data = np.array([[1, 2], [3, 4]])
np.reshape(data, (1, 4))

array([[1, 2, 3, 4]])

In [78]:
data.reshape(4)

array([1, 2, 3, 4])
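Two details of `reshape`, sketched with illustrative names: one axis length may be given as `-1` to let NumPy infer it, and when the data's strides cannot express the new shape (here, flattening a transposed array), NumPy has to return a copy.

```python
import numpy as np

data = np.arange(6)
m = data.reshape(2, -1)               # -1: NumPy infers this axis (here 3)
print(m.shape)                        # (2, 3)
print(np.shares_memory(m, data))      # True: reshape returned a view

t = m.T                               # transposed: no longer C-contiguous
flat = t.reshape(-1)                  # not expressible with strides -> copy
print(flat)                           # [0 3 1 4 2 5]
print(np.shares_memory(flat, data))   # False
```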

### ravel(), flatten()
* np.ravel() is a special case of reshape: it collapses all array dimensions and returns a flattened 1D array whose length equals the total number of elements. It returns a view whenever possible.
* flatten() does the same thing, but always returns a copy.

In [79]:
data, data.flatten(), data.flatten().shape

(array([[1, 2],
        [3, 4]]),
 array([1, 2, 3, 4]),
 (4,))

In [80]:
data, data.ravel(), data.ravel().shape

(array([[1, 2],
        [3, 4]]),
 array([1, 2, 3, 4]),
 (4,))
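The outputs above look identical, so here is a sketch (illustrative names) of where the view/copy difference shows up: writing into the result of `ravel` changes the original array, writing into the result of `flatten` does not.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])
r = data.ravel()      # view, because data is contiguous
f = data.flatten()    # always a copy

r[0] = 100            # writes through to data[0, 0]
f[1] = -1             # leaves data untouched
print(data)           # [[100   2]
                      #  [  3   4]]
```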

### newaxis
* np.newaxis (an alias for `None`) adds an axis of length 1 when used inside an index expression.

In [81]:
data = np.arange(0, 5); data

array([0, 1, 2, 3, 4])

In [82]:
col = data[:, np.newaxis]; col

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [83]:
row = data[np.newaxis, :]; row

array([[0, 1, 2, 3, 4]])
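Since `np.newaxis` is just `None`, either spelling works in an index. This sketch also shows two equivalent ways to add the same axis, `np.expand_dims` and `reshape` with `-1`.

```python
import numpy as np

data = np.arange(5)
print(np.newaxis is None)                    # True
col = data[:, np.newaxis]                    # shape (5, 1)
row = data[None, :]                          # shape (1, 5); None works the same way
print(col.shape, row.shape)

# Equivalent spellings of "add an axis"
print(np.array_equal(col, np.expand_dims(data, axis=1)))   # True
print(np.array_equal(col, data.reshape(-1, 1)))            # True
```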

### hstack(), vstack(), concatenate()

* np.hstack(): horizontal stacking
* np.vstack(): vertical stacking (e.g., rows into a matrix)
* np.concatenate(): joins arrays along an existing axis chosen with the _axis_ keyword

In [84]:
data = np.arange(5); data

array([0, 1, 2, 3, 4])

In [85]:
# stack vertically along axis 0
np.vstack((data, data, data)) 

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [86]:
# stack horizontally: 1D inputs are joined end to end along axis 0
np.hstack((data, data, data))

array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

In [87]:
# to make hstack treat input arrays as columns:
data = data[:, np.newaxis]; data

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [88]:
np.hstack((data, data, data))

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

* The number of elements in a NumPy array can't be changed once it is created. __append__, __insert__ and __delete__ all return a new array with the data copied over.
* Resizing this way is __not a best practice__ because of the overhead of creating and copying arrays. Start with preallocated arrays whenever possible.
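A minimal sketch of the advice above (illustrative names): growing an array with `np.append` reallocates on every call, while preallocating with `np.empty` and filling in place produces the same result with a single allocation. `np.insert` and `np.delete` likewise return new arrays.

```python
import numpy as np

n = 5

# Growing with np.append: each call allocates a new array and copies everything
grown = np.array([], dtype=np.int64)
for i in range(n):
    grown = np.append(grown, i * i)

# Preallocating once and filling in place avoids the repeated copies
prealloc = np.empty(n, dtype=np.int64)
for i in range(n):
    prealloc[i] = i * i

print(np.array_equal(grown, prealloc))   # True
print(np.insert(prealloc, 1, -1))        # [ 0 -1  1  4  9 16]
print(np.delete(prealloc, [0, 2]))       # [ 1  9 16]
```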

### Vectorized expressions & Broadcasting

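* Vectorized expressions are designed to avoid explicit Python "*for*" loops. __Broadcasting__ extends them to operands of different shapes: a scalar (or an array with length-1 axes) is virtually stretched to match the other operand, and the operation is then applied element-wise. Shapes are compared from the trailing axis backwards; each pair of axis lengths must be equal, or one of them must be 1.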

![broadcasting](pics/broadcasting.png)
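The compatibility rule can be checked without doing any arithmetic via `np.broadcast_shapes` (available since NumPy 1.20). A sketch with a few shape pairs, including one that fails:

```python
import numpy as np

# Shapes are aligned on the right; each pair of lengths must match or be 1
print(np.broadcast_shapes((2, 2), (1, 2)))     # (2, 2)
print(np.broadcast_shapes((2, 2), (2, 1)))     # (2, 2)
print(np.broadcast_shapes((5, 1, 3), (4, 1)))  # (5, 4, 3)

try:
    np.broadcast_shapes((2, 2), (4,))          # trailing lengths 2 vs 4: incompatible
except ValueError as err:
    print("incompatible:", err)
```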

### Arithmetic operations
![arithmetic-ops](pics/arithmetic-ops.png)

In [89]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

In [90]:
x+y, x-y

(array([[ 6,  8],
        [10, 12]]),
 array([[-4, -4],
        [-4, -4]]))

In [91]:
x*y, y/x

(array([[ 5, 12],
        [21, 32]]),
 array([[5.        , 3.        ],
        [2.33333333, 2.        ]]))

In [92]:
x*2, 2**x

(array([[2, 4],
        [6, 8]]),
 array([[ 2,  4],
        [ 8, 16]]))

In [93]:
y/2, (y/2).dtype

(array([[2.5, 3. ],
        [3.5, 4. ]]),
 dtype('float64'))

* If a math operation is performed on arrays whose shapes are incompatible (they cannot be broadcast together), a __ValueError__ is raised.

In [94]:
x = np.array([1, 2, 3, 4]).reshape(2,2); x

array([[1, 2],
       [3, 4]])

In [95]:
z = np.array([1, 2, 3, 4]); z

array([1, 2, 3, 4])

In [96]:
try:
    x / z # incompatible size/shape
except ValueError:
    print("Nope. Can't do that.")

Nope. Can't do that.


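* Giving the operand a broadcast-compatible shape works. Below, a (1, 2) row and then a (2, 1) column each give the same result as explicitly stacking copies of them with `np.concatenate`: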

In [97]:
z = np.array([[2, 4]]); z.shape

(1, 2)

In [98]:
x/z

array([[0.5, 0.5],
       [1.5, 1. ]])

In [99]:
zz = np.concatenate([z, z], axis=0); zz

array([[2, 4],
       [2, 4]])

In [100]:
x/zz

array([[0.5, 0.5],
       [1.5, 1. ]])

In [101]:
z = np.array([[2], [4]]); z.shape

(2, 1)

In [102]:
x/z

array([[0.5 , 1.  ],
       [0.75, 1.  ]])

In [103]:
zz = np.concatenate([z, z], axis=1); zz

array([[2, 2],
       [4, 4]])

In [104]:
x/zz

array([[0.5 , 1.  ],
       [0.75, 1.  ]])

In [105]:
x = np.array([[1, 3], [2, 4]])
x = x+y; x

array([[ 6,  9],
       [ 9, 12]])

In [106]:
x = np.array([[1, 3], [2, 4]])
x += y; x

array([[ 6,  9],
       [ 9, 12]])
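The two cells above print the same values, but they are not the same operation. This sketch (illustrative names) shows that `x = x + ...` allocates a new array and rebinds the name, whereas `x += ...` updates the existing array in place, which also means it cannot change the dtype.

```python
import numpy as np

x = np.array([[1, 3], [2, 4]])
buf = x                      # second name for the same array object

x = x + 1                    # allocates a new array and rebinds the name x
print(x is buf)              # False

x = buf
x += 1                       # updates the existing array in place
print(x is buf)              # True

# In-place ops cannot change the dtype: an int array += a float is refused
refused = False
try:
    x += 0.5
except TypeError:
    refused = True
print("in-place cast refused:", refused)
```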

### Trigonometry, square root, exponential, logarithmic functions
![element-wise-math](pics/element-wise-math-functs.png)

In [107]:
x = np.linspace(-1, 1, 8); print(x)

[-1.         -0.71428571 -0.42857143 -0.14285714  0.14285714  0.42857143
  0.71428571  1.        ]


In [108]:
y = np.sin(np.pi*x); print(y) # sine function

[-1.22464680e-16 -7.81831482e-01 -9.74927912e-01 -4.33883739e-01
  4.33883739e-01  9.74927912e-01  7.81831482e-01  1.22464680e-16]


In [109]:
print(np.round(y, decimals=4)) # round FP numbers to 4 decimals

[-0.     -0.7818 -0.9749 -0.4339  0.4339  0.9749  0.7818  0.    ]


In [110]:
np.add(np.sin(x)**2, np.cos(x)**2) # sin^2+cos^2

array([1., 1., 1., 1., 1., 1., 1., 1.])

### Element-wise Math Functions

![element-wise-math](pics/element-wise-math.png)

### vectorize()
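* Sometimes we need to define new functions that operate on NumPy arrays element by element. __vectorize()__ may help: it wraps a (usually scalar) function so that it accepts arrays and applies the function to each element. It is a convenience, not a speed-up; internally it is essentially a Python loop.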

In [111]:
def heaviside(x):
    return 1 if x > 0 else 0

heaviside(-1), heaviside(1.5)

(0, 1)

In [112]:
# won't work for Numpy arrays:
try:
    heaviside(np.linspace(-5, 5, 11))
except ValueError:
    print("Nope. Can't do that.")

Nope. Can't do that.


In [113]:
# works, but relatively slow.
# better to use boolean-valued arrays (to be discussed later)
# use as quick-n-dirty check
heaviside = np.vectorize(heaviside) 
heaviside(x)

array([0, 0, 0, 0, 1, 1, 1, 1])
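As the comment above suggests, a boolean array does the same job without `np.vectorize`. A sketch using a hypothetical helper `heaviside_bool`; NumPy also has a built-in `np.heaviside(x, h0)`, where `h0` is the value returned at `x == 0`.

```python
import numpy as np

def heaviside_bool(x):
    """Hypothetical helper: the scalar heaviside above, written with a boolean array."""
    return (np.asarray(x) > 0).astype(int)

x = np.linspace(-5, 5, 11)
print(heaviside_bool(x))      # [0 0 0 0 0 0 1 1 1 1 1]

# Built-in alternative; h0=0 matches the "x > 0" convention used above
print(np.heaviside(x, 0))     # same values, as floats
```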

### Aggregate functions

- Accept array inputs; by default they aggregate over the entire array and return a scalar.
- Use the `axis` keyword to aggregate along specific axes instead; the result is then an array with those axes removed.

![aggregate-funcs](pics/aggregate-funcs.png)

In [114]:
data = np.random.normal(size=(8,8)); data.round(2)

array([[ 0.9 , -0.3 , -0.92, -1.15, -1.13,  0.34,  0.04, -0.32],
       [ 0.87, -0.28,  0.65,  2.58, -0.34, -1.6 , -0.07, -1.61],
       [-1.16, -0.34, -1.44, -1.45,  1.27,  1.76,  0.46,  1.63],
       [-0.81,  0.43, -0.07, -0.38,  0.27,  0.56,  0.4 , -1.82],
       [-0.36,  0.17, -1.61, -0.66, -0.36,  0.53,  0.41,  0.51],
       [-0.54, -0.6 , -0.89,  0.26,  1.6 , -1.4 ,  0.07,  0.88],
       [-0.05,  0.55,  2.07,  0.48, -0.28,  0.88,  1.99,  1.12],
       [-1.35, -0.73,  0.71, -1.08,  0.61, -3.09,  0.6 ,  1.  ]])

In [115]:
np.mean(data), data.mean(), np.std(data), data.std()

(-0.02519989991756791,
 -0.02519989991756791,
 1.0664806025352818,
 1.0664806025352818)

In [116]:
data = np.random.normal(size=(5, 10, 15))

In [117]:
# axis keyword controls which array axis gets aggregated
print(data.sum(axis=0    ).shape)
print(data.sum(axis=(0,2)).shape)
print(data.sum())

(10, 15)
(10,)
54.52989381088969


### Array aggregation, illustrated for a 3x3 array:
1) over all elements
2) over the first axis (axis=0)
3) over the second axis (axis=1)
![aggregate funcs illustrated](pics/aggregate-funcs-illustrated.png)

In [118]:
data = np.arange(1,10).reshape(3,3); data

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [119]:
print(data.sum(),data.sum(axis=0), data.sum(axis=1))

45 [12 15 18] [ 6 15 24]
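A common follow-up is to divide each row by its own sum. This sketch (illustrative names) uses `keepdims=True`, which keeps the aggregated axis with length 1 so the result broadcasts back against the original array; without it, the (3,) result broadcasts as a row and divides the wrong elements.

```python
import numpy as np

data = np.arange(1, 10).reshape(3, 3)

row_sums = data.sum(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
print(row_sums.shape)                        # (3, 1)

fractions = data / row_sums                  # each row now sums to 1
print(np.allclose(fractions.sum(axis=1), 1.0))   # True

wrong = data / data.sum(axis=1)              # (3,) broadcasts across columns instead
print(np.allclose(wrong.sum(axis=1), 1.0))       # False
```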


### Boolean arrays and vectorized conditional expressions
* Enables you to avoid using if statements. Winning!

In [120]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1]); a<b

array([ True,  True, False, False])

### Aggregate booleans

In [121]:
# aggregate booleans
np.all(a<b), np.any(a<b)

(False, True)

In [122]:
if np.all(a < b):
    print("All a's < b's")
elif np.any(a < b):
    print("Some a's < b's")
else:
    print("All b's < a's")

Some a's < b's


### Vectorized booleans

In [123]:
x = np.array([-2, -1, 0, 1, 2]); x>0

array([False, False, False,  True,  True])

In [124]:
1*(x>0)

array([0, 0, 0, 1, 1])

In [125]:
x*(x>0)

array([0, 0, 0, 1, 2])

### Conditional / Logical ops
* Example use case: defining piecewise functions.

![conditionals-logicals](pics/conditional-logical-funcs.png)

In [126]:
x = np.linspace(-5, 5, 11); print(x)

[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.  5.]


In [127]:
# expression is a multiplication of two Boolean arrays,
# so multiplication acts as an elementwise AND operator.
def pulse(x, position, height, width):
    return height * (x >= position) * (x <= (position+width))

In [128]:
pulse(x, position=-2, height=1, width=5)

array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0])

In [129]:
pulse(x, position=1, height=1, width=5)

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

In [130]:
# another implementation using logical_and:
def pulse(x, position, height, width):
    return height * np.logical_and(x >= position, x <= (position + width))
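To confirm that the two implementations agree, here is a sketch that copies them under hypothetical names `pulse_mul` and `pulse_and`. It also shows `np.piecewise`, which takes a list of conditions and one value (or function) per condition.

```python
import numpy as np

def pulse_mul(x, position, height, width):
    # Boolean arrays multiplied together act as an elementwise AND
    return height * (x >= position) * (x <= (position + width))

def pulse_and(x, position, height, width):
    return height * np.logical_and(x >= position, x <= (position + width))

x = np.linspace(-5, 5, 11)
print(np.array_equal(pulse_mul(x, -2, 1, 5), pulse_and(x, -2, 1, 5)))  # True

# np.piecewise: one boolean condition per piece, one value per condition
inside = (x >= -2) & (x <= 3)
pw = np.piecewise(x, [inside, ~inside], [1.0, 0.0])
print(pw)
```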

In [131]:
x = np.linspace(-4, 4, 9); x

array([-4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])

### where(), select(), choose(), nonzero()

In [132]:
# 1st arg = boolean; 2nd,3rd args = true,false results
print(np.where(x<0, x*10, x/10))

[-40.  -30.  -20.  -10.    0.    0.1   0.2   0.3   0.4]


In [133]:
# select: for each element, take the choice paired with the first True condition.
print(np.select(
    [x < -1, x < 2, x >= 2],
    ['bad',  'meh', 'good']))

['bad' 'bad' 'bad' 'meh' 'meh' 'meh' 'good' 'good' 'good']


In [134]:
# choose value from list of arrays.
print(np.choose([0, 0, 0, 1, 1, 1, 2, 2, 2], 
                [x**2, x**3, x**4]))

[ 16.   9.   4.  -1.   0.   1.  16.  81. 256.]


In [135]:
# returns tuple of indices
# same result as boolean indexing with abs(x)>2, but via fancy (integer) indexing
print(  np.nonzero(abs(x)>2))
print(x[np.nonzero(abs(x)>2)])
print(           x[abs(x)>2])

(array([0, 1, 7, 8]),)
[-4. -3.  3.  4.]
[-4. -3.  3.  4.]


### Set operations
* NumPy's set functions treat 1D arrays as __unordered collections__ of unique values; `np.unique` builds such an array (returned sorted).
![set-funcs](pics/set-funcs.png)

In [136]:
a = np.unique([1,2,3,3])
b = np.unique([2,3,4,4,5,6,5])

In [137]:
print(np.isin(a,b)) # is each element of a in b? (np.in1d is deprecated since NumPy 2.0)

[False  True  True]


In [138]:
1 in a, 1 in b # testing for single element presence

(True, False)

In [139]:
print(np.all(np.isin(a,b))) # a = subset of b?

False


In [140]:
print(np.union1d(    a,b)) # presence in either or both arrays
print(np.intersect1d(a,b)) # both arrays

[1 2 3 4 5 6]
[2 3]


In [141]:
print(np.setdiff1d(a, b)) # presence in a, but not in b
print(np.setdiff1d(b, a)) # presence in b, but not in a

[1]
[4 5 6]
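Two related options, sketched with illustrative data: `np.unique` can also report how often each value occurred, and `np.isin` keeps the shape of its first argument, so membership tests work on 2D arrays too.

```python
import numpy as np

values, counts = np.unique([2, 3, 4, 4, 5, 6, 5], return_counts=True)
print(values, counts)     # [2 3 4 5 6] [1 1 2 2 1]

grid = np.array([[1, 4], [6, 7]])
mask = np.isin(grid, values)
print(mask)               # [[False  True]
                          #  [ True False]]
```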


### Array operations

Operations that act upon arrays __as a single entity__ and return transformed arrays of the same size.

![array-funcs](pics/array-funcs.png)

In [142]:
data = np.arange(9).reshape(3, 3); print(data)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [143]:
print(np.transpose(data))
print(data.T) # transpose is also available as the attribute .T

[[0 3 6]
 [1 4 7]
 [2 5 8]]
[[0 3 6]
 [1 4 7]
 [2 5 8]]


In [144]:
print(np.fliplr(data)) # flip left-to-right
print(np.flipud(data)) # flip up-down

[[2 1 0]
 [5 4 3]
 [8 7 6]]
[[6 7 8]
 [3 4 5]
 [0 1 2]]


In [145]:
np.flipud(data) # flip up-to-down

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

### Matrix and vector operations
![matrix-funcs](pics/matrix-funcs.png)

In [146]:
A = np.arange(1,7).reshape(2,3); print(A)

[[1 2 3]
 [4 5 6]]


In [147]:
B = np.arange(1,7).reshape(3,2); print(B)

[[1 2]
 [3 4]
 [5 6]]


In [148]:
print(np.dot(A,B),"\n\n",np.dot(B,A))

[[22 28]
 [49 64]] 

 [[ 9 12 15]
 [19 26 33]
 [29 40 51]]


In [149]:
A = np.arange(9).reshape(3, 3); print(A); print()
x = np.arange(3);               print(x)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

[0 1 2]


In [150]:
# dot also works for matrix-vector multiplication
print(np.dot(A, x)); print()
print(A.dot(x))

[ 5 14 23]

[ 5 14 23]


### Matrix math: alternative data structure
* Matrix multiplication expressions can quickly get VERY cumbersome. Below is an example of a __similarity transform__. 
$A'=BAB^{-1}$:

In [151]:
A = np.random.rand(3,3); print(A.round(3))
B = np.random.rand(3,3); print(B.round(3))

[[0.573 0.894 0.081]
 [0.425 0.664 0.023]
 [0.662 0.513 0.942]]
[[0.234 0.628 0.817]
 [0.84  0.92  0.124]
 [0.528 0.723 0.374]]


In [152]:
Ap = np.dot(B, 
            np.dot(A, 
                   np.linalg.inv(B))); print(Ap.round(3))

[[ 32.041  43.933 -82.345]
 [-35.998 -48.682  95.248]
 [ -6.764  -8.92   18.819]]


In [153]:
Ap = B.dot(A.dot(np.linalg.inv(B))); print(Ap.round(3))

[[ 32.041  43.933 -82.345]
 [-35.998 -48.682  95.248]
 [ -6.764  -8.92   18.819]]


* NumPy __matrix__ data structure = an easier-to-read alternative.

In [154]:
A = np.matrix(A); print(A.round(3))
B = np.matrix(B); print(B.round(3))

[[0.573 0.894 0.081]
 [0.425 0.664 0.023]
 [0.662 0.513 0.942]]
[[0.234 0.628 0.817]
 [0.84  0.92  0.124]
 [0.528 0.723 0.374]]


In [155]:
Ap = B*A*B.I; print(Ap.round(3)) # I = inverse matrix

[[ 32.041  43.933 -82.345]
 [-35.998 -48.682  95.248]
 [ -6.764  -8.92   18.819]]


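* Unfortunately __matrix__ has some disadvantages, and NumPy no longer recommends it (the class may be removed in the future). Expressions like A * B mean different things for arrays and matrices, which hurts readability.
* If you do use it, consider **casting arrays to matrices** (`np.asmatrix`, which does not copy) just for the computation, then casting the result back to ndarray (`np.asarray`).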

In [156]:
A = np.asmatrix(A); print(A.round(3))
B = np.asmatrix(B); print(B.round(3))

[[0.573 0.894 0.081]
 [0.425 0.664 0.023]
 [0.662 0.513 0.942]]
[[0.234 0.628 0.817]
 [0.84  0.92  0.124]
 [0.528 0.723 0.374]]


In [157]:
Ap = B*A*B.I; Ap = np.asarray(Ap); print(Ap.round(3))

[[ 32.041  43.933 -82.345]
 [-35.998 -48.682  95.248]
 [ -6.764  -8.92   18.819]]


### inner(), dot(), outer()
* np.inner expects two inputs whose last axes have the same length.
* np.dot can take input vectors of shape _1xN_ & _Nx1_ respectively.
* np.outer maps two vectors to a matrix.

In [158]:
print(np.inner(x,x)) # inner product between 2 arrays
print(np.dot(  x,x))

5
5


In [159]:
y = x[:, np.newaxis]; print(y)

[[0]
 [1]
 [2]]


In [160]:
print(np.dot(y.T, y))

[[5]]


In [161]:
x = np.array([1,2,3]); print(x)
print(np.outer(x,x))
print(np.kron( x,x))

[1 2 3]
[[1 2 3]
 [2 4 6]
 [3 6 9]]
[1 2 3 2 4 6 3 6 9]


### kron()
* np.kron computes the Kronecker product, often used to build tensor products of arbitrary dimensions. (If the inputs have different numbers of axes, the smaller one is padded with leading length-1 axes.)
* To obtain a result similar to `np.outer(x,x)`, input array x should be extended to shape (N,1) & (1,N) for `kron`'s 1st & 2nd arguments.

In [162]:
print(np.kron(x[:,np.newaxis], x[np.newaxis,:]))

[[1 2 3]
 [2 4 6]
 [3 6 9]]


In [163]:
# computing tensor product of two 2x2 matrices
print(np.kron(np.ones((2,2)), 
              np.identity(2)))

[[1. 0. 1. 0.]
 [0. 1. 0. 1.]
 [1. 0. 1. 0.]
 [0. 1. 0. 1.]]


In [164]:
np.kron(np.identity(2), np.ones((2,2)))

array([[1., 1., 0., 0.],
       [1., 1., 0., 0.],
       [0., 0., 1., 1.],
       [0., 0., 1., 1.]])
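The block structure seen above holds in general: block (i, j) of `np.kron(A, B)` is `A[i, j] * B`. The sketch below (illustrative matrices) checks that, along with the mixed-product property (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
K = np.kron(A, B)                                   # shape (4, 4)

# Block (0, 1) of K (rows 0:2, columns 2:4) equals A[0, 1] * B
print(np.array_equal(K[0:2, 2:4], A[0, 1] * B))     # True

# Mixed-product property
C, D = np.eye(2), np.ones((2, 2))
print(np.allclose(K @ np.kron(C, D), np.kron(A @ C, B @ D)))   # True
```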

### einsum()
* Many common array ops can be expressed using __Einstein's summation convention__ (np.einsum): a summation is assumed over each index that occurs multiple times in an expression.
* The first argument is __an index expression__ (a string of comma-separated index labels, one group per operand), followed by an arbitrary number of arrays.
* For example: $x_n y_n$ (implicitly summed over $n$) can be represented using "n,n".

In [165]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

In [166]:
print(np.einsum("n,n",x,y))
print(np.inner(       x,y))

70
70


* Matrix multiplication $A_{mk} B_{kn}$ using "mk,kn":

In [167]:
A = np.arange(9).reshape(3, 3); print(A)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [168]:
B = A.T; print(B)

[[0 3 6]
 [1 4 7]
 [2 5 8]]


In [169]:
print(np.einsum("mk,kn",A,B))

[[  5  14  23]
 [ 14  50  86]
 [ 23  86 149]]


In [170]:
# verifying...
print(np.all(np.einsum("mk,kn",A,B) == np.dot(A,B))) # np.alltrue was removed in NumPy 2.0

True
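A few more index expressions, as a sketch (illustrative names). A label repeated within one operand is summed (trace). An explicit `->` names the output labels, which can keep an index that would otherwise be summed (diagonal) or simply reorder axes (transpose). In implicit mode, the output labels are those that appear once, in alphabetical order, so "mk,kn" means "mk,kn->mn".

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
x = np.array([1, 2, 3])

print(np.einsum("ii", A))            # trace: 0 + 4 + 8 = 12
print(np.einsum("ii->i", A))         # diagonal: [0 4 8]
print(np.einsum("ij->ji", A))        # no summation, just reordering: transpose
print(np.einsum("ij,j->i", A, x))    # matrix-vector product, same as A @ x

# Implicit and explicit forms of the matrix product agree
print(np.array_equal(np.einsum("mk,kn", A, A.T), np.einsum("mk,kn->mn", A, A.T)))
```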
