# Demo of some `numpy` features

## MCS 275 Spring 2024 - David Dumas

This is a quick tour of some `numpy` features.  For more detail see:
* [Chapter 2 of VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
* [The numpy documentation](https://numpy.org/doc/stable/)

## Importing the module

And checking the version.

In [61]:
import numpy as np
np.__version__

'1.21.5'

## Creating arrays

Arrays are iterable and type-homogeneous (every entry has the same dtype).  You can make one from any suitable iterable.

[List of built-in dtypes](https://numpy.org/doc/stable/reference/arrays.scalars.html#arrays-scalars-built-in).

In [3]:
# `np.array` will convert from an iterable
x = np.array([2,4,8,16,32])

In [8]:
# List of lists -> 4 row, 3 column matrix (2 dimensional array)
A = np.array([[1,2,3],[4,5,6],[7,8,9],[0,2,0]])

In [11]:
# nice display
print(x)
print()
print(A)

[ 2  4  8 16 32]

[[1 2 3]
 [4 5 6]
 [7 8 9]
 [0 2 0]]


In [13]:
# ndarray class
type(x)

numpy.ndarray

Check number of dimensions

In [6]:
x.ndim # how many dimensions?

1

In [14]:
A.ndim

2

Check shape (size in each dimension)

In [7]:
x.shape # size in each dimension, as a tuple

(5,)

In [15]:
A.shape

(4, 3)

Check "length" (first element of shape)

In [16]:
len(x) # number of items in the vector

5

In [17]:
len(A) # number of rows in the matrix = A.shape[0]

4

In general, `len(m)` means `m.shape[0]` if `m` is a numpy array.
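
A small sketch of how `ndim`, `shape`, `len`, and `size` relate, using a made-up 3-dimensional array `T` that is not part of the lecture cells:

```python
import numpy as np

# Hypothetical 3-dimensional array: 2 blocks, each with 3 rows and 4 columns
T = np.zeros((2, 3, 4))

print(T.ndim)    # 3
print(T.shape)   # (2, 3, 4)
print(len(T))    # 2, same as T.shape[0]
print(T.size)    # 24, total number of entries = 2*3*4
```

Note that `len` only sees the first dimension; `size` is the one that counts every entry.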

Check data type

In [18]:
x.dtype # int64 means (signed) integer, 64 bits

dtype('int64')

The data type is typically inferred, but it can be specified (a potentially lossy process).

In [19]:
# Given a mix of integers and floats, numpy
# will choose a floating point dtype
y = np.array([5,6,7,7.289])

In [20]:
y

array([5.   , 6.   , 7.   , 7.289])

In [21]:
y.dtype # float64 means float, 64 bits (double)

dtype('float64')

In [22]:
y_force_int = np.array([5,6,7,7.289], dtype="int")

In [25]:
# Notice we lost precision by specifying dtype int
y_force_int

array([5, 6, 7, 7])

In [24]:
# Notice numpy chose a precise type compatible with
# the request "int"
y_force_int.dtype

dtype('int64')

In [26]:
# uint8 means UNSIGNED integer, 8 bits
# UNSIGNED = only 0 and positive values
# range is 0...255
z = np.array([1,-1,2,100,300,500,800,16384], dtype="uint8")

In [27]:
z

array([  1, 255,   2, 100,  44, 244,  32,   0], dtype=uint8)

In [28]:
# Why did 300 appear as 44 in the array above?
300 % 256

44
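
The wraparound above is what numpy 1.21 does.  To my understanding, newer releases (2.0 and later) refuse out-of-range Python integers in `np.array(..., dtype="uint8")` with an `OverflowError`.  A sketch that should wrap the same way on either version, assuming you first build a 64-bit array (the name `big` is just for this example) and then convert with `astype`:

```python
import numpy as np

# Build with an explicit 64-bit dtype first, so every value fits
big = np.array([1, -1, 2, 100, 300, 500, 800, 16384], dtype="int64")

# astype does a C-style cast: values are reduced modulo 256
z = big.astype("uint8")
print(z)

# The same values computed by hand with the % operator
print(big % 256)
```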

## Filled arrays

Can fill with zeros, ones, or make an array full of a general value.

In [29]:
# Filled with zeros
np.zeros( (3,12), dtype="int64")

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [30]:
# Filled with ones
np.ones( (6,2), dtype="float64")

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [31]:
# Filled with one value
np.full( (7,4), 42, dtype="uint8" )

array([[42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42]], dtype=uint8)

Can also ask for an array filled with random values (floats between 0 and 1, never exactly 1, uniformly distributed).  Note that `np.random` is a submodule, so the function you want is `np.random.random(...)`.

In [32]:
# Filled with random numbers between 0 and 1
np.random.random( (4,5) )  # argument is the shape

array([[0.84985369, 0.52904623, 0.06757972, 0.60749338, 0.44579558],
       [0.94276045, 0.22398809, 0.99290933, 0.9505469 , 0.65531365],
       [0.0944089 , 0.67245052, 0.25851387, 0.81402906, 0.02156303],
       [0.16531779, 0.87858936, 0.60636781, 0.24837322, 0.67221406]])
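
Not covered in the lecture cells, but possibly useful: numpy also has a newer generator interface, `np.random.default_rng`, which accepts a seed so that results can be reproduced.  A minimal sketch, with the seed `275` chosen arbitrarily for this example:

```python
import numpy as np

# Seeded generator: the seed 275 is an arbitrary choice for this example
rng = np.random.default_rng(275)
R = rng.random((4, 5))   # argument is the shape, as with np.random.random

print(R.shape)                         # (4, 5)
print(R.min() >= 0.0, R.max() < 1.0)   # True True

# Re-creating the generator with the same seed reproduces the same array
R2 = np.random.default_rng(275).random((4, 5))
print(np.array_equal(R, R2))           # True
```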

## Special things about 2D arrays

Identity (eye-dentity) matrix

$$
\begin{pmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}
$$

In [62]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [63]:
np.eye(5,dtype="int")

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])

In [64]:
np.eye(3,dtype="bool")

array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

Transpose

In [66]:
A = np.array([[1,3],[5,7],[9,4],[6,6]])
A

array([[1, 3],
       [5, 7],
       [9, 4],
       [6, 6]])

In [67]:
A.T  # The TRANSPOSE of A

array([[1, 5, 9, 6],
       [3, 7, 4, 6]])

In [68]:
A[2,0] == A.T[0,2]

True

In [69]:
A.shape

(4, 2)

In [70]:
A.T.shape

(2, 4)

## Vector algebra

In [33]:
# two 3-dimensional vectors
v = np.array([1,2,5])
w = np.array([4,-8,0])

Dot product (and vector length)

In [34]:
v.dot(w) # dot product
#   1*4 + 2*(-8) + 5*0

-12

In [35]:
v.dot(v)**0.5 # length

5.477225575051661

Scalar multiplication

In [36]:
1.8 * v # scalar multiplication

array([1.8, 3.6, 9. ])

Elementwise sum

In [37]:
v+w # elementwise sum

array([ 5, -6,  5])

Elementwise product (?!)

In [38]:
v*w # elementwise product

array([  4, -16,   0])
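
A short sketch tying these operations together, using the same `v` and `w`.  The `@` operator and `np.linalg.norm` are not used in the cells above; they are shown here as equivalent spellings:

```python
import numpy as np

v = np.array([1, 2, 5])
w = np.array([4, -8, 0])

# The @ operator computes the same dot product as v.dot(w)
print(v @ w)               # -12

# Summing the elementwise product also gives the dot product
print((v * w).sum())       # -12

# np.linalg.norm gives the length, same as v.dot(v)**0.5
print(np.linalg.norm(v))
```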

## Arithmetic progressions

* `np.arange` takes `start`, `stop`, `step`
* `np.linspace` takes `first`, `last`, `number`

In [39]:
# Recall how you get a list of integer values
# in arithmetic progression using built-in stuff
list(range(3,20,2))

[3, 5, 7, 9, 11, 13, 15, 17, 19]

The similarly named `arange` from `numpy` does all this and more.

In [40]:
# From 2 up to but not including 3 in steps of size 0.1
np.arange(2,3,0.1)   # start, stop (not included), step

array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

When you know how many points you want, rather than the spacing, it's better to use `np.linspace`.  It takes the first and last elements, then the total number of evenly spaced points you want (both endpoints are included in the count).

In [41]:
# 6 evenly spaced points from 12 to 14, both endpoints included
np.linspace( 12, 14, 6 )   # first, last, number of elements

array([12. , 12.4, 12.8, 13.2, 13.6, 14. ])
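
A sketch comparing the two.  The `endpoint=False` option of `np.linspace` is not used in the lecture cells; it is shown here because it makes `linspace` leave out the last point the way `arange` leaves out `stop`:

```python
import numpy as np

a = np.arange(2, 3, 0.1)                     # spacing given, stop excluded
b = np.linspace(2, 3, 10, endpoint=False)    # count given, last point excluded

print(len(a), len(b))
print(np.allclose(a, b))    # equal up to floating point rounding

c = np.linspace(2, 3, 11)   # 11 points, both endpoints included
print(c[0], c[-1])          # 2.0 3.0
```

With a non-integer step, the number of elements `arange` returns can be sensitive to rounding, which is one reason to prefer `linspace` when the count matters.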

## Accessing items

Zero-based indexing.  For multi-dimensional arrays, give several integer indices separated by commas.

In [47]:
v = np.arange(8,24,3)
v

array([ 8, 11, 14, 17, 20, 23])

In [44]:
A = np.array([[1,2,3],[4,5,6],[7,8,9],[0,-2,16]])
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [ 0, -2, 16]])

### Vector indexing: Just like lists

In [48]:
v[0]

8

In [49]:
v[4]

20

In [51]:
v[-1]

23

### Multidimensional indexing: use a tuple of indices

For matrices, it's `[row, col]`

In [53]:
print(A)
print()
print(A[0,1])  # row 0 column 1

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [ 0 -2 16]]

2


Omitted indices at the end mean "everything from those dimensions"

In [54]:
A[2] # Row 2

array([7, 8, 9])

Using `:` as an index means "everything from that dimension"

In [56]:
print(A)
print()
# All rows, column 1; that is, get column 1 as a vector
print(A[:,1])  

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [ 0 -2 16]]

[ 2  5  8 -2]


In [75]:
np.arange(0,16).reshape( (4,4) )

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [81]:
np.random.random( (3,3) ).ravel()

array([0.38840672, 0.24351184, 0.2268054 , 0.62668464, 0.1897047 ,
       0.00832081, 0.77081304, 0.19896759, 0.71307259])
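
The last two cells use `reshape` (rearrange the same entries into a new shape) and `ravel` (flatten to one dimension).  A small sketch of the round trip; the names `flat`, `M`, `N` are made up for this example:

```python
import numpy as np

flat = np.arange(12)          # 0, 1, ..., 11

M = flat.reshape((3, 4))      # same 12 entries, arranged as 3 rows of 4
print(M.shape)                # (3, 4)
print(M[1, 2])                # 6, since row 1 starts at entry 4

# -1 asks numpy to work out that dimension from the total size
N = flat.reshape((-1, 6))
print(N.shape)                # (2, 6)

# ravel flattens back to one dimension, row by row
print(np.array_equal(M.ravel(), flat))   # True
```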

## Assigning items

**`numpy` arrays are mutable** 😱  -- more complicated than immutable, but more memory efficient (changes rather than copies)

In [83]:
A = np.zeros( (12,16), dtype="int")
A

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [84]:
A[7,2] = 4

In [85]:
A

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [87]:
A[7]  # row 7, column ANYTHING, i.e. the entire row 7 as a vector

array([0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [88]:
A[:, 2] # all rows, column 2, i.e. the entire column 2 as a vector

array([0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0])

## Slices

Can combine slice notation with multiple indices.

In [91]:
L = ["a","b","c","d","e"]
L[1:3]  # this is a slice

['b', 'c']

In [None]:
L[:] # slice that starts at beginning of the list and ends just past the end of the list

In [105]:
A = np.arange(12*5,dtype="int").reshape( (12,5) ) + 1
A

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45],
       [46, 47, 48, 49, 50],
       [51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60]])

In [104]:
A[3:6]   # 2D array that consists of rows 3, 4, and 5 from A

array([[16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30]])

In [106]:
A[1:-1, 1:-1]   # 2D array with rows 1 up to but not including last
                # and cols 1 up to but not including last

array([[ 7,  8,  9],
       [12, 13, 14],
       [17, 18, 19],
       [22, 23, 24],
       [27, 28, 29],
       [32, 33, 34],
       [37, 38, 39],
       [42, 43, 44],
       [47, 48, 49],
       [52, 53, 54]])

In [108]:
A[9:11,2:4]

array([[48, 49],
       [53, 54]])

In [109]:
A[::2,::2] # all entries whose position has an even-numbered column and row

array([[ 1,  3,  5],
       [11, 13, 15],
       [21, 23, 25],
       [31, 33, 35],
       [41, 43, 45],
       [51, 53, 55]])

In [111]:
# You can use slices in assignment statements
A[::2,::2] = 0  # sets all entries in that slice of the matrix to zero

In [112]:
A

array([[ 0,  2,  0,  4,  0],
       [ 6,  7,  8,  9, 10],
       [ 0, 12,  0, 14,  0],
       [16, 17, 18, 19, 20],
       [ 0, 22,  0, 24,  0],
       [26, 27, 28, 29, 30],
       [ 0, 32,  0, 34,  0],
       [36, 37, 38, 39, 40],
       [ 0, 42,  0, 44,  0],
       [46, 47, 48, 49, 50],
       [ 0, 52,  0, 54,  0],
       [56, 57, 58, 59, 60]])

In [114]:
A[1:4, 1:6] = -1  # rows 1, 2, 3 and cols 1 onward are set to -1 (A has only 5 columns, so the slice 1:6 stops at col 4)

In [115]:
A

array([[ 0,  2,  0,  4,  0],
       [ 6, -1, -1, -1, -1],
       [ 0, -1, -1, -1, -1],
       [16, -1, -1, -1, -1],
       [ 0, 22,  0, 24,  0],
       [26, 27, 28, 29, 30],
       [ 0, 32,  0, 34,  0],
       [36, 37, 38, 39, 40],
       [ 0, 42,  0, 44,  0],
       [46, 47, 48, 49, 50],
       [ 0, 52,  0, 54,  0],
       [56, 57, 58, 59, 60]])

Slices (and single-row indexing like `A[-1]`) return **views**, not copies.

In [116]:
C = A[-1]  # last row of A

In [117]:
C

array([56, 57, 58, 59, 60])

In [118]:
C[1::2] += 10  # each element of C at odd index gets increased by 10

In [119]:
C

array([56, 67, 58, 69, 60])

In [120]:
A

array([[ 0,  2,  0,  4,  0],
       [ 6, -1, -1, -1, -1],
       [ 0, -1, -1, -1, -1],
       [16, -1, -1, -1, -1],
       [ 0, 22,  0, 24,  0],
       [26, 27, 28, 29, 30],
       [ 0, 32,  0, 34,  0],
       [36, 37, 38, 39, 40],
       [ 0, 42,  0, 44,  0],
       [46, 47, 48, 49, 50],
       [ 0, 52,  0, 54,  0],
       [56, 67, 58, 69, 60]])
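
If you want an independent array instead, `.copy()` is one way to get it.  A minimal sketch on a small made-up matrix, using `np.shares_memory` to check which arrays are linked:

```python
import numpy as np

A = np.arange(1, 13).reshape((3, 4))

view = A[0]           # shares memory with A
indep = A[0].copy()   # independent copy of the same row

view[0] = 100         # changes A as well
indep[1] = -5         # does not touch A

print(A[0])                          # [100   2   3   4]
print(np.shares_memory(A, view))     # True
print(np.shares_memory(A, indep))    # False
```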

## Equality and bool

`.all()` checks if an array of booleans is all `True`.

In [121]:
A = np.array([1,2,3])
B = np.array([1,2,4])

In [122]:
A==B

array([ True,  True, False])

In [123]:
# common mistake
if A==B:
    print("They are the same")
else:
    print("There is a difference")

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [124]:
# common fix
if np.all(A==B):
    print("They are the same")
else:
    print("There is a difference")

There is a difference


In [125]:
# common fix
if np.array_equal(A,B):
    print("They are the same")
else:
    print("There is a difference")

There is a difference
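
For arrays of floats, exact equality can fail because of rounding.  A sketch of that situation, assuming `np.allclose` (not used in the cells above) is an acceptable comparison when a small tolerance is fine:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(a == b)                 # [False  True], because of rounding in 0.1 + 0.2
print(np.array_equal(a, b))   # False
print(np.allclose(a, b))      # True, equal within a small tolerance
```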


## Ufuncs

Functions that automatically apply to each entry in an array.

### Some arrays to operate on

In [131]:
v = np.linspace(0,4,17)
v

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 ,
       2.75, 3.  , 3.25, 3.5 , 3.75, 4.  ])

### Examples of numpy ufuncs

In [134]:
# slow
L = []
for x in v:
    L.append(np.cos(x))
cos_x = np.array(L)

In [133]:
cos_x

array([ 1.        ,  0.96891242,  0.87758256,  0.73168887,  0.54030231,
        0.31532236,  0.0707372 , -0.17824606, -0.41614684, -0.62817362,
       -0.80114362, -0.92430238, -0.9899925 , -0.99412968, -0.93645669,
       -0.82055936, -0.65364362])

In [135]:
np.cos(v) # elementwise cosine of the numbers in v

array([ 1.        ,  0.96891242,  0.87758256,  0.73168887,  0.54030231,
        0.31532236,  0.0707372 , -0.17824606, -0.41614684, -0.62817362,
       -0.80114362, -0.92430238, -0.9899925 , -0.99412968, -0.93645669,
       -0.82055936, -0.65364362])

In [137]:
v**2  # exponentiation also automatically operates elementwise

array([ 0.    ,  0.0625,  0.25  ,  0.5625,  1.    ,  1.5625,  2.25  ,
        3.0625,  4.    ,  5.0625,  6.25  ,  7.5625,  9.    , 10.5625,
       12.25  , 14.0625, 16.    ])

Let $f(x) = 3x^2 - 8x + 14$.  Apply $f$ to each element of array `v`.

In [138]:
def f(x):
    return 3*x**2  - 8*x + 14  # square, scalar mult, and addition are all ufuncs

In [139]:
np.array([ f(x) for x in v ])  # slow way

array([14.    , 12.1875, 10.75  ,  9.6875,  9.    ,  8.6875,  8.75  ,
        9.1875, 10.    , 11.1875, 12.75  , 14.6875, 17.    , 19.6875,
       22.75  , 26.1875, 30.    ])

In [141]:
f(v) # automatically acts on every entry of v separately

array([14.    , 12.1875, 10.75  ,  9.6875,  9.    ,  8.6875,  8.75  ,
        9.1875, 10.    , 11.1875, 12.75  , 14.6875, 17.    , 19.6875,
       22.75  , 26.1875, 30.    ])
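
One caveat, offered as a sketch rather than something from the lecture: a function only acts elementwise like this if everything inside it does.  An `if` applied to an array hits the same ambiguous truth value error seen in the equality section.  Assuming a made-up piecewise function `g`, `np.where` is one way to express the branch elementwise:

```python
import numpy as np

v = np.linspace(0, 4, 17)

def g_scalar(x):
    # Hypothetical piecewise function; works for a single number only
    if x < 2:
        return x**2
    return 4.0

def g(x):
    # Elementwise version: np.where(condition, value_if_true, value_if_false)
    return np.where(x < 2, x**2, 4.0)

slow = np.array([g_scalar(x) for x in v])   # one call per entry
fast = g(v)                                 # whole array at once
print(np.array_equal(slow, fast))           # True
```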

## Broadcasting

In [144]:
np.array([[1,2],[3,4]]) + 512  # elementwise

array([[513, 514],
       [515, 516]])

In [146]:
np.array([[1,2],[3,4]]) + np.array([[512,512],[512,512]])  # what actually happens

array([[513, 514],
       [515, 516]])

In [147]:
np.array([[1,2,3],[4,5,6]]) + np.array([10,100,1000]) # matrix + row
# 10,100,1000 gets added to each row of the matrix

array([[  11,  102, 1003],
       [  14,  105, 1006]])
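
Matrix + column works too, but the column needs shape `(rows, 1)`.  A sketch with a made-up vector `col`, assuming `reshape` is used to turn it into a column:

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([10, 100])              # shape (2,)

# Reshape to (2, 1) so that it broadcasts across the columns
print(M + col.reshape((2, 1)))
# [[ 11  12  13]
#  [104 105 106]]

# Without the reshape, the shapes (2, 3) and (2,) are not compatible
try:
    M + col
except ValueError as err:
    print("ValueError:", err)
```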

## Aggregations

`sum`, `max`, `min`, `argmax`, `argmin`, `mean`, `all`, `any`, `array_equal`
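
A quick sketch of a few of these on a small matrix.  The `axis` argument is not mentioned above; as I understand it, `axis=0` aggregates down each column and `axis=1` across each row:

```python
import numpy as np

B = np.array([[11, 102, 1003],
              [14, 105, 1006]])

print(B.sum())               # 2241, sum of all entries
print(B.sum(axis=0))         # column sums: 25, 207, 2009
print(B.sum(axis=1))         # row sums: 1116, 1125
print(B.max(), B.argmax())   # 1006 5, argmax is a position in the flattened array
print(B.mean(axis=0))        # column means: 12.5, 103.5, 1004.5
print((B > 100).any(), (B > 100).all())   # True False
```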

## Masks

In [149]:
B = np.array([[1,2,3],[4,5,6]]) + np.array([10,100,1000])

In [150]:
B

array([[  11,  102, 1003],
       [  14,  105, 1006]])

In [151]:
B > 100

array([[False,  True,  True],
       [False,  True,  True]])

In [152]:
B[ B > 100 ] = 0   # Every element of B that is greater than 100 should be set to zero

In [155]:
A[ A < 0 ] = 0  #
A[ A > 1 ] = 1  #  clamp A between 0 and 1
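
A sketch of two other things a mask can do (count and extract), plus `np.clip` as a possible alternative to the two-step clamp.  The arrays here are made up for the example, and `np.clip` is not used in the lecture cells:

```python
import numpy as np

B = np.array([[11, 102, 1003],
              [14, 105, 1006]])

mask = B > 100
print(mask.sum())     # 4, since True counts as 1
print(B[mask])        # 1D array of the selected entries: 102, 1003, 105, 1006

# np.clip clamps without modifying the original array
A = np.array([-3.0, 0.25, 0.5, 7.0])
clamped = np.clip(A, 0, 1)
print(clamped)        # entries: 0, 0.25, 0.5, 1

# Same result as the two mask assignments, done here on a copy
A2 = A.copy()
A2[A2 < 0] = 0
A2[A2 > 1] = 1
print(np.array_equal(clamped, A2))   # True
```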

## Pillow integration

* `np.array(img)` just works, if `img` is a `PIL.Image` object
* Use `PIL.Image.fromarray(A)` to make an image from an array
    * Shape `(height,width)` and dtype `uint8` for grayscale
    * Shape `(height,width,3)` and dtype `uint8` for color (last axis is red, green, blue)
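
A minimal round-trip sketch, assuming the Pillow package is installed.  The image contents (a horizontal gradient and a solid red rectangle) are made up for the example:

```python
import numpy as np
from PIL import Image   # requires the Pillow package

# Hypothetical grayscale gradient: 64 rows, 256 columns, brightness = column index
gray = np.zeros((64, 256), dtype="uint8")
gray[:, :] = np.arange(256, dtype="uint8")   # broadcast one row to all rows

img = Image.fromarray(gray)
print(img.size)    # (256, 64): Pillow reports (width, height)
print(img.mode)    # L, Pillow's name for 8-bit grayscale

# Hypothetical color image: solid red, shape (height, width, 3)
color = np.zeros((64, 256, 3), dtype="uint8")
color[:, :, 0] = 255
img2 = Image.fromarray(color)
print(img2.mode)   # RGB

# Converting back gives an array with the original shape and dtype
back = np.array(img)
print(back.shape, back.dtype)   # (64, 256) uint8
```

Watch the order: numpy shapes are `(height, width)`, while Pillow's `size` is `(width, height)`.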