# Lecture 16 2018-10-11: Numpy

Numpy arrays (4.1); functions on ndarrays (4.2); pseudorandom number generation (4.6)

This worksheet accompanies the lecture notes.

## Numpy

*numpy* is a package in the *scipy* suite of packages that contains specialized modules for numerical computing, that is, for writing programs that use or produce large, complex collections of numeric data. (Scipy is for scientific computing in general.)

You can do anything in basic python that you can do with numpy, but it is usually much, much harder.
In particular, 

* Numpy provides data objects for multidimensional arrays, whereas base python only offers one-dimensional sequences (lists), which must be nested to imitate more dimensions.
* Numpy makes it much easier to index into arrays, and supports "slicing" along every dimension, which makes it possible to work with a part of an array independently.
* Numpy makes it easy to perform operations, such as arithmetic, on entire arrays.
* Numpy provides a rich library for working with pseudorandom numbers.
* Numpy is *highly* optimized, whereas the equivalent loops in basic python can be very slow.
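
As a rough illustration of the points above, the sketch below doubles a small grid of numbers twice: once with nested lists and once with an ndarray. The variable names are made up for this example.

```python
import numpy as np

# A 2x3 grid as nested python lists: arithmetic needs explicit loops.
grid_list = [[1, 2, 3], [4, 5, 6]]
doubled_list = [[2 * value for value in row] for row in grid_list]

# The same grid as an ndarray: arithmetic applies to every element at once.
grid_array = np.array(grid_list)
doubled_array = grid_array * 2

print(doubled_list)    # [[2, 4, 6], [8, 10, 12]]
print(doubled_array)
```

Both versions hold the same numbers; the ndarray version simply needs no loop.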

As with all packages, to access numpy you need to *import* it. The convention is to
>import numpy as np

In [491]:
import numpy as np

### n-dimensional arrays (np.ndarray)

The basic data structure for numpy is the n-dimensional array, *np.ndarray*. 


#### Numpy provides several ways to create ndarrays

One can cast from other data types as usual, with *np.array()* (note: *array()*, not *nparray*). 
Numpy attempts to do the "right thing" with casting, as is usual in Python.

In [492]:
# create a 2-dimensional and a 3-dimensional array 
#(note that spacing doesn't matter so make it readable!)
array_2d = np.array([ [1,2,3], [4,5,6] ])
array_3d = np.array(
    [
        [ [1,2,3], [4,5,6] ],
        [ [7,8,9], [10,11,12] ]
    ]
)
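
To see what doing the "right thing" with casting can look like, here is a small sketch (the names are illustrative only). When the input mixes integers and floats, numpy picks one data type that can hold every element.

```python
import numpy as np

ints = np.array([1, 2, 3])          # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])       # one float -> every element becomes a float
from_tuple = np.array((1, 2, 3))    # tuples work as well as lists

print(ints.dtype, mixed.dtype)
print(mixed)
```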

In [493]:
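# interesting "error": ragged input gives a 1-d array of objects (note what's inside)
# (recent numpy versions raise a ValueError here unless dtype=object is given)
error_array = np.array([ [1,2,3], [4,5,6], 7 ], dtype=object)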
error_array

array([list([1, 2, 3]), list([4, 5, 6]), 7], dtype=object)

Create special arrays directly. 
Note that some take a *shape* as a parameter; shapes are tuples.

In [494]:
#np.zeros(6)
#np.zeros((2,3))
#np.ones((1,2,3))
#np.zeros_like(array_3d)
#np.ones_like(array_3d)
#np.identity(10)
#np.arange(36)
#np.arange(0,1,0.01)
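
A short sketch of how the shape parameter behaves for a few of the constructors listed above (the variable names are assumptions for this example):

```python
import numpy as np

flat = np.zeros(6)               # a single integer gives a 1-d array
grid = np.zeros((2, 3))          # a tuple gives one length per dimension
like = np.ones_like(grid)        # same shape and dtype as an existing array
steps = np.arange(0, 1, 0.25)    # start, stop (excluded), step

print(flat.shape, grid.shape, like.shape)   # (6,) (2, 3) (2, 3)
print(steps)
```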

In [495]:
rand_array = np.random.randn(4,4)
rand_array

array([[-1.56024652, -0.59772864, -0.02202465, -0.56502116],
       [-0.97454806, -1.7425243 , -1.17150462, -1.23013988],
       [ 0.29223654,  1.12072402,  0.61298384,  0.10027397],
       [-1.11439531, -0.17389808,  0.36820554,  0.76123527]])

In [496]:
one_to_ten = np.arange(10)   # despite the name, this holds 0 through 9
one_to_ten

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### Useful metadata in an ndarray

An ndarray is an object of type *ndarray* that holds an array (usually of numbers) together with information about that array, such as its underlying data type, number of dimensions, and shape.

attribute | meaning
:----- | :--------
ndim   | number of dimensions
shape  | shape of ndarray (number of items in each dimension)
dtype  | underlying data type
size   | number of data items
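
The attributes in the table can be read directly from any ndarray. A minimal sketch, using an array shaped like the 3-dimensional one created earlier:

```python
import numpy as np

cube = np.array([[[1, 2, 3], [4, 5, 6]],
                 [[7, 8, 9], [10, 11, 12]]])

print(cube.ndim)    # 3
print(cube.shape)   # (2, 2, 3)
print(cube.size)    # 12
print(cube.dtype)   # an integer type; the exact name depends on the platform
```

Note that these are attributes, not methods, so they are written without parentheses.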


In [497]:
# print, and use introspection to look inside these ndarrays (shape, ndim, dtype, ...)


In [498]:
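# some deliberately bad attempts at creating ndarrays
# (recent numpy versions raise a ValueError for ragged input unless dtype=object is given)
x = np.array([[1,2,3], [4,5,6], 7], dtype=object)   # 1 dimensional array!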

#x = np.array([[1,2,3], [4,5,6], [7, 8, 'foo']])   #array of 'U21'!

x   # try some introspection, too

array([list([1, 2, 3]), list([4, 5, 6]), 7], dtype=object)

#### operations on ndarrays
In general, treat the array as if it were a single mathematical object. Arithmetic is applied to each element in turn.

In [499]:
a, b, c = 1, 2, 3
x = array_2d
y = array_2d - 10
z = array_2d * 100

print(x)
print(y)
print(z)

print(x+z)
print(x*x*x)

print(x**0.5)

[[1 2 3]
 [4 5 6]]
[[-9 -8 -7]
 [-6 -5 -4]]
[[100 200 300]
 [400 500 600]]
[[101 202 303]
 [404 505 606]]
[[  1   8  27]
 [ 64 125 216]]
[[ 1.          1.41421356  1.73205081]
 [ 2.          2.23606798  2.44948974]]


In [500]:
inv = 1/x
print(inv)
print([inv**i for i in range(5)])

[[ 1.          0.5         0.33333333]
 [ 0.25        0.2         0.16666667]]
[array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]]), array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]]), array([[ 1.        ,  0.25      ,  0.11111111],
       [ 0.0625    ,  0.04      ,  0.02777778]]), array([[ 1.        ,  0.125     ,  0.03703704],
       [ 0.015625  ,  0.008     ,  0.00462963]]), array([[  1.00000000e+00,   6.25000000e-02,   1.23456790e-02],
       [  3.90625000e-03,   1.60000000e-03,   7.71604938e-04]])]


In [501]:
print(np.sin(rand_array))

[[-0.99994435 -0.56276638 -0.02202287 -0.53543369]
 [-0.82744819 -0.98529095 -0.92133658 -0.94253554]
 [ 0.28809466  0.90041565  0.5753106   0.10010601]
 [-0.89764445 -0.17302294  0.35994183  0.68981628]]


In [502]:
np.sin(array_2d)
np.log2(array_2d)
np.exp(array_2d)
np.sqrt(array_2d)

array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])

Comparisons are element-wise.

In [503]:
x*x == x**2
rand_array < 0
rand_array > 0.5

array([[False, False, False, False],
       [False, False, False, False],
       [False,  True,  True, False],
       [False, False, False,  True]], dtype=bool)

In [504]:
rand_array[rand_array > 0.5]
rand_array[rand_array < 0.8]

array([-1.56024652, -0.59772864, -0.02202465, -0.56502116, -0.97454806,
       -1.7425243 , -1.17150462, -1.23013988,  0.29223654,  0.61298384,
        0.10027397, -1.11439531, -0.17389808,  0.36820554,  0.76123527])
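
The result of a comparison can be used to pick out or count the matching entries. A small sketch with made-up values:

```python
import numpy as np

values = np.array([[-1.5, 0.2, 0.9],
                   [0.6, -0.3, 1.2]])

mask = values > 0.5        # boolean array with the same shape as values
selected = values[mask]    # 1-d array of the entries where mask is True
count = mask.sum()         # True counts as 1, so this counts the matches

# Combine conditions with & (and) or | (or); the parentheses are required.
in_range = values[(values > 0) & (values < 1)]

print(selected)   # [0.9 0.6 1.2]
print(count)      # 3
print(in_range)   # [0.2 0.9 0.6]
```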

#### Useful ndarray methods

name | meaning
:----- | :------
reshape(dim) | the same data arranged in shape *dim* (a tuple)
T      | the transpose of the array (an attribute, so no parentheses)

In [505]:
x = np.arange(12)
print(x)

x = x.reshape((4,3))
print(x)

y = x.T
print(y)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  3  6  9]
 [ 1  4  7 10]
 [ 2  5  8 11]]
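
A few more details on reshaping, as a sketch. The new shape must account for exactly the same number of items, and one dimension may be given as -1 to let numpy work it out.

```python
import numpy as np

x = np.arange(12)

grid = x.reshape((4, 3))
print(grid.shape)      # (4, 3)
print(grid.T.shape)    # (3, 4)

# -1 asks numpy to work out that dimension from the total size.
auto = x.reshape((2, -1))
print(auto.shape)      # (2, 6)

# A shape whose product is not 12 is rejected.
try:
    x.reshape((5, 3))
except ValueError as err:
    print('cannot reshape:', err)
```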


#### Indexing

Indexing and slicing work as in base python, but they are enhanced for ndarrays.

Each dimension can be indexed separately, with the indexes separated by commas. (See page 99, Fig. 4.2.)

In [506]:
print(array_3d)
array_3d[1]
array_3d[1][0]
array_3d[1,0]
array_3d[0:1]
array_3d[0:1,0]

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


array([[1, 2, 3]])
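
Since a notebook cell only displays its last expression, here is the same kind of indexing with every result printed. This is a sketch that rebuilds the 3-dimensional array so it runs on its own.

```python
import numpy as np

array_3d = np.array([[[1, 2, 3], [4, 5, 6]],
                     [[7, 8, 9], [10, 11, 12]]])

# Chained and comma-separated indexes pick out the same row.
print(array_3d[1][0])          # [7 8 9]
print(array_3d[1, 0])          # [7 8 9]

# An integer index removes a dimension; a slice keeps it.
print(array_3d[0].shape)       # (2, 3)
print(array_3d[0:1].shape)     # (1, 2, 3)
print(array_3d[0:1, 0].shape)  # (1, 3)

# A colon on its own means "everything along this dimension".
print(array_3d[:, :, 0])       # first item of every row
```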

Boolean indexes can be very useful.

In [507]:
rand_array = np.random.randn(4,4)

print(rand_array)
print(rand_array<0)

ra_nz = rand_array[rand_array<0]
print(ra_nz)

rand_array[rand_array<0] = 0
rand_array[rand_array>1] = np.nan   # mark large values as "not a number"

print(rand_array)

[[-0.68107012  0.00376108  0.23117585  0.34313879]
 [ 0.01884368 -0.60506344 -1.04707178 -2.32657899]
 [ 1.70460783 -0.23060312 -0.07261914 -1.1944307 ]
 [ 1.08620275  0.61128764  1.59172342  0.23872217]]
[[ True False False False]
 [False  True  True  True]
 [False  True  True  True]
 [False False False False]]
[-0.68107012 -0.60506344 -1.04707178 -2.32657899 -0.23060312 -0.07261914
 -1.1944307 ]
[[ 0.          0.00376108  0.23117585  0.34313879]
 [ 0.01884368  0.          0.          0.        ]
 [        nan  0.          0.          0.        ]
 [        nan  0.61128764         nan  0.23872217]]


#### Slices
A slice is a window onto part of the original array. That is, it is a *reference* to the same data, not a copy. Changing it changes the original object.

In [508]:
middle_slice = one_to_ten[2:6]
middle_slice

array([2, 3, 4, 5])

In [509]:
middle_slice[0] = 999
one_to_ten

array([  0,   1, 999,   3,   4,   5,   6,   7,   8,   9])

In [510]:
middle_slice[:] = 666
one_to_ten

array([  0,   1, 666, 666, 666, 666,   6,   7,   8,   9])
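
If an independent array is needed instead, an explicit copy can be made with the *copy()* method. A minimal sketch contrasting the two (variable names are illustrative):

```python
import numpy as np

numbers = np.arange(10)

view = numbers[2:6]                  # a slice shares memory with the original
independent = numbers[2:6].copy()    # an explicit copy does not

view[0] = 999                        # also changes numbers[2]
independent[1] = -1                  # leaves numbers untouched

print(numbers)
print(np.shares_memory(numbers, view))          # True
print(np.shares_memory(numbers, independent))   # False
```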

#### Fancy indexing

If you use a list as an index, the elements of the list act like individual indexes on that dimension. This is relatively esoteric. Don't worry if you don't get it.

In [511]:
print(array_2d)
array_2d[0,[2,0]]

[[1 2 3]
 [4 5 6]]


array([3, 1])
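
A few more cases may make the pattern clearer. This sketch also shows one difference from slices: a fancy index produces a copy, so changing the result leaves the original alone.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(array_2d[0, [2, 0]])    # row 0, columns 2 then 0 -> [3 1]
print(array_2d[:, [2, 0]])    # every row, columns 2 then 0
print(array_2d[[1, 0]])       # the rows in reverse order

# Unlike a slice, a fancy index produces a copy.
picked = array_2d[0, [2, 0]]
picked[0] = 999
print(array_2d[0])            # still [1 2 3]
```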

## Math and stat methods and objects in numpy

### Objects

name | value
:--- | :-----
pi   | pi
e    | e

In [512]:
rand_array = np.random.randn(4,4)

In [513]:
np.pi
np.e

2.718281828459045

### Methods

#### (pseudo) randomly generated numbers

name | value
:--- | :----
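randn(d0, d1, ...) | standard normal random values; the dimensions are separate arguments, not a tuple
randint(n) | a random integer from 0 up to, but not including, n
random(size) | uniform random values on [0, 1)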

In [514]:
rand_array = np.random.randn(4,4) # normally distributed
print(rand_array)

[[ 0.18604793  0.61453522 -0.19618699  0.80899124]
 [-1.02775987 -1.17390862  1.12394355 -0.61862023]
 [ 2.00295278  0.03343106  0.02193409  0.76530131]
 [ 0.50132854  1.8274901   0.50860833 -0.90940125]]


In [515]:
print('insight check. you win? {}'.format(
    np.random.randint(20) < np.random.randint(20)) 
    )

insight check. you win? True


In [516]:
np.random.random(10) # uniform distribution on [0, 1): 1 itself is never returned

array([ 0.66310851,  0.29871465,  0.50822108,  0.71837023,  0.83809826,
        0.61054612,  0.64366922,  0.05628116,  0.53648535,  0.55325656])
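
Because the numbers are only *pseudo* random, the sequence can be repeated by fixing the seed first. A small sketch; the seed value 16 is arbitrary and chosen just for this example.

```python
import numpy as np

np.random.seed(16)                 # 16 is an arbitrary seed
first = np.random.randn(3)

np.random.seed(16)                 # same seed, so the same sequence restarts
second = np.random.randn(3)

print(np.array_equal(first, second))   # True

# randint excludes its upper bound.
rolls = np.random.randint(1, 7, size=5)    # five die rolls, each from 1 to 6
print(rolls)
```

Seeding is handy when results need to be reproduced, for example when debugging.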

#### Other methods

name | value
:--- | :----
sum  | total of entries in the ndarray
mean | mean of entries in ndarray
std  | standard deviation of entries in the ndarray
cumsum | cumulative sum of entries in the ndarray
sort | sort the ndarray
argmax | the index of the maximum item in the ndarray
argmin | the index of the minimum item in the ndarray

Many of these can work along a chosen axis. Think of *axis=0* as moving down the rows (so you get one result per column), and *axis=1* as moving across the columns (one result per row). (I find that thinking of "x axis" and "y axis" is misleading.)

In [517]:
seq_array = np.arange(6).reshape((3,2))
print(seq_array)

[[0 1]
 [2 3]
 [4 5]]


In [518]:
print(seq_array.sum())
print(seq_array.sum(axis=0))
print(seq_array.sum(axis=1))

15
[6 9]
[1 5 9]
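
One way to keep the axes straight is to look at the shape of the result: the axis you name is the one that disappears. A sketch using the same 3 by 2 array, with the totals also written as explicit loops for comparison:

```python
import numpy as np

seq_array = np.arange(6).reshape((3, 2))   # 3 rows, 2 columns

col_totals = seq_array.sum(axis=0)   # collapses the 3 rows -> shape (2,)
row_totals = seq_array.sum(axis=1)   # collapses the 2 columns -> shape (3,)

print(col_totals, col_totals.shape)  # [6 9] (2,)
print(row_totals, row_totals.shape)  # [1 5 9] (3,)

# The same totals written with explicit loops.
print([int(seq_array[:, j].sum()) for j in range(2)])   # [6, 9]
print([int(seq_array[i, :].sum()) for i in range(3)])   # [1, 5, 9]
```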


In [519]:
seq_array

array([[0, 1],
       [2, 3],
       [4, 5]])

In [520]:
print('Random array: {}\n'.format(rand_array))
print('means in total: {}'.format(rand_array.mean()))
print('std in total: {}\n'.format(rand_array.std()))
print('means moving across rows: {}'.format(rand_array.mean(axis=0)))
print('std moving across rows: {}\n'.format(rand_array.std(axis=0)))
print('means moving across columns: {}'.format(rand_array.mean(axis=1)))
print('std moving across columns: {}'.format(rand_array.std(axis=1)))

Random array: [[ 0.18604793  0.61453522 -0.19618699  0.80899124]
 [-1.02775987 -1.17390862  1.12394355 -0.61862023]
 [ 2.00295278  0.03343106  0.02193409  0.76530131]
 [ 0.50132854  1.8274901   0.50860833 -0.90940125]]

means in total: 0.27929294932531423
std in total: 0.9087522587954405

means moving across rows: [ 0.41564235  0.32538694  0.36457475  0.01156777]
std moving across rows: [ 1.07970136  1.08086146  0.50725807  0.78251506]

means moving across columns: [ 0.35334685 -0.42408629  0.70590481  0.48200643]
std moving across columns: [ 0.3891682   0.91663722  0.80713974  0.96791304]


In [521]:
rand_array = np.random.rand(2,3)
print(rand_array); print()
print('cumulative\n{}\n'.format(rand_array.cumsum()))
print('cumulative over rows\n{}\n'.format(rand_array.cumsum(axis=0)))
print('cumulative over columns\n{}'.format(rand_array.cumsum(axis=1)))

[[ 0.06627136  0.34160235  0.09035751]
 [ 0.03326319  0.88414258  0.06780308]]

cumulative
[ 0.06627136  0.4078737   0.49823121  0.53149441  1.41563699  1.48344007]

cumulative over rows
[[ 0.06627136  0.34160235  0.09035751]
 [ 0.09953455  1.22574493  0.15816059]]

cumulative over columns
[[ 0.06627136  0.4078737   0.49823121]
 [ 0.03326319  0.91740577  0.98520886]]


In [522]:
rand_list = np.random.randn(10)
print('Random list:\n{}\n'.format(rand_list))
#print(rand_list.argmax())
#print(rand_list.argmin())

print('max in {}\n\tis at index {}\n\tvalue {}'.format(
    rand_list,
    rand_list.argmax(),
    rand_list[rand_list.argmax()]))

Random list:
[-0.45061345 -1.63416836  0.46480536 -1.05501004 -0.3304233   0.39699815
 -0.28943476  0.46709551  0.26860396  0.77370137]

max in [-0.45061345 -1.63416836  0.46480536 -1.05501004 -0.3304233   0.39699815
 -0.28943476  0.46709551  0.26860396  0.77370137]
	is at index 9
	value 0.7737013701897275
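
On an array with more than one dimension, *argmax* without an axis gives a position in the flattened array. The sketch below, with made-up values, turns that position back into a row and column using *np.unravel_index*, and also shows the two ways of sorting.

```python
import numpy as np

scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

flat_index = scores.argmax()    # position in the flattened array
row, col = np.unravel_index(flat_index, scores.shape)
print(flat_index, row, col)     # 1 0 1

print(scores.argmax(axis=0))    # row of the largest entry in each column -> [1 0 1]
print(scores.argmax(axis=1))    # column of the largest entry in each row -> [1 2]

ordered = np.sort(scores, axis=1)   # np.sort returns a sorted copy
scores.sort(axis=1)                 # the sort method works in place and returns None
print(ordered)
```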
