# NumPy Function Summary

This is a short set of notes covering some concepts and snippets in NumPy, made as I follow along with the Python Data Science Handbook by Jake VanderPlas.

In [154]:
# imports 
import numpy as np
%xmode minimal

Exception reporting mode: Minimal


## Making Arrays

In [155]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [156]:
np.ones((3,5),dtype=float) 

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [157]:
# float defaults to float64; int defaults to the platform's default
# integer type (int64 on most 64-bit systems)
print(type(_156[0,0]), type(_155[0]))

<class 'numpy.float64'> <class 'numpy.int64'>


In [158]:
np.full((3,5),3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [159]:
np.arange(0,20,2) # similar to standard python range

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [160]:
np.linspace(0,np.pi,7) 
# 7 evenly spaced points from 0 to pi, both endpoints included;
# handy for sampling a function before plotting it

array([0.        , 0.52359878, 1.04719755, 1.57079633, 2.0943951 ,
       2.61799388, 3.14159265])

In [161]:
np.random.random((3,3)) # uniform random

array([[0.81296924, 0.56964974, 0.22138374],
       [0.83403211, 0.49137092, 0.34444953],
       [0.05489069, 0.55865437, 0.28875437]])

In [162]:
np.random.normal(0,1,(3,3)) # standard normal

array([[-0.13556148,  0.24552433,  0.93751342],
       [ 0.88138712, -0.09459561,  0.29901006],
       [ 1.90159968, -0.08169106,  0.9730997 ]])

In [163]:
np.random.randint(0,10,(3,3))

array([[8, 0, 6],
       [8, 2, 3],
       [5, 5, 8]])
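
The three calls above use NumPy's legacy global random state. As a minimal sketch, the same kinds of draws can be made with a `np.random.default_rng` generator, which is the interface suggested for new code. The seed value `0` and the variable names are arbitrary choices for this example, not something from the notes above.

```python
import numpy as np

# Seeded generator so the draws are reproducible between runs.
# The seed 0 is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)

u = rng.random((3, 3))            # uniform on [0, 1)
n = rng.normal(0, 1, (3, 3))      # mean 0, standard deviation 1
k = rng.integers(0, 10, (3, 3))   # integers in [0, 10), upper bound excluded

print(u.shape, n.shape, k.shape)
print(k.min() >= 0, k.max() < 10)
```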

In [164]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [165]:
np.empty((2,4)) 
# uninitialized array: it holds whatever values were already in that memory

array([[-2.68156159e+154,  1.73060623e-077,  1.97626258e-323,
         0.00000000e+000],
       [ 0.00000000e+000,  0.00000000e+000,  4.34457681e-311,
         5.60613042e-309]])

<img style="width: 60%;" src="img/numpy_types.png"></img>
<img style="width: 60%;" src="img/np_datatypes.png"></img>

## More on Arrays

- Arrays are mutable, and slices are views rather than copies: if you take a subarray by slicing and modify it, the original array changes too. Use `.copy()` to get an independent subarray
- Operations on arrays:
    - Attributes
    - Indexing
    - Slicing and Reshaping
    - Joining and Splitting
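
A small sketch of the view-versus-copy behaviour from the first bullet. The array and the names `view` and `safe` are made up for illustration.

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

view = A[:2, :2]          # a slice is a view onto A's data
view[0, 0] = 99           # so this write shows up in A
print(A[0, 0])            # 99

safe = A[:2, :2].copy()   # an independent copy
safe[0, 0] = -1           # A is left untouched
print(A[0, 0])            # still 99

print(np.shares_memory(A, view), np.shares_memory(A, safe))  # True False
```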

### Array Attributes

Arrays have two major attributes: the number of dimensions (`ndim`) and the shape (`shape`). A third, `size`, is the total number of elements. Consider the following array:

In [166]:
np.random.randint(0,50,(2,3,5))

array([[[ 0,  5,  5, 27, 46],
        [39, 33, 37, 40, 16],
        [ 7, 47, 30, 24,  5]],

       [[33, 18, 20, 10,  4],
        [ 9, 39, 12, 39, 26],
        [37, 46, 45, 32,  1]]])

The position of an element in a one-dimensional array is its index.

The dimension of the array is the number of indexes required to access a particular element. To access the first element here we need 3 indexes, i.e. `_166[0,0,0]`, hence the dimension is 3.

The shape of the array gives the number of valid values for each index. Here the first index has 2 possible values, the second 3 and the last 5, so the shape is `(2, 3, 5)`. Note that indexes are ordered from outermost to innermost: 
```text
[ [ [1], [2] ], [ [3], [4] ] ]
 ↑ ↑ ↑
 0 1 2     if your position is (<0>,<1>,<2>) (<> = value)
```

This is equivalent to doing `A[<0>][<1>][<2>]`. For brevity, NumPy also accepts `A[<0>, <1>, <2>]`.

In [167]:
print(_166.ndim, _166.shape, _166.size, sep="\n")

3
(2, 3, 5)
30


In [168]:
print(_166[0][2][4], _166[0,2,4])

5 5


### Slicing, Reshaping, Joining and Splitting

Start by creating a small example array.

In [169]:
A = np.random.randint(0,50,(5,8))
A

array([[ 0, 39,  1, 43, 12,  2, 33, 33],
       [34, 49, 30, 47,  1, 26,  4, 28],
       [44, 35, 16, 39, 33,  7, 47, 30],
       [12,  1, 46, 28,  3,  8, 35, 25],
       [47, 35,  6, 26, 44, 34, 20, 35]])

In [170]:
# Slicing: same start:stop:step syntax as Python lists, with one slice
# per axis; unlike list slices, the result is a view, not a copy
A[1:-1,2:-2]

array([[30, 47,  1, 26],
       [16, 39, 33,  7],
       [46, 28,  3,  8]])

In [171]:
# Reshaping
A.reshape((2,4,5)) 
# conceptually flattens the array in row-major order into 1-D,
# then refills it in the given shape (the total size must match)

array([[[ 0, 39,  1, 43, 12],
        [ 2, 33, 33, 34, 49],
        [30, 47,  1, 26,  4],
        [28, 44, 35, 16, 39]],

       [[33,  7, 47, 30, 12],
        [ 1, 46, 28,  3,  8],
        [35, 25, 47, 35,  6],
        [26, 44, 34, 20, 35]]])
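
A quick check of the "flatten, then refill" description, as a sketch. It uses `np.arange(40)` as a stand-in for the random 5x8 array above.

```python
import numpy as np

A = np.arange(40).reshape((5, 8))   # stand-in for the 5x8 array above
R = A.reshape((2, 4, 5))

# Reading both arrays in row-major order gives the same sequence.
print(np.array_equal(A.ravel(), R.ravel()))   # True

# -1 asks NumPy to infer that axis from the total size: 40 / (2 * 5) = 4.
print(A.reshape((2, -1, 5)).shape)            # (2, 4, 5)

# For a contiguous array like this one, reshape returns a view, not a copy.
print(np.shares_memory(A, R))                 # True
```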

In [172]:
# Reshaping via np.newaxis: np.newaxis inserts a new axis of length 1
# at the position where it appears in the index.
np.arange(5)[np.newaxis]

array([[0, 1, 2, 3, 4]])

The way to use `np.newaxis` is to put it in the index wherever a new axis needs to be inserted. The following inserts a new axis after the existing one, wrapping every element of `np.arange(5)` in its own row and turning the shape `(5,)` into `(5, 1)`, i.e. a column vector.

In [173]:
np.arange(5)[:,np.newaxis]

array([[0],
       [1],
       [2],
       [3],
       [4]])
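
The effect of `np.newaxis` is easiest to see in the shapes. A short sketch, where the name `v` is just for illustration:

```python
import numpy as np

v = np.arange(5)
print(v.shape)                   # (5,)
print(v[np.newaxis].shape)       # (1, 5)  new axis in front: a row
print(v[np.newaxis, :].shape)    # (1, 5)  same thing, written out
print(v[:, np.newaxis].shape)    # (5, 1)  new axis at the end: a column

# np.newaxis is an alias for None, and the result matches an explicit reshape.
print(np.newaxis is None)                                    # True
print(np.array_equal(v[:, np.newaxis], v.reshape((5, 1))))   # True
```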

In [174]:
# concatenation
B = np.ones((5,3), dtype=int)
np.concatenate([A,B], axis=1) # note the axis here: we're concatenating
# along axis 1 (the second axis), so every row gets longer

array([[ 0, 39,  1, 43, 12,  2, 33, 33,  1,  1,  1],
       [34, 49, 30, 47,  1, 26,  4, 28,  1,  1,  1],
       [44, 35, 16, 39, 33,  7, 47, 30,  1,  1,  1],
       [12,  1, 46, 28,  3,  8, 35, 25,  1,  1,  1],
       [47, 35,  6, 26, 44, 34, 20, 35,  1,  1,  1]])

In [175]:
# for 2-D arrays, hstack is equivalent to concatenating along axis 1
np.hstack([A,B])

array([[ 0, 39,  1, 43, 12,  2, 33, 33,  1,  1,  1],
       [34, 49, 30, 47,  1, 26,  4, 28,  1,  1,  1],
       [44, 35, 16, 39, 33,  7, 47, 30,  1,  1,  1],
       [12,  1, 46, 28,  3,  8, 35, 25,  1,  1,  1],
       [47, 35,  6, 26, 44, 34, 20, 35,  1,  1,  1]])

In [176]:
# splitting: [3,5] are the column indices where A is cut, giving 3 pieces
np.split(A,[3,5], axis=1) # np.hsplit, np.vsplit also exist

[array([[ 0, 39,  1],
        [34, 49, 30],
        [44, 35, 16],
        [12,  1, 46],
        [47, 35,  6]]),
 array([[43, 12],
        [47,  1],
        [39, 33],
        [28,  3],
        [26, 44]]),
 array([[ 2, 33, 33],
        [26,  4, 28],
        [ 7, 47, 30],
        [ 8, 35, 25],
        [34, 20, 35]])]
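
Splitting and joining are inverses of each other when done along the same axis. A sketch using `np.arange(40)` as a stand-in for `A`; the names `left`, `mid`, `right`, `top` and `bottom` are illustrative.

```python
import numpy as np

A = np.arange(40).reshape((5, 8))   # stand-in for the 5x8 array above

left, mid, right = np.split(A, [3, 5], axis=1)
print(left.shape, mid.shape, right.shape)                  # (5, 3) (5, 2) (5, 3)

# Joining the pieces along the same axis recovers A.
print(np.array_equal(np.hstack([left, mid, right]), A))    # True

# vsplit / vstack do the same along axis 0 (rows).
top, bottom = np.vsplit(A, [2])
print(top.shape, bottom.shape)                             # (2, 8) (3, 8)
print(np.array_equal(np.vstack([top, bottom]), A))         # True
```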

## Computation 

Computation is done via universal functions (ufuncs): vectorized functions that operate elementwise on NumPy arrays. The loop over elements runs in compiled code, so ufuncs are typically much faster than an equivalent Python loop.
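
To make "elementwise" concrete, here is a sketch comparing an explicit Python loop with the ufunc form. `add_100_loop` is a hypothetical helper written only for this comparison, and `np.arange(40)` stands in for the example array.

```python
import numpy as np

def add_100_loop(values):
    """Element-by-element Python loop, for comparison only."""
    out = np.empty_like(values)
    for i in range(values.size):
        out.flat[i] = values.flat[i] + 100
    return out

A = np.arange(40).reshape((5, 8))

# Same result, but the ufunc version has no Python-level loop.
print(np.array_equal(add_100_loop(A), A + 100))   # True

# The + operator on arrays is a wrapper around the np.add ufunc.
print(np.array_equal(np.add(A, 100), A + 100))    # True
```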

In [177]:
A

array([[ 0, 39,  1, 43, 12,  2, 33, 33],
       [34, 49, 30, 47,  1, 26,  4, 28],
       [44, 35, 16, 39, 33,  7, 47, 30],
       [12,  1, 46, 28,  3,  8, 35, 25],
       [47, 35,  6, 26, 44, 34, 20, 35]])

In [178]:
A + 100

array([[100, 139, 101, 143, 112, 102, 133, 133],
       [134, 149, 130, 147, 101, 126, 104, 128],
       [144, 135, 116, 139, 133, 107, 147, 130],
       [112, 101, 146, 128, 103, 108, 135, 125],
       [147, 135, 106, 126, 144, 134, 120, 135]])

In [179]:
A / 2 # true division returns floats, so the dtype changes from int to float64

array([[ 0. , 19.5,  0.5, 21.5,  6. ,  1. , 16.5, 16.5],
       [17. , 24.5, 15. , 23.5,  0.5, 13. ,  2. , 14. ],
       [22. , 17.5,  8. , 19.5, 16.5,  3.5, 23.5, 15. ],
       [ 6. ,  0.5, 23. , 14. ,  1.5,  4. , 17.5, 12.5],
       [23.5, 17.5,  3. , 13. , 22. , 17. , 10. , 17.5]])

<img style="width: 60%;" src="img/np_ufuncs_1.png"></img>
<img style="width: 30%;" src="img/np_ufuncs_2.png"></img>
Many math functions are ufuncs as well: `np.sin`, `np.cos`, `np.tan`, `np.abs`, etc.

### Advanced Ufuncs: Outer product and Aggregation

In [180]:
x = np.arange(4)
y = np.arange(4,8)
(x,y)

(array([0, 1, 2, 3]), array([4, 5, 6, 7]))

In [181]:
# outer: apply the ufunc to every pair of inputs, result[i, j] = x[i] * y[j]
np.multiply.outer(x,y)

array([[ 0,  0,  0,  0],
       [ 4,  5,  6,  7],
       [ 8, 10, 12, 14],
       [12, 15, 18, 21]])

Aggregation applies the ufunc repeatedly across the elements of an array to combine them, rather than mapping it over each element independently. Reduction (collapse to a single result) and accumulation (keep every intermediate result) are the two main types of aggregation.

In [182]:
np.add.reduce(x) # adds all elements of x

6

In [183]:
np.multiply.accumulate(y) # sequentially multiplies elements of y

array([  4,  20, 120, 840])
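
The common reductions and accumulations also have dedicated names. A short sketch showing that they agree with the `reduce` and `accumulate` forms, with `x` and `y` defined as above:

```python
import numpy as np

x = np.arange(4)       # [0, 1, 2, 3]
y = np.arange(4, 8)    # [4, 5, 6, 7]

# reduce collapses to one value; np.sum is the same operation by another name.
print(np.add.reduce(x), np.sum(x))            # 6 6

# accumulate keeps every running result; np.cumprod / np.cumsum match it.
print(np.array_equal(np.multiply.accumulate(y), np.cumprod(y)))   # True
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))         # True
```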

In [184]:
# aggregates can also work on an axis!
np.sum(A, axis=0) # sums along axis 0

array([137, 159,  99, 183,  93,  77, 139, 151])

Note that `axis` names the dimension of the array that gets collapsed. With `axis=0` the sum runs down the rows, so the shape `(5, 8)` becomes `(8,)`: one total per column.
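
The effect of `axis` is easiest to check through the output shapes. A sketch using the values of `A` copied from the output above:

```python
import numpy as np

# A copied from the output above.
A = np.array([[ 0, 39,  1, 43, 12,  2, 33, 33],
              [34, 49, 30, 47,  1, 26,  4, 28],
              [44, 35, 16, 39, 33,  7, 47, 30],
              [12,  1, 46, 28,  3,  8, 35, 25],
              [47, 35,  6, 26, 44, 34, 20, 35]])

print(A.shape)                              # (5, 8)
print(A.sum(axis=0).shape)                  # (8,)  axis 0 collapsed: one total per column
print(A.sum(axis=1).shape)                  # (5,)  axis 1 collapsed: one total per row
print(A.sum(axis=0, keepdims=True).shape)   # (1, 8)  collapsed axis kept with length 1
print(A.sum() == A.sum(axis=0).sum())       # True
```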

<img style="width: 60%;" src="img/aggr_funcs.png"></img>

### Broadcasting

Broadcasting applies binary ufuncs to arrays of different shapes, according to the following rules:

1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

In [185]:
# Rule 1:
np.ones((2, 3)) + np.arange(3)
# (2,3) + (3,) -> (2,3) + (1,3) -> (2,3) + (2,3) (extension)

array([[1., 2., 3.],
       [1., 2., 3.]])

In [186]:
# Rule 2 (after rule 1 pads the second shape):
# (3,1) + (3,) -> (3,1) + (1,3) -> (3,3) + (3,3)
np.arange(3).reshape((3, 1)) + np.arange(3)

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

In [187]:
# Rule 3: (3,2) + (3,) -> (3,2) + (1,3); the last axis has 2 vs 3
# and neither is 1
np.ones((3, 2)) + np.arange(3) # doesn't work

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
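
The three rules can be written out as a small shape calculation. This is a sketch for understanding only, not how NumPy implements broadcasting; `broadcast_shape` is a hypothetical helper name.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples."""
    # Rule 1: left-pad the shorter shape with ones.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)    # Rule 2: stretch b (or the sizes already agree)
        elif size_a == 1:
            result.append(size_b)    # Rule 2: stretch a
        else:
            # Rule 3: the sizes disagree and neither is 1.
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((3, 1), (3,)))   # (3, 3)

# The failing case works once the 1-D array is turned into a column,
# because (3, 2) and (3, 1) are compatible.
print((np.ones((3, 2)) + np.arange(3)[:, np.newaxis]).shape)   # (3, 2)
```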

## Boolean Arrays and Masking

Comparison operators are ufuncs too: they return boolean arrays, which can be used as masks to pick out only specific data points from an array.

In [188]:
A

array([[ 0, 39,  1, 43, 12,  2, 33, 33],
       [34, 49, 30, 47,  1, 26,  4, 28],
       [44, 35, 16, 39, 33,  7, 47, 30],
       [12,  1, 46, 28,  3,  8, 35, 25],
       [47, 35,  6, 26, 44, 34, 20, 35]])

In [189]:
A<25 # this gives us a boolean mask with the same shape as A

array([[ True, False,  True, False,  True,  True, False, False],
       [False, False, False, False,  True, False,  True, False],
       [False, False,  True, False, False,  True, False, False],
       [ True,  True, False, False,  True,  True, False, False],
       [False, False,  True, False, False, False,  True, False]])

In [190]:
A[A<25]

array([ 0,  1, 12,  2,  1,  4, 16,  7, 12,  1,  3,  8,  6, 20])

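The bitwise operators `&` (and), `|` (or), `^` (xor) and `~` (not) combine boolean arrays elementwise. Use them instead of Python's `and`/`or`, and wrap each comparison in parentheses because the bitwise operators bind more tightly than comparisons.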

In [191]:
A[(A<25) & (A>10)]

array([12, 16, 12, 20])

Boolean masks can also be used to count the elements that satisfy a condition (each `True` counts as 1), or to test whether any element matches.

In [192]:
(A<25).sum()

14

In [193]:
(A==32).any()

False
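
A few more mask idioms in the same spirit, as a sketch. `A` is copied from the output above; `rows` and `cols` are illustrative names.

```python
import numpy as np

# A copied from the output above.
A = np.array([[ 0, 39,  1, 43, 12,  2, 33, 33],
              [34, 49, 30, 47,  1, 26,  4, 28],
              [44, 35, 16, 39, 33,  7, 47, 30],
              [12,  1, 46, 28,  3,  8, 35, 25],
              [47, 35,  6, 26, 44, 34, 20, 35]])

print(np.count_nonzero(A < 25))    # 14, same as (A < 25).sum()
print((A < 25).sum(axis=1))        # count per row: [4 2 2 4 2]
print(A[~(A < 25)].min())          # 25, the smallest value NOT below 25

# np.where on a mask returns the coordinates of the True entries.
rows, cols = np.where(A == 1)
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 2), (1, 4), (3, 1)]
```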

## Fancy Indexing

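When a tuple of index sequences is passed, NumPy uses one sequence per axis and pairs them up elementwise: `((2,3),(4,5))` selects `A[2,4]` and `A[3,5]`. The result takes the broadcast shape of the index arrays, not the shape of `A`, so a `(5,1)` row index combined with a `(5,)` column index yields a `(5,5)` result.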

In [194]:
idx = ((2,3),(4,5))
A[idx]

array([33,  8])

In [195]:
idx_row = np.array([0,1,2,4,4])
idx_col = np.array([2,3,1,2,4])

A[idx_row[:,np.newaxis], idx_col]

array([[ 1, 43, 39,  1, 12],
       [30, 47, 49, 30,  1],
       [16, 39, 35, 16, 33],
       [ 6, 26, 35,  6, 44],
       [ 6, 26, 35,  6, 44]])

Fancy indexing can be combined with simple indexing, slicing and masking, making it very powerful.
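
A sketch of those combinations, using `np.arange(40)` as a stand-in array so the values are easy to follow. The names `cols`, `row_mask` and `B` are illustrative.

```python
import numpy as np

A = np.arange(40).reshape((5, 8))   # stand-in array: A[i, j] == 8*i + j
cols = np.array([0, 3, 7])

print(A[2, cols])              # simple index + fancy index: [16 19 23]
print(A[1:3, cols].shape)      # slice + fancy index: (2, 3)

row_mask = A[:, 0] > 10        # boolean mask over rows: rows 2, 3 and 4
print(A[row_mask, 1:3].shape)  # mask + slice: (3, 2)

# Fancy indexing also works on the left-hand side of an assignment.
B = A.copy()
B[[0, 1], [0, 1]] = -1         # sets B[0, 0] and B[1, 1]
print(B[0, 0], B[1, 1], B[0, 1])   # -1 -1 1
```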

### `ufunc.at`: in-place addition at given indices

In [196]:
# add 100 to all diagonal elements in A
# np.eye(rows, cols, dtype=bool) builds a mask with A's shape that is
# True on the main diagonal and False everywhere else
np.add.at(A, np.eye(*A.shape, dtype=bool), 100)
A

array([[100,  39,   1,  43,  12,   2,  33,  33],
       [ 34, 149,  30,  47,   1,  26,   4,  28],
       [ 44,  35, 116,  39,  33,   7,  47,  30],
       [ 12,   1,  46, 128,   3,   8,  35,  25],
       [ 47,  35,   6,  26, 144,  34,  20,  35]])
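
Two follow-up sketches. The first builds the diagonal from integer indices instead of a mask. The second shows the case where `np.add.at` actually differs from `+=`: repeated indices. The arrays and the names `d`, `idx` and `counts` are made up for illustration.

```python
import numpy as np

# 1) Diagonal via integer indices: (d, d) pairs up as (0,0), (1,1), ...
A = np.zeros((5, 8), dtype=int)
d = np.arange(min(A.shape))
np.add.at(A, (d, d), 100)
print(A[d, d])                 # [100 100 100 100 100]

# 2) Repeated indices: += applies each distinct index only once,
#    while np.add.at accumulates every occurrence.
idx = np.array([0, 0, 0, 2])

counts = np.zeros(4, dtype=int)
counts[idx] += 1
print(counts)                  # [1 0 1 0]

counts = np.zeros(4, dtype=int)
np.add.at(counts, idx, 1)
print(counts)                  # [3 0 1 0]
```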

## Sorting

In [197]:
# A.sort() # in place, modifies A!
# to not modify A, use np.sort(A), which returns a sorted copy
# both sort along the last axis by default, i.e. each row separately
np.sort(A)

array([[  1,   2,  12,  33,  33,  39,  43, 100],
       [  1,   4,  26,  28,  30,  34,  47, 149],
       [  7,  30,  33,  35,  39,  44,  47, 116],
       [  1,   3,   8,  12,  25,  35,  46, 128],
       [  6,  20,  26,  34,  35,  35,  47, 144]])

In [203]:
# partitioning: after partition(A, 2, axis=1), the first 2 entries of each
# row are its 2 smallest values (their order is not guaranteed)
np.partition(A,2,axis=1)[:,:2]

array([[ 1,  2],
       [ 1,  4],
       [ 7, 30],
       [ 1,  3],
       [ 6, 20]])
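
Sorting and partitioning also come in index-returning forms, `np.argsort` and `np.argpartition`. A sketch on the first row of `A` as shown above; the names `row`, `order`, `part` and `idx` are illustrative.

```python
import numpy as np

row = np.array([100, 39, 1, 43, 12, 2, 33, 33])   # first row of A above

# argsort returns the indices that would sort the array.
order = np.argsort(row)
print(order[:3])          # [2 5 4]: positions of the values 1, 2 and 12
print(row[order])         # same as np.sort(row)

# partition only guarantees what ends up on each side of position k.
k = 2
part = np.partition(row, k)
print(np.sort(part[:k]))  # [1 2]: the k smallest values
print(part[k])            # 12: the value a full sort would put at index k

# argpartition gives the positions of the k smallest values instead.
idx = np.argpartition(row, k)[:k]
print(np.sort(row[idx]))  # [1 2]
```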