## Topics for Class
* Revisiting Stacking and Splitting
* A few interesting array creation functions
* Understanding of Reductions
* Understanding of Broadcasting
* Understanding of Universal Functions

In [2]:
import numpy as np

## Revisiting Stacking and Splitting
* Stacking merges two or more matrices together to create a larger matrix
  * Horizontal stacking (hstack)
  * Vertical stacking (vstack)
* Splitting splits an existing matrix into smaller matrices
  * Horizontal splitting (hsplit)
  * Vertical splitting (vsplit)
* Other common methods of merging 
  * Concatenate
  * Append

In [3]:
stk1 = np.random.randint(1,10,(3,4))
stk1

array([[4, 2, 8, 2],
       [9, 9, 1, 4],
       [9, 8, 5, 7]])

In [4]:
stk2 = np.random.randint(11,20,(3,4))
stk2

array([[17, 16, 15, 11],
       [18, 13, 18, 18],
       [15, 17, 11, 11]])

In [9]:
stk3 = np.hstack((stk1,stk2))
stk3

array([[ 4,  2,  8,  2, 17, 16, 15, 11],
       [ 9,  9,  1,  4, 18, 13, 18, 18],
       [ 9,  8,  5,  7, 15, 17, 11, 11]])

In [10]:
stk4 = np.stack((stk1,stk2),axis=1)
stk4

array([[[ 4,  2,  8,  2],
        [17, 16, 15, 11]],

       [[ 9,  9,  1,  4],
        [18, 13, 18, 18]],

       [[ 9,  8,  5,  7],
        [15, 17, 11, 11]]])

In [12]:
stk5 = np.vstack((stk1,stk2))
stk5

array([[ 4,  2,  8,  2],
       [ 9,  9,  1,  4],
       [ 9,  8,  5,  7],
       [17, 16, 15, 11],
       [18, 13, 18, 18],
       [15, 17, 11, 11]])

In [14]:
stk6 = np.stack((stk1,stk2),axis=0)
stk6

array([[[ 4,  2,  8,  2],
        [ 9,  9,  1,  4],
        [ 9,  8,  5,  7]],

       [[17, 16, 15, 11],
        [18, 13, 18, 18],
        [15, 17, 11, 11]]])

## Difference between hstack/vstack and stack
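* hstack places the 2nd matrix to the right of the 1st. The result is still 2D; only the number of columns grows: (3,4) and (3,4) give (3,8)
* vstack places the 2nd matrix below the 1st. The result is still 2D; only the number of rows grows: (3,4) and (3,4) give (6,4)
* stack joins the matrices along a new axis, so the result changes from 2D to 3D: axis=0 gives shape (2,3,4), and axis=1 gives shape (3,2,4), where each of the 3 blocks pairs row i of the 1st matrix with row i of the 2nd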
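A quick way to see the difference is to compare the resulting shapes. The sketch below uses two small placeholder arrays (`a` and `b` are illustrative names, not the `stk1`/`stk2` arrays above):

```python
import numpy as np

a = np.zeros((3, 4))
b = np.ones((3, 4))

# hstack/vstack join along an existing axis: still 2-D, one dimension grows
h = np.hstack((a, b))          # shape (3, 8)
v = np.vstack((a, b))          # shape (6, 4)

# stack inserts a new axis at the given position: the result is 3-D
s0 = np.stack((a, b), axis=0)  # shape (2, 3, 4)
s1 = np.stack((a, b), axis=1)  # shape (3, 2, 4)

print(h.shape, v.shape, s0.shape, s1.shape)
```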

In [47]:
# concatenate and append are other ways to join matrices (concatenate defaults to axis=0)
stk7 = np.concatenate((stk1,stk2))
stk7

array([[ 2,  4,  4,  5],
       [ 3,  5,  3,  4],
       [ 2,  1,  1,  3],
       [11, 11, 14, 19],
       [13, 11, 16, 19],
       [16, 16, 15, 11]])

In [48]:
stk8 = np.concatenate((stk1,stk2),axis=0)
stk8

array([[ 2,  4,  4,  5],
       [ 3,  5,  3,  4],
       [ 2,  1,  1,  3],
       [11, 11, 14, 19],
       [13, 11, 16, 19],
       [16, 16, 15, 11]])

In [49]:
stk9 = np.concatenate((stk1,stk2),axis=1)
stk9

array([[ 2,  4,  4,  5, 11, 11, 14, 19],
       [ 3,  5,  3,  4, 13, 11, 16, 19],
       [ 2,  1,  1,  3, 16, 16, 15, 11]])

In [52]:
stk10 = np.append(stk1,stk2,axis=0)
stk10

array([[ 2,  4,  4,  5],
       [ 3,  5,  3,  4],
       [ 2,  1,  1,  3],
       [11, 11, 14, 19],
       [13, 11, 16, 19],
       [16, 16, 15, 11]])

In [53]:
stk11 = np.append(stk1,stk2,axis=1)
stk11

array([[ 2,  4,  4,  5, 11, 11, 14, 19],
       [ 3,  5,  3,  4, 13, 11, 16, 19],
       [ 2,  1,  1,  3, 16, 16, 15, 11]])
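One detail worth checking when using `np.append`: if `axis` is omitted, both inputs are flattened before joining. A minimal sketch with two small illustrative arrays (`a` and `b` are placeholder names):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

flat = np.append(a, b)           # no axis: both inputs are flattened first
rows = np.append(a, b, axis=0)   # joins along rows, like np.concatenate(..., axis=0)

print(flat)
print(rows.shape)
```

With an explicit `axis`, `np.append` and `np.concatenate` give the same result for these inputs, so `concatenate` is usually the clearer choice when joining more than two arrays.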

## Let's talk about Split functions now!

In [16]:
# let's create a matrix and apply the split functions
splt1 = np.random.randint(1,10,(3,4))
splt1

array([[1, 6, 3, 9],
       [7, 4, 4, 6],
       [5, 2, 1, 6]])

In [18]:
splt2 = np.hsplit(splt1,2)
splt2

[array([[1, 6],
        [7, 4],
        [5, 2]]), array([[3, 9],
        [4, 6],
        [1, 6]])]

In [19]:
splt3 = np.split(splt1,2,axis=1)
splt3

[array([[1, 6],
        [7, 4],
        [5, 2]]), array([[3, 9],
        [4, 6],
        [1, 6]])]

In [20]:
splt4 = np.vsplit(splt1,3)
splt4

[array([[1, 6, 3, 9]]), array([[7, 4, 4, 6]]), array([[5, 2, 1, 6]])]

In [21]:
splt5 = np.split(splt1,3,axis=0)
splt5

[array([[1, 6, 3, 9]]), array([[7, 4, 4, 6]]), array([[5, 2, 1, 6]])]
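The examples above split into equal parts. As a sketch of the other cases, the snippet below splits at explicit indices and uses `np.array_split` for a count that does not divide the axis evenly (the array `m` is an illustrative placeholder):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

# Split at explicit column indices: columns [0:1], [1:3], [3:4]
left, mid, right = np.hsplit(m, [1, 3])

# array_split tolerates a count that does not divide the axis evenly
parts = np.array_split(m, 3, axis=1)   # column widths 2, 1, 1

# np.split raises ValueError when the division is uneven
try:
    np.split(m, 3, axis=1)
    uneven_ok = True
except ValueError:
    uneven_ok = False

print(left.shape, mid.shape, right.shape, uneven_ok)
```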

## Quick Note on Transposition

In [25]:
# let's create a row vector (2-D, shape (1, 3))
v = np.array([[1,2,3]])
v.shape

(1, 3)

In [26]:
# transposing the same row vector gives a column vector (shape (3, 1))
v1 = np.transpose(np.array([[1,2,3]]))
v1.shape

(3, 1)
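Note that the double brackets matter here. A short sketch of what happens with a plain 1-D array, and two common ways to get a column vector from it (all variable names are illustrative):

```python
import numpy as np

flat = np.array([1, 2, 3])      # 1-D, shape (3,)
row = np.array([[1, 2, 3]])     # 2-D row vector, shape (1, 3)

# Transposing a 1-D array changes nothing: there is only one axis
flat_t = flat.T                 # still shape (3,)

# Two common ways to get a (3, 1) column vector from 1-D data
col_a = flat.reshape(-1, 1)
col_b = flat[:, np.newaxis]

# .T is shorthand for np.transpose
col_c = row.T

print(flat_t.shape, col_a.shape, col_b.shape, col_c.shape)
```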

## Quick Notes on Array Creation

In [27]:
# creating a matrix with the given numbers on its diagonal
m1 = np.diag([1,2,3,4])
m1

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [45]:
# creating a matrix from a function : here the function is i^2 + j^2
m2 = np.fromfunction(lambda i,j:i**2+j**2,(4,5))
m2

array([[ 0.,  1.,  4.,  9., 16.],
       [ 1.,  2.,  5., 10., 17.],
       [ 4.,  5.,  8., 13., 20.],
       [ 9., 10., 13., 18., 25.]])

In [46]:
# here the function is sin(i)+cos(j)
m3 = np.fromfunction(lambda i,j:np.sin(i)+np.cos(j),(3,4))
m3

array([[ 1.        ,  0.54030231, -0.41614684, -0.9899925 ],
       [ 1.84147098,  1.38177329,  0.42532415, -0.14852151],
       [ 1.90929743,  1.44959973,  0.49315059, -0.08069507]])
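The float output above comes from how `np.fromfunction` works: it calls the function once with whole index arrays, which are float by default. A minimal sketch, assuming the same `i**2 + j**2` function, that builds the index grids explicitly and also shows that `np.diag` works in both directions:

```python
import numpy as np

# fromfunction passes whole index arrays (float by default) to the function
m = np.fromfunction(lambda i, j: i**2 + j**2, (4, 5))

# Equivalent construction with explicit index grids
i, j = np.indices((4, 5))
m_manual = i**2 + j**2

# dtype=int makes the index arrays, and hence the result, integer
m_int = np.fromfunction(lambda i, j: i**2 + j**2, (4, 5), dtype=int)

# np.diag is two-way: 1-D input builds a matrix, 2-D input extracts the diagonal
d = np.diag(np.diag([1, 2, 3, 4]))

print(m.dtype, m_int.dtype, d)
```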

## Some discussions on Reductions
#### It means "the mechanisms to summarize and in turn reduce data"
* Sum
* Mean
* Median
* Any other aggregation function (min, max, std, ...)


In [54]:
red1 = np.random.randint(1,10,(3,4))
red1

array([[4, 9, 5, 3],
       [2, 8, 3, 6],
       [3, 6, 7, 3]])

In [58]:
red2 = red1.sum()
red2

59

In [59]:
red3 = red1.sum(0)
red3

array([ 9, 23, 15, 12])

In [62]:
red4 = red1.sum(1)
red4

array([21, 19, 19])

In [64]:
# Quick application - let's normalize (standardize) each column of the matrix : formula = (x - mean)/std
red5 = (red1 - red1.mean(0))/red1.std(0)
red5

array([[ 1.22474487,  1.06904497,  0.        , -0.70710678],
       [-1.22474487,  0.26726124, -1.22474487,  1.41421356],
       [ 0.        , -1.33630621,  1.22474487, -0.70710678]])
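A useful sanity check after standardizing is that every column ends up with mean 0 and standard deviation 1. The sketch below reuses the values printed for `red1` above (as `x`) and also shows a row-wise variant, which needs `keepdims=True` so the statistics keep a shape that lines up with the matrix:

```python
import numpy as np

x = np.array([[4, 9, 5, 3],
              [2, 8, 3, 6],
              [3, 6, 7, 3]])   # same values as red1 above

# Column-wise: mean(0) and std(0) have shape (4,), which lines up with (3, 4)
z_cols = (x - x.mean(0)) / x.std(0)

# Row-wise: keepdims=True keeps shape (3, 1) so the subtraction still lines up
z_rows = (x - x.mean(1, keepdims=True)) / x.std(1, keepdims=True)

print(z_cols.mean(0).round(10), z_cols.std(0))
print(z_rows.mean(1).round(10), z_rows.std(1))
```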

## Understanding Broadcasting
![Broadcasting](img/broadcasting.png)

#### Broadcasting Rules :
  * Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  * Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  * Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
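The three rules can be written down as a small function on shape tuples. This is only an illustrative sketch (`broadcast_shape` is a hypothetical helper, not part of NumPy); the result is compared against the shape NumPy itself computes through `np.broadcast`:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples."""
    # Rule 1: left-pad the shorter shape with ones
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)   # sizes agree, or b is stretched (Rule 2)
        elif da == 1:
            result.append(db)   # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))           # matrix with a 1-D vector
print(broadcast_shape((1, 4, 2), (2, 1, 2)))   # both arrays get stretched
print(np.broadcast(np.empty((4, 3)), np.empty((3,))).shape)
```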

In [31]:
arr1 = np.zeros((4,3))
arr1

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [32]:
arr2 = np.array([1,2,3])
arr2.shape

(3,)

In [33]:
arr3 = arr1 + arr2
arr3
# please note that broadcasting a row vector with a matrix requires the length of the vector to equal the no. of columns of the matrix

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [13]:
# broadcasting with a column vector
arr4 = np.transpose(np.array([[1,2,3,4]]))
arr4

array([[1],
       [2],
       [3],
       [4]])

In [14]:
arr5 = arr1 + arr4
arr5
# please note that broadcasting a column vector with a matrix requires the length of the vector to equal the no. of rows of the matrix

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.]])

In [35]:
food = np.array([[56.1,22.1,7.4,3.1]
                ,[12.1,2.1,5.4,23]
                ,[3.2,14.3,18.1,4.5]
                ,])
food

array([[56.1, 22.1,  7.4,  3.1],
       [12.1,  2.1,  5.4, 23. ],
       [ 3.2, 14.3, 18.1,  4.5]])

In [43]:
total_cal = np.sum(food,axis=0)
total_cal

array([71.4, 38.5, 30.9, 30.6])

In [23]:
perc = (food/total_cal)*100
perc

array([[78.57142857, 57.4025974 , 23.94822006, 10.13071895],
       [16.94677871,  5.45454545, 17.47572816, 75.16339869],
       [ 4.48179272, 37.14285714, 58.57605178, 14.70588235]])

In [45]:
# another example of broadcasting
# create pairwise distance matrix having the distances between a set of points
# create a matrix having 4 points (0,0) (4,0) (0,3) (4,3)
pts1 = np.array([[0,0]
                ,[4,0]
                ,[0,3]
                ,[4,3]])
pts1.shape

(4, 2)

In [46]:
pts2 = np.array([[1,1],[2,2]])
pts2.shape

(2, 2)

In [47]:
pts1 - pts2

ValueError: operands could not be broadcast together with shapes (4,2) (2,2) 

In [53]:
pts3 = pts1.reshape(1,4,2)
pts3.shape

(1, 4, 2)

In [54]:
pts4 = pts2.reshape(2,1,2)
pts4.shape

(2, 1, 2)

In [55]:
pts5 = pts3 - pts4
pts5

array([[[-1, -1],
        [ 3, -1],
        [-1,  2],
        [ 3,  2]],

       [[-2, -2],
        [ 2, -2],
        [-2,  1],
        [ 2,  1]]])

In [56]:
pts6 = pts5**2
pts6.shape

(2, 4, 2)

In [57]:
pts7 = np.sum(pts6,axis=2)
pts7

array([[ 2, 10,  5, 13],
       [ 8,  8,  5,  5]], dtype=int32)

In [58]:
pts8 = np.sqrt(pts7)
pts8

array([[1.41421356, 3.16227766, 2.23606798, 3.60555128],
       [2.82842712, 2.82842712, 2.23606798, 2.23606798]])

In [98]:
# one step distance matrix calculation
np.sqrt(np.sum((pts1.reshape(1,4,2) - pts2.reshape(2,1,2))**2, -1))

array([[1.41421356, 3.16227766, 2.23606798, 3.60555128],
       [2.82842712, 2.82842712, 2.23606798, 2.23606798]])
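The `reshape` calls above hard-code the sizes 4 and 2. A possible alternative, sketched here with the same points, is to insert the length-1 axes with `np.newaxis` and to cross-check the result with `np.linalg.norm`:

```python
import numpy as np

pts1 = np.array([[0, 0], [4, 0], [0, 3], [4, 3]])   # shape (4, 2)
pts2 = np.array([[1, 1], [2, 2]])                   # shape (2, 2)

# np.newaxis inserts the length-1 axes without hard-coding the sizes
diff = pts1[np.newaxis, :, :] - pts2[:, np.newaxis, :]   # shape (2, 4, 2)
dist = np.sqrt((diff ** 2).sum(axis=-1))                 # shape (2, 4)

# Cross-check: Euclidean norm along the coordinate axis
dist_norm = np.linalg.norm(diff, axis=-1)

print(dist.shape)
print(np.allclose(dist, dist_norm))
```

Entry `dist[i, j]` is the distance from point `i` of `pts2` to point `j` of `pts1`, matching the layout of `pts8` above.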

In [101]:
np.arange(1,6)

array([1, 2, 3, 4, 5])

## Introduction to Universal Functions (UFuncs)
* Fast element-wise operations on whole arrays, replacing explicit Python loops
* All sorts of operations (mathematical, logical, comparison, transcendental etc.)
* Ufunc methods such as reduce and accumulate, which aggregate an array along an axis
* List of UFuncs : 
  * ![UFunc1](img/ufunc1.png)
  * ![UFunc2](img/ufunc2.png)

In [59]:
# Let's do a speed test
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [60]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

2.2 s ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [61]:
%timeit 1/big_array

5.85 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
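The speed-up only matters if both versions compute the same thing. The sketch below checks that on a smaller array, without `%timeit` so it also runs outside IPython (the generator `rng` and the array size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(1, 100, size=1000)

# Explicit Python loop, same logic as compute_reciprocals
looped = np.empty(len(values))
for k in range(len(values)):
    looped[k] = 1.0 / values[k]

# The / operator on arrays applies the np.divide ufunc element-wise
vectorized = 1.0 / values
via_ufunc = np.divide(1.0, values)

print(np.allclose(looped, vectorized), np.array_equal(vectorized, via_ufunc))
```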


In [63]:
red1 = np.arange(1,10)
red1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [66]:
np.add.reduce(red1) 

45

In [67]:
np.add.accumulate(red1)

array([ 1,  3,  6, 10, 15, 21, 28, 36, 45], dtype=int32)

In [68]:
red2 = np.random.randint(1,10,(3,4))
red2

array([[3, 6, 4, 2],
       [7, 8, 1, 4],
       [3, 8, 8, 7]])

In [69]:
red3 = np.add.reduce(red2)
red3

array([13, 22, 13, 13])

In [70]:
red4 = np.add.reduce(red2,1)
red4

array([15, 20, 26])

In [119]:
red5 = np.sum(red2,0)
red5

array([13, 22, 13, 13])
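`reduce` and `accumulate` are not specific to `np.add`; other binary ufuncs have them too, along with `outer`. A short sketch on an illustrative array `x`:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])

product = np.multiply.reduce(x)          # 3*1*4*1*5
running_max = np.maximum.accumulate(x)   # largest value seen so far
table = np.multiply.outer(np.arange(1, 4), np.arange(1, 4))  # 3x3 multiplication table

# For the add ufunc, reduce and accumulate correspond to sum and cumsum
same_sum = np.add.reduce(x) == np.sum(x)
same_cumsum = np.array_equal(np.add.accumulate(x), np.cumsum(x))

print(product, running_max, same_sum, same_cumsum)
print(table)
```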

In [71]:
# questions asked by Shyna
ex1 = np.random.randint(1,10,(3,4))
ex1

array([[9, 1, 1, 4],
       [4, 5, 9, 7],
       [6, 6, 7, 1]])

In [72]:
ex1[1:2] = ex1[1:2]*2

In [73]:
ex1

array([[ 9,  1,  1,  4],
       [ 8, 10, 18, 14],
       [ 6,  6,  7,  1]])
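The same row update can be written more compactly with an in-place operator, and broadcasting allows every row to get its own factor. A sketch using the original values printed for `ex1` (here called `ex`):

```python
import numpy as np

ex = np.array([[9, 1, 1, 4],
               [4, 5, 9, 7],
               [6, 6, 7, 1]])

ex[1] *= 2        # same effect as ex[1:2] = ex[1:2] * 2
ex[:, 0] += 10    # columns can be updated the same way

# Scale each row by its own factor using a (3, 1) column of factors
scaled = ex * np.array([[1], [10], [100]])

print(ex)
print(scaled)
```

One caveat with the in-place form: on an integer array, an in-place update with a float factor such as `ex[1] *= 2.5` raises a casting error, so convert with `ex.astype(float)` first if fractional factors are needed.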