## Using NumPy matrix / vector operations is much faster than nested loops in Python

### This trick is very useful when dealing with large-scale data

In [1]:
import numpy as np
import time

In [2]:
A = np.random.rand(5000)
B = np.random.rand(10000)

Say you have two arrays, A and B, and you need the difference between every element of A and every element of B. The result is a 5000 x 10000 matrix.

Two nested Python loops solve this easily, but slowly.

In [3]:
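startLoop = time.perf_counter()  # time.clock() was removed in Python 3.8
diffLoop = []
for i in range(len(A)):
    temp = []
    for j in range(len(B)):
        temp.append(A[i]-B[j])  # difference between A[i] and B[j]
    diffLoop.append(temp)       # one row of len(B) differences per element of A
timeLoop = time.perf_counter()-startLoop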

In [4]:
len(diffLoop),len(diffLoop[0])

(5000, 10000)

In [5]:
print('Time taken by the Python loop: ',timeLoop,'s')

Time taken by the Python loop:  15.26 s


**You can actually perform this much faster with NumPy subtraction. The question is how.**

In [6]:
A-B

ValueError: operands could not be broadcast together with shapes (5000,) (10000,) 

**Subtracting two vectors of different lengths is not defined, either mathematically or in NumPy. All you need to do is reshape one of the arrays, like this.**

In [7]:
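startNp = time.perf_counter()  # time.clock() was removed in Python 3.8
A = A.reshape(A.shape[0],1)    # shape (5000,) becomes (5000, 1)
diffNp = A-B                   # (5000, 1) - (10000,) broadcasts to (5000, 10000)
timeNp = time.perf_counter()-startNp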

In [8]:
diffNp.shape

(5000, 10000)

In [9]:
(diffNp == diffLoop).all() # the results are identical

True

In [10]:
print('Time taken by NumPy: ',timeNp,'s')

Time taken by NumPy:  0.14999999999999858 s
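
The reshape used above is one of several equivalent spellings. The sketch below compares it with two alternatives that are not used in this notebook: indexing with `np.newaxis` and the `outer` method of the `np.subtract` ufunc. The small arrays `a` and `b` are stand-ins for A and B, chosen only to keep the example quick to run.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
a = rng.random(4)                # small stand-in for A
b = rng.random(6)                # small stand-in for B

diff_reshape = a.reshape(a.shape[0], 1) - b   # the approach used in this notebook
diff_newaxis = a[:, np.newaxis] - b           # same column shape, written with indexing
diff_outer = np.subtract.outer(a, b)          # ufunc "outer" form: a[i] - b[j] for all i, j

print(diff_reshape.shape, diff_newaxis.shape, diff_outer.shape)
```

All three produce a matrix whose entry `[i, j]` is `a[i] - b[j]`, so which one to use is mostly a matter of readability.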


**This works because of NumPy's broadcasting feature: https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html .  The smaller array is “broadcast” across the larger array so that they have compatible shapes.**

Arrays with different shapes can be combined in one operation as long as, in every dimension, their sizes are equal or one of them is 1.

When NumPy performs an element-wise operation, it aligns the shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible if they are equal or if one of them is 1, and every pair of aligned dimensions must meet this requirement. Missing leading dimensions are treated as 1.
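
The rule can be sketched in a few lines of plain Python. The helper `broadcast_shape` below is hypothetical, written only to illustrate the rule, and is not part of NumPy. The last lines compare its answer with the shape NumPy reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Missing leading dimensions are treated as 1.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or da == 1 or db == 1:
            # A size-1 dimension stretches to match the other one.
            result.append(db if da == 1 else da)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

shape_ok = broadcast_shape((5, 1), (3,))
numpy_shape = np.broadcast(np.empty((5, 1)), np.empty((3,))).shape
print(shape_ok, numpy_shape)

try:
    broadcast_shape((5,), (3,))
except ValueError as err:
    print("incompatible:", err)
```

Shapes `(5, 1)` and `(3,)` are compatible because `(3,)` is padded to `(1, 3)`, while `(5,)` and `(3,)` fail because 5 and 3 differ and neither is 1.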

In [11]:
x,y = np.arange(5),np.arange(3)
x.shape,y.shape

((5,), (3,))

In [12]:
x-y

ValueError: operands could not be broadcast together with shapes (5,) (3,) 

In [13]:
newX = x.reshape(x.shape[0],1)

In [14]:
newX - y

array([[ 0, -1, -2],
       [ 1,  0, -1],
       [ 2,  1,  0],
       [ 3,  2,  1],
       [ 4,  3,  2]])

In [15]:
newX.shape,y.shape

((5, 1), (3,))

Here y is treated as if its shape were (1, 3), so newX with shape (5, 1) and y are both broadcast to shape (5, 3).
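
To make the implicit step visible, the sketch below spells it out by hand: it adds the leading axis to `y` with `np.newaxis` and expands both operands to the common shape with `np.broadcast_to`. The names `y_row`, `x_full` and `y_full` are introduced here only for illustration. NumPy does this internally without creating the expanded copies.

```python
import numpy as np

x, y = np.arange(5), np.arange(3)
newX = x.reshape(x.shape[0], 1)           # shape (5, 1)
y_row = y[np.newaxis, :]                  # shape (1, 3): the missing dimension made explicit

# Expand both operands to the common shape (5, 3).
x_full = np.broadcast_to(newX, (5, 3))    # each row repeats one value of x
y_full = np.broadcast_to(y_row, (5, 3))   # each row is a copy of y

diff = x_full - y_full
print(diff.shape)
print((diff == newX - y).all())
```

`np.broadcast_to` returns read-only views, so the explicit version is useful for inspection but offers no advantage over writing `newX - y` directly.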