## The gears of neural networks: tensor operations
---
Much as any computer program can ultimately be reduced to a small set of binary operations on binary inputs (AND, OR, NOR, and so on), all transformations learned by deep neural networks can be reduced to a handful of *tensor operations* applied to tensors of numerical data. Examples of tensor operations include addition, multiplication, and division.

Let's say we are building our network by stacking dense layers on top of each other. A Keras dense layer instance looks like this:
`keras.layers.Dense(512, activation="relu")`

This layer can be interpreted as a function that takes a 2D tensor as input and returns another 2D tensor, a new representation of the input tensor. Specifically, the function is as follows (where **W** is a 2D tensor and **b** is a vector, both attributes of the layer):

`output = relu(dot(W,input)+b)`

Let's unpack this. There are three tensor operations here:
- A dot product (`dot`) between the input tensor and a tensor named W.
- An addition (`+`) between the resulting 2D tensor and a vector b.
- Finally, a relu operation, where `relu(x)` is `max(x, 0)`.
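
The three operations listed above can be sketched in plain NumPy. This is a minimal illustration and not Keras's actual implementation: the helper names `relu` and `dense_forward`, the shapes, and the random values are assumptions made for the example. The sketch uses a row-per-sample layout, so `inputs` has shape (batch, features), `W` has shape (features, units), and the product is written as `np.dot(inputs, W)`.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)

def dense_forward(inputs, W, b):
    # Hypothetical helper: dot product, then bias addition, then relu.
    return relu(np.dot(inputs, W) + b)

rng = np.random.default_rng(0)
inputs = rng.standard_normal((4, 3))   # 4 samples, 3 features each (assumed sizes)
W = rng.standard_normal((3, 5))        # maps 3 features to 5 units
b = np.zeros(5)                        # one bias value per unit

output = dense_forward(inputs, W, b)
print(output.shape)  # (4, 5)
```

Every entry of `output` is non-negative, because relu clips negative values to zero.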


#### Key attributes of a tensor

A tensor is defined by three key attributes:
- `Number of axes (rank)` - A 3D tensor has three axes, and a matrix has two axes. This is also called the tensor's `ndim` in Python libraries such as NumPy.  


- `Shape` - This is a tuple of integers that describes how many dimensions the tensor has along each axis. For example, a matrix might have shape (3, 5) and a 3D tensor shape (3, 3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape, ().  


- `Data type` - This is the type of the data contained in the tensor; for instance, a tensor's type could be float32, uint8, float64, and so on. On rare occasions you may see a `char` tensor. Note that variable-length string tensors have traditionally not been supported in NumPy, because tensors live in preallocated, contiguous memory segments, and strings of varying length do not fit that layout.
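
The three attributes can be inspected directly on NumPy arrays. The following is a small sketch. The variable names and the example values are chosen only for illustration.

```python
import numpy as np

scalar = np.array(12)                            # 0 axes
vector = np.array([1, 2, 3, 4, 5])               # 1 axis
matrix = np.zeros((3, 5), dtype="float32")       # 2 axes
tensor3d = np.zeros((3, 3, 5), dtype="uint8")    # 3 axes

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor3d", tensor3d)]:
    # ndim is the number of axes, shape the size along each axis,
    # dtype the type of the stored values.
    print(name, t.ndim, t.shape, t.dtype)
```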

### 1. Element-wise operations
---
**Element-wise operations:** Operations that are applied independently to each entry in the tensors being considered. These operations are highly amenable to massively parallel implementations. A naive Python implementation of an element-wise operation can be written with a `for` loop. Let's see an example.  


In [3]:
import numpy as np

In [4]:
## Applying the relu function to a matrix using for loops.
def naive_relu(x):
    assert len(x.shape) == 2 ## x is a 2D NumPy tensor.
    
    x = x.copy() ## Work on a copy to avoid overwriting the input tensor.
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i,j] = max(x[i,j],0) ## Values below 0 are set to 0.
    return x

In [5]:
matrix_A = np.matrix(np.random.randn(5,2))
matrix_A

matrix([[ 1.12102096, -0.70869042],
        [-1.11280532,  0.33702367],
        [ 1.64109751,  0.38418265],
        [ 1.23431686, -0.38054473],
        [ 0.58436407, -0.45413768]])

In [6]:
%%timeit
## Timing the naive relu implemented with for loops
relu_res = naive_relu(matrix_A)
relu_res

12.6 µs ± 462 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In practice, when working with NumPy arrays, these operations are available as well-optimized built-in NumPy functions, which themselves delegate the heavy lifting to compiled code and, for linear-algebra routines, to a Basic Linear Algebra Subprograms (BLAS) implementation. BLAS are low-level, highly parallel, efficient tensor-manipulation routines that are typically implemented in Fortran or C. So, in NumPy these element-wise operations are blazing fast.  

**Let's see the time taken by NumPy's equivalent of relu, `np.maximum`**

In [7]:
%%timeit
z  = np.maximum(matrix_A,0.)

1.64 µs ± 30.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Let's try a naive element-wise addition operation.

In [8]:
def naive_add(x,y):
    assert len(x.shape) == 2
    assert x.shape == y.shape ## x and y are 2D NumPy tensors.
    ## For element-wise addition and subtraction, both matrices must have the same shape.
    
    x = x.copy() ## Avoid overwriting the input tensor.
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i,j] += y[i,j]
    return x

In [9]:
matrix_x = [[int(j+i) for j in range(3)]for i in range(5)]
matrix_x = np.matrix(matrix_x)
matrix_x

matrix([[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [4, 5, 6]])

In [10]:
matrix_y = [[int(j*i+2)for j in range(3)]for i in range(5)]
matrix_y  =  np.matrix(matrix_y)
matrix_y

matrix([[ 2,  2,  2],
        [ 2,  3,  4],
        [ 2,  4,  6],
        [ 2,  5,  8],
        [ 2,  6, 10]])

In [11]:
%%timeit
naive_add(matrix_x,matrix_y) ## time taken by the naive method

20.9 µs ± 1.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [12]:
add_result = naive_add(matrix_x,matrix_y)

In [13]:
add_result

matrix([[ 2,  3,  4],
        [ 3,  5,  7],
        [ 4,  7, 10],
        [ 5,  9, 13],
        [ 6, 11, 16]])

$$ x= \begin{bmatrix}
0&1&2\\
1&2&3\\
2&3&4\\
3&4&5\\
4&5&6
\end{bmatrix}  
$$  


$$ y= \begin{bmatrix}
2&2&2\\
2&3&4\\
2&4&6\\
2&5&8\\
2&6&10
\end{bmatrix}
$$

$$x+y = \begin{bmatrix}
2&3&4\\
3&5&7\\
4&7&10\\
5&9&13\\
6&11&16
\end{bmatrix}$$

In [14]:
%%timeit
matrix_x + matrix_y 

2.07 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


As you can see, the NumPy operation takes far less time than the naive method. Behind the scenes, these operations run C or Fortran code that is wrapped by a Python API.

In [15]:
z = matrix_x + matrix_y
z

matrix([[ 2,  3,  4],
        [ 3,  5,  7],
        [ 4,  7, 10],
        [ 5,  9, 13],
        [ 6, 11, 16]])
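
The matrices timed above are tiny, so the gap between the two approaches is modest. The sketch below checks that a loop-based relu and `np.maximum` agree on a larger array and times both with the standard `timeit` module. The array size (200 x 200) and the repeat count are arbitrary assumptions, and the measured times depend on the machine, so no expected timings are shown.

```python
import timeit
import numpy as np

def naive_relu(x):
    # Same loop-based relu as earlier in this section, repeated here
    # so that the example is self-contained.
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))

# Both implementations should produce identical values.
same = np.array_equal(naive_relu(a), np.maximum(a, 0.0))
print("results match:", same)

naive_time = timeit.timeit(lambda: naive_relu(a), number=3)
numpy_time = timeit.timeit(lambda: np.maximum(a, 0.0), number=3)
print(f"naive: {naive_time:.4f} s, NumPy: {numpy_time:.6f} s")
```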

### 2. Broadcasting
---
Our earlier implementation, `naive_add`, only supports the addition of 2D tensors with identical shapes. But in the dense layer, we add a 2D tensor to a vector. What happens with addition when the shapes of the two tensors being added differ?  

When possible, and if there's no ambiguity, the smaller tensor will be broadcast to match the shape of the larger tensor. Broadcasting consists of two steps:  
1. Axes (called broadcast axes) are added to the smaller tensor to match the `ndim` of the larger tensor.
2. The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.

### Shape Compatibility Rules
1. If x and y have a different number of dimensions, prepend **1's** to the shape of the shorter one.
2. Any axis of length 1 can be repeated (broadcast) to match the length of the other tensor along that axis.
3. All other axes must have matching lengths.  

`x.shape == (2,3)`  
`y.shape == (2,3) # compatible`  
`y.shape == (2,1) # compatible: the axis of length 1 is repeated`  
`y.shape == (1,3) # compatible: the axis of length 1 is repeated`  
`y.shape == (3,) # compatible: prepend 1 to get (1,3); the result has shape (2,3)`  
`y.shape == (3,2) # not compatible`  
`y.shape == (2,) # not compatible: 1 is only prepended, never appended, so (1,2) does not match (2,3)`
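
The compatibility rules above can be checked without building any arrays. The sketch below uses `np.broadcast_shapes`, which is available in NumPy 1.20 and later. The `candidates` list and the `results` dictionary are names introduced only for this example.

```python
import numpy as np

x_shape = (2, 3)
candidates = [(2, 3), (2, 1), (1, 3), (3,), (3, 2), (2,)]

results = {}
for y_shape in candidates:
    try:
        # Returns the broadcast result shape, or raises ValueError
        # when the two shapes are not compatible.
        results[y_shape] = np.broadcast_shapes(x_shape, y_shape)
    except ValueError:
        results[y_shape] = None
    print(y_shape, "->", results[y_shape])
```

The first four candidates broadcast to (2, 3). The last two are rejected, which is recorded here as `None`.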

Let's take a concrete example. Consider x with shape (32, 10) and y with shape (10,). First, we add an empty first axis to y, whose shape becomes (1, 10). Then we repeat y 32 times alongside this new axis, so that we end up with a tensor Y with shape (32, 10), where `Y[i,:] == y` for every `i`. At this point we can proceed to add x and Y, because they have the same shape.
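
The two steps of this example can be written out explicitly in NumPy. This is only a sketch to make the mechanics visible. In practice NumPy broadcasts implicitly and does not need to materialize the repeated tensor. The random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((32, 10))
y = rng.random((10,))

# Step 1: add a new first axis, so the shape becomes (1, 10).
y_expanded = y[np.newaxis, :]

# Step 2: repeat y 32 times along the new axis, giving shape (32, 10).
Y = np.repeat(y_expanded, 32, axis=0)

print(y_expanded.shape, Y.shape)     # (1, 10) (32, 10)
print(np.allclose(x + Y, x + y))     # True: implicit broadcasting gives the same sum
```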



In [16]:
def naive_add_matrix_and_vector(x,y):
    assert len(x.shape) == 2 ## x is a 2D NumPy tensor.
    assert len(y.shape) == 1 ## y is a NumPy vector.
    assert x.shape[1] == y.shape[0]
    
    x = x.copy() ## Avoid overwriting the input tensor.
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i,j] += y[j]
    return x

In [17]:
max_A = np.matrix(np.random.random((3,5)))

In [18]:
max_A

matrix([[0.22507737, 0.41597762, 0.16978782, 0.78754142, 0.24128524],
        [0.65142811, 0.58775953, 0.04436466, 0.44562289, 0.32671138],
        [0.42489012, 0.29793036, 0.1955798 , 0.47047185, 0.71442754]])

In [19]:
max_B = np.random.random((5,))

In [20]:
max_B

array([0.09059811, 0.08373514, 0.06392869, 0.93048703, 0.25728368])

In [21]:
naive_add_matrix_and_vector(max_A,max_B)

matrix([[0.31567548, 0.49971275, 0.23371651, 1.71802845, 0.49856892],
        [0.74202622, 0.67149467, 0.10829334, 1.37610992, 0.58399505],
        [0.51548823, 0.3816655 , 0.25950849, 1.40095888, 0.97171122]])

With broadcasting, you can generally apply two-tensor element-wise operations if one tensor has shape (a, b, ..., n, n+1, ..., m) and the other has shape (n, n+1, ..., m). The broadcasting will then automatically happen for axes a through n-1.
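
As an illustration of this general rule, the sketch below applies an element-wise maximum to two tensors of different rank. The shapes (64, 3, 32, 10) and (32, 10) are assumed for the example. Broadcasting happens over the two leading axes of the larger tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 3, 32, 10))   # larger tensor
y = rng.random((32, 10))          # matches the trailing axes of x

# y is broadcast over the first two axes of x.
z = np.maximum(x, y)
print(z.shape)  # (64, 3, 32, 10)
```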

### 3. Tensor dot
---
The dot operation, also called a tensor product (not to be confused with an element-wise product), is the most common and most useful tensor operation. Contrary to element-wise operations, it combines entries in the input tensors. 


An element-wise product is done with the `*` operator in NumPy, Keras, Theano, and TensorFlow. `dot` uses a different syntax in TensorFlow, but in NumPy and Keras it is done with the standard `dot` operator.

Only vectors with the same number of elements are compatible for a dot product.  

You can also take the dot product between a matrix and a vector, which returns a vector whose coefficients are the dot products between the vector and the rows of the matrix.
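
The difference between the element-wise product and the dot product can be seen side by side in NumPy. This is a short sketch with values chosen only for illustration.

```python
import numpy as np

x = np.array([2, 3])
y = np.array([5, 6])

elementwise = x * y       # multiplies matching entries, keeps the shape
dot = np.dot(x, y)        # multiplies matching entries, then sums them

print(elementwise)  # [10 18]
print(dot)          # 28

M = np.array([[2, 3],
              [4, 6],
              [5, 11]])
v = np.array([3, 6])

# Matrix-vector dot: one dot product per row of M.
mv = np.dot(M, v)
print(mv)  # [24 48 81]
```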

#### 3.1 Vector dot product
---

In [22]:
def naive_vector_dot(x,y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1 ## x and y are Numpy vectors.
    assert x.shape[0] == y.shape[0]
    Z = 0
    for i in range(x.shape[0]):
        Z += x[i] * y[i] ## Summing the element-wise products gives the dot product; without the sum it would be an element-wise product.
        
    return Z       

In [23]:
Vector_A = (np.array((2,3)))
Vector_B = (np.array((5,6)))
print("Vector_A:",Vector_A,"Vector_B: ",Vector_B)

Vector_A: [2 3] Vector_B:  [5 6]


In [24]:
res1 = naive_vector_dot(Vector_A,Vector_B)
res1

28

In [25]:
Vector_A @ Vector_B ## "@" is NumPy's matrix-multiplication operator; for two vectors it gives the dot product

28

#### 3.2 Matrix-vector dot product
---

In [26]:
def naive_matrix_vector_dot(x,y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0] ## The number of columns of the matrix must equal the size of the vector.
    
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i]+= x[i,j] * y[j]
            
    return z

In [27]:
matrix_A = np.matrix([[2,3],[4,6],[5,11]])
vector_B = np.array([3,6])

In [28]:
print("matrix_A:",matrix_A,"vector_B: ",vector_B)

matrix_A: [[ 2  3]
 [ 4  6]
 [ 5 11]] vector_B:  [3 6]


In [29]:
res2 = naive_matrix_vector_dot(matrix_A,vector_B)
res2

array([24., 48., 81.])

In [30]:
matrix_A @ vector_B

matrix([[24, 48, 81]])

In [110]:
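## matrix_B is the (2, 3) matrix defined in section 3.3; its 3 columns do not match the 2 elements of vector_B, so this raises an error.
matrix_B @ vector_B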

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)

**NOTE: As soon as one of the two tensors has an ndim greater than 1, dot is no longer symmetric, which is to say that dot(x,y) isn't the same as dot(y,x).** 
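
A small sketch makes this asymmetry concrete. The arrays `A` and `B` are plain NumPy arrays with values picked for illustration. Swapping the operands changes both the shape and the values of the result.

```python
import numpy as np

A = np.array([[1, 2],
              [4, 5],
              [9, 2]])        # shape (3, 2)
B = np.array([[1, 2, 5],
              [4, 5, 9]])     # shape (2, 3)

AB = np.dot(A, B)
BA = np.dot(B, A)

print(AB.shape)  # (3, 3)
print(BA.shape)  # (2, 2)
print(BA)
# [[ 54  22]
#  [105  51]]
```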

#### 3.3 Matrix-matrix product
---
**Important point for matrix dot product**  

The dot product generalizes to tensors with an arbitrary number of axes. The most common application may be the dot product between two matrices. The dot product between two matrices is only possible if `x.shape[1] == y.shape[0]`; the result is a matrix with shape `(x.shape[0], y.shape[1])`.
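
The shape rule can be checked quickly with `np.dot`, including for tensors with more than two axes. The shapes below are assumptions picked for the example. For higher-rank inputs, `np.dot` contracts the last axis of the first tensor with the second-to-last axis (or the only axis) of the second.

```python
import numpy as np

a = np.ones((2, 3, 4, 5))
v = np.ones(5)
m = np.ones((5, 6))

print(np.dot(a, v).shape)  # (2, 3, 4)
print(np.dot(a, m).shape)  # (2, 3, 4, 6)

# Incompatible inner dimensions raise a ValueError.
try:
    np.dot(np.ones((3, 2)), np.ones((3, 2)))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```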

In [55]:
def naive_matrix_dot(x,y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0] ## The number of columns of x must equal the number of rows of y.
    
    z = np.zeros((x.shape[0],y.shape[1]))
    
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            ## Flatten to 1D so that this also works for np.matrix inputs, whose slices stay 2D.
            row_x = np.asarray(x[i,:]).ravel()
            column_y = np.asarray(y[:,j]).ravel()
            z[i,j] = naive_vector_dot(row_x,column_y)
            
    return z

In [65]:
matrix_A = np.matrix(([1,2],[4,5],[9,2]))
matrix_B = np.matrix(([1,2,5],[4,5,9]))

In [66]:
matrix_A

matrix([[1, 2],
        [4, 5],
        [9, 2]])

In [67]:
matrix_B

matrix([[1, 2, 5],
        [4, 5, 9]])

In [69]:
matrix_A @ matrix_B ## gives 3 rows and 3 columns

matrix([[ 9, 12, 23],
        [24, 33, 65],
        [17, 28, 63]])

<img src="1.jpg" width="300" height="500">  

x, y, and z are pictured as rectangles. Because the rows of x and the columns of y must have the same size, it follows that the width of x must match the height of y.

#### 3.4 Tensor reshaping
---
Reshaping is not used in the dense layers of a neural network; its main application is in preprocessing the data before feeding it into the network.  

Reshaping a tensor means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor.

In [93]:
x = np.array([[0,1],
             [1,2],
             [3,5]])
x.shape

(3, 2)

In [97]:
x = x.reshape((6,1))
x

array([[0],
       [1],
       [1],
       [2],
       [3],
       [5]])

In [98]:
x = x.reshape((2,3))
x

array([[0, 1, 1],
       [2, 3, 5]])

**Transposing**  
A special case of reshaping that's commonly encountered is transposition. Transposing a matrix means exchanging its rows and its columns, so that `x[i,:]` becomes `x[:,i]`.

In [109]:
x = np.zeros((300,20))
x = np.transpose(x)
x.shape

(20, 300)
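
Transposing and calling `reshape` with the transposed shape give tensors of the same shape, but in general not the same contents. The sketch below contrasts the two on a small matrix whose values are chosen for illustration.

```python
import numpy as np

x = np.array([[0, 1],
              [1, 2],
              [3, 5]])

transposed = np.transpose(x)   # rows and columns are exchanged
reshaped = x.reshape((2, 3))   # entries are refilled in row-major order

print(transposed)
# [[0 1 3]
#  [1 2 5]]
print(reshaped)
# [[0 1 1]
#  [2 3 5]]
```

Row `i` of `x` becomes column `i` of `transposed`, which is not the case for `reshaped`.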

#### 3.5 Geometric interpretation of tensor operations
---  

Because the contents of the tensors manipulated by tensor operations can be interpreted as coordinates of points in some geometric space, all tensor operations have a geometric interpretation.  

In general, elementary geometric operations such as affine transformations, rotations, scaling, and so on can be expressed as tensor operations.
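
As a sketch of this idea, the following applies a rotation, a scaling, and an affine transformation to a single 2D point with NumPy. The angle, the scale factors, and the translation vector are assumptions picked to keep the arithmetic easy to follow.

```python
import numpy as np

point = np.array([1.0, 0.0])

# Rotation by 90 degrees counterclockwise: a dot product with a 2 x 2 matrix.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = R @ point

# Scaling: a dot product with a diagonal matrix.
S = np.diag([2.0, 0.5])
scaled = S @ point

# Affine transformation: a linear transformation followed by a translation.
t = np.array([1.0, 1.0])
affine = R @ point + t

print(np.round(rotated, 6))  # [0. 1.]
print(scaled)                # [2. 0.]
print(np.round(affine, 6))   # [1. 2.]
```

The affine case has the same `dot(W, x) + b` form as a dense layer without its activation.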


#### 3.6 Geometric interpretation of deep learning
---

Neural networks consist entirely of chains of tensor operations, and all of these operations are just geometric transformations of the input data. It follows that you can interpret a neural network as a very complex geometric transformation in a high-dimensional space, implemented via a long series of simple steps.  

In 3D, imagine two sheets of colored paper: one red and one blue. Put one on top of the other. Now crumple them together into a small ball. That crumpled paper ball is your input data, and each sheet of paper is a class of data in a classification problem. What a neural network (or any other machine-learning model) is meant to do is figure out a transformation of the paper ball that would uncrumple it, so as to make the two classes cleanly separable again. With deep learning, this would be implemented as a series of simple transformations of the 3D space, such as those you could apply to the paper ball with your fingers, one movement at a time.  

Uncrumpling paper balls is what machine learning is about: finding neat representations for complex, highly folded data manifolds. Deep learning works so well because it takes the approach of incrementally decomposing a complicated geometric transformation into a long chain of elementary ones, which is pretty much the strategy a human would follow to uncrumple a paper ball. Each layer in a deep network applies a transformation that disentangles the data a little, and a deep stack of layers makes tractable an extremely complicated disentanglement process.