All transformations learned by deep neural networks can be reduced to a handful of 'tensor operations' applied to tensors of numeric data, much as any computer program can ultimately be reduced to a small set of binary operations on binary inputs (AND, OR, NOR, and so on).

In the initial example, the network was built by stacking 'Dense' layers on top of each other. A Keras layer instance looks like this:

In [1]:
import keras

keras.layers.Dense(512, activation='relu')

This layer can be interpreted as a function that takes a 2D tensor as input and returns another 2D tensor: a new representation of the input tensor. Specifically, the function is as follows (where 'W' is a 2D tensor and 'b' is a vector, both attributes of the layer):

In [None]:
output = relu(dot(W, input) + b)

Unpacking the three tensor operations here:

- a dot product (dot) between the input tensor and a tensor named 'W'

- an addition (+) between the resulting 2D tensor and a vector (b)

- a 'relu' operation, where relu(x) is max(x, 0)
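
As a minimal sketch, the three operations can be written out in NumPy. The helper name `dense_forward`, the shapes, and the random values are assumptions made for illustration, and this is not how Keras implements the layer internally. The weights are stored here with shape (input features, units), so the dot product is taken as `dot(inputs, W)`.

```python
import numpy as np

def dense_forward(inputs, W, b):
    """Hypothetical helper: relu(dot(inputs, W) + b) for a batch of inputs."""
    z = np.dot(inputs, W)      # dot product: (batch, in_dim) . (in_dim, units)
    z = z + b                  # addition: b has shape (units,) and is added to every row
    return np.maximum(z, 0.)   # relu: element-wise max(z, 0)

rng = np.random.default_rng(0)
inputs = rng.random((4, 784))          # 4 samples with 784 features (assumed sizes)
W = rng.standard_normal((784, 512))    # 2D weight tensor
b = np.zeros(512)                      # bias vector

output = dense_forward(inputs, W, b)
print(output.shape)   # (4, 512)
```

The result is again a 2D tensor, with one row per input sample and one column per unit.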

##### Element-wise operations

The 'relu' operation and addition are 'element-wise' operations: operations that are applied independently to each entry in the tensors being considered. This means these operations are highly amenable to massively parallel implementations (vectorised implementations, a term that comes from the vector processor supercomputer architecture of the 1970-1990 period). To write a naive Python implementation of an element-wise operation, we can use a 'for' loop, as in this naive implementation of an element-wise 'relu' operation:

In [None]:
def naive_relu(x):
    assert len(x.shape) == 2    # x is a 2D numpy tensor
    
    x = x.copy()     # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

The same goes for addition:

In [None]:
def naive_add(x, y):
    assert len(x.shape) == 2     # x and y are 2D Numpy tensors
    assert x.shape == y.shape
    
    x = x.copy()     # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i,j] += y[i,j]
    return x

On the same principle, you can do element-wise multiplication, subtraction, and so on.

In practice, when dealing with Numpy arrays, these operations are available as well-optimised built-in Numpy functions, which themselves delegate the heavy lifting to a Basic Linear Algebra Subprograms (BLAS) implementation if you have one installed (which you should). BLAS routines are low-level, highly parallel, efficient tensor manipulation routines that are typically implemented in Fortran or C.

So in Numpy, the following element-wise operation will be blazing fast:

In [None]:
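import numpy as np

x = np.random.random((20, 100))   # example inputs (shapes chosen for illustration)
y = np.random.random((20, 100))

z = x + y               # element-wise addition
z = np.maximum(z, 0.)   # element-wise relu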
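
To get a rough feel for the difference, the sketch below times the vectorised expression against an explicit Python loop on the same data. The helper `loop_add_relu` and the array sizes are assumptions for illustration, and the measured times will vary from machine to machine.

```python
import time
import numpy as np

x = np.random.random((200, 100))
y = np.random.random((200, 100))

def loop_add_relu(x, y):
    """Hypothetical helper: element-wise add followed by relu, one entry at a time."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = max(x[i, j] + y[i, j], 0.)
    return out

t0 = time.perf_counter()
z_fast = np.maximum(x + y, 0.)      # vectorised Numpy version
t_fast = time.perf_counter() - t0

t0 = time.perf_counter()
z_slow = loop_add_relu(x, y)        # naive Python loop
t_slow = time.perf_counter() - t0

print(f"vectorised: {t_fast:.6f} s, loop: {t_slow:.6f} s")
```

Both versions compute the same values; only the time taken differs.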

##### Broadcasting

'naive_add' only supports the addition of tensors with identical shapes.

Broadcasting handles addition when the shapes of the two tensors being added differ.

When possible, and if there is no ambiguity, the smaller tensor will be broadcast to match the shape of the larger tensor. Broadcasting consists of two steps:

- axes (called 'broadcast axes') are added to the smaller tensor to match the 'ndim' of the larger tensor

- the smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor
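
The two steps can be spelled out by hand, as in the following sketch. The shapes are arbitrary choices for illustration. Numpy does not actually materialise the repeated tensor when it broadcasts; the explicit copy here is only meant to show what the result is equivalent to.

```python
import numpy as np

x = np.random.random((32, 10))   # larger tensor
y = np.random.random((10,))      # smaller tensor

# Step 1: add a broadcast axis so that y has the same ndim as x.
y_expanded = np.expand_dims(y, axis=0)                     # shape (1, 10)

# Step 2: repeat y along the new axis to match the full shape of x.
y_repeated = np.repeat(y_expanded, x.shape[0], axis=0)     # shape (32, 10)

z_explicit = x + y_repeated   # addition of two tensors with identical shapes
z_broadcast = x + y           # Numpy broadcasts y automatically

print(np.array_equal(z_explicit, z_broadcast))   # True
```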

Example of a naive implementation:

In [None]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

Example of 'element-wise maximum operation' via broadcasting:

In [None]:
import numpy as np

x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))

z = np.maximum(x, y)

##### Tensor dot

- the 'dot' operation, also called a 'tensor product'

- contrary to element-wise operations, it combines entries in the input tensors

The 'dot' operator in Numpy and Keras:

In [None]:
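import numpy as np
x = np.random.random((32,))   # example vectors (sizes chosen for illustration)
y = np.random.random((32,))
z = np.dot(x, y)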

'dot' (.) in mathematical notation:

In [None]:
z = x . y

Example of 'dot' operation of two vectors:

In [4]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z     # return only after all products have been summed

Note that the 'dot' product between two vectors is a scalar, and that only vectors with the same number of elements are compatible for a 'dot' product.

The 'dot' product between a matrix [x] and a vector [y] returns a vector whose coefficients are the 'dot' products between [y] and the rows of [x]:

In [None]:
import numpy as np

def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

Alternatively, reusing 'naive_vector_dot' for each row:

In [None]:
def naive_matrix_vector_dot(x, y):
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z

As soon as one of the two tensors has an 'ndim' greater than 1, 'dot' is no longer symmetric, which is to say that 'dot(x, y)' isn't the same as 'dot(y, x)'.
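
A small check with two 2x2 matrices makes the asymmetry concrete. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])
y = np.array([[0., 1.],
              [1., 0.]])

xy = np.dot(x, y)   # [[2., 1.], [4., 3.]]: swaps the columns of x
yx = np.dot(y, x)   # [[3., 4.], [1., 2.]]: swaps the rows of x

print(np.array_equal(xy, yx))   # False
```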

A 'dot' product generalises to tensors with an arbitrary number of axes.

You can take the 'dot' product of two matrices [x] and [y] (dot(x, y)) if and only if 'x.shape[1] == y.shape[0]'. The result is a matrix with shape (x.shape[0], y.shape[1]), where the coefficients are the vector products between the rows of [x] and the columns of [y]:

In [None]:
def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]
    
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z

![Matrix dot-product box diagram](./matrix_dot-product_box_diagram.png)

More generally, you can take the dot product between higher-dimensional tensors, following the same rules for shape compatibility as outlined earlier for the 2D case:

In [None]:
(a, b, c, d) . (d,) -> (a, b, c)
(a, b, c, d) . (d, e) -> (a, b, c, e)
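
These shape rules can be checked directly with 'np.dot', which contracts the last axis of the first tensor with the matching axis of the second. The concrete sizes below are arbitrary values picked for illustration.

```python
import numpy as np

a, b, c, d, e = 2, 3, 4, 5, 6        # arbitrary sizes for illustration
x = np.ones((a, b, c, d))

z_vec = np.dot(x, np.ones((d,)))     # the shared axis of size d is summed away
z_mat = np.dot(x, np.ones((d, e)))   # d is summed away, e is kept

print(z_vec.shape)   # (2, 3, 4)
print(z_mat.shape)   # (2, 3, 4, 6)
```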

##### Tensor Reshaping

Tensor reshaping was used to preprocess the digits data before feeding it into the network:

In [None]:
train_images = train_images.reshape((60000, 28 * 28))

Reshaping a tensor means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor. For example:

In [2]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
print(x.shape)

(3, 2)


In [3]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [4]:
x = x.reshape((2, 3))
x

array([[0., 1., 2.],
       [3., 4., 5.]])
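
Because the total number of coefficients must stay the same, a target shape with a different number of entries is rejected. The sketch below illustrates this with a small array; the variable `reshape_failed` is a name introduced only for this example.

```python
import numpy as np

x = np.arange(6.).reshape((3, 2))    # 6 coefficients in total

flat = x.reshape((6, 1))             # fine: 6 * 1 == 6
same_size = flat.size == x.size      # True

try:
    x.reshape((4, 2))                # 4 * 2 == 8, which does not match 6
    reshape_failed = False
except ValueError:
    reshape_failed = True            # Numpy refuses the incompatible shape

print(same_size, reshape_failed)     # True True
```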

A special case of reshaping is 'transposition'. Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i]:

In [5]:
x = np.zeros((300, 20))
x = np.transpose(x)
print(x.shape)

(20, 300)
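
On a small matrix the row and column exchange is easy to see. The sketch below also compares the transpose with a plain 'reshape' to the same target shape: both have shape (2, 3), but the entries end up in a different order. The example values are chosen for illustration.

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
xt = np.transpose(x)                 # shape (2, 3)

# Row i of x is column i of the transposed matrix.
rows_match = all(np.array_equal(x[i, :], xt[:, i]) for i in range(x.shape[0]))

reshaped = x.reshape((2, 3))         # same shape as xt, different ordering

print(xt)         # [[0. 2. 4.] [1. 3. 5.]]
print(reshaped)   # [[0. 1. 2.] [3. 4. 5.]]
```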


##### Geometric Interpretation of Tensor Operations

The contents of the tensors manipulated by tensor operations can be interpreted as coordinates of points in some geometric space, so all tensor operations have a geometric interpretation.

In general, elementary geometric operations such as affine transformations, rotations, scaling, and so on can be expressed as tensor operations.

For instance, a rotation of a 2D vector by an angle theta can be achieved via a dot product with a 2x2 matrix R = [u, v], where u and v are both vectors of the plane: u = [cos(theta), sin(theta)] and v = [-sin(theta), cos(theta)].
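
A minimal sketch of this rotation in Numpy follows. It assumes that u and v are the columns of R and that the point is multiplied on the right, as in dot(R, point); the angle and the point are arbitrary choices for illustration.

```python
import numpy as np

theta = np.pi / 2                                  # rotate by 90 degrees
u = np.array([np.cos(theta), np.sin(theta)])
v = np.array([-np.sin(theta), np.cos(theta)])
R = np.stack([u, v], axis=1)                       # u and v as the columns of R

point = np.array([1., 0.])
rotated = np.dot(R, point)                         # approximately [0., 1.]

print(np.round(rotated, 6))
```

With this column convention a positive theta rotates counter-clockwise, and the length of the vector is unchanged.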

##### A Geometric Interpretation of Deep Learning

Neural networks consist entirely of chains of tensor operations, and all of these tensor operations are just geometric transformations of the input data.

It follows that you can interpret a neural network as a very complex geometric transformation in a high-dimensional space, implemented via a long series of simple steps.

In 3D, the following mental image may prove useful. Imagine two sheets of colored paper: one red and one blue. Put one on top of the other. Now crumple them together into a small ball. That crumpled paper ball is your input data, and each sheet of paper is a class of data in a classification problem. What a neural network (or any other machine-learning model) is meant to do is figure out a transformation of the paper ball that would uncrumple it, so as to make the two classes cleanly separable again. With deep learning, this would be implemented as a series of simple transformations of the 3D space, such as those you could apply on the paper ball with your fingers, one movement at a time.

![Uncrumpling A Complicated Manifold of Data](./uncrumpling_a_complicated_manifold_of_data.png)

Uncrumpling paper balls is what machine learning is about: finding neat representations for complex, highly folded data manifolds.

Deep learning takes the approach of incrementally decomposing a complicated geometric transformation into a long chain of elementary ones, which is pretty much the strategy a human would follow to uncrumple a paper ball. Each layer in a deep network applies a transformation that disentangles the data a little, and a deep stack of layers makes tractable an extremely complicated disentanglement process.