# The Tensor
#### Previously, we made a distinction between vectors (one-dimensional arrays) and matrices (two-dimensional arrays). When we start working with more complicated neural networks, we'll need to use higher-dimensional arrays as well.

#### In many neural network libraries, *n*-dimensional arrays are referred to as *tensors*, which is what we'll call them too.
#### If I were writing an entire book about deep learning, I'd implement a full-featured `Tensor` class that overloaded Python's arithmetic operators and could handle a variety of other operations. Such an implementation would take a notebook of its own. Here we'll cheat and say that a `Tensor` is just a `list`. This is true in one direction: all our vectors and matrices and higher-dimensional analogues *are* lists. It is certainly not true in the other direction: most Python *lists* are not *n*-dimensional arrays in our sense.

#### First, let's write a helper function to find a tensor's *shape*:

In [27]:
from typing import List

Tensor = list
def shape(tensor: Tensor) -> List[int]:
    sizes: List[int] = []
    while isinstance(tensor, list):
        sizes.append(len(tensor))
        tensor = tensor[0]
    return sizes

print(shape([1, 2, 3]))
print(shape([[1,2], [3,4], [5,6]]))

[3]
[3, 2]
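
#### Note that `shape` only follows the first element at each level, so it reports a shape even for a ragged list. As a minimal sketch (the `checked_shape` helper is hypothetical and not part of this notebook's code), one could return the shape only when every sub-list agrees:

```python
from typing import List, Optional

Tensor = list

def shape(tensor: Tensor) -> List[int]:
    """Same logic as above: follow the first element at each level."""
    sizes: List[int] = []
    while isinstance(tensor, list):
        sizes.append(len(tensor))
        tensor = tensor[0]
    return sizes

def checked_shape(tensor) -> Optional[List[int]]:
    """Hypothetical helper: the shape if tensor is rectangular, else None."""
    if not isinstance(tensor, list):
        return []                          # a scalar has an empty shape
    sub_shapes = [checked_shape(x) for x in tensor]
    if any(s is None or s != sub_shapes[0] for s in sub_shapes):
        return None                        # ragged somewhere below this level
    return [len(tensor)] + (sub_shapes[0] if sub_shapes else [])

ragged = [[1, 2], [3]]
print(shape(ragged))                              # [2, 2]
print(checked_shape(ragged))                      # None
print(checked_shape([[1, 2], [3, 4], [5, 6]]))    # [3, 2]
```

#### The rest of this notebook assumes rectangular tensors, so a check like this is optional.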


#### Because tensors can have any number of dimensions, we'll typically need to work with them recursively. We'll do one thing in the one-dimensional case and recurse in the higher-dimensional case:

In [28]:
def is_1d(tensor: Tensor) -> bool:
    """
    If tensor[0] is a list, it's a higher-order tensor. Otherwise, tensor is 1-dimensional (that is, a vector)
    """
    return not isinstance(tensor[0], list)

print(is_1d([1, 2, 3]))

True


#### Which we can use to write a recursive `tensor_sum` function:

In [29]:
def tensor_sum(tensor: Tensor) -> float:
    """ Sums up all the values in the tensor"""
    if is_1d(tensor):
        return sum(tensor)
    else:
        return sum(tensor_sum(tensor_i) for tensor_i in tensor)

print(tensor_sum([1, 2, 3]))

6
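
#### The recursion is easier to see on tensors with more than one dimension. The sketch below repeats the two definitions above so that it runs on its own; the example tensors are made up for illustration:

```python
Tensor = list

def is_1d(tensor: Tensor) -> bool:
    return not isinstance(tensor[0], list)

def tensor_sum(tensor: Tensor) -> float:
    """Sums up all the values in the tensor"""
    if is_1d(tensor):
        return sum(tensor)                 # base case: a plain vector
    else:
        # recursive case: sum each sub-tensor, then sum those results
        return sum(tensor_sum(tensor_i) for tensor_i in tensor)

matrix = [[1, 2], [3, 4]]                       # shape [2, 2]
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]     # shape [2, 2, 2]

print(tensor_sum(matrix))   # 10
print(tensor_sum(cube))     # 36
```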


#### We'll create a couple of helper functions so that we don't have to rewrite this logic everywhere. The first applies a function elementwise to a single tensor:

In [30]:
from typing import Callable

def tensor_apply(f: Callable[[float], float], tensor: Tensor) -> Tensor:
    """Applies f elementwise"""
    if is_1d(tensor):
        return [f(x) for x in tensor]
    else:
        return [tensor_apply(f, tensor_i) for tensor_i in tensor]

print(tensor_apply(lambda x: x + 1, [1, 2, 3]))  # adds 1 to every element of the tensor

[2, 3, 4]
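
#### The same call works unchanged on a matrix, and the result has the same shape as the input. A small self-contained sketch (the `matrix` value is only an illustration):

```python
from typing import Callable

Tensor = list

def is_1d(tensor: Tensor) -> bool:
    return not isinstance(tensor[0], list)

def tensor_apply(f: Callable[[float], float], tensor: Tensor) -> Tensor:
    """Applies f elementwise"""
    if is_1d(tensor):
        return [f(x) for x in tensor]
    else:
        return [tensor_apply(f, tensor_i) for tensor_i in tensor]

matrix = [[1, 2], [3, 4], [5, 6]]
doubled = tensor_apply(lambda x: 2 * x, matrix)

print(doubled)   # [[2, 4], [6, 8], [10, 12]]
print(matrix)    # [[1, 2], [3, 4], [5, 6]]  (a new tensor is built; the input is untouched)
```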


#### We can use this to write a function that creates a zero tensor with the same shape as a given tensor:

In [31]:
def zeros_like(tensor: Tensor) -> Tensor:
    return tensor_apply(lambda _: 0.0, tensor)

print(zeros_like([1, 2, 3]))

[0.0, 0.0, 0.0]
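
#### One reason to build the zero tensor this way is that every row is a fresh list. The sketch below contrasts it with the `[[0.0] * 2] * 3` shortcut, which repeats a reference to one row (the example values are made up):

```python
Tensor = list

def is_1d(tensor: Tensor) -> bool:
    return not isinstance(tensor[0], list)

def tensor_apply(f, tensor: Tensor) -> Tensor:
    if is_1d(tensor):
        return [f(x) for x in tensor]
    else:
        return [tensor_apply(f, tensor_i) for tensor_i in tensor]

def zeros_like(tensor: Tensor) -> Tensor:
    return tensor_apply(lambda _: 0.0, tensor)

matrix = [[1, 2], [3, 4], [5, 6]]

zeros = zeros_like(matrix)
zeros[0][0] = 9.0            # only the first row changes
print(zeros)                 # [[9.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

aliased = [[0.0] * 2] * 3    # three references to the same row
aliased[0][0] = 9.0          # so every "row" appears to change
print(aliased)               # [[9.0, 0.0], [9.0, 0.0], [9.0, 0.0]]
```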


#### We'll also need to apply a function to corresponding elements from two tensors (which had better be the exact same shape, although we won't check that):

In [32]:
def tensor_combine(f: Callable[[float, float], float],
                   t1: Tensor,
                   t2: Tensor) -> Tensor:
    """Applies f to corresponding elements of t1 and t2"""
    if is_1d(t1):
        return [f(x, y) for x, y in zip(t1, t2)]
    else:
        return [tensor_combine(f, t1_i, t2_i) for t1_i, t2_i in zip(t1, t2)]

import operator
print(tensor_combine(operator.add, [1, 2, 3], [4, 5, 6]))
print(tensor_combine(operator.mul, [1, 2, 3], [4, 5, 6]))

[5, 7, 9]
[4, 10, 18]
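
#### Because `zip` stops at the shorter of its arguments, mismatched shapes do not raise an error; the extra elements are silently dropped. As a sketch of what a check could look like (`checked_combine` is a hypothetical wrapper, and since `shape` only follows first elements it is not a complete validation):

```python
import operator
from typing import Callable, List

Tensor = list

def shape(tensor: Tensor) -> List[int]:
    sizes: List[int] = []
    while isinstance(tensor, list):
        sizes.append(len(tensor))
        tensor = tensor[0]
    return sizes

def is_1d(tensor: Tensor) -> bool:
    return not isinstance(tensor[0], list)

def tensor_combine(f: Callable[[float, float], float],
                   t1: Tensor,
                   t2: Tensor) -> Tensor:
    if is_1d(t1):
        return [f(x, y) for x, y in zip(t1, t2)]
    else:
        return [tensor_combine(f, t1_i, t2_i) for t1_i, t2_i in zip(t1, t2)]

# The unchecked version quietly ignores the unmatched 3
print(tensor_combine(operator.add, [1, 2, 3], [4, 5]))   # [5, 7]

def checked_combine(f: Callable[[float, float], float],
                    t1: Tensor,
                    t2: Tensor) -> Tensor:
    """Hypothetical wrapper: refuse tensors whose shapes differ."""
    if shape(t1) != shape(t2):
        raise ValueError(f"shape mismatch: {shape(t1)} vs {shape(t2)}")
    return tensor_combine(f, t1, t2)

try:
    checked_combine(operator.add, [1, 2, 3], [4, 5])
except ValueError as error:
    print(error)   # shape mismatch: [3] vs [2]
```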


# The Layer Abstraction 
#### In our previous notebook we built a simple neural net that allowed us to stack two layers of neurons, each of which computed `sigmoid(dot(weights, inputs))`.

#### Although that's perhaps an idealized representation of what an actual neuron does, in practice we'd like to allow a wider variety of things. Perhaps we'd like the neurons to remember something about their previous inputs. Perhaps we'd like to use a different activation function than `sigmoid`. And frequently we'd like to use more than two layers. (Our `feed_forward` function actually handled any number of layers, but our gradient computations did not.)

#### In this notebook we'll build machinery for implementing such a variety of neural networks. Our fundamental abstraction will be the `Layer`, something that knows how to apply some function to its inputs and how to backpropagate gradients.

#### One way of thinking about the neural networks we built in `fizzbuzz.ipynb` is as a "linear" layer, followed by a "sigmoid" layer, then another linear layer and another sigmoid layer. We didn't distinguish them in these terms, but doing so will allow us to experiment with much more general structures:

In [33]:
from typing import Iterable, Tuple

class Layer:
    """
    Our neural networks will be composed of Layers, each of which 
    knows how to do some computation on its inputs in the "forward" 
    direction and propagate gradients in the "backward" direction
    """
    def forward(self, input):
        """
        Note the lack of types. We're not going to be prescriptive
        about what kinds of inputs layers can take and what kinds of
        outputs they can return.
        """
        raise NotImplementedError

    def backward(self, gradient):
        """
        Similarly, we're not going to be prescriptive about what the
        gradient looks like. It's up to you the user to make sure 
        that you're doing things sensibly.
        """
        raise NotImplementedError
    
    def params(self) -> Iterable[Tensor]:
        """ 
        Returns the parameters of this layer. The default implementation
        returns nothing, so that if you have a layer with no parameters
        you don't have to implement this.
        """
        return ()
    
    def grads(self) -> Iterable[Tensor]:
        """
        Returns the gradients, in the same order as params()
        """
        return ()

#### The `forward` and `backward` methods will have to be implemented in our concrete subclasses. Once we build a neural net, we'll want to train it using gradient descent, which means we'll want to update each parameter in the network using its gradient. Accordingly, we'll insist that each layer be able to tell us its parameters and gradients.

#### Some layers (for example, a layer that applies `sigmoid` to each of its inputs) have no parameters to update, so we provide a default implementation that handles that case.
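
#### By contrast, a layer that does have parameters overrides `params` and `grads` as well. The sketch below is an illustration only: `Scale` is a hypothetical layer (not one this notebook defines) that handles 1-dimensional inputs, and the update loop is one simple way a gradient descent step could use the two methods:

```python
from typing import Iterable

Tensor = list

class Layer:
    """Trimmed copy of the base class above, so this sketch runs on its own."""
    def forward(self, input):
        raise NotImplementedError

    def backward(self, gradient):
        raise NotImplementedError

    def params(self) -> Iterable[Tensor]:
        return ()

    def grads(self) -> Iterable[Tensor]:
        return ()

class Scale(Layer):
    """Hypothetical layer: multiplies each input by one learnable weight."""
    def __init__(self, w: float) -> None:
        self.w = [w]            # parameters are tensors, so wrap the weight in a list
        self.w_grad = [0.0]

    def forward(self, input: Tensor) -> Tensor:
        self.input = input      # saved for use in backward
        return [self.w[0] * x for x in input]

    def backward(self, gradient: Tensor) -> Tensor:
        # output_i = w * input_i, so d(output_i)/dw = input_i
        self.w_grad = [sum(x * g for x, g in zip(self.input, gradient))]
        # and d(output_i)/d(input_i) = w
        return [self.w[0] * g for g in gradient]

    def params(self) -> Iterable[Tensor]:
        return [self.w]

    def grads(self) -> Iterable[Tensor]:
        return [self.w_grad]

layer = Scale(2.0)
output = layer.forward([1.0, 2.0, 3.0])              # [2.0, 4.0, 6.0]
input_grad = layer.backward([1.0, 1.0, 1.0])         # pretend the upstream gradient is all ones

# One gradient descent step: move each parameter against its gradient, in place
learning_rate = 0.5
for param, grad in zip(layer.params(), layer.grads()):
    param[:] = [p - learning_rate * g for p, g in zip(param, grad)]

print(layer.w)   # [-1.0]
```

#### Updating `param[:]` in place matters here: the layer and the training loop share the same list, so the layer sees the new weight on its next `forward` call.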

In [34]:
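import math

try:
    from ml.neural_networks import sigmoid
except ImportError:
    # Fallback (an assumption) for when the earlier notebook's module,
    # or one of its dependencies, is not importable
    def sigmoid(t: float) -> float:
        return 1 / (1 + math.exp(-t))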

class Sigmoid(Layer):
    def forward(self, input: Tensor) -> Tensor:
        """
        Apply sigmoid to each element of the input tensor,
        and save the results to use in backpropagation
        """
        self.sigmoids = tensor_apply(sigmoid, input)
        return self.sigmoids
        
    def backward(self, gradient: Tensor) -> Tensor:
        return tensor_combine(lambda sig, grad: sig * (1 - sig) * grad,
                              self.sigmoids,
                              gradient)

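
#### The `backward` method relies on the fact that the derivative of `sigmoid` at a point equals `sig * (1 - sig)`, which the chain rule then multiplies by the incoming gradient. A quick numerical sanity check of that formula, assuming the usual logistic definition of `sigmoid` (defined locally here; `numeric_derivative` is a hypothetical helper):

```python
import math
from typing import Callable

def sigmoid(t: float) -> float:
    """Assumed definition: the logistic function."""
    return 1 / (1 + math.exp(-t))

def numeric_derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-2.0, 0.0, 0.5, 3.0]:
    sig = sigmoid(x)
    analytic = sig * (1 - sig)                   # the factor used in Sigmoid.backward
    numeric = numeric_derivative(sigmoid, x)
    assert abs(analytic - numeric) < 1e-6
    print(x, analytic, numeric)
```

#### If the two columns agree to several decimal places, the local derivative used in `backward` is consistent with the forward computation.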