In [1]:
# Add Lincoln to system path
import sys
sys.path.append("/Users/seth/development/lincoln/")

In [15]:
from torch import Tensor
import torch

from typing import Tuple

In [3]:
from lincoln.operations import Operation, ParamOperation

In [11]:
a = torch.Size((28, 28, 3))

In [12]:
fil1 = Tensor([[1,1,1], [2,2,2]])
fil1.shape

torch.Size([2, 3])

In [16]:
class Reshape(Operation):
    def __init__(self, shape: Tuple):
        super().__init__()
        self.shape = shape


    def _output(self) -> Tensor:
        return self.input.view(self.shape)


    def _input_grad(self, output_grad: Tensor) -> Tensor:
        return output_grad.view(self.input.shape)

In [34]:
torch.manual_seed(10218)
inp = Tensor(5, 1, 28, 28).uniform_(-1, 1)

In [37]:
r = Reshape((5*1*28*28,))
out = r.forward(inp)

In [38]:
torch.sum(out)

tensor(2.1648)

In [39]:
out_grad = torch.ones_like(out)

In [41]:
r.backward(out_grad).shape

torch.Size([5, 1, 28, 28])

### Concat

Most common instance of concat: 

Concat will be used within a layer. A layer takes in one input and produces one output.
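A minimal sketch of what such a `Concat` could look like, assuming it joins two 2D tensors along the feature axis. The class below is hypothetical: it does not subclass lincoln's `Operation`, because that base class takes a single input, and the attribute names `x_dim` and `h_dim` are made up for illustration.

```python
import torch
from torch import Tensor
from typing import Tuple


class Concat:
    """Hypothetical two-input operation: joins x and h along the feature axis."""

    def forward(self, x: Tensor, h: Tensor) -> Tensor:
        # Remember each input's width so the gradient can be split back later.
        self.x_dim = x.shape[1]
        self.h_dim = h.shape[1]
        return torch.cat([x, h], dim=1)

    def backward(self, output_grad: Tensor) -> Tuple[Tensor, Tensor]:
        # Concatenation only moves values around, so each input's gradient
        # is the matching slice of the output gradient.
        x_grad, h_grad = torch.split(output_grad, [self.x_dim, self.h_dim], dim=1)
        return x_grad, h_grad


x = torch.ones(5, 4)   # batch size x embedding dim
h = torch.zeros(5, 3)  # batch size x hidden dim
concat = Concat()
z = concat.forward(x, h)
x_grad, h_grad = concat.backward(torch.ones_like(z))
```

The backward pass returns two gradients for one output gradient, which is the part that does not fit the one-input, one-output `Operation` interface.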

# LSTM Operations

## `LSTM Layer` 

Input: series of word embeddings, cell state.

Output: series of embeddings of the same size as the word embeddings, hidden state.

Cell state will be an attribute of the layer.

LSTM layer will have an additional function "reset state" that resets the hidden state to zero.
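One way the state handling might look, as a sketch only. The class name `LSTMLayerState` and the method name `reset_state` are hypothetical, and zeroing the cell state together with the hidden state is an assumption; the notes above only mention the hidden state.

```python
import torch
from torch import Tensor
from typing import Optional


class LSTMLayerState:
    """Hypothetical holder for the state an LSTM layer keeps between calls."""

    def __init__(self, hidden_dim: int):
        self.hidden_dim = hidden_dim
        self.H: Optional[Tensor] = None  # hidden state
        self.C: Optional[Tensor] = None  # cell state

    def reset_state(self, batch_size: int) -> None:
        # Assumption: both states are zeroed at the start of a new sequence.
        self.H = torch.zeros(batch_size, self.hidden_dim)
        self.C = torch.zeros(batch_size, self.hidden_dim)


state = LSTMLayerState(hidden_dim=3)
state.reset_state(batch_size=5)
state.H = state.H + 1.0          # pretend a forward pass updated the state
state.reset_state(batch_size=5)  # back to zeros before the next sequence
```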

## `LSTM Node` 

Each LSTM layer will contain a series of a special kind of operation, `LSTM Node`. Each node is itself built from a sequence of operations: it takes in an embedding and a hidden state (plus the previous cell state, as listed below) and passes out an embedding and an updated hidden state.

Operations of LSTM Node:

Receives as input:

* X (batch size x embedding dim)
* H_prev (batch size x hidden dim)
* C_prev (batch size x hidden dim)

1. Z = Concat(X, H_prev)
1. Z1, Z2, Z3, Z4 = Copy(Z)
1. F = WeightMultiply(Z1, W_f)
1. F = BiasAdd(F, B_f)
1. F_out = Sigmoid(F)
1. I = WeightMultiply(Z2, W_i)
1. I = BiasAdd(I, B_i)
1. I_out = Sigmoid(I)
1. C = WeightMultiply(Z3, W_c)
1. C = BiasAdd(C, B_c)
1. C_bar = Tanh(C)
1. C1 = Multiply(F_out, C_prev)
1. C2 = Multiply(I_out, C_bar)
1. C_new = Add(C1, C2)
1. O = WeightMultiply(Z4, W_o)
1. O = BiasAdd(O, B_o)
1. O_out = Sigmoid(O)
1. C_tan = Tanh(C_new)
1. H_new = Multiply(O_out, C_tan)
1. H_out = WeightMultiply(H_new, W_v)
1. X_out = BiasAdd(H_out, B_v)
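The steps above can be sketched with plain tensor arithmetic to check that the shapes line up. This is not the `Operation`-based implementation the notes are working toward: `lstm_node_forward` and the `params` dictionary are hypothetical, `Z` is simply reused in place of an explicit `Copy`, and the weight shapes (`embedding dim + hidden dim` by `hidden dim` for the gates, `hidden dim` by `embedding dim` for `W_v`) are assumptions.

```python
import torch
from torch import Tensor
from typing import Dict, Tuple


def lstm_node_forward(
    X: Tensor, H_prev: Tensor, C_prev: Tensor, params: Dict[str, Tensor]
) -> Tuple[Tensor, Tensor, Tensor]:
    # Z = Concat(X, H_prev); each gate below reads the same Z.
    Z = torch.cat([X, H_prev], dim=1)

    # Forget gate, input gate, and candidate cell values.
    F_out = torch.sigmoid(Z @ params["W_f"] + params["B_f"])
    I_out = torch.sigmoid(Z @ params["W_i"] + params["B_i"])
    C_bar = torch.tanh(Z @ params["W_c"] + params["B_c"])

    # New cell state: keep part of the old state, add part of the candidate.
    C_new = F_out * C_prev + I_out * C_bar

    # Output gate and new hidden state.
    O_out = torch.sigmoid(Z @ params["W_o"] + params["B_o"])
    H_new = O_out * torch.tanh(C_new)

    # Project the hidden state back to the embedding size.
    X_out = H_new @ params["W_v"] + params["B_v"]
    return X_out, H_new, C_new


torch.manual_seed(0)
batch_size, embed_dim, hidden_dim = 5, 4, 3

params = {}
for gate in "fico":
    params[f"W_{gate}"] = torch.randn(embed_dim + hidden_dim, hidden_dim)
    params[f"B_{gate}"] = torch.zeros(1, hidden_dim)
params["W_v"] = torch.randn(hidden_dim, embed_dim)
params["B_v"] = torch.zeros(1, embed_dim)

X = torch.randn(batch_size, embed_dim)
H_prev = torch.zeros(batch_size, hidden_dim)
C_prev = torch.zeros(batch_size, hidden_dim)

X_out, H_new, C_new = lstm_node_forward(X, H_prev, C_prev, params)
```

`X_out` comes back with the same shape as `X`, and `H_new` and `C_new` with the same shape as `H_prev` and `C_prev`, so the outputs of one node can be fed to the next.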

Issues:

* The `Concat` operation needs to produce four distinct outputs.
    * Potential solution: `Copy` operation that sums gradients.

* Need a way to handle branching:
    * After `Copy`, there will be four `forward` operations happening.
    * Can't write `for operation in self.operations: operation.forward(X)`
        * Maybe I can
    
* What about weights?
    * Previously: initialize weights for each operation via `self.param`.
    * Now: if the Operation is `LSTMNode`, initialize weights via:
        * `LSTMNode.params = []` and append appropriate weights.
* What about the 

In [42]:
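def assert_same_shape(output: Tensor, output_grad: Tensor) -> None:
    # Local stand-in so this cell runs on its own; the lincoln package
    # may provide its own version of this helper.
    assert output.shape == output_grad.shape, \
        f"Shape mismatch: {tuple(output.shape)} vs {tuple(output_grad.shape)}"


class Operation(object):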

    def __init__(self):
        pass


    def forward(self, input: Tensor):

        self.input = input

        self.output = self._output()

        return self.output


    def backward(self, output_grad: Tensor) -> Tensor:

        assert_same_shape(self.output, output_grad)

        self._compute_grads(output_grad)

        assert_same_shape(self.input, self.input_grad)
        return self.input_grad


    def _compute_grads(self, output_grad: Tensor) -> Tensor:

        self.input_grad = self._input_grad(output_grad)

        assert_same_shape(self.input, self.input_grad)
        return self.input_grad

    def _output(self) -> Tensor:
        raise NotImplementedError()

    def _input_grad(self, output_grad: Tensor) -> Tensor:
        raise NotImplementedError()

```python
Z = Concat(X, H_in)  # Concat takes two inputs and produces one output; its backward pass will yield X_grad and H_in_grad

Z_1, Z_2, Z_3 = Copy(Z, 3)

F = WeightMultiply(Z_1, W_f) 
F = BiasAdd(F, B_f)
F_out = Sigmoid(F)

I = WeightMultiply(Z_2, W_i) 
I = BiasAdd(I, B_i)
I_out = Sigmoid(I)

C = WeightMultiply(Z_3, W_c)
C = BiasAdd(C, B_c)
C_bar = Tanh(C)
```

Operations are the smallest units of computation: each one is defined by having an input and an output.

Going to need to build operations that:

* Can take in one input and produce multiple outputs
* If an operation produces multiple outputs, its _gradients will need to accumulate_.
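A possible shape for such an operation, sketched as a standalone `Copy` class. It is hypothetical and does not subclass `Operation`, since `forward` returns a list of tensors and `backward` takes a list of gradients, neither of which the current single-tensor interface allows.

```python
import torch
from torch import Tensor
from typing import List


class Copy:
    """Hypothetical one-to-many operation: n copies forward, summed gradients backward."""

    def __init__(self, n: int):
        self.n = n

    def forward(self, input: Tensor) -> List[Tensor]:
        self.input = input
        # Each branch gets its own tensor holding the same values.
        return [input.clone() for _ in range(self.n)]

    def backward(self, output_grads: List[Tensor]) -> Tensor:
        assert len(output_grads) == self.n
        # The input influenced every branch, so its gradient is the sum
        # of the gradients flowing back from all of them.
        input_grad = torch.zeros_like(self.input)
        for grad in output_grads:
            assert grad.shape == self.input.shape
            input_grad = input_grad + grad
        return input_grad


z = torch.ones(5, 7)
copy = Copy(3)
z_1, z_2, z_3 = copy.forward(z)

# Pretend the three branches sent back gradients of 1, 2, and 3.
branch_grads = [torch.ones_like(z), 2 * torch.ones_like(z), 3 * torch.ones_like(z)]
z_grad = copy.backward(branch_grads)
```

Here every entry of `z_grad` is the sum of the three branch gradients at that position.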