# TensorFlow Implementation Details<hr>

## Batch Normalization
- TensorFlow has 2 functions:<br>
tf.nn.batch_normalization (low level, built-in)<br>
tf.contrib.layers.batch_norm
- Generally, contrib is higher level: less code, but also less flexible
- Built-in: we need to create the running mean/variance ourselves and update them manually
- The contrib version creates all its variables internally, but that also makes them harder to access
- We'll use contrib, but this will make things complicated for us in another way
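To see what "manually" means for the built-in version, here is a minimal pure-Python sketch for a single feature. The `normalize` step follows the formula that `tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon)` applies; the helper names, the `decay` value and the exponential-moving-average update are illustrative assumptions, not the course's code.

```python
import math


def batch_stats(xs):
    """Mean and (biased) variance of one feature over a mini-batch."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def normalize(x, mean, var, offset, scale, eps=1e-5):
    """scale * (x - mean) / sqrt(var + eps) + offset."""
    return scale * (x - mean) / math.sqrt(var + eps) + offset


class RunningStats:
    """Hypothetical holder for the running mean/variance we must create ourselves."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.mean = 0.0
        self.var = 1.0

    def update(self, batch_mean, batch_var):
        # Exponential moving average, updated by hand after every training batch
        self.mean = self.decay * self.mean + (1 - self.decay) * batch_mean
        self.var = self.decay * self.var + (1 - self.decay) * batch_var


# One training step for a single feature
running = RunningStats(decay=0.9)
batch = [1.0, 2.0, 3.0, 4.0]
mean, var = batch_stats(batch)
normalized = [normalize(x, mean, var, offset=0.0, scale=1.0) for x in batch]
running.update(mean, var)  # skip this and the test-time statistics never change
```

The contrib layer hides exactly this bookkeeping (the running variables and their update) inside the layer.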

## Fractionally-Strided Convolution
- It's strange, but the strangeness is logically consistent
- In TF, convolution filter is specified like:<br>
(filter height, filter width, # feature maps in, # feature maps out)
- Strides are specified like:<br>
(1, vertical stride, horizontal stride, 1)


- We specify the shapes and strides as if the "input to the function" were actually the "output of a regular convolution"
- If you think of fractionally-strided convolution as a "forward" operation, then you might use:<br>
(filter height, filter width, # feature maps in, # feature maps out)
- But a forward convolution that would have generated the function input as its output would actually have had the filter shape:<br>
(filter height, filter width, # feature maps out, # feature maps in)
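The filter-shape rule can be written down as two tiny helper functions (the names are hypothetical). The ordering for `conv2d_transpose` follows TensorFlow's documented filter layout `[height, width, output_channels, in_channels]`, i.e. the shape of the forward conv that would map our output back to our input.

```python
def forward_conv_filter_shape(fh, fw, maps_in, maps_out):
    """Filter shape for a regular conv (tf.nn.conv2d): (h, w, maps in, maps out)."""
    return (fh, fw, maps_in, maps_out)


def transpose_conv_filter_shape(fh, fw, maps_in, maps_out):
    """Filter shape for a fractionally-strided conv (conv2d_transpose).

    maps_in / maps_out describe the transpose conv itself. We return the shape of
    the forward conv that takes our OUTPUT (maps_out maps) to our INPUT (maps_in maps).
    """
    return forward_conv_filter_shape(fh, fw, maps_out, maps_in)


# From 64 input feature maps to 32 output feature maps with a 5x5 filter
print(transpose_conv_filter_shape(5, 5, 64, 32))  # (5, 5, 32, 64)
```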


- The same theme applies to strides
- You'd think that because it's "fractionally strided", we could use stride = 0.5 -> Error!
- If we "pretend" our output generates the input via a convolution, then THAT stride would be 2
- For a regular forward convolution, the output would be 1/2 the size of the input
- Since this is a "backward" convolution, the output is twice the size of the input
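The stride bookkeeping can be checked with a little arithmetic. This is a sketch assuming padding="SAME", under which a forward conv with stride s produces ceil(n / s) rows (and columns); the helper names are hypothetical.

```python
import math


def strides(vertical, horizontal):
    """TF-style strides tuple: (1, vertical stride, horizontal stride, 1)."""
    return (1, vertical, horizontal, 1)


def forward_conv_size(n_in, stride):
    """Spatial size after a regular conv with padding="SAME"."""
    return math.ceil(n_in / stride)


def transpose_conv_size(n_in, stride):
    """Usual spatial size after a fractionally-strided conv with padding="SAME".

    The stride is the integer stride of the pretend forward conv
    (2, not 0.5), so the output grows instead of shrinking.
    """
    return n_in * stride


s = strides(2, 2)                 # (1, 2, 2, 1), never (1, 0.5, 0.5, 1)
print(forward_conv_size(16, 2))   # 8: a forward conv halves the size
print(transpose_conv_size(8, 2))  # 16: the "backward" conv doubles it
```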

## conv2d_transpose arguments
- For a regular conv, we use padding = "SAME" -> yields convenient shape calculations
- Conv transpose still takes a padding argument, but it doesn't give us the output shape for free; instead we must pass an explicit output_shape
- "output_shape" refers to the actual output of the conv transpose function, not the "output as if we were doing a forward conv"
- If you specify output_shape incorrectly -> error!
- Makes you wonder why it is needed at all
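One plausible answer: with a stride greater than 1, several output sizes shrink to the same input size under a forward conv, so the output size can't be inferred from the input alone. A quick check, again assuming padding="SAME" (ceil(n / s)) and a hypothetical helper name:

```python
import math


def candidate_output_sizes(n_in, stride, max_size=100):
    """All sizes n that a SAME-padded forward conv with this stride maps back to n_in."""
    return [n for n in range(1, max_size + 1) if math.ceil(n / stride) == n_in]


print(candidate_output_sizes(8, 2))  # [15, 16]: both fit, so we have to pick one
```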

## output_shape example
- Input to function: (N, 8, 8, 64)
- Stride = 2
- Filter shape: (5, 5, 32, 64)
- Vocabulary:<br>
input = input to conv2d_transpose<br>
output = output from conv2d_transpose
- Is the filter the right shape? Its last dim (64) should match the # feature maps of the "input"<br>
since "our input" == "output of a forward conv"


![output_shape_ex](../images/output_shape_ex.PNG)
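Putting the example together: a hypothetical helper that checks the filter against the input and returns the `output_shape` to pass, assuming padding="SAME" and the usual "input size × stride" choice. For input (N, 8, 8, 64), filter (5, 5, 32, 64) and stride 2, it gives (N, 16, 16, 32).

```python
def conv2d_transpose_output_shape(input_shape, filter_shape, stride):
    """output_shape for a fractionally-strided conv (hypothetical helper).

    input_shape:  (N, height, width, maps in)
    filter_shape: (filter h, filter w, maps out, maps in), the forward-conv layout
    """
    n, h, w, maps_in = input_shape
    _, _, maps_out, filter_maps_in = filter_shape
    # The filter's LAST dim must match the input's maps, because our input
    # plays the role of the output of a forward conv
    if filter_maps_in != maps_in:
        raise ValueError(f"filter expects {filter_maps_in} input maps, got {maps_in}")
    return (n, h * stride, w * stride, maps_out)


N = 32  # example batch size (an assumption)
print(conv2d_transpose_output_shape((N, 8, 8, 64), (5, 5, 32, 64), 2))  # (32, 16, 16, 32)
```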

## Layer Design
- I like to use classes for each layer
- One function to create params, one function to use them


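```python
import numpy as np


class MyLayer:
    def __init__(self, m_in, m_out, f=np.tanh):
        # Create the params once, when the layer is constructed
        self.W = np.random.randn(m_in, m_out) * 0.01
        self.b = np.zeros(m_out)
        self.f = f  # activation function (tanh is just an example)

    def forward(self, X):
        # Use the same params every time the layer is applied
        return self.f(X.dot(self.W) + self.b)
```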

- Layers in TensorFlow's contrib module do both of these at the same time
- On one hand, it's easier because you can do both in one line:

```python
# e.g. TF 1.x: creates W and b internally AND applies them in a single call
output = tf.contrib.layers.fully_connected(inputs, num_outputs)
```

- It's OK for params to be internal, because in TF we don't need direct access: the optimizer will update them automatically


Actually, that's not true in our case! We need 2 optimizers, 1 for G and 1 for D, and we update only one set of params at a time<br>
So we need another special function to collect the params, to tell TF what each optimizer should update
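A minimal sketch of that "collect the params" step, assuming every variable's name starts with its network's scope (e.g. `discriminator/...`, `generator/...`; the names below are made up). In TF 1.x the names would typically come from `tf.trainable_variables()` (each variable has a `.name`), and each filtered list would be handed to its own optimizer via `minimize(loss, var_list=...)`.

```python
def collect_params(var_names, scope):
    """Keep only the variables that live under the given top-level scope."""
    return [name for name in var_names if name.startswith(scope + "/")]


# Hypothetical variable names, in the "scope/layer/param:0" style TF reports
all_vars = [
    "generator/fc1/W:0",
    "generator/bn1/beta:0",
    "discriminator/conv1/W:0",
    "discriminator/bn1/beta:0",
]
d_params = collect_params(all_vars, "discriminator")  # only D's optimizer updates these
g_params = collect_params(all_vars, "generator")      # only G's optimizer updates these
```

Matching on `scope + "/"` rather than the bare prefix keeps a scope like `generator_extra` from leaking into the generator's list.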

## What's the problem?
- Remember how a GAN works
- We have 2 batches of data (real and fake) to pass to the discriminator (hence, we need to call forward() twice)
- But if we call our batch_norm(input) twice, it's going to create its internal params all over again!
![Discriminator.PNG](../images/Discriminator.PNG)
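The problem can be simulated without TF. In this toy (the `ToyGraph` class and `naive_batch_norm` are hypothetical stand-ins, not TF APIs), a layer that creates its params on every call and gets an auto-generated unique name ends up with two unrelated parameter sets when the discriminator is applied to the real and the fake batch, so the two batches would not go through the same discriminator.

```python
class ToyGraph:
    """Hypothetical stand-in for a TF graph: remembers every parameter set created."""

    def __init__(self):
        self.params = {}

    def unique_name(self, base):
        # Append _1, _2, ... until the name is unused
        name, i = base, 0
        while name in self.params:
            i += 1
            name = f"{base}_{i}"
        return name


def naive_batch_norm(graph, xs):
    # Creates fresh scale/offset params on EVERY call
    name = graph.unique_name("BatchNorm")
    graph.params[name] = {"scale": 1.0, "offset": 0.0}
    p = graph.params[name]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [p["scale"] * (x - mean) / (var + 1e-5) ** 0.5 + p["offset"] for x in xs]


graph = ToyGraph()
real_out = naive_batch_norm(graph, [0.9, 1.1, 1.0])   # discriminator on real data
fake_out = naive_batch_norm(graph, [0.1, -0.2, 0.3])  # discriminator on fake data
print(sorted(graph.params))  # ['BatchNorm', 'BatchNorm_1']: two unrelated param sets
```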

## What's the solution?
The solution: the batch norm layer accepts an argument called **reuse**; you pass in True if you want to reuse the weights this layer already created

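```python
tf.contrib.layers.batch_norm(
    inputs,
    reuse=None,  # True: reuse the variables this layer created earlier
    scope=None,  # the layer's scope name; must be given for reuse to work
)
```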

- Unfortunately, this introduces other complexities too
- It's just a function call, so how does TF find the previously created weights?
<br>(Remember, I've called this same function multiple times already)
- We need scopes (reminds me of namespaces in C++) and a name for each layer
- Makes the code messier, but the benefits outweigh the costs
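A toy version of the fix, loosely modeled on how `tf.get_variable` behaves inside a TF 1.x `tf.variable_scope`: create the variable when `reuse` is False, look it up by its full scoped name when `reuse` is True, and complain otherwise. The `VariableStore` class and the scope names are hypothetical.

```python
class VariableStore:
    """Hypothetical stand-in for TF's variable bookkeeping, keyed by full scoped name."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, name, initial_value, reuse):
        if reuse:
            if name not in self.variables:
                raise ValueError(f"{name} does not exist; cannot reuse it")
            return self.variables[name]
        if name in self.variables:
            raise ValueError(f"{name} already exists; did you mean reuse=True?")
        self.variables[name] = [initial_value]  # a list, so callers share one mutable value
        return self.variables[name]


def batch_norm_params(store, scope, reuse):
    # The scope ("namespace") plus a per-layer name is how the store finds old params
    scale = store.get_variable(scope + "/scale", 1.0, reuse)
    offset = store.get_variable(scope + "/offset", 0.0, reuse)
    return scale, offset


def discriminator_params(store, reuse):
    return [batch_norm_params(store, "discriminator/bn1", reuse),
            batch_norm_params(store, "discriminator/bn2", reuse)]


store = VariableStore()
real_params = discriminator_params(store, reuse=False)  # first call creates the params
fake_params = discriminator_params(store, reuse=True)   # second call finds the same ones
print(len(store.variables))  # 4, not 8
```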

## Don't forget! Batch norm train vs test

```python
tf.contrib.layers.batch_norm(
    inputs,
    reuse=None,
    is_training=True,  # True: batch stats + update moving averages; False: use the moving averages
)
```

- Now we need yet another flag for the layer: is_training
- Instead of just a clean layer.forward(X), we now have layer.forward(X, reuse, is_training)
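Why the is_training flag matters, as a small single-feature sketch (the function name and the running statistics are illustrative assumptions): in training mode batch norm normalizes with the current batch's statistics, while at test time it should use the stored running mean/variance, so the output for one sample doesn't depend on what else is in the batch. With a batch of one, training-mode statistics even collapse the output to 0.

```python
import math


def batch_norm_1d(xs, running_mean, running_var, is_training, eps=1e-5):
    """Normalize one feature: batch stats when training, stored stats when testing."""
    if is_training:
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


# Running statistics accumulated during training (made-up values)
running_mean, running_var = 2.0, 4.0

print(batch_norm_1d([5.0], running_mean, running_var, is_training=True))   # [0.0]
print(batch_norm_1d([5.0], running_mean, running_var, is_training=False))  # about [1.5]
```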

## TF Implementation Summary
- Non-trivial obstacles in getting a DCGAN working in TF
- Convolution arithmetic: FS-conv specifies its filter shape and strides as if its input were the output of a forward conv, and its output were that conv's input
- Batch norm: reuse, scopes, and layer names so TF can track down your variables