# Summary
----------
Imagine an autoencoder with a total of 3 layers:

* The input layer has 3 units (x1, x2, x3)
* The hidden layer has 2 units (h1, h2)
* The output layer has 3 units (y1, y2, y3)

* For an autoencoder the output units try to reproduce the input units, so they are also written as (out(x1), out(x2), out(x3))

1) Weights (w1, w2, w3) are the weights from x1, x2, x3 to h1
2) Weights (w4, w5, w6) are the weights from x1, x2, x3 to h2
3) Weights (w7, w8) are the weights from h1, h2 to out(x1)
4) Weights (w9, w10) are the weights from h1, h2 to out(x2)
5) Weights (w11, w12) are the weights from h1, h2 to out(x3)

Here we will follow the forward propagation and backward propagation along just one path through the network.

## Forward Propagation:
---------------

We first initialize the weights w1 ... w12 with random values. During backpropagation we then obtain the gradients, which tell us in which direction (negative or positive) each weight should change in order to reduce the cost function.

 $ z(h_1) = \sum_{i=1}^3 w_ix_i +b $

 $ z(h_1) = w_1x_1 + w_2x_2 + w_3x_3 + b $

 $ out(h_1) = \frac{1}{1+e^{-z(h_1)}} = \frac{1}{1+e^{-(w_1x_1 + w_2x_2 + w_3x_3 + b)}} $  --> the sigmoid just bounds $z(h_1)$ to the range (0, 1), like a probability

 $ z(y_1) = w_7 out(h_1) + w_8 out(h_2) + b_2 $

 $ out(y_1) = \frac{1}{1+e^{-z(y_1)}} = \frac{1}{1+e^{-(w_7 out(h_1) + w_8 out(h_2) + b_2)}} $  --> the sigmoid just bounds $z(y_1)$ to the range (0, 1), like a probability

 Squared error: $ E_{tot} = \frac{1}{2} \sum_{i=1}^3 (x_i - out(y_i))^2 $
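
The forward pass can be sketched in a few lines of plain Python. This is only an illustration: the input values, the weight values, the zero biases and the helper names (`sigmoid`, `forward`, `squared_error`) are assumptions made for the example, and the activation is taken to be the standard logistic sigmoid $\frac{1}{1+e^{-z}}$.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, b1, W_out, b2):
    """Forward pass for the 3-2-3 autoencoder.

    W_hid[j] holds the weights into hidden unit j (row 0 = w1..w3, row 1 = w4..w6).
    W_out[k] holds the weights into output unit k (row 0 = w7, w8; row 1 = w9, w10;
    row 2 = w11, w12).
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b1) for row in W_hid]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b2) for row in W_out]
    return h, y

def squared_error(x, y):
    """E_tot = 1/2 * sum_i (x_i - out(y_i))^2; the target is the input itself."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Illustrative values only (assumed, not taken from the text above).
x = [1.0, 0.0, 1.0]
W_hid = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
W_out = [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]
b1, b2 = 0.0, 0.0

h, y = forward(x, W_hid, b1, W_out, b2)
E_tot = squared_error(x, y)
print("hidden:", h)
print("output:", y)
print("E_tot:", E_tot)
```

In practice the weights would start from random values rather than the fixed numbers used here; fixed numbers just make the example repeatable.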


## Backward Propagation:
-----------------
 
The idea of backward propagation is to ask how much the output error changes when we change a weight by a small amount. Measuring such a change calls for partial derivatives, so now let's work backward from the error.
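
The "change a weight a little bit" idea can be tried out directly with a finite-difference estimate. The sketch below is an illustration under assumed values: the flat weight list `[w1, ..., w12]`, the zero biases, the input `x` and the helper names `error` and `numeric_gradient` are all hypothetical, and the sigmoid is the standard $\frac{1}{1+e^{-z}}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(weights, x, b1=0.0, b2=0.0):
    """E_tot for the 3-2-3 autoencoder; weights is the flat list [w1, ..., w12]."""
    w = weights
    h1 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b1)
    h2 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5] * x[2] + b1)
    y = [
        sigmoid(w[6] * h1 + w[7] * h2 + b2),    # out(y1) uses w7, w8
        sigmoid(w[8] * h1 + w[9] * h2 + b2),    # out(y2) uses w9, w10
        sigmoid(w[10] * h1 + w[11] * h2 + b2),  # out(y3) uses w11, w12
    ]
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def numeric_gradient(weights, x, index, eps=1e-6):
    """Central difference (E(w + eps) - E(w - eps)) / (2 * eps) for one weight."""
    up = list(weights)
    up[index] += eps
    down = list(weights)
    down[index] -= eps
    return (error(up, x) - error(down, x)) / (2 * eps)

# Illustrative values only (assumed).
x = [1.0, 0.0, 1.0]
weights = [0.1 * (i + 1) for i in range(12)]

grad_w7 = numeric_gradient(weights, x, index=6)  # index 6 is w7
print("numerical dE/dw7:", grad_w7)
```

A numerical estimate like this is slow for large networks, but it is a handy check on the analytic derivatives.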

### delta 2 or gradient 2 : (change in weights (w7) from hidden to output units (y1)) : Here we basically learn the hidden to output unit weights

**$ \frac{\partial E_{tot}}{\partial w_7} = \frac{\partial E_{tot}}{\partial out(y_1)} \cdot \frac{\partial out(y_1)}{\partial z(y_1)} \cdot \frac{\partial z(y_1)}{\partial w_7} $**

 * $ \frac{\partial E_{tot}}{\partial out(y_1)} = \frac{\partial}{\partial out(y_1)} \frac{1}{2} \sum_{i=1}^3 (x_i - out(y_i))^2 $

     $ = \frac{\partial}{\partial out(y_1)} \frac{1}{2} \left( (x_1 - out(y_1))^2 + (x_2 - out(y_2))^2 + (x_3 - out(y_3))^2 \right) $

     $ = -(x_1 - out(y_1)) $

     Similarly, the same derivation applies for the weights w8 ... w12, with the corresponding output unit and hidden unit substituted.

 * $ \frac{\partial out(y_1)}{\partial z(y_1)} = \frac{\partial}{\partial z(y_1)} \left( \frac{1}{1+e^{-z(y_1)}} \right) $

     $ = \frac{\partial}{\partial z(y_1)} (1+e^{-z(y_1)})^{-1} $

     $ = -1 \cdot (1+e^{-z(y_1)})^{-2} \cdot \frac{\partial}{\partial z(y_1)} (1+e^{-z(y_1)}) $

     $ = -1 \cdot (1+e^{-z(y_1)})^{-2} \cdot (-e^{-z(y_1)}) $

     $ = \frac{e^{-z(y_1)}}{1+e^{-z(y_1)}} \cdot \frac{1}{1+e^{-z(y_1)}} $

     $ = \left( 1 - \frac{1}{1+e^{-z(y_1)}} \right) \left( \frac{1}{1+e^{-z(y_1)}} \right) $

     $ = (1-out(y_1)) \cdot out(y_1) $

 * $ \frac{\partial z(y_1)}{\partial w_7} = \frac{\partial}{\partial w_7} \left( w_7 out(h_1) + w_8 out(h_2) + b_2 \right) $

     $ = out(h_1) $

**The new weight w7 becomes: $ w_7 = w_7 - \alpha \frac{\partial E_{tot}}{\partial w_7} $, where $\alpha$ is the learning rate and the negative sign indicates that we are doing gradient descent.**
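
The three factors of the chain rule and the update step for w7 can be put together as a small sketch. Everything specific here is an assumption for illustration: the flat weight list `[w1, ..., w12]`, the zero biases, the input `x`, the learning rate `alpha = 0.5` and the helper names. The sigmoid is taken to be the standard $\frac{1}{1+e^{-z}}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x, b1=0.0, b2=0.0):
    """Return (h, y) for the 3-2-3 autoencoder; w is the flat list [w1, ..., w12]."""
    h = [
        sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b1),
        sigmoid(w[3] * x[0] + w[4] * x[1] + w[5] * x[2] + b1),
    ]
    # Output unit k uses the weight pair (w[2k + 6], w[2k + 7]): (w7, w8), (w9, w10), (w11, w12).
    y = [sigmoid(w[2 * k + 6] * h[0] + w[2 * k + 7] * h[1] + b2) for k in range(3)]
    return h, y

def error(w, x):
    _, y = forward(w, x)
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def grad_w7(w, x):
    """dE_tot/dw7 as the product of the three chain-rule factors."""
    h, y = forward(w, x)
    dE_dout = -(x[0] - y[0])       # dE_tot / d out(y1)
    dout_dz = y[0] * (1.0 - y[0])  # d out(y1) / d z(y1), the sigmoid derivative
    dz_dw7 = h[0]                  # d z(y1) / d w7 = out(h1)
    return dE_dout * dout_dz * dz_dw7

# Illustrative values only (assumed).
x = [1.0, 0.0, 1.0]
w = [0.1 * (i + 1) for i in range(12)]
alpha = 0.5  # learning rate

g = grad_w7(w, x)
w_new = list(w)
w_new[6] = w[6] - alpha * g  # gradient-descent update of w7 only

print("dE/dw7:", g)
print("error before:", error(w, x), "after:", error(w_new, x))
```

Only w7 is updated here to mirror the derivation; a full training step would update all twelve weights using gradients computed from the same forward pass.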

### delta 1 or gradient 1 : (change in weights (w1) from input to hidden units (h1)) : Here we basically learn the input to hidden weights

**$ \frac{\partial E_{tot}}{\partial w_1} = \frac{\partial E_{tot}}{\partial out(h_1)} \cdot \frac{\partial out(h_1)}{\partial z(h_1)} \cdot \frac{\partial z(h_1)}{\partial w_1} $**

**Here $\frac{\partial E_{tot}}{\partial out(h_1)}$ is the critical term: a small change in the weight w1 changes h1, and the change in h1 in turn changes all of the outputs y1, y2 and y3, each of which changes the error. Hence we expand this term as a sum over the output units:**

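 * $ \frac{\partial E_{tot}}{\partial out(h_1)} = \frac{\partial E(y_1)}{\partial out(h_1)} + \frac{\partial E(y_2)}{\partial out(h_1)} + \frac{\partial E(y_3)}{\partial out(h_1)} $, where $ E(y_i) = \frac{1}{2}(x_i - out(y_i))^2 $ is the error contributed by output unit $y_i$

     $ = \left[ \frac{\partial E(y_1)}{\partial out(y_1)} \cdot \frac{\partial out(y_1)}{\partial z(y_1)} \cdot \frac{\partial z(y_1)}{\partial out(h_1)} \right] + \left[ \frac{\partial E(y_2)}{\partial out(y_2)} \cdot \frac{\partial out(y_2)}{\partial z(y_2)} \cdot \frac{\partial z(y_2)}{\partial out(h_1)} \right] + \left[ \frac{\partial E(y_3)}{\partial out(y_3)} \cdot \frac{\partial out(y_3)}{\partial z(y_3)} \cdot \frac{\partial z(y_3)}{\partial out(h_1)} \right] $

     $ = -(x_1 - out(y_1)) \cdot out(y_1)(1 - out(y_1)) \cdot w_7 - (x_2 - out(y_2)) \cdot out(y_2)(1 - out(y_2)) \cdot w_9 - (x_3 - out(y_3)) \cdot out(y_3)(1 - out(y_3)) \cdot w_{11} $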
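
The gradient for w1 can be sketched the same way: sum the error signal over all three output units through the weights that leave h1 (w7, w9, w11), then multiply by the sigmoid derivative at h1 and by x1. The flat weight list `[w1, ..., w12]`, the zero biases, the input `x` and the helper names below are assumptions made for illustration, with the standard sigmoid $\frac{1}{1+e^{-z}}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x, b1=0.0, b2=0.0):
    """Return (h, y) for the 3-2-3 autoencoder; w is the flat list [w1, ..., w12]."""
    h = [
        sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b1),
        sigmoid(w[3] * x[0] + w[4] * x[1] + w[5] * x[2] + b1),
    ]
    y = [sigmoid(w[2 * k + 6] * h[0] + w[2 * k + 7] * h[1] + b2) for k in range(3)]
    return h, y

def error(w, x):
    _, y = forward(w, x)
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def grad_w1(w, x):
    """dE_tot/dw1 = dE_tot/dout(h1) * dout(h1)/dz(h1) * dz(h1)/dw1."""
    h, y = forward(w, x)
    # Per-output term: dE(y_k)/dout(y_k) * dout(y_k)/dz(y_k)
    delta_out = [-(xk - yk) * yk * (1.0 - yk) for xk, yk in zip(x, y)]
    # dz(y_k)/dout(h1) is the weight from h1 to output k: w7, w9, w11
    w_from_h1 = [w[6], w[8], w[10]]
    dE_dout_h1 = sum(d * wk for d, wk in zip(delta_out, w_from_h1))
    dout_dz_h1 = h[0] * (1.0 - h[0])  # sigmoid derivative at h1
    dz_dw1 = x[0]                     # d z(h1) / d w1 = x1
    return dE_dout_h1 * dout_dz_h1 * dz_dw1

# Illustrative values only (assumed).
x = [1.0, 0.0, 1.0]
w = [0.1 * (i + 1) for i in range(12)]

g1 = grad_w1(w, x)
print("dE/dw1:", g1)
```

Note that if x1 were 0 the gradient for w1 would vanish for that example, since the last factor is x1 itself.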