
Commit 6f46050
changed O for output to yhat
bfortuner committed Feb 28, 2018
1 parent 2430ae1 commit 6f46050
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions docs/backpropagation.rst
@@ -53,7 +53,7 @@ Using the chain rule we can easily find the derivative of Cost with respect to w

.. math::

    C'(W) &= C'(R) \cdot R'(Z) \cdot Z'(W) \\
-         &= (\hat y -y) \cdot R'(Z) \cdot X
+         &= (\hat{y} -y) \cdot R'(Z) \cdot X

Now that we have an equation to calculate the derivative of cost with respect to any weight, let's go back to our toy neural network example above
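To make the formula in this hunk concrete, here is a minimal numeric sketch of :math:`C'(W) = (\hat{y} - y) \cdot R'(Z) \cdot X` for a single weight. It assumes a sigmoid activation for R and a squared-error cost :math:`\frac{1}{2}(\hat{y} - y)^2`, both chosen purely for illustration rather than taken from the diff, and ends with a finite-difference sanity check.

.. code-block:: python

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    # One input, one weight: Z = X * W, yhat = R(Z), C = 0.5 * (yhat - y)**2
    X, y, W = 1.5, 0.8, 0.3
    Z = X * W
    yhat = sigmoid(Z)

    grad_W = (yhat - y) * sigmoid_prime(Z) * X   # C'(W) via the chain rule

    # Sanity check with a centered finite difference
    cost = lambda w: 0.5 * (sigmoid(X * w) - y) ** 2
    eps = 1e-6
    approx = (cost(W + eps) - cost(W - eps)) / (2 * eps)
    print(grad_W, approx)   # the two values should agree to several decimals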

@@ -64,15 +64,15 @@ What is the derivative of cost with respect to :math:`W_o`?

.. math::

-    C'(W_O) &= C'(O) \cdot O'(Z_O) \cdot Z_O'(W_O) \\
+    C'(W_O) &= C'(\hat{y}) \cdot \hat{y}'(Z_O) \cdot Z_O'(W_O) \\
             &= (O - y) \cdot R'(Z_O) \cdot H

And how about with respect to :math:`W_h`? To find out we just keep going further back in our function, applying the chain rule recursively until we get to the function that has the :math:`W_h` term.

.. math::

-    C'(W_h) &= C'(O) \cdot O'(Z_o) \cdot Z_o'(H) \cdot H'(Z_h) \cdot Z_h'(W_h) \\
-            &= (O - y) \cdot R'(Z_o) \cdot W_o \cdot R'(Z_h) \cdot X
+    C'(W_h) &= C'(\hat{y}) \cdot O'(Z_o) \cdot Z_o'(H) \cdot H'(Z_h) \cdot Z_h'(W_h) \\
+            &= (\hat{y} - y) \cdot R'(Z_o) \cdot W_o \cdot R'(Z_h) \cdot X

And just for fun, what if our network had 10 hidden layers? What is the derivative of cost for the first weight :math:`w_1`?
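The two expansions in this hunk also show the general pattern: each additional hidden layer multiplies one more :math:`R'(Z) \cdot W` factor into the chain. As a rough sketch of that recursion for arbitrary depth (scalar weights, one unit per layer, and a sigmoid R are simplifying assumptions made here, not the docs' setup):

.. code-block:: python

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def gradients(x, y, weights):
        """Chain-rule gradients for a stack of scalar layers a_i = R(w_i * a_{i-1})."""
        # Forward pass: remember each layer's input and pre-activation Z
        inputs, zs, a = [], [], x
        for w in weights:
            inputs.append(a)
            zs.append(w * a)
            a = sigmoid(zs[-1])
        yhat = a

        # Backward pass: extend the chain one factor at a time
        grads = [0.0] * len(weights)
        delta = yhat - y                     # C'(yhat)
        for i in reversed(range(len(weights))):
            delta *= sigmoid_prime(zs[i])    # multiply in R'(Z_i)
            grads[i] = delta * inputs[i]     # dC/dw_i = chain so far * layer input
            delta *= weights[i]              # push the chain back through w_i
        return grads

    # Ten hidden layers plus the output layer -> eleven weights in the chain
    grads = gradients(x=0.5, y=1.0, weights=[0.4] * 11)

Here ``grads[0]`` corresponds to the gradient for the first weight the question above asks about; the backward sweep simply keeps accumulating factors as it moves toward the input.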

@@ -110,21 +110,21 @@ To calculate output layer error we need to find the derivative of cost with resp

.. math::

-    C'(Z_o) = (O - y) \cdot R'(Z_o)
+    C'(Z_o) = (\hat{y} - y) \cdot R'(Z_o)

-To simplify notation, ml practitioners typically replace the :math:`(O-y) * R'(Zo)` sequence with the term :math:`E_o`. So our formula for output layer error equals:
+To simplify notation, ml practitioners typically replace the :math:`(\hat{y}-y) * R'(Zo)` sequence with the term :math:`E_o`. So our formula for output layer error equals:

.. math::

-    E_o = (O - y) \cdot R'(Z_o)
+    E_o = (\hat{y} - y) \cdot R'(Z_o)

.. rubric:: Hidden layer error

To calculate hidden layer error we need to find the derivative of cost with respect to the hidden layer input, :math:`Z_h`.

.. math::

-    C'(Z_h) = (O - y) \cdot R'(Z_o) \cdot W_o \cdot R'(Z_h)
+    C'(Z_h) = (\hat{y} - y) \cdot R'(Z_o) \cdot W_o \cdot R'(Z_h)

Next we can swap in the :math:`E_o` term above to avoid duplication and create a new simplified equation for hidden layer error:

@@ -140,7 +140,7 @@ Let’s return to our formula for the derivative of cost with respect to the out

.. math::

-    C'(W_O) = (O - y) \cdot R'(Z_O) \cdot H
+    C'(W_O) = (\hat{y} - y) \cdot R'(Z_O) \cdot H

We know we can replace the first part with our equation for output layer error :math:`E_o`. :math:`H` represents the hidden layer activation.
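Pulling the last two hunks together for the toy network: once :math:`E_o` is computed, the hidden layer error and both weight gradients reuse it, and gradient descent then just scales those gradients. A rough sketch, reusing the sigmoid helpers from the earlier snippet; the starting values and learning rate are arbitrary illustrations, not values from the docs:

.. code-block:: python

    # Illustrative starting values (reuses sigmoid / sigmoid_prime from above)
    X, y = 1.5, 0.8
    Wh, Wo = 0.4, 0.3

    # Forward pass: Zh = X*Wh, H = R(Zh), Zo = H*Wo, yhat = R(Zo)
    Zh = X * Wh
    H = sigmoid(Zh)
    Zo = H * Wo
    yhat = sigmoid(Zo)

    # Error terms
    Eo = (yhat - y) * sigmoid_prime(Zo)   # output layer error
    Eh = Eo * Wo * sigmoid_prime(Zh)      # hidden layer error, reusing Eo

    # Weight gradients
    grad_Wo = Eo * H                      # C'(Wo) = Eo * H
    grad_Wh = Eh * X                      # C'(Wh) = Eh * X

    # One gradient-descent step (learning rate chosen arbitrarily)
    lr = 0.1
    Wo -= lr * grad_Wo
    Wh -= lr * grad_Wh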

