# Try to implement derivatives with respect to the weights


Everything would be easier if I multiplied the factors in FFANN.derivatives backwards. That is,
in order to find $\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(0)}_{p}}$, start from
$\dfrac{\partial s^{(1)}_{r}}{\partial s^{(0)}_{p}}$ and accumulate the
products $\dfrac{\partial s^{(2)}_{r}}{\partial s^{(1)}_{q}}\dfrac{\partial s^{(1)}_{q}}{\partial s^{(0)}_{p}}$, $\dfrac{\partial s^{(3)}_{r}}{\partial s^{(2)}_{q}}\dfrac{\partial s^{(2)}_{q}}{\partial s^{(1)}_{k}}\dfrac{\partial s^{(1)}_{k}}{\partial s^{(0)}_{p}}$, and so on (with summation over repeated indices). This will help with the backpropagation algorithm, which is based on the observation that
$$
\dfrac{\partial s^{(N-1)}_{r}}{\partial w^{(l)}_{ji}} = 
\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l+1)}_{j}} \theta^{\prime\, (l+1)}_{j}s^{(l)}_{i} \; \text{ (no sum over j)}
$$
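
For reference, here is a short derivation of this identity. It assumes the convention suggested by the test code further down (not stated explicitly here) that layer $l+1$ is computed from layer $l$ as $s^{(l+1)}_{j} = \theta^{(l+1)}\big(x^{(l+1)}_{j}\big)$ with $x^{(l+1)}_{j} = \sum_i w^{(l)}_{ji}s^{(l)}_{i} + b^{(l)}_{j}$. The symbols $x$ and $b$ are introduced only for this sketch, and $\theta^{\prime\,(l+1)}_{j}$ denotes $\theta^{\prime\,(l+1)}$ evaluated at $x^{(l+1)}_{j}$. A single weight $w^{(l)}_{ji}$ only enters node $j$ of layer $l+1$:

$$
\dfrac{\partial s^{(l+1)}_{k}}{\partial w^{(l)}_{ji}} = 
\theta^{\prime\,(l+1)}_{k}\,\dfrac{\partial x^{(l+1)}_{k}}{\partial w^{(l)}_{ji}} = 
\delta_{kj}\,\theta^{\prime\,(l+1)}_{j}\,s^{(l)}_{i} \; \text{ (no sum over j)}
$$

so the chain rule through layer $l+1$ collapses to a single term:

$$
\dfrac{\partial s^{(N-1)}_{r}}{\partial w^{(l)}_{ji}} = 
\sum_{k}\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l+1)}_{k}}\dfrac{\partial s^{(l+1)}_{k}}{\partial w^{(l)}_{ji}} = 
\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l+1)}_{j}} \theta^{\prime\,(l+1)}_{j}s^{(l)}_{i}
$$

Under the same assumption, the prefactor obeys a recursion that runs from the output layer towards the input:

$$
\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l)}_{i}} = 
\sum_{j}\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l+1)}_{j}}\,\theta^{\prime\,(l+1)}_{j}\,w^{(l)}_{ji},
\qquad
\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(N-1)}_{j}} = \delta_{rj}
$$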

In [1]:
import FeedForwardANN as FFANN
import numpy as np

In [2]:
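class FFv2(FFANN.FFANN):
    '''
    FFANN with a brute-force numerical derivative, just for testing purposes!
    '''
    
    def der_w(self,r,l,j,i,h=1e-3):
        '''
        Very inefficient central-difference estimate of the derivative of
        s^{self.total_layers-1}_r wrt w^{l}_{ji}.
        '''
        w=self.weights[l][j][i]
        # step with an absolute part and a part relative to |w|
        h1=h +abs(w)*h
        
        # output at w + h1
        self.weights[l][j][i]=w+h1
        self.evaluate()
        f1=self.signals[self.total_layers-1][r]
        
        # output at w - h1
        self.weights[l][j][i]=w-h1
        self.evaluate()
        f0=self.signals[self.total_layers-1][r]
        
        # restore the weight exactly (no round-off drift) and refresh the cached signals
        self.weights[l][j][i]=w
        self.evaluate()
        return (f1-f0)/(2.*h1)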
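
The step rule used in `der_w` can be tried in isolation. The following is a minimal sketch, not part of FFANN: `central_difference` and `sigmoid` are hypothetical helpers written for this check. It applies the same `h + abs(x)*h` step to a scalar function whose derivative is known in closed form.

```python
import math

def central_difference(f, x, h=1e-3):
    """Central-difference estimate of f'(x), step scaled like in der_w."""
    step = h + abs(x) * h                      # absolute + relative step
    return (f(x + step) - f(x - step)) / (2.0 * step)

def sigmoid(x):
    """Logistic function, used here only as a test function."""
    return 1.0 / (1.0 + math.exp(-x))

x0 = 0.7
numeric = central_difference(sigmoid, x0)
exact = sigmoid(x0) * (1.0 - sigmoid(x0))      # closed-form derivative of the sigmoid
print(numeric, exact)
```

The truncation error of a central difference shrinks with the square of the step, so for a smooth function and steps of order 1e-3 the two numbers should agree to several digits.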

In [3]:
lin=FFANN.linearActivation()
sig=FFANN.sigmoidActivation()


In [4]:
brain = FFv2(2,1,[2],[sig,sig])
brain.init_params()


In [5]:
brain([33.33,2])

([0.6170447699094339], [[0, 0]])

In [6]:
brain.signals[brain.total_layers-1]

[0.6170447699094339]

In [7]:
brain.der_w(0,1,0,1)

0.2362997327805013

In [8]:
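l=1
j=0
i=1
# input to node j of layer l+1; the loop index is k so that it does not shadow i
x=sum(brain.weights[l][j][k] * brain.signals[l][k] for k in range(brain.nodes[l])) + brain.biases[l][j]
# analytic derivative wrt a last-layer weight: s^{(l)}_i * theta'(x)
brain.signals[l][i]*brain.activations[l].derivative(x)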

0.2362997721411304
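
The check above only covers a weight feeding the output layer, where the prefactor $\dfrac{\partial s^{(N-1)}_{r}}{\partial s^{(l+1)}_{j}}$ is trivial. The sketch below repeats the comparison for every weight of a small 2-2-1 sigmoid network. It is a standalone NumPy illustration and does not use FFANN: the functions `forward`, `analytic_der_w` and `numeric_der_w`, the random weights and the input values are all assumptions made for this example. It also assumes that layer $l+1$ is computed as $\theta(\sum_i w^{(l)}_{ji}s^{(l)}_i + b^{(l)}_j)$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(weights, biases, inputs):
    """Return the signals of every layer; signals[0] is the input."""
    signals = [np.asarray(inputs, dtype=float)]
    for W, b in zip(weights, biases):
        signals.append(sigmoid(W @ signals[-1] + b))
    return signals

def analytic_der_w(weights, biases, inputs, r, l, j, i):
    """d s^{(N-1)}_r / d w^{(l)}_{ji}, accumulating from the output layer down."""
    signals = forward(weights, biases, inputs)
    # For a sigmoid, theta'(x) = s (1 - s); the input layer has no activation.
    theta_prime = [None] + [s * (1.0 - s) for s in signals[1:]]
    # back[r, j] = d s^{(N-1)}_r / d s^{(m)}_j, starting with m = N-1 (identity).
    back = np.eye(signals[-1].size)
    for m in range(len(weights) - 1, l, -1):
        # One step of the recursion: from layer m+1 down to layer m.
        back = (back * theta_prime[m + 1]) @ weights[m]
    return back[r, j] * theta_prime[l + 1][j] * signals[l][i]

def numeric_der_w(weights, biases, inputs, r, l, j, i, h=1e-3):
    """Central difference with the same step rule as der_w."""
    w = weights[l][j, i]
    step = h + abs(w) * h
    shifted = [W.copy() for W in weights]      # leave the original weights untouched
    shifted[l][j, i] = w + step
    f1 = forward(shifted, biases, inputs)[-1][r]
    shifted[l][j, i] = w - step
    f0 = forward(shifted, biases, inputs)[-1][r]
    return (f1 - f0) / (2.0 * step)

# Hypothetical 2-2-1 network with random parameters and a moderate input.
rng = np.random.default_rng(0)
sizes = [2, 2, 1]
weights = [rng.normal(size=(n_to, n_from)) for n_from, n_to in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_to) for n_to in sizes[1:]]
inputs = [0.5, -1.0]

max_gap = 0.0
for l, W in enumerate(weights):
    for j in range(W.shape[0]):
        for i in range(W.shape[1]):
            a = analytic_der_w(weights, biases, inputs, 0, l, j, i)
            n = numeric_der_w(weights, biases, inputs, 0, l, j, i)
            max_gap = max(max_gap, abs(a - n))
print(max_gap)
```

The input here is kept small on purpose: with an input such as 33.33 the hidden sigmoids may saturate, and the derivatives with respect to the first-layer weights can become too small for a finite-difference check to be informative.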