# Batch Backpropagation Training - Step by Step

In [1]:
import numpy as np

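## Network architecture

The network has two inputs, two hidden neurons and two output neurons (2-2-2). Every neuron uses a sigmoid activation and has a bias. The training set is XOR with two outputs: the first target is $x_1 \oplus x_2$ and the second is its complement.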

<div>
<img src="../img/rn_online_2-2-2.png" align="center" width="400"/>
<div style="text-align: justify;"/>

In [2]:
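# Input -> hidden weights: wh[i, j] connects input i to hidden neuron j
wh = np.array([[0.15, 0.25],
               [0.20, 0.30]])
bh = np.array([0.35, 0.35])      # hidden-layer biases

# Hidden -> output weights: w_out[j, k] connects hidden neuron j to output k
w_out = np.array([[0.40, 0.50],
                  [0.45, 0.55]])
b_out = np.array([0.60, 0.60])   # output-layer biases

# Training inputs, one example per row
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
# Targets: first column is XOR, second column is its complement
y = np.array([[0, 1],
              [1, 0],
              [1, 0],
              [0, 1]])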

## Feedforward

In [3]:
z_h = np.dot(X, wh) + bh
print(z_h)

[[0.35 0.35]
 [0.5  0.6 ]
 [0.55 0.65]
 [0.7  0.9 ]]


In [4]:
# Sigmoid (logistic) activation: 1 / (1 + e^(-z)), applied element-wise
def sigmod(z):
    return 1 / (1 + np.exp(-z))

In [5]:
a_h = sigmod(z_h)
print(a_h)

[[0.58661758 0.58661758]
 [0.62245933 0.64565631]
 [0.63413559 0.65701046]
 [0.66818777 0.7109495 ]]


In [6]:
z_out = np.dot(a_h, w_out) + b_out
print(z_out)

[[1.09862494 1.21594846]
 [1.13952907 1.26634063]
 [1.14930894 1.27842355]
 [1.18720239 1.32511611]]


In [7]:
a_out = sigmod(z_out)
print(a_out)

[[0.75000237 0.77134977]
 [0.75759317 0.78011568]
 [0.75938467 0.78218131]
 [0.76624034 0.79003164]]
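The four feedforward cells above can be wrapped in a single helper, which is convenient once the pass has to be repeated after every weight update. This is a minimal sketch; `sigmoid` (the same function as the notebook's `sigmod`) and `forward` are hypothetical names we introduce here, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, wh, bh, w_out, b_out):
    """Hypothetical helper: batch feedforward pass, one training example per row of X."""
    z_h = X @ wh + bh              # net input of the hidden layer, shape (n, 2)
    a_h = sigmoid(z_h)             # hidden activations
    z_out = a_h @ w_out + b_out    # net input of the output layer, shape (n, 2)
    a_out = sigmoid(z_out)         # network outputs
    return a_h, a_out

# Same initial parameters and inputs as in the cells above
wh = np.array([[0.15, 0.25], [0.20, 0.30]])
bh = np.array([0.35, 0.35])
w_out = np.array([[0.40, 0.50], [0.45, 0.55]])
b_out = np.array([0.60, 0.60])
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

a_h, a_out = forward(X, wh, bh, w_out, b_out)
print(a_out)
```

Returning the hidden activations as well as the outputs matters: backpropagation needs `a_h` both for the output-layer gradient and for the hidden-layer sigmoid derivative.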


# Backpropagation

## Adjusting the output-layer weights

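Consider a single weight of the output layer. Gradient descent changes it in proportion to the negative derivative of the error with respect to that weight:

$\Delta w_{h_j, o_k} \propto - \frac{\partial E}{\partial w_{h_j,o_k}}$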

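The error is not a direct function of this weight: the weight determines the net input $net_{o_k}$, which determines the activation $out_{o_k}$, which in turn determines the error. Applying the chain rule, with learning rate $\eta$: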

$\Delta w_{h_j,o_k} = - \eta
\frac{\partial E}{\partial out_{o_k}} * 
\frac{\partial out_{o_k}}{\partial net_{o_k}} * 
\frac{\partial net_{o_k}}{\partial w_{h_j,o_k}}$
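As a worked instance of this chain (an illustration we added, assuming the squared error $E = \frac{1}{2}\sum_k (t_{o_k} - out_{o_k})^2$), take only the first training example, $x = (0, 0)$ with targets $t = (0, 1)$, and the weight $w_{h_1,o_1} = 0.40$. Using the rounded values printed by the feedforward cells, $out_{o_1} \approx 0.7500$ and $out_{h_1} \approx 0.5866$:

$\frac{\partial E}{\partial w_{h_1,o_1}} = \underbrace{-(0 - 0.7500)}_{\partial E / \partial out_{o_1}} * \underbrace{0.7500 \, (1 - 0.7500)}_{\partial out_{o_1} / \partial net_{o_1}} * \underbrace{0.5866}_{\partial net_{o_1} / \partial w_{h_1,o_1}} \approx 0.0825$

The derivative is positive, so for this example alone $\Delta w_{h_1,o_1} = -\eta \cdot 0.0825$ lowers the weight, which pushes $out_{o_1}$ toward its target 0.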

### Weight-update rule for a weight between the hidden and output layers

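Substituting the three derivatives below, the two minus signs cancel:

$\Delta w_{h_j,o_k} = - \eta \big[ -(t_{o_k} - a_k^{(out)}) \big] * a_k^{(out)} * (1 - a_k^{(out)}) * a_j^{(h)} = \eta \, (t_{o_k} - a_k^{(out)}) * a_k^{(out)} * (1 - a_k^{(out)}) * a_j^{(h)}$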

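1. Derivative of the error with respect to the activation, for the squared error $E = \frac{1}{2}\sum_k (t_k - a_k^{(out)})^2$: $-(t_k - a_k^{(out)})$. The cell below stores its negative, $t_k - a_k^{(out)}$.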

In [8]:
# t - a: the negative of dE/da for the squared error
delta_out = y - a_out
print(delta_out)

[[-0.75000237  0.22865023]
 [ 0.24240683 -0.78011568]
 [ 0.24061533 -0.78218131]
 [-0.76624034  0.20996836]]


2. Derivative of the activation with respect to the net input (the sigmoid derivative): $a_k^{(out)} * (1 - a_k^{(out)})$

In [9]:
# Sigmoid derivative at the output layer: a * (1 - a)
sigmod_derivative = a_out * (1 - a_out)
print(sigmod_derivative)

[[0.18749881 0.1763693 ]
 [0.18364576 0.17153521]
 [0.18271959 0.17037371]
 [0.17911608 0.16588165]]


3. Derivative of the net input with respect to the weight: $a_j^{(h)}$

In [10]:
print(a_h)

[[0.58661758 0.58661758]
 [0.62245933 0.64565631]
 [0.63413559 0.65701046]
 [0.66818777 0.7109495 ]]
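Step 2 relies on the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which lets the derivative be computed from activations that are already stored. A quick numerical sanity check (our own addition, not part of the original notebook) compares it with a central finite difference at a few points:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)   # -4, -3, ..., 4
h = 1e-6

# Central finite difference of the sigmoid
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
# Closed form used in step 2: a * (1 - a)
analytic = sigmoid(z) * (1 - sigmoid(z))

print(np.max(np.abs(numeric - analytic)))
```

The largest value, 0.25, occurs at $z = 0$; the derivative shrinks quickly for large $|z|$, which is why saturated sigmoid units learn slowly.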


Defining $\delta_k^{(out)} = (t_k - a_k^{(out)}) * a_k^{(out)} * (1 - a_k^{(out)})$, the rule takes the same form as the perceptron rule:

$\Delta w_{h_j, o_k} = \eta \, \delta_k^{(out)} * a_j^{(h)}$

In [11]:
# delta_k^(out) = (t_k - a_k) * a_k * (1 - a_k), one row per example
delta = delta_out * sigmod_derivative
print(delta)

[[-0.14062456  0.04032688]
 [ 0.04451699 -0.1338173 ]
 [ 0.04396514 -0.13326313]
 [-0.13724597  0.0348299 ]]
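The cell above gives `delta`, that is $\delta_k^{(out)}$ for every example. Summing $\delta_k^{(out)} a_j^{(h)}$ over the batch and changing the sign gives $\partial E / \partial w_{h_j,o_k}$, which can be checked against a central finite difference of the batch error. A minimal sketch, assuming the squared error $E = \frac{1}{2}\sum_n \sum_k (t_k - a_k^{(out)})^2$ (the error whose derivative is $-(t_k - a_k^{(out)})$); `batch_error` is a hypothetical helper name:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

wh = np.array([[0.15, 0.25], [0.20, 0.30]])
bh = np.array([0.35, 0.35])
w_out = np.array([[0.40, 0.50], [0.45, 0.55]])
b_out = np.array([0.60, 0.60])
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([[0, 1], [1, 0], [1, 0], [0, 1]])

def batch_error(w_out):
    # Hypothetical helper; assumed error: E = 1/2 * sum over examples and outputs of (t - a)^2
    a_h = sigmoid(X @ wh + bh)
    a_out = sigmoid(a_h @ w_out + b_out)
    return 0.5 * np.sum((y - a_out) ** 2)

# Analytic gradient from the delta rule
a_h = sigmoid(X @ wh + bh)
a_out = sigmoid(a_h @ w_out + b_out)
delta = (y - a_out) * a_out * (1 - a_out)   # equals -dE/dnet_out
analytic = -(a_h.T @ delta)                 # dE/dw_out, same shape as w_out

# Numerical gradient by central differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w_out)
for j in range(w_out.shape[0]):
    for k in range(w_out.shape[1]):
        w_plus, w_minus = w_out.copy(), w_out.copy()
        w_plus[j, k] += eps
        w_minus[j, k] -= eps
        numeric[j, k] = (batch_error(w_plus) - batch_error(w_minus)) / (2 * eps)

print(analytic)
print(np.max(np.abs(analytic - numeric)))
```

All entries of the gradient are positive, so a gradient-descent step lowers every output weight. The sign matters: because `delta` equals $-\partial E/\partial net$, subtracting $\eta$ times `a_h.T @ delta` from the weights would climb the error instead of descending it.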


## Adjusting the hidden-layer weights

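A hidden neuron feeds every output neuron, so its incoming weight affects the error through all of them, and the chain rule sums over the outputs $k$:

$\Delta w_{i_i,h_j} \propto -
\big[ 
\sum_k 
\frac{\partial E}{\partial out_{o_k}} \frac{\partial out_{o_k}}{\partial net_{o_k}} \frac{\partial net_{o_k}}{\partial out_{h_j}}
\big] 
\frac{\partial out_{h_j}}{\partial net_{h_j}} 
\frac{\partial net_{h_j}}{\partial w_{i_i,h_j}}$

Substituting the same derivatives as for the output layer, plus $\frac{\partial net_{o_k}}{\partial out_{h_j}} = w_{h_j,o_k}$:

$\Delta w_{i_i,h_j} = \eta 
\big[ 
\sum_k 
(t_{o_k} - a_k^{(out)}) * a_k^{(out)} * (1 - a_k^{(out)}) * w_{h_j, o_k}
\big] 
a_j^{(h)} (1 - a_j^{(h)})
a_i^{(in)}$

and, with $\delta_k^{(out)}$ as defined for the output layer:

$\Delta w_{i_i,h_j} = \eta 
\big[ 
\sum_k 
\delta_k^{(out)} w_{h_j, o_k}
\big] 
a_j^{(h)} (1 - a_j^{(h)})
a_i^{(in)}$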
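Stacking the training examples as rows gives a compact batch form of these rules. This is our own summary in matrix notation: $X$ is the $4 \times 2$ input matrix, $T$ the targets `y`, $A^{(h)}$ is `a_h`, $A^{(out)}$ is `a_out`, $W_{out}$ is `w_out`, $W_h$ is `wh`, and $\odot$ is the element-wise product:

$\delta^{(out)} = (T - A^{(out)}) \odot A^{(out)} \odot (1 - A^{(out)})$

$\delta^{(h)} = \big( \delta^{(out)} W_{out}^{T} \big) \odot A^{(h)} \odot (1 - A^{(h)})$

$\frac{\partial E}{\partial W_{out}} = -{A^{(h)}}^{T} \delta^{(out)}, \qquad \frac{\partial E}{\partial W_{h}} = -X^{T} \delta^{(h)}, \qquad W \leftarrow W - \eta \frac{\partial E}{\partial W}$

The minus signs appear because $\delta$ is built from $t - a$, which is $-\partial E / \partial net$. Each bias gradient is the column sum of the corresponding $-\delta$.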

In [12]:
# Error sent back to each hidden neuron: sum_k delta_k^(out) * w_{h_j,o_k}
delta_h = np.dot(delta, w_out.T)
print(delta_h)

[[-0.03608638 -0.04110126]
 [-0.04910186 -0.05356687]
 [-0.04904551 -0.05351041]
 [-0.03748344 -0.04260424]]
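The matrix product `np.dot(delta, w_out.T)` is the bracketed sum over $k$ in the hidden-layer rule, evaluated for every example and every hidden neuron at once. Spelling out the loops (illustrative only; the values are the rounded deltas printed earlier) shows that entry $(n, j)$ is $\sum_k \delta_k^{(out)} w_{h_j,o_k}$ for example $n$:

```python
import numpy as np

# Output deltas and output weights copied (rounded) from the cells above
delta = np.array([[-0.14062456,  0.04032688],
                  [ 0.04451699, -0.1338173 ],
                  [ 0.04396514, -0.13326313],
                  [-0.13724597,  0.0348299 ]])
w_out = np.array([[0.40, 0.50],
                  [0.45, 0.55]])

n_examples, n_out = delta.shape
n_hidden = w_out.shape[0]

# Explicit sum over the outputs k for every example n and hidden neuron j
backprop = np.zeros((n_examples, n_hidden))
for n in range(n_examples):
    for j in range(n_hidden):
        for k in range(n_out):
            backprop[n, j] += delta[n, k] * w_out[j, k]

print(np.allclose(backprop, delta @ w_out.T))  # True
```

The transpose is needed because `w_out[j, k]` is indexed hidden-first, while the sum runs over the output index $k$.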


In [13]:
# Sigmoid derivative at the hidden layer: a * (1 - a)
sigmod_derivative_h = a_h * (1 - a_h)
print(sigmod_derivative_h)

[[0.2424974  0.2424974 ]
 [0.23500371 0.22878424]
 [0.23200764 0.22534771]
 [0.22171287 0.20550031]]


In [14]:
# delta_{h_j}: backpropagated error times the local sigmoid derivative
delta_h = delta_h * sigmod_derivative_h
print(delta_h)

[[-0.00875085 -0.00996695]
 [-0.01153912 -0.01225526]
 [-0.01137893 -0.01205845]
 [-0.00831056 -0.00875518]]


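The last cell gives $\delta_{h_j}$, the delta of every hidden neuron for every training example.


### Weight-update rule between the input and hidden layers

If we define $\delta_{h_j} = \big[ \sum_k \delta_k^{(out)} w_{h_j,o_k} \big] * out_{h_j} * (1 - out_{h_j})$, the rule again takes the same form as the perceptron rule:

$\Delta w_{i_i,h_j} = \eta \, \delta_{h_j} \, out_{i_i}$

In batch mode the contributions of all training examples are summed. Because `delta_h` is built from $t - a$, it equals $-\partial E / \partial net$, so the cell negates the sums to obtain the gradient $\partial E / \partial w$, with the same shape as `wh` (inputs x hidden):

In [15]:
# Gradient of E with respect to wh and bh, summed over the batch.
# X.T @ delta_h has shape (inputs, hidden), matching wh.
grad_w_h = -np.dot(X.T, delta_h)
grad_b_h = -np.sum(delta_h, axis=0)
print(grad_w_h)
print('\nGradient of b_h\n', grad_b_h)

[[0.01984968 0.02101044]
 [0.01968949 0.02081363]]

Gradient of b_h
 [0.03997947 0.04303584]


### Gradient for the weights between the hidden and output layers

The output-layer deltas `delta` are combined with the hidden activations `a_h` in the same way:

In [16]:
# Gradient of E with respect to w_out and b_out, summed over the batch.
# a_h.T @ delta has shape (hidden, outputs), matching w_out.
grad_w_out = -np.dot(a_h.T, delta)
grad_b_out = -np.sum(delta, axis=0)
print(grad_w_out)
print('\nGradient of b_out\n', grad_b_out)

[[0.11860904 0.12087335]
 [0.12243956 0.1255365 ]]

Gradient of b_out
 [0.1893884  0.19192365]


### Updating the weights between the input and hidden layers

Gradient descent moves each parameter against its gradient, here with learning rate $\eta = 0.5$:

In [17]:
eta = 0.5  # learning rate

wh -= eta * grad_w_h
bh -= eta * grad_b_h

print(wh)
print('\nBias\n', bh)

[[0.14007516 0.23949478]
 [0.19015525 0.28959318]]

Bias
 [0.33001027 0.32848208]


Updating the output-layer weights:

In [18]:
w_out -= eta * grad_w_out
b_out -= eta * grad_b_out

print(w_out)
print('\nBias\n', b_out)

[[0.34069548 0.43956332]
 [0.38878022 0.48723175]]

Bias
 [0.5053058  0.50403817]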
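The cells above perform a single batch update by hand. Batch training repeats the same feedforward pass, backpropagation and update for many epochs over the whole training set. Below is a minimal sketch of that loop under our own assumptions (squared error, fixed learning rate $\eta = 0.5$, a fixed number of epochs, no stopping criterion); `train_batch` is a hypothetical helper, not part of the original notebook. It keeps the sign convention used throughout: the deltas are built from $t - a$, so the gradients of $E$ carry a minus sign.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def train_batch(X, y, wh, bh, w_out, b_out, eta=0.5, epochs=1000):
    """Hypothetical helper: full-batch gradient descent for the 2-2-2 sigmoid network.

    Returns the trained parameters and the error recorded before each update.
    """
    # Work on copies so the caller's arrays are not modified
    wh, bh, w_out, b_out = wh.copy(), bh.copy(), w_out.copy(), b_out.copy()
    history = []
    for _ in range(epochs):
        # Feedforward over the whole batch
        a_h = sigmoid(X @ wh + bh)
        a_out = sigmoid(a_h @ w_out + b_out)
        # Assumed error: E = 1/2 * sum((t - a)^2)
        history.append(0.5 * np.sum((y - a_out) ** 2))

        # Deltas built from (t - a), i.e. -dE/dnet
        delta = (y - a_out) * a_out * (1 - a_out)
        delta_h = (delta @ w_out.T) * a_h * (1 - a_h)  # uses w_out before its update

        # Gradients of E (hence the minus signs)
        grad_w_out = -(a_h.T @ delta)
        grad_b_out = -delta.sum(axis=0)
        grad_w_h = -(X.T @ delta_h)
        grad_b_h = -delta_h.sum(axis=0)

        # Gradient-descent step
        w_out -= eta * grad_w_out
        b_out -= eta * grad_b_out
        wh -= eta * grad_w_h
        bh -= eta * grad_b_h
    return (wh, bh, w_out, b_out), history


X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([[0, 1], [1, 0], [1, 0], [0, 1]])
wh0 = np.array([[0.15, 0.25], [0.20, 0.30]])
bh0 = np.array([0.35, 0.35])
w_out0 = np.array([[0.40, 0.50], [0.45, 0.55]])
b_out0 = np.array([0.60, 0.60])

params, history = train_batch(X, y, wh0, bh0, w_out0, b_out0, eta=0.5, epochs=1000)
print(history[0], history[-1])
```

Comparing `history[0]` with `history[-1]` shows whether the error went down. Whether the network ends up separating the XOR patterns depends on the initial weights, $\eta$ and the number of epochs; the loop itself only follows the negative gradient of $E$.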
