In [13]:
import numpy as np

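# sigmoid activation and its derivative
def nonlin(x, deriv=False):
    if deriv:
        # x is expected to already be a sigmoid output S(t),
        # so the derivative is S(t) * (1 - S(t))
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))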

- Formula of the sigmoid function: $$S(t) = \frac{1}{1+e^{-t}}$$
- The sigmoid is an activation function that maps any real input to a number between 0 and 1, which can be read as a probability.
- Deriving the simplified form of the sigmoid's derivative:
$$S(x) = \frac{1}{1+e^{-x}}$$
$$f(x) = \frac{1}{S(x)} = 1+e^{-x}$$
$$f'(x) = -\frac{S'(x)}{S(x)^2}$$
$$f'(x) = -e^{-x} = 1 - f(x) = 1 - \frac{1}{S(x)} = \frac{S(x)-1}{S(x)}$$
$$-\frac{S'(x)}{S(x)^2} = \frac{S(x)-1}{S(x)}$$
$$S'(x) = S(x)^2\,\frac{1-S(x)}{S(x)}$$
$$S'(x) = S(x)\,(1-S(x))$$
- This is why `nonlin` with `deriv=True` returns $x(1-x)$: it expects $x$ to already be a sigmoid output.
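
The closed form $S'(t) = S(t)(1 - S(t))$ can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper name `sigmoid`, the sample points, and the step size `h` are all chosen here for illustration.

```python
import numpy as np

def sigmoid(t):
    # standard logistic function 1 / (1 + e^(-t))
    return 1 / (1 + np.exp(-t))

# sample points and a small step for the central difference (illustrative choices)
t = np.linspace(-5, 5, 11)
h = 1e-5

# numerical derivative: (S(t+h) - S(t-h)) / (2h)
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)

# closed form: S(t) * (1 - S(t))
s = sigmoid(t)
analytic = s * (1 - s)

# largest disagreement between the two; should be tiny
max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)
```

The closed form only needs the sigmoid output itself, which is convenient during training because that output has already been computed in the forward pass.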

In [14]:
# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])

| Input| Feature1 | Feature2 | Feature3
| :-: | :-: |:-: | :-:
| Sample1 | 0 | 0 | 1
| Sample2 | 0 | 1 | 1
| Sample3 | 1 | 0 | 1
| Sample4 | 1 | 1 | 1

In [15]:
# output dataset            
y = np.array([[0,0,1,1]]).T

| Output
| :-: 
|0 
|0 
|1 
|1 

In [16]:
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

In [17]:
# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

- Since X is a 4x3 matrix, the synapse must be a 3x1 matrix so that the dot product yields a 4x1 matrix for the next layer.
- The matrix representation:

$$\left[ \begin{array}{ccc}
0 & 0 & 1 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 1 \end{array} \right]
\cdot
\left[ \begin{array}{c}
s_1 \\
s_2 \\
s_3 \end{array} \right]
=
\left[ \begin{array}{c}
o_1 \\
o_2 \\
o_3 \\
o_4 \end{array} \right]$$
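
To make the shapes concrete, here is a small sketch of the same product in NumPy. The weight values in `s` are hypothetical, picked only so the arithmetic is easy to follow; they are not the notebook's seeded weights.

```python
import numpy as np

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# hypothetical weights s_1, s_2, s_3 (illustration only)
s = np.array([[0.5], [-1.0], [2.0]])

# (4, 3) dot (3, 1) -> (4, 1): one weighted sum per sample
o = np.dot(X, s)

print(X.shape, s.shape, o.shape)
print(o)
```

Each entry of `o` is the sum of the weights whose feature is 1 in that sample, for example `-1.0 + 2.0` for Sample2.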

In [11]:
for i in range(10000):

    # forward propagation
    l0 = X # layer 0
    l1 = nonlin(np.dot(l0,syn0)) # layer 1


- Forward propagation multiplies each value of the input layer by the corresponding synapse weight, sums the products for each sample, and passes the sum through the sigmoid to produce the next layer.
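
As an illustration of that step, the sketch below runs one forward pass with the seeded weights and repeats the computation by hand for Sample2. It is written as a standalone snippet, so the helper name `sigmoid` and the variables `weighted_sum` and `sample2_out` are introduced here and do not appear in the notebook.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# same seed and initialization as the notebook
np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

# vectorized forward pass over all four samples
l1 = sigmoid(np.dot(X, syn0))

# the same computation by hand for Sample2 = [0, 1, 1]
weighted_sum = 0 * syn0[0, 0] + 1 * syn0[1, 0] + 1 * syn0[2, 0]
sample2_out = sigmoid(weighted_sum)

print(l1.ravel())
print(sample2_out)
```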

In [12]:
    # how much did we miss?
    l1_error = y - l1


- The larger the error in a row of the output, the larger the correction the weights need to produce a better result in the next iteration.

In [8]:
    # multiply how much we missed by the 
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)
    # update weights
    syn0 += np.dot(l0.T,l1_delta)


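- The sign of the error gives the direction in which to adjust the weights (positive or negative).
- The slope of the sigmoid at l1 scales that adjustment: it is largest when the output is near 0.5 and shrinks toward 0 as the output approaches 0 or 1, so confident predictions are changed only slightly.
- Taking the dot product of the transposed layer 0 with l1_delta sums each input's contribution over all samples, and the result is added to the weights in syn0.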
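
A single update step can be traced on its own. The sketch below is a standalone illustration under the same seed and data as the notebook; the names `sigmoid`, `slope`, `error_before`, and `error_after` are introduced here for the example.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

# forward pass and error
l1 = sigmoid(np.dot(X, syn0))
l1_error = y - l1

# sigmoid slope at each output: always positive, largest near 0.5
slope = l1 * (1 - l1)
l1_delta = l1_error * slope

error_before = np.mean(np.abs(l1_error))

# one weight update, then measure the error again
syn0 = syn0 + np.dot(X.T, l1_delta)
error_after = np.mean(np.abs(y - sigmoid(np.dot(X, syn0))))

# the delta keeps the sign of the error; the slope only rescales it
print(np.sign(l1_error).ravel(), np.sign(l1_delta).ravel())
print(error_before, error_after)
```

In this sketch the mean absolute error drops after one step, even though individual rows may temporarily get worse.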

In [9]:
print("Output After Training:")
print(l1)

Output After Training:
[[ 0.2689864 ]
 [ 0.36375058]
 [ 0.23762817]
 [ 0.3262757 ]]
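
The loop body is spread over several cells above. As a sketch, assuming those cells are meant to run together as one loop, the whole notebook can be assembled into a single cell; the only change is the throwaway loop variable `_`. Run to completion, the predictions in `l1` should end up close to the targets in `y`.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        # x is expected to be a sigmoid output
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

for _ in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))

    # error, scaled by the sigmoid slope at l1
    l1_error = y - l1
    l1_delta = l1_error * nonlin(l1, True)

    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:")
print(l1)
```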
