# First example

## Learn: Linear Model for NOT

$f^{*}(0) = 1$<br>$f^{*}(1) = 0$

No activation function this time.

### Diagram:

![NOT Diagram](images/not_diagram.png)

### Function:

$$f(x, w, b) = w \cdot x + b$$

### Error function:
$$MSE=\frac{1}{n}\sum_{x=0}^{1}{(f(x, w, b) - f^*(x))^2}$$
where $n$ is the number of entries; here $n = 2$.

**Objective:** we want to minimize this error function

**How:** compute the partial derivatives of the error with respect to $w$ and $b$, then repeatedly step against them (gradient descent).

### Mean Squared Error (MSE):

$$ MSE = \frac{1}{2} \cdot (\,f(0, w, b) - f^*(0)\,)^2 + \frac{1}{2} \cdot (\,f(1, w, b) - f^*(1)\,)^2  $$
$$ MSE = \frac{1}{2} \cdot (\,b-1\,)^2 + \frac{1}{2} \cdot (\,w+b - 0\,)^2  $$
$$ MSE = b^2 - b + \frac{1}{2} + \frac{w^2}{2} + w \cdot b $$
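As a quick sanity check on the expansion, the per-entry form and the expanded polynomial can be compared numerically on a small grid of $(w, b)$ values. This is a minimal sketch; `mse_direct` and `mse_expanded` are hypothetical helper names, not part of the notebook.

```python
import itertools

def mse_direct(w, b):
    """MSE as the mean of squared errors over the two NOT entries."""
    targets = {0: 1, 1: 0}
    return sum((w * x + b - targets[x]) ** 2 for x in (0, 1)) / 2

def mse_expanded(w, b):
    """The expanded polynomial b^2 - b + 1/2 + w^2/2 + w*b."""
    return b ** 2 - b + 0.5 + w ** 2 / 2 + w * b

# Largest disagreement between the two forms on a 5 x 5 grid of (w, b).
grid = [-1.5, -0.3, 0.0, 0.7, 2.0]
max_gap = max(abs(mse_direct(w, b) - mse_expanded(w, b))
              for w, b in itertools.product(grid, repeat=2))
print(max_gap)
print(mse_direct(-1, 1))  # → 0.0
```

Any nonzero gap should come only from floating-point rounding.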

### Partial derivatives

Next, we compute the partial derivatives that gradient descent uses:<br><br>
<div style="float:left;width:250px;">
<img src="images/gradient.png" alt="Gradient">
</div>
<div style="float:left;width:250px;"><br>
 $$\frac{\partial MSE}{\partial w} = w + b$$<br>
 $$\frac{\partial MSE}{\partial b} = 2 \cdot b - 1 + w$$
</div>
<div style="clear:both;"></div>
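Hand-derived gradients are easy to get subtly wrong, so a common cross-check is a central finite difference. The sketch below compares both partial derivatives at a few arbitrary points; `not_mse`, `analytic_grad`, `numeric_grad` and the step size `h` are illustrative choices, not part of the notebook.

```python
def not_mse(w, b):
    """NOT-model MSE in expanded form: b^2 - b + 1/2 + w^2/2 + w*b."""
    return b ** 2 - b + 0.5 + w ** 2 / 2 + w * b

def analytic_grad(w, b):
    """Hand-derived partial derivatives (dMSE/dw, dMSE/db)."""
    return w + b, 2 * b - 1 + w

def numeric_grad(w, b, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    dw = (not_mse(w + h, b) - not_mse(w - h, b)) / (2 * h)
    db = (not_mse(w, b + h) - not_mse(w, b - h)) / (2 * h)
    return dw, db

points = [(0.3, -0.8), (-1.2, 0.5), (2.0, 2.0)]
checks = [(analytic_grad(w, b), numeric_grad(w, b)) for w, b in points]
for (w, b), (a, n) in zip(points, checks):
    print((w, b), a, n)
```

Because the MSE is quadratic, the central difference matches the analytic gradient up to rounding.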

### Iteration:
$$w \leftarrow w- \eta \cdot \frac{\partial MSE}{\partial w}$$<br>
$$b \leftarrow b- \eta \cdot \frac{\partial MSE}{\partial b}$$
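Setting both partial derivatives to zero gives $w + b = 0$ and $w + 2b = 1$, so the minimum is at $w = -1$, $b = 1$, where the MSE is exactly 0. The sketch below runs the iteration rule from an assumed starting point $w = b = 0$ with $\eta = 0.1$, updating both parameters from the same old values; `gd_not` is a hypothetical name.

```python
def gd_not(w=0.0, b=0.0, eta=0.1, steps=500):
    """Gradient descent on the NOT-model MSE with simultaneous updates."""
    for _ in range(steps):
        dw = w + b            # dMSE/dw
        db = 2 * b - 1 + w    # dMSE/db
        w, b = w - eta * dw, b - eta * db
    return w, b

w, b = gd_not()
print(w, b)  # close to (-1, 1)
```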

In [1]:
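```python
import random
import itertools

# Sum of squared errors over all tuples in {0, 1}^n (not divided by n).
# `n` is the tuple length; models that read only k < n components count
# each entry 2^(n-k) times, so printed values are a constant multiple of MSE.
def mse(f, f_star, w, b, n):
    return sum((f(x, w, b) - f_star(x)) ** 2
               for x in itertools.product([0, 1], repeat=n))
```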

In [2]:
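```python
def not_func(x):
    """Target function f*: the logical NOT of x[0]."""
    if x[0] == 0:
        return 1
    if x[0] == 1:
        return 0
    raise ValueError('Incorrect input, it should be 0 or 1')


def not_model(x, w, b):
    """Linear model f(x, w, b) = w * x + b (no activation)."""
    return w * x[0] + b


def grad_w(w, b, eta):
    """One gradient-descent step for w, using dMSE/dw = w + b."""
    return w - eta * (w + b)


def grad_b(w, b, eta):
    """One gradient-descent step for b, using dMSE/db = 2b - 1 + w."""
    return b - eta * (2 * b - 1 + w)


def train_not(model, objective, n, num_iterations=500, eta=0.1):
    w = random.uniform(-1, 1)
    b = random.uniform(-1, 1)

    for i in range(num_iterations):
        mse_value = mse(model, objective, w, b, n)
        # Update both parameters from the same (w, b), as in the iteration rule.
        w, b = grad_w(w, b, eta), grad_b(w, b, eta)

        # Print the error every 11 iterations.
        if i % 11 == 0:
            print(mse_value)

    def trained_model(x):
        return model(x, w, b)

    return trained_model
```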

In [3]:
```python
trained_model = train_not(not_model, not_func, n=2)
```

```
3.1273125908836406
0.09097695296660298
0.03567994424746805
0.014551440351822587
0.0059280827849091285
0.0024148165342546637
0.0009836745834271108
0.0004006993048506026
0.00016322463896691278
6.648946578546263e-05
2.7084446857500127e-05
1.1032834343084863e-05
4.494218925064842e-06
1.8307175761297194e-06
7.457417850426648e-07
3.037775008061611e-07
1.2374359576856902e-07
5.040688481898796e-08
2.0533216457561165e-08
8.364194288274525e-09
3.407149885006487e-09
1.387900608083934e-09
5.653605397308239e-10
2.3029930099176767e-10
9.381229199556604e-11
3.821438489630135e-11
1.556660840372668e-11
6.341049263512728e-12
2.583022885592576e-12
1.052192934008835e-12
4.2861020560916524e-13
1.7459412849918662e-13
7.112082098637322e-14
2.8971026821594625e-14
1.1801331653804728e-14
4.807265885281481e-15
1.9582370865126027e-15
7.976867920247288e-16
3.249372720017398e-16
1.3236301388442732e-16
5.3918000378717415e-17
2.1963467976054934e-17
8.946807103117877e-18
3.644476357878606e-18
1.484575451602423e-18
6.0
```

In [4]:
```python
for x in itertools.product([0, 1], repeat=1):
    print('input: {}, output: {}'.format(x, trained_model(x)))
```

```
input: (0,), output: 0.9999999996269427
input: (1,), output: 2.4870483450456504e-10
```


# Second example: Linear Model for XOR
## Learn XOR

 x1 | x2 | XOR 
----|----|----
 0  | 0  | 0
 0  | 1  | 1
 1  | 0  | 1
 1  | 1  | 0

*No activation function.*

### Diagram:
![XOR Diagram](images/xor_diagram.png)

### Function:

$$ f(x, w, b) = w_1 \cdot x_1 + w_2 \cdot x_2 + b$$

### MSE:

$$ MSE = \frac{1}{4} \cdot \left(b^2 + (w_2+b-1)^2+(w_1+b-1)^2+(w_1+w_2+b)^2\right)$$

### Partial derivatives:

$$\frac{\partial MSE}{\partial w_1} = w_1 + b - \frac{1}{2} + \frac{w_2}{2}$$<br>
$$\frac{\partial MSE}{\partial w_2} = w_2 + b - \frac{1}{2} + \frac{w_1}{2}$$<br>
$$\frac{\partial MSE}{\partial b} = 2b + w_1 + w_2 - 1$$<br>
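Setting the three partial derivatives to zero explains why this model stalls. Subtracting the first two equations gives $w_1 = w_2$, and the remaining ones then force $w_1 = w_2 = 0$ and $b = \frac{1}{2}$, where the MSE is $\frac{1}{4}$. The Hessian of this quadratic has leading principal minors $1$, $\frac{3}{4}$, $\frac{1}{2}$, so it is positive definite and this is the unique minimum: the best linear model outputs 0.5 on every XOR input. The exact-arithmetic sketch below checks that point with `fractions.Fraction`; `xor_mse` and `xor_grads` are illustrative names.

```python
from fractions import Fraction

def xor_mse(w1, w2, b):
    """Mean squared error of the linear model over the four XOR entries."""
    table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    return sum((w1 * x1 + w2 * x2 + b - y) ** 2 for (x1, x2), y in table) / 4

def xor_grads(w1, w2, b):
    """The hand-derived partial derivatives (w.r.t. w_1, w_2, b)."""
    half = Fraction(1, 2)
    return (w1 + b - half + w2 / 2,
            w2 + b - half + w1 / 2,
            2 * b + w1 + w2 - 1)

w1, w2, b = Fraction(0), Fraction(0), Fraction(1, 2)
print(xor_grads(w1, w2, b))  # → (Fraction(0, 1), Fraction(0, 1), Fraction(0, 1))
print(xor_mse(w1, w2, b))    # → 1/4
```

The XOR training log levels off near 4.0 rather than 0.25 because the notebook's `mse` helper, called with `n=4`, sums squared errors over all 16 tuples of $\{0,1\}^4$ (each XOR entry four times) without dividing: $16 \times \frac{1}{4} = 4$.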

In [5]:
```python
def xor_func(x):
    """Target function f*: XOR of x[0] and x[1]."""
    return int(x[0] != x[1])


def xor_model(x, w, b):
    """Linear model f(x, w, b) = w_1 * x_1 + w_2 * x_2 + b (no activation)."""
    return x[0] * w[0] + x[1] * w[1] + b


# Each helper returns the parameter after one gradient-descent step.
def grad_w_1(w, b, eta):
    return w[0] - eta * (w[0] + b - 0.5 + 0.5 * w[1])


def grad_w_2(w, b, eta):
    return w[1] - eta * (w[1] + b - 0.5 + 0.5 * w[0])


def grad_b(w, b, eta):
    return b - eta * (2 * b + w[0] + w[1] - 1)


def train_xor(model, objective, n, num_iterations=500, eta=0.1):
    w = [random.uniform(-1, 1) for _ in range(2)]
    b = random.uniform(-1, 1)

    for i in range(num_iterations):
        mse_value = mse(model, objective, w, b, n)
        # Compute all three steps from the current (w, b) before assigning.
        w, b = [grad_w_1(w, b, eta), grad_w_2(w, b, eta)], grad_b(w, b, eta)

        # Print the error every 11 iterations.
        if i % 11 == 0:
            print(mse_value)

    def trained_model(x):
        return model(x, w, b)

    return trained_model
```

In [6]:
```python
trained_model = train_xor(xor_model, xor_func, n=4)
```

```
74.21251615645126
4.416213022532111
4.138188488536622
4.050881249927394
4.019597506970198
4.00790416439515
4.003325607279144
4.001449798328364
4.000649764838388
4.000297169776201
4.000137846871886
4.000064553535356
4.00003041838905
4.000014389975611
4.00000682395414
4.000003240696449
4.000001540265461
4.000000732383601
4.000000348308851
4.000000165658091
4.000000078786001
4.00000003746763
4.0000000178166735
4.000000008471465
4.000000004027666
4.000000001914761
4.0000000009102195
4.000000000432664
4.000000000205652
4.000000000097744
4.000000000046457
4.000000000022078
4.000000000010494
4.000000000004986
4.00000000000237
4.000000000001125
4.000000000000535
4.000000000000253
4.000000000000121
4.000000000000057
4.0000000000000275
4.000000000000013
4.000000000000007
4.0000000000000036
4.000000000000002
4.000000000000001
```


In [7]:
```python
for x in itertools.product([0, 1], repeat=2):
    print('input: {}, output: {}'.format(x, trained_model(x)))
```

```
input: (0, 0), output: 0.4999999913242938
input: (0, 1), output: 0.4999999991588299
input: (1, 0), output: 0.4999999984537919
input: (1, 1), output: 0.5000000062883281
```


# Improved model for XOR


![XOR Improved](images/xor_improved_diagram.png)

### Activation function: ReLU
$$\sigma(x) = \max(0, x)$$<br>

### Derivative:
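$$\sigma'(x) = \begin{cases}
    1, & \text{if } x > 0\\
    0, & \text{otherwise}
\end{cases}$$

The derivative is undefined at exactly $x = 0$; implementations use a fixed value there.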
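A direct Python version of ReLU and its derivative, as a small sketch; `relu` and `relu_grad` are illustrative names, and returning 0 at exactly $x = 0$ is one common convention, not the only one.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, else 0 (0 chosen at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

print([relu(v) for v in (-2.0, 0.0, 3.5)])       # → [0.0, 0.0, 3.5]
print([relu_grad(v) for v in (-2.0, 0.0, 3.5)])  # → [0.0, 0.0, 1.0]
```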

### MSE:

$$ MSE = \frac{1}{4} \sum_x (f^{(2)}(f^{(1)}(x)) - f^{*}(x))^2$$

where $f^{(1)}$ is the hidden layer with two ReLU units $f^{(1)}_i(x) = \sigma(w_{i,1}x_1 + w_{i,2}x_2 + b_i)$ and $f^{(2)}$ is the linear output unit:

$$ MSE = \frac{1}{4} \sum_x (w_1 \cdot f^{(1)}_1(x) + w_2 \cdot f^{(1)}_2(x) + b - f^{*}(x))^2$$

$$ MSE = \frac{1}{4} \sum_{x_1=0}^{1} \sum_{x_2=0}^{1} (w_1 \cdot \max(0, w_{1,1}x_1 + w_{1,2}x_2 + b_1) + w_2 \cdot \max(0, w_{2,1}x_1 + w_{2,2}x_2 + b_2) + b - f^{*}(x))^2$$
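Unlike the linear model, this network can represent XOR exactly. One hand-picked choice is $w_{1,1} = w_{1,2} = w_{2,1} = w_{2,2} = 1$, $b_1 = 0$, $b_2 = -1$, $w_1 = 1$, $w_2 = -2$, $b = 0$; the sketch below plugs it into the MSE above. These weights are illustrative only: gradient descent from a random start may land on a different solution, or none.

```python
import itertools

def relu(z):
    return max(0, z)

def network(x1, x2, p):
    """Two ReLU hidden units followed by a linear output, as in the MSE above."""
    h1 = relu(p['w11'] * x1 + p['w12'] * x2 + p['b1'])
    h2 = relu(p['w21'] * x1 + p['w22'] * x2 + p['b2'])
    return p['w1'] * h1 + p['w2'] * h2 + p['b']

# One hand-picked solution (hypothetical values, not the notebook's result).
params = {'w11': 1, 'w12': 1, 'b1': 0,
          'w21': 1, 'w22': 1, 'b2': -1,
          'w1': 1, 'w2': -2, 'b': 0}

outputs = {x: network(*x, params) for x in itertools.product([0, 1], repeat=2)}
loss = sum((outputs[(x1, x2)] - (x1 ^ x2)) ** 2 for x1, x2 in outputs) / 4
print(outputs)  # → {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(loss)     # → 0.0
```

The second hidden unit only fires on input $(1, 1)$, where it cancels the first unit's output of 2.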

In [8]:
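```python
import sympy
import random
sympy.init_printing(use_latex='mathjax')


class Xor(sympy.Function):
    """Symbolic XOR that evaluates only when both arguments are numbers."""
    @classmethod
    def eval(cls, x1, x2):
        if x1.is_Number and x2.is_Number:
            return sympy.Integer(int(x1 != x2))


class Var:
    """A sympy symbol paired with its current numeric value."""
    def __init__(self, symbol, value):
        self.s = sympy.Symbol(symbol)
        self.v = value

    @staticmethod
    def symbols(variables):
        return (Var(symbol, random.uniform(-1, 1)) for symbol in variables.split(' '))

    def set_diff(self, expr):
        # Symbolic partial derivative of expr with respect to this variable.
        self.diff = sympy.diff(expr, self.s)

    def set_rand(self):
        self.v = random.uniform(-1, 1)


def subs(expr, variables):
    """Evaluate expr at the current values of all variables."""
    return expr.subs([
        (var.s, var.v) for var in variables
    ])


def diff(variables, eta):
    """One gradient-descent step: evaluate every gradient first, then update."""
    grads = [subs(var.diff, variables) for var in variables]
    for var, grad in zip(variables, grads):
        var.v -= eta * grad
```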

In [9]:
```python
variables = list(Var.symbols('x_1 x_2 w_1 w_2 w_11 w_12 w_21 w_22 b_1 b_2 b'))
x_1, x_2, w_1, w_2, w_11, w_12, w_21, w_22, b_1, b_2, b = variables

model = (
    w_1.s * sympy.Max(0, w_11.s * x_1.s + w_12.s * x_2.s + b_1.s) +
    w_2.s * sympy.Max(0, w_21.s * x_1.s + w_22.s * x_2.s + b_2.s) + b.s
)
expr = (model - Xor(x_1.s, x_2.s)) ** 2
expr = sympy.Rational(1, 4) * sympy.Sum(sympy.Sum(expr, (x_1.s, 0, 1)), (x_2.s, 0, 1))

expr
```

$$\frac{1}{4}\sum_{x_2=0}^{1}\sum_{x_1=0}^{1}\left(b + w_1 \max(0,\, b_1 + w_{11} x_1 + w_{12} x_2) + w_2 \max(0,\, b_2 + w_{21} x_1 + w_{22} x_2) - x_1 \veebar x_2\right)^2$$

In [10]:
```python
# Evaluate the double sum over x_1, x_2 and simplify
expr = expr.doit().simplify()
expr
```

$$\frac{\left(b + w_1 \max(0,\, b_1) + w_2 \max(0,\, b_2)\right)^2}{4} + \frac{\left(b + w_1 \max(0,\, b_1 + w_{11} + w_{12}) + w_2 \max(0,\, b_2 + w_{21} + w_{22})\right)^2}{4} + \frac{\left(b + w_1 \max(0,\, b_1 + w_{11}) + w_2 \max(0,\, b_2 + w_{21}) - 1\right)^2}{4} + \frac{\left(b + w_1 \max(0,\, b_1 + w_{12}) + w_2 \max(0,\, b_2 + w_{22}) - 1\right)^2}{4}$$

In [11]:
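```python
# Symbolic partial derivative of the loss with respect to every variable.
# x_1 and x_2 were summed out above, so their derivatives are zero.
for var in variables:
    var.set_diff(expr)
```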
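For the `Max` terms, `sympy.diff` returns a `Heaviside` step function, which is how the ReLU derivative enters these symbolic gradients. A small check on a single symbol `z` (a name chosen just for this example):

```python
import sympy

z = sympy.Symbol('z')
relu_derivative = sympy.diff(sympy.Max(0, z), z)
print(relu_derivative)
print(relu_derivative.subs(z, 2))   # → 1
print(relu_derivative.subs(z, -2))  # → 0
```

The value of `Heaviside` at exactly 0 is a convention and depends on the sympy version, so it is not printed here.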

In [12]:
```python
# Train for a fixed number of iterations with learning rate eta
def train(iterations=500, eta=0.05):
    for var in variables:
        var.set_rand()

    for i in range(iterations):
        loss = subs(expr, variables)
        diff(variables, eta)

        # Print the loss every 11 iterations.
        if i % 11 == 0:
            print(loss)

    def trained_model(x):
        return (
            w_1.v * max(0, w_11.v * x[0] + w_12.v * x[1] + b_1.v) +
            w_2.v * max(0, w_21.v * x[0] + w_22.v * x[1] + b_2.v) + b.v
        )

    return trained_model
```
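The notebook differentiates the whole expression symbolically. As a point of comparison, the same gradients can be written out by hand with the chain rule (backpropagation), a swapped-in technique rather than what the notebook does. The sketch below, with hypothetical names `forward_backward` and `PARAMS`, computes the loss and gradient for the 2-2-1 ReLU network and compares every component against a central finite difference. It assumes the ReLU derivative is 0 at exactly 0, and the test point is chosen so no pre-activation is near 0.

```python
import itertools

PARAMS = ['w11', 'w12', 'b1', 'w21', 'w22', 'b2', 'w1', 'w2', 'b']

def forward_backward(p):
    """Return (MSE, gradient dict) over the four XOR inputs via the chain rule."""
    grad = {k: 0.0 for k in PARAMS}
    loss = 0.0
    for x1, x2 in itertools.product([0, 1], repeat=2):
        target = x1 ^ x2
        # Forward pass.
        z1 = p['w11'] * x1 + p['w12'] * x2 + p['b1']
        z2 = p['w21'] * x1 + p['w22'] * x2 + p['b2']
        h1, h2 = max(0.0, z1), max(0.0, z2)
        err = p['w1'] * h1 + p['w2'] * h2 + p['b'] - target
        loss += err ** 2 / 4
        # Backward pass: d(err^2 / 4) / d(output) = err / 2.
        g = err / 2
        grad['w1'] += g * h1
        grad['w2'] += g * h2
        grad['b'] += g
        g1 = g * p['w1'] * (1.0 if z1 > 0 else 0.0)  # through hidden ReLU 1
        g2 = g * p['w2'] * (1.0 if z2 > 0 else 0.0)  # through hidden ReLU 2
        grad['w11'] += g1 * x1
        grad['w12'] += g1 * x2
        grad['b1'] += g1
        grad['w21'] += g2 * x1
        grad['w22'] += g2 * x2
        grad['b2'] += g2
    return loss, grad

# Arbitrary test point (hypothetical values).
p = {'w11': 0.6, 'w12': -0.4, 'b1': 0.3, 'w21': 0.8, 'w22': 0.9,
     'b2': -0.2, 'w1': 0.7, 'w2': -0.5, 'b': 0.1}
loss, grad = forward_backward(p)

h = 1e-6
numeric = {}
for k in PARAMS:
    up = dict(p, **{k: p[k] + h})
    down = dict(p, **{k: p[k] - h})
    numeric[k] = (forward_backward(up)[0] - forward_backward(down)[0]) / (2 * h)

max_err = max(abs(grad[k] - numeric[k]) for k in PARAMS)
print(loss, max_err)
```

A full training loop would then apply `p[k] -= eta * grad[k]` for every parameter, mirroring the notebook's `diff` helper.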

In [13]:
```python
trained_model = train()
```

```
0.265646446065377
0.224858254011097
0.206991861276136
0.190704149182786
0.181199984040078
0.174393451491669
0.167328360539738
0.163526028868485
0.154924305564428
0.146407625120289
0.134448533946257
0.121845513677617
0.110361501188768
0.0983061437253321
0.0861120076772425
0.0736537068454568
0.0617838854835395
0.0503109833076007
0.0381953811393830
0.0281041646397589
0.0209350569412672
0.0154467266138095
0.0111496916915322
0.00778730172723044
0.00530334174726595
0.00376824501705862
0.00258942186143672
0.00189570651881333
0.00126690378478501
0.000820948277708715
0.000578653842596901
0.000397286773700105
0.000276810189056663
0.000181024255154999
0.000117368107852272
8.05656907870844e-5
5.49147629686018e-5
3.69530191406392e-5
2.36920184911874e-5
1.63506687844512e-5
1.13638812082042e-5
7.30898060985584e-6
4.78150729183676e-6
3.16891761858790e-6
2.18022378660339e-6
1.48938479073708e-6
```


In [14]:
```python
for x in itertools.product([0, 1], repeat=2):
    print('input: {}, output: {}'.format(x, trained_model(x)))
```

```
input: (0, 0), output: 0.000702514318965952
input: (0, 1), output: 0.998703406850084
input: (1, 0), output: 0.998724427574680
input: (1, 1), output: 0.00102738435601935
```
