# Multi-Layer Perceptrons 

In [1]:
using Flux

In [2]:
# Layer 1: weights, bias, and forward pass

W1 = rand(3, 5)           # 3 nodes, each receiving 5 input values
b1 = rand(3)              # each node has its own bias term

layer1(x) = W1 * x .+ b1

layer1 (generic function with 1 method)

In [3]:
σ(0), σ(1), σ(-1)   # sigmoid function, for demonstration only

(0.5, 0.7310585786300049, 0.2689414213699951)
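
The values above follow from the logistic formula σ(x) = 1 / (1 + exp(-x)). A minimal sketch in plain Julia, where `my_sigmoid` is a hypothetical name introduced here for illustration and is not part of Flux:

```julia
# Hand-written logistic sigmoid; `my_sigmoid` is an illustrative name, not a Flux function.
my_sigmoid(x) = 1 / (1 + exp(-x))

vals = my_sigmoid.([0, 1, -1])   # broadcast over a vector with the dot syntax

# σ(x) + σ(-x) = 1, which is why the second and third values above sum to 1.
symmetry = my_sigmoid(1) + my_sigmoid(-1)
```

The dot in `my_sigmoid.(...)` is the same broadcasting syntax the notebook uses when it applies `σ` to a whole layer output.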

In [4]:
# activation of layer 1
l1(x) = σ.(layer1(x))

l1 (generic function with 1 method)

In [5]:
# Layer 2
W2 = rand(2, 3)            # 2 nodes, each receiving 3 input values (because layer 1 has 3 nodes)
b2 = rand(2) 

layer2(x) = W2 * x .+ b2

layer2 (generic function with 1 method)

In [6]:
# combine both layers into a model with a linear output (no activation on layer 2)
model(x) = layer2( l1(x) )      

model (generic function with 1 method)

In [7]:
model(rand(5))    

2-element Vector{Float64}:
 2.312653427595195
 2.3704494305669463
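
Because each layer is a matrix product followed by a broadcast bias addition, the same definitions also accept a batch of inputs stored as the columns of a matrix. A sketch under that assumption, re-creating the two layers with hypothetical names (`Wa`, `ba`, `Wb`, `bb`, `sigm`, `mlp`) so that it runs on its own without Flux:

```julia
# Self-contained re-creation of the two-layer model; all names here are illustrative.
sigm(x) = 1 / (1 + exp(-x))          # logistic sigmoid, written by hand

Wa, ba = rand(3, 5), rand(3)         # layer 1: 5 inputs -> 3 nodes
Wb, bb = rand(2, 3), rand(2)         # layer 2: 3 inputs -> 2 nodes

mlp(x) = Wb * sigm.(Wa * x .+ ba) .+ bb

X = rand(5, 4)                       # 4 samples, one per column
Y = mlp(X)                           # 2×4 matrix: one output column per sample

# Each column of the batch result matches the single-sample result.
same = all(mlp(X[:, j]) ≈ Y[:, j] for j in 1:4)
```

Storing samples as columns keeps the `W * x` shape rule unchanged: the number of rows of the input must equal the number of columns of the weight matrix.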

This produces predictions quickly, but the code does not look very tidy. Let's clean it up.

## Cleaning Up the Code

### Layer types

In [8]:
function linear(in, out)
  W = randn(out, in)
  b = randn(out)
  x -> W * x .+ b      # returns a function that takes a vector and produces another
end

linear (generic function with 1 method)

In [9]:
linear(5, 3)  # 5 input nodes, 3 output nodes in the current layer

#1 (generic function with 1 method)

In [10]:
linear1 = linear(5, 3) 
linear2 = linear(3, 2)

#1 (generic function with 1 method)

In [11]:
linear1.W   # the closure stores the captured W as a field

3×5 Matrix{Float64}:
  0.621917   0.614352  -1.30337    0.962886   0.59264
 -0.217677  -0.129524  -0.78067    0.0362436  0.173365
  0.154861  -1.34433    0.0551138  0.343842   0.201935
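
Reading `linear1.W` works because the closure keeps the variables it captured. A more explicit alternative is a callable struct that holds the parameters as named fields. The sketch below uses a hypothetical type named `Affine`; it illustrates the idea and is neither the notebook's code nor Flux's implementation:

```julia
# A layer as a callable struct; `Affine` is a hypothetical name for this sketch.
struct Affine
  W::Matrix{Float64}
  b::Vector{Float64}
end

# Convenience constructor mirroring linear(in, out): random initial parameters.
Affine(in::Integer, out::Integer) = Affine(randn(out, in), randn(out))

# Make instances callable: applying the layer computes W * x .+ b.
(m::Affine)(x) = m.W * x .+ m.b

a = Affine(5, 3)
y = a(rand(5))        # 3-element vector
```

With a struct, the parameters are part of the documented interface (`a.W`, `a.b`) rather than a side effect of how closures are represented.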

In [12]:
model(x) = linear2( σ.(linear1(x)) )

model (generic function with 1 method)

In [13]:
x = rand(5) 

5-element Vector{Float64}:
 0.90204927141224
 0.7822200452431758
 0.9958984648448636
 0.6233980063572192
 0.6151692965125259

In [14]:
model(x)

2-element Vector{Float64}:
 -0.7238180153007114
  2.1076271891491185

In [15]:
# the pipe operator `|>` can also be used
model_p(x) = σ.(linear1(x)) |> linear2

model_p (generic function with 1 method)

In [16]:
model_p(x)

2-element Vector{Float64}:
 -0.7238180153007114
  2.1076271891491185
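
The same pipeline can also be written with Julia's function composition operator `∘`, which builds the model without naming the argument. A sketch with hypothetical stand-in layers (`f1`, `f2`, `act`) so that it runs on its own:

```julia
# Stand-in layers for illustration; in the notebook these would be linear1 and linear2.
Wc, bc = randn(3, 5), randn(3)
Wd, bd = randn(2, 3), randn(2)
f1(x) = Wc * x .+ bc
f2(x) = Wd * x .+ bd
act(x) = 1 ./ (1 .+ exp.(-x))        # element-wise sigmoid on a vector

nested(x) = f2(act(f1(x)))           # nested calls
piped(x)  = x |> f1 |> act |> f2     # left-to-right pipe
composed  = f2 ∘ act ∘ f1            # right-to-left composition

v = rand(5)
```

All three forms apply the same operations in the same order, so `nested(v)`, `piped(v)`, and `composed(v)` return the same vector; the choice is a matter of readability.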

This is similar to how the model is built with Flux, whose layer types we have not used yet. Let's use Flux now.

## Layers with Flux

We restart the kernel because `layer1` was defined as a function earlier and the name is reused below.

In [1]:
using Flux 

In [2]:
layer1 = Dense(10, 5, σ)

Dense(10 => 5, σ)   # 55 parameters

In [3]:
model2 = Chain(
  Dense(10, 5, σ),    # input has shape (10,), layer has 5 nodes
  Dense(5, 2),
  softmax)            # softmax normalizes the 2 outputs to sum to 1

Chain(
  Dense(10 => 5, σ),                    # 55 parameters
  Dense(5 => 2),                        # 12 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 67 parameters, 524 bytes.
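
The parameter counts in this summary can be checked by hand: a dense layer with `in` inputs and `out` nodes has an `out × in` weight matrix plus `out` biases. A small sketch, where `dense_params` is a hypothetical helper and not a Flux function:

```julia
# Number of trainable parameters of a dense layer: weights plus biases.
dense_params(in, out) = out * in + out

p1 = dense_params(10, 5)    # 5*10 + 5 = 55
p2 = dense_params(5, 2)     # 2*5  + 2 = 12
total = p1 + p2             # 67; softmax contributes no parameters
```

The "4 arrays" in the summary correspond to one weight matrix and one bias vector per dense layer.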

In [4]:
model2(rand(10))  

2-element Vector{Float64}:
 0.729338298655391
 0.27066170134460904
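
The two outputs are non-negative and sum to 1 because the last stage of the chain is `softmax`. A sketch of the computation in plain Julia, where `my_softmax` is a hypothetical name and subtracting the maximum is a common numerical-stability step assumed here:

```julia
# Hand-written softmax; `my_softmax` is an illustrative name, not the NNlib function.
function my_softmax(z)
  e = exp.(z .- maximum(z))   # shift by the maximum to avoid overflow in exp
  e ./ sum(e)                 # normalize so the entries sum to 1
end

p = my_softmax([1.0, 0.0])    # two logits that differ by 1
```

For two classes, the first probability equals the sigmoid of the difference between the logits, so `p[1]` here matches the `σ(1)` value computed earlier in the notebook.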