# 3) Non-linear single layer network

## Theory

Jusqu'à présent, notre réseau est purement linéaire et seule une relation linéaire peut-être trouvée entre les entrées et les sorties. Ajouter d'autres couches après la première ne changerait rien à cela car on ne ferait que des opérations entre matrice; ce qui reste linéaire. Il faut donc directement ajouter une fonction non-linéaire à notre couche. Pour cela, on applique une fonction non-linéaire après avoir appliqué nos poids et notre biais aux entrées du neurone. Dans notre cas, on aimerait garder des sorties qui sont définies sur l'ensemble des réelles positifs, on utilisera donc la fonction $\mathrm{LReLu}$ (Leaky Rectified Linear Unit) définie comme suit

So far, our network is purely linear and only a linear relationship can be found between inputs and outputs. Adding other layers after the first one wouldn't change anything because we'd only be doing operations between matrices, which is still linear. So we need to add a non-linear function directly to our layer. To do this, we apply a non-linear function after applying our weights and bias to the neuron's inputs. The general consensus is that it does not matter which non-linear function is used so the simpler the better. In our case, we'll use the function $\mathrm{LReLu}$ (Leaky Rectified Linear Unit) defined as follows

\begin{equation}
    \mathrm{LReLu}(x) = \begin{cases} x, & si \,\, x > 0 \\ 0.01x, & si \,\, x < 0 \end{cases}
\end{equation}

whose derivative is

\begin{equation}
    \frac{\partial \mathrm{LReLu}(x)}{\partial x} = \begin{cases} 1, & si \,\, x > 0 \\ 0.01, & si \,\, x < 0 \end{cases}
\end{equation}
which allows relatively fast numerical calculations. The output of a neuron before application of the non-linear function will be noted as

\begin{equation}
    z_i = (\mathbf{W} \mathbf{x} + \mathbf{b})_i = W_{i1} x_1 + W_{i2} x_2 + \cdots + W_{iN_0} x_{N_0} + b_i
\end{equation}

Each prediction is then

\begin{equation}
    y_i = \mathrm{LReLu}[z_i] = \mathrm{LReLu}(W_{i1} x_1 + W_{i2} x_2 + \cdots + W_{iN_0} x_{N_0} + b_i)
\end{equation}

The derivative of the errors with respect to their related weights is

\begin{aligned}
        \frac{\partial \epsilon_i}{\partial W_{ij}} &= \frac{\partial}{\partial W_{ij}} \left(\mathrm{LReLu}[W_{i1} x_1 + W_{i2} x_2 + \cdots + W_{iN_0} x_{N_0} + b_i] - \xi_i\right)^2 \\
        &= \begin{cases} 2\left( W_{i1} x_1 + W_{i2} x_2 + \cdots + W_{iN_0} x_{N_0} + b_i - \xi_i\right)x_j, & si \,\, z_i > 0 \\ 0.02\left( W_{i1} x_1 + W_{i2} x_2 + \cdots + W_{iN_0} x_{N_0} + b_i - \xi_i\right)x_j, & si \,\, z_i < 0 \end{cases} \\
        & \propto \begin{cases} x_j \delta_i, & si \,\, z_i > 0 \\ 0.01x_j \delta_i, & si \,\, z_i < 0 \end{cases}
\end{aligned}

And the weights and biases are updated by the following formulae

\begin{equation}
    W_{ij}' = \begin{cases} W_{ij} - \alpha x_j \delta_i, & si \,\, z_i > 0 \\ W_{ij} - 0.01 \alpha x_j \delta_i, & si \,\, z_i < 0 \end{cases}
\end{equation}

\begin{equation}
    b_i' = \begin{cases} b_i - \alpha \delta_i, & si \,\, z_i > 0 \\ b_i - 0.01 \alpha \delta_i, & si \,\, z_i < 0 \end{cases}
\end{equation}

## Julia code

First we define the LRelu function

In [3]:
function LReLu(p::Float64)::Float64
    if p > 0 
        return p
    else
        return 0.01*p
    end
end

LReLu (generic function with 1 method)

We can then define the structure of our neural network.

In [4]:
mutable struct NonLinearOneLayerModel
    W::Array{Float64,2}
    b::Vector{Float64}
    σ::Function
end

(m::NonLinearOneLayerModel)(x::Array{Float64,1}) = m.σ.(m.W * x + m.b)

We instantiate our model and directly test it.

In [8]:
nInput = 5;
nOutput = 3;

model = NonLinearOneLayerModel(rand(nOutput,nInput), rand(nOutput), LReLu);

x = rand(nInput)
ξ = rand(nOutput)

z = model.W * x + model.b
y = model(x)
δ = y-ξ
ϵ = δ.^2
ϵTot = sum(ϵ)
println(y)
println(δ)
println(ϵ)
println(ϵTot)

[0.7655795569363301, 1.1562712185784825, 2.899367462918296]
[-0.08740704606510707, 0.6341811892530409, 1.9067158917902653]
[0.00763999170182775, 0.4021857808024013, 3.6355654920055467]
4.045391264509775


Then we define the function to train ou neural network.

In [9]:
function train!(model, x, ξ, iteration, α)
    sumϵVec = Vector{Float64}(undef,0)
    for χ in 1:iteration
        z = model.W * x + model.b
        y = model(x) #Compute the prediction from the model
        δ = y - ξ #Difference between the prediction and the expectation

        #Error computed with the mean squared
        ϵ = δ.^2
        push!(sumϵVec, sum(ϵ))

        #Update the weights
        for j in 1:length(x)
            for i in 1: length(ξ)
                if z[i] > 0
                    model.W[i,j] = model.W[i,j] -(α*x[j]*δ[i])
                    model.b[i] = model.b[i] - α*δ[i]
                else
                    model.W[i,j] = model.W[i,j] -(0.01*α*x[j]*δ[i])
                    model.b[i] = model.b[i] - 0.01*α*δ[i]
                end
            end
        end
        @show sum(ϵ)
    end
    return sumϵVec
end

train! (generic function with 1 method)

In [10]:
sumϵVec = train!(model, x, ξ, 30 , 0.1);

sum(ϵ) = 4.045391264509775
sum(ϵ) = 0.39764824819946165
sum(ϵ) = 0.03908747484658005
sum(ϵ) = 0.0038421662783628377
sum(ϵ) = 0.00037767192095501385
sum(ϵ) = 3.7123869594374905e-5
sum(ϵ) = 3.6491505383153174e-6
sum(ϵ) = 3.5869912799491655e-7
sum(ϵ) = 3.5258908360628806e-8
sum(ϵ) = 3.4658311708725877e-9
sum(ϵ) = 3.4067945559301106e-10
sum(ϵ) = 3.3487635643852545e-11
sum(ϵ) = 3.2917210680830958e-12
sum(ϵ) = 3.2356502234968344e-13
sum(ϵ) = 3.1805344883816444e-14
sum(ϵ) = 3.1263575729437787e-15
sum(ϵ) = 3.0731034747223623e-16
sum(ϵ) = 3.0207563417923686e-17
sum(ϵ) = 2.969301603303995e-18
sum(ϵ) = 2.918728720448692e-19
sum(ϵ) = 2.8689899270042054e-20
sum(ϵ) = 2.8201765917621368e-21
sum(ϵ) = 2.7720131894426154e-22
sum(ϵ) = 2.7253095135228936e-23
sum(ϵ) = 2.678619235173052e-24
sum(ϵ) = 2.6277038104192754e-25
sum(ϵ) = 2.5804392092829585e-26
sum(ϵ) = 2.5320586164847867e-27
sum(ϵ) = 2.5103033118329885e-28
sum(ϵ) = 2.354256764018957e-29
