Diverging loss for simple MLP #16

Closed

rejuvyesh opened this issue Jun 10, 2016 · 2 comments

rejuvyesh commented Jun 10, 2016

I tried implementing a fairly simple multi-layer perceptron in Knet.jl:

using Knet
using MNIST
using ProgressMeter

function onehot(labels; min=0, max=9)
    labels = round(Int, labels)
    res = zeros(Float64, max-min+1, length(labels))
    for (i, label) in enumerate(labels)
        res[label-min+1, i] = 1.0
    end
    return res
end

function train!(model, dataX, labels, loss, batchsize=32, steps=1000)
    @assert size(dataX,2) == size(labels,2) "$(size(dataX)) != $(size(labels))"
    @assert batchsize <= size(labels, 2)
    step = 0
    bs = 1
    prog = Progress(steps, 0.5)
    while step < steps
        be = bs + batchsize - 1
        step += 1

        input = dataX[:, bs:be]
        output = labels[:, bs:be]
        forw(model, input)
        back(model, output, loss)
        Knet.update!(model)
        ProgressMeter.update!(prog, step)
    end
end

function test(model, dataX, labels, loss)
    ypred = forw(model, dataX)
    lossval = loss(ypred, labels)
    return lossval
end

@knet function mlp_softmax(x, input_dim=784, output_dim=10, hidden_dim=64)
    W1 = par(init=Gaussian(0,0.001), dims=(hidden_dim,input_dim))
    b1 = par(init=Constant(0), dims=(hidden_dim,1))

    W2 = par(init=Gaussian(0, 0.001), dims=(output_dim, hidden_dim))
    b2 = par(init=Constant(0), dims=(output_dim,1))

    y = W2*tanh(W1 * x + b1) + b2
    return soft(y)
end

function main()
    trainX, trainY = traindata()
    testX, testY = testdata()
    trainX /= maximum(trainX)
    testX /= maximum(testX)

    model = compile(:mlp_softmax)
    setp(model; ls=0.001, adagrad=true)
    @show test(model, trainX, onehot(trainY), softloss)
    @show test(model, testX, onehot(testY), softloss)
    train!(model, trainX, onehot(trainY), softloss)
    @show test(model, trainX, onehot(trainY), softloss)
    @show test(model, testX, onehot(testY), softloss)
end

main()

One would expect the test loss to decrease after training for a few steps, but instead it diverges to a larger value:

test(model,trainX,onehot(trainY),softloss) = 2.300456469738548
test(model,testX,onehot(testY),softloss) = 2.30038973959615
Progress: 100% Time: 0:00:06
test(model,trainX,onehot(trainY),softloss) = 5.154692176004155
test(model,testX,onehot(testY),softloss) = 5.0927675134887505

Trying different optimization parameters did not seem to help.
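
As a quick sanity check, the ~2.30 values printed before training are consistent with a near-uniform 10-class softmax prediction (the weights are initialized with a tiny Gaussian and zero bias), whose loss is -log(1/10):

julia> -log(1/10)   # softmax loss of a uniform prediction over 10 classes
2.302585092994046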

@ozanarkancan
Collaborator

The keyword for the learning rate is "lr". You should correct the line where you set the optimization parameters as follows:
setp(model; lr=0.001, adagrad=true)

Here is the result:
test(model,trainX,onehot(trainY),softloss) = 2.302583841731204
test(model,testX,onehot(testY),softloss) = 2.3025815985486386
Progress: 100% Time: 0:00:06
test(model,trainX,onehot(trainY),softloss) = 1.6269797708233549
test(model,testX,onehot(testY),softloss) = 1.6429610465456255
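
For reference, only the setp call needs to change; a minimal sketch of the corrected setup (the rest of the script stays as in the original post):

model = compile(:mlp_softmax)
setp(model; lr=0.001, adagrad=true)   # "lr" (not "ls") is the learning-rate keyword
train!(model, trainX, onehot(trainY), softloss)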

@rejuvyesh
Author

/facepalm

Thanks 😄
