Diverging loss for simple MLP #16

Closed

rejuvyesh opened this issue Jun 10, 2016 · 2 comments

rejuvyesh commented Jun 10, 2016

I tried implementing a fairly simple multi-layer perceptron in Knet.jl:

using Knet
using MNIST
using ProgressMeter

function onehot(labels; min=0, max=9)
    labels = round(Int, labels)
    res = zeros(Float64, max-min+1, length(labels))
    for (i, label) in enumerate(labels)
        res[label-min+1, i] = 1.0
    end
    return res
end

function train!(model, dataX, labels, loss, batchsize=32, steps=1000)
    @assert size(dataX,2) == size(labels,2) "$(size(dataX)) != $(size(labels))"
    @assert batchsize <= size(labels, 2)
    step = 0
    bs = 1
    prog = Progress(steps, 0.5)
    while step < steps
        be = bs + batchsize - 1
        step += 1

        input = dataX[:, bs:be]
        output = labels[:, bs:be]
        forw(model, input)
        back(model, output, loss)
        Knet.update!(model)
        ProgressMeter.update!(prog, step)
    end
end

function test(model, dataX, labels, loss)
    ypred = forw(model, dataX)
    lossval = loss(ypred, labels)
    return lossval
end

@knet function mlp_softmax(x, input_dim=784, output_dim=10, hidden_dim=64)
    W1 = par(init=Gaussian(0,0.001), dims=(hidden_dim,input_dim))
    b1 = par(init=Constant(0), dims=(hidden_dim,1))

    W2 = par(init=Gaussian(0, 0.001), dims=(output_dim, hidden_dim))
    b2 = par(init=Constant(0), dims=(output_dim,1))

    y = W2*tanh(W1 * x + b1) + b2
    return soft(y)
end

function main()
    trainX, trainY = traindata()
    testX, testY = testdata()
    trainX /= maximum(trainX)
    testX /= maximum(testX)

    model = compile(:mlp_softmax)
    setp(model; ls=0.001, adagrad=true)
    @show test(model, trainX, onehot(trainY), softloss)
    @show test(model, testX, onehot(testY), softloss)
    train!(model, trainX, onehot(trainY), softloss)
    @show test(model, trainX, onehot(trainY), softloss)
    @show test(model, testX, onehot(testY), softloss)
end

main()

One would expect the test loss to decrease after training for a few steps, but instead it diverges to a larger value:

test(model,trainX,onehot(trainY),softloss) = 2.300456469738548
test(model,testX,onehot(testY),softloss) = 2.30038973959615
Progress: 100% Time: 0:00:06
test(model,trainX,onehot(trainY),softloss) = 5.154692176004155
test(model,testX,onehot(testY),softloss) = 5.0927675134887505

Trying different optimization parameters did not seem to help.
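
As a quick sanity check, the ~2.30 values printed before training are consistent with a near-uniform 10-class softmax prediction (the weights are initialized with a tiny Gaussian and zero bias), whose loss is -log(1/10):

julia> -log(1/10)   # softmax loss of a uniform prediction over 10 classes
2.302585092994046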

@ozanarkancan
Collaborator

The keyword for the learning rate is "lr". You should correct the line where you set the optimization parameters as follows:
setp(model; lr=0.001, adagrad=true)

Here is the result:
test(model,trainX,onehot(trainY),softloss) = 2.302583841731204
test(model,testX,onehot(testY),softloss) = 2.3025815985486386
Progress: 100% Time: 0:00:06
test(model,trainX,onehot(trainY),softloss) = 1.6269797708233549
test(model,testX,onehot(testY),softloss) = 1.6429610465456255
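
For reference, only the setp call needs to change; a minimal sketch of the corrected setup (the rest of the script stays as in the original post):

model = compile(:mlp_softmax)
setp(model; lr=0.001, adagrad=true)   # "lr" (not "ls") is the learning-rate keyword
train!(model, trainX, onehot(trainY), softloss)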

@rejuvyesh
Author

/facepalm

Thanks 😄
