
Problem migrating from DBN to lasagne NeuralNet: NaN for each epoch #35

Closed
bmilde opened this issue Feb 5, 2015 · 3 comments

bmilde commented Feb 5, 2015

Lasagne looks fantastic, thanks for integrating it into nolearn! However, I have trouble transitioning from nolearn's DBN to the new lasagne NeuralNet.

Here is what happens:

Done loading and transforming data, traindata size: 83.5334777832 MB
Distribution of classes in train data:
[[ 0.00000000e+00 5.82160000e+04]
[ 1.00000000e+00 5.12730000e+04]] 2
conf: momentum: 0.01 self.learn_rates: 0.01
fitting classifier... nolearn
InputLayer (None, 200) produces 200 outputs
DenseLayer (None, 50) produces 50 outputs
DenseLayer (None, 2) produces 2 outputs

Epoch Train loss Valid loss Train / Val Valid acc Dur
1 nan nan nan 45.05% 0.7s
2 nan nan nan 46.47% 0.6s
3 nan nan nan 45.77% 0.6s
4 nan nan nan 47.06% 0.6s
5 nan nan nan 47.07% 0.7s
6 nan nan nan 47.06% 0.7s
7 nan nan nan 47.08% 0.7s
8 nan nan nan 53.71% 0.7s
9 nan nan nan 47.05% 0.6s
10 nan nan nan 47.05% 0.6s
11 nan nan nan 47.05% 0.6s
12 nan nan nan 47.05% 0.7s
13 nan nan nan 47.05% 0.6s
14 nan nan nan 47.05% 0.6s

I tried fiddling with different learning rates (1, 0.1, 0.01, ... 0.0000001, even 0.0), momentum rates, different optimisers (sgd, nesterov, rmsprop ... every method that lasagne offers), input sizes, numbers of hidden units, and one or two hidden layers, all to no avail.

The mnist example from lasagne runs fine though.

Here is my DBN code, which runs fine on the same data and produces models with >90% accuracy (on an audio gender detection task):

            clf = DBN([X_train.shape[1], self.hid_layer_units, self.hid_layer_units, self._no_classes],
                    dropouts=self.dropouts,
                    learn_rates=self.learn_rates,
                    learn_rates_pretrain=self.learn_rates_pretrain,
                    minibatch_size=self.minibatch_size,
                    learn_rate_decays=self.learn_rate_decays,
                    learn_rate_minimums=self.learn_rate_minimums,
                    epochs_pretrain=self.pretrainepochs,
                    epochs=self.epochs,
                    momentum=self.momentum,
                    real_valued_vis=True,
                    use_re_lu=True,
                    verbose=1)

I've translated that into:

            clf = NeuralNet(
                    layers=[  # three layers: one hidden layer
                            ('input', layers.InputLayer),
                            ('hidden', layers.DenseLayer),
                            #('hidden', layers.DenseLayer),
                            ('output', layers.DenseLayer),
                            ],
                    # layer parameters:
                    input_shape=(None, X_train.shape[1]),
                    hidden_num_units=self.hid_layer_units,
                    output_num_units=self._no_classes,
                    output_nonlinearity=None,

                    eval_size=0.1,

                    # optimization method:
                    update=sgd,
                    update_learning_rate=self.learn_rates,
                    #update_momentum=self.momentum,

                    regression=False,
                    max_epochs=self.epochs,
                    verbose=1,
                    )

Is there anything obvious that I've missed here? How can I debug this?
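
As a first, generic debugging step (plain numpy checks, not specific to nolearn; y_train is a hypothetical name for the label array passed to fit), it can help to rule out problems in the data itself before suspecting the network:

    import numpy as np

    # neither of these should print True
    print np.isnan(X_train).any(), np.isinf(X_train).any()
    # dtype should be float32 and the value range should be reasonable
    print X_train.dtype, X_train.min(), X_train.max()
    # both classes should be present in the labels
    print np.bincount(y_train)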

bmilde (Author) commented Feb 6, 2015

I have put together this simple example fitting nolearn's NeuralNet on MNIST, which also doesn't work on my machine (nan for the losses, and valid accuracy does not improve). Could you try to run it?

from lasagne import layers
from lasagne import init

from lasagne.updates import sgd, nesterov_momentum
from nolearn.lasagne import NeuralNet

import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.utils import shuffle

DATA_PATH = '~/data'

mnist = fetch_mldata('MNIST original', data_home=DATA_PATH)

train = mnist.data[:60000].astype(np.float32)
train_labels = mnist.target[:60000].astype(np.int32)

train, train_labels = shuffle(train, train_labels, random_state=42)

print 'train.shape:', train.shape, 'train.dtype:', train.dtype, 'train_labels.dtype:', train_labels.dtype

clf = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, train.shape[1]),
    hidden_num_units=100,
    output_num_units=10,
    output_nonlinearity=None,

    update=nesterov_momentum,
    #update=sgd,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=False,
    max_epochs=1000,
    verbose=1,

    #W=init.Uniform()

    )

clf.fit(train, train_labels)

dnouri (Owner) commented Feb 8, 2015

I think what you're missing is output_nonlinearity=lasagne.nonlinearities.softmax. Sorry for the lack of proper documentation on this. But there's an MNIST example included in the tests if you want to have a look: https://github.com/dnouri/nolearn/blob/master/nolearn/tests/test_lasagne.py#L41-L91
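
For reference, a minimal sketch of that fix applied to the MNIST snippet from the previous comment (only the softmax import and the output_nonlinearity line change; everything else is as posted above):

    from lasagne import layers
    from lasagne.nonlinearities import softmax
    from lasagne.updates import nesterov_momentum
    from nolearn.lasagne import NeuralNet

    clf = NeuralNet(
        layers=[
            ('input', layers.InputLayer),
            ('hidden', layers.DenseLayer),
            ('output', layers.DenseLayer),
            ],
        input_shape=(None, train.shape[1]),
        hidden_num_units=100,
        output_num_units=10,
        # softmax maps the output layer to class probabilities, which the
        # classification objective expects; leaving it at None is what
        # produced the nan losses in this thread
        output_nonlinearity=softmax,

        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=0.9,

        regression=False,
        max_epochs=1000,
        verbose=1,
        )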

bmilde (Author) commented Feb 9, 2015

Ah, thanks, yes that was it! Apparently I looked into every other parameter besides output_nonlinearity... thanks again!
