Uniform predictions with DBN #105
Is there any progress or investigation on this?
I ran this with more sane datasets (Iris, MNIST, et al.) where I was running into problems. My main goal was to surface signal with those and see it learning something. As far as I'm concerned it works. I'll run this later, but I'd like to see something a little more reasonable. For one, I'd scale the values into a more reasonable range. Your layer sizes are disproportionate to your input feature size, and 5 labels with 2 examples isn't going to give you a balanced dataset, let alone any signal in your data. This example will require some cleaning up, which I will do here in the next day or two. I have other priorities at the moment.
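As an aside on the scaling advice above, here is a minimal plain-Java sketch of min-max scaling a feature column into [0, 1]. The class and method names are illustrative, not part of deeplearning4j or ND4J:

```java
import java.util.Arrays;

public class MinMaxScale {
    // Maps each value to (v - min) / (max - min); constant columns map to 0.
    public static double[] scale(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        // The alternating 1/2 inputs from the test below become alternating 0/1.
        System.out.println(Arrays.toString(scale(new double[]{1, 2, 1, 2})));
    }
}
```

On the alternating 1f/2f rows used in the test later in this thread, this would put the inputs into the range a binary-visible-unit RBM expects.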
I had a similar problem with Iris:

Actual Class 0 was predicted with Predicted 0 with count 47 times
Actual Class 0 was predicted with Predicted 1 with count 3 times
Actual Class 1 was predicted with Predicted 0 with count 5 times
Actual Class 1 was predicted with Predicted 1 with count 45 times
==========================F1 Scores========================================
0.9388254052617593

The output layers take on very similar values:

0.3333843909362641 : 0.33328239088833167 : 0.33333321817540423

It seems to report predicting them reliably, but the output layer takes on nearly uniform values. There is a mismatch: either the output layer values are wrong or the F1 score is wrong.
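The reported F1 can be sanity-checked against the printed confusion matrix. Here is a minimal sketch of the standard per-class F1 formula (the helper name is illustrative; which exact aggregation dl4j's Evaluation uses isn't shown in this thread):

```java
public class F1Check {
    // Per-class F1 from true positives, false positives, false negatives.
    public static double f1(int tp, int fp, int fn) {
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Class 0 counts from the confusion matrix in this thread: tp=47, fp=5, fn=3.
        System.out.println(f1(47, 5, 3)); // ~0.9216
    }
}
```

For class 0 above (tp=47, fp=5, fn=3) this gives 94/102 ≈ 0.9216, and the micro-averaged F1 (which equals accuracy here) is 92/100 = 0.92, which suggests the printed 0.9388 comes from a different aggregation.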
Hmm. I'll write a few unit tests and post them here.
Here's a baseline eval test: it increments properly, and the confusion matrix is fine. I'll also create a specific example with the DBN to show that it selects each guess properly as well.
Rather than "prove it works", do me a favor and step through the debugger on eval.eval and see for yourself. You'll see that it guesses the right indices just fine. I'm open to discussion either way.
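For readers following along: the evaluation being discussed reduces to an argmax per row, compared between the label matrix and the prediction matrix. A minimal plain-Java sketch of that idea (illustrative, not dl4j's actual Evaluation code):

```java
public class ArgmaxEval {
    // Index of the largest value in a row (ties go to the earliest index).
    static int argmax(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) best = i;
        }
        return best;
    }

    // counts[actualClass][predictedClass], comparing row-wise argmaxes.
    public static int[][] confusion(double[][] labels, double[][] predictions) {
        int nClasses = labels[0].length;
        int[][] counts = new int[nClasses][nClasses];
        for (int i = 0; i < labels.length; i++) {
            counts[argmax(labels[i])][argmax(predictions[i])]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A nearly flat row still has a well-defined argmax.
        System.out.println(argmax(new double[]{0.33338, 0.33328, 0.33333})); // prints 0
    }
}
```

Note that even a nearly flat row like 0.33338 : 0.33328 : 0.33333 has a well-defined argmax, which is how the confusion matrix and F1 can look fine while the printed probabilities are almost uniform.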
I guess I am running into two problems. I want to print out the output probabilities for each example, which I do (for Iris) with:

for (int i = 0; i < output.rows(); i++) {

but they all look the same:

0.3333339584580804 : 0.3333332336519369 : 0.33333280788998265

I am not confident I am printing out the correct thing here. Also, with the Iris example, if I add class 2 examples, it loses the ability to predict class 1 accurately:

Actual Class 0 was predicted with Predicted 0 with count 49 times
Actual Class 0 was predicted with Predicted 2 with count 1 times
Actual Class 1 was predicted with Predicted 0 with count 14 times
Actual Class 1 was predicted with Predicted 1 with count 15 times
Actual Class 1 was predicted with Predicted 2 with count 21 times
Actual Class 2 was predicted with Predicted 0 with count 1 times
Actual Class 2 was predicted with Predicted 1 with count 6 times
Actual Class 2 was predicted with Predicted 2 with count 43 times
==========================F1 Scores========================================
0.7897365529258937

Any idea what I am doing wrong, or what is going wrong? Essentially I would like to be able to make ROC curves for some two-class problems I have (hence the output layer activations are needed).
It looks like the output prediction probabilities are different, so it is predicting things correctly. The problem is there isn't a big difference between them, so a ROC curve would look like it's almost guessing. Is there any way to get the output probabilities for Iris to be more weighted towards the example, i.e. 0.999 : 0.00001 : 0.00001, etc.?

Also, I assume when I use the same dataset for train and test, it is showing the train example when doing the testing stage? If I run for loads of iterations, would it not overfit and predict quite well? (I know that's cheating normally!) Or does the dataset split automatically into train/validation/test? With my actual examples I split manually into train/test on a 70/30% basis.

Hope you can clarify that I am understanding this right.

Dean
It doesn't split automatically. I have a method on DataSet called splitTestAndTrain which does that. Part of my updating the examples will be including that usage in there. Other than that, the new Iris test I put out and linked to earlier does that exact thing.
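Conceptually, a test/train split like the one described boils down to a seeded shuffle of the example indices followed by a cut. A plain-Java sketch of that idea (names illustrative; this is not the DataSet API itself):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {
    // Returns {trainIndices, testIndices} after a seeded shuffle,
    // cutting at round(nSamples * trainFraction).
    public static List<List<Integer>> split(int nSamples, double trainFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < nSamples; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // seeded for reproducibility
        int cut = (int) Math.round(nSamples * trainFraction);
        List<List<Integer>> out = new ArrayList<>();
        out.add(new ArrayList<>(idx.subList(0, cut)));
        out.add(new ArrayList<>(idx.subList(cut, nSamples)));
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(100, 0.7, 123L);
        System.out.println(parts.get(0).size() + " train / " + parts.get(1).size() + " test");
    }
}
```

With trainFraction = 0.7 this reproduces the manual 70/30 split mentioned above; the seed makes repeated runs comparable.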
I was just trying to construct the simplest possible test to show the problem. I tried with 1000 examples of alternating values of 1 and 2:

final int nSamples = 1000;
INDArray input = Nd4j.create(nSamples, 614);
INDArray labels = Nd4j.create(nSamples, 5);
for (int i = 0; i < nSamples; i++) {
    INDArray row = Nd4j.create(1, 614);
    if (i % 2 == 0) {
        row.assign(1f);
        labels.put(i, 3, 1); // set the 4th column
    } else {
        row.assign(2f);
        labels.put(i, 1, 1); // set the 2nd column
    }
    input.putRow(i, row); // just realized it had a copy-and-paste error before; similar result either way
}

(Trials with different params: 10 or 100 hidden layer size; 10 or 100 iterations.)
I get the intent, which is why I haven't closed the issue or anything. I'd just like to take a crack at my interpretation of the same example.
Update on this: better examples are a huge priority and I will integrate a few simple ones. Unfortunately, I put this off due to a huge API upgrade coming. Since we fixed the problems with backprop being diluted, I have primarily diverted my efforts towards finishing up the 0.0.3.3 release. Part of that release will be a plug-and-play layer architecture which will allow people to build their own neural nets (including conv nets, recurrent nets, ...).

Once this new API upgrade is out, I will work on putting together some better examples. I am still primarily hesitant since things are so fast moving. This is looking more and more like a real neural net library with this release, and I'll feel a little more comfortable putting out some neat recipes since things will be a little more configurable.

One of my primary concerns has been making deeplearning4j feature complete. One of my top priorities has been to support every form of neural net architecture, allowing people to pick and choose their architecture. Hopefully this new layer API will reduce problems a bit.

Adam
I'm able to update the test to the latest 0.0.3.3 with the new API, and it produces exact predictions. One thing, though: I have to set the iteration count high. If I set it to something like 100 iterations, the results flicker among repeated runs (of the same binary and data, with a fixed RNG seed) because the label value differences are too small and are easily impacted by round-off errors. This seems to contradict the advice in http://deeplearning4j.org/troubleshootingneuralnets.html. Is this normal for this very simple problem?
The updated test code:

package org.deeplearning4j;
import org.apache.commons.math3.random.MersenneTwister;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.models.featuredetectors.rbm.RBM;
import org.deeplearning4j.nn.api.LayerFactory;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.layers.OutputLayer;
import org.deeplearning4j.nn.layers.factory.DefaultLayerFactory;
import org.deeplearning4j.nn.layers.factory.LayerFactories;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.api.activation.Activations;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;
public class DBNDummyTest {
    public static void main(String... args) throws Exception {
        final int nSamples = 1000;
        int nFeatures = 10;
        INDArray input = Nd4j.create(nSamples, nFeatures); // has to be at least two, or else the output layer gradient is a scalar and causes an exception
        INDArray labels = Nd4j.create(nSamples, 5);
        for (int i = 0; i < nSamples; i++) {
            INDArray row = Nd4j.create(1, nFeatures);
            if (i % 2 == 0) {
                row.assign(1f);
                labels.put(i, 3, 1); // set the 4th column
            } else {
                row.assign(2f);
                labels.put(i, 1, 1); // set the 2nd column
            }
            input.putRow(i, row);
        }
        DataSet trainingSet = new DataSet(input, labels);
        MersenneTwister gen = new MersenneTwister(123);
        final int iterations = 10000;
        LayerFactory layerFactory = LayerFactories.getFactory(RBM.class);
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .iterations(iterations)
                .weightInit(WeightInit.VI)
                .activationFunction(Activations.softMaxRows())
                .visibleUnit(RBM.VisibleUnit.BINARY)
                .hiddenUnit(RBM.HiddenUnit.SOFTMAX)
                .layerFactory(layerFactory)
                .constrainGradientToUnitNorm(true)
                // .lossFunction(LossFunctions.LossFunction.MCXENT)
                .optimizationAlgo(OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
                .rng(gen)
                .learningRate(1e-3f)
                .nIn(trainingSet.numInputs()).nOut(trainingSet.numOutcomes()).list(2)
                .hiddenLayerSizes(new int[]{10})
                .override(new NeuralNetConfiguration.ConfOverride() {
                    @Override
                    public void override(int i, NeuralNetConfiguration.Builder builder) {
                        if (i == 1) {
                            builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));
                            builder.iterations(iterations);
                            builder.weightInit(WeightInit.ZERO);
                            builder.activationFunction(Activations.softMaxRows());
                            builder.lossFunction(LossFunctions.LossFunction.MCXENT);
                        }
                    }
                }).build();
        MultiLayerNetwork nn = new MultiLayerNetwork(conf);
        nn.fit(trainingSet);
        INDArray predict2 = nn.output(trainingSet.getFeatureMatrix());
        for (int i = 0; i < 2; i++) {
            String actual = trainingSet.getLabels().getRow(i).toString().trim();
            String predicted = predict2.getRow(i).toString().trim();
            System.out.println("actual " + actual + " vs predicted " + predicted);
        }
        Evaluation eval = new Evaluation();
        eval.eval(trainingSet.getLabels(), predict2);
        System.out.println("F1: " + eval.f1());
    }
}
First, normalize the data to zero mean and unit variance. Any neural net will produce weird output on data that isn't between zero and 1, especially with a DBN and binary data. If you still don't get signal, changing the activation to tanh will help a lot as well. Your configuration for the activation function is also off: softmax rows should be for a classifier only. With regard to your weight initialization, change that to a smaller distribution. You can do this by changing the weight initialization to .DISTRIBUTION and specifying a distribution with .dist.
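A minimal plain-Java sketch of the zero-mean/unit-variance normalization suggested above, applied to one feature column (the class and method names are illustrative, not a deeplearning4j API):

```java
import java.util.Arrays;

public class Standardize {
    // z-score: (v - mean) / stddev, using the population standard deviation.
    public static double[] zScore(double[] values) {
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0;
        for (double v : values) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / values.length);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = std == 0 ? 0 : (values[i] - mean) / std; // constant columns map to 0
        }
        return out;
    }

    public static void main(String[] args) {
        // The alternating 1/2 inputs from the test above map to -1 and +1.
        System.out.println(Arrays.toString(zScore(new double[]{1, 2, 1, 2})));
    }
}
```

On the alternating 1f/2f inputs in the test above, the mean is 1.5 and the standard deviation is 0.5, so the features become exactly -1 and +1.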
Now with the recommended changes, I can get perfect prediction in 5 iterations! Though, why is this not a classifier? Only one of the output labels is 1, and the label differs depending on which of the two input feature vectors is given.
Great! It's still a classifier (only the output layer, though). Keep in mind the point of a DBN is to generate features using pretrained RBMs. The output layer (using multi-class cross entropy as the objective) then has a softmax activation, which is essentially logistic/softmax regression if you use zero for the weights. Your goal then is to propagate a feature set through the RBMs such that the logistic regression classifier can learn the features.
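To illustrate why the near-uniform 0.3333… outputs seen earlier can appear: a softmax output layer turns equal pre-activations into an exactly uniform distribution, so confident predictions require well-separated pre-activations. A minimal sketch (illustrative, not dl4j's implementation):

```java
import java.util.Arrays;

public class SoftmaxSketch {
    // Numerically stable softmax: subtract the max before exponentiating.
    public static double[] softmax(double[] preActivations) {
        double max = Double.NEGATIVE_INFINITY;
        for (double p : preActivations) max = Math.max(max, p);
        double sum = 0;
        double[] out = new double[preActivations.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.exp(preActivations[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Equal pre-activations -> exactly uniform 1/3 : 1/3 : 1/3.
        System.out.println(Arrays.toString(softmax(new double[]{0.5, 0.5, 0.5})));
        // Well-separated pre-activations -> a confident prediction.
        System.out.println(Arrays.toString(softmax(new double[]{10, 0, 0})));
    }
}
```

This is why normalizing the inputs and using a smaller weight distribution helps: it lets training drive the output layer's pre-activations apart instead of leaving them nearly equal.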
I'm going to close this since it appears to be resolved. Please feel free to resume the conversation on the mailing list about tuning this particular example; I'd be more than glad to help if you put it up as a GitHub gist/repo. I'm aware the docs need to be improved and will continue working on this. I know it's a little daunting to tune neural nets, and the suggestions I just gave probably seem like they come from a black box or some sort of weird voodoo. There are definitely best practices with neural nets that I hope to establish. I will continue working on examples and the like.
With the latest 0.0.3.3-SNAPSHOT, the problem with the zeroed output layer weight matrix is gone. However, the problem of uniform prediction still remains. I modified my previous dummy test to have two different inputs/outputs: a vector of 100 mapped to the 4th label, and a vector of 200 mapped to the 2nd label. After training, both input vectors produced exactly the same prediction (to all decimals), even though the values of the different labels within the same prediction look random.
The output:
The test code: