Uniform predictions with DBN #105

Closed
winstonquock opened this issue Dec 21, 2014 · 18 comments
@winstonquock

With the latest 0.0.3.3-SNAPSHOT, the problem with the zeroed output layer weight matrix is gone. However, the problem of uniform predictions remains. I modified my previous dummy test to use two different inputs/outputs: a vector of 100s mapped to the 4th label, and a vector of 200s mapped to the 2nd label. After training, both input vectors produce exactly the same prediction (to all decimal places), even though the values of the different labels within a single prediction look random.

The output:

actual [0.0 ,0.0 ,0.0 ,1.0 ,0.0] vs predicted [0.044186138400415305 ,0.43372079239937705 ,0.044186138400415305 ,0.43372079239937705 ,0.044186138400415305]
actual [0.0 ,1.0 ,0.0 ,0.0 ,0.0] vs predicted [0.044186138400415305 ,0.43372079239937705 ,0.044186138400415305 ,0.43372079239937705 ,0.044186138400415305]
F1: 0.6

The test code:

package org.deeplearning4j;

import org.apache.commons.math3.random.MersenneTwister;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.models.classifiers.dbn.DBN;
import org.deeplearning4j.models.featuredetectors.rbm.RBM;
import org.deeplearning4j.nn.WeightInit;
import org.deeplearning4j.nn.api.NeuralNetwork;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.nd4j.linalg.api.activation.Activations;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DBNDummyTest {
    public static void main(String... args) throws Exception {
        INDArray input = Nd4j.create(2, 614); // must be at least two rows, or the output layer gradient is a scalar and causes an exception
        INDArray labels = Nd4j.create(2, 5);

        INDArray row0 = Nd4j.create(1, 614);
        row0.assign(100f);
        input.putRow(0, row0);
        labels.put(0, 3, 1); // set the 4th column

        INDArray row1 = Nd4j.create(1, 614);
        row1.assign(200f);
        input.putRow(1, row1);
        labels.put(1, 1, 1); // set the 2nd column

        DataSet trainingSet = new DataSet(input, labels);

        MersenneTwister gen = new MersenneTwister(123);

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .iterations(500)
            .weightInit(WeightInit.VI)
            .activationFunction(Activations.tanh())
            .visibleUnit(RBM.VisibleUnit.GAUSSIAN)
            .hiddenUnit(RBM.HiddenUnit.RECTIFIED)
            .constrainGradientToUnitNorm(false)
            .lossFunction(LossFunctions.LossFunction.XENT)
            .optimizationAlgo(NeuralNetwork.OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
            .rng(gen)
            .learningRate(1e-2f)
            .nIn(trainingSet.numInputs()).nOut(trainingSet.numOutcomes()).list(2)
            .hiddenLayerSizes(new int[] {100})
            .override(new NeuralNetConfiguration.ConfOverride() {
                @Override
                public void override(int i, NeuralNetConfiguration.Builder builder) {
                    if (i == 1) {
                        builder.iterations(500);
                        builder.weightInit(WeightInit.ZERO);
                        builder.activationFunction(Activations.softMaxRows());
                        builder.lossFunction(LossFunctions.LossFunction.MCXENT);
                    }
                }
            }).build();

        DBN nn = new DBN.Builder().layerWiseConfiguration(conf).build();

        nn.fit(trainingSet);
        INDArray predict2 = nn.output(trainingSet.getFeatureMatrix());
        for (int i = 0; i < predict2.rows(); i++) {
            String actual = trainingSet.getLabels().getRow(i).toString().trim();
            String predicted = predict2.getRow(i).toString().trim();
            System.out.println("actual "+actual+" vs predicted " + predicted);
        }
//        for (int row = 0; row < nn.getOutputLayer().getW().rows(); row++) {
//            System.out.println(nn.getOutputLayer().getW().getRow(row).toString().trim());
//        }
        Evaluation eval = new Evaluation();
        eval.eval(trainingSet.getLabels(), predict2);
        System.out.println("F1: " + eval.f1());
    }
}
@winstonquock
Author

Is there any progress on / investigation into this?

@agibsonccc
Contributor

I ran this with more sane datasets (Iris, MNIST, et al.) where I was running into problems. My main goal was to surface signal with those and see it learning something. As far as I'm concerned, it works. I'll run this later, but I'd like to see something a little more reasonable. For one, I'd scale the values into a more reasonable range. Your layer sizes are also disproportionate to your input feature size, and 5 labels with 2 examples isn't going to give you a balanced dataset, let alone any signal in your data. This example will require some cleanup, which I will do here in the next day or two. I have other priorities at the moment.

@mrdeanplumbley

I had a similar problem with Iris:

Actual Class 0 was predicted with Predicted 0 with count 47 times
Actual Class 0 was predicted with Predicted 1 with count 3 times
Actual Class 1 was predicted with Predicted 0 with count 5 times
Actual Class 1 was predicted with Predicted 1 with count 45 times

==========================F1 Scores========================================
0.9388254052617593

The output layers take on very similar values:

0.3333843909362641 : 0.33328239088833167 : 0.33333321817540423
0.3333900596933137 : 0.3332767235230874 : 0.3333332167835989
0.3333682782268913 : 0.33329850177644393 : 0.3333332199966647
0.3332613866156246 : 0.33340539399987584 : 0.33333321938449956
0.33337174109526074 : 0.33329503919552605 : 0.33333321970921326

It seems to report reliable predictions, but the output layer takes on nearly uniform values.

There is a mismatch: either the output layer values are wrong or the F1 score is wrong.
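
Worth noting for this mismatch: the F1 score is computed from the argmax of each output row, so near-uniform probabilities with tiny differences can still pick the correct class. A toy illustration using standard ND4J calls (recent versions; the values are made up from the output above):

    INDArray out = Nd4j.create(new double[] {0.33338, 0.33328, 0.33333}, new int[] {1, 3});
    // Tiny differences still decide the prediction:
    int predictedClass = Nd4j.argMax(out, 1).getInt(0); // 0, despite near-uniform probabilities

So both observations can hold at once: the evaluation selects the right indices while the probabilities stay nearly uniform.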

@agibsonccc
Contributor

Hmm. I'll write a few unit tests and post them here.

@agibsonccc
Contributor

Here's a baseline eval test:

e5855af

It increments properly, and the confusion matrix is fine. I'll also create a specific example with the DBN to show that it selects each guess properly.

@agibsonccc
Contributor

Rather than "prove it works", do me a favor and step through the debugger on eval.eval here:
https://github.com/SkymindIO/deeplearning4j/blob/master/deeplearning4j-core/src/test/java/org/deeplearning4j/models/classifiers/dbn/DBNTest.java#L37

and see for yourself.

You'll see that it guesses the right indices just fine. I'm open to discussion either way.

@mrdeanplumbley

I guess I am running into two problems.

I want to print out the output probabilities for each example, which I do (for Iris) with:

for (int i = 0; i < output.rows(); i++) {
    System.out.println(output.getRow(i).getDouble(0) + " : "
            + output.getRow(i).getDouble(1) + " : "
            + output.getRow(i).getDouble(2));
}

but they all look the same:

0.3333339584580804 : 0.3333332336519369 : 0.33333280788998265
0.3333340804352555 : 0.3333334552329316 : 0.33333246433181296
0.333334201825926 : 0.3333333493528615 : 0.3333324488212126
0.33333426885848344 : 0.3333333646628999 : 0.3333323664786166
0.3333340205467134 : 0.33333317499671417 : 0.3333328044565723
0.3333337366382527 : 0.3333330519019273 : 0.3333332114598199

I am not confident I am printing out the correct thing here.

Also, with the Iris example, if I add class-2 examples by increasing

DataSet next = iter.next(100);

to

DataSet next = iter.next(150);

it loses the ability to predict class 1 accurately:

Actual Class 0 was predicted with Predicted 0 with count 49 times
Actual Class 0 was predicted with Predicted 2 with count 1 times
Actual Class 1 was predicted with Predicted 0 with count 14 times
Actual Class 1 was predicted with Predicted 1 with count 15 times
Actual Class 1 was predicted with Predicted 2 with count 21 times
Actual Class 2 was predicted with Predicted 0 with count 1 times
Actual Class 2 was predicted with Predicted 1 with count 6 times
Actual Class 2 was predicted with Predicted 2 with count 43 times

==========================F1 Scores========================================
0.7897365529258937

Any idea what I am doing wrong / what is going wrong?

Essentially, I would like to be able to make ROC curves for some two-class problems I have (hence the need for the output layer activations). A sketch of how that could be done follows.
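
A minimal sketch of building ROC points by sweeping a decision threshold over the positive-class probability. This assumes a two-class problem where column 1 of the output matrix holds the positive-class probability, and a one-hot labels matrix alongside the output from the loop above (both names are illustrative):

    // Sweep thresholds and collect (FPR, TPR) points for a ROC curve
    for (double t = 0.0; t <= 1.0; t += 0.05) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < output.rows(); i++) {
            boolean predictedPositive = output.getRow(i).getDouble(1) >= t;
            boolean actuallyPositive = labels.getRow(i).getDouble(1) > 0.5;
            if (predictedPositive && actuallyPositive) tp++;
            else if (predictedPositive && !actuallyPositive) fp++;
            else if (!predictedPositive && actuallyPositive) fn++;
            else tn++;
        }
        double tpr = (tp + fn) == 0 ? 0 : (double) tp / (tp + fn);
        double fpr = (fp + tn) == 0 ? 0 : (double) fp / (fp + tn);
        System.out.println("threshold " + t + " -> FPR " + fpr + ", TPR " + tpr);
    }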

@mrdeanplumbley

It looks like the output prediction probabilities are different after all, so it is predicting things correctly.

The problem is that there isn't a big difference between them, so a ROC curve would look like it is almost guessing.

Is there any way to get the output probabilities for Iris to be weighted more heavily towards the correct class?

i.e.

0.999 : 0.00001 : 0.00001, etc.

Also, I assume that when I use the same dataset,

DataSet next = iter.next(100)

for both train and test, it is being shown the training examples during the testing stage? If I run for loads of iterations, would it not overfit and predict quite well? (I know that's normally cheating!)

Or does the dataset split automatically into train/validation/test?

With my actual examples I split manually into train/test on a 70/30 basis.

Hope you can clarify whether I am understanding this right.

Dean

@agibsonccc
Contributor

It doesn't split automatically. I have a method on DataSet called splitTestAndTrain which does that; a rough sketch of the usage is below. Part of updating the examples will be including that usage in there. Other than that, the new Iris test I put out and linked to earlier does that exact thing.
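
The sketch (the exact splitTestAndTrain signature has varied between versions; here assuming the variant that takes the number of examples to keep for training, with SplitTestAndTrain from org.nd4j.linalg.dataset):

    DataSet next = iter.next(150);
    next.shuffle();
    SplitTestAndTrain split = next.splitTestAndTrain(105); // roughly a 70/30 split
    DataSet train = split.getTrain();
    DataSet test = split.getTest();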

@winstonquock
Author

I was just trying to construct the simplest possible test that shows the problem. I tried with 1000 examples of alternating values of 1 and 2:

        final int nSamples = 1000;
        INDArray input = Nd4j.create(nSamples, 614);
        INDArray labels = Nd4j.create(nSamples, 5);

        for (int i = 0; i < nSamples; i++) {
            INDArray row = Nd4j.create(1, 614);
            if (i % 2 == 0) {
                row.assign(1f);
                labels.put(i, 3, 1); // set the 4th column
            } else {
                row.assign(2f);
                labels.put(i, 1, 1); // set the 2nd column
            }
            input.putRow(i, row); // just realized there was a copy-and-paste error here before; the result is similar either way
        }

(Trials with different params: hidden layer size of 10 or 100; 10 or 100 iterations.)
Still the same or worse results:

actual [0.0 ,0.0 ,0.0 ,1.0 ,0.0] vs predicted [0.2 ,0.2 ,0.2 ,0.2 ,0.2]
actual [0.0 ,1.0 ,0.0 ,0.0 ,0.0] vs predicted [0.2 ,0.2 ,0.2 ,0.2 ,0.2]
actual [0.0 ,0.0 ,0.0 ,1.0 ,0.0] vs predicted [0.2 ,0.2 ,0.2 ,0.2 ,0.2]
actual [0.0 ,1.0 ,0.0 ,0.0 ,0.0] vs predicted [0.2 ,0.2 ,0.2 ,0.2 ,0.2]
F1: 0.0

@agibsonccc
Contributor

I get the intent, which is why I haven't closed the issue or anything. I'd just like to take a crack at my own interpretation of the same example.

@agibsonccc
Contributor

Update on this: better examples are a huge priority and I will integrate a few simple ones. Unfortunately, I put this off due to a huge API upgrade that is coming. Since we fixed the problems with backprop being diluted, I have primarily diverted my efforts towards finishing up the 0.0.3.3 release. Part of that release will be a plug-and-play layer architecture which will allow people to build their own neural nets (including conv nets, recurrent nets, ...). Once this new API upgrade is out, I will work on putting together some better examples. I am still hesitant since things are moving so fast. This is looking more and more like a real neural net library with this release, and I'll feel more comfortable putting out some neat recipes once things are a little more configurable.

One of my primary concerns has been making deeplearning4j feature complete. One of my top priorities has been to support every form of neural net architecture, allowing people to pick and choose their architecture. Hopefully this new layer API will reduce problems a bit.

Adam

@winstonquock
Author

I was able to update the test to the latest 0.0.3.3 with the new API, and it produces correct predictions. One thing, though: I have to set the iteration count high. If I set it to something like 100 iterations, the results flicker across repeated runs (of the exact same binary and data, with a fixed RNG seed) because the differences between label values are too small and easily affected by round-off errors. This seems to contradict the advice at http://deeplearning4j.org/troubleshootingneuralnets.html. Is this normal for such a simple problem?

o.d.o.s.BaseOptimizer - Error at iteration 9999 was 0.10035656083660333
actual [0.0 ,0.0 ,0.0 ,1.0 ,0.0] vs predicted [0.022635427226149932 ,0.33985780191966286 ,0.022635427226149932 ,0.5922359164018873 ,0.022635427226149932]
actual [0.0 ,1.0 ,0.0 ,0.0 ,0.0] vs predicted [0.017826180473760635 ,0.6189595273749603 ,0.017826180473760635 ,0.3275619312037576 ,0.017826180473760635]
F1: 1.0

The updated test code:

package org.deeplearning4j;

import org.apache.commons.math3.random.MersenneTwister;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.models.featuredetectors.rbm.RBM;
import org.deeplearning4j.nn.api.LayerFactory;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.layers.OutputLayer;
import org.deeplearning4j.nn.layers.factory.DefaultLayerFactory;
import org.deeplearning4j.nn.layers.factory.LayerFactories;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.api.activation.Activations;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DBNDummyTest {
    public static void main(String... args) throws Exception {
        final int nSamples = 1000;
        int nFeatures = 10;
        INDArray input = Nd4j.create(nSamples, nFeatures); // must be at least two rows, or the output layer gradient is a scalar and causes an exception
        INDArray labels = Nd4j.create(nSamples, 5);

        for (int i = 0; i < nSamples; i++) {
            INDArray row = Nd4j.create(1, nFeatures);
            if (i % 2 == 0) {
                row.assign(1f);
                labels.put(i, 3, 1); // set the 4th column
            } else {
                row.assign(2f);
                labels.put(i, 1, 1); // set the 2nd column
            }
            input.putRow(i, row);
        }

        DataSet trainingSet = new DataSet(input, labels);

        MersenneTwister gen = new MersenneTwister(123);
        final int iterations = 10000;
        LayerFactory layerFactory = LayerFactories.getFactory(RBM.class);
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .iterations(iterations)
            .weightInit(WeightInit.VI)
            .activationFunction(Activations.softMaxRows())
            .visibleUnit(RBM.VisibleUnit.BINARY)
            .hiddenUnit(RBM.HiddenUnit.SOFTMAX)
            .layerFactory(layerFactory)
            .constrainGradientToUnitNorm(true)
//            .lossFunction(LossFunctions.LossFunction.MCXENT)
            .optimizationAlgo(OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
            .rng(gen)
            .learningRate(1e-3f)
            .nIn(trainingSet.numInputs()).nOut(trainingSet.numOutcomes()).list(2)
            .hiddenLayerSizes(new int[]{10})
            .override(new NeuralNetConfiguration.ConfOverride() {
                @Override
                public void override(int i, NeuralNetConfiguration.Builder builder) {
                    if (i == 1) {
                        builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));
                        builder.iterations(iterations);
                        builder.weightInit(WeightInit.ZERO);
                        builder.activationFunction(Activations.softMaxRows());
                        builder.lossFunction(LossFunctions.LossFunction.MCXENT);
                    }
                }
            }).build();

        MultiLayerNetwork nn = new MultiLayerNetwork(conf);

        nn.fit(trainingSet);
        INDArray predict2 = nn.output(trainingSet.getFeatureMatrix());
        for (int i = 0; i < 2; i++) {
            String actual = trainingSet.getLabels().getRow(i).toString().trim();
            String predicted = predict2.getRow(i).toString().trim();
            System.out.println("actual "+actual+" vs predicted " + predicted);
        }
        Evaluation eval = new Evaluation();
        eval.eval(trainingSet.getLabels(), predict2);
        System.out.println("F1: " + eval.f1());
    }
}

@agibsonccc
Contributor

First, normalize the data to zero mean and unit variance. Any neural net will produce weird output on data that isn't between zero and one, especially a DBN on binary data. If you still don't get signal, changing the activation to tanh will help a lot as well. For the hidden unit, I would recommend keeping it binary.

Your configuration for the activation function is also off: softmax rows should be used for a classifier (the output layer) only.

As for your weight initialization, change that to a smaller distribution. You can do this by changing the weight initialization to .DISTRIBUTION and specifying a distribution with .dist. A sketch of these changes follows.
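
A minimal sketch of the normalization step against the test above, applied to input before it is wrapped in the DataSet. The row-vector ops are standard ND4J; the builder changes are left as comments since the exact .dist() argument type varied across early versions:

        // Normalize each feature column to zero mean and unit variance
        INDArray mean = input.mean(0);
        INDArray std = input.std(0).addi(Nd4j.EPS_THRESHOLD); // guard against divide-by-zero on constant columns
        input.subiRowVector(mean).diviRowVector(std);

        // Then, per the advice above, in the builder:
        //   .activationFunction(Activations.tanh())
        //   .hiddenUnit(RBM.HiddenUnit.BINARY)
        //   .weightInit(WeightInit.DISTRIBUTION)
        //   .dist(...) // a small-sigma normal distribution; the argument type depends on the version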

@winstonquock
Author

Now, with the recommended changes, I can get a perfect prediction in 5 iterations! Though, why is this not a classifier? Only one of the output labels is 1, and which label it is depends on which of the two input feature vectors is presented.

@agibsonccc
Contributor

Great! It's still a classifier (only the output layer, though). Keep in mind the point of a DBN is to generate features using pretrained RBMs. The output layer (using multi-class cross entropy as the objective) then has a softmax activation, which is essentially logistic/softmax regression if you initialize the weights to zero. Your goal then is to propagate a feature set through the RBMs such that the logistic regression classifier can learn from those features.
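
For reference, what that output layer computes for a row of hidden features h (a standard statement of softmax plus multi-class cross entropy, not library code): the probability of class j is

    p_j = exp(w_j^T h + b_j) / sum_k exp(w_k^T h + b_k)

and MCXENT minimizes L = -sum_j y_j log p_j over one-hot labels y. Note that with all-zero weights and biases, every p_j starts out at exactly 1/K; that is precisely the uniform 0.2 and 0.333... outputs seen earlier in this thread when training failed to move the weights.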

@agibsonccc
Contributor

I'm going to close this since it appears to be resolved. Please feel free to resume the conversation on the mailing list about tuning this particular example; I'd be more than glad to help if you put it up as a GitHub gist/repo. I'm aware the docs need to be improved and will continue working on that. I know it's a little daunting to tune neural nets, and the suggestions I just gave probably seem like they come from a black box or some sort of weird voodoo. There are definitely best practices with neural nets that I hope to establish. I will continue working on examples and the like.
