Uniform predictions with DBN #105
Is there any progress or investigation on this?
I ran this with more sane datasets (Iris, MNIST, et al.) where I was running into problems. My main goal was to surface signal with those and see it learning something. As far as I'm concerned it works. I'll run this later, but I'd like to see something a little more reasonable. For one, I'd scale the values into a more reasonable range. Your layer sizes are disproportionate to your input feature size, and 5 labels with 2 examples isn't going to give you a balanced dataset, let alone any signal in your data. This example will require some cleaning up, which I will do here in the next day or two. I have other priorities at the moment.
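As an aside on the scaling advice above, here is a minimal plain-Java sketch of min-max scaling a feature column into [0, 1]. The class and method names are illustrative, not part of deeplearning4j or ND4J:

```java
import java.util.Arrays;

public class MinMaxScale {
    // Maps each value to (v - min) / (max - min); constant columns map to 0.
    public static double[] scale(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        // The alternating 1/2 inputs from the test below become alternating 0/1.
        System.out.println(Arrays.toString(scale(new double[]{1, 2, 1, 2})));
    }
}
```

On the alternating 1f/2f rows used in the test later in this thread, this would put the inputs into the range a binary-visible-unit RBM expects.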
I had a similar problem with Iris:

Actual Class 0 was predicted with Predicted 0 with count 47 times
Actual Class 0 was predicted with Predicted 1 with count 3 times
Actual Class 1 was predicted with Predicted 0 with count 5 times
Actual Class 1 was predicted with Predicted 1 with count 45 times
==========================F1 Scores========================================
0.9388254052617593

The output layers take on very similar values:

0.3333843909362641 : 0.33328239088833167 : 0.33333321817540423

It seems to report predicting them reliably, but the output layer takes on nearly uniform values. There is a mismatch: either the output layer values are wrong or the F1 score is wrong.
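The reported F1 can be sanity-checked against the printed confusion matrix. Here is a minimal sketch of the standard per-class F1 formula (the helper name is illustrative; which exact aggregation dl4j's Evaluation uses isn't shown in this thread):

```java
public class F1Check {
    // Per-class F1 from true positives, false positives, false negatives.
    public static double f1(int tp, int fp, int fn) {
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Class 0 counts from the confusion matrix in this thread: tp=47, fp=5, fn=3.
        System.out.println(f1(47, 5, 3)); // ~0.9216
    }
}
```

For class 0 above (tp=47, fp=5, fn=3) this gives 94/102 ≈ 0.9216, and the micro-averaged F1 (which equals accuracy here) is 92/100 = 0.92, which suggests the printed 0.9388 comes from a different aggregation.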
Hmm. I'll write a few unit tests and post them here.
Here's a baseline eval test: it increments properly, and the confusion matrix is fine. I'll also create a specific example with the DBN to show that it selects each guess properly as well.
Rather than "prove it works", do me a favor and step through the debugger on eval.eval and see for yourself. You'll see that it guesses the right indices just fine. I'm open to discussion either way.
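For readers following along: the evaluation being discussed reduces to an argmax per row, compared between the label matrix and the prediction matrix. A minimal plain-Java sketch of that idea (illustrative, not dl4j's actual Evaluation code):

```java
public class ArgmaxEval {
    // Index of the largest value in a row (ties go to the earliest index).
    static int argmax(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) best = i;
        }
        return best;
    }

    // counts[actualClass][predictedClass], comparing row-wise argmaxes.
    public static int[][] confusion(double[][] labels, double[][] predictions) {
        int nClasses = labels[0].length;
        int[][] counts = new int[nClasses][nClasses];
        for (int i = 0; i < labels.length; i++) {
            counts[argmax(labels[i])][argmax(predictions[i])]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A nearly flat row still has a well-defined argmax.
        System.out.println(argmax(new double[]{0.33338, 0.33328, 0.33333})); // prints 0
    }
}
```

Note that even a nearly flat row like 0.33338 : 0.33328 : 0.33333 has a well-defined argmax, which is how the confusion matrix and F1 can look fine while the printed probabilities are almost uniform.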
I guess I am running into two problems. I want to print out the output probabilities for each example, which I do (for Iris) with:

for (int i = 0; i < output.rows(); i++) {

but they all look the same:

0.3333339584580804 : 0.3333332336519369 : 0.33333280788998265

I am not confident I am printing out the correct thing here. Also, with the Iris example, if I add class 2 examples, it loses the ability to predict class 1 accurately:

Actual Class 0 was predicted with Predicted 0 with count 49 times
Actual Class 0 was predicted with Predicted 2 with count 1 times
Actual Class 1 was predicted with Predicted 0 with count 14 times
Actual Class 1 was predicted with Predicted 1 with count 15 times
Actual Class 1 was predicted with Predicted 2 with count 21 times
Actual Class 2 was predicted with Predicted 0 with count 1 times
Actual Class 2 was predicted with Predicted 1 with count 6 times
Actual Class 2 was predicted with Predicted 2 with count 43 times
==========================F1 Scores========================================
0.7897365529258937

Any idea what I am doing wrong, or what is going wrong? Essentially I would like to be able to make ROC curves for some two-class problems I have (hence the output layer activations are needed).
It looks like the output prediction probabilities are different, so it is predicting things correctly. The problem is there isn't a big difference between them, so a ROC curve would look like it's almost guessing. Is there any way to get the output probabilities for Iris to be more weighted towards the example, i.e. 0.999 : 0.00001 : 0.00001, etc.?

Also, I assume when I use the same dataset for train and test, it is showing the train example when doing the testing stage? If I run for loads of iterations, would it not overfit and predict quite well? (I know that's cheating normally!) Or does the dataset split automatically into train/validation/test? With my actual examples I split manually into train/test on a 70/30% basis.

Hope you can clarify that I am understanding this right.

Dean
It doesn't split automatically. I have a method on DataSet called splitTestAndTrain which does that. Part of my updating the examples will be including that usage in there. Other than that, the new Iris test I put out and linked to earlier does that exact thing.
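Conceptually, a test/train split like the one described boils down to a seeded shuffle of the example indices followed by a cut. A plain-Java sketch of that idea (names illustrative; this is not the DataSet API itself):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {
    // Returns {trainIndices, testIndices} after a seeded shuffle,
    // cutting at round(nSamples * trainFraction).
    public static List<List<Integer>> split(int nSamples, double trainFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < nSamples; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // seeded for reproducibility
        int cut = (int) Math.round(nSamples * trainFraction);
        List<List<Integer>> out = new ArrayList<>();
        out.add(new ArrayList<>(idx.subList(0, cut)));
        out.add(new ArrayList<>(idx.subList(cut, nSamples)));
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(100, 0.7, 123L);
        System.out.println(parts.get(0).size() + " train / " + parts.get(1).size() + " test");
    }
}
```

With trainFraction = 0.7 this reproduces the manual 70/30 split mentioned above; the seed makes repeated runs comparable.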
I was just trying to construct the simplest possible test to show the problem. I tried with 1000 examples of alternating values of 1 and 2:

final int nSamples = 1000;
INDArray input = Nd4j.create(nSamples, 614);
INDArray labels = Nd4j.create(nSamples, 5);
for (int i = 0; i < nSamples; i++) {
    INDArray row = Nd4j.create(1, 614);
    if (i % 2 == 0) {
        row.assign(1f);
        labels.put(i, 3, 1); // set the 4th column
    } else {
        row.assign(2f);
        labels.put(i, 1, 1); // set the 2nd column
    }
    input.putRow(i, row); // just realized it had a copy-and-paste error before; similar result either way
}

(Trials with different params: 10 or 100 hidden layer size; 10 or 100 iterations.)
I get the intent, which is why I haven't closed the issue or anything. I'd just like to take a crack at my interpretation of the same example.
Update on this: better examples are a huge priority and I will integrate a few simple ones. Unfortunately, I put this off due to a huge API upgrade coming. Since we fixed the problems with backprop being diluted, I have primarily diverted my efforts towards finishing up the 0.0.3.3 release. Part of that release will be a plug-and-play layer architecture which will allow people to build their own neural nets (including conv nets, recurrent nets, ...).

Once this new API upgrade is out, I will work on putting together some better examples. I am still primarily hesitant since things are so fast moving. This is looking more and more like a real neural net library with this release, and I'll feel a little more comfortable putting out some neat recipes since things will be a little more configurable.

One of my primary concerns has been making deeplearning4j feature complete. One of my top priorities has been to support every form of neural net architecture, allowing people to pick and choose their architecture. Hopefully this new layer API will reduce problems a bit.

Adam
I'm able to update the test to the latest 0.0.3.3 with the new API, and it produces exact predictions. One thing, though: I have to set the iteration count high. If I set it to something like 100 iterations, the results flicker among repeated runs (of the same binary and data, with a fixed RNG seed) because the label value differences are too small and are easily impacted by round-off errors. This seems to contradict the advice in http://deeplearning4j.org/troubleshootingneuralnets.html. Is this normal for this very simple problem?
The updated test code:

package org.deeplearning4j;
import org.apache.commons.math3.random.MersenneTwister;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.models.featuredetectors.rbm.RBM;
import org.deeplearning4j.nn.api.LayerFactory;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.layers.OutputLayer;
import org.deeplearning4j.nn.layers.factory.DefaultLayerFactory;
import org.deeplearning4j.nn.layers.factory.LayerFactories;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.api.activation.Activations;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;
public class DBNDummyTest {
    public static void main(String... args) throws Exception {
        final int nSamples = 1000;
        int nFeatures = 10;
        INDArray input = Nd4j.create(nSamples, nFeatures); // has to be at least two, or else the output layer gradient is a scalar and causes an exception
        INDArray labels = Nd4j.create(nSamples, 5);
        for (int i = 0; i < nSamples; i++) {
            INDArray row = Nd4j.create(1, nFeatures);
            if (i % 2 == 0) {
                row.assign(1f);
                labels.put(i, 3, 1); // set the 4th column
            } else {
                row.assign(2f);
                labels.put(i, 1, 1); // set the 2nd column
            }
            input.putRow(i, row);
        }
        DataSet trainingSet = new DataSet(input, labels);
        MersenneTwister gen = new MersenneTwister(123);
        final int iterations = 10000;
        LayerFactory layerFactory = LayerFactories.getFactory(RBM.class);
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .iterations(iterations)
                .weightInit(WeightInit.VI)
                .activationFunction(Activations.softMaxRows())
                .visibleUnit(RBM.VisibleUnit.BINARY)
                .hiddenUnit(RBM.HiddenUnit.SOFTMAX)
                .layerFactory(layerFactory)
                .constrainGradientToUnitNorm(true)
                // .lossFunction(LossFunctions.LossFunction.MCXENT)
                .optimizationAlgo(OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
                .rng(gen)
                .learningRate(1e-3f)
                .nIn(trainingSet.numInputs()).nOut(trainingSet.numOutcomes()).list(2)
                .hiddenLayerSizes(new int[]{10})
                .override(new NeuralNetConfiguration.ConfOverride() {
                    @Override
                    public void override(int i, NeuralNetConfiguration.Builder builder) {
                        if (i == 1) {
                            builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));
                            builder.iterations(iterations);
                            builder.weightInit(WeightInit.ZERO);
                            builder.activationFunction(Activations.softMaxRows());
                            builder.lossFunction(LossFunctions.LossFunction.MCXENT);
                        }
                    }
                }).build();
        MultiLayerNetwork nn = new MultiLayerNetwork(conf);
        nn.fit(trainingSet);
        INDArray predict2 = nn.output(trainingSet.getFeatureMatrix());
        for (int i = 0; i < 2; i++) {
            String actual = trainingSet.getLabels().getRow(i).toString().trim();
            String predicted = predict2.getRow(i).toString().trim();
            System.out.println("actual " + actual + " vs predicted " + predicted);
        }
        Evaluation eval = new Evaluation();
        eval.eval(trainingSet.getLabels(), predict2);
        System.out.println("F1: " + eval.f1());
    }
}
First, normalize the data to zero mean and unit variance. Any neural net will produce weird output on data that isn't between zero and 1, especially with a DBN and binary data. If you still don't get signal, changing the activation to tanh will help a lot as well. Your configuration for the activation function is also off: softmax rows should be for a classifier only. With regard to your weight initialization, change that to a smaller distribution. You can do this by changing the weight initialization to .DISTRIBUTION and specifying a distribution with .dist.
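A minimal plain-Java sketch of the zero-mean/unit-variance normalization suggested above, applied to one feature column (the class and method names are illustrative, not a deeplearning4j API):

```java
import java.util.Arrays;

public class Standardize {
    // z-score: (v - mean) / stddev, using the population standard deviation.
    public static double[] zScore(double[] values) {
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0;
        for (double v : values) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / values.length);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = std == 0 ? 0 : (values[i] - mean) / std; // constant columns map to 0
        }
        return out;
    }

    public static void main(String[] args) {
        // The alternating 1/2 inputs from the test above map to -1 and +1.
        System.out.println(Arrays.toString(zScore(new double[]{1, 2, 1, 2})));
    }
}
```

On the alternating 1f/2f inputs in the test above, the mean is 1.5 and the standard deviation is 0.5, so the features become exactly -1 and +1.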
Now with the recommended changes, I can get perfect prediction in 5 iterations! Though, why is this not a classifier? Only one of the output labels is 1, and the label differs depending on which of the two input feature vectors is given.
Great! It's still a classifier (only the output layer, though). Keep in mind the point of a DBN is to generate features using pretrained RBMs. The output layer (using multi-class cross entropy as the objective) then has a softmax activation, which is essentially logistic/softmax regression if you use zero for the weights. Your goal then is to propagate a feature set through the RBMs such that the logistic regression classifier can learn the features.
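To illustrate why the near-uniform 0.3333… outputs seen earlier can appear: a softmax output layer turns equal pre-activations into an exactly uniform distribution, so confident predictions require well-separated pre-activations. A minimal sketch (illustrative, not dl4j's implementation):

```java
import java.util.Arrays;

public class SoftmaxSketch {
    // Numerically stable softmax: subtract the max before exponentiating.
    public static double[] softmax(double[] preActivations) {
        double max = Double.NEGATIVE_INFINITY;
        for (double p : preActivations) max = Math.max(max, p);
        double sum = 0;
        double[] out = new double[preActivations.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.exp(preActivations[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Equal pre-activations -> exactly uniform 1/3 : 1/3 : 1/3.
        System.out.println(Arrays.toString(softmax(new double[]{0.5, 0.5, 0.5})));
        // Well-separated pre-activations -> a confident prediction.
        System.out.println(Arrays.toString(softmax(new double[]{10, 0, 0})));
    }
}
```

This is why normalizing the inputs and using a smaller weight distribution helps: it lets training drive the output layer's pre-activations apart instead of leaving them nearly equal.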
I'm going to close this since it appears to be resolved. Please feel free to resume the conversation on the mailing list about tuning this particular example; I'd be more than glad to help if you put it up as a GitHub gist/repo. I'm aware the docs need to be improved and will continue working on this. I know it's a little daunting to tune neural nets, and the suggestions I just gave probably seem like they come from a black box or some sort of weird voodoo. There are definitely best practices with neural nets that I hope to establish. I will continue working on examples and the like.
With the latest 0.0.3.3-SNAPSHOT, the problem with the zeroed output layer weight matrix is gone. However, the problem of uniform prediction still remains. I modified my previous dummy test to have two different inputs/outputs: a vector of 100 mapped to the 4th label, and a vector of 200 mapped to the 2nd label. After training, both input vectors produced exactly the same prediction (to all decimals), even though the values of the different labels within the same prediction look random.
The output:
The test code: