MultiLayer Perceptron #143

Closed
webanalytics opened this issue Feb 2, 2015 · 27 comments

@webanalytics

One way I look at dl4j is as a repository of NN components with which I can experiment with different NN configurations.
I believe having a traditional multilayer perceptron implementation could help with that.
This could allow developers to build hybrid NN architectures, for instance.

Thanks

@agibsonccc
Contributor

I'll add that you can already do this with MultiLayerNetwork; assembling layers is the crux of the 0.0.3.3-and-up releases. When I was thinking of a backprop network, I was thinking more of implementing the layer interface. Fitting an output layer with respect to logistic regression and pretraining is equivalent to a backpropagation network. If you mean being able to compose different neural networks together, say for a convolutional neural network, these already exist.

@agibsonccc
Contributor

Did you look at the new layer API? https://github.com/SkymindIO/dl4j-0.0.3.3-examples/blob/master/src/main/java/org/deeplearning4j/mnist/full/DBNExample.java

The MultiLayerNetwork there is the key. I've reiterated this enough times, but I'll do it again: I plan on having a good set of documentation for 0.0.3.4 now that dl4j is near feature-complete with YARN and Spark integration. I'm not expecting the API to change much after this. This composable network/hybrid architecture is more or less what people want in a neural network framework, though.

@sebap

sebap commented Feb 2, 2015

Please leave the attitude somewhere else. We do not need it here.

@agibsonccc
Contributor

Text isn't the best way to convey emotions ;) I'm asking for clarification.
If I come off as demeaning, that's on you.

I was merely pointing at examples of more or less what he was looking for.

I admitted the docs aren't there yet, which is why I was linking to relevant examples to see if that is what he was looking for.

If you could kindly point out where I had an attitude, I will gladly rephrase my posting. Let's leave opinions out of this and focus on figuring out if I already implemented what was being asked.

Thanks!

Please leave the attitude somewhere else. We do not need it here.

Edit: I see what looks like snark. Fixed.

@webanalytics
Author

Before I posted, I had actually modified the class org.deeplearning4j.nn.multilayer.MultiLayerTest to better understand MultiLayerNetwork. I also made BaseLayer non-abstract to use it in the network (code below). I believe BaseLayer is a perceptron.
Also, I'm not sure if backpropagation is done in the hidden layers.
Anyway, I agree the documentation could be improved, but I understand there may be many parallel efforts in implementing dl4j. As a suggestion for documentation, the site http://deeplearning.net/tutorial/ provides very helpful examples for people who want to use their API. It's clear there, for instance, how the MLP was implemented.
Thanks!

// Build a 3-layer network out of BaseLayer instances, overriding layer 2 to be an
// OutputLayer with softmax activation and negative log-likelihood loss.
LayerFactory layerFactory = LayerFactories.getFactory(BaseLayer.class);
MultiLayerNetwork network = null;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
        .iterations(100).weightInit(WeightInit.VI)
        .activationFunction(Activations.tanh()).corruptionLevel(0)
        .nIn(4).nOut(3).layerFactory(layerFactory).regularization(true)
        .list(3).hiddenLayerSizes(new int[]{3, 2})
        .override(new NeuralNetConfiguration.ConfOverride() {
            @Override
            public void override(int i, NeuralNetConfiguration.Builder builder) {
                if (i == 2) {
                    // last layer: classifier instead of a plain BaseLayer
                    builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));
                    builder.activationFunction(Activations.softMaxRows());
                    builder.lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD);
                }
            }
        }).build();

conf.setPretrain(false);
network = new MultiLayerNetwork(conf);
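
For reference, I'd then expect to train and evaluate the resulting network along these lines. This is only a sketch: `iris` is a placeholder DataSet, and the exact 0.0.3.3 method names (fit, output, Evaluation.eval/stats, getFeatureMatrix) are assumptions based on the test classes, not something I've verified.

    // assuming `iris` is a DataSet loaded elsewhere (placeholder name)
    network.fit(iris);                 // train; pretrain is disabled above
    Evaluation eval = new Evaluation();
    eval.eval(iris.getLabels(), network.output(iris.getFeatureMatrix()));
    System.out.println(eval.stats());  // F1 / precision / recall summary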

@winstonquock

We need backprop in hidden layers like http://deeplearning.net/tutorial/DBN.html#dbn

Though I suggest that instead of making all hidden layers both RBMs and perceptrons, we should allow a sub-sequence of each, e.g. layers 0-4 are RBMs and layers 4-6 are perceptrons (layer 4 is both).
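
Roughly, I imagine expressing that split with the ConfOverride hook from the configuration posted above, choosing a layer factory per layer index. Just a sketch: whether an RBM layer class can be plugged in through LayerFactories like this in 0.0.3.3 is an assumption on my part, and the "layer 4 is both" overlap isn't expressible with a single factory per index (so layer 4 is just an RBM here).

    // Hypothetical sketch of the sub-sequence split (not confirmed 0.0.3.3 API):
    // layers 0-4 pretrain-only RBMs, layer 5 a plain "perceptron" layer, layer 6 the classifier.
    .override(new NeuralNetConfiguration.ConfOverride() {
        @Override
        public void override(int i, NeuralNetConfiguration.Builder builder) {
            if (i <= 4) {
                builder.layerFactory(LayerFactories.getFactory(RBM.class));        // assumed RBM layer class
            } else if (i <= 5) {
                builder.layerFactory(LayerFactories.getFactory(BaseLayer.class));  // feed-forward "perceptron" layer
            } else {
                builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));  // classifier on top
            }
        }
    })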

@agibsonccc
Contributor

If you notice how they implement it, they only use logistic regression with the pretrained layers as the backpropagation. I had a combination of those before, which caused the terrible F1 scores. You realize it works now, right? :P


@winstonquock

I think you meant they use only the logistic regression layer's error to calculate the gradient for all layers? That seems to be the case.

self.params.extend(sigmoid_layer.params)

# compute the gradients with respect to the model parameters
gparams = T.grad(self.finetune_cost, self.params)

# compute list of fine-tuning updates
updates = []
for param, gparam in zip(self.params, gparams):
    updates.append((param, param - gparam * learning_rate))

It looks different from the backprop algorithm I read in the book, but I'm not too sure what the right way is. I would say we should do backprop in the hidden layers but go with a correct algorithm. Also, that's why I feel we should allow configuring a sub-sequence of hidden layers to be perceptrons rather than forcing all layers to be both. Basically, I feel that the network architecture would look like a few RBMs (that are trained only by pre-training) followed by 2-3 perceptron layers (that are trained only by backprop), but then, if possible, we can make the config more flexible to allow overlapping for the sake of experimentation.
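
To spell out what that Theano snippet is doing, here is a minimal plain-Java sketch (not dl4j API): a single cost defined at the output layer, with the chain rule carrying its error signal down through every hidden layer and a plain SGD step per parameter, biases omitted. The helpers (subtract, outer, matVec, transpose, hadamard, scale, tanhDerivative) and the shapes are illustrative placeholders, not real library calls.

    // Minimal backprop sketch (illustrative only): one loss at the output layer drives every layer.
    double lr = 0.1;
    // softmax + negative log-likelihood at the top gives dLoss/dPreActivation = prediction - label
    double[] delta = subtract(prediction, label);
    for (int l = numLayers - 1; l >= 0; l--) {
        // gradient for layer l's weights; activations[l] is the input to layer l
        double[][] gradW = outer(delta, activations[l]);
        if (l > 0) {
            // chain rule: error signal for the layer below, computed with the pre-update weights
            delta = hadamard(matVec(transpose(W[l]), delta), tanhDerivative(activations[l]));
        }
        // plain SGD step, the same "param - gparam * learning_rate" update as above
        W[l] = subtract(W[l], scale(gradW, lr));
    }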


@webanalytics
Author

I can see there is a backprop method in the class MultiLayerNetwork, but it is still not clear how to use it from its fit method. In the current version, fit only optimizes the parameters of the output layer.

@agibsonccc
Contributor

The reason I switched it to this version of the MLP is that backprop isn't going to be generalizable for different neural networks. The new mix-and-match layer setup allows for more than just the normal two-layer feed-forward architecture.

One of the things that may not be obvious from the outside is that conv nets and some of the other layers have different backward derivatives to calculate. I left the code in there as a stub.
I'm going to design something for this, but much if not all deep learning nowadays is focused on the pretrain-layers/logistic-regression approach.

@webanalytics
Author

My point is that MultiLayerNetwork should be more general and not have a predefined pretrain/output-layer architecture for training.
Regarding network architecture, the state of the art in image classification, for instance, does not use pretraining:
http://www.cs.toronto.edu/~hinton/absps/imagenet.pdf
For NLP tasks, the widely used conv nets need to backpropagate the errors to lower layers:
http://arxiv.org/pdf/1103.0398.pdf
So it is important to have a flexible framework to be able to implement these kinds of nets.

@agibsonccc
Contributor

If you notice, I'm not disagreeing with you. I left that code in there for a reason. My commits clearly demonstrate that that kind of general-purpose architecture is what I'm moving towards. I read on average ~10 papers a week (including re-reading some things). I'm not ignorant of current results by any means. I physically meet the people who write these papers :P. Despite us trying to be the framework for industry, I'd be daft not to be into the research myself. That's why I made this framework in the first place.

Like anything in software engineering, there are design trade-offs involved, testing, and everything else. In the process of the reorganization, there's one training method that works and that will satisfy a good portion of people. The other one was going to be a little harder to support right out of the gate; however, the way I designed it, I laid the groundwork.

The only thing I can do right now while trying to balance everything else is give you the hooks to allow for a backwards architecture.

Here's more or less what I wanted to do when I get the bandwidth (remember I have examples to get out the door, Spark and YARN versions, and two other libraries to maintain for vectorization and scientific computing; Deeplearning4j isn't just core).

Anyways:

I wanted to add a knob on the multi-layer configuration for backwards and a backwards method to the layer interface. That will give people the means to do backpropagation with arbitrary layers.

If you yourself would like the responsibility of adding the necessary code in the hooks, I gladly accept pull requests. My only compromise for right now will be giving you the necessary method stubs. Will that work? Otherwise this will have to wait a bit.
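
To sketch what I mean by the hooks (none of this exists yet; the names here are placeholders, not current dl4j API):

    import org.nd4j.linalg.api.ndarray.INDArray;

    // Hypothetical stubs only -- illustrating the proposed hooks, not existing API.
    public interface BackwardCapableLayer {
        INDArray activate(INDArray input);    // forward pass, as layers already do
        INDArray backward(INDArray epsilon);  // proposed: take the error from the layer above,
                                              // apply this layer's update, return the error for the layer below
    }

    // Proposed knob on the multi-layer configuration (illustrative name):
    //   conf.setBackprop(true);  // fine-tune every layer with backprop instead of pretrain + output layer only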

@winstonquock

@agibsonccc So the BaseMultiLayerNetwork had the back-prop before, but it was taken out because of the poor F1 score, right? But why wouldn't other frameworks with back-prop encounter such an issue? Could the poor score be due to some other bug in the back-prop? [For my problem, the F1 is only 0.2, which is not very good: better than having the uniform-outputs problem, but not great. (It was "better" earlier due to the bug in the Evaluation class.)] Also, the old BaseMultiLayerNetwork still back-props to all hidden layers like the deeplearning.net tutorial. I'm also interested in trying the separate pure-RBM and pure-perceptron layers, as I read about in some papers.

@agibsonccc
Contributor

The old BaseMultiLayerNetwork did both.
Pure RBMs are already in there; they're just a layer in this case. Softmax activations are in there if you want to try classification.

Have you tried it since the pull request was put in?

@winstonquock Same message for you:
Anyways:

I wanted to add a knob on the multi-layer configuration for backwards and a backwards method to the layer interface. That will give people the means to do backpropagation with arbitrary layers.

If you yourself would like the responsibility of adding the necessary code in the hooks, I gladly accept pull requests. My only compromise for right now will be giving you the necessary method stubs. Will that work? Otherwise this will have to wait a bit.

@winstonquock

OK, let me try that.

@winstonquock

Is it possible to provide the complete implementation instead of a hook?

@agibsonccc
Contributor

Like I said, no bandwidth right now. I added the hooks for later. Still working on some other stuff. If you can't tell from the mailing list, among other things, I'm buried right now.

@winstonquock

Hmm... the change appears to trash the regular non-back-prop training. Please see my discovery when testing #123.

@agibsonccc
Contributor

This gets back to what we talked about before. One or the other. That's why it's a flag.

@winstonquock

But I did NOT turn on back-prop at all for the training. I actually stopped in the debugger to make sure of that.

@winstonquock

@agibsonccc I further narrowed down the problem. It looks like the change to IterationGradientDescent causes the problem. If I revert only that class but keep everything else in that change set, I get back my previous results.

diff --git a/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java b/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solv
index 42ec653..0ba59bc 100644
--- a/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java
+++ b/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java
@@ -7,7 +7,6 @@ import org.deeplearning4j.nn.gradient.Gradient;
 import org.deeplearning4j.optimize.api.IterationListener;
 import org.deeplearning4j.optimize.api.StepFunction;
 import org.deeplearning4j.optimize.api.TerminationCondition;
-import org.deeplearning4j.optimize.terminations.EpsTermination;
 import org.nd4j.linalg.api.ndarray.INDArray;

 import java.util.Collection;
@@ -31,18 +30,14 @@ public class IterationGradientDescent extends BaseOptimizer {
     @Override
     public boolean optimize() {
         for(int i = 0; i < conf.getNumIterations(); i++) {
-            model.setScore();
-            model.iterate(model.input());
             Pair<Gradient,Double> score = model.gradientAndScore();
             INDArray gradient = score.getFirst().gradient(conf.getGradientList());
             INDArray params = model.params();
             updateGradientAccordingToParams(gradient,params,model.batchSize());
-            INDArray newParams = params.addi(gradient);
-            model.setParams(newParams);
+            model.setParams(params.addi(gradient));
             for(IterationListener listener : conf.getListeners())
                 listener.iterationDone(model,i);
             log.info("Error at iteration " + i + " was " + model.score());
-
         }
         return true;
     }

F1: 0.20407190295160219 precision: 0.20196057270719753. recall: 0.20622784397707364. accuracy: 0.4314270585120466
  CLASS[0]: precision: 0.25925925925925924 recall: 0.10627062706270628
  CLASS[1]: precision: 0.33490566037735847 recall: 0.31678750697155605
  CLASS[2]: precision: 0.07009345794392523 recall: 0.03740648379052369
  CLASS[3]: precision: 0.1590909090909091 recall: 0.0782122905027933
  CLASS[4]: precision: 0.18645357686453576 recall: 0.49246231155778897

@winstonquock

@agibsonccc how should this one be fixed?

@agibsonccc
Contributor

So I'm coming back around to the back-prop stuff now. As of right now there is only the feed-forward impl. I'll look at the other architectures and try to consolidate. With the stuff I mentioned earlier, it shouldn't be difficult to get it in place.

@agibsonccc
Contributor

Fixed for normal feed forward.

AlexDBlack added a commit that referenced this issue May 21, 2018
@lock

lock bot commented Jan 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 22, 2019