MultiLayer Perceptron #143

Closed
webanalytics opened this issue Feb 2, 2015 · 27 comments

@webanalytics

One way I look at dl4j is as a repository of NN components with which I can experiment with different NN configurations.
I believe having a traditional multilayer perceptron implementation could help with that.
This could allow developers to build hybrid NN architectures, for instance.

Thanks

@agibsonccc
Contributor

I'll add that you can already do this with MultiLayerNetwork; assembling layers is the crux of the 0.0.3.3-and-up releases. When I was thinking of a backprop network, I was thinking more of implementing the layer interface. Fitting an output layer with respect to logistic regression and pretraining is equivalent to a backpropagation network. If you mean being able to compose different neural networks together, say for a convolutional neural network, these already exist.

@agibsonccc
Contributor

Did you look at the new layer API? https://github.com/SkymindIO/dl4j-0.0.3.3-examples/blob/master/src/main/java/org/deeplearning4j/mnist/full/DBNExample.java

The MultiLayerNetwork there is the key. I've reiterated this enough times, but I'll do it again: I plan on having a good set of documentation for 0.0.3.4 now that dl4j is near feature-complete with YARN and Spark integration. I'm not expecting the API to change much after this. This composable network/hybrid architecture is more or less what people want in a neural network framework, though.

@sebap

sebap commented Feb 2, 2015

Please leave the attitude somewhere else. We do not need it here.

@agibsonccc
Contributor

Text isn't the best way to convey emotions ;) I'm asking for clarification.
If I come off as demeaning, that's on you.

I was merely pointing at examples of more or less what he was looking for.

I admitted the docs aren't there yet, which is why I was linking to relevant examples to see if that is what he was looking for.

If you could kindly point out where I had an attitude, I will gladly rephrase my posting. Let's leave opinions out of this and focus on figuring out if I already implemented what was being asked.

Thanks!

Please leave the attitude somewhere else. We do not need it here.

Edit: I see what looks like snark. Fixed.

@webanalytics
Author

Before I posted, I had actually modified the class org.deeplearning4j.nn.multilayer.MultiLayerTest to better understand MultiLayerNetwork. I also made BaseLayer non-abstract to use it in the network (code below). I believe BaseLayer is a perceptron.
Also, I'm not sure if backpropagation is done in the hidden layers.
Anyway, I agree the documentation could be improved, but I understand there may be many parallel efforts in implementing dl4j. As a suggestion for documentation, the site http://deeplearning.net/tutorial/ provides very helpful examples for people who want to use their API. It's clear there, for instance, how the MLP was implemented.
Thanks!

// Build a 3-layer network out of BaseLayer instances, overriding layer 2 to be an
// OutputLayer with softmax activation and negative log-likelihood loss.
LayerFactory layerFactory = LayerFactories.getFactory(BaseLayer.class);
MultiLayerNetwork network = null;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.ITERATION_GRADIENT_DESCENT)
        .iterations(100).weightInit(WeightInit.VI)
        .activationFunction(Activations.tanh()).corruptionLevel(0)
        .nIn(4).nOut(3).layerFactory(layerFactory).regularization(true)
        .list(3).hiddenLayerSizes(new int[]{3, 2})
        .override(new NeuralNetConfiguration.ConfOverride() {
            @Override
            public void override(int i, NeuralNetConfiguration.Builder builder) {
                if (i == 2) {
                    // last layer: classifier instead of a plain BaseLayer
                    builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));
                    builder.activationFunction(Activations.softMaxRows());
                    builder.lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD);
                }
            }
        }).build();

conf.setPretrain(false);
network = new MultiLayerNetwork(conf);
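
For reference, I'd then expect to train and evaluate the resulting network along these lines. This is only a sketch: `iris` is a placeholder DataSet, and the exact 0.0.3.3 method names (fit, output, Evaluation.eval/stats, getFeatureMatrix) are assumptions based on the test classes, not something I've verified.

    // assuming `iris` is a DataSet loaded elsewhere (placeholder name)
    network.fit(iris);                 // train; pretrain is disabled above
    Evaluation eval = new Evaluation();
    eval.eval(iris.getLabels(), network.output(iris.getFeatureMatrix()));
    System.out.println(eval.stats());  // F1 / precision / recall summary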

@winstonquock

We need backprop in hidden layers like http://deeplearning.net/tutorial/DBN.html#dbn

Though I suggest that instead of making all hidden layers both RBMs and perceptrons, we should allow a sub-sequence of each, e.g. layers 0-4 are RBMs and layers 4-6 are perceptrons (layer 4 is both).
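
Roughly, I imagine expressing that split with the ConfOverride hook from the configuration posted above, choosing a layer factory per layer index. Just a sketch: whether an RBM layer class can be plugged in through LayerFactories like this in 0.0.3.3 is an assumption on my part, and the "layer 4 is both" overlap isn't expressible with a single factory per index (so layer 4 is just an RBM here).

    // Hypothetical sketch of the sub-sequence split (not confirmed 0.0.3.3 API):
    // layers 0-4 pretrain-only RBMs, layer 5 a plain "perceptron" layer, layer 6 the classifier.
    .override(new NeuralNetConfiguration.ConfOverride() {
        @Override
        public void override(int i, NeuralNetConfiguration.Builder builder) {
            if (i <= 4) {
                builder.layerFactory(LayerFactories.getFactory(RBM.class));        // assumed RBM layer class
            } else if (i <= 5) {
                builder.layerFactory(LayerFactories.getFactory(BaseLayer.class));  // feed-forward "perceptron" layer
            } else {
                builder.layerFactory(new DefaultLayerFactory(OutputLayer.class));  // classifier on top
            }
        }
    })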

@agibsonccc
Contributor

If you notice how they implement it, they only use logistic regression with the pretrained layers as the backpropagation. I had a combination of those before, which caused the terrible F1 scores. You realize it works now, right? :P


@winstonquock

I think you meant they use only the logistic regression layer's error to calculate the gradient for all layers? That seems to be the case.

self.params.extend(sigmoid_layer.params)

# compute the gradients with respect to the model parameters
gparams = T.grad(self.finetune_cost, self.params)

# compute list of fine-tuning updates
updates = []
for param, gparam in zip(self.params, gparams):
    updates.append((param, param - gparam * learning_rate))

It looks different from the backprop algorithm I read in the book, but I'm not too sure what the right way is. I would say we should do backprop in the hidden layers but go with a correct algorithm. Also, that's why I feel we should allow configuring a sub-sequence of hidden layers to be perceptrons rather than forcing all layers to be both. Basically, I feel that the network architecture would look like a few RBMs (that are trained only by pre-training) followed by 2-3 perceptron layers (that are trained only by backprop), but then, if possible, we can make the config more flexible to allow overlapping for the sake of experimentation.
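
To spell out what that Theano snippet is doing, here is a minimal plain-Java sketch (not dl4j API): a single cost defined at the output layer, with the chain rule carrying its error signal down through every hidden layer and a plain SGD step per parameter, biases omitted. The helpers (subtract, outer, matVec, transpose, hadamard, scale, tanhDerivative) and the shapes are illustrative placeholders, not real library calls.

    // Minimal backprop sketch (illustrative only): one loss at the output layer drives every layer.
    double lr = 0.1;
    // softmax + negative log-likelihood at the top gives dLoss/dPreActivation = prediction - label
    double[] delta = subtract(prediction, label);
    for (int l = numLayers - 1; l >= 0; l--) {
        // gradient for layer l's weights; activations[l] is the input to layer l
        double[][] gradW = outer(delta, activations[l]);
        if (l > 0) {
            // chain rule: error signal for the layer below, computed with the pre-update weights
            delta = hadamard(matVec(transpose(W[l]), delta), tanhDerivative(activations[l]));
        }
        // plain SGD step, the same "param - gparam * learning_rate" update as above
        W[l] = subtract(W[l], scale(gradW, lr));
    }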


@webanalytics
Author

I can see there is a backprop method in the class MultiLayerNetwork, but it is still not clear how to use it from its fit method. In the current version, fit only optimizes the parameters of the output layer.

@agibsonccc
Contributor

The reason I switched it to this version of the MLP is that backprop isn't going to be generalizable for different neural networks. The new mix-and-match layer setup allows for more than just the normal two-layer feed-forward architecture.

One of the things that may not be obvious from the outside is that conv nets and some of the other layers have different backward derivatives to calculate. I left the code in there as a stub.
I'm going to design something for this, but much if not all deep learning nowadays is focused on the pretrain-layers/logistic-regression approach.

@webanalytics
Author

My point is that MultiLayerNetwork should be more general and not have a predefined pretrain/output-layer architecture for training.
Regarding network architecture, the state of the art in image classification, for instance, does not use pretraining:
http://www.cs.toronto.edu/~hinton/absps/imagenet.pdf
For NLP tasks, the widely used conv nets need to backpropagate the errors to lower layers:
http://arxiv.org/pdf/1103.0398.pdf
So it is important to have a flexible framework to be able to implement these kinds of nets.

@agibsonccc
Contributor

If you notice, I'm not disagreeing with you. I left that code in there for a reason. My commits clearly demonstrate that that kind of general-purpose architecture is what I'm moving towards. I read on average ~10 papers a week (including re-reading some things). I'm not ignorant of current results by any means. I physically meet the people who write these papers :P. Despite us trying to be the framework for industry, I'd be daft not to be into the research myself. That's why I made this framework in the first place.

Like anything in software engineering, there are design trade-offs involved, testing, and everything else. In the process of the reorganization, there's one training method that works and that will satisfy a good portion of people. The other one was going to be a little harder to support right out of the gate; however, the way I designed it, I laid the groundwork.

The only thing I can do right now while trying to balance everything else is give you the hooks to allow for a backwards architecture.

Here's more or less what I wanted to do when I get the bandwidth (remember I have examples to get out the door, Spark and YARN versions, and two other libraries to maintain for vectorization and scientific computing; Deeplearning4j isn't just core).

Anyways:

I wanted to add a knob on the multi-layer configuration for backwards and a backwards method to the layer interface. That will give people the means to do backpropagation with arbitrary layers.

If you yourself would like the responsibility of adding the necessary code in the hooks, I gladly accept pull requests. My only compromise for right now will be giving you the necessary method stubs. Will that work? Otherwise this will have to wait a bit.
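
To sketch what I mean by the hooks (none of this exists yet; the names here are placeholders, not current dl4j API):

    import org.nd4j.linalg.api.ndarray.INDArray;

    // Hypothetical stubs only -- illustrating the proposed hooks, not existing API.
    public interface BackwardCapableLayer {
        INDArray activate(INDArray input);    // forward pass, as layers already do
        INDArray backward(INDArray epsilon);  // proposed: take the error from the layer above,
                                              // apply this layer's update, return the error for the layer below
    }

    // Proposed knob on the multi-layer configuration (illustrative name):
    //   conf.setBackprop(true);  // fine-tune every layer with backprop instead of pretrain + output layer only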

@winstonquock

@agibsonccc So the BaseMultiLayerNetwork had the back-prop before, but it was taken out because of the poor F1 score, right? But why wouldn't other frameworks with back-prop encounter such an issue? Could the poor score be due to some other bug in the back-prop? [For my problem, the F1 is only 0.2, which is not very good: better than having the uniform-outputs problem, but not great. (It was "better" earlier due to the bug in the Evaluation class.)] Also, the old BaseMultiLayerNetwork still back-props to all hidden layers like the deeplearning.net tutorial. I'm also interested in trying the separate pure-RBM and pure-perceptron layers, as I read about in some papers.

@agibsonccc
Contributor

The old BaseMultiLayerNetwork did both.
Pure RBMs are already in there; they're just a layer in this case. Softmax activations are in there if you want to try classification.

Have you tried it since the pull request was put in?

@winstonquock Same message for you:
Anyways:

I wanted to add a knob on the multi-layer configuration for backwards and a backwards method to the layer interface. That will give people the means to do backpropagation with arbitrary layers.

If you yourself would like the responsibility of adding the necessary code in the hooks, I gladly accept pull requests. My only compromise for right now will be giving you the necessary method stubs. Will that work? Otherwise this will have to wait a bit.

@winstonquock

OK, let me try that.

@winstonquock

Is it possible to provide the complete implementation instead of a hook?

@agibsonccc
Contributor

Like I said, no bandwidth right now. I added the hooks for later. Still working on some other stuff. If you can't tell from the mailing list, among other things, I'm buried right now.

@winstonquock

Hmm... the change appears to trash the regular non-back-prop training. Please see my discovery when testing #123.

@agibsonccc
Contributor

This gets back to what we talked about before. One or the other. That's why it's a flag.

@winstonquock

But I did NOT turn on back-prop at all for the training. I actually stopped in the debugger to make sure of that.

@winstonquock

@agibsonccc I further narrowed down the problem. It looks like the change to IterationGradientDescent causes the problem. If I revert only that class but keep everything else in that change set, I get back my previous results.

diff --git a/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java b/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solv
index 42ec653..0ba59bc 100644
--- a/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java
+++ b/deeplearning4j-core/src/main/java/org/deeplearning4j/optimize/solvers/IterationGradientDescent.java
@@ -7,7 +7,6 @@ import org.deeplearning4j.nn.gradient.Gradient;
 import org.deeplearning4j.optimize.api.IterationListener;
 import org.deeplearning4j.optimize.api.StepFunction;
 import org.deeplearning4j.optimize.api.TerminationCondition;
-import org.deeplearning4j.optimize.terminations.EpsTermination;
 import org.nd4j.linalg.api.ndarray.INDArray;

 import java.util.Collection;
@@ -31,18 +30,14 @@ public class IterationGradientDescent extends BaseOptimizer {
     @Override
     public boolean optimize() {
         for(int i = 0; i < conf.getNumIterations(); i++) {
-            model.setScore();
-            model.iterate(model.input());
             Pair<Gradient,Double> score = model.gradientAndScore();
             INDArray gradient = score.getFirst().gradient(conf.getGradientList());
             INDArray params = model.params();
             updateGradientAccordingToParams(gradient,params,model.batchSize());
-            INDArray newParams = params.addi(gradient);
-            model.setParams(newParams);
+            model.setParams(params.addi(gradient));
             for(IterationListener listener : conf.getListeners())
                 listener.iterationDone(model,i);
             log.info("Error at iteration " + i + " was " + model.score());
-
         }
         return true;
     }

F1: 0.20407190295160219 precision: 0.20196057270719753. recall: 0.20622784397707364. accuracy: 0.4314270585120466
  CLASS[0]: precision: 0.25925925925925924 recall: 0.10627062706270628
  CLASS[1]: precision: 0.33490566037735847 recall: 0.31678750697155605
  CLASS[2]: precision: 0.07009345794392523 recall: 0.03740648379052369
  CLASS[3]: precision: 0.1590909090909091 recall: 0.0782122905027933
  CLASS[4]: precision: 0.18645357686453576 recall: 0.49246231155778897

@winstonquock

@agibsonccc how should this one be fixed?

@agibsonccc
Contributor

So I'm coming back around to the back-prop stuff now. As of right now there is only the feed-forward impl. I'll look at the other architectures and try to consolidate. With the stuff I mentioned earlier, it shouldn't be difficult to get it in place.

@agibsonccc
Contributor

Fixed for normal feed forward.

AlexDBlack added a commit that referenced this issue May 21, 2018
@lock

lock bot commented Jan 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 22, 2019