
Train with this model for Mnist? #88

Closed
fdncred opened this issue Feb 28, 2018 · 11 comments

fdncred commented Feb 28, 2018

I'm trying to use your software to mimic the results of this Keras model, using the MnistDemo program.

In particular, I'm looking at this section:

# Set the CNN model 
# my CNN architechture is In -> [[Conv2D->relu]*2 -> MaxPool2D -> Dropout]*2 -> Flatten -> Dense -> Dropout -> Out

model = Sequential()

model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu', input_shape = (28,28,1)))
model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation = "softmax"))
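
For reference, the shapes flowing through this stack can be traced by hand: 'same' padding preserves the spatial size, and each 2x2 max-pool halves it. A minimal sketch (pool_out is a hypothetical helper, not a Keras API):

```python
def pool_out(size: int, pool: int = 2, stride: int = 2) -> int:
    """Output spatial size of a max-pool layer (no padding)."""
    return (size - pool) // stride + 1

h = 28                 # 28x28x1 input
h = pool_out(h)        # Conv(32,5x5,'same') x2, then MaxPool -> 14x14x32
h = pool_out(h)        # Conv(64,3x3,'same') x2, then MaxPool -> 7x7x64
flat = h * h * 64      # Flatten feeds 7*7*64 = 3136 values into Dense(256)
print(h, flat)         # 7 3136
```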

This is what I have so far:

this._net = new Net<double>();
this._net.AddLayer(new InputLayer(28, 28, 1));
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2));
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.5 });
this._net.AddLayer(new FullyConnLayer(10));
this._net.AddLayer(new SoftmaxLayer(10));

This seems close, but I'm not exactly sure how to translate the last section: I don't see any Flatten() or Dense() methods, so I'm not sure it will work at all.

Do you have any ideas how to duplicate this model section with your codebase?

Thanks,
Darren
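
Aside on padding: Keras padding='same' with stride 1 and an odd kernel of size k corresponds to an explicit pad of (k - 1) / 2. That gives Pad = 2 for the 5x5 convolutions, but Pad = 1 for the 3x3 ones, so the Pad = 2 on the 3x3 ConvLayers above may not match the Keras model. A quick sketch (same_pad and conv_out are hypothetical helpers, not ConvNetSharp APIs):

```python
def same_pad(kernel_size: int) -> int:
    """Explicit padding that reproduces Keras 'same' for stride-1
    convolutions with an odd kernel: output size equals input size."""
    return (kernel_size - 1) // 2

def conv_out(size: int, kernel: int, pad: int, stride: int = 1) -> int:
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

# 5x5 kernel: 'same' needs Pad = 2, matching the C# code above
assert same_pad(5) == 2 and conv_out(28, 5, 2) == 28
# 3x3 kernel: 'same' needs Pad = 1; Pad = 2 would grow 14x14 to 16x16
assert same_pad(3) == 1 and conv_out(14, 3, 1) == 14
print(conv_out(14, 3, 2))  # 16
```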

cbovar (Owner) commented Feb 28, 2018

I believe Dense in Keras corresponds to FullyConnLayer + an activation layer (ReLU, sigmoid, etc.) in ConvNetSharp, so you should probably add FullyConnLayer(256) between the DropoutLayer and the ReluLayer.

fdncred (Author) commented Feb 28, 2018

Is this what you mean?

this._net = new Net<double>();
this._net.AddLayer(new InputLayer(28, 28, 1));
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2));
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new FullyConnLayer(256));
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.5 });
this._net.AddLayer(new FullyConnLayer(10));
this._net.AddLayer(new SoftmaxLayer(10));

cbovar (Owner) commented Feb 28, 2018

I would put FullyConnLayer(256) before the ReluLayer.

fdncred (Author) commented Feb 28, 2018

OK, thanks. I wasn't sure which DropoutLayer and ReluLayer you were referencing. I'll give this a whirl. It'll take a while to train, I'm sure.

fdncred (Author) commented Mar 6, 2018

I'm getting an exception during training when Backward() is called in the Train method. I was wondering if you could help.

In this exception, reserveSpace is null (cuDNN exception screenshot omitted).
It's called from this code, so really it's _volumeStorage.DropoutStorage that's null.

public override void DoDropoutGradient(Volume<double> input, Volume<double> outputGradient, Volume<double> inputGradient, double dropProbability)
{
    var inputStorage = _volumeStorage;
    var outputGradientStorage = outputGradient.Storage as VolumeStorage;
    var inputGradientStorage = inputGradient.Storage as VolumeStorage;
.....
        _context.CudnnContext.DropoutBackward(dropoutDesc,
            dOutputDesc, outputGradientStorage.DeviceBuffer,
            dDataDesc, inputGradientStorage.DeviceBuffer,
            _volumeStorage.DropoutStorage);
    }
}

I figure it has to do with the way I stacked these layers, but I'm not sure. I'm just trying to emulate the Python code above as closely as possible.

this._net = new Net<double>();
this._net.AddLayer(new InputLayer(28, 28, 1));
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2));
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.25 });

this._net.AddLayer(new FullyConnLayer(256));
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new DropoutLayer<double>() { DropProbability = 0.5 });
this._net.AddLayer(new FullyConnLayer(10));
this._net.AddLayer(new SoftmaxLayer(10));

Any ideas?

Side Note: It seems to be working in CPU mode but it will take forever. ;)
Thanks,
Darren

cbovar (Owner) commented Mar 7, 2018

DropoutStorage is shared between the forward and backward passes. It is supposed to be allocated in the forward pass.
Dropout was only added recently and may have some bugs, though VolumeTests.Dropout() is passing.

I will try to reproduce and understand this tonight.
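
The coupling described above can be sketched in a few lines (a hypothetical minimal dropout class, not ConvNetSharp's implementation): the mask built in forward() is the shared state that backward() needs, analogous to cuDNN's reserveSpace / DropoutStorage, so running backward without a prior forward hits exactly this kind of null state.

```python
import random

class Dropout:
    """Minimal inverted-dropout sketch: the mask created in forward()
    is the shared state reused in backward()."""
    def __init__(self, drop_probability: float):
        self.p = drop_probability
        self.mask = None  # allocated in forward, required by backward

    def forward(self, x):
        keep = 1.0 - self.p
        # scale kept units by 1/keep so expected activation is unchanged
        self.mask = [1.0 / keep if random.random() < keep else 0.0
                     for _ in x]
        return [xi * mi for xi, mi in zip(x, self.mask)]

    def backward(self, grad):
        if self.mask is None:
            # the situation in the crash: backward ran with no stored state
            raise RuntimeError("backward() called before forward()")
        return [gi * mi for gi, mi in zip(grad, self.mask)]
```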

fdncred (Author) commented Mar 7, 2018

OK, good. Hopefully you can find something, because my estimation code says that training this model on the CPU will finish on 6/12/2018 at 3:45pm. LOL. So I won't be training this without GPU support.

I forgot to mention that I let this training run all night. It didn't get very far, but I had it save out the model when I stopped the training. The model was a 35 MB single-line JSON file. Wow! Not sure what makes it so big, but I can't imagine how big it'll be when I finish training this way.

cbovar (Owner) commented Mar 7, 2018

No more crash after PR #91

cbovar (Owner) commented Mar 9, 2018

The JSON file contains all parameters and their gradients. Over time their values will change, but there won't be more parameters, so the size of the file should remain close to 35 MB in your case.
Gradients are not needed for inference, so the file could be half its size.

fdncred (Author) commented Mar 9, 2018

Are you saying I can remove the FilterGradient and BiasGradient sections of the JSON entirely, to make the file smaller, and the model will still work for me?
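
If gradients can indeed be dropped for inference, a post-processing sketch might look like this. The FilterGradient / BiasGradient key names come from the question above; whether ConvNetSharp will still deserialize the stripped file is an assumption to verify.

```python
import json

GRADIENT_KEYS = {"FilterGradient", "BiasGradient"}  # names from the question above

def strip_gradients(node):
    """Recursively remove gradient entries from a parsed model JSON tree."""
    if isinstance(node, dict):
        return {k: strip_gradients(v) for k, v in node.items()
                if k not in GRADIENT_KEYS}
    if isinstance(node, list):
        return [strip_gradients(v) for v in node]
    return node

# Usage (hypothetical file names):
# model = json.load(open("model.json"))
# json.dump(strip_gradients(model), open("model.small.json", "w"))
```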

NickStrupat commented Sep 14, 2018

With this model I am getting the following exception with 0.4.11-alpha:

Exception has occurred: CLR/System.ArgumentException
An unhandled exception of type 'System.ArgumentException' occurred in ConvNetSharp.Volume.dll: 'Volume should have a Shape [1] to be converter to a System.Double'
   at ConvNetSharp.Volume.Volume`1.op_Implicit(Volume`1 v)
   at ConvNetSharp.Core.Layers.DropoutLayer`1.Backward(Volume`1 outputGradient)
   at ConvNetSharp.Core.Net`1.Backward(Volume`1 y)
   at ConvNetSharp.Core.Training.TrainerBase`1.Backward(Volume`1 y)
   at ConvNetSharp.Core.Training.TrainerBase`1.Train(Volume`1 x, Volume`1 y)
   at MnistDemo.Program.Train(Volume`1 x, Volume`1 y, Int32[] labels) in /Users/nick/Dev/MnistDemo/Program.cs:line 138
   at MnistDemo.Program.MnistDemo() in /Users/nick/Dev/MnistDemo/Program.cs:line 87
   at MnistDemo.Program.Main() in /Users/nick/Dev/MnistDemo/Program.cs:line 28
this._net = new Net<double>();
this._net.AddLayer(new InputLayer(28, 28, 1));
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(5, 5, 32) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2));
this._net.AddLayer(new DropoutLayer<double>(0.25));

this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new ConvLayer(3, 3, 64) { Stride = 1, Pad = 2 });
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
this._net.AddLayer(new DropoutLayer<double>(0.25));

this._net.AddLayer(new FullyConnLayer(256));
this._net.AddLayer(new ReluLayer());
this._net.AddLayer(new DropoutLayer<double>(0.5));
this._net.AddLayer(new FullyConnLayer(10));
this._net.AddLayer(new SoftmaxLayer(10));
