
Caffe model fails to learn #6553

Open
peterbence3 opened this issue Oct 1, 2018 · 1 comment

@peterbence3

I have the following convolutional model implemented in Keras. After training for 100,000 epochs, it shows excellent performance with great accuracy.

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

img_rows, img_cols = 24, 15
input_shape = (img_rows, img_cols, 1)
nb_classes = 11  # matches num_output: 11 in the Caffe model below
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

However, after trying to implement the same model in Caffe, it fails to train: the loss stays almost fixed, between 2.1 and 2.6. Here is my Caffe prototxt implementation:

name: "FneishNet"
layer {
  name: "inlayer1"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "examples/fneishnet_numbers/fneishnet_numbers_train_lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "inlayer1"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "examples/fneishnet_numbers/fneishnet_numbers_val_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 1
  }
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.25
  }
}
layer {
  name: "flatten1"
  type: "Flatten"
  bottom: "pool1"
  top: "flatten1"
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "flatten1"
  top: "fc1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "drop2"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 11
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}

And here is my model solver (hyper-parameters):

net: "models/fneishnet_numbers/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
gamma: 0.1
lr_policy: "poly"
power: 0.5
max_iter: 3000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 100000
snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick"
solver_mode: CPU

I believe that if I translated the model into Caffe correctly, it should perform the same way it does in Keras, so I think I must have missed something. Any help would be appreciated, thanks.

@peterbence3 (Author)

Thanks to @Dusa, who saved me with his answer here.

It solved my problem: I changed my hyper-parameters to fit the Adam optimizer's specifications, as shown here. (My Keras model was compiled with the adaptive Adadelta optimizer, while my Caffe solver had no type set and so was running Caffe's default SGD.)
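
For anyone who hits the same plateau: below is a minimal sketch of what an Adam-based solver could look like. The base_lr, momentum2, and delta values are Caffe's Adam defaults, not necessarily the exact values that worked for me.

# Sketch of an Adam solver for Caffe; values are illustrative defaults.
net: "models/fneishnet_numbers/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
type: "Adam"           # use the Adam solver instead of the default SGD
base_lr: 0.001         # Adam usually wants a much smaller base_lr than SGD
momentum: 0.9          # Adam's beta1
momentum2: 0.999       # Adam's beta2
delta: 1e-8            # Adam's epsilon
lr_policy: "fixed"     # let Adam adapt per-parameter step sizes itself
display: 40
average_loss: 40
max_iter: 3000000
snapshot: 100000
snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick"
solver_mode: CPU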
