Train and Fine-Tuning LightCNN #110

Closed
TheusStremens opened this issue Mar 20, 2017 · 26 comments

@TheusStremens

First, congratulations and thank you for your work. It's very exciting to see that it's possible to build a light CNN without millions (or billions) of parameters and still achieve state-of-the-art accuracy.

I intend to run two experiments (varying the activation type, cost function, solver type, number of neurons, ...) using the model C architecture: one training a new CNN from scratch on my database, and another fine-tuning model C on my database. I wrote the following solver.prototxt and train_val.prototxt:

  • solver:
net: "LightenedCNN_New_train_val.prototxt"
test_iter: 1000
test_interval: 10000
iter_size: 60
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 500000
display: 100
max_iter: 5000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "LightenedCNN_New_Net"
solver_mode: GPU
  • train_val
layer {
  name: "data"
  type:"Data"
  top: "data"
  top: "label"
  data_param{
	  source: "my_csv_train_database.txt"
	  batch_size: 32
	}
  transform_param {
    scale: 0.00390625
    crop_size: 128
    mirror: true
  }
  include: { phase: TRAIN }
}

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param{
	  source: "my_csv_validation_database.txt"
	  batch_size: 10
	}
  transform_param {
    scale: 0.00390625
    crop_size: 128
    mirror: false
  }
  include: { phase: TEST }
}
...
the same layers as in the deploy prototxt

Could you tell me if this solver and train_val are similar to those you used for the final training of model C?
And for fine-tuning, can I use the same solver as for training and just freeze layers in the train_val, or is a separate solver necessary?

Thanks

@AlfredXiangWu
Owner

The solver and train_val configurations are right for training the light CNN, and they are also suitable for fine-tuning on your own dataset.

@jiangxuehan

As described in your paper, “The learning rate is set to 1e-3 initially and reduced to 5e-5 gradually”. Could you please tell me the specific parameters needed to achieve this in Caffe, such as lr_policy, gamma, stepsize, max_iter, etc.? Thanks.

@TheusStremens
Author

@jiangxuehan I believe there isn't a single right answer to your question; the learning rate decay depends on your training database. In general, we reduce the learning rate when the training cost stops decreasing for some iterations. So, the "best" way is to watch your training cost and reduce the learning rate after X iterations. In Caffe you can specify in the solver the iterations at which the learning rate decreases, for example:

lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500

The multistep policy multiplies the learning rate by gamma at each stepvalue (which you can choose by looking at the training cost).
But if you just want to follow the paper, it's easy to calculate the gamma and use the step policy. For example, take a look at my solver: my stepsize is 500000 and max_iter is 5000000. This means that my learning rate will drop at most ten times. So, with base_lr = 0.001, after ten drops it should be 0.00005. Solving base_lr * (gamma)^10 = 5e-5 gives gamma = 0.05^(1/10) ~= 0.741.
So, the solver becomes:

lr_policy: "step"
gamma: 0.741
stepsize: 500000

If you won't run all 5000000 iterations, just adjust the equation.
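
The relation base_lr * gamma^n = final_lr can be solved for gamma numerically. The following is a minimal sketch, assuming Caffe's "step" policy computes base_lr * gamma^floor(iter / stepsize); the helper names `gamma_for_target` and `step_lr` are hypothetical and only used for illustration.

```python
def gamma_for_target(base_lr, final_lr, num_drops):
    """Gamma such that base_lr * gamma**num_drops == final_lr."""
    return (final_lr / base_lr) ** (1.0 / num_drops)


def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy (assumed formula, see above)."""
    return base_lr * gamma ** (iteration // stepsize)


base_lr, final_lr = 1e-3, 5e-5
stepsize, max_iter = 500000, 5000000

# Number of drops assumed here: one per full stepsize within max_iter.
num_drops = max_iter // stepsize
gamma = gamma_for_target(base_lr, final_lr, num_drops)
print("gamma = %.4f" % gamma)

# Learning rate at a few checkpoints of the schedule.
for it in (0, 500000, 2500000, max_iter):
    print(it, step_lr(base_lr, gamma, stepsize, it))
```

For a shorter run, pass the smaller number of drops you actually expect to reach as `num_drops`.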

@TheusStremens
Author

@AlfredXiangWu I'm training on a Tesla K40 and at iteration 1100 my training loss = 11.3229. It's taking very long; is this normal? I normalized 5M images of MS-Celeb (clean list) according to the paper's specification and used the solver from this issue.

@AlfredXiangWu
Owner

@TheusStremens I think it is normal for training the light CNN.

@jiangxuehan You can follow the configuration @TheusStremens mentioned. It is similar to mine.

@jiangxuehan

@TheusStremens @AlfredXiangWu Thanks for your replies, I will follow a similar configuration to train this model. BTW, the loss of the light CNN drops slowly during the first several thousand iterations. @TheusStremens, just keep training and it will be OK.

@TheusStremens
Author

Hi guys, after 7 days of training the cost has barely moved, and it's only at iteration 20K. At this rate, it will reach iteration 100K in 5 weeks and iteration 1M (1/5 of max_iter) in about a year. @AlfredXiangWu is this normal? How long did your training take? Can you tell me the number of iterations at the end of your training?
PS: I'm training on a Tesla K40.

@AlfredXiangWu
Owner

AlfredXiangWu commented Apr 5, 2017

@TheusStremens Do you mean that you train the light CNN for about a week and the iterations are only 20k?

That is abnormal. I set the max iteration to 4,000,000 and it takes about one week on a Titan X.

@TheusStremens
Author

I removed iter_size: 60 from the solver and the speed went up. But now I have a convergence problem like the one in #36: my loss is 87.3365 at the beginning. Changing the batch_size to 80 apparently resolved the convergence problem, but the speed is still abnormal (1/4 of your speed). Did you use iter_size in your training, @AlfredXiangWu? I'll try different values of batch_size.
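
For context, Caffe's iter_size accumulates gradients over that many forward/backward passes before each weight update, so one solver iteration costs iter_size passes. Below is a minimal sketch of that arithmetic using the values from the first solver in this issue; the function name `effective_batch` is hypothetical.

```python
def effective_batch(batch_size, iter_size=1):
    """Images contributing to one weight update when gradients are accumulated."""
    return batch_size * iter_size


with_accum = effective_batch(32, 60)   # batch_size: 32, iter_size: 60
without_accum = effective_batch(32)    # iter_size removed (defaults to 1)
print(with_accum, without_accum)

# Each solver iteration runs iter_size forward/backward passes, so for the
# same batch_size the cost per iteration grows by roughly this factor.
slowdown = with_accum // without_accum
print(slowdown)
```

This would account for a large part of the slow iteration rate reported earlier, although it may not be the only factor.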

@TheusStremens
Author

The convergence problem hasn't gone away; it just happened again at iteration 8980. I'm using the normalization correctly, with the same base_lr and the same architecture, so I can't figure out what is causing it.
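
A loss of exactly 87.3365 is a recognizable value: it equals -ln(FLT_MIN), which is what a softmax loss yields when the probability of the true class is clamped to the smallest normal single-precision float. The sketch below only reproduces that number; it assumes the loss layer clamps probabilities at FLT_MIN before taking the logarithm, and it suggests the predicted probability has underflowed to zero (training has diverged) rather than pointing to a specific cause.

```python
import math

# Smallest positive normal float32 value (C's FLT_MIN).
FLT_MIN = 1.17549435e-38

# Loss for one sample whose true-class probability has been clamped to FLT_MIN.
saturated_loss = -math.log(FLT_MIN)
print(round(saturated_loss, 4))
```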

@AlfredXiangWu
Owner

net: "DeepFace_set003_train_test.prototxt"

test_iter: 500
test_interval: 1000
test_compute_loss: true

base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
stepsize:500000
gamma:0.457305051927326

display: 100
max_iter: 4000000
snapshot: 40000
snapshot_prefix: "DeepFace_set003_net"

solver_mode: GPU

debug_info: false

clip_gradients: 150

The solver I used for training is above. Gradient clipping may help solve your problem. If not, I think you can fine-tune the light CNN on your own dataset starting from the pre-trained model.
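
To illustrate what clip_gradients: 150 does, here is a minimal sketch of global-norm gradient clipping: if the L2 norm of all gradients together exceeds the threshold, every gradient is rescaled by threshold / norm. This is written in plain Python on toy numbers, not taken from the solver's source; the function name and the toy gradients are hypothetical.

```python
import math


def clip_gradients(grads, threshold):
    """Rescale all gradients so their global L2 norm is at most `threshold`.

    `grads` is a list of per-layer gradient lists. Returns the (possibly
    rescaled) gradients and the norm measured before clipping.
    """
    norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    if norm > threshold:
        scale = threshold / norm
        grads = [[g * scale for g in layer] for layer in grads]
    return grads, norm


# Toy gradients for two "layers"; their global L2 norm is 500.
grads = [[300.0, -400.0], [0.0]]
clipped, norm = clip_gradients(grads, 150.0)
print(norm, clipped)
```

Because all gradients share one scale factor, the update direction is preserved and only its length is limited.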

@lei-xiong

lei-xiong commented Apr 18, 2017

@AlfredXiangWu @TheusStremens
I tried to train model C on MS-Celeb-1M for up to 20 million iterations. The loss always stays at 11.0. I use 61332 classes, about 390,000 pictures in total, and batch size = 96x4. Is this normal? After how many iterations did your loss begin to drop significantly? Thank you.

I also tried lowering my learning rate.
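
A loss stuck near 11 with this many classes matches what a softmax classifier produces when it predicts every class with equal probability, namely ln(number of classes). The sketch below computes that chance-level value for the class counts mentioned in this thread; the function name is hypothetical.

```python
import math


def chance_level_loss(num_classes):
    """Softmax cross-entropy when every class gets probability 1/num_classes."""
    return math.log(num_classes)


# Class counts mentioned in this thread.
for n in (61332, 79010, 79056):
    print(n, round(chance_level_loss(n), 2))
```

So a plateau at this value suggests the network has not started learning yet, rather than the loss being mis-scaled.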

@TheusStremens
Author

@xionglei181818 Did you use the clean list of MS-Celeb-1M? Why did you use only 390,000 pictures if MS-Celeb-1M has 5M+? What learning rate did you use?

In my case, shuffling the training data solved the convergence problem. After 700K iterations the loss dropped to 3. Now I'm at 1.8M iterations, with loss = 1 and acc = 89%.
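
One likely reason shuffling matters: if the image list is sorted by identity, each mini-batch contains only one or a few classes, which makes the gradient estimates very biased. A minimal sketch of shuffling a "path label" list reproducibly is shown below; the helper name and the toy file names are hypothetical.

```python
import random


def shuffle_list_lines(lines, seed=0):
    """Return a shuffled copy of "path label" lines; the seed keeps it reproducible."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy list sorted by identity, as produced by walking one folder per person.
lines = ["person%d/img%d.jpg %d" % (p, i, p) for p in range(3) for i in range(4)]
shuffled = shuffle_list_lines(lines)
print(shuffled[:4])
```

The same lines are kept, only their order changes, so the result can be written back to the list file used as the data layer's source.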

@lei-xiong

lei-xiong commented Apr 19, 2017

@TheusStremens I use the clean list of MS-Celeb-1M and took 50 images of each category, so after screening I get 61332 categories and about 390,000 images.

1. Using the learning rate settings provided by @AlfredXiangWu:
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
stepsize:200000
gamma: 0.457305051927326

2. I also tried this set of parameters:
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.000005
power:0.75

With both sets of parameters, the loss stayed around 11.0 through 200,000 iterations. Reducing the learning rate to 0.0001 gave the same result. Have you observed this phenomenon? Thank you.

@AlfredXiangWu
Owner

@xionglei181818
I recommend setting the learning rate policy to "fixed" or "step" rather than "inv".
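
To compare the two schedules quoted above, here is a minimal sketch. It assumes Caffe's documented formulas: "step" gives base_lr * gamma^floor(iter / stepsize) and "inv" gives base_lr * (1 + gamma * iter)^(-power). The function names are hypothetical; the parameter values are the ones posted in this thread.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy."""
    return base_lr * gamma ** (iteration // stepsize)


def inv_lr(base_lr, gamma, power, iteration):
    """Learning rate under the "inv" policy."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# Print both schedules side by side at a few iterations.
for it in (0, 200000, 1000000, 4000000):
    s = step_lr(0.001, 0.457305051927326, 200000, it)
    v = inv_lr(0.001, 0.000005, 0.75, it)
    print("%8d  step=%.3e  inv=%.3e" % (it, s, v))
```

The "inv" policy decays continuously from the first iteration, while "step" holds the rate constant between drops.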

@lyuchuny3

I have trained on MS-Celeb-1M with the solver config provided by @AlfredXiangWu. It took 9 days on a Titan X for 3,500,000 iterations. The performance of my model on LFW is not as good as model C's.
My test results:
model C: DIR = 0.835 @ FAR = 1% on LFW
my model: DIR = 0.641 @ FAR = 1% on LFW
I suspect the possible reasons are:

  • train images: I directly crop the aligned images of MS-Celeb-1M without further alignment. I note that:
    Dataset        size      ec_mc_y   ec_y
    Training set   144x144   48        48
    Testing set    128x128   48        40
  • train batch: in my training, I set the training batch size to 124 (but I think this is not the main reason)
  • in the prototxt of model C, I added a param block for weight decay on 'fc2', as mentioned in the paper:
    param{
    lr_mult:1
    decay_mult:10
    }

@AlfredXiangWu, do you have any advice?

@ctgushiwei

@AlfredXiangWu @TheusStremens @lyuchuny3 Can you share your train_test prototxt and your solver.prototxt? I'm training the light CNN with the clean list; after screening I get 79056 categories and about 4,920,000 images. But after 400,000 iterations the loss is still 11.2. Can you give me a hand?

@yuzcccc

yuzcccc commented Apr 27, 2017

How many iterations (and what batch size) are needed to reproduce the results of model B trained on the CASIA-WebFace dataset?

@TheusStremens
Author

TheusStremens commented Apr 27, 2017

@ctgushiwei
solver:

net: "/path_to_your_train_val_net/your_net_train_val.prototxt"
test_iter: 1000
test_interval: 10000
test_compute_loss: true
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 500000
display: 10
max_iter: 4000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "Snapshot_your_net"
solver_mode: GPU
debug_info: false

your_net_train_val.prototxt:

name: "Your_Name_Net"

layer {
  name: "data"
  type:"ImageData"
  top: "data"
  top: "label"
  image_data_param{
	  source: "/your_path/train_csv.txt"
	  batch_size: 50
	  shuffle: true
	}
  transform_param {
    scale: 0.00390625
    crop_size: 128
    mirror: true
    
  }
  include: { phase: TRAIN }
}

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  image_data_param{
	  source: "/your_path/validation_csv.txt"
	  batch_size: 10
	}
  transform_param {
    scale: 0.00390625
    crop_size: 128
    mirror: false
  }
  include: { phase: TEST }
}

layer{
  name: "conv1"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 5
	stride: 1
	pad: 2
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}
  }
  bottom: "data"
  top: "conv1"
}

layer{
  name: "slice1"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv1"
  top: "slice1_1"
  top: "slice1_2"
}
layer{
  name: "etlwise1"
  type: "Eltwise"
  bottom: "slice1_1"
  bottom: "slice1_2"
  top: "eltwise1"
  eltwise_param {
	operation: MAX
  }
}
layer{
  name: "pool1"
  type: "Pooling"
  pooling_param {
	pool: MAX
	kernel_size: 2
	stride: 2
  }
  bottom: "eltwise1"
  top: "pool1"
}

layer{
  name: "conv2a"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 1
	stride: 1
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}
  }
  bottom: "pool1"
  top: "conv2a"
}
layer{
  name: "slice2a"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv2a"
  top: "slice2a_1"
  top: "slice2a_2"
}
layer{
  name: "etlwise2a"
  type: "Eltwise"
  bottom: "slice2a_1"
  bottom: "slice2a_2"
  top: "eltwise2a"
  eltwise_param {
	operation: MAX
  }
}

layer{
  name: "conv2"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
	num_output: 192
	kernel_size: 3
	stride: 1
	pad: 1
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}
  }
  bottom: "eltwise2a"
  top: "conv2"
}



layer{
  name: "slice2"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv2"
  top: "slice2_1"
  top: "slice2_2"
}
layer{
  name: "etlwise2"
  type: "Eltwise"
  bottom: "slice2_1"
  bottom: "slice2_2"
  top: "eltwise2"
  eltwise_param {
	operation: MAX
  }
}
layer{
  name: "pool2"
  type: "Pooling"
  pooling_param {
	pool: MAX
	kernel_size: 2
	stride: 2
  }
  bottom: "eltwise2"
  top: "pool2"
}

layer{
  name: "conv3a"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
	num_output: 192
	kernel_size: 1
	stride: 1
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}
  }
  bottom: "pool2"
  top: "conv3a"
}
layer{
  name: "slice3a"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv3a"
  top: "slice3a_1"
  top: "slice3a_2"
}
layer{
  name: "etlwise3a"
  type: "Eltwise"
  bottom: "slice3a_1"
  bottom: "slice3a_2"
  top: "eltwise3a"
  eltwise_param {
	operation: MAX
  }
}

layer{
  name: "conv3"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
	num_output: 384
	kernel_size: 3
	stride: 1
	pad: 1
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}
  }
  bottom: "eltwise3a"
  top: "conv3"
}


layer{
  name: "slice3"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv3"
  top: "slice3_1"
  top: "slice3_2"
}
layer{
  name: "etlwise3"
  type: "Eltwise"
  bottom: "slice3_1"
  bottom: "slice3_2"
  top: "eltwise3"
  eltwise_param {
	operation: MAX
  }
}
layer{
  name: "pool3"
  type: "Pooling"
  pooling_param {
	pool: MAX
	kernel_size: 2
	stride: 2
  }
  bottom: "eltwise3"
  top: "pool3"
}

layer{
  name: "conv4a"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param{
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler{
      type:"xavier"
    }
    bias_filler{
      type: "constant"
      value: 0.1    
    }
  }
  bottom: "pool3"
  top: "conv4a"
}
layer{
  name: "slice4a"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv4a"
  top: "slice4a_1"
  top: "slice4a_2"
}
layer{
  name: "etlwise4a"
  type: "Eltwise"
  bottom: "slice4a_1"
  bottom: "slice4a_2"
  top: "eltwise4a"
  eltwise_param {
	operation: MAX
  }
}
layer{
  name: "conv4"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param{
    num_output: 256
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler{
      type:"xavier"
    }
    bias_filler{
      type: "constant"
      value: 0.1    
    }
  }
  bottom: "eltwise4a"
  top: "conv4"
}



layer{
  name: "slice4"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv4"
  top: "slice4_1"
  top: "slice4_2"
}
layer{
  name: "etlwise4"
  type: "Eltwise"
  bottom: "slice4_1"
  bottom: "slice4_2"
  top: "eltwise4"
  eltwise_param {
	operation: MAX
  }
}

layer{
  name: "conv5a"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param{
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler{
      type:"xavier"
    }
    bias_filler{
      type: "constant"
      value: 0.1    
    }
  }
  bottom: "eltwise4"
  top: "conv5a"
}
layer{
  name: "slice5a"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv5a"
  top: "slice5a_1"
  top: "slice5a_2"
}
layer{
  name: "etlwise5a"
  type: "Eltwise"
  bottom: "slice5a_1"
  bottom: "slice5a_2"
  top: "eltwise5a"
  eltwise_param {
	operation: MAX
  }
}
layer{
  name: "conv5"
  type: "Convolution"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param{
    num_output: 256
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler{
      type:"xavier"
    }
    bias_filler{
      type: "constant"
      value: 0.1    
    }
  }
  bottom: "eltwise5a"
  top: "conv5"
}


layer{
  name: "slice5"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "conv5"
  top: "slice5_1"
  top: "slice5_2"
}
layer{
  name: "etlwise5"
  type: "Eltwise"
  bottom: "slice5_1"
  bottom: "slice5_2"
  top: "eltwise5"
  eltwise_param {
	operation: MAX
  }
}

layer{
  name: "pool4"
  type: "Pooling"
  pooling_param {
	pool: MAX
	kernel_size: 2
	stride: 2
  }
  bottom: "eltwise5"
  top: "pool4"
}

layer{
  name: "fc1"
  type: "InnerProduct"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
	num_output: 512
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}	
  }  
  bottom: "pool4"
  top: "fc1"
}
layer{
  name: "slice_fc1"
  type:"Slice"
  slice_param {
	slice_dim: 1
  }
  bottom: "fc1"
  top: "slice_fc1_1"
  top: "slice_fc1_2"
}
layer{
  name: "etlwise_fc1"
  type: "Eltwise"
  bottom: "slice_fc1_1"
  bottom: "slice_fc1_2"
  top: "eltwise_fc1"
  eltwise_param {
	operation: MAX
  }
}

layer{
  name: "drop1"
  type: "Dropout"
  dropout_param{
	dropout_ratio: 0.7
  }
  bottom: "eltwise_fc1"
  top: "eltwise_fc1"
}

layer{
  name: "fc2"
  type: "InnerProduct"

  inner_product_param{
	num_output: 79010
	weight_filler {
	  type: "xavier"
	}
	bias_filler {
	  type: "constant"
	  value: 0.1
	}	
  }
  bottom: "eltwise_fc1"
  top: "fc2"
}

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc2"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}

layer {
  name: "softmaxloss"
  type: "SoftmaxWithLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}

Remember to change the num_output value in fc2 to match the number of classes in your dataset.

@ctgushiwei

@TheusStremens First of all, thank you very much for your answer!
I have two more questions:

1. I use the same train_test prototxt as your configuration, but after 500K iterations the loss was still at 11.2. I then added the following params to my fc2 layer:
param {
lr_mult: 10
decay_mult: 1
}
param {
lr_mult: 20
decay_mult: 0
}
and the loss began to drop. With your configuration, after how many iterations did the loss begin to drop?

2. Did you test your model on LFW, and does the accuracy reach 98%?
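
For reference, each param block in Caffe scales the solver-wide values for one blob of the layer (weights first, then biases): the effective learning rate is base_lr * lr_mult and the effective weight decay is weight_decay * decay_mult. A minimal sketch with the fc2 multipliers above and the solver values from this thread; the function name is hypothetical.

```python
def effective_hyperparams(base_lr, weight_decay, lr_mult, decay_mult):
    """Per-blob learning rate and weight decay after applying the multipliers."""
    return base_lr * lr_mult, weight_decay * decay_mult


base_lr, weight_decay = 0.001, 0.0005

# First param block applies to the fc2 weights, the second to the fc2 biases.
w_lr, w_decay = effective_hyperparams(base_lr, weight_decay, 10, 1)
b_lr, b_decay = effective_hyperparams(base_lr, weight_decay, 20, 0)
print(w_lr, w_decay)
print(b_lr, b_decay)
```

In other words, this change trains the classifier layer with a learning rate ten times higher than the rest of the network.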

@TheusStremens
Author

@ctgushiwei

  1. In my case, at iteration 700K the loss was 2. The loss began to drop only after I changed the batch size and enabled shuffling of the training data.
  2. I'm still training. My training is taking four times longer than Mr. Wu's, and the electricity went off in my lab a few times. Besides that, I had to suspend the training to run another urgent job. When it's over I'll let you know the results on LFW.

@honghuCode

honghuCode commented Nov 29, 2017

@TheusStremens Hello, when fine-tuning LightCNN I get the error "Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 96 1 5 5 (2400); target param shape is 96 3 5 5 (7200). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer". Could you please help me?

@TheusStremens
Author

@honghuCode Check whether you are loading RGB images; LightCNN works with grayscale images.
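
The two shapes in the error message differ only in the input-channel dimension: a convolution weight blob has shape (num_output, input_channels, kernel_h, kernel_w). The sketch below reproduces the two element counts from the message; the helper names are hypothetical.

```python
def conv_weight_shape(num_output, in_channels, kernel_size):
    """Shape of a square-kernel convolution weight blob."""
    return (num_output, in_channels, kernel_size, kernel_size)


def count(shape):
    """Total number of elements in a blob of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


gray = conv_weight_shape(96, 1, 5)  # pre-trained conv1: 1-channel input
rgb = conv_weight_shape(96, 3, 5)   # conv1 built on 3-channel input data
print(gray, count(gray))
print(rgb, count(rgb))
```

A target shape with 3 input channels therefore means the data layer is feeding 3-channel images into conv1.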

@honghuCode

honghuCode commented Nov 29, 2017

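@TheusStremens I used the following code to convert the image to grayscale and resize it to 128 * 128.
```python
import cv2

# imgPath is the path to the input image; flag 1 (cv2.IMREAD_COLOR) loads it as 3-channel BGR
mat = cv2.imread(imgPath, 1)
mat = cv2.resize(mat, (128, 128))
# Convert the resized BGR image to a single-channel grayscale image
im_gray = cv2.cvtColor(mat, cv2.COLOR_BGR2GRAY)
```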

The following is my train_test_bak.prototxt:

`
name: "DeepFace_set003_net"

layer {
name: "data"
type:"ImageData"
top: "data"
top: "label"
image_data_param{
source: "/home/honghu/code/caffe-master/lightCNNFace/train.txt"
batch_size: 20
shuffle: true
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: true

}
include: { phase: TRAIN }
}

layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
image_data_param{
source: "/home/honghu/code/caffe-master/lightCNNFace/val.txt"
batch_size: 20
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: false
}
include: { phase: TEST }
}

layer{
name: "conv1"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 5
stride: 1
pad: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "data"
top: "conv1"
}

layer{
name: "slice1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv1"
top: "slice1_1"
top: "slice1_2"
}
layer{
name: "etlwise1"
type: "Eltwise"
bottom: "slice1_1"
bottom: "slice1_2"
top: "eltwise1"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool1"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise1"
top: "pool1"
}

layer{
name: "conv2a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool1"
top: "conv2a"
}
layer{
name: "slice2a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2a"
top: "slice2a_1"
top: "slice2a_2"
}
layer{
name: "etlwise2a"
type: "Eltwise"
bottom: "slice2a_1"
bottom: "slice2a_2"
top: "eltwise2a"
eltwise_param {
operation: MAX
}
}

layer{
name: "conv2"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise2a"
top: "conv2"
}

layer{
name: "slice2"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2"
top: "slice2_1"
top: "slice2_2"
}
layer{
name: "etlwise2"
type: "Eltwise"
bottom: "slice2_1"
bottom: "slice2_2"
top: "eltwise2"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool2"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise2"
top: "pool2"
}

layer{
name: "conv3a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool2"
top: "conv3a"
}
layer{
name: "slice3a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3a"
top: "slice3a_1"
top: "slice3a_2"
}
layer{
name: "etlwise3a"
type: "Eltwise"
bottom: "slice3a_1"
bottom: "slice3a_2"
top: "eltwise3a"
eltwise_param {
operation: MAX
}
}

layer{
name: "conv3"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise3a"
top: "conv3"
}

layer{
name: "slice3"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3"
top: "slice3_1"
top: "slice3_2"
}
layer{
name: "etlwise3"
type: "Eltwise"
bottom: "slice3_1"
bottom: "slice3_2"
top: "eltwise3"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool3"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise3"
top: "pool3"
}

layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "etlwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}

layer{
name: "slice4"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4"
top: "slice4_1"
top: "slice4_2"
}
layer{
name: "etlwise4"
type: "Eltwise"
bottom: "slice4_1"
bottom: "slice4_2"
top: "eltwise4"
eltwise_param {
operation: MAX
}
}

layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "etlwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}

layer{
name: "slice5"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5"
top: "slice5_1"
top: "slice5_2"
}
layer{
name: "etlwise5"
type: "Eltwise"
bottom: "slice5_1"
bottom: "slice5_2"
top: "eltwise5"
eltwise_param {
operation: MAX
}
}

layer{
name: "pool4"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise5"
top: "pool4"
}

layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "etlwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}

layer{
name: "drop1"
type: "Dropout"
dropout_param{
dropout_ratio: 0.7
}
bottom: "eltwise_fc1"
top: "eltwise_fc1"
}

layer{
name: "fnc2"
type: "InnerProduct"

inner_product_param{
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fnc2"
}

layer {
name: "accuracy"
type: "Accuracy"
bottom: "fnc2"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}

layer {
name: "softmaxloss"
type: "SoftmaxWithLoss"
bottom: "fnc2"
bottom: "label"
top: "loss"
}
`

@TheusStremens
Author

@honghuCode Add is_color: false to the data layer (inside image_data_param). Caffe loads images with 3 channels even if they are grayscale unless you set this parameter to false.

@honghuCode

@TheusStremens Thank you very much, you solved my problem.

```
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  image_data_param {
    source: "/home/code/caffe-master/lightCNNFace/val.txt"
    batch_size: 20
    is_color: false
  }
  transform_param {
    scale: 0.00390625
    crop_size: 128
    mirror: false
  }
  include: { phase: TEST }
}
```
