Any input produces the same output #1396

Closed
mender05 opened this Issue Nov 4, 2014 · 52 comments

mender05 commented Nov 4, 2014

I am trying to use Caffe to implement DeepPose, proposed in this paper: http://arxiv.org/abs/1312.4659
DeepPose has 3 stages, and each stage is almost the same as AlexNet (DeepPose replaces the loss layer in AlexNet with a Euclidean loss). It is in fact a regression problem.

The train.prototxt is:

name: "CaffeNet"
layers {
  name: "image"
  type: DATA
  top: "image"
  data_param {
    source: "examples/lsp/lsp_train_images_lmdb"
    backend: LMDB
    batch_size: 30
    scale: 0.00390625
  }
}
layers {
  name: "label"
  type: DATA
  top: "label"
  data_param {
    source: "examples/lsp/lsp_train_labels_lmdb"
    backend: LMDB
    batch_size: 30
    scale: 0.00454545
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "image"
  top: "conv1"
...  THIS IS THE SAME AS ALEXNET ...
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 28
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

The solver.prototxt is:

net: "models/lsp/deeppose_train.prototxt"
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 7500
display: 50
max_iter: 36500
momentum: 0.9
weight_decay: 0.0000005
snapshot: 2000
snapshot_prefix: "models/lsp/caffenet_train"
solver_mode: GPU

After training completed, I used the Python interface to run predictions on the test set. The test.prototxt is:

name: "CaffeNet"
layers {
  name: "image"
  type: MEMORY_DATA
  top: "image"
    top: "useless"
  memory_data_param {
    batch_size: 30
    channels: 3
    height: 220
    width: 220
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "image"
... 
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 28
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

But the output is very strange. Dumping the output of the "fc8" layer, I find that all the images produce the same output:

array([[ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],
       [ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],
       [ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],

In fact, no matter what the inputs are, the outputs are always the same as the values above. What causes this problem?

jiangdong123 commented Nov 11, 2014

Maybe you should take the fc8 features like this: net.blobs['fc8'].data[4].copy()
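
For reference, a minimal sketch of that suggestion with the 2014-era pycaffe API and the file names used later in this thread; the key detail is the .copy(), since the blob's buffer is reused by later forward passes:

import caffe

net = caffe.Net('deeppose_test.prototxt', 'caffenet_train_iter_36500.caffemodel')
net.forward()                          # assumes the net's input layer has already been filled
fc8 = net.blobs['fc8'].data.copy()     # copy, otherwise the next forward() overwrites this array
print(fc8.reshape(fc8.shape[0], -1))   # one 28-dim prediction per image in the batch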

mender05 commented Nov 12, 2014

@jiangdong123 Thank you for your advice! I'll try it. This is my previous code:
I take the output of fc8 like this (just a BATCH_SIZE of test images):

net.set_input_arrays(\
    data4D.astype(np.float32), data4DL.astype(np.float32))
pred = net.forward()
for i in range(0,BATCH_SIZE):
  for c in range(0,28):
    pred_normal[i][c] = pred['fc8'][i][c][0][0]
print pred_normal

Is there any mistake?

mender05 commented Nov 14, 2014

The loss variation looks very strange:
figure_1
What causes the loss to change periodically?

sguada commented Nov 14, 2014

Probably your training data is not randomized.

Sergio

mender05 commented Nov 14, 2014

@jiangdong123

  1. The outputs are always the same for all test images.
  2. I have tried a bigger learning rate (0.1) but got the same results.
  3. I also tried changing the input layer type from DATA to IMAGE_DATA, but the loss still changes periodically.
  4. I'll check whether the labels are read properly.

As @sguada said, the training data is not randomized. I'll randomize them before training.

mender05 commented Nov 18, 2014

@sguada @jiangdong123
The reason is that the labels are interpreted improperly!
For example, the following two 28-dim labels:

label_1 = { 189, 116, 165, 259, 95, 144, 122, 151, 88, 125, 218, 160, 68, 32, 95, 110, 165, 266, 123, 32, 151, 182, 189, 284, 294, 218, 173, 157 }
label_2 = { 64, 71, 91, 115, 126, 105, 24, 51, 92, 144, 170, 197, 114, 132, 188, 138, 97, 103, 148, 201, 20, 29, 30, 39, 68, 99, 34, 22 }

are interpreted as:

label_1 = { 189, 0, 0, 0, 116, 0, 0, 0, 165, 0, 0, 0, 3, 1, 0, 0, 95, 0, 0, 0, 144, 0, 0, 0, 122, 0, 0, 0 }
label_2 = { 64, 0, 0, 0, 71, 0, 0, 0, 91, 0, 0, 0, 115, 0, 0, 0, 126, 0, 0, 0, 105, 0, 0, 0, 24, 0, 0, 0 }

It seems that Caffe reads the input labels byte by byte, so 259 is read as 3, 1, 0, 0: on a little-endian machine, the int 259 is laid out in memory as the bytes 3, 1, 0, 0.
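
A quick check of this byte layout in Python:

import struct

# pack 259 as a little-endian 32-bit int, then read the four raw bytes back one by one
print(struct.unpack('4B', struct.pack('<i', 259)))   # (3, 1, 0, 0)
print(struct.unpack('4B', struct.pack('<i', 189)))   # (189, 0, 0, 0)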

Previously, my labels were stored to LMDB in this way:

int datum_size = sizeof(int) * 28;           // 28 labels, 4 bytes each
data_file.read(str_buffer, datum_size);      // raw int bytes go straight into the buffer
...
datum.set_data(str_buffer, datum_size);      // stored as a byte string, not as 28 numeric values
datum.SerializeToString(&value);
...
mdb_data.mv_data = reinterpret_cast<void*>(&value[0]);
mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0);

Caffe uses a C++ template (template <typename Dtype>). How can I specify the Dtype to be int?
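
One possible workaround, sketched below under the assumption that the labels stay in LMDB: write them into the Datum's float_data field instead of the raw data bytes, so the data layer decodes them as floats (the paths and label file here are placeholders, not the poster's actual setup):

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

labels = np.loadtxt('labels.txt', dtype=np.float32).reshape(-1, 28)   # hypothetical label source, 28 values per sample

env = lmdb.open('examples/lsp/lsp_train_labels_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for i, vec in enumerate(labels):
        datum = caffe_pb2.Datum()
        datum.channels, datum.height, datum.width = 28, 1, 1
        datum.float_data.extend(float(v) for v in vec)   # stored as floats, not packed into bytes
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()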

mender05 commented Nov 24, 2014

I have corrected the labels, changed the input type to float and randomized the training samples, but the problem is still there.
figure_1
One period == 2400 iterations, so a period processes 2400*30 = 72000 images. There are 22000 training images, so one period corresponds to 72000/22000 ≈ 3.3 epochs.

sguada commented Nov 24, 2014

When you shuffled the training data, did you make sure the labels stayed aligned?

Can you increase the batch size? Also try to increase the dropout.

Sergio

mender05 commented Nov 25, 2014

Thank you @sguada.

  1. The labels and images are shuffled synchronously.
  2. The limited VRAM restricts the batch size. 30 is the maximum size for me.
  3. I'll try to increase the dropout.
sguada commented Nov 25, 2014

@mender05 you could also try https://github.com/shelhamer/caffe/tree/accum-grad to allow a bigger effective batch size by accumulating the gradients over several iterations before updating the weights.

mender05 commented Nov 26, 2014

@sguada I have tried this branch, but what parameters should be set to enable a bigger batch size?
After batch_size was increased from 30 to 35, it ran out of memory:

F1126 16:43:02.337970  7332 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
sguada commented Nov 26, 2014

In the solver.prototxt add

iter_size: 2

That means it will run 2 iterations of batch_size: 30 before updating the weights, so effectively you would be using a batch_size of 60.

You can adjust your batch_size and iter_size to get whatever effective batch size you want.

mender05 commented Nov 28, 2014

It is so strange. After the batch_size increased from 30 to 60, the loss variation pattern changed, but it is still periodic.

figure_1

sguada commented Nov 28, 2014

There must be something weird with your data; the loss decreases very quickly and then oscillates periodically. Could you shuffle your data again?

Sergio

mender05 commented Nov 28, 2014

I used the snapshot at the 2000th iteration to predict, and the outputs are all the same:

array([[ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],
       [ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],
       [ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],

Outputs of the final model at the 36500th iteration:

array([[ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],
       [ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],
       [ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],
StevenLOL commented Dec 23, 2014

Hi @mender05, do you mind showing some code for how you make predictions on the test data and get the array in your last post?

mollahosseini commented Jan 6, 2015

Hi,
I have the same problem. I am using regression for video processing, so I used 9 consecutive frames as the input to the network.
I changed convert_imageset.cpp to store 9 frames in each blob, and I read the data in train_val.prototxt as:

name: "CaffeNet"
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/project/train_lmdb"
    backend: LMDB   
    batch_size: 256
  }
  transform_param {
    crop_size: 227
    mean_file: "examples/project//train_mean.binaryproto"
    mirror: true
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/project//val_lmdb"
    backend: LMDB   
    batch_size: 50
  }
  transform_param {
    crop_size: 227
    mean_file: "examples/project/train_mean.binaryproto"
    mirror: false
  }
  include: { phase: TEST }
}

and changed the accuracy layer to EUCLIDEAN_LOSS in train_val.prototxt

layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

for deploying I used:

input: "data"
input_dim: 10
input_dim: 9
input_dim: 227
input_dim: 227
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}

<rest the same>
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1
  }
}

I have

base_lr: 0.001
batch_size: 256 for train
batch_size: 50 for val

The rest is the same as the ImageNet network. I have the same loss behavior as @mender05: it decreased dramatically at first and then fluctuated until the end. I have not shuffled the data, and the labels are integers from 1 to 100.
To test, I used the Matlab interface, i.e. I read 9 images, concatenate them together and use

scores = matcaffe_demo(imgFrames, 1);

Since I am cropping the images, the result is a score vector of length 10, and all entries have the same value, e.g. 71.4674, regardless of the input images. I also tried different snapshots of the network; the value changed a bit but was still the same for all crops and all images.

@mender05, did you manage to solve your problem? Do you still get the same output for all images?

@sguada, am I doing every step of the regression right? I am going to shuffle my data, but I don't know whether the problem is due to shuffling or something else!

sguada commented Jan 6, 2015

A possible explanation is that the model is not learning much; it probably got trapped in a local minimum that is similar to random weights.
Try changing the way you initialize the weights: change gaussian to xavier for the convolutional layers.

mollahosseini commented Jan 6, 2015

Thanks @sguada,
I changed the weight filler from gaussian to xavier, but it gives me a nan loss even with learning rate 0.001. I've read that people decrease the lr to overcome nan loss, but I am afraid that if I decrease the lr below 0.001 my network won't learn at all.
I will work on shuffling the data and see if it changes anything.
BTW, I have about 2700 inputs (each with 9 images). Considering 10 crops for each input, the network is only trained on about 27000 inputs. Do you think that could be the reason it gets trapped in local minima?

sguada commented Jan 6, 2015

Don't worry about decreasing the learning rate; it is relative to the magnitude of the loss, which in the case of Euclidean loss can be huge.
And yes, having only 9 images will cause overfitting problems.

All questions about usage, installation, code, and applications should be searched for and asked on the caffe-users mailing list.

@sguada sguada closed this Jan 6, 2015

OnlySang commented Jan 14, 2015

@mender05 did you ever solve this problem? I am also hitting it. I do not think it's related to hyperparameters.

wizardcsy commented Jan 16, 2015

@OnlySang I met the same problem recently while using AlexNet to train a 2-category classifier. When I used the model to test my images with the Python interface, I always got the same output. Finally, I set the num_output of fc7 to 1000, and it became normal.
I don't fully understand why, but I hope it's useful to you!

mollahosseini commented Jan 16, 2015

@OnlySang, I had the same problem. I decreased the learning rate and shuffled the data, and the problem was solved.

OnlySang commented Jan 20, 2015

@wizardcsy Binary classification using AlexNet? I feel you are making things complicated; a smaller model may fit it.

OnlySang commented Jan 20, 2015

@mollahosseini I tried what you tried, but it didn't work. Thanks for your advice.

mender05 commented Jan 22, 2015

@OnlySang I agree with you; I do not think it's related to hyperparameters either.

@wizardcsy @mollahosseini I have not solved the problem, but after I changed the training and test datasets, the problem disappeared. In my experience, it is difficult to directly regress an image to a pose vector. Besides, you may try a simpler network, which is easier to train.

mender05 commented Jan 22, 2015

@StevenLOL This is my prediction code:

###################
NUMBER = 1000
CHANNEL = 3
HEIGHT = 220
WIDTH = 220
###################
# read test image #
###################
...
# test[number,channel,height,width]
...
#############################
# predict using caffe model #
#############################
# make sure that caffe is on the python path
CAFFE_ROOT = '/home/mender/caffe-master/'
import sys
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
import numpy as np

# set path to test model file and trained model
MODEL_FILE = './deeppose_test.prototxt'
TRAINED_MODEL = './caffenet_train_iter_36500.caffemodel'

net = caffe.Net(MODEL_FILE, TRAINED_MODEL)
#net.set_phase_test()
data4D = np.ones([1,CHANNEL,HEIGHT,WIDTH])
data4DL = np.zeros([1,14,1,1])
pred_normal = np.zeros([NUMBER,14])
n = 0
for n in range(0, NUMBER):
  data4D[0] = test[n]
  data4DL[0][0][0][0] = n
  net.set_input_arrays(\
      data4D.astype(np.float32), data4DL.astype(np.float32))
  pred = net.forward()
  for c in range(0,14):
    pred_normal[n][c] = pred['fc8'][0][c][0][0]
np.save('prediction_36500it.npy', pred_normal)
OnlySang commented Jan 24, 2015

@mender05 When I use Theano and Lasagne (which you can find on GitHub), the regression converges. The main architecture of the network is the same, as well as the training pipeline. So why do different implementations give different results?

StevenLOL commented Mar 2, 2015

Hi, @mender05 thank you for posting the code.

sjtujulian commented Jul 20, 2015

@mender05 I have the same problem, and I checked the filter weights of the middle layers. It turns out that the filter weights are all 0. Do you know why?
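
For anyone seeing the all-zero-weights symptom, a quick way to check with pycaffe whether the learned parameters have collapsed (a sketch reusing the prototxt and caffemodel names from this thread and the old two-argument Net constructor):

import numpy as np
import caffe

net = caffe.Net('deeppose_test.prototxt', 'caffenet_train_iter_36500.caffemodel')

for name, params in net.params.items():
    w = params[0].data                                   # layer weights
    b = params[1].data if len(params) > 1 else None      # bias, if the layer has one
    print('{}: mean |W| = {:.6g}, mean |b| = {}'.format(
        name, np.abs(w).mean(),
        'n/a' if b is None else np.abs(b).mean()))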

mtrth commented Sep 9, 2015

Hi, I am also getting the same problem: the filter weights are all 0.
Did you find a solution? @mender05, can you share your full train.prototxt?

JoeMWatson commented Nov 19, 2015

@mender05 did you ever find a solution? I'm having the same problem with periodicity and constant output...

wusongbeckham commented Nov 23, 2015

@mender05 could you share the code that prepares the training and test data? I also have the same problem. Thanks.

wusongbeckham commented Nov 23, 2015

@mender05 have you successfully implemented DeepPose? Could you share your data preparation code?

zeakey commented Dec 7, 2015

@sguada
What do you mean by "Don't worry about decreasing the learning rate, it is relative to the magnitude of the loss"? I find that a smaller lr leads to better convergence under some conditions, but theoretically a small lr may get stuck in a local minimum, so why not worry?

ginobilinie commented Dec 16, 2015

@mender05
Have you solved this problem? I'm doing regression with Caffe and I suffer from the same problem: no matter what the input is, the output is always the same value. The only possibility I can think of is that the weights and biases of the network are 0.

Anyone who has solved this problem, please help.

kshalini commented Dec 19, 2015

@mender05, @sguada

Did you manage to solve the problem by modifying the prototxts (train & test)? If so, could you please share them?

The net seems similar to AlexNet, but there are subtle variations, and I run into the problems mentioned earlier by others as well. Effectively stuck! Any help would be greatly appreciated. Thanks :-)

wqysq commented Dec 26, 2015

I also have this issue when I use the C++ command-line interface to predict.

JoeMWatson commented Dec 30, 2015

@ginobilinie @kshalini

I've been doing regression from images and fixed the problem by scaling the pixel values down by 255 and subtracting the dataset mean (so the pixel values are now in [-1, 1]), and also by scaling the labels down so they are in [0, 1]. I also set all my new layer weights (I was transfer learning from AlexNet) to be initialized with the 'type: xavier' parameter.

Hope this helps!
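
A minimal sketch of the preprocessing JoeMWatson describes; the 220x220 image size and pixel-coordinate joint labels are assumptions carried over from earlier in this thread, not his exact code:

import numpy as np

def preprocess(image_uint8, mean_image, joints_px, img_size=220):
    # image_uint8: HxWx3 uint8 image; mean_image: HxWx3 dataset mean already scaled to [0, 1]
    x = image_uint8.astype(np.float32) / 255.0    # pixels into [0, 1]
    x -= mean_image                               # subtract the dataset mean -> roughly [-1, 1]
    y = joints_px.astype(np.float32) / img_size   # joint coordinates into [0, 1]
    return x, y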

kshalini commented Dec 30, 2015

@JoeMWatson

Thanks for the post, but I didn't quite follow it fully. My specific questions are:
a) Do you use LMDB or HDF5 for inputs?
b) Did you use the same train_val.prototxt as mentioned by @mender05? If not, could you please share yours for reference?
c) Finally, can you also share the few lines of Python code that interpret the output labels you get from the net?

thanks

ginobilinie commented Dec 30, 2015

@JoeMWatson

Thanks. In fact, I've already scaled the labels to [0, 1] and the input image data to [-1, 1], but I still find that the predicted output value is the same. I analyzed the trained model and the test data in each blob (doing a forward pass), and I found that the bias dominates the output value, and the layer before the last layer usually goes to almost 0.

ginobilinie commented Feb 15, 2016

Hi, I have solved my problem. In my case, the problem came from the initialization of the network. I changed the weight filler from gaussian to xavier and set the bias filler to constant 0. Then the problem was solved.

Venkatesh-Murthy commented May 23, 2016

@ginobilinie Thanks, that did the trick.

hagg30 commented Jun 25, 2016

Another fix here:
I realized there was a lack of non-linearity in my model.
So I added some more FC layers and dropout with ReLU activations, and it performed better.

lood339 commented Aug 19, 2016

@ginobilinie
I have the same problem: the learned weights are zero everywhere and the output is constant. I guess the bias dominates the net. How did you solve this problem? (I already changed the weight filler to xavier and set the bias filler to constant 0.) Should I disable the bias_term?

ginobilinie commented Aug 19, 2016

@lood339 In my case, when I set the bias to 0, the training is normal... What about yours?

lood339 commented Aug 20, 2016

@ginobilinie
When I set the bias to 0, it has the same problem. Then I changed the weight_decay to a small number (0.0005), and it became normal. I think that if the weight_decay is large (like 0.5), all the weights eventually become zero in my case.
I made another modification that helps: I set the learning rate of the convolutional layers to 0, because I transfer weights from a pre-trained model, so the weights in the convolutional layers won't change during training.

ginobilinie commented Aug 20, 2016

@lood339 Good. I always set the weight_decay very small. If you just want to fine-tune some layers, you should set the learning rate of the other layers to 0.

joyousrabbit commented Jan 5, 2017

@ginobilinie But why doesn't std: 0.01 work? Why the periodic loss?

YangForever commented May 19, 2017

Hi, have you solved this problem? I also used AlexNet to compute the coordinates of the body joints and got the same results. However, when I use a simple network like LeNet, it performs better and can produce reasonable results. So I guess AlexNet is too deep, but I still do not know how the paper achieved this with AlexNet.

Manchery commented Jul 8, 2018

I got the same problem while I was implementing DeepPose.
Then I read every comment and found them all really helpful.
I changed my model and deploy file many times according to this advice, and finally the problem disappeared.
Because of so many changes, I can't say which particular change was the key that made things work.
But I really suggest that those of you who still suffer from this problem try these suggestions, whether or not each one works for you.
Thank you all!
