Training and Fine-Tuning LightCNN #110
Comments
The configurations of the solver and train_val are right for training Light CNN, and they are also suitable for fine-tuning on your own dataset.
As described in your paper, "The learning rate is set to 1e-3 initially and reduced to 5e-5 gradually". Could you please tell me the specific parameters to achieve this in Caffe, such as lr_policy, gamma, stepsize, max_iter, etc.? Thanks.
@jiangxuehan I believe there isn't a single right answer to your question; the learning rate decay depends on your training database. In general, we reduce the learning rate when the training cost stops decreasing after some iterations, so the "best" way is to watch your training cost and reduce the learning rate after X iterations. In Caffe you can specify the steps at which to decrease the learning rate in the solver, for example:
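A multistep solver along these lines is a sketch only: the stepvalues and gamma below are illustrative, chosen so that 1e-3 decays to roughly 5e-5 over 5M iterations, not the authors' exact values:

```
net: "train_val.prototxt"
base_lr: 0.001            # 1e-3 initially, as in the paper
lr_policy: "multistep"    # multiply the LR by gamma at each stepvalue
gamma: 0.473              # 0.001 * 0.473^4 is roughly 5e-5 after the last step
stepvalue: 1000000
stepvalue: 2000000
stepvalue: 3000000
stepvalue: 4000000
max_iter: 5000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 100000
snapshot_prefix: "lightcnn"
solver_mode: GPU
```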
The multistep policy multiplies the learning rate by gamma at each stepvalue (which you can choose by watching the training cost).
If you won't run all 5,000,000 iterations, just adjust the values accordingly.
@AlfredXiangWu I'm training on a Tesla K40 and at iteration 1100 my training loss is 11.3229. It's taking very long; is this normal? I normalized 5M images of MS-Celeb (clean list) using the paper's specification and used the solver from this issue.
@TheusStremens I think that is normal when training the Light CNN. @jiangxuehan You can follow the configuration @TheusStremens mentioned; it is similar to mine.
@TheusStremens @AlfredXiangWu Thanks for your replies; I will follow a similar configuration to train this model. BTW, the loss of Light CNN drops slowly in the first several thousand iterations, @TheusStremens.
Hi guys, after 7 days of training the cost has barely moved, and it's only at iteration 20K. At this rate it will reach iteration 100K in 5 weeks and iteration 1M (1/5 of the max iteration count) in a year. @AlfredXiangWu is this normal? How long did your training take? Can you tell me the number of iterations at the end of your training?
@TheusStremens Do you mean you trained the Light CNN for about a week and reached only 20k iterations? That is abnormal. I set the max iteration to 4,000,000 and it takes about 1 week on a Titan X.
I removed iter_size: 60 from the solver and the speed went up. But now I have a convergence problem like #36: my loss is 87.3365 at the beginning. Changing the batch_size to 80 apparently resolved the convergence problem, but the speed is still abnormal (1/4 of your speed). Did you use iter_size in your training, @AlfredXiangWu? I'll try different batch_size settings.
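For anyone hitting the same slowdown: iter_size in the Caffe solver accumulates gradients over that many mini-batches before each weight update, so the effective batch size is batch_size times iter_size, and every reported iteration runs iter_size forward/backward passes. A sketch of the trade-off (values illustrative):

```
# solver.prototxt: effective batch = batch_size (from train_val) * iter_size.
# iter_size: 60 with batch_size: 8 gives an effective batch of 480, but each
# reported iteration is ~60x slower in wall-clock time.
iter_size: 1
# Prefer raising batch_size in the data layer of train_val.prototxt
# (e.g. batch_size: 80) if GPU memory allows.
```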
The convergence problem hasn't gone away; it just happened again at iteration 8980. I'm applying the normalization correctly, with the same base_lr and the same architecture, so I can't figure out what is causing it.
The solver I used for training is above. Clipping gradients may help solve your problem. If not, I think you can fine-tune the Light CNN on your own dataset from the pre-trained model.
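In Caffe, gradient clipping is a single line in the solver (the threshold below is illustrative):

```
# solver.prototxt: rescale any gradient whose L2 norm exceeds the threshold
clip_gradients: 10
```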
@AlfredXiangWu @TheusStremens I tried to lower my learning rate |
@xionglei181818 Did you use the clean list of MS-Celeb-1M? Why did you use only 390,000 pictures when MS-Celeb-1M has 5M+? What learning rate did you use? In my case, shuffling the training data solved the convergence problem. After 700K iterations the loss dropped to 3. Now I'm at 1.8M iterations, with loss = 1 and acc = 89%.
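If you load images with an ImageData layer, shuffling is one flag in the TRAIN data layer (a sketch, assuming that layer type and a hypothetical list file):

```
image_data_param {
  source: "train_list.txt"  # hypothetical image/label list
  batch_size: 80
  shuffle: true             # reshuffle the list every epoch
}
```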
@TheusStremens I used the clean list of MS-Celeb-1M and took 50 images from each category, so after screening I got 61,332 categories, about 390,000 images. 1. I used the learning rate provided by @AlfredXiangWu. 2. I also tried another set of parameters. Under both sets, after running 200,000 iterations the loss stayed around 11.0. Reducing the learning rate to 0.0001 gave the same result. Have you observed this phenomenon? Thank you.
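As a sanity check: a loss pinned near 11.0 with 61,332 classes is exactly chance level. A softmax that predicts uniformly over C classes gives a cross-entropy of ln(C), and ln(61332) is about 11.02 (likewise, the 11.2 reported below with 79,056 classes matches ln(79056), about 11.28). A flat loss around that value means the network hasn't started separating classes at all, which points to the learning rate, initialization, or data shuffling rather than insufficient training time:

```python
import math

# Cross-entropy of a uniform softmax over C classes is ln(C);
# a training loss stuck at this value means chance-level predictions.
for c in (61332, 79056):
    print(c, round(math.log(c), 2))  # 61332 -> 11.02, 79056 -> 11.28
```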
@xionglei181818 |
I trained on MS-Celeb-1M with the solver config provided by @AlfredXiangWu. It took 9 days on a Titan X for 3,500,000 iterations. The performance of my model on LFW is not as good as model C.
@AlfredXiangWu @TheusStremens @lyuchuny3 Can you share your train_test.prototxt and your solver.prototxt? I'm training Light CNN with the clean list; after screening I got 79,056 categories, about 4,920,000 images. But after running 400,000 iterations the loss is still 11.2. Can you give me a hand?
How many iterations (and what batch size) are needed to achieve the results of model B trained on the CASIA-WebFace dataset?
@ctgushiwei
your_net_train_val.prototxt:
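A minimal sketch of the part that matters, assuming a hypothetical 10,000-identity dataset (the bottom blob name should match the MFM output of fc1 in your own train_val):

```
# fc2 is the classification layer; its num_output must equal the number
# of identities in your own training list.
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "eltwise_fc1"   # assumed name; use the MFM output of fc1 in your net
  top: "fc2"
  param { lr_mult: 10 decay_mult: 1 }   # illustrative: learn the new
  param { lr_mult: 20 decay_mult: 0 }   # classifier faster than the trunk
  inner_product_param {
    num_output: 10000     # hypothetical identity count -- change to yours
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
```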
remember to change the num_output value in fc2 |
@TheusStremens Firstly, thank you very much for your answer! Secondly, did you test your model on LFW, and does the accuracy reach 98%?
@TheusStremens Hello, when fine-tuning the LightCNN I got the error "Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 96 1 5 5 (2400); target param shape is 96 3 5 5 (7200). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer". Could you please help me?
@honghuCode Check whether you are loading RGB images; LightCNN works with grayscale images.
@TheusStremens I used the following code to convert the image to grayscale and resize it to 128x128:

```python
mat = cv2.resize(mat, (128, 128))
im_gray = cv2.cvtColor(mat, cv2.COLOR_BGR2GRAY)
```

The following is my train_test_bak.prototxt (the layer definitions were garbled in the original post and are omitted here).
@honghuCode Add is_color: false in the data layer. Caffe loads images as 3 channels, even if they are grayscale on disk, unless you set this parameter to false.
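A sketch of a TRAIN data layer with the flag set (file names and sizes illustrative):

```
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  image_data_param {
    source: "train_list.txt"  # hypothetical image/label list
    batch_size: 80
    is_color: false           # load as single-channel grayscale
    new_height: 128
    new_width: 128
  }
}
```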
@TheusStremens Thank you very much, you solved my problem.
First, congratulations and thank you for your work. It's very exciting to see that it's possible to make a light CNN without millions (or billions) of parameters and achieve state-of-the-art accuracy.
I intend to do two experiments (varying the type of activations, cost functions, solver types, number of neurons, ...) using the model C architecture: one training a new CNN on my database, and another fine-tuning model C on my database. I made the following solver.prototxt and train_val.prototxt:
Could you tell me if this solver and train_val are similar to those you used for the final training of model C?
And for fine-tuning, can I use the same solver used in training and just freeze layers in the train_val, or is another solver necessary for fine-tuning?
Thanks
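On the freezing question: a common pattern is to keep the same solver (often with a lower base_lr) and zero the lr_mult of the layers to be kept fixed in the train_val. A sketch for one conv layer (the shape values are illustrative):

```
# lr_mult: 0 freezes a layer's parameters during fine-tuning;
# repeat for every layer you want to keep fixed.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 }   # freeze the weights
  param { lr_mult: 0 }   # freeze the bias
  convolution_param {
    num_output: 96
    kernel_size: 5
    stride: 1
    pad: 2
  }
}
```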