
problem in training #885

Closed
achao2013 opened this issue Dec 10, 2015 · 7 comments

Comments

@achao2013
Contributor

When I train a network, the accuracy keeps falling from the beginning of each epoch. Why?

@gujunli

gujunli commented Dec 10, 2015

Is it accuracy or loss? Loss goes down.
Thanks.
junli


@achao2013
Contributor Author

The full name is "Train-accuracy"; I don't know how to display the loss yet.
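
For reference, a minimal sketch of one way to report a cross-entropy-style training metric with the old FeedForward API. The iterator names train_iter/val_iter are stand-ins for your setup, and the availability of the mx.metric.np wrapper in your MXNet version is an assumption, not something confirmed in this thread.

import numpy as np
import mxnet as mx

# Hypothetical numpy metric: mean cross-entropy of softmax outputs.
# label: (batch,) integer class ids; pred: (batch, num_classes) probabilities.
def cross_entropy(label, pred):
    prob = pred[np.arange(label.shape[0]), label.astype('int32')]
    return -np.log(np.maximum(prob, 1e-10)).mean()

# Wrap the numpy function as an eval metric and log it during training.
model.fit(X=train_iter, eval_data=val_iter,
          eval_metric=mx.metric.np(cross_entropy),
          batch_end_callback=mx.callback.Speedometer(batch_size=32, frequent=50))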

@tqchen
Member

tqchen commented Dec 11, 2015

Please be a bit more specific, e.g. what configuration you are using; that would give others more context. Usually accuracy goes down when you have bad initialization or too large a learning rate.
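
As a minimal sketch of adjusting those two knobs with the FeedForward API (assuming softmax is the network symbol; the specific values below are illustrative assumptions, not recommendations from this thread):

import mxnet as mx

model = mx.model.FeedForward(
    ctx=mx.gpu(),
    symbol=softmax,
    num_epoch=20,
    learning_rate=0.005,   # try a smaller learning rate first
    momentum=0.9,
    wd=0.0001,
    initializer=mx.init.Xavier(factor_type="in", magnitude=2.34))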

@tqchen
Member

tqchen commented Dec 15, 2015

Closing due to inactivity; please feel free to reopen.

@tqchen tqchen closed this as completed Dec 15, 2015
@achao2013
Contributor Author

@tqchen, I use the demo from cifar-100.ipynb for a classification task with 146 classes. I don't modify any configuration except the input data (batch size = 32 due to memory limitations). The accuracy keeps falling from 90% to 54% and then begins to oscillate weakly around 55%.

Moreover, when I set up another net (the 34-layer ResNet from MSRA), the same problem happens. The data is ILSVRC2012, and the accuracy decreases from 19% to 1% and is still decreasing. I have tried many configurations and the result is the same. The current params are as follows:
import mxnet as mx

model_args = {}
model_args['clip_gradient'] = 5
model_args['lr_scheduler'] = mx.lr_scheduler.FactorScheduler(step=50000, factor=0.5)
model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
                             learning_rate=0.01, momentum=0.9, wd=0.0001,
                             initializer=mx.init.Xavier(rnd_type='gaussian',
                                                        factor_type="in", magnitude=2.34),
                             arg_params=model_args)
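
For comparison, a hedged sketch of how these extra optimizer options are usually passed in the old FeedForward API: clip_gradient and lr_scheduler go in as keyword arguments (expanded from the dict), while arg_params is meant for pre-trained weight arrays. Whether that difference matters here is an assumption, not something confirmed in this thread.

# Sketch only: same settings as above, but the optimizer options are expanded
# with **model_args instead of being handed to arg_params.
model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
                             learning_rate=0.01, momentum=0.9, wd=0.0001,
                             initializer=mx.init.Xavier(rnd_type='gaussian',
                                                        factor_type="in", magnitude=2.34),
                             **model_args)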

@tqchen
Member

tqchen commented Dec 16, 2015

If you use a smaller batch size, it is likely you need to re-tune your parameters with a smaller learning rate.
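
A rough rule of thumb (the reference values are assumptions for illustration, not numbers from this thread) is to scale the learning rate roughly in proportion to the batch size:

# Hypothetical reference point: lr 0.05 tuned for batch size 128.
base_lr, base_batch = 0.05, 128
batch_size = 32
learning_rate = base_lr * batch_size / base_batch   # -> 0.0125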

@achao2013
Contributor Author

@tqchen I tried several learning rates. The speed of the decrease slows slightly and the train accuracy shifts up a little, but there is still a general downward trend. I'm a new user and haven't found the code that computes the train accuracy yet, but I conjecture that the train accuracy includes more and more samples as the batch number increases. I don't know whether this behavior is correct or not, because I have encountered it in Caffe when the data labels were wrong (they are correct here).
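
That conjecture matches how a running accuracy metric typically behaves: it accumulates correct/total counts over every batch since the last reset, so the printed "Train-accuracy" is a running average over the whole epoch rather than the accuracy of the latest batch. A minimal sketch of that general pattern (an illustration, not MXNet's actual implementation):

import numpy as np

class RunningAccuracy(object):
    """Accumulates accuracy over all batches since the last reset."""
    def __init__(self):
        self.reset()
    def reset(self):
        self.correct, self.total = 0, 0
    def update(self, labels, preds):
        # labels: (batch,) integer class ids; preds: (batch, num_classes) scores
        self.correct += int((preds.argmax(axis=1) == labels).sum())
        self.total += labels.shape[0]
    def get(self):
        return self.correct / float(self.total)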
