Training error with my own datasets with RuntimeWarning, help? #22
Lower the learning rate.
@hgaiser Thanks! Besides,
Why do you want to train from scratch? My advice is to actually use the pretrained model.
@hgaiser Hi, when I change the input data size to 1x3x1200x1200, I get this error and the loss becomes NaN. Why does this happen, and could lowering the learning rate help?
Not sure what you mean by that input size. Images are by default resized such that the shortest axis is 600px, meaning the input would for example be 600xHx3. But if you are seeing the same error as above, chances are lowering the learning rate helps.
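The resize rule described above can be sketched as follows, which shows why a 1200x1200 image does not reach the network at that size. The 600 px target comes from the comment; the 1000 px cap on the longer side and the rounding step are assumptions based on common Faster R-CNN-style defaults, so check the values in your own config.

```python
# Minimal sketch of "shortest side = 600px" preprocessing (not MNC's own code).
TARGET_SIZE = 600  # target length of the shorter side (from the comment)
MAX_SIZE = 1000    # ASSUMED cap on the longer side; check your config


def resize_scale(height, width, target_size=TARGET_SIZE, max_size=MAX_SIZE):
    """Return the scale factor applied to an image of the given size."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    # Shrink further if the longer side would exceed max_size.
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def resized_shape(height, width, **kwargs):
    """Return the (height, width) the network actually sees, rounded to pixels."""
    scale = resize_scale(height, width, **kwargs)
    return int(round(height * scale)), int(round(width * scale))


if __name__ == "__main__":
    print(resized_shape(1200, 1200))  # (600, 600): a 1200x1200 image is halved
    print(resized_shape(375, 500))    # (600, 800)
    print(resized_shape(400, 1200))   # (333, 1000): long side hits the cap
```

Under these assumptions, changing the raw image size mostly changes the scale factor, not the size of the blob fed to the network.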
@oh233 Hi, I trained on my own data once before without problems, but now, training again without any changes, I get this: /data2/qinhaifang/MNC/tools/../lib/transform/bbox_transform.py:86: RuntimeWarning: overflow encountered in exp
@hgaiser Thank you! But I used to be able to train the model normally; now I get this error without having changed anything.
Has anyone seen this error? Thanks. /mnt/sda/MNC/tools/../lib/transform/bbox_transform.py:129: RuntimeWarning: invalid value encountered in greater_equal
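The warning at bbox_transform.py:129 comes from a minimum-size filter on box widths and heights. An "invalid value" there suggests that the widths or heights already contain NaN, i.e. the box coordinates were corrupted earlier (for example by the `exp` overflow reported above). The sketch below is a hypothetical reconstruction of such a filter; the `+ 1` pixel convention is an assumption borrowed from Faster R-CNN-style code.

```python
import numpy as np


def filter_small_boxes(boxes, min_size):
    """Return indices of boxes whose width and height are both >= min_size.

    boxes is an (N, 4) array of [x1, y1, x2, y2]. The "+ 1" pixel convention
    is an ASSUMPTION borrowed from Faster R-CNN-style code.
    """
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    # Every comparison with NaN is False, so NaN boxes are dropped silently;
    # some NumPy versions warn "invalid value encountered in greater_equal".
    with np.errstate(invalid="ignore"):
        keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    return keep


boxes = np.array([
    [0.0, 0.0, 15.0, 15.0],    # 16 x 16: kept
    [0.0, 0.0, np.nan, 10.0],  # NaN width: dropped
    [0.0, 0.0, 3.0, 3.0],      # 4 x 4: too small, dropped
])
print(filter_small_boxes(boxes, 16))  # [0]

# If the regression has diverged for every box, nothing survives the filter.
print(filter_small_boxes(np.full((3, 4), np.nan), 16).size)  # 0
```

If every proposal is dropped this way, later layers may receive empty inputs, so the warning is worth treating as a symptom of divergence rather than noise.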
I don't think I have seen that error before, but you could try deleting the cache files (in
Lowering the learning rate can solve it. I will dig into the Python code if the issue happens again.
I got the same error as above with learning rates of 0.001 and 0.0001. However, training works after lowering the learning rate to 0.00001. From reading the code, it seems that the bounding-box regression values diverge and cause the overflow in backprop.
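The divergence described above can be illustrated with the width/height decoding step used in Faster R-CNN-style box regression, where a predicted log-scale delta `dw` is applied as `exp(dw) * width`; the `np.exp` at bbox_transform.py:86 is presumably this step. `np.exp` overflows to `inf` for arguments above about 709.8 in float64 (about 88.7 in float32, Caffe's blob type), and the `inf` later becomes NaN. The clamp shown is an assumption: some later detector codebases cap `dw`/`dh` at `log(1000 / 16)` before `exp`, which is not confirmed to exist in MNC and only guards the symptom, whereas lowering the learning rate addresses the divergence itself.

```python
import numpy as np

# ASSUMED clamp value, borrowed from later detector codebases, not from MNC.
BBOX_XFORM_CLIP = np.log(1000.0 / 16.0)


def decode_wh(widths, heights, dw, dh, clip=None):
    """Apply predicted log-scale deltas to box widths and heights.

    Without clipping, a diverged delta (e.g. dw = 800) makes np.exp overflow
    to inf, which later turns into NaN in losses and gradients.
    """
    if clip is not None:
        dw = np.minimum(dw, clip)
        dh = np.minimum(dh, clip)
    return np.exp(dw) * widths, np.exp(dh) * heights


widths = np.array([50.0, 50.0])
heights = np.array([50.0, 50.0])
dw = np.array([0.1, 800.0])  # second delta has "diverged"
dh = np.array([0.0, 800.0])

with np.errstate(over="ignore"):
    w_raw, _ = decode_wh(widths, heights, dw, dh)
print(np.isinf(w_raw))  # [False  True]

w_clip, _ = decode_wh(widths, heights, dw, dh, clip=BBOX_XFORM_CLIP)
print(np.isfinite(w_clip).all())  # True
```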
Thanks for the info. Is lr 0.00001 too small for fine-tuning a new dataset?
(Reply by email to souryuu's comment above, posted Wed, Jan 25, 2017 at 3:21 PM.)
Adjusting the learning rate affected the fine-tuning results. Following the instructions for end-to-end training of the 5-stage MNC model on VOC 2012 with an adjusted learning rate (0.0001), I got mAP@0.5 = 36.16 and mAP@0.7 = 13.08.
Yeah, end-to-end 5-stage training runs fine for me on VOC 2012 as well. Did you try your own dataset?
(Reply by email to souryuu's comment above, posted Mon, Jan 30, 2017 at 12:01 PM.)
@souryuu Hi, I am using mnc_5stage.sh to train the model with a learning rate of 0.00001, and I am wondering how many iterations I need. Did you use lr=0.0001 with 25000 or 250000 iterations?
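One way to compare iteration counts across datasets of different sizes is to convert them to epochs (passes over the training set). The sketch below assumes one image per iteration by default; check the images-per-batch setting in your own config, and note that the 5000-image dataset size in the example is purely hypothetical. A smaller learning rate often needs more iterations to converge, so the epoch count is a starting point for comparison, not a recommendation.

```python
def epochs_covered(iters, num_images, ims_per_batch=1):
    """Return how many passes over the training set `iters` iterations make.

    ims_per_batch=1 is an ASSUMPTION; use the value from your config.
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return iters * ims_per_batch / float(num_images)


# Hypothetical dataset of 5000 training images.
print(epochs_covered(25000, 5000))   # 5.0
print(epochs_covered(250000, 5000))  # 50.0
```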
@brisker Did you change the number of classes?
Works great with 0.00001.
The error looks like this:
I1019 20:25:01.929584 24937 solver.cpp:245] Train net output #12: seg_cls_loss = 3.18091 (* 1 = 3.18091 loss)
I1019 20:25:01.929592 24937 solver.cpp:245] Train net output #13: seg_cls_loss_ext = 3.28305 (* 1 = 3.28305 loss)
I1019 20:25:01.929605 24937 sgd_solver.cpp:106] Iteration 0, lr = 0.001
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:106: RuntimeWarning: overflow encountered in exp
bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:106: RuntimeWarning: invalid value encountered in float_scalars
bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: overflow encountered in exp
bottom[0].diff[i, 4] = dfdh[ind] * (delta_y + np.exp(delta_h))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: invalid value encountered in float_scalars
bottom[0].diff[i, 4] = dfdh[ind] * (delta_y + np.exp(delta_h))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/proposal_layer.py:183: RuntimeWarning: invalid value encountered in greater
top_non_zero_ind = np.unique(np.where(abs(top[0].diff[:, :]) > 0)[0])
/home/sjtu/code/MNC-master/tools/../lib/transform/bbox_transform.py:129: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
./experiments/scripts/mnc_5stage.sh: line 35: 24937 Floating point exception (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}
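The warnings in this log appear in a typical order: `exp` overflows to `inf`, then arithmetic on `inf` produces NaN ("invalid value"), then NaN comparisons fail downstream, and the process finally crashes. The sketch below reproduces that chain with a hypothetical stand-in, `bridge_grad`, that copies only the arithmetic of the logged line stage_bridge_layer.py:106. It then shows how NumPy's `errstate(over="raise")` turns the first overflow into a `FloatingPointError`, so the traceback points at the call that diverged. Treating `0 * inf` as the source of the NaN is one possible explanation, not a confirmed diagnosis.

```python
import numpy as np


def bridge_grad(dfdw, delta_x, delta_w):
    # HYPOTHETICAL stand-in with the same arithmetic as the logged line
    # stage_bridge_layer.py:106: dfdw * (delta_x + exp(delta_w)).
    return dfdw * (delta_x + np.exp(delta_w))


# 1) Reproduce the chain silently: exp(1000) overflows to inf, and 0 * inf
#    is NaN, analogous to "overflow encountered in exp" followed by an
#    "invalid value" warning.
with np.errstate(over="ignore", invalid="ignore"):
    grad = bridge_grad(np.float64(0.0), np.float64(1.0), np.float64(1000.0))
print(np.isnan(grad))  # True


# 2) Raise on the first overflow instead of warning, so the error surfaces
#    at the exact call that diverged rather than at a later NaN or crash.
def overflow_message(delta_w):
    try:
        with np.errstate(over="raise"):
            bridge_grad(np.float64(1.0), np.float64(0.0), np.float64(delta_w))
    except FloatingPointError as exc:
        return str(exc)
    return None


print(overflow_message(1.0))  # None
print(overflow_message(1000.0) is not None)  # True
```

To apply the same idea during training, one option is to call `np.seterr(over="raise", invalid="raise")` early in the training script, so the Python layers raise at the first bad value; be aware that this may also surface warnings that were previously harmless.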