It seems there is a bug when using fluid.memory_optimize with the latest code. #10204

Closed
qingqing01 opened this issue Apr 25, 2018 · 0 comments · Fixed by #10645

Correct version: e84d3a7

SE-ResNeXt with fluid.memory_optimize, batch size=256:

Pass 0, trainbatch 3760, loss 4.38969898224,                        acc1 0.19140625, acc5 0.37109375 time 0.75 sec
Pass 0, trainbatch 3770, loss 4.156873703,                        acc1 0.16796875, acc5 0.37890625 time 0.80 sec
Pass 0, trainbatch 3780, loss 4.25750732422,                        acc1 0.171875, acc5 0.38671875 time 0.70 sec
Pass 0, trainbatch 3790, loss 4.02043914795,                        acc1 0.19921875, acc5 0.41796875 time 0.73 sec
Pass 0, trainbatch 3800, loss 4.29896831512,                        acc1 0.15234375, acc5 0.3671875 time 0.75 sec
Pass 0, trainbatch 3810, loss 4.40069770813,                        acc1 0.1640625, acc5 0.328125 time 0.67 sec
Pass 0, trainbatch 3820, loss 4.31035327911,                        acc1 0.19921875, acc5 0.390625 time 0.71 sec
Pass 0, trainbatch 3830, loss 4.04942512512,                        acc1 0.18359375, acc5 0.40234375 time 0.68 sec
...
End pass 0, train_loss 5.02297019958, train_acc1 0.107018910348, train_acc5 0.250506520271,                test_loss 3.76529192924, test_acc1 0.227647155523, test_acc5 0.464091479778

Incorrect version: more recent code; the exact commit that produced the log below can no longer be found.

  • SE-ResNeXt with fluid.memory_optimize, batch size=256:
End pass 0, train_loss 6.45454835892, train_acc1 0.0108051998541, train_acc5 0.0409371778369,                test_loss 6.39663696289, test_acc1 0.028530869633, test_acc5 0.0967890247703
  • SE-ResNeXt [without fluid.memory_optimize], batch size=128, same code as above; the results look normal:
Pass 0, trainbatch 3510, loss 5.32282447815,                        acc1 0.0546875, acc5 0.203125 time 0.51 sec
Pass 0, trainbatch 3520, loss 5.63942909241,                        acc1 0.0390625, acc5 0.15625 time 0.52 sec
Pass 0, trainbatch 3530, loss 5.09431886673,                        acc1 0.0625, acc5 0.1875 time 0.52 sec
Pass 0, trainbatch 3540, loss 4.83160495758,                        acc1 0.125, acc5 0.265625 time 0.53 sec
Pass 0, trainbatch 3550, loss 5.54707527161,                        acc1 0.0234375, acc5 0.1171875 time 0.51 sec
Pass 0, trainbatch 3560, loss 5.20001316071,                        acc1 0.0859375, acc5 0.171875 time 0.52 sec
Pass 0, trainbatch 3570, loss 5.47606658936,                        acc1 0.0625, acc5 0.1484375 time 0.52 sec
Pass 0, trainbatch 3580, loss 5.00409889221,                        acc1 0.1015625, acc5 0.265625 time 0.51 sec
Pass 0, trainbatch 3590, loss 5.05407619476,                        acc1 0.078125, acc5 0.2265625 time 0.51 sec
Pass 0, trainbatch 3600, loss 5.28152179718,                        acc1 0.0625, acc5 0.125 time 0.52 sec
Pass 0, trainbatch 3610, loss 5.38609313965,                        acc1 0.0546875, acc5 0.234375 time 0.52 sec
Pass 0, trainbatch 3620, loss 5.12094688416,                        acc1 0.0859375, acc5 0.203125 time 0.51 sec
Pass 0, trainbatch 3630, loss 4.9435043335,                        acc1 0.1171875, acc5 0.21875 time 0.54 sec
Pass 0, trainbatch 3640, loss 5.47095394135,                        acc1 0.0859375, acc5 0.171875 time 0.53 sec
Pass 0, trainbatch 3650, loss 5.48800849915,                        acc1 0.0234375, acc5 0.171875 time 0.52 sec
Pass 0, trainbatch 3660, loss 5.32852506638,                        acc1 0.09375, acc5 0.21875 time 0.51 sec
Pass 0, trainbatch 3670, loss 5.19394111633,                        acc1 0.078125, acc5 0.203125 time 0.50 sec
Pass 0, trainbatch 3680, loss 5.25023794174,                        acc1 0.0625, acc5 0.1953125 time 0.51 sec
Pass 0, trainbatch 3690, loss 5.2334280014,                        acc1 0.0546875, acc5 0.203125 time 0.51 sec
Pass 0, trainbatch 3700, loss 5.25305175781,                        acc1 0.0703125, acc5 0.1796875 time 0.51 sec

Incorrect version: the latest code, 2f53cd0

SE-ResNeXt with fluid.memory_optimize, batch size=256. The first pass has not finished training yet, so only part of the log is pasted:

Pass 0, trainbatch 3760, loss 6.22944545746,                        acc1 0.00390625, acc5 0.04296875 time 1.10 sec
Pass 0, trainbatch 3770, loss 6.27605676651,                        acc1 0.0234375, acc5 0.0546875 time 1.11 sec
Pass 0, trainbatch 3780, loss 6.26356983185,                        acc1 0.02734375, acc5 0.05078125 time 1.10 sec
Pass 0, trainbatch 3790, loss 6.11884117126,                        acc1 0.0078125, acc5 0.05859375 time 1.10 sec
Pass 0, trainbatch 3800, loss 6.31590509415,                        acc1 0.0078125, acc5 0.0390625 time 1.11 sec
Pass 0, trainbatch 3810, loss 6.23099756241,                        acc1 0.0078125, acc5 0.0546875 time 1.12 sec
Pass 0, trainbatch 3820, loss 6.21815776825,                        acc1 0.01171875, acc5 0.05859375 time 1.11 sec
Pass 0, trainbatch 3830, loss 6.22207260132,                        acc1 0.015625, acc5 0.06640625 time 1.10 sec
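
To compare runs like the ones above without eyeballing the logs, the per-batch lines can be parsed and averaged. The following is only a minimal sketch: the helper names `parse_log` and `summarize` and the constants `GOOD` and `BAD` are hypothetical, not part of PaddlePaddle, and the regex assumes the exact log format pasted in this issue. The sample lines are the first two batches copied from the e84d3a7 and 2f53cd0 logs.

```python
import re
from statistics import mean

# Matches per-batch lines in the format pasted in this issue, e.g.
# "Pass 0, trainbatch 3760, loss 4.38969898224,   acc1 0.19140625, acc5 0.37109375 time 0.75 sec"
# "End pass ..." summary lines are intentionally not matched.
LINE_RE = re.compile(
    r"Pass\s+(\d+),\s+trainbatch\s+(\d+),\s+loss\s+([\d.]+),\s+"
    r"acc1\s+([\d.]+),\s+acc5\s+([\d.]+)\s+time\s+([\d.]+)\s+sec"
)


def parse_log(text):
    """Return one dict per per-batch training line found in `text`."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append({
                "pass": int(m.group(1)),
                "batch": int(m.group(2)),
                "loss": float(m.group(3)),
                "acc1": float(m.group(4)),
                "acc5": float(m.group(5)),
                "time": float(m.group(6)),
            })
    return rows


def summarize(rows):
    """Average loss, accuracy, and per-batch time over the parsed rows."""
    return {k: mean(r[k] for r in rows) for k in ("loss", "acc1", "acc5", "time")}


# First two batches from the e84d3a7 log (with memory_optimize, batch size=256).
GOOD = """\
Pass 0, trainbatch 3760, loss 4.38969898224,                        acc1 0.19140625, acc5 0.37109375 time 0.75 sec
Pass 0, trainbatch 3770, loss 4.156873703,                        acc1 0.16796875, acc5 0.37890625 time 0.80 sec
"""

# Same batches from the 2f53cd0 log (with memory_optimize, batch size=256).
BAD = """\
Pass 0, trainbatch 3760, loss 6.22944545746,                        acc1 0.00390625, acc5 0.04296875 time 1.10 sec
Pass 0, trainbatch 3770, loss 6.27605676651,                        acc1 0.0234375, acc5 0.0546875 time 1.11 sec
"""

if __name__ == "__main__":
    print("e84d3a7:", summarize(parse_log(GOOD)))
    print("2f53cd0:", summarize(parse_log(BAD)))
```

Feeding the full logs to `parse_log` instead of the two-line samples gives averages over the same batch range for each commit, which makes the regression in loss, accuracy, and per-batch time easier to see at a glance.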