Training inf and nan #318

Closed
IQ17 opened this issue Feb 5, 2020 · 6 comments

Comments

IQ17 commented Feb 5, 2020

Hi, I am using the latest commit (eee5ce3) on the master branch, and I tried to train on the COCO dataset with four GPUs, using PyTorch 1.3.0.

See the logs below:

root@HDFS-Slave-01:/work/code/yolact_git# python train.py --config=yolact_resnet50_config --batch_size=96
Multiple GPUs detected! Turning off JIT.
Scaling parameters by 12.00 to account for a batch size of 96.
loading annotations into memory...
Done (t=17.48s)
creating index...
index created!
loading annotations into memory...
Done (t=0.60s)
creating index...
index created!
Initializing weights...
Begin training!

[ 0] 0 || B: 6.872 | C: 22.219 | M: 10.292 | S: 65.478 | T: 104.861 || ETA: 30 days, 18:40:57 || timer: 39.889
[ 0] 10 || B: 5.774 | C: 18.310 | M: 6.371 | S: 44.301 | T: 74.756 || ETA: 3 days, 23:40:39 || timer: 1.618
[ 0] 20 || B: 5.730 | C: 15.922 | M: 6.166 | S: 24.768 | T: 52.586 || ETA: 2 days, 17:52:39 || timer: 1.614
[ 0] 30 || B: 5.725 | C: 13.624 | M: 6.076 | S: 17.789 | T: 43.214 || ETA: 2 days, 11:05:50 || timer: 2.174
[ 0] 40 || B: 5.592 | C: 11.825 | M: 5.970 | S: 14.158 | T: 37.545 || ETA: 2 days, 6:18:21 || timer: 1.583
[ 0] 50 || B: 5.509 | C: 10.415 | M: 5.919 | S: 11.737 | T: 33.580 || ETA: 2 days, 4:28:42 || timer: 1.898
[ 0] 60 || B: 5.422 | C: 9.391 | M: 5.852 | S: 10.065 | T: 30.730 || ETA: 2 days, 2:07:50 || timer: 1.851
[ 0] 70 || B: 5.355 | C: 8.618 | M: 5.821 | S: 8.828 | T: 28.622 || ETA: 2 days, 1:02:20 || timer: 1.582
[ 0] 80 || B: 5.288 | C: 8.018 | M: 5.788 | S: 7.888 | T: 26.983 || ETA: 2 days, 0:03:05 || timer: 1.935
[ 0] 90 || B: 5.215 | C: 7.533 | M: 5.759 | S: 7.162 | T: 25.670 || ETA: 1 day, 23:11:19 || timer: 1.686
[ 0] 100 || B: 5.138 | C: 6.994 | M: 5.684 | S: 5.983 | T: 23.798 || ETA: 1 day, 22:40:53 || timer: 1.966
[ 0] 110 || B: 5.022 | C: 5.560 | M: 5.631 | S: 1.887 | T: 18.101 || ETA: 1 day, 22:03:22 || timer: 1.795
[ 0] 120 || B: 4.899 | C: 4.587 | M: 5.580 | S: 1.682 | T: 16.747 || ETA: 1 day, 21:31:30 || timer: 1.684
[ 0] 130 || B: 4.761 | C: 4.059 | M: 5.537 | S: 1.484 | T: 15.841 || ETA: 1 day, 21:08:44 || timer: 1.611
[ 0] 140 || B: 4.678 | C: 3.780 | M: 5.514 | S: 1.299 | T: 15.272 || ETA: 1 day, 21:04:35 || timer: 1.660
[ 0] 150 || B: 4.595 | C: 3.673 | M: 5.473 | S: 1.231 | T: 14.972 || ETA: 1 day, 20:52:10 || timer: 1.952
[ 0] 160 || B: 4.525 | C: 3.803 | M: 5.459 | S: 1.214 | T: 15.001 || ETA: 1 day, 20:45:43 || timer: 1.831
[ 0] 170 || B: 4.491 | C: 3.780 | M: 5.470 | S: 1.221 | T: 14.962 || ETA: 1 day, 20:26:25 || timer: 2.007
[ 0] 180 || B: 4.444 | C: 3.763 | M: 5.467 | S: 1.221 | T: 14.895 || ETA: 1 day, 20:18:10 || timer: 1.700
[ 0] 190 || B: 4.400 | C: 3.756 | M: 5.476 | S: 1.209 | T: 14.841 || ETA: 1 day, 20:04:57 || timer: 2.215
[ 0] 200 || B: 4.366 | C: 3.805 | M: 5.494 | S: 1.214 | T: 14.879 || ETA: 1 day, 20:00:01 || timer: 1.852
[ 0] 210 || B: 4.328 | C: 3.802 | M: 5.500 | S: 1.209 | T: 14.839 || ETA: 1 day, 19:49:29 || timer: 1.886
[ 0] 220 || B: 4.282 | C: 3.798 | M: 5.501 | S: 1.211 | T: 14.792 || ETA: 1 day, 19:47:41 || timer: 1.716
[ 0] 230 || B: 4.232 | C: 3.802 | M: 5.480 | S: 1.207 | T: 14.720 || ETA: 1 day, 19:33:02 || timer: 1.762
[ 0] 240 || B: 4.177 | C: 3.807 | M: 5.457 | S: 1.215 | T: 14.657 || ETA: 1 day, 19:24:41 || timer: 1.769
[ 0] 250 || B: 4.135 | C: 3.793 | M: 5.459 | S: 1.211 | T: 14.598 || ETA: 1 day, 19:23:43 || timer: 2.137
[ 0] 260 || B: 4.069 | C: 3.578 | M: 5.425 | S: 1.183 | T: 14.255 || ETA: 1 day, 19:19:09 || timer: 1.905
[ 0] 270 || B: 3.972 | C: 3.548 | M: 5.342 | S: 1.157 | T: 14.018 || ETA: 1 day, 19:10:36 || timer: 1.624
[ 0] 280 || B: 3.888 | C: 3.520 | M: 5.262 | S: 1.145 | T: 13.815 || ETA: 1 day, 19:04:11 || timer: 1.836
[ 0] 290 || B: 3.815 | C: 3.491 | M: 5.171 | S: 1.139 | T: 13.615 || ETA: 1 day, 18:56:37 || timer: 1.971
[ 0] 300 || B: 3.738 | C: 3.401 | M: 5.097 | S: 1.112 | T: 13.348 || ETA: 1 day, 18:56:39 || timer: 1.989
[ 0] 310 || B: 3.668 | C: 3.368 | M: 5.027 | S: 1.098 | T: 13.160 || ETA: 1 day, 18:49:17 || timer: 1.872
[ 0] 320 || B: 3.605 | C: 3.328 | M: 4.949 | S: 1.073 | T: 12.956 || ETA: 1 day, 18:44:44 || timer: 1.844
[ 0] 330 || B: 3.557 | C: 3.276 | M: 4.887 | S: 1.055 | T: 12.776 || ETA: 1 day, 18:39:40 || timer: 2.098
[ 0] 340 || B: 3.517 | C: 3.237 | M: 4.845 | S: 1.047 | T: 12.646 || ETA: 1 day, 18:34:12 || timer: 1.657
[ 0] 350 || B: 3.451 | C: 3.196 | M: 4.772 | S: 1.043 | T: 12.462 || ETA: 1 day, 18:24:13 || timer: 1.566
[ 0] 360 || B: 3.426 | C: 3.148 | M: 4.739 | S: 1.027 | T: 12.341 || ETA: 1 day, 18:26:16 || timer: 1.976
[ 0] 370 || B: 3.391 | C: 3.105 | M: 4.706 | S: 1.013 | T: 12.215 || ETA: 1 day, 18:17:22 || timer: 1.808
[ 0] 380 || B: 3.365 | C: 3.059 | M: 4.687 | S: 1.004 | T: 12.116 || ETA: 1 day, 18:18:23 || timer: 1.665
[ 0] 390 || B: 3.345 | C: 3.011 | M: 4.678 | S: 0.989 | T: 12.022 || ETA: 1 day, 18:09:35 || timer: 1.626
[ 0] 400 || B: 3.325 | C: 2.967 | M: 4.649 | S: 0.973 | T: 11.914 || ETA: 1 day, 18:11:08 || timer: 1.645
[ 0] 410 || B: 3.296 | C: 2.927 | M: 4.607 | S: 0.968 | T: 11.799 || ETA: 1 day, 18:01:48 || timer: 1.627
[ 0] 420 || B: 3.289 | C: 2.891 | M: 4.593 | S: 0.961 | T: 11.734 || ETA: 1 day, 18:04:22 || timer: 1.762
[ 0] 430 || B: 3.273 | C: 2.861 | M: 4.572 | S: 0.964 | T: 11.670 || ETA: 1 day, 18:01:27 || timer: 1.787
[ 0] 440 || B: 3.258 | C: 2.815 | M: 4.547 | S: 0.948 | T: 11.567 || ETA: 1 day, 17:58:56 || timer: 1.716
[ 0] 450 || B: 3.256 | C: 2.775 | M: 4.542 | S: 0.935 | T: 11.508 || ETA: 1 day, 17:56:44 || timer: 1.886
[ 0] 460 || B: 3.248 | C: 2.755 | M: 4.519 | S: 0.937 | T: 11.458 || ETA: 1 day, 17:59:23 || timer: 1.641
[ 0] 470 || B: 3.231 | C: 2.719 | M: 4.511 | S: 0.930 | T: 11.393 || ETA: 1 day, 17:57:11 || timer: 1.949
[ 0] 480 || B: 3.234 | C: 2.682 | M: 4.498 | S: 0.920 | T: 11.333 || ETA: 1 day, 17:58:01 || timer: 1.969
[ 0] 490 || B: 3.238 | C: 2.657 | M: 4.482 | S: 0.921 | T: 11.297 || ETA: 1 day, 17:55:02 || timer: 1.846
[ 0] 500 || B: 3.243 | C: 2.632 | M: 4.473 | S: 0.919 | T: 11.268 || ETA: 1 day, 17:58:24 || timer: 1.619
[ 0] 510 || B: 3.259 | C: 2.587 | M: 4.480 | S: 0.909 | T: 11.236 || ETA: 1 day, 17:56:39 || timer: 1.960
[ 0] 520 || B: 3.244 | C: 2.549 | M: 4.470 | S: 0.909 | T: 11.173 || ETA: 1 day, 18:04:09 || timer: 1.751
[ 0] 530 || B: 3.233 | C: 2.508 | M: 4.462 | S: 0.910 | T: 11.113 || ETA: 1 day, 17:55:18 || timer: 1.783
[ 0] 540 || B: 3.224 | C: 2.471 | M: 4.457 | S: 0.919 | T: 11.071 || ETA: 1 day, 17:50:57 || timer: 1.956
[ 0] 550 || B: 3.214 | C: 2.432 | M: 4.439 | S: 0.911 | T: 10.997 || ETA: 1 day, 17:46:56 || timer: 1.644
[ 0] 560 || B: 3.192 | C: 2.396 | M: 4.418 | S: 0.907 | T: 10.914 || ETA: 1 day, 17:45:10 || timer: 2.157
[ 0] 570 || B: 3.185 | C: 2.365 | M: 4.393 | S: 0.908 | T: 10.851 || ETA: 1 day, 17:45:48 || timer: 1.767
[ 0] 580 || B: 3.166 | C: 2.350 | M: 4.377 | S: 0.911 | T: 10.804 || ETA: 1 day, 17:46:02 || timer: 1.605
[ 0] 590 || B: 3.142 | C: 2.314 | M: 4.357 | S: 0.897 | T: 10.710 || ETA: 1 day, 17:45:47 || timer: 1.671
[ 0] 600 || B: 3.116 | C: 2.273 | M: 4.329 | S: 0.899 | T: 10.617 || ETA: 1 day, 17:46:10 || timer: 1.613
[ 0] 610 || B: 3.088 | C: 2.247 | M: 4.305 | S: 0.895 | T: 10.534 || ETA: 1 day, 17:45:04 || timer: 1.643
[ 0] 620 || B: 3.078 | C: 2.219 | M: 4.288 | S: 0.883 | T: 10.467 || ETA: 1 day, 17:43:27 || timer: 1.601
[ 0] 630 || B: 3.073 | C: 2.196 | M: 4.280 | S: 0.865 | T: 10.415 || ETA: 1 day, 17:42:06 || timer: 1.857
[ 0] 640 || B: 3.062 | C: 2.178 | M: 4.269 | S: 0.847 | T: 10.356 || ETA: 1 day, 17:44:44 || timer: 1.804
[ 0] 650 || B: 3.043 | C: 2.158 | M: 4.244 | S: 0.844 | T: 10.290 || ETA: 1 day, 17:39:18 || timer: 1.568
[ 0] 660 || B: 3.043 | C: 2.141 | M: 4.237 | S: 0.836 | T: 10.257 || ETA: 1 day, 17:38:26 || timer: 1.930
[ 0] 670 || B: 3.041 | C: 2.113 | M: 4.236 | S: 0.829 | T: 10.218 || ETA: 1 day, 17:36:09 || timer: 1.862
[ 0] 680 || B: 3.041 | C: 2.079 | M: 4.234 | S: 0.811 | T: 10.165 || ETA: 1 day, 17:37:51 || timer: 1.778
[ 0] 690 || B: 3.042 | C: 2.055 | M: 4.226 | S: 0.808 | T: 10.131 || ETA: 1 day, 17:35:31 || timer: 1.824
[ 0] 700 || B: 3.023 | C: 2.043 | M: 4.216 | S: 0.811 | T: 10.092 || ETA: 1 day, 17:35:55 || timer: 1.636
[ 0] 710 || B: 3.027 | C: 2.033 | M: 4.213 | S: 0.806 | T: 10.079 || ETA: 1 day, 17:34:29 || timer: 1.911
[ 0] 720 || B: 3.031 | C: 2.028 | M: 4.206 | S: 0.808 | T: 10.074 || ETA: 1 day, 17:37:05 || timer: 1.632
[ 0] 730 || B: 3.032 | C: 2.009 | M: 4.198 | S: 0.809 | T: 10.048 || ETA: 1 day, 17:35:38 || timer: 1.668
[ 0] 740 || B: 3.021 | C: 1.994 | M: 4.175 | S: 0.808 | T: 9.998 || ETA: 1 day, 17:37:06 || timer: 1.733
[ 0] 750 || B: 3.023 | C: 1.981 | M: 4.183 | S: 0.802 | T: 9.989 || ETA: 1 day, 17:35:04 || timer: 1.840
[ 0] 760 || B: 3.013 | C: 1.958 | M: 4.177 | S: 0.797 | T: 9.944 || ETA: 1 day, 17:34:07 || timer: 1.703
[ 0] 770 || B: 3.011 | C: 2.095 | M: 4.173 | S: 0.791 | T: 10.070 || ETA: 1 day, 17:33:14 || timer: 1.877
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 780 || B: 269798101701.176 | C: 10486286282.783 | M: 8.589 | S: 11862522254973579552620544.000 | T: 11862522254973860872978432.000 || ETA: 1 day, 17:31:33 || timer: 1.876
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 790 || B: 269798101701.176 | C: 10486286282.783 | M: 18.004 | S: 51908313622893461812281344.000 | T: 51908313622893745280122880.000 || ETA: 1 day, 17:28:32 || timer: 1.577
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 800 || B: 269798101701.176 | C: 10486286282.783 | M: 27.277 | S: 92311636464036052683718656.000 | T: 92311636464036327561625600.000 || ETA: 1 day, 17:30:08 || timer: 1.535
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 810 || B: 269798101701.176 | C: 10486286282.783 | M: 36.434 | S: 131908262526481968836640768.000 | T: 131908262526482243714547712.000 || ETA: 1 day, 17:33:02 || timer: 1.974
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 820 || B: 269798101701.176 | C: 10486286282.783 | M: 45.753 | S: 173994458955506417989058560.000 | T: 173994458955506692866965504.000 || ETA: 1 day, 17:30:01 || timer: 1.904
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 830 || B: 269798101701.176 | C: 10486286282.783 | M: 54.797 | S: 213965729166717338280525824.000 | T: 213965729166717613158432768.000 || ETA: 1 day, 17:29:55 || timer: 1.684
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 840 || B: 269798101701.176 | C: 10486286282.783 | M: 63.979 | S: 254684610497487086551040000.000 | T: 254684610497487361428946944.000 || ETA: 1 day, 17:31:50 || timer: 1.893
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 850 || B: 269798101701.176 | C: 10486286282.783 | M: 73.085 | S: 296303236197670565655347200.000 | T: 296303236197670840533254144.000 || ETA: 1 day, 17:30:57 || timer: 1.658
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan

dbolya (Owner) commented Feb 5, 2020

Uh oh.

Does this happen every time you train? Typically this kind of thing only happens in the first epoch or two, so if you get past that it might be fine.
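
The "Moving average ignored a value of inf" warnings in the log suggest that non-finite losses are dropped from the running averages. That would be consistent with the printed B and C values staying frozen from iteration 780 onward. The following is a minimal sketch of that idea, not YOLACT's actual implementation; the class name, its fields, and the window size are hypothetical.

```python
import math
from collections import deque


class FiniteMovingAverage:
    """Windowed mean that skips inf/nan instead of letting them poison the average."""

    def __init__(self, max_window_size=100):
        self.window = deque(maxlen=max_window_size)
        self.skipped = 0  # number of non-finite values seen so far

    def add(self, value):
        """Record a value; return False if it was non-finite and therefore ignored."""
        if not math.isfinite(value):
            self.skipped += 1
            return False  # the caller could react here, e.g. skip the optimizer step
        self.window.append(value)
        return True

    def average(self):
        return sum(self.window) / len(self.window) if self.window else 0.0


avg = FiniteMovingAverage(max_window_size=3)
for loss in [3.0, 2.0, float("inf"), float("nan"), 1.0]:
    avg.add(loss)
```

Counting the skipped values per run is one way to tell a brief early spike apart from a run that never recovers.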

This may also be related to your very large batch size: the learning rate is automatically multiplied by 12 here, which could make training less stable.

Perhaps try reducing the batch size a little, or raising the warmup_until parameter in the config (i.e., how long to keep the initial lr low) to something much higher than 1000 (which I think it is right now). Seeing as the explosion happened close to the end of lr warmup, that might be your best bet.
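
To make the interaction between batch size and warmup concrete, here is a rough sketch of a linear warmup applied to a batch-scaled learning rate. The base batch size of 8 is inferred from the logs in this thread (a factor of 12.00 for 96, 4.00 for 32). `BASE_LR`, `WARMUP_INIT`, and both function names are illustrative assumptions, not values or code taken from the YOLACT config.

```python
def lr_scale(batch_size, base_batch_size=8):
    """Factor by which the learning rate is multiplied for a given batch size."""
    return batch_size / base_batch_size


def warmup_lr(iteration, target_lr, warmup_init, warmup_until):
    """Linearly ramp from warmup_init to target_lr over warmup_until iterations."""
    if warmup_until > 0 and iteration < warmup_until:
        return warmup_init + (target_lr - warmup_init) * iteration / warmup_until
    return target_lr


BASE_LR = 1e-3      # hypothetical value, for illustration only
WARMUP_INIT = 1e-4  # hypothetical value, for illustration only

# With a batch size of 96 the target rate is 12x the base rate, as in the log.
target = BASE_LR * lr_scale(96)

# Compare the rate reached at iteration 780 (where the first run blew up)
# under a short and a long warmup.
for warmup_until in (1000, 5000):
    print(warmup_until, warmup_lr(780, target, WARMUP_INIT, warmup_until))
```

Under these assumptions, a longer warmup means a much lower learning rate at the iteration where the losses exploded, which is why raising warmup_until is worth trying before anything else.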

IQ17 (Author) commented Feb 6, 2020

Hi, I tried training with 2 GPUs and batch_size=48, and it failed again.
Then I tried batch_size=32, and the NaNs appeared a bit later.

I only changed the batch size, without touching the warmup_until parameter.

Error logs with batch_size=32 below:

root@HDFS-Slave-01:/work/code/yolact_git# CUDA_VISIBLE_DEVICES=5,6 python train.py --config=yolact_resnet50_config --batch_size=32
Multiple GPUs detected! Turning off JIT.
Scaling parameters by 4.00 to account for a batch size of 32.
loading annotations into memory...
Done (t=17.90s)
creating index...
index created!
loading annotations into memory...
Done (t=0.63s)
creating index...
index created!
Initializing weights...
Begin training!

[ 0] 0 || B: 6.875 | C: 22.705 | M: 9.248 | S: 67.251 | T: 106.079 || ETA: 31 days, 14:16:06 || timer: 13.649
[ 0] 10 || B: 6.027 | C: 19.059 | M: 6.488 | S: 48.170 | T: 79.744 || ETA: 4 days, 19:15:27 || timer: 0.967
[ 0] 20 || B: 5.775 | C: 16.907 | M: 6.269 | S: 28.997 | T: 57.948 || ETA: 3 days, 12:50:03 || timer: 0.900
[ 0] 30 || B: 5.575 | C: 14.788 | M: 6.170 | S: 20.256 | T: 46.789 || ETA: 3 days, 2:07:42 || timer: 0.955
[ 0] 40 || B: 5.467 | C: 13.248 | M: 6.051 | S: 15.839 | T: 40.605 || ETA: 2 days, 20:42:42 || timer: 0.903
[ 0] 50 || B: 5.388 | C: 12.150 | M: 5.993 | S: 13.083 | T: 36.614 || ETA: 2 days, 17:38:36 || timer: 0.919
[ 0] 60 || B: 5.324 | C: 11.362 | M: 6.383 | S: 11.169 | T: 34.238 || ETA: 2 days, 15:08:57 || timer: 0.868
[ 0] 70 || B: 5.250 | C: 10.714 | M: 6.338 | S: 9.851 | T: 32.154 || ETA: 2 days, 13:32:12 || timer: 1.006
[ 0] 80 || B: 5.248 | C: 10.382 | M: 6.283 | S: 8.807 | T: 30.719 || ETA: 2 days, 12:20:22 || timer: 0.872
[ 0] 90 || B: 5.251 | C: 9.973 | M: 6.271 | S: 8.112 | T: 29.608 || ETA: 2 days, 11:31:22 || timer: 0.961
[ 0] 100 || B: 5.209 | C: 9.594 | M: 6.198 | S: 6.970 | T: 27.970 || ETA: 2 days, 10:41:50 || timer: 0.995
[ 0] 110 || B: 5.105 | C: 8.467 | M: 6.140 | S: 2.502 | T: 22.214 || ETA: 2 days, 10:08:04 || timer: 0.973
[ 0] 120 || B: 5.092 | C: 7.833 | M: 6.104 | S: 2.147 | T: 21.176 || ETA: 2 days, 9:33:52 || timer: 0.946
[ 0] 130 || B: 5.065 | C: 7.341 | M: 6.072 | S: 2.229 | T: 20.708 || ETA: 2 days, 9:06:01 || timer: 0.886
[ 0] 140 || B: 5.047 | C: 6.891 | M: 6.076 | S: 2.140 | T: 20.154 || ETA: 2 days, 8:50:48 || timer: 0.980
[ 0] 150 || B: 5.014 | C: 6.518 | M: 6.067 | S: 2.114 | T: 19.713 || ETA: 2 days, 8:23:29 || timer: 0.958
[ 0] 160 || B: 5.024 | C: 6.167 | M: 5.827 | S: 2.105 | T: 19.123 || ETA: 2 days, 8:09:08 || timer: 0.915
[ 0] 170 || B: 5.023 | C: 5.881 | M: 5.783 | S: 2.057 | T: 18.745 || ETA: 2 days, 7:56:36 || timer: 0.858
[ 0] 180 || B: 4.969 | C: 5.450 | M: 5.760 | S: 2.055 | T: 18.233 || ETA: 2 days, 7:41:01 || timer: 0.930
[ 0] 190 || B: 4.917 | C: 5.146 | M: 5.714 | S: 1.927 | T: 17.704 || ETA: 2 days, 7:28:49 || timer: 0.986
[ 0] 200 || B: 4.879 | C: 4.758 | M: 5.680 | S: 1.792 | T: 17.109 || ETA: 2 days, 7:17:16 || timer: 0.874
[ 0] 210 || B: 4.841 | C: 4.381 | M: 5.668 | S: 1.745 | T: 16.635 || ETA: 2 days, 7:08:18 || timer: 0.946
[ 0] 220 || B: 4.753 | C: 3.918 | M: 5.655 | S: 1.431 | T: 15.758 || ETA: 2 days, 6:57:49 || timer: 0.869
[ 0] 230 || B: 4.707 | C: 3.724 | M: 5.627 | S: 1.287 | T: 15.345 || ETA: 2 days, 6:47:10 || timer: 0.968
[ 0] 240 || B: 4.655 | C: 3.688 | M: 5.597 | S: 1.280 | T: 15.220 || ETA: 2 days, 6:35:35 || timer: 0.891
[ 0] 250 || B: 4.617 | C: 3.647 | M: 5.577 | S: 1.236 | T: 15.077 || ETA: 2 days, 6:27:20 || timer: 0.928
[ 0] 260 || B: 4.551 | C: 3.634 | M: 5.540 | S: 1.225 | T: 14.949 || ETA: 2 days, 6:17:45 || timer: 0.953
[ 0] 270 || B: 4.501 | C: 3.606 | M: 5.532 | S: 1.199 | T: 14.838 || ETA: 2 days, 6:13:12 || timer: 0.861
[ 0] 280 || B: 4.461 | C: 3.598 | M: 5.512 | S: 1.202 | T: 14.773 || ETA: 2 days, 6:05:57 || timer: 0.828
[ 0] 290 || B: 4.408 | C: 3.594 | M: 5.480 | S: 1.200 | T: 14.682 || ETA: 2 days, 6:03:28 || timer: 0.915
[ 0] 300 || B: 4.342 | C: 3.586 | M: 5.456 | S: 1.214 | T: 14.598 || ETA: 2 days, 5:58:48 || timer: 0.994
[ 0] 310 || B: 4.309 | C: 3.571 | M: 5.449 | S: 1.212 | T: 14.541 || ETA: 2 days, 5:53:28 || timer: 0.896
[ 0] 320 || B: 4.274 | C: 3.584 | M: 5.421 | S: 1.215 | T: 14.494 || ETA: 2 days, 5:49:02 || timer: 0.981
[ 0] 330 || B: 4.227 | C: 3.571 | M: 5.414 | S: 1.192 | T: 14.404 || ETA: 2 days, 5:44:35 || timer: 0.948
[ 0] 340 || B: 4.193 | C: 3.546 | M: 5.388 | S: 1.189 | T: 14.316 || ETA: 2 days, 5:42:34 || timer: 0.964
[ 0] 350 || B: 4.164 | C: 3.552 | M: 5.361 | S: 1.181 | T: 14.258 || ETA: 2 days, 5:36:21 || timer: 0.877
[ 0] 360 || B: 4.114 | C: 3.529 | M: 5.313 | S: 1.175 | T: 14.130 || ETA: 2 days, 5:34:39 || timer: 0.905
[ 0] 370 || B: 4.077 | C: 3.509 | M: 5.276 | S: 1.179 | T: 14.041 || ETA: 2 days, 5:32:33 || timer: 0.963
[ 0] 380 || B: 4.038 | C: 3.494 | M: 5.244 | S: 1.165 | T: 13.941 || ETA: 2 days, 5:27:12 || timer: 0.896
[ 0] 390 || B: 4.016 | C: 3.480 | M: 5.202 | S: 1.155 | T: 13.853 || ETA: 2 days, 5:22:45 || timer: 0.925
[ 0] 400 || B: 4.003 | C: 3.472 | M: 5.183 | S: 1.121 | T: 13.780 || ETA: 2 days, 5:21:14 || timer: 1.022
[ 0] 410 || B: 3.967 | C: 3.452 | M: 5.154 | S: 1.114 | T: 13.686 || ETA: 2 days, 5:20:20 || timer: 0.995
[ 0] 420 || B: 3.948 | C: 3.412 | M: 5.116 | S: 1.096 | T: 13.571 || ETA: 2 days, 5:17:09 || timer: 0.886
[ 0] 430 || B: 3.934 | C: 3.425 | M: 5.079 | S: 1.106 | T: 13.545 || ETA: 2 days, 5:15:35 || timer: 0.863
[ 0] 440 || B: 3.884 | C: 3.428 | M: 5.033 | S: 1.097 | T: 13.442 || ETA: 2 days, 5:12:55 || timer: 0.974
[ 0] 450 || B: 3.843 | C: 3.413 | M: 4.989 | S: 1.115 | T: 13.360 || ETA: 2 days, 5:10:49 || timer: 1.128
[ 0] 460 || B: 3.815 | C: 3.388 | M: 4.972 | S: 1.112 | T: 13.287 || ETA: 2 days, 5:07:35 || timer: 0.926
[ 0] 470 || B: 3.760 | C: 3.378 | M: 4.920 | S: 1.104 | T: 13.162 || ETA: 2 days, 5:04:57 || timer: 0.896
[ 0] 480 || B: 3.712 | C: 3.365 | M: 4.875 | S: 1.100 | T: 13.052 || ETA: 2 days, 5:04:46 || timer: 1.042
[ 0] 490 || B: 3.699 | C: 3.359 | M: 4.886 | S: 1.095 | T: 13.039 || ETA: 2 days, 5:02:10 || timer: 0.966
[ 0] 500 || B: 3.663 | C: 3.347 | M: 4.852 | S: 1.100 | T: 12.963 || ETA: 2 days, 4:58:58 || timer: 0.879
[ 0] 510 || B: 3.644 | C: 3.344 | M: 4.820 | S: 1.102 | T: 12.910 || ETA: 2 days, 4:56:24 || timer: 0.890
[ 0] 520 || B: 3.608 | C: 3.339 | M: 4.799 | S: 1.096 | T: 12.843 || ETA: 2 days, 4:54:34 || timer: 0.985
[ 0] 530 || B: 3.601 | C: 3.333 | M: 4.857 | S: 1.087 | T: 12.879 || ETA: 2 days, 4:52:06 || timer: 0.890
[ 0] 540 || B: 3.612 | C: 3.322 | M: 4.906 | S: 1.094 | T: 12.934 || ETA: 2 days, 4:50:09 || timer: 0.916
[ 0] 550 || B: 3.614 | C: 3.302 | M: 4.934 | S: 1.080 | T: 12.930 || ETA: 2 days, 4:48:01 || timer: 0.886
[ 0] 560 || B: 3.598 | C: 3.302 | M: 4.913 | S: 1.071 | T: 12.885 || ETA: 2 days, 4:46:45 || timer: 0.932
[ 0] 570 || B: 3.613 | C: 3.294 | M: 4.918 | S: 1.072 | T: 12.897 || ETA: 2 days, 4:44:40 || timer: 0.982
[ 0] 580 || B: 3.619 | C: 3.267 | M: 4.913 | S: 1.065 | T: 12.864 || ETA: 2 days, 4:43:06 || timer: 0.893
[ 0] 590 || B: 3.592 | C: 3.246 | M: 4.882 | S: 1.055 | T: 12.775 || ETA: 2 days, 4:41:18 || timer: 0.898
[ 0] 600 || B: 3.598 | C: 3.218 | M: 4.894 | S: 1.041 | T: 12.751 || ETA: 2 days, 4:39:46 || timer: 0.838
[ 0] 610 || B: 3.589 | C: 3.195 | M: 4.884 | S: 1.034 | T: 12.702 || ETA: 2 days, 4:38:50 || timer: 0.857
[ 0] 620 || B: 3.581 | C: 3.168 | M: 4.875 | S: 1.017 | T: 12.640 || ETA: 2 days, 4:37:05 || timer: 0.953
[ 0] 630 || B: 3.549 | C: 3.130 | M: 4.794 | S: 1.004 | T: 12.477 || ETA: 2 days, 4:36:36 || timer: 0.967
[ 0] 640 || B: 3.527 | C: 3.092 | M: 4.737 | S: 0.987 | T: 12.342 || ETA: 2 days, 4:35:36 || timer: 0.911
[ 0] 650 || B: 3.488 | C: 3.062 | M: 4.686 | S: 0.986 | T: 12.223 || ETA: 2 days, 4:33:24 || timer: 0.913
[ 0] 660 || B: 3.486 | C: 3.039 | M: 4.686 | S: 0.986 | T: 12.197 || ETA: 2 days, 4:31:59 || timer: 0.945
[ 0] 670 || B: 3.469 | C: 3.001 | M: 4.687 | S: 0.978 | T: 12.134 || ETA: 2 days, 4:30:15 || timer: 0.942
[ 0] 680 || B: 3.449 | C: 2.975 | M: 4.676 | S: 0.961 | T: 12.062 || ETA: 2 days, 4:28:24 || timer: 0.858
[ 0] 690 || B: 3.417 | C: 2.951 | M: 4.649 | S: 0.957 | T: 11.974 || ETA: 2 days, 4:27:12 || timer: 0.906
[ 0] 700 || B: 3.405 | C: 2.932 | M: 4.628 | S: 0.964 | T: 11.929 || ETA: 2 days, 4:25:48 || timer: 0.890
[ 0] 710 || B: 3.383 | C: 2.909 | M: 4.602 | S: 0.970 | T: 11.865 || ETA: 2 days, 4:24:25 || timer: 0.867
[ 0] 720 || B: 3.382 | C: 2.896 | M: 4.613 | S: 0.972 | T: 11.863 || ETA: 2 days, 4:23:01 || timer: 0.971
[ 0] 730 || B: 3.382 | C: 2.878 | M: 4.607 | S: 0.969 | T: 11.835 || ETA: 2 days, 4:21:33 || timer: 0.862
[ 0] 740 || B: 3.385 | C: 2.871 | M: 4.612 | S: 0.960 | T: 11.829 || ETA: 2 days, 4:22:00 || timer: 0.987
[ 0] 750 || B: 3.373 | C: 2.856 | M: 4.603 | S: 0.954 | T: 11.786 || ETA: 2 days, 4:20:59 || timer: 0.953
[ 0] 760 || B: 3.353 | C: 2.838 | M: 4.579 | S: 0.945 | T: 11.715 || ETA: 2 days, 4:20:02 || timer: 0.944
[ 0] 770 || B: 3.346 | C: 2.829 | M: 4.556 | S: 0.944 | T: 11.675 || ETA: 2 days, 4:18:56 || timer: 0.980
[ 0] 780 || B: 3.367 | C: 2.807 | M: 4.562 | S: 0.939 | T: 11.674 || ETA: 2 days, 4:17:26 || timer: 0.888
[ 0] 790 || B: 3.345 | C: 2.769 | M: 4.542 | S: 0.944 | T: 11.599 || ETA: 2 days, 4:16:26 || timer: 0.922
[ 0] 800 || B: 3.334 | C: 2.751 | M: 4.529 | S: 0.937 | T: 11.551 || ETA: 2 days, 4:15:40 || timer: 0.947
[ 0] 810 || B: 3.303 | C: 2.720 | M: 4.498 | S: 0.925 | T: 11.446 || ETA: 2 days, 4:14:03 || timer: 0.887
[ 0] 820 || B: 3.291 | C: 2.701 | M: 4.477 | S: 0.940 | T: 11.409 || ETA: 2 days, 4:13:35 || timer: 0.909
[ 0] 830 || B: 3.270 | C: 2.677 | M: 4.443 | S: 0.942 | T: 11.332 || ETA: 2 days, 4:11:29 || timer: 0.838
[ 0] 840 || B: 3.263 | C: 2.656 | M: 4.421 | S: 0.944 | T: 11.284 || ETA: 2 days, 4:10:38 || timer: 0.945
[ 0] 850 || B: 3.294 | C: 2.645 | M: 4.470 | S: 0.929 | T: 11.337 || ETA: 2 days, 4:09:54 || timer: 0.918
[ 0] 860 || B: 3.306 | C: 2.613 | M: 4.487 | S: 0.940 | T: 11.346 || ETA: 2 days, 4:09:00 || timer: 0.919
[ 0] 870 || B: 3.311 | C: 2.596 | M: 4.492 | S: 0.933 | T: 11.332 || ETA: 2 days, 4:08:34 || timer: 0.905
[ 0] 880 || B: 3.324 | C: 2.596 | M: 4.502 | S: 0.933 | T: 11.354 || ETA: 2 days, 4:08:57 || timer: 1.025
[ 0] 890 || B: 3.342 | C: 2.600 | M: 4.527 | S: 0.921 | T: 11.389 || ETA: 2 days, 4:07:16 || timer: 0.928
[ 0] 900 || B: 3.358 | C: 2.580 | M: 4.548 | S: 0.903 | T: 11.389 || ETA: 2 days, 4:06:47 || timer: 0.865
[ 0] 910 || B: 3.382 | C: 2.567 | M: 4.567 | S: 0.898 | T: 11.415 || ETA: 2 days, 4:05:00 || timer: 0.908
[ 0] 920 || B: 3.380 | C: 2.544 | M: 4.576 | S: 0.885 | T: 11.386 || ETA: 2 days, 4:04:41 || timer: 0.930
[ 0] 930 || B: 3.382 | C: 2.534 | M: 4.604 | S: 0.904 | T: 11.424 || ETA: 2 days, 4:04:53 || timer: 1.017
[ 0] 940 || B: 3.363 | C: 2.521 | M: 4.594 | S: 0.911 | T: 11.389 || ETA: 2 days, 4:04:37 || timer: 0.896
[ 0] 950 || B: 3.353 | C: 2.500 | M: 4.553 | S: 0.923 | T: 11.329 || ETA: 2 days, 4:04:07 || timer: 0.937
[ 0] 960 || B: 3.321 | C: 2.505 | M: 4.515 | S: 0.901 | T: 11.242 || ETA: 2 days, 4:11:20 || timer: 3.253
[ 0] 970 || B: 3.301 | C: 2.503 | M: 4.503 | S: 0.902 | T: 11.210 || ETA: 2 days, 4:10:15 || timer: 0.936
[ 0] 980 || B: 3.253 | C: 2.492 | M: 4.470 | S: 0.907 | T: 11.121 || ETA: 2 days, 4:08:43 || timer: 0.916
[ 0] 990 || B: 3.246 | C: 2.474 | M: 4.462 | S: 0.906 | T: 11.088 || ETA: 2 days, 4:07:35 || timer: 0.879
[ 0] 1000 || B: 3.218 | C: 2.474 | M: 4.438 | S: 0.913 | T: 11.043 || ETA: 2 days, 3:24:21 || timer: 0.967
[ 0] 1010 || B: 3.224 | C: 2.467 | M: 4.438 | S: 0.911 | T: 11.041 || ETA: 2 days, 3:24:53 || timer: 0.984
[ 0] 1020 || B: 3.202 | C: 2.467 | M: 4.414 | S: 0.920 | T: 11.004 || ETA: 2 days, 3:24:10 || timer: 0.997
[ 0] 1030 || B: 3.184 | C: 2.455 | M: 4.386 | S: 0.896 | T: 10.922 || ETA: 2 days, 3:23:56 || timer: 1.011
[ 0] 1040 || B: 3.175 | C: 2.435 | M: 4.389 | S: 0.887 | T: 10.887 || ETA: 2 days, 3:23:32 || timer: 0.917
[ 0] 1050 || B: 3.164 | C: 2.413 | M: 4.375 | S: 0.891 | T: 10.843 || ETA: 2 days, 3:22:36 || timer: 0.925
[ 0] 1060 || B: 3.174 | C: 2.387 | M: 4.388 | S: 0.905 | T: 10.853 || ETA: 2 days, 3:22:05 || timer: 0.898
[ 0] 1070 || B: 3.189 | C: 2.348 | M: 4.395 | S: 0.909 | T: 10.841 || ETA: 2 days, 3:22:09 || timer: 0.978
[ 0] 1080 || B: 3.207 | C: 2.346 | M: 4.395 | S: 0.900 | T: 10.848 || ETA: 2 days, 3:21:56 || timer: 0.871
[ 0] 1090 || B: 3.204 | C: 2.328 | M: 4.392 | S: 0.906 | T: 10.830 || ETA: 2 days, 3:21:07 || timer: 0.878
[ 0] 1100 || B: 3.200 | C: 2.309 | M: 4.393 | S: 0.909 | T: 10.811 || ETA: 2 days, 3:21:15 || timer: 0.905
[ 0] 1110 || B: 3.203 | C: 2.305 | M: 4.416 | S: 0.909 | T: 10.832 || ETA: 2 days, 3:21:31 || timer: 0.956
[ 0] 1120 || B: 3.221 | C: 2.292 | M: 4.434 | S: 0.910 | T: 10.856 || ETA: 2 days, 3:21:22 || timer: 0.824
[ 0] 1130 || B: 3.232 | C: 2.275 | M: 4.444 | S: 0.902 | T: 10.853 || ETA: 2 days, 3:20:58 || timer: 0.850
[ 0] 1140 || B: 3.225 | C: 2.280 | M: 4.432 | S: 0.906 | T: 10.843 || ETA: 2 days, 3:20:03 || timer: 0.881
[ 0] 1150 || B: 3.223 | C: 2.278 | M: 4.429 | S: 0.892 | T: 10.822 || ETA: 2 days, 3:20:58 || timer: 0.898
[ 0] 1160 || B: 3.216 | C: 2.282 | M: 4.420 | S: 0.889 | T: 10.807 || ETA: 2 days, 3:20:13 || timer: 0.906
[ 0] 1170 || B: 3.202 | C: 2.292 | M: 4.399 | S: 0.885 | T: 10.779 || ETA: 2 days, 3:18:58 || timer: 0.891
[ 0] 1180 || B: 3.195 | C: 2.274 | M: 4.406 | S: 0.873 | T: 10.748 || ETA: 2 days, 3:18:44 || timer: 0.846
[ 0] 1190 || B: 3.203 | C: 2.282 | M: 4.390 | S: 0.866 | T: 10.741 || ETA: 2 days, 3:17:53 || timer: 0.911
[ 0] 1200 || B: 3.187 | C: 2.286 | M: 4.345 | S: 0.869 | T: 10.687 || ETA: 2 days, 3:16:43 || timer: 0.969
[ 0] 1210 || B: 3.167 | C: 2.274 | M: 4.319 | S: 0.871 | T: 10.631 || ETA: 2 days, 3:16:04 || timer: 0.958
[ 0] 1220 || B: 3.145 | C: 2.262 | M: 4.304 | S: 0.863 | T: 10.575 || ETA: 2 days, 3:16:38 || timer: 0.985
[ 0] 1230 || B: 3.118 | C: 2.253 | M: 4.289 | S: 0.867 | T: 10.528 || ETA: 2 days, 3:17:00 || timer: 0.976
[ 0] 1240 || B: 3.119 | C: 2.239 | M: 4.287 | S: 0.857 | T: 10.501 || ETA: 2 days, 3:17:03 || timer: 0.847
[ 0] 1250 || B: 3.102 | C: 2.238 | M: 4.296 | S: 0.884 | T: 10.520 || ETA: 2 days, 3:17:30 || timer: 0.932
[ 0] 1260 || B: 3.108 | C: 2.222 | M: 4.300 | S: 0.868 | T: 10.498 || ETA: 2 days, 3:18:02 || timer: 0.909
[ 0] 1270 || B: 3.095 | C: 2.207 | M: 4.292 | S: 0.868 | T: 10.462 || ETA: 2 days, 3:17:35 || timer: 0.935
[ 0] 1280 || B: 3.082 | C: 2.181 | M: 4.272 | S: 0.875 | T: 10.411 || ETA: 2 days, 3:17:44 || timer: 0.924
[ 0] 1290 || B: 3.050 | C: 2.162 | M: 4.257 | S: 0.876 | T: 10.345 || ETA: 2 days, 3:16:16 || timer: 0.864
[ 0] 1300 || B: 3.068 | C: 2.141 | M: 4.288 | S: 0.873 | T: 10.370 || ETA: 2 days, 3:16:23 || timer: 0.959
[ 0] 1310 || B: 3.067 | C: 2.140 | M: 4.279 | S: 0.873 | T: 10.359 || ETA: 2 days, 3:15:23 || timer: 0.907
[ 0] 1320 || B: 3.072 | C: 2.127 | M: 4.270 | S: 0.857 | T: 10.327 || ETA: 2 days, 3:15:15 || timer: 1.063
[ 0] 1330 || B: 3.105 | C: 2.118 | M: 4.295 | S: 0.861 | T: 10.380 || ETA: 2 days, 3:15:10 || timer: 0.911
[ 0] 1340 || B: 3.114 | C: 2.097 | M: 4.309 | S: 0.861 | T: 10.381 || ETA: 2 days, 3:14:17 || timer: 0.944
[ 0] 1350 || B: 3.112 | C: 2.077 | M: 4.295 | S: 0.841 | T: 10.325 || ETA: 2 days, 3:14:31 || timer: 0.914
[ 0] 1360 || B: 3.110 | C: 2.061 | M: 4.288 | S: 0.842 | T: 10.302 || ETA: 2 days, 3:14:12 || timer: 1.009
[ 0] 1370 || B: 3.117 | C: 2.058 | M: 4.306 | S: 0.831 | T: 10.311 || ETA: 2 days, 3:12:56 || timer: 0.933
[ 0] 1380 || B: 3.121 | C: 2.061 | M: 4.311 | S: 0.840 | T: 10.334 || ETA: 2 days, 3:13:25 || timer: 0.959
[ 0] 1390 || B: 3.169 | C: 2.062 | M: 4.335 | S: 0.829 | T: 10.395 || ETA: 2 days, 3:13:22 || timer: 0.877
[ 0] 1400 || B: 3.154 | C: 2.060 | M: 4.315 | S: 0.821 | T: 10.350 || ETA: 2 days, 3:12:33 || timer: 0.939
[ 0] 1410 || B: 3.142 | C: 2.055 | M: 4.321 | S: 0.813 | T: 10.331 || ETA: 2 days, 3:12:10 || timer: 1.002
[ 0] 1420 || B: 3.129 | C: 2.048 | M: 4.313 | S: 0.818 | T: 10.308 || ETA: 2 days, 3:11:36 || timer: 1.002
[ 0] 1430 || B: 3.104 | C: 2.045 | M: 4.284 | S: 0.815 | T: 10.248 || ETA: 2 days, 3:10:46 || timer: 0.950
[ 0] 1440 || B: 3.084 | C: 2.048 | M: 4.267 | S: 0.821 | T: 10.220 || ETA: 2 days, 3:10:35 || timer: 0.926
[ 0] 1450 || B: 3.073 | C: 2.046 | M: 4.260 | S: 0.810 | T: 10.189 || ETA: 2 days, 3:09:55 || timer: 0.920
[ 0] 1460 || B: 3.067 | C: 2.046 | M: 4.257 | S: 0.818 | T: 10.189 || ETA: 2 days, 3:10:05 || timer: 0.912
[ 0] 1470 || B: 3.046 | C: 2.027 | M: 4.231 | S: 0.827 | T: 10.131 || ETA: 2 days, 3:09:18 || timer: 0.876
[ 0] 1480 || B: 3.042 | C: 2.031 | M: 4.224 | S: 0.821 | T: 10.118 || ETA: 2 days, 3:07:48 || timer: 1.007
[ 0] 1490 || B: 3.016 | C: 2.021 | M: 4.222 | S: 0.822 | T: 10.081 || ETA: 2 days, 3:07:52 || timer: 0.966
[ 0] 1500 || B: 3.004 | C: 2.014 | M: 4.217 | S: 0.840 | T: 10.075 || ETA: 2 days, 3:07:55 || timer: 0.963
[ 0] 1510 || B: 3.014 | C: 1.994 | M: 4.222 | S: 0.839 | T: 10.068 || ETA: 2 days, 3:08:27 || timer: 0.966
[ 0] 1520 || B: 3.022 | C: 1.999 | M: 4.224 | S: 0.837 | T: 10.082 || ETA: 2 days, 3:07:56 || timer: 0.905
[ 0] 1530 || B: 3.042 | C: 1.990 | M: 4.240 | S: 0.830 | T: 10.102 || ETA: 2 days, 3:08:53 || timer: 0.932
[ 0] 1540 || B: 3.052 | C: 1.985 | M: 4.248 | S: 0.831 | T: 10.115 || ETA: 2 days, 3:09:50 || timer: 1.019
[ 0] 1550 || B: 3.050 | C: 1.965 | M: 4.241 | S: 0.837 | T: 10.094 || ETA: 2 days, 3:09:30 || timer: 0.993
[ 0] 1560 || B: 3.046 | C: 1.942 | M: 4.230 | S: 0.827 | T: 10.045 || ETA: 2 days, 3:08:58 || timer: 0.883
[ 0] 1570 || B: 3.063 | C: 1.934 | M: 4.244 | S: 0.822 | T: 10.063 || ETA: 2 days, 3:09:29 || timer: 0.987
[ 0] 1580 || B: 3.066 | C: 1.920 | M: 4.244 | S: 0.811 | T: 10.041 || ETA: 2 days, 3:10:00 || timer: 0.960
[ 0] 1590 || B: 3.045 | C: 1.900 | M: 4.210 | S: 0.815 | T: 9.970 || ETA: 2 days, 3:09:37 || timer: 0.947
[ 0] 1600 || B: 3.070 | C: 1.886 | M: 4.233 | S: 0.792 | T: 9.980 || ETA: 2 days, 3:09:53 || timer: 0.920
[ 0] 1610 || B: 3.070 | C: 1.872 | M: 4.228 | S: 0.788 | T: 9.958 || ETA: 2 days, 3:09:24 || timer: 0.972
[ 0] 1620 || B: 3.057 | C: 1.874 | M: 4.214 | S: 0.790 | T: 9.935 || ETA: 2 days, 3:09:05 || timer: 0.934
[ 0] 1630 || B: 3.041 | C: 1.861 | M: 4.217 | S: 0.789 | T: 9.908 || ETA: 2 days, 3:09:00 || timer: 0.954
[ 0] 1640 || B: 3.020 | C: 1.852 | M: 4.210 | S: 0.783 | T: 9.865 || ETA: 2 days, 3:08:22 || timer: 0.948
[ 0] 1650 || B: 3.020 | C: 1.860 | M: 4.224 | S: 0.772 | T: 9.876 || ETA: 2 days, 3:09:10 || timer: 0.894
[ 0] 1660 || B: 3.019 | C: 1.872 | M: 4.230 | S: 0.773 | T: 9.895 || ETA: 2 days, 3:08:52 || timer: 0.900
[ 0] 1670 || B: 3.025 | C: 1.888 | M: 4.229 | S: 0.771 | T: 9.913 || ETA: 2 days, 3:09:49 || timer: 0.902
[ 0] 1680 || B: 3.004 | C: 1.876 | M: 4.215 | S: 0.779 | T: 9.874 || ETA: 2 days, 3:09:35 || timer: 0.883
[ 0] 1690 || B: 3.014 | C: 1.891 | M: 4.235 | S: 0.782 | T: 9.921 || ETA: 2 days, 3:09:21 || timer: 0.972
[ 0] 1700 || B: 2.985 | C: 1.885 | M: 4.208 | S: 0.792 | T: 9.870 || ETA: 2 days, 3:08:37 || timer: 0.836
[ 0] 1710 || B: 2.972 | C: 1.888 | M: 4.193 | S: 0.802 | T: 9.856 || ETA: 2 days, 3:08:47 || timer: 0.886
[ 0] 1720 || B: 2.987 | C: 1.870 | M: 4.206 | S: 0.803 | T: 9.866 || ETA: 2 days, 3:08:22 || timer: 0.947
[ 0] 1730 || B: 2.996 | C: 1.882 | M: 4.202 | S: 0.812 | T: 9.893 || ETA: 2 days, 3:08:23 || timer: 0.943
[ 0] 1740 || B: 3.020 | C: 1.886 | M: 4.197 | S: 0.808 | T: 9.910 || ETA: 2 days, 3:06:42 || timer: 0.968
[ 0] 1750 || B: 3.015 | C: 1.895 | M: 4.164 | S: 0.823 | T: 9.898 || ETA: 2 days, 3:05:49 || timer: 1.039
[ 0] 1760 || B: 3.010 | C: 1.882 | M: 4.158 | S: 0.825 | T: 9.875 || ETA: 2 days, 3:05:34 || timer: 1.007
[ 0] 1770 || B: 2.993 | C: 1.865 | M: 4.140 | S: 0.826 | T: 9.825 || ETA: 2 days, 3:05:23 || timer: 0.957
[ 0] 1780 || B: 3.008 | C: 1.875 | M: 4.157 | S: 0.816 | T: 9.857 || ETA: 2 days, 3:05:50 || timer: 0.880
[ 0] 1790 || B: 3.008 | C: 1.862 | M: 4.158 | S: 0.820 | T: 9.847 || ETA: 2 days, 3:05:21 || timer: 0.952
[ 0] 1800 || B: 3.027 | C: 1.871 | M: 4.168 | S: 0.806 | T: 9.872 || ETA: 2 days, 3:04:45 || timer: 0.933
[ 0] 1810 || B: 3.029 | C: 1.870 | M: 4.177 | S: 0.793 | T: 9.869 || ETA: 2 days, 3:04:43 || timer: 0.885
[ 0] 1820 || B: 3.017 | C: 1.865 | M: 4.177 | S: 0.799 | T: 9.858 || ETA: 2 days, 3:03:37 || timer: 0.832
[ 0] 1830 || B: 3.005 | C: 1.873 | M: 4.161 | S: 0.794 | T: 9.833 || ETA: 2 days, 3:04:04 || timer: 0.857
[ 0] 1840 || B: 2.986 | C: 1.856 | M: 4.149 | S: 0.793 | T: 9.784 || ETA: 2 days, 3:04:36 || timer: 0.955
[ 0] 1850 || B: 3.001 | C: 1.824 | M: 4.156 | S: 0.781 | T: 9.763 || ETA: 2 days, 3:04:04 || timer: 0.945
[ 0] 1860 || B: 3.008 | C: 1.812 | M: 4.150 | S: 0.775 | T: 9.745 || ETA: 2 days, 3:03:40 || timer: 0.933
[ 0] 1870 || B: 3.016 | C: 1.794 | M: 4.161 | S: 0.771 | T: 9.743 || ETA: 2 days, 3:02:59 || timer: 0.947
[ 0] 1880 || B: 3.000 | C: 1.776 | M: 4.135 | S: 0.776 | T: 9.686 || ETA: 2 days, 3:01:19 || timer: 0.850
[ 0] 1890 || B: 3.007 | C: 1.772 | M: 4.139 | S: 0.766 | T: 9.684 || ETA: 2 days, 3:01:40 || timer: 0.934
[ 0] 1900 || B: 2.997 | C: 1.757 | M: 4.131 | S: 0.776 | T: 9.661 || ETA: 2 days, 3:01:37 || timer: 0.887
[ 0] 1910 || B: 2.977 | C: 1.741 | M: 4.105 | S: 0.783 | T: 9.607 || ETA: 2 days, 3:02:00 || timer: 0.905
[ 0] 1920 || B: 2.960 | C: 1.731 | M: 4.077 | S: 0.777 | T: 9.545 || ETA: 2 days, 3:01:13 || timer: 0.882
[ 0] 1930 || B: 2.949 | C: 1.711 | M: 4.083 | S: 0.777 | T: 9.520 || ETA: 2 days, 3:00:29 || timer: 0.882
[ 0] 1940 || B: 2.945 | C: 1.714 | M: 4.080 | S: 0.775 | T: 9.515 || ETA: 2 days, 3:00:04 || timer: 0.941
[ 0] 1950 || B: 2.949 | C: 1.734 | M: 4.092 | S: 0.778 | T: 9.553 || ETA: 2 days, 2:59:38 || timer: 0.909
[ 0] 1960 || B: 2.922 | C: 1.732 | M: 4.075 | S: 0.778 | T: 9.507 || ETA: 2 days, 2:50:52 || timer: 1.001
[ 0] 1970 || B: 2.920 | C: 1.738 | M: 4.080 | S: 0.778 | T: 9.515 || ETA: 2 days, 2:52:22 || timer: 1.028
[ 0] 1980 || B: 2.923 | C: 1.739 | M: 4.096 | S: 0.775 | T: 9.533 || ETA: 2 days, 2:52:53 || timer: 0.969
[ 0] 1990 || B: 2.902 | C: 1.739 | M: 4.081 | S: 0.787 | T: 9.509 || ETA: 2 days, 2:53:00 || timer: 1.037
[ 0] 2000 || B: 2.890 | C: 1.761 | M: 4.073 | S: 0.798 | T: 9.522 || ETA: 2 days, 2:53:09 || timer: 0.964
[ 0] 2010 || B: 2.916 | C: 1.770 | M: 4.087 | S: 0.797 | T: 9.570 || ETA: 2 days, 2:52:16 || timer: 0.927
[ 0] 2020 || B: 2.948 | C: 1.759 | M: 4.114 | S: 0.801 | T: 9.623 || ETA: 2 days, 2:52:13 || timer: 0.863
[ 0] 2030 || B: 2.962 | C: 1.765 | M: 4.099 | S: 0.802 | T: 9.628 || ETA: 2 days, 2:51:51 || timer: 0.889
[ 0] 2040 || B: 2.977 | C: 1.760 | M: 4.110 | S: 0.802 | T: 9.650 || ETA: 2 days, 2:51:37 || timer: 0.930
[ 0] 2050 || B: 2.954 | C: 1.739 | M: 4.085 | S: 0.794 | T: 9.572 || ETA: 2 days, 2:50:43 || timer: 0.889
[ 0] 2060 || B: 2.985 | C: 1.757 | M: 4.118 | S: 0.806 | T: 9.667 || ETA: 2 days, 2:50:44 || timer: 0.930
[ 0] 2070 || B: 2.974 | C: 1.758 | M: 4.094 | S: 0.804 | T: 9.631 || ETA: 2 days, 2:49:37 || timer: 0.938
[ 0] 2080 || B: 2.969 | C: 1.740 | M: 4.082 | S: 0.807 | T: 9.598 || ETA: 2 days, 2:49:22 || timer: 0.926
[ 0] 2090 || B: 2.984 | C: 1.729 | M: 4.089 | S: 0.797 | T: 9.600 || ETA: 2 days, 2:49:10 || timer: 1.024
[ 0] 2100 || B: 3.009 | C: 1.711 | M: 4.124 | S: 0.777 | T: 9.621 || ETA: 2 days, 2:48:53 || timer: 0.956
[ 0] 2110 || B: 2.990 | C: 1.716 | M: 4.120 | S: 0.773 | T: 9.600 || ETA: 2 days, 2:46:52 || timer: 0.911
[ 0] 2120 || B: 2.972 | C: 1.719 | M: 4.112 | S: 0.765 | T: 9.568 || ETA: 2 days, 2:45:56 || timer: 0.922
[ 0] 2130 || B: 2.961 | C: 1.708 | M: 4.109 | S: 0.761 | T: 9.539 || ETA: 2 days, 2:45:58 || timer: 0.949
[ 0] 2140 || B: 2.937 | C: 1.696 | M: 4.085 | S: 0.758 | T: 9.476 || ETA: 2 days, 2:44:35 || timer: 0.869
[ 0] 2150 || B: 2.958 | C: 1.708 | M: 4.112 | S: 0.765 | T: 9.543 || ETA: 2 days, 2:44:00 || timer: 0.867
[ 0] 2160 || B: 2.951 | C: 1.701 | M: 4.114 | S: 0.768 | T: 9.535 || ETA: 2 days, 2:44:22 || timer: 0.861
[ 0] 2170 || B: 2.953 | C: 1.698 | M: 4.132 | S: 0.770 | T: 9.553 || ETA: 2 days, 2:44:42 || timer: 0.969
[ 0] 2180 || B: 2.970 | C: 1.702 | M: 4.152 | S: 0.765 | T: 9.589 || ETA: 2 days, 2:45:47 || timer: 1.054
[ 0] 2190 || B: 2.964 | C: 1.697 | M: 4.133 | S: 0.769 | T: 9.563 || ETA: 2 days, 2:45:40 || timer: 1.004
[ 0] 2200 || B: 2.949 | C: 1.699 | M: 4.110 | S: 0.776 | T: 9.534 || ETA: 2 days, 2:46:31 || timer: 0.954
[ 0] 2210 || B: 2.964 | C: 1.686 | M: 4.109 | S: 0.779 | T: 9.537 || ETA: 2 days, 2:45:10 || timer: 0.829
[ 0] 2220 || B: 2.947 | C: 1.682 | M: 4.090 | S: 0.784 | T: 9.503 || ETA: 2 days, 2:44:42 || timer: 0.969
[ 0] 2230 || B: 2.931 | C: 1.674 | M: 4.079 | S: 0.789 | T: 9.472 || ETA: 2 days, 2:43:20 || timer: 0.820
[ 0] 2240 || B: 2.928 | C: 1.686 | M: 4.081 | S: 0.793 | T: 9.488 || ETA: 2 days, 2:43:30 || timer: 0.964
[ 0] 2250 || B: 2.915 | C: 1.675 | M: 4.064 | S: 0.778 | T: 9.433 || ETA: 2 days, 2:43:48 || timer: 1.085
[ 0] 2260 || B: 2.915 | C: 1.663 | M: 4.040 | S: 0.761 | T: 9.379 || ETA: 2 days, 2:43:14 || timer: 0.916
[ 0] 2270 || B: 3.197 | C: 4.093 | M: 4.136 | S: 0.875 | T: 12.301 || ETA: 2 days, 2:43:10 || timer: 0.968
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 2280 || B: 844195596750879086084096.000 | C: 40308940282615329456128.000 | M: 7.250 | S: 206793274.694 | T: 884504537033494675587072.000 || ETA: 2 days, 2:42:15 || timer: 0.895
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 2290 || B: 844195596750879086084096.000 | C: 40308940282615329456128.000 | M: 7.740 | S: 206793279.987 | T: 884504537033494675587072.000 || ETA: 2 days, 2:41:10 || timer: 0.920

Error logs with batch_size=48 are below:

root@HDFS-Slave-01:/work/code/yolact_git# CUDA_VISIBLE_DEVICES=5,6 python train.py --config=yolact_resnet50_config --batch_size=48
Multiple GPUs detected! Turning off JIT.
Scaling parameters by 6.00 to account for a batch size of 48.
loading annotations into memory...
Done (t=18.21s)
creating index...
index created!
loading annotations into memory...
Done (t=0.66s)
creating index...
index created!
Initializing weights...
Begin training!

[ 0] 0 || B: 6.326 | C: 27.412 | M: 6.040 | S: 65.752 | T: 105.530 || ETA: 27 days, 0:51:02 || timer: 17.519
[ 0] 10 || B: 5.982 | C: 20.326 | M: 6.039 | S: 46.819 | T: 79.166 || ETA: 4 days, 8:06:50 || timer: 1.368
[ 0] 20 || B: 5.712 | C: 17.446 | M: 5.869 | S: 27.535 | T: 56.561 || ETA: 3 days, 5:27:57 || timer: 1.281
[ 0] 30 || B: 5.598 | C: 15.078 | M: 5.994 | S: 19.303 | T: 45.974 || ETA: 2 days, 20:22:37 || timer: 1.507
[ 0] 40 || B: 5.470 | C: 13.464 | M: 6.011 | S: 15.002 | T: 39.947 || ETA: 2 days, 15:57:51 || timer: 1.332
[ 0] 50 || B: 5.380 | C: 12.317 | M: 5.985 | S: 12.370 | T: 36.052 || ETA: 2 days, 12:57:29 || timer: 1.276
[ 0] 60 || B: 5.305 | C: 11.411 | M: 5.949 | S: 10.584 | T: 33.250 || ETA: 2 days, 11:14:02 || timer: 1.469
[ 0] 70 || B: 5.241 | C: 10.757 | M: 5.895 | S: 9.273 | T: 31.166 || ETA: 2 days, 9:57:38 || timer: 1.401
[ 0] 80 || B: 5.191 | C: 10.288 | M: 5.858 | S: 8.302 | T: 29.639 || ETA: 2 days, 8:53:42 || timer: 1.336
[ 0] 90 || B: 5.151 | C: 9.908 | M: 5.832 | S: 7.530 | T: 28.421 || ETA: 2 days, 8:01:22 || timer: 1.204
[ 0] 100 || B: 5.104 | C: 9.413 | M: 5.805 | S: 6.321 | T: 26.643 || ETA: 2 days, 7:26:22 || timer: 1.312
[ 0] 110 || B: 4.990 | C: 8.121 | M: 5.802 | S: 1.949 | T: 20.862 || ETA: 2 days, 6:54:05 || timer: 1.343
[ 0] 120 || B: 4.907 | C: 7.348 | M: 5.785 | S: 1.441 | T: 19.481 || ETA: 2 days, 6:23:29 || timer: 1.266
[ 0] 130 || B: 4.832 | C: 7.012 | M: 5.727 | S: 1.358 | T: 18.929 || ETA: 2 days, 6:05:09 || timer: 1.261
[ 0] 140 || B: 4.793 | C: 6.829 | M: 5.689 | S: 1.325 | T: 18.636 || ETA: 2 days, 5:46:28 || timer: 1.240
[ 0] 150 || B: 4.761 | C: 6.741 | M: 5.674 | S: 1.300 | T: 18.476 || ETA: 2 days, 5:27:03 || timer: 1.295
[ 0] 160 || B: 4.773 | C: 6.637 | M: 5.675 | S: 1.380 | T: 18.464 || ETA: 2 days, 5:17:23 || timer: 1.295
[ 0] 170 || B: 4.777 | C: 6.357 | M: 5.683 | S: 1.426 | T: 18.243 || ETA: 2 days, 5:06:25 || timer: 1.351
[ 0] 180 || B: 4.751 | C: 6.135 | M: 5.675 | S: 1.443 | T: 18.004 || ETA: 2 days, 4:56:50 || timer: 1.376
[ 0] 190 || B: 4.719 | C: 5.831 | M: 5.670 | S: 1.464 | T: 17.685 || ETA: 2 days, 4:45:15 || timer: 1.424
[ 0] 200 || B: 4.686 | C: 5.522 | M: 5.665 | S: 1.472 | T: 17.344 || ETA: 2 days, 4:35:31 || timer: 1.332
[ 0] 210 || B: 4.626 | C: 5.213 | M: 5.602 | S: 1.490 | T: 16.931 || ETA: 2 days, 4:23:36 || timer: 1.378
[ 0] 220 || B: 4.585 | C: 4.925 | M: 5.590 | S: 1.497 | T: 16.596 || ETA: 2 days, 4:14:22 || timer: 1.255
[ 0] 230 || B: 4.541 | C: 4.601 | M: 5.564 | S: 1.495 | T: 16.201 || ETA: 2 days, 4:06:46 || timer: 1.342
[ 0] 240 || B: 4.495 | C: 4.287 | M: 5.537 | S: 1.482 | T: 15.801 || ETA: 2 days, 4:00:15 || timer: 1.340
[ 0] 250 || B: 4.438 | C: 3.959 | M: 5.497 | S: 1.470 | T: 15.363 || ETA: 2 days, 3:57:12 || timer: 1.392
[ 0] 260 || B: 4.336 | C: 3.725 | M: 5.446 | S: 1.362 | T: 14.869 || ETA: 2 days, 3:55:16 || timer: 1.379
[ 0] 270 || B: 4.262 | C: 3.671 | M: 5.429 | S: 1.289 | T: 14.651 || ETA: 2 days, 3:55:31 || timer: 1.335
[ 0] 280 || B: 4.202 | C: 3.540 | M: 5.406 | S: 1.227 | T: 14.376 || ETA: 2 days, 3:55:46 || timer: 1.388
[ 0] 290 || B: 4.146 | C: 3.508 | M: 5.366 | S: 1.182 | T: 14.202 || ETA: 2 days, 3:56:11 || timer: 1.414
[ 0] 300 || B: 4.077 | C: 3.488 | M: 5.335 | S: 1.163 | T: 14.064 || ETA: 2 days, 3:56:16 || timer: 1.305
[ 0] 310 || B: 4.041 | C: 3.471 | M: 5.300 | S: 1.128 | T: 13.941 || ETA: 2 days, 3:54:41 || timer: 1.359
[ 0] 320 || B: 3.993 | C: 3.434 | M: 5.265 | S: 1.108 | T: 13.800 || ETA: 2 days, 3:58:09 || timer: 1.414
[ 0] 330 || B: 3.954 | C: 3.414 | M: 5.219 | S: 1.090 | T: 13.677 || ETA: 2 days, 3:56:59 || timer: 1.367
[ 0] 340 || B: 3.905 | C: 3.382 | M: 5.185 | S: 1.068 | T: 13.539 || ETA: 2 days, 3:57:44 || timer: 1.356
[ 0] 350 || B: 3.864 | C: 3.368 | M: 5.148 | S: 1.049 | T: 13.429 || ETA: 2 days, 3:59:08 || timer: 1.373
[ 0] 360 || B: 3.813 | C: 3.346 | M: 5.110 | S: 1.033 | T: 13.301 || ETA: 2 days, 4:00:49 || timer: 1.551
[ 0] 370 || B: 3.761 | C: 3.319 | M: 5.043 | S: 1.021 | T: 13.144 || ETA: 2 days, 4:00:06 || timer: 1.455
[ 0] 380 || B: 3.714 | C: 3.303 | M: 4.989 | S: 1.041 | T: 13.046 || ETA: 2 days, 4:01:50 || timer: 1.557
[ 0] 390 || B: 3.668 | C: 3.276 | M: 4.951 | S: 1.041 | T: 12.937 || ETA: 2 days, 4:02:18 || timer: 1.471
[ 0] 400 || B: 3.623 | C: 3.259 | M: 4.896 | S: 1.036 | T: 12.814 || ETA: 2 days, 4:02:53 || timer: 1.268
[ 0] 410 || B: 3.585 | C: 3.237 | M: 4.870 | S: 1.049 | T: 12.740 || ETA: 2 days, 4:06:38 || timer: 1.549
[ 0] 420 || B: 3.557 | C: 3.227 | M: 4.834 | S: 1.037 | T: 12.655 || ETA: 2 days, 4:07:38 || timer: 1.446
[ 0] 430 || B: 3.532 | C: 3.208 | M: 4.803 | S: 1.043 | T: 12.585 || ETA: 2 days, 4:09:50 || timer: 1.346
[ 0] 440 || B: 3.487 | C: 3.209 | M: 4.752 | S: 1.049 | T: 12.497 || ETA: 2 days, 4:11:13 || timer: 1.388
[ 0] 450 || B: 3.456 | C: 3.168 | M: 4.720 | S: 1.048 | T: 12.392 || ETA: 2 days, 4:13:00 || timer: 1.542
[ 0] 460 || B: 3.445 | C: 3.152 | M: 4.691 | S: 1.045 | T: 12.333 || ETA: 2 days, 4:16:13 || timer: 1.491
[ 0] 470 || B: 3.441 | C: 3.141 | M: 4.694 | S: 1.048 | T: 12.324 || ETA: 2 days, 4:19:39 || timer: 1.392
[ 0] 480 || B: 3.435 | C: 3.107 | M: 4.684 | S: 1.020 | T: 12.246 || ETA: 2 days, 4:19:17 || timer: 1.375
[ 0] 490 || B: 3.422 | C: 3.076 | M: 4.671 | S: 1.005 | T: 12.174 || ETA: 2 days, 4:23:11 || timer: 1.472
[ 0] 500 || B: 3.438 | C: 3.037 | M: 4.675 | S: 0.982 | T: 12.132 || ETA: 2 days, 4:23:34 || timer: 1.506
[ 0] 510 || B: 3.458 | C: 2.993 | M: 4.682 | S: 0.949 | T: 12.082 || ETA: 2 days, 4:26:35 || timer: 1.749
[ 0] 520 || B: 3.451 | C: 2.964 | M: 4.671 | S: 0.945 | T: 12.031 || ETA: 2 days, 4:31:22 || timer: 1.364
[ 0] 530 || B: 3.443 | C: 2.929 | M: 4.674 | S: 0.926 | T: 11.972 || ETA: 2 days, 4:33:06 || timer: 1.256
[ 0] 540 || B: 3.446 | C: 2.875 | M: 4.671 | S: 0.914 | T: 11.905 || ETA: 2 days, 4:39:56 || timer: 1.655
[ 0] 550 || B: 3.455 | C: 2.862 | M: 4.662 | S: 0.905 | T: 11.884 || ETA: 2 days, 4:40:43 || timer: 1.515
[ 0] 560 || B: 3.455 | C: 2.826 | M: 4.665 | S: 0.892 | T: 11.838 || ETA: 2 days, 4:36:31 || timer: 1.381
[ 0] 570 || B: 3.429 | C: 2.782 | M: 4.641 | S: 0.891 | T: 11.744 || ETA: 2 days, 4:31:51 || timer: 1.295
[ 0] 580 || B: 3.408 | C: 2.759 | M: 4.620 | S: 0.901 | T: 11.687 || ETA: 2 days, 4:28:40 || timer: 1.315
[ 0] 590 || B: 3.392 | C: 2.738 | M: 4.596 | S: 0.900 | T: 11.625 || ETA: 2 days, 4:24:53 || timer: 1.347
[ 0] 600 || B: 3.372 | C: 2.695 | M: 4.570 | S: 0.898 | T: 11.535 || ETA: 2 days, 4:22:26 || timer: 1.512
[ 0] 610 || B: 3.331 | C: 2.675 | M: 4.532 | S: 0.918 | T: 11.456 || ETA: 2 days, 4:18:59 || timer: 1.334
[ 0] 620 || B: 3.325 | C: 2.630 | M: 4.528 | S: 0.902 | T: 11.385 || ETA: 2 days, 4:16:14 || timer: 1.257
[ 0] 630 || B: 3.308 | C: 2.604 | M: 4.506 | S: 0.904 | T: 11.322 || ETA: 2 days, 4:12:57 || timer: 1.271
[ 0] 640 || B: 3.301 | C: 2.592 | M: 4.493 | S: 0.895 | T: 11.281 || ETA: 2 days, 4:10:08 || timer: 1.253
[ 0] 650 || B: 3.263 | C: 2.562 | M: 4.475 | S: 0.897 | T: 11.197 || ETA: 2 days, 4:07:18 || timer: 1.433
[ 0] 660 || B: 3.239 | C: 2.530 | M: 4.448 | S: 0.900 | T: 11.117 || ETA: 2 days, 4:03:40 || timer: 1.329
[ 0] 670 || B: 3.224 | C: 2.508 | M: 4.424 | S: 0.896 | T: 11.052 || ETA: 2 days, 4:00:19 || timer: 1.296
[ 0] 680 || B: 3.213 | C: 2.466 | M: 4.425 | S: 0.886 | T: 10.990 || ETA: 2 days, 3:57:02 || timer: 1.305
[ 0] 690 || B: 3.204 | C: 2.435 | M: 4.435 | S: 0.897 | T: 10.971 || ETA: 2 days, 3:55:27 || timer: 1.419
[ 0] 700 || B: 3.194 | C: 2.420 | M: 4.433 | S: 0.907 | T: 10.953 || ETA: 2 days, 3:52:37 || timer: 1.215
[ 0] 710 || B: 3.173 | C: 2.398 | M: 4.408 | S: 0.894 | T: 10.873 || ETA: 2 days, 3:49:41 || timer: 1.277
[ 0] 720 || B: 3.156 | C: 2.376 | M: 4.385 | S: 0.906 | T: 10.823 || ETA: 2 days, 3:46:22 || timer: 1.256
[ 0] 730 || B: 3.151 | C: 2.354 | M: 4.386 | S: 0.905 | T: 10.796 || ETA: 2 days, 3:44:22 || timer: 1.408
[ 0] 740 || B: 3.120 | C: 2.325 | M: 4.367 | S: 0.911 | T: 10.724 || ETA: 2 days, 3:40:56 || timer: 1.381
[ 0] 750 || B: 3.125 | C: 2.320 | M: 4.361 | S: 0.903 | T: 10.709 || ETA: 2 days, 3:38:30 || timer: 1.318
[ 0] 760 || B: 3.121 | C: 2.325 | M: 4.351 | S: 0.895 | T: 10.691 || ETA: 2 days, 3:35:30 || timer: 1.446
[ 0] 770 || B: 3.107 | C: 2.314 | M: 4.341 | S: 0.902 | T: 10.664 || ETA: 2 days, 3:32:20 || timer: 1.300
[ 0] 780 || B: 3.120 | C: 2.299 | M: 4.335 | S: 0.901 | T: 10.655 || ETA: 2 days, 3:37:55 || timer: 1.267
[ 0] 790 || B: 3.121 | C: 2.290 | M: 4.304 | S: 0.891 | T: 10.605 || ETA: 2 days, 3:34:54 || timer: 1.360
[ 0] 800 || B: 3.107 | C: 2.272 | M: 4.285 | S: 0.881 | T: 10.545 || ETA: 2 days, 3:32:41 || timer: 1.339
[ 0] 810 || B: 3.113 | C: 2.238 | M: 4.282 | S: 0.865 | T: 10.498 || ETA: 2 days, 3:30:29 || timer: 1.484
[ 0] 820 || B: 3.109 | C: 2.226 | M: 4.280 | S: 0.852 | T: 10.467 || ETA: 2 days, 3:28:36 || timer: 1.240
[ 0] 830 || B: 3.088 | C: 2.207 | M: 4.251 | S: 0.845 | T: 10.391 || ETA: 2 days, 3:26:06 || timer: 1.233
[ 0] 840 || B: 3.120 | C: 2.199 | M: 4.274 | S: 0.831 | T: 10.424 || ETA: 2 days, 3:26:28 || timer: 1.393
[ 0] 850 || B: 3.121 | C: 2.175 | M: 4.278 | S: 0.820 | T: 10.393 || ETA: 2 days, 3:24:07 || timer: 1.269
[ 0] 860 || B: 3.111 | C: 2.154 | M: 4.286 | S: 0.814 | T: 10.365 || ETA: 2 days, 3:22:05 || timer: 1.379
[ 0] 870 || B: 3.116 | C: 2.136 | M: 4.271 | S: 0.800 | T: 10.323 || ETA: 2 days, 3:19:34 || timer: 1.292
[ 0] 880 || B: 3.099 | C: 2.127 | M: 4.262 | S: 0.799 | T: 10.286 || ETA: 2 days, 3:18:17 || timer: 1.415
[ 0] 890 || B: 3.094 | C: 2.091 | M: 4.271 | S: 0.786 | T: 10.243 || ETA: 2 days, 3:16:26 || timer: 1.344
[ 0] 900 || B: 3.105 | C: 2.076 | M: 4.293 | S: 0.784 | T: 10.258 || ETA: 2 days, 3:14:02 || timer: 1.334
[ 0] 910 || B: 3.115 | C: 2.079 | M: 4.315 | S: 0.792 | T: 10.301 || ETA: 2 days, 3:13:12 || timer: 1.337
[ 0] 920 || B: 3.115 | C: 2.063 | M: 4.317 | S: 0.795 | T: 10.289 || ETA: 2 days, 3:10:30 || timer: 1.264
[ 0] 930 || B: 3.128 | C: 2.048 | M: 4.326 | S: 0.795 | T: 10.297 || ETA: 2 days, 3:09:32 || timer: 1.291
[ 0] 940 || B: 3.120 | C: 2.028 | M: 4.315 | S: 0.795 | T: 10.258 || ETA: 2 days, 3:07:51 || timer: 1.288
[ 0] 950 || B: 3.128 | C: 2.016 | M: 4.319 | S: 0.801 | T: 10.264 || ETA: 2 days, 3:06:38 || timer: 1.324
[ 0] 960 || B: 3.130 | C: 2.004 | M: 4.317 | S: 0.810 | T: 10.260 || ETA: 2 days, 3:04:57 || timer: 1.511
[ 0] 970 || B: 3.130 | C: 1.978 | M: 4.339 | S: 0.808 | T: 10.254 || ETA: 2 days, 3:03:12 || timer: 1.327
[ 0] 980 || B: 3.141 | C: 1.984 | M: 4.351 | S: 0.806 | T: 10.282 || ETA: 2 days, 3:01:15 || timer: 1.371
[ 0] 990 || B: 3.145 | C: 1.993 | M: 4.348 | S: 0.800 | T: 10.286 || ETA: 2 days, 2:59:48 || timer: 1.461
[ 0] 1000 || B: 3.143 | C: 1.992 | M: 4.314 | S: 0.796 | T: 10.244 || ETA: 2 days, 2:22:11 || timer: 1.217
[ 0] 1010 || B: 3.127 | C: 1.986 | M: 4.283 | S: 0.795 | T: 10.191 || ETA: 2 days, 2:21:44 || timer: 1.271
[ 0] 1020 || B: 3.134 | C: 1.978 | M: 4.283 | S: 0.796 | T: 10.191 || ETA: 2 days, 2:22:32 || timer: 1.373
[ 0] 1030 || B: 3.136 | C: 1.977 | M: 4.279 | S: 0.801 | T: 10.193 || ETA: 2 days, 2:21:54 || timer: 1.275
[ 0] 1040 || B: 3.126 | C: 1.969 | M: 4.277 | S: 0.802 | T: 10.174 || ETA: 2 days, 2:21:43 || timer: 1.386
[ 0] 1050 || B: 3.113 | C: 1.950 | M: 4.271 | S: 0.798 | T: 10.132 || ETA: 2 days, 2:22:42 || timer: 1.387
[ 0] 1060 || B: 3.359 | C: 4.154 | M: 4.322 | S: 0.809 | T: 12.644 || ETA: 2 days, 2:22:57 || timer: 1.438
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
[ 0] 1070 || B: 451167146140854940336128.000 | C: 29900119943204984324096.000 | M: 7.313 | S: 596456.118 | T: 481067266084059891105792.000 || ETA: 2 days, 2:21:36 || timer: 1.320
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
Warning: Moving average ignored a value of inf
Warning: Moving average ignored a value of nan
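
The `Moving average ignored a value of inf` / `nan` warnings suggest that the logger skips non-finite losses when averaging, which would explain why the printed values stay finite even after training has diverged. Below is a minimal sketch of that behaviour, assuming a fixed-size window. The class name `WindowedAverage` and its parameters are hypothetical; this is not YOLACT's actual implementation.

```python
import math
from collections import deque


class WindowedAverage:
    """Average of the last `window` finite values (hypothetical helper)."""

    def __init__(self, window=100):
        self.values = deque(maxlen=window)
        self.ignored = 0  # count of non-finite values that were skipped

    def add(self, value):
        # Skip inf and nan so one bad step cannot poison the average.
        if not math.isfinite(value):
            self.ignored += 1
            print(f"Warning: Moving average ignored a value of {value}")
            return
        self.values.append(value)

    def average(self):
        # Returns 0.0 when nothing has been recorded yet.
        return sum(self.values) / len(self.values) if self.values else 0.0


avg = WindowedAverage(window=3)
for loss in [10.0, float("inf"), 11.0, float("nan"), 12.0]:
    avg.add(loss)
```

With a filter like this, very large but still finite losses would enter the window, which may be why the averages at the end of each log are enormous rather than `inf`.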

@IQ17
Author

IQ17 commented Feb 6, 2020

Hi, training with batch_size=16 seems OK.

CUDA_VISIBLE_DEVICES=5,6 python train.py --config=yolact_resnet50_config --batch_size=16

@eslambakr

Lowering the learning rate solved the issue for me. I hope this helps.

@dbolya
Owner

dbolya commented Feb 6, 2020

So it does look like the issue was with the learning rate. If you still want to train with 4 GPUs, I'd suggest something like a batch size of 32 across the 4 GPUs.
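
To see why a smaller batch size lowers the learning rate, here is a rough sketch of the linear scaling implied by the `Scaling parameters by ...` log lines (96 gives 12.00 and 48 gives 6.00, so the reference batch size appears to be 8). It assumes the learning rate is one of the scaled parameters; the base learning rate of `1e-3` and the function names are assumptions for illustration, not values confirmed in this thread.

```python
BASE_BATCH_SIZE = 8   # inferred from the logs: 96 -> 12.00, 48 -> 6.00
BASE_LR = 1e-3        # assumed base learning rate, for illustration only


def scale_factor(batch_size, base_batch_size=BASE_BATCH_SIZE):
    """Linear scaling factor implied by the 'Scaling parameters by' log line."""
    return batch_size / base_batch_size


def scaled_lr(batch_size, base_lr=BASE_LR):
    """Learning rate after linear scaling with batch size."""
    return base_lr * scale_factor(batch_size)


# Compare the batch sizes mentioned in this thread.
for bs in (16, 32, 48, 96):
    print(f"batch_size={bs:3d}  factor={scale_factor(bs):5.2f}  lr={scaled_lr(bs):.4f}")
```

Under these assumptions, batch size 32 would use a third of the learning rate that batch size 96 does, regardless of how many GPUs the batch is spread across.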

@IQ17
Author

IQ17 commented Feb 7, 2020

Thanks @eslambakr @dbolya !

Yes, the learning rate looks a bit too high, so I will monitor it when training with a large batch size.
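
One simple way to monitor this is to flag a run as soon as the logged total loss becomes non-finite or jumps sharply between log lines, as it did just before the blow-up above. This is only a sketch: `is_diverging` and the `max_ratio` threshold of 1.25 are hypothetical and untuned, and this is not part of YOLACT's `train.py`.

```python
import math


def is_diverging(prev_total, new_total, max_ratio=1.25):
    """Return True if the new logged loss is non-finite or jumped sharply.

    `max_ratio` is an arbitrary illustrative threshold, not a tuned value.
    """
    if not math.isfinite(new_total):
        return True
    return new_total > prev_total * max_ratio


# Total loss (T) values from the batch_size=96 log, iterations 2250 to 2280.
totals = [9.433, 9.379, 12.301, 884504537033494675587072.000]
flags = [is_diverging(a, b) for a, b in zip(totals, totals[1:])]
print(flags)
```

On these values the check first fires at iteration 2270, one log line before the loss explodes, which would be a reasonable point to stop and restart with a lower learning rate.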
