
I have some problems when training model by myself #4

Closed
gtdong-ustc opened this issue Mar 5, 2019 · 5 comments

Comments

@gtdong-ustc

Thank you for sharing the code. While reading Train.py, I found a few things that confused me:

  1. I don't understand the purpose of the "-20" and "20" factors in these lines of code:

    op = tf.image.resize_images(tf.nn.relu(op * -20), [self._left_input_batch.get_shape()[1].value, self._left_input_batch.get_shape()[2].value]) (line 70 of Train.py)

    u5 = tf.image.resize_images(V6, [image_height // scales[5], image_width // scales[5]]) * 20. / scales[5] (line 282 of Train.py)

    rescaled_prediction = tf.image.resize_images(self._get_layer_as_input('final_disp'), [image_height, image_width]) * -20. (line 371 of Train.py)

  2. When I trained on the Monkaa dataset, the loss kept jumping back and forth and the model did not converge. After many failed attempts, I found a line in the network-construction code that confused me:

rescaled_prediction = tf.image.resize_images(self._get_layer_as_input('final_disp'), [image_height, image_width]) * -20. (line 371 of Train.py)

As I understand it, this line resizes the final disparity map to the size of the ground truth, and the resized map is then used to compute the loss. I think it would be easier to converge by reducing the size of the ground truth when computing the loss. I would appreciate it if you could answer my questions. Thank you very much.

@AlessioTonioni
Member

The values predicted by the network should be multiplied by -20 to get the real disparity, since the network was trained with the ground-truth disparity scaled by -20.
We predict negative disparity following the guidelines of DispNet (https://github.com/lmb-freiburg/dispnet-flownet-docker) and divide the GT disparity by 20 following the guidelines of PWC-Net for optical flow (https://github.com/NVlabs/PWC-Net/tree/master/PyTorch).

I think it would be easier to converge by reducing the size of the ground truth when computing the loss

In our tests, this strategy performs worse than upscaling the predictions to full resolution and computing the loss there. In the end, what we want at test time is a good full-resolution disparity, so that is what we optimize during training.
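The scaling convention described above can be checked numerically. Below is a minimal NumPy sketch (not the project's actual TensorFlow code; the disparity values are made up for illustration) showing how a ground truth scaled by -20 for training is recovered from the network output:

```python
import numpy as np

# Hypothetical ground-truth disparities (positive values, in pixels).
gt_disparity = np.array([40.0, 60.0, 80.0])

# During training the GT is divided by -20, so the network learns to
# output small negative values (DispNet/PWC-Net-style normalization).
network_target = gt_disparity / -20.0

# At prediction time the raw output is multiplied by -20 to recover
# the real (positive) disparity, as in line 371 of Train.py.
recovered = network_target * -20.0

assert np.allclose(recovered, gt_disparity)
```

The resize step omitted here simply brings the low-resolution prediction up to the ground-truth resolution before this multiplication is applied.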

@gtdong-ustc
Author

gtdong-ustc commented Mar 6, 2019

Thank you for your answer; it's very helpful. But I would like to ask whether you have trained MADNet on the Monkaa dataset.
I think MADNet has an excellent structure, but during training it always fails to converge, which confuses me. I use sum_l1 as the loss function, a learning rate of 0.0001, and an image size of 540x960.
By the way, I tried two sets of loss weights, "0.1 0 0 0 0 0" and "0.001 0.005 0.01 0.02 0.08 0.32", and used about 9000 image pairs from the Monkaa dataset to train MADNet. Both the batch size and the number of epochs are the defaults (4 and 50).
Do you see anything wrong with the above? I would appreciate it if you could answer my questions. Thank you very much.
Finally, I would like to ask what crop size you used when training the initial model on FlyingThings3D, and whether [320, 1216] is the crop size used when fine-tuning on KITTI. Thank you.
I am not fine-tuning the model you provided; the loss values during training are shown below.

Step:   0 Loss:82149176.00 f/b time:1.103076 Missing time:2:59:10.579628
Step: 100 Loss:101245104.00 f/b time:1.640108 Missing time:4:23:40.481182
Step: 200 Loss:79647536.00 f/b time:0.739764 Missing time:1:57:41.785609
Step: 300 Loss:44172792.00 f/b time:0.722132 Missing time:1:53:41.258160
Step: 400 Loss:122596168.00 f/b time:0.740674 Missing time:1:55:22.336617
Step: 500 Loss:92552768.00 f/b time:0.726324 Missing time:1:51:55.590848
Step: 600 Loss:48044144.00 f/b time:0.760257 Missing time:1:55:53.312820
Step: 700 Loss:60508004.00 f/b time:0.735130 Missing time:1:50:49.989108
Step: 800 Loss:51625608.00 f/b time:0.716060 Missing time:1:46:45.875763
Step: 900 Loss:30678898.00 f/b time:0.743405 Missing time:1:49:36.162201
Step:1000 Loss:118136176.00 f/b time:0.678501 Missing time:1:38:54.170375
Step:1100 Loss:37568808.00 f/b time:0.683084 Missing time:1:38:25.945040
Step:1200 Loss:20956792.00 f/b time:0.678761 Missing time:1:36:40.688394
Step:1300 Loss:103458920.00 f/b time:0.678957 Missing time:1:35:34.473881
Step:1400 Loss:80669552.00 f/b time:0.680076 Missing time:1:34:35.913426
Step:1500 Loss:109143920.00 f/b time:0.670911 Missing time:1:32:12.331455
Step:1600 Loss:44451808.00 f/b time:0.696983 Missing time:1:34:37.623683
Step:1700 Loss:48482608.00 f/b time:0.662907 Missing time:1:28:53.748088
Step:1800 Loss:96827880.00 f/b time:0.689216 Missing time:1:31:16.509340
Step:1900 Loss:73230400.00 f/b time:0.667668 Missing time:1:27:18.523702
Step:2000 Loss:20895572.00 f/b time:0.609990 Missing time:1:18:44.985464
Step:2100 Loss:31481928.00 f/b time:0.562105 Missing time:1:11:37.858075
Step:2200 Loss:106946216.00 f/b time:0.569965 Missing time:1:11:40.959237
Step:2300 Loss:30177622.00 f/b time:0.570952 Missing time:1:10:51.308490
Step:2400 Loss:45409716.00 f/b time:0.585991 Missing time:1:11:44.686619
Step:2500 Loss:28416378.00 f/b time:0.563872 Missing time:1:08:05.814948
Step:2600 Loss:73035240.00 f/b time:0.564972 Missing time:1:07:17.290117
Step:2700 Loss:12155137.00 f/b time:0.575955 Missing time:1:07:38.177922
Step:2800 Loss:86098592.00 f/b time:0.581054 Missing time:1:07:16.001918
Step:2900 Loss:59940872.00 f/b time:0.581937 Missing time:1:06:23.940038
Step:3000 Loss:106753304.00 f/b time:0.569091 Missing time:1:03:59.085107
Step:3100 Loss:41178208.00 f/b time:0.570131 Missing time:1:03:09.090271
Step:3200 Loss:57402672.00 f/b time:0.561907 Missing time:1:01:18.241368
Step:3300 Loss:21157860.00 f/b time:0.574063 Missing time:1:01:40.409518
Step:3400 Loss:59461328.00 f/b time:0.576110 Missing time:1:00:55.996182
Step:3500 Loss:43945436.00 f/b time:0.579927 Missing time:1:00:22.223140
Step:3600 Loss:26998368.00 f/b time:0.571051 Missing time:0:58:29.678254
Step:3700 Loss:37470368.00 f/b time:0.564951 Missing time:0:56:55.694555
Step:3800 Loss:19729680.00 f/b time:0.563200 Missing time:0:55:48.789762
Step:3900 Loss:32875782.00 f/b time:0.560864 Missing time:0:54:38.810765
Step:4000 Loss:33607632.00 f/b time:0.562201 Missing time:0:53:50.407627
Step:4100 Loss:76776664.00 f/b time:0.571852 Missing time:0:53:48.674331
Step:4200 Loss:30695420.00 f/b time:0.570076 Missing time:0:52:41.644078
Step:4300 Loss:17730886.00 f/b time:0.558033 Missing time:0:50:39.045077
Step:4400 Loss:78198920.00 f/b time:0.561799 Missing time:0:50:03.375462
Step:4500 Loss:42095736.00 f/b time:0.561092 Missing time:0:49:03.489270
Step:4600 Loss:49046488.00 f/b time:0.577978 Missing time:0:49:34.273417
Step:4700 Loss:21321726.00 f/b time:0.585298 Missing time:0:49:13.412827
Step:4800 Loss:55373944.00 f/b time:0.564014 Missing time:0:46:29.611633
Step:4900 Loss:45453356.00 f/b time:0.554697 Missing time:0:44:48.062718
Step:5000 Loss:21018192.00 f/b time:0.566125 Missing time:0:44:46.831454
Step:5100 Loss:47506576.00 f/b time:0.580741 Missing time:0:44:58.120560
Step:5200 Loss:53374140.00 f/b time:0.573076 Missing time:0:43:25.204342
Step:5300 Loss:41059704.00 f/b time:0.578096 Missing time:0:42:50.214196
Step:5400 Loss:49286628.00 f/b time:0.604055 Missing time:0:43:45.223680
Step:5500 Loss:28696432.00 f/b time:0.588838 Missing time:0:41:40.206109
Step:5600 Loss:30107682.00 f/b time:0.614938 Missing time:0:42:29.534556
Step:5700 Loss:38352700.00 f/b time:0.617028 Missing time:0:41:36.495039
Step:5800 Loss:44413684.00 f/b time:0.620078 Missing time:0:40:46.827566
Step:5900 Loss:44799188.00 f/b time:0.617840 Missing time:0:39:36.212774
Step:6000 Loss:40783128.00 f/b time:0.583038 Missing time:0:36:24.061174
Step:6100 Loss:34486452.00 f/b time:0.584891 Missing time:0:35:32.513941
Step:6200 Loss:16102487.00 f/b time:0.571028 Missing time:0:33:44.865876
Step:6300 Loss:24807738.00 f/b time:0.578875 Missing time:0:33:14.802997
Step:6400 Loss:70252552.00 f/b time:0.561124 Missing time:0:31:17.521049
Step:6500 Loss:20925732.00 f/b time:0.557837 Missing time:0:30:10.737725
Step:6600 Loss:25232166.00 f/b time:0.564172 Missing time:0:29:34.885767
Step:6700 Loss:32423636.00 f/b time:0.568004 Missing time:0:28:50.140628
Step:6800 Loss:72949928.00 f/b time:0.585063 Missing time:0:28:43.596300
Step:6900 Loss:35100400.00 f/b time:0.576863 Missing time:0:27:21.751950
Step:7000 Loss:27057478.00 f/b time:0.574164 Missing time:0:26:16.654683
Step:7100 Loss:20094376.00 f/b time:0.580144 Missing time:0:25:35.060633
Step:7200 Loss:26188044.00 f/b time:0.574219 Missing time:0:24:21.960622
Step:7300 Loss:41249288.00 f/b time:0.574885 Missing time:0:23:26.168031
Step:7400 Loss:13672790.00 f/b time:0.573155 Missing time:0:22:24.621211
Step:7500 Loss:28645250.00 f/b time:0.573014 Missing time:0:21:26.989368
Step:7600 Loss:34009620.00 f/b time:0.563103 Missing time:0:20:08.418938
Step:7700 Loss:34804764.00 f/b time:0.577980 Missing time:0:19:42.546784
Step:7800 Loss:17961864.00 f/b time:0.578041 Missing time:0:18:44.867736
Step:7900 Loss:26369896.00 f/b time:0.581047 Missing time:0:17:52.612583
Step:8000 Loss:51959988.00 f/b time:0.579080 Missing time:0:16:51.074404
Step:8100 Loss:54916152.00 f/b time:0.573742 Missing time:0:15:44.379448
Step:8200 Loss:35827344.00 f/b time:0.604047 Missing time:0:15:33.856335
Step:8300 Loss:12400078.00 f/b time:0.585920 Missing time:0:14:07.240551
Step:8400 Loss:22363470.00 f/b time:0.586063 Missing time:0:13:08.840475
Step:8500 Loss:20961122.00 f/b time:0.569973 Missing time:0:11:50.186072
Step:8600 Loss:22290648.00 f/b time:0.576955 Missing time:0:11:01.190844
Step:8700 Loss:16602785.00 f/b time:0.571057 Missing time:0:09:57.325380
Step:8800 Loss:11309609.00 f/b time:0.578001 Missing time:0:09:06.788764
Step:8900 Loss:18950684.00 f/b time:0.586943 Missing time:0:08:16.553779
Step:9000 Loss:12000423.00 f/b time:0.567151 Missing time:0:07:03.094774
Step:9100 Loss:37663100.00 f/b time:0.583944 Missing time:0:06:17.227828
Step:9200 Loss:30709348.00 f/b time:0.571136 Missing time:0:05:11.840073

@gtdong-ustc
Author

I'm sorry, I made a mistake. First, the loss weights mentioned in the paper should be 0, 0.005, 0.01, 0.02, 0.08, 0.32. Second, the loss function used in training should be sum_L2. I will try training again after changing these parameters.

@AlessioTonioni
Member

AlessioTonioni commented Mar 6, 2019

But I would like to ask if you have trained MADNet on the Monkaa dataset.

No, never tried. The base model was trained on FlyingThings3D.

Finally, I would like ask you the size of crop patch when training initial model on the FlyingThings3D

384 x 768

the loss function used in training should be sum_L2

No, actually we used sum_l1.
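To make the "weighted multi-scale sum_l1" idea concrete, here is a short NumPy sketch (a hypothetical `multiscale_l1_loss` helper, not the code from Train.py; in the real code each scale's prediction is first resized to full resolution before the comparison):

```python
import numpy as np

def multiscale_l1_loss(predictions, ground_truth, weights):
    """Weighted sum of per-scale mean-L1 losses.

    A sketch of the sum_l1 idea: each scale's prediction is compared
    against the ground truth and the per-scale losses are combined
    with the given weights. Here the predictions are assumed to be
    already resized to the ground-truth resolution.
    """
    total = 0.0
    for pred, w in zip(predictions, weights):
        total += w * np.mean(np.abs(pred - ground_truth))
    return total

gt = np.ones((4, 4))
preds = [np.ones((4, 4)) * 2.0] * 6          # six scales, each off by 1 pixel
weights = [0.0, 0.005, 0.01, 0.02, 0.08, 0.32]  # weights discussed above
loss = multiscale_l1_loss(preds, gt, weights)
```

With every scale off by exactly 1, the loss reduces to the sum of the weights (0.435 here), which makes it easy to sanity-check the weighting.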

@gtdong-ustc
Author

Thank you for your answer. I will check how the experimental results change with these parameters.
