
I have some problems when training model by myself #4

Closed
gtdong-ustc opened this issue Mar 5, 2019 · 5 comments

Comments

@gtdong-ustc

Thank you for sharing the code. While reading Train.py, I found a few things that confused me:

  1. I don't understand the purpose of the "-20" and "20" factors in these lines of code:

    op = tf.image.resize_images(tf.nn.relu(op * -20), [self._left_input_batch.get_shape()[1].value, self._left_input_batch.get_shape()[2].value]) (line 70 of Train.py)

    u5 = tf.image.resize_images(V6, [image_height // scales[5], image_width // scales[5]]) * 20. / scales[5] (line 282 of Train.py)

    rescaled_prediction = tf.image.resize_images(self._get_layer_as_input('final_disp'), [image_height, image_width]) * -20. (line 371 of Train.py)

  2. When I trained on the Monkaa dataset, the loss kept jumping back and forth and the model did not converge. After many failed attempts, I found a line in the network-construction code that confused me:

rescaled_prediction = tf.image.resize_images(self._get_layer_as_input('final_disp'), [image_height, image_width]) * -20. (line 371 of Train.py)

As I understand it, this line resizes the final disparity map to the size of the ground truth, and the resized map is then used to compute the loss. I think it would be easier to converge by reducing the size of the ground truth when computing the loss. I would appreciate it if you could answer my questions. Thank you very much.

@AlessioTonioni
Member

The values predicted by the network should be multiplied by -20 to get the real disparity, since the network was trained with the ground-truth disparity scaled by -20.
We predict negative disparity following the guidelines of DispNet (https://github.com/lmb-freiburg/dispnet-flownet-docker) and divide the GT disparity by 20 following the guidelines of PWC-Net for optical flow (https://github.com/NVlabs/PWC-Net/tree/master/PyTorch).

I think it would be easier to converge by reducing the size of the ground truth when computing the loss

In our tests, this strategy performs worse than upscaling the predictions to full resolution and computing the loss there. In the end, what we want at test time is a good full-resolution disparity, so that is what we optimize during training.
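The scaling convention described above can be checked numerically. Below is a minimal NumPy sketch (not the project's actual TensorFlow code; the disparity values are made up for illustration) showing how a ground truth scaled by -20 for training is recovered from the network output:

```python
import numpy as np

# Hypothetical ground-truth disparities (positive values, in pixels).
gt_disparity = np.array([40.0, 60.0, 80.0])

# During training the GT is divided by -20, so the network learns to
# output small negative values (DispNet/PWC-Net-style normalization).
network_target = gt_disparity / -20.0

# At prediction time the raw output is multiplied by -20 to recover
# the real (positive) disparity, as in line 371 of Train.py.
recovered = network_target * -20.0

assert np.allclose(recovered, gt_disparity)
```

The resize step omitted here simply brings the low-resolution prediction up to the ground-truth resolution before this multiplication is applied.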

@gtdong-ustc
Author

gtdong-ustc commented Mar 6, 2019

Thank you for your answer; it's very helpful. But I would like to ask whether you have trained MADNet on the Monkaa dataset.
I think MADNet has an excellent structure, but during training it always fails to converge, which confuses me. I use sum_l1 as the loss function, a learning rate of 0.0001, and an image size of 540x960.
By the way, I tried two sets of loss weights, "0.1 0 0 0 0 0" and "0.001 0.005 0.01 0.02 0.08 0.32", and used about 9000 image pairs from the Monkaa dataset to train MADNet. Both the batch size and the number of epochs are the defaults (4 and 50).
Do you see anything wrong with the above? I would appreciate it if you could answer my questions. Thank you very much.
Finally, I would like to ask what crop size you used when training the initial model on FlyingThings3D, and whether [320, 1216] is the crop size used when fine-tuning on KITTI. Thank you.
I am not fine-tuning the model you provided; the loss values during training are shown below.

Step:   0 Loss:82149176.00 f/b time:1.103076 Missing time:2:59:10.579628
Step: 100 Loss:101245104.00 f/b time:1.640108 Missing time:4:23:40.481182
Step: 200 Loss:79647536.00 f/b time:0.739764 Missing time:1:57:41.785609
Step: 300 Loss:44172792.00 f/b time:0.722132 Missing time:1:53:41.258160
Step: 400 Loss:122596168.00 f/b time:0.740674 Missing time:1:55:22.336617
Step: 500 Loss:92552768.00 f/b time:0.726324 Missing time:1:51:55.590848
Step: 600 Loss:48044144.00 f/b time:0.760257 Missing time:1:55:53.312820
Step: 700 Loss:60508004.00 f/b time:0.735130 Missing time:1:50:49.989108
Step: 800 Loss:51625608.00 f/b time:0.716060 Missing time:1:46:45.875763
Step: 900 Loss:30678898.00 f/b time:0.743405 Missing time:1:49:36.162201
Step:1000 Loss:118136176.00 f/b time:0.678501 Missing time:1:38:54.170375
Step:1100 Loss:37568808.00 f/b time:0.683084 Missing time:1:38:25.945040
Step:1200 Loss:20956792.00 f/b time:0.678761 Missing time:1:36:40.688394
Step:1300 Loss:103458920.00 f/b time:0.678957 Missing time:1:35:34.473881
Step:1400 Loss:80669552.00 f/b time:0.680076 Missing time:1:34:35.913426
Step:1500 Loss:109143920.00 f/b time:0.670911 Missing time:1:32:12.331455
Step:1600 Loss:44451808.00 f/b time:0.696983 Missing time:1:34:37.623683
Step:1700 Loss:48482608.00 f/b time:0.662907 Missing time:1:28:53.748088
Step:1800 Loss:96827880.00 f/b time:0.689216 Missing time:1:31:16.509340
Step:1900 Loss:73230400.00 f/b time:0.667668 Missing time:1:27:18.523702
Step:2000 Loss:20895572.00 f/b time:0.609990 Missing time:1:18:44.985464
Step:2100 Loss:31481928.00 f/b time:0.562105 Missing time:1:11:37.858075
Step:2200 Loss:106946216.00 f/b time:0.569965 Missing time:1:11:40.959237
Step:2300 Loss:30177622.00 f/b time:0.570952 Missing time:1:10:51.308490
Step:2400 Loss:45409716.00 f/b time:0.585991 Missing time:1:11:44.686619
Step:2500 Loss:28416378.00 f/b time:0.563872 Missing time:1:08:05.814948
Step:2600 Loss:73035240.00 f/b time:0.564972 Missing time:1:07:17.290117
Step:2700 Loss:12155137.00 f/b time:0.575955 Missing time:1:07:38.177922
Step:2800 Loss:86098592.00 f/b time:0.581054 Missing time:1:07:16.001918
Step:2900 Loss:59940872.00 f/b time:0.581937 Missing time:1:06:23.940038
Step:3000 Loss:106753304.00 f/b time:0.569091 Missing time:1:03:59.085107
Step:3100 Loss:41178208.00 f/b time:0.570131 Missing time:1:03:09.090271
Step:3200 Loss:57402672.00 f/b time:0.561907 Missing time:1:01:18.241368
Step:3300 Loss:21157860.00 f/b time:0.574063 Missing time:1:01:40.409518
Step:3400 Loss:59461328.00 f/b time:0.576110 Missing time:1:00:55.996182
Step:3500 Loss:43945436.00 f/b time:0.579927 Missing time:1:00:22.223140
Step:3600 Loss:26998368.00 f/b time:0.571051 Missing time:0:58:29.678254
Step:3700 Loss:37470368.00 f/b time:0.564951 Missing time:0:56:55.694555
Step:3800 Loss:19729680.00 f/b time:0.563200 Missing time:0:55:48.789762
Step:3900 Loss:32875782.00 f/b time:0.560864 Missing time:0:54:38.810765
Step:4000 Loss:33607632.00 f/b time:0.562201 Missing time:0:53:50.407627
Step:4100 Loss:76776664.00 f/b time:0.571852 Missing time:0:53:48.674331
Step:4200 Loss:30695420.00 f/b time:0.570076 Missing time:0:52:41.644078
Step:4300 Loss:17730886.00 f/b time:0.558033 Missing time:0:50:39.045077
Step:4400 Loss:78198920.00 f/b time:0.561799 Missing time:0:50:03.375462
Step:4500 Loss:42095736.00 f/b time:0.561092 Missing time:0:49:03.489270
Step:4600 Loss:49046488.00 f/b time:0.577978 Missing time:0:49:34.273417
Step:4700 Loss:21321726.00 f/b time:0.585298 Missing time:0:49:13.412827
Step:4800 Loss:55373944.00 f/b time:0.564014 Missing time:0:46:29.611633
Step:4900 Loss:45453356.00 f/b time:0.554697 Missing time:0:44:48.062718
Step:5000 Loss:21018192.00 f/b time:0.566125 Missing time:0:44:46.831454
Step:5100 Loss:47506576.00 f/b time:0.580741 Missing time:0:44:58.120560
Step:5200 Loss:53374140.00 f/b time:0.573076 Missing time:0:43:25.204342
Step:5300 Loss:41059704.00 f/b time:0.578096 Missing time:0:42:50.214196
Step:5400 Loss:49286628.00 f/b time:0.604055 Missing time:0:43:45.223680
Step:5500 Loss:28696432.00 f/b time:0.588838 Missing time:0:41:40.206109
Step:5600 Loss:30107682.00 f/b time:0.614938 Missing time:0:42:29.534556
Step:5700 Loss:38352700.00 f/b time:0.617028 Missing time:0:41:36.495039
Step:5800 Loss:44413684.00 f/b time:0.620078 Missing time:0:40:46.827566
Step:5900 Loss:44799188.00 f/b time:0.617840 Missing time:0:39:36.212774
Step:6000 Loss:40783128.00 f/b time:0.583038 Missing time:0:36:24.061174
Step:6100 Loss:34486452.00 f/b time:0.584891 Missing time:0:35:32.513941
Step:6200 Loss:16102487.00 f/b time:0.571028 Missing time:0:33:44.865876
Step:6300 Loss:24807738.00 f/b time:0.578875 Missing time:0:33:14.802997
Step:6400 Loss:70252552.00 f/b time:0.561124 Missing time:0:31:17.521049
Step:6500 Loss:20925732.00 f/b time:0.557837 Missing time:0:30:10.737725
Step:6600 Loss:25232166.00 f/b time:0.564172 Missing time:0:29:34.885767
Step:6700 Loss:32423636.00 f/b time:0.568004 Missing time:0:28:50.140628
Step:6800 Loss:72949928.00 f/b time:0.585063 Missing time:0:28:43.596300
Step:6900 Loss:35100400.00 f/b time:0.576863 Missing time:0:27:21.751950
Step:7000 Loss:27057478.00 f/b time:0.574164 Missing time:0:26:16.654683
Step:7100 Loss:20094376.00 f/b time:0.580144 Missing time:0:25:35.060633
Step:7200 Loss:26188044.00 f/b time:0.574219 Missing time:0:24:21.960622
Step:7300 Loss:41249288.00 f/b time:0.574885 Missing time:0:23:26.168031
Step:7400 Loss:13672790.00 f/b time:0.573155 Missing time:0:22:24.621211
Step:7500 Loss:28645250.00 f/b time:0.573014 Missing time:0:21:26.989368
Step:7600 Loss:34009620.00 f/b time:0.563103 Missing time:0:20:08.418938
Step:7700 Loss:34804764.00 f/b time:0.577980 Missing time:0:19:42.546784
Step:7800 Loss:17961864.00 f/b time:0.578041 Missing time:0:18:44.867736
Step:7900 Loss:26369896.00 f/b time:0.581047 Missing time:0:17:52.612583
Step:8000 Loss:51959988.00 f/b time:0.579080 Missing time:0:16:51.074404
Step:8100 Loss:54916152.00 f/b time:0.573742 Missing time:0:15:44.379448
Step:8200 Loss:35827344.00 f/b time:0.604047 Missing time:0:15:33.856335
Step:8300 Loss:12400078.00 f/b time:0.585920 Missing time:0:14:07.240551
Step:8400 Loss:22363470.00 f/b time:0.586063 Missing time:0:13:08.840475
Step:8500 Loss:20961122.00 f/b time:0.569973 Missing time:0:11:50.186072
Step:8600 Loss:22290648.00 f/b time:0.576955 Missing time:0:11:01.190844
Step:8700 Loss:16602785.00 f/b time:0.571057 Missing time:0:09:57.325380
Step:8800 Loss:11309609.00 f/b time:0.578001 Missing time:0:09:06.788764
Step:8900 Loss:18950684.00 f/b time:0.586943 Missing time:0:08:16.553779
Step:9000 Loss:12000423.00 f/b time:0.567151 Missing time:0:07:03.094774
Step:9100 Loss:37663100.00 f/b time:0.583944 Missing time:0:06:17.227828
Step:9200 Loss:30709348.00 f/b time:0.571136 Missing time:0:05:11.840073

@gtdong-ustc
Author

I'm sorry, I made a mistake. First, the loss weights mentioned in the paper should be 0, 0.005, 0.01, 0.02, 0.08, 0.32. Second, the loss function used in training should be sum_L2. I will try training again after changing these parameters.

@AlessioTonioni
Member

AlessioTonioni commented Mar 6, 2019

But I would like to ask if you have trained MADNet on the Monkaa dataset.

No, never tried. The base model was trained on FlyingThings3D.

Finally, I would like ask you the size of crop patch when training initial model on the FlyingThings3D

384 x 768

the loss function used in training should be sum_L2

No, actually we used sum_l1.
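To make the "weighted multi-scale sum_l1" idea concrete, here is a short NumPy sketch (a hypothetical `multiscale_l1_loss` helper, not the code from Train.py; in the real code each scale's prediction is first resized to full resolution before the comparison):

```python
import numpy as np

def multiscale_l1_loss(predictions, ground_truth, weights):
    """Weighted sum of per-scale mean-L1 losses.

    A sketch of the sum_l1 idea: each scale's prediction is compared
    against the ground truth and the per-scale losses are combined
    with the given weights. Here the predictions are assumed to be
    already resized to the ground-truth resolution.
    """
    total = 0.0
    for pred, w in zip(predictions, weights):
        total += w * np.mean(np.abs(pred - ground_truth))
    return total

gt = np.ones((4, 4))
preds = [np.ones((4, 4)) * 2.0] * 6          # six scales, each off by 1 pixel
weights = [0.0, 0.005, 0.01, 0.02, 0.08, 0.32]  # weights discussed above
loss = multiscale_l1_loss(preds, gt, weights)
```

With every scale off by exactly 1, the loss reduces to the sum of the weights (0.435 here), which makes it easy to sanity-check the weighting.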

@gtdong-ustc
Author

Thank you for your answer. I will check how the experimental results change with these parameters.
