
Unstable training #1

Closed
alwynmathew opened this issue Jun 22, 2018 · 13 comments
@alwynmathew

alwynmathew commented Jun 22, 2018

Results with lr=1e-3

Epoch 1:
epoch001_disp_left
Epoch 2:
epoch003_disp_left
Epoch 5:
epoch005_disp_left

Disparities are degrading as I train more.

@NikolasEnt
Member

Hello, @alwynmathew!
What data and hyperparameters do you use for training?
I'd like to reproduce the issue.

@alwynmathew
Author

I used the exact same data and hyperparameters as in the demo notebook.

@NikolasEnt
Member

NikolasEnt commented Jun 22, 2018

Ok, I'll examine it. There were some issues with the parameters.

Meanwhile, you can experiment with our pretrained model: it was trained for 75 epochs with lr=1e-2 and batch=20 (you may also try training with those parameters). Or use the better one: it was trained for an extra 35 epochs with lr=1e-4, starting from the first model as a pretrained checkpoint.
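In PyTorch terms, that two-stage schedule could be sketched as follows. This is a minimal illustration, not the repo's actual training script: the tiny `Linear` layer stands in for the depth network, and the optimizer choice is an assumption.

```python
import io
import torch

# Stage 1: train from scratch at lr=1e-2 (batch=20 in the thread above).
net = torch.nn.Linear(8, 1)  # stand-in for the depth network
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

# ... stage 1: train for 75 epochs ...

# Keep the stage-1 checkpoint (an in-memory buffer here; a file in practice).
buf = io.BytesIO()
torch.save(net.state_dict(), buf)

# Stage 2: reload the stage-1 weights and fine-tune for 35 more epochs
# at the lower learning rate.
buf.seek(0)
net.load_state_dict(torch.load(buf))
for group in opt.param_groups:
    group['lr'] = 1e-4
```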

@alwynmathew
Author

I have also ported the original monodepth to PyTorch here. I faced the exact same issue: the disparities degrade after a few epochs. What hyperparameters did you use to get better results?

@NikolasEnt
Member

lr=1e-2 and batch=20 are good enough (see above) for our implementation.

@alwynmathew
Author

alwynmathew commented Jun 22, 2018

@NikolasEnt batch_size=20 is too big.
Epoch 10 with batch_size = 8 and lr=1e-2
epoch010_disp_left
Epoch 16
epoch016_disp_left
It's still unstable.

@voeykovroman
Member

Hello, @alwynmathew!

Once again: are you sure you downloaded the correct dataset, as described here?
There are 38,237 images, yet you attached results for your first 10 epochs roughly an hour after @NikolasEnt answered you about the lr. You also used a smaller batch size, which means training should take you longer than it takes us, and we needed about 45 minutes per epoch on a single GTX 1080 Ti. It looks like something is wrong with your data or equipment.
During this week we will try to reproduce our training using exactly this repo, without any changes, downloaded onto a fresh machine, and will publish the disparities we get after 10 epochs in this thread.

@alwynmathew
Author

Hi @Sparkling-Brick, according to the notebook provided in the repo, the data loader only loads from one of the KITTI dataset subfolders: 'data_dir':'../../2011_09_26/'.

@voeykovroman
Member

Yes, thanks for noting it; the path was changed to check whether the notebook was working before publishing. However, since you noticed that the path in the notebook points to just one subfolder, you should be able to change it to load the whole dataset.
Moreover, if you read our README, you will notice that the data structure and the path variable are described there.
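As an illustration, collecting frames from every drive rather than a single date folder might look like the sketch below. The glob pattern assumes the standard KITTI raw layout (date folder, `*_sync` drive folders, `image_02` for the left camera); the function name and pattern are assumptions, not the repo's actual loader.

```python
import os
from glob import glob

def list_kitti_left_images(data_dir):
    """Collect left-camera (image_02) frames from every drive under
    data_dir instead of a single date subfolder. Returns a sorted list
    of file paths; the layout below follows the KITTI raw structure."""
    pattern = os.path.join(data_dir, '*', '*_sync',
                           'image_02', 'data', '*.png')
    return sorted(glob(pattern))
```

Pointing `data_dir` at the dataset root then yields all 38,237 images rather than the frames of one recording date.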

@alwynmathew
Author

alwynmathew commented Jun 26, 2018

But @Sparkling-Brick, do you think just adding more data will solve the problem? The original implementation used a batch size as small as 8, so why do you recommend a larger one?

@NikolasEnt
Member

Hi, @alwynmathew, we retrained our model from scratch with the following parameters:

'model':'resnet18_md',
'learning_rate':1e-2,
'batch_size':8,
'adjust_lr':True,
'do_augmentation':True,
'augment_parameters':[0.8, 1.2, 0.5, 2.0, 0.8, 1.2],

Here is the result. Obviously, it should be trained further; however, it is stable with lr=1e-2 and batch size 8.
demo
The full KITTI dataset from the original repo was used for training.
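The `augment_parameters` list above appears to follow the original monodepth scheme, where the six values are read as (gamma_low, gamma_high, brightness_low, brightness_high, color_low, color_high). A pure-Python sketch of such an augmentation (an illustration under that assumption, not the repo's implementation) could be:

```python
import random

def augment(image, params=(0.8, 1.2, 0.5, 2.0, 0.8, 1.2)):
    """Apply random gamma, brightness, and per-channel color shifts.
    `image` is a nested list [H][W][3] of floats in [0, 1]; the result
    is clipped back into [0, 1]."""
    g_lo, g_hi, b_lo, b_hi, c_lo, c_hi = params
    gamma = random.uniform(g_lo, g_hi)
    brightness = random.uniform(b_lo, b_hi)
    colors = [random.uniform(c_lo, c_hi) for _ in range(3)]
    return [[[min(1.0, (px[c] ** gamma) * brightness * colors[c])
              for c in range(3)]
             for px in row]
            for row in image]
```

In a real pipeline the same random factors would be applied to the left and right images of a stereo pair so the photometric loss stays consistent.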

@alwynmathew
Author

alwynmathew commented Jun 29, 2018

Thank you @NikolasEnt for the effort of training the model from scratch.

I do get a perfect disparity map for selected images, but it doesn't seem to apply to all of the KITTI test images, even after training for 17 epochs with the same parameters on the full KITTI dataset.

'model':'resnet18_md',
'learning_rate':1e-2,
'batch_size':8,
'adjust_lr':True,
'do_augmentation':True,
'augment_parameters':[0.8, 1.2, 0.5, 2.0, 0.8, 1.2]

Is it just me or do you face the same problem?

Reconstructed images and corresponding disparities during my training:
drawing

@NikolasEnt
Member

Hi, @alwynmathew. It looks like the second original raw image has some issues. They may come from the video-to-image conversion process or from the image encoding in the dataset. Personally, I didn't observe such examples; however, I didn't examine the whole dataset.
Ideally, such images should be excluded from the train/val subsets.
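One hypothetical way to prune such frames from the train/val lists is to drop files that are missing or suspiciously small (e.g. truncated during conversion). The helper name and the size threshold here are assumptions, not part of the repo:

```python
import os

def find_valid_images(paths, min_bytes=1024):
    """Keep only paths that exist and exceed a minimal file size.
    Truncated or missing frames are silently dropped; a stricter
    check could also attempt to decode each image."""
    return [p for p in paths
            if os.path.isfile(p) and os.path.getsize(p) > min_bytes]
```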
