
Something strange when training #47

Open
liwenssss opened this issue Aug 30, 2019 · 6 comments

Comments

@liwenssss

Hi, I modified train.sh as follows:

python train.py --name resnet_radvani_32000_20190415 --model resnet --netD conv-up --batch_size 4 --max_dataset_size 32000 --niter 20 --niter_decay 50 --save_result_freq 250 --save_epoch_freq 2 --ndown 6 --data_root /home/liwensh/data

and ran it for about 16 hours (44 epochs). The end of log.txt shows:

epoch 43 iter 6499:  l1: 11.288669 tv: 1.522304 total: 11.288669
epoch 43 iter 6749:  l1: 11.599895 tv: 0.667862 total: 11.599895
epoch 43 iter 6999:  l1: 11.125267 tv: 1.277602 total: 11.125267
epoch 43 iter 7249:  l1: 11.893361 tv: 1.366742 total: 11.893361
epoch 43 iter 7499:  l1: 11.343329 tv: 1.228081 total: 11.343329
epoch 43 iter 7749:  l1: 11.397069 tv: 1.426213 total: 11.397069
epoch 43 iter 7999:  l1: 11.519998 tv: 0.664876 total: 11.519998
epoch 44 iter 249:  l1: 11.183926 tv: 1.258252 total: 11.183926
epoch 44 iter 499:  l1: 11.555054 tv: 1.201256 total: 11.555054
epoch 44 iter 749:  l1: 12.041154 tv: 1.312884 total: 12.041154
epoch 44 iter 999:  l1: 11.605458 tv: 0.706056 total: 11.605458
epoch 44 iter 1249:  l1: 11.589639 tv: 1.093558 total: 11.589639
epoch 44 iter 1499:  l1: 11.533211 tv: 1.338729 total: 11.533211
epoch 44 iter 1749:  l1: 11.822362 tv: 1.297630 total: 11.822362
epoch 44 iter 1999:  l1: 12.410873 tv: 1.159959 total: 12.410873
epoch 44 iter 2249:  l1: 11.855060 tv: 1.531642 total: 11.855060

The total loss has not changed much since the 5th epoch, and the intermediate output at epoch 44 is strange:
[image: intermediate output at epoch 44]
I wonder if this is because the batch size is too small, since I don't have enough GPU memory, or whether I have set some other option incorrectly.

@Lotayou
Owner

Lotayou commented Aug 30, 2019

The model has definitely crashed... It seems you've activated tv_loss and set the weight too high, so the output tends to be over-smoothed and collapses to the local minima [1,0,0] and [0,1,1], leading to the blue and yellow pattern... Do all the predicted UV maps look like this?

I don't know what else you changed in resnet_model.py, but tv_loss is one of the things not to be messed with... Try switching off tv_loss and see if the training stabilizes; let me know if the crash still happens. Good luck.
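
(For reference, a minimal sketch of what such a total-variation penalty typically looks like; `pred`, `tv_loss`, and `lambda_tv` are illustrative names, not necessarily the ones used in resnet_model.py. Setting the weight to 0 corresponds to switching the term off.)

```python
# Minimal sketch of a total-variation penalty on a predicted UV map of shape
# (N, 3, H, W). Names are illustrative, not the repository's exact code.
import torch

def tv_loss(pred: torch.Tensor) -> torch.Tensor:
    # Mean absolute difference between neighbouring pixels along H and W;
    # a large weight on this term over-smooths the output.
    dh = (pred[:, :, 1:, :] - pred[:, :, :-1, :]).abs().mean()
    dw = (pred[:, :, :, 1:] - pred[:, :, :, :-1]).abs().mean()
    return dh + dw

lambda_tv = 0.0  # 0 disables the term, as suggested above
# total_loss = l1_loss + lambda_tv * tv_loss(pred)
```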

@liwenssss
Author

I just removed the dilation when generating the UV maps and set align_corners=True for the upsampling. Now that I've trained again it looks better. I will post the results later.
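
(A minimal sketch of the align_corners choice in bilinear upsampling; the tensor and call here are illustrative, not the exact layer definitions in resnet_model.py.)

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)
# align_corners=True aligns the corner pixels of the input and output grids;
# whichever setting is used should stay consistent with how the UV maps
# were generated and how they are resampled afterwards.
y = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
```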

@liwenssss
Author

When training reached epoch 22, iter 2749, the L1 loss was 1.736614 and the predicted UV map was:
[image: 022_02749]
But by epoch 22, iter 2999, the L1 loss was 8.662540 and the predicted UV map was:
[image: 022_02999]
and it kept getting worse after that.

@Lotayou
Owner

Lotayou commented Sep 1, 2019

Interesting, it looks like the training was successful. Random crashes happen frequently in my experiments; it could be a bad item in the provided toy dataset, but I'm not sure either. Anyway, I just load the last successful checkpoint and resume training whenever a crash happens, not a big deal.
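
(A minimal sketch of that resume workflow, assuming checkpoints are saved as plain state_dicts; the path and the stand-in module are illustrative, not the repository's exact checkpointing code.)

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the UV generator

# Save periodically during training...
torch.save(model.state_dict(), 'latest_net.pth')

# ...and after a crash, reload the last successful checkpoint and keep training.
model.load_state_dict(torch.load('latest_net.pth', map_location='cpu'))
```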

@liwenssss
Author

Yes, I tried reloading the latest better-performing checkpoint and continuing training. But I found the resampled result still shows... hmm, the same old problem, so I wonder if my trained network is doing something wrong. Using the provided pre-trained model I get the following predicted UV map:
[image: 200_00000]
but the resampled result is... interesting:
[image: resampled result]
The same resample function can generate a nearly normal result from the generated UV map. I wonder if there is something I can do based on the resampled result, like fitting the SMPL model as described in the paper.
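
(For context, a minimal sketch of how per-vertex positions could be read back from a predicted UV position map with grid_sample; `uv_map` and `vt` are assumptions here, not the repository's actual resample function.)

```python
import torch
import torch.nn.functional as F

def resample_vertices(uv_map: torch.Tensor, vt: torch.Tensor) -> torch.Tensor:
    # uv_map: (1, 3, H, W) predicted position map, vt: (V, 2) per-vertex UV
    # coordinates in [0, 1]. Map them to the [-1, 1] range grid_sample expects.
    grid = (vt * 2.0 - 1.0).view(1, 1, -1, 2)                 # (1, 1, V, 2)
    verts = F.grid_sample(uv_map, grid, mode='bilinear',
                          align_corners=True)                 # (1, 3, 1, V)
    return verts.view(3, -1).t()                              # (V, 3)
```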

@onepiece666

Later on, an error gets reported at about every epoch.
