
Why is the loss of my training on the DeepFashion dataset rising #13

Closed

351246241 opened this issue Nov 21, 2022 · 3 comments

@351246241
I used the DeepFashion dataset and ran: python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0. At first the loss was falling, but then it started rising again.
This is my train_opt.txt:
affine: True
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
data_type: 32
dataroot: ./dataset/fashion
dataset_mode: fashion
debug: False
device: cuda
dis_layers: 4
display_env: DPTNfashion
display_freq: 200
display_id: 0
display_port: 8096
display_single_pane_ncols: 0
display_winsize: 512
feat_num: 3
fineSize: 512
fp16: False
gan_mode: lsgan
gpu_ids: [0]
image_nc: 3
init_type: orthogonal
input_nc: 3
instance_feat: False
isTrain: True
iter_start: 0
label_feat: False
label_nc: 35
lambda_content: 0.25
lambda_feat: 10.0
lambda_g: 2.0
lambda_rec: 2.5
lambda_style: 250
layers_g: 3
loadSize: 256
load_features: False
load_pretrain:
load_size: 256
local_rank: 0
lr: 0.0002
lr_policy: lambda
max_dataset_size: inf
model: DPTN
nThreads: 2
n_clusters: 10
n_downsample_E: 4
n_layers_D: 3
name: DPTN_fashion
ndf: 64
nef: 16
nhead: 2
niter: 100
niter_decay: 100
no_flip: False
no_ganFeat_loss: False
no_html: False
no_instance: False
no_vgg_loss: False
norm: instance
num_CABs: 2
num_D: 1
num_TTBs: 2
num_blocks: 3
old_size: (256, 176)
output_nc: 3
phase: train
pool_size: 0
pose_nc: 18
print_freq: 200
ratio_g2d: 0.1
resize_or_crop: scale_width
save_epoch_freq: 1
save_input: False
save_latest_freq: 1000
serial_batches: False
structure_nc: 18
t_s_ratio: 0.5
tf_log: False
use_coord: False
use_dropout: False
use_spect_d: True
use_spect_g: False
verbose: False
which_epoch: latest

[Screenshot: training loss curve]
Should I continue training, or should I stop and change the parameters?
Thank you!

@PangzeCheung
Owner

@351246241 Thanks for your question. According to your loss curve, the Nash equilibrium of your model may have collapsed. I have not encountered this problem in my DPTN training, and it is difficult to judge the cause from such limited information. You could try retraining your model, or adjusting your learning rate, loss weights, etc. to solve this problem.
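For instance, a retraining run with a lower learning rate might look like the sketch below. This assumes the options listed in train_opt.txt (such as lr) are exposed as command-line flags with the same names, as --batchSize and --gpu_id are in the command above; the flag names and the value 0.0001 are only illustrative, not confirmed settings.

```
# Hypothetical retraining command; --lr is assumed to mirror the 'lr' key
# in train_opt.txt, and 0.0001 is an illustrative value only.
python train.py --name=DPTN_fashion_retry --model=DPTN --dataset_mode=fashion \
    --dataroot=./dataset/fashion --batchSize 8 --gpu_id=0 --lr 0.0001
```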

@351246241
Author

@PangzeCheung Thank you for your reply. If possible, could you share the hyperparameters you used for training, such as the learning rate and loss weights? Thank you very much!

@PangzeCheung
Owner

PangzeCheung commented Nov 22, 2022

@351246241 All the hyperparameters in our pretrained model are the same as the default hyperparameters in our open-source code. However, we noticed that your 'lambda_content', 'lambda_style' and 'lambda_rec' are half of the defaults. The 'lambda_content', 'lambda_style' and 'lambda_rec' in our code are the sums of the weights over the dual tasks, whereas $\lambda_{l_1}$, $\lambda_{perc}$ and $\lambda_{style}$ in our paper are the weights for each individual task. Therefore, you can directly use our default parameters to train DPTN. I hope this helps~
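To make the relationship concrete, here is a minimal sketch in Python. It assumes the defaults are exactly double the values in the train_opt.txt posted above (as stated in this comment); the variable names are illustrative and do not come from the DPTN code.

```python
# Per-task weights in the paper's notation (lambda_l1, lambda_perc,
# lambda_style). These values are taken from the train_opt.txt above,
# which the comment says is half of the code defaults.
paper_per_task_weights = {
    "lambda_rec":     2.5,   # paper: lambda_l1 per task
    "lambda_content": 0.25,  # paper: lambda_perc per task
    "lambda_style":   250.0, # paper: lambda_style per task
}

# The code's weights are the sum over the two dual tasks, so the
# defaults should be twice the per-task values.
NUM_DUAL_TASKS = 2
code_defaults = {
    name: per_task * NUM_DUAL_TASKS
    for name, per_task in paper_per_task_weights.items()
}

print(code_defaults)
# expected: {'lambda_rec': 5.0, 'lambda_content': 0.5, 'lambda_style': 500.0}
```

This also explains why the values in the train_opt.txt above coincide with the per-task paper weights: they were presumably copied from the paper directly into the code's summed-weight options.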
