
Why is the loss of my training on the DeepFashion dataset rising #13

Closed

351246241 opened this issue Nov 21, 2022 · 3 comments

@351246241
I used the DeepFashion dataset and ran: python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0. At first the loss was falling, but then it started rising again.
This is my train_opt.txt:
affine: True
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
data_type: 32
dataroot: ./dataset/fashion
dataset_mode: fashion
debug: False
device: cuda
dis_layers: 4
display_env: DPTNfashion
display_freq: 200
display_id: 0
display_port: 8096
display_single_pane_ncols: 0
display_winsize: 512
feat_num: 3
fineSize: 512
fp16: False
gan_mode: lsgan
gpu_ids: [0]
image_nc: 3
init_type: orthogonal
input_nc: 3
instance_feat: False
isTrain: True
iter_start: 0
label_feat: False
label_nc: 35
lambda_content: 0.25
lambda_feat: 10.0
lambda_g: 2.0
lambda_rec: 2.5
lambda_style: 250
layers_g: 3
loadSize: 256
load_features: False
load_pretrain:
load_size: 256
local_rank: 0
lr: 0.0002
lr_policy: lambda
max_dataset_size: inf
model: DPTN
nThreads: 2
n_clusters: 10
n_downsample_E: 4
n_layers_D: 3
name: DPTN_fashion
ndf: 64
nef: 16
nhead: 2
niter: 100
niter_decay: 100
no_flip: False
no_ganFeat_loss: False
no_html: False
no_instance: False
no_vgg_loss: False
norm: instance
num_CABs: 2
num_D: 1
num_TTBs: 2
num_blocks: 3
old_size: (256, 176)
output_nc: 3
phase: train
pool_size: 0
pose_nc: 18
print_freq: 200
ratio_g2d: 0.1
resize_or_crop: scale_width
save_epoch_freq: 1
save_input: False
save_latest_freq: 1000
serial_batches: False
structure_nc: 18
t_s_ratio: 0.5
tf_log: False
use_coord: False
use_dropout: False
use_spect_d: True
use_spect_g: False
verbose: False
which_epoch: latest

[Screenshot: training loss curve]
Should I continue training, or should I stop and change the parameters?
Thank you!

@PangzeCheung
Owner

@351246241 Thanks for your question. According to your loss curve, the Nash equilibrium of your model may have collapsed. I have not encountered this problem in my DPTN training, and it is difficult to judge the cause from such limited information. You could try retraining your model, or adjusting your learning rate, loss weights, etc. to solve this problem.
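For instance, a retraining run with a lower learning rate might look like the sketch below. This assumes the options listed in train_opt.txt (such as lr) are exposed as command-line flags with the same names, as --batchSize and --gpu_id are in the command above; the flag names and the value 0.0001 are only illustrative, not confirmed settings.

```
# Hypothetical retraining command; --lr is assumed to mirror the 'lr' key
# in train_opt.txt, and 0.0001 is an illustrative value only.
python train.py --name=DPTN_fashion_retry --model=DPTN --dataset_mode=fashion \
    --dataroot=./dataset/fashion --batchSize 8 --gpu_id=0 --lr 0.0001
```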

@351246241
Author

@PangzeCheung Thank you for your reply. If possible, could you share the hyperparameters you used for training, such as the learning rate and loss weights? Thank you very much!

@PangzeCheung
Owner

PangzeCheung commented Nov 22, 2022

@351246241 All the hyperparameters in our pretrained model are the same as the default hyperparameters in our open-source code. However, we noticed that your 'lambda_content', 'lambda_style' and 'lambda_rec' are half of the defaults. The 'lambda_content', 'lambda_style' and 'lambda_rec' in our code are the sums of the weights over the dual tasks, whereas $\lambda_{l_1}$, $\lambda_{perc}$ and $\lambda_{style}$ in our paper are the weights for each individual task. Therefore, you can directly use our default parameters to train DPTN. I hope this helps~
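To make the relationship concrete, here is a minimal sketch in Python. It assumes the defaults are exactly double the values in the train_opt.txt posted above (as stated in this comment); the variable names are illustrative and do not come from the DPTN code.

```python
# Per-task weights in the paper's notation (lambda_l1, lambda_perc,
# lambda_style). These values are taken from the train_opt.txt above,
# which the comment says is half of the code defaults.
paper_per_task_weights = {
    "lambda_rec":     2.5,   # paper: lambda_l1 per task
    "lambda_content": 0.25,  # paper: lambda_perc per task
    "lambda_style":   250.0, # paper: lambda_style per task
}

# The code's weights are the sum over the two dual tasks, so the
# defaults should be twice the per-task values.
NUM_DUAL_TASKS = 2
code_defaults = {
    name: per_task * NUM_DUAL_TASKS
    for name, per_task in paper_per_task_weights.items()
}

print(code_defaults)
# expected: {'lambda_rec': 5.0, 'lambda_content': 0.5, 'lambda_style': 500.0}
```

This also explains why the values in the train_opt.txt above coincide with the per-task paper weights: they were presumably copied from the paper directly into the code's summed-weight options.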
