"ignoring input and redirecting stderr to stdout" when fine tuning image captioning #190

Closed
teenaxta opened this issue Aug 2, 2022 · 9 comments

@teenaxta

teenaxta commented Aug 2, 2022

I'm trying to follow the fine-tuning steps for captioning as listed in readme.md. However, my output is just blank, and once I hit Enter, it exits. Pretraining worked fine; it's fine-tuning that's not working at all. Any idea what might be causing this issue?
My GPU has 8 GB of VRAM.

image

@logicwong
Member

@teenaxta You can check the two files train_stage1.out and stage1_logs/5_0.06_6000.log to see logs.
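
The message in the issue title is what `nohup` prints when it detaches a command from the terminal and sends its output elsewhere, so a blank terminal is expected and the useful output ends up in the log files named above. A minimal sketch for inspecting them, assuming it is run from the directory that contains those files; `show_log_tail` is a hypothetical helper, not part of the project:

```shell
# Hypothetical helper: print the last N lines of a log file (default 20),
# or a short note if the file does not exist.
show_log_tail() {
  file=$1
  lines=${2:-20}
  if [ -f "$file" ]; then
    tail -n "$lines" "$file"
  else
    echo "no such log: $file"
  fi
}

# File names taken from the comment above.
show_log_tail train_stage1.out
show_log_tail stage1_logs/5_0.06_6000.log 50
```

While a job is still running, `tail -f` on the same file follows new lines as they are written.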

@teenaxta
Author

teenaxta commented Aug 5, 2022

Here's what my logs are saying:

train_stage1.out:

max_epoch {2,}
warmup_ratio {0.06,}
drop_worst_after {2500,}

train_stage2.out:

lr {1e-5,}
max_epoch {3,}
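
The literal `{2,}` in these logs is what a brace list looks like when it has not been expanded. bash performs brace expansion, while a POSIX `sh` such as dash does not, so one possible explanation (an assumption, not confirmed in this thread) is that the training script was run with `sh` on a system where `sh` is not bash. The sketch below uses a made-up loop rather than the project's script to show the difference:

```shell
# Illustrative loop only; it is not taken from the project's training script.
probe='for v in {2,3}; do echo "max_epoch $v"; done'

# bash expands {2,3} into the two words 2 and 3, so the loop runs twice.
bash -c "$probe"

# A shell without brace expansion (for example dash, which is /bin/sh on
# Debian and Ubuntu) passes the braces through as one literal word.
sh -c "$probe"
```

If the second command prints the braces verbatim, running the script with `bash` instead of `sh` is worth trying.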

@logicwong
Member

@teenaxta What about the files under stage1_logs? There should be more detailed logs.

@teenaxta
Author

teenaxta commented Aug 5, 2022

@logicwong
Member

@teenaxta It seems that the specified GPU number is wrong. Can you try the following script?

#!/usr/bin/env bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=1061

log_dir=./stage1_logs
save_dir=./stage1_checkpoints
mkdir -p $log_dir $save_dir

bpe_dir=../../utils/BPE
user_dir=../../ofa_module

data_dir=../../dataset/caption_data
data=${data_dir}/caption_stage1_train.tsv,${data_dir}/caption_val.tsv
restore_file=../../checkpoints/ofa_base.pt
selected_cols=0,4,2

task=caption
arch=ofa_base
criterion=adjust_label_smoothed_cross_entropy
label_smoothing=0.1
lr=1e-5
max_epoch=5
warmup_ratio=0.06
batch_size=8
update_freq=4
resnet_drop_path_rate=0.0
encoder_drop_path_rate=0.1
decoder_drop_path_rate=0.1
dropout=0.1
attention_dropout=0.0
max_src_length=80
max_tgt_length=20
num_bins=1000
patch_image_size=480
drop_worst_after=6000
eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
drop_worst_ratio=0.2


log_file=${log_dir}/${max_epoch}"_"${warmup_ratio}"_"${drop_worst_after}".log"
save_path=${save_dir}/${max_epoch}"_"${warmup_ratio}"_"${drop_worst_after}
mkdir -p $save_path

CUDA_VISIBLE_DEVICES=0 python3 ../../train.py \
  $data \
  --selected-cols=${selected_cols} \
  --bpe-dir=${bpe_dir} \
  --user-dir=${user_dir} \
  --restore-file=${restore_file} \
  --reset-optimizer --reset-dataloader --reset-meters \
  --save-dir=${save_path} \
  --task=${task} \
  --arch=${arch} \
  --criterion=${criterion} \
  --label-smoothing=${label_smoothing} \
  --batch-size=${batch_size} \
  --update-freq=${update_freq} \
  --encoder-normalize-before \
  --decoder-normalize-before \
  --share-decoder-input-output-embed \
  --share-all-embeddings \
  --layernorm-embedding \
  --patch-layernorm-embedding \
  --code-layernorm-embedding \
  --resnet-drop-path-rate=${resnet_drop_path_rate} \
  --encoder-drop-path-rate=${encoder_drop_path_rate} \
  --decoder-drop-path-rate=${decoder_drop_path_rate} \
  --dropout=${dropout} \
  --attention-dropout=${attention_dropout} \
  --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
  --lr-scheduler=polynomial_decay --lr=${lr} \
  --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
  --log-format=simple --log-interval=10 \
  --fixed-validation-seed=7 \
  --no-epoch-checkpoints --keep-best-checkpoints=1 \
  --save-interval=1 --validate-interval=1 \
  --save-interval-updates=500 --validate-interval-updates=500 \
  --eval-cider \
  --eval-cider-cached-tokens=${eval_cider_cached} \
  --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
  --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
  --max-src-length=${max_src_length} \
  --max-tgt-length=${max_tgt_length} \
  --find-unused-parameters \
  --freeze-encoder-embedding \
  --freeze-decoder-embedding \
  --add-type-embedding \
  --scale-attn \
  --scale-fc \
  --scale-heads \
  --disable-entangle \
  --num-bins=${num_bins} \
  --patch-image-size=${patch_image_size} \
  --drop-worst-ratio=${drop_worst_ratio} \
  --drop-worst-after=${drop_worst_after} \
  --fp16 \
  --fp16-scale-window=512 \
  --num-workers=0
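
The script above exposes only GPU 0 through `CUDA_VISIBLE_DEVICES=0` and starts `train.py` as a single process. Before launching, it can help to compare the requested device list with what the machine has. A rough sketch, where `count_visible_gpus` is a hypothetical helper and `nvidia-smi -L` is assumed to be available on machines with NVIDIA drivers:

```shell
# Hypothetical helper: count the device IDs in a CUDA_VISIBLE_DEVICES-style
# comma-separated list. An empty list counts as 0.
count_visible_gpus() {
  printf '%s\n' "$1" | awk -F, '{ print NF }'
}

requested=$(count_visible_gpus "0")
echo "requested GPUs: $requested"

# nvidia-smi -L prints one line per physical GPU; skip the check if it is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  present=$(nvidia-smi -L | wc -l)
  echo "GPUs present: $present"
fi
```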

@teenaxta
Author

teenaxta commented Aug 5, 2022

@logicwong I was able to run train_caption_stage1.sh with this script, but after an hour or so I got the following output:
image

image

@logicwong
Member

@teenaxta That means there is not enough GPU memory. You should decrease batch_size or patch_image_size.
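
Lowering `batch_size` reduces the memory needed per step. In fairseq-style training, `--update-freq` accumulates gradients over several batches, so raising it at the same time can keep the effective batch size unchanged. That pairing is a suggestion added here rather than something stated in the thread, and `effective_batch` is a hypothetical helper:

```shell
# Hypothetical helper: number of samples contributing to one optimizer step.
# $1 = per-GPU batch size, $2 = update_freq (gradient accumulation), $3 = number of GPUs
effective_batch() {
  echo $(( $1 * $2 * $3 ))
}

# Settings from the maintainer's script: batch_size=8, update_freq=4, one GPU.
effective_batch 8 4 1

# A lower-memory alternative with the same effective batch size.
effective_batch 2 16 1
```

Both calls print 32; only the per-step memory footprint differs.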

@teenaxta
Author

teenaxta commented Aug 6, 2022

@logicwong So I have now changed the batch size to 2 and the patch image size to 200. Here's the output:
image

@logicwong
Member

@teenaxta Try adding --freeze-resnet and increasing --validate-interval-updates and --save-interval-updates. In addition, a patch_image_size of 200 may be too small; you can try setting it to 384 and training with more GPUs.
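
As a sketch, these suggestions could be collected into one set of overrides for the script posted earlier in the thread. The flag names and the value 384 come from the thread; the interval of 2000 is only an illustrative assumption (any value above the script's 500 would count as an increase), and `extra_args` is a hypothetical variable:

```shell
# 384 is the value suggested in the thread.
patch_image_size=384
# Assumed example value; the script posted earlier uses 500 for both intervals.
validate_interval_updates=2000
save_interval_updates=2000

# Hypothetical variable gathering the changed flags for the train.py call.
extra_args="--freeze-resnet --patch-image-size=${patch_image_size} --validate-interval-updates=${validate_interval_updates} --save-interval-updates=${save_interval_updates}"
echo "$extra_args"
```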
