"ignoring input and redirecting stderr to stdout" when fine tuning image captioning #190

teenaxta · 2022-08-02T07:57:06Z

I'm trying to follow the fine-tuning steps for captioning listed in readme.md. However, my output is just blank, and once I hit Enter, it exits. Pretraining worked fine; it's fine-tuning that's not working at all. Any idea what might be causing this issue?
My GPU has 8 GB of VRAM.

logicwong · 2022-08-03T03:34:53Z

@teenaxta You can check the two files train_stage1.out and stage1_logs/5_0.06_6000.log to see logs.

teenaxta · 2022-08-05T04:25:32Z

Here's what my logs are saying:

train_stage1.out:

max_epoch {2,}
warmup_ratio {0.06,}
drop_worst_after {2500,}

train_stage2.out:

lr {1e-5,}
max_epoch {3,}

logicwong · 2022-08-05T05:43:59Z

@teenaxta What about the files under stage1_logs? There should be more detailed logs.

teenaxta · 2022-08-05T05:48:40Z

I am sharing these detailed log files:

2_0.06_2500.log
{5,}{0.06,}{6000,}.log
{2,}{0.06,}{2500,}el0.75.log
{2,}{0.06,}{2500,}.log
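A note on the literal braces in these log names and in the `max_epoch {2,}` lines above: they are what a brace list looks like when the shell does not expand it. The sketch below assumes (this is not confirmed anywhere in the thread) that the README script loops over brace lists such as `{2,}` and was run by a shell with brace expansion unavailable or disabled. The helper `show_epoch` is hypothetical and only reproduces the symptom in bash, where `set +B` switches brace expansion off.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the same kind of line seen in train_stage1.out,
# once with brace expansion enabled and once with it disabled.
show_epoch() {
  # $1 is "on" or "off"; the subshell keeps the option change local.
  (
    if [ "$1" = "off" ]; then
      set +B   # disable brace expansion (bash option)
    else
      set -B   # enable brace expansion (bash default)
    fi
    # Assumed loop shape; the thread does not show the original script.
    for max_epoch in {2,}; do
      echo "max_epoch ${max_epoch}"
    done
  )
}

show_epoch on    # brace list expanded to its single value
show_epoch off   # brace list passed through literally
```

If this is the cause, running the script explicitly with `bash` rather than `sh` would be one thing to check; a script that assigns plain variables (no brace lists) sidesteps the problem entirely.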

logicwong · 2022-08-05T06:01:42Z

@teenaxta It seems that the GPU number specified in your script is wrong. Could you try the following script?

#!/usr/bin/env bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=1061

log_dir=./stage1_logs
save_dir=./stage1_checkpoints
mkdir -p $log_dir $save_dir

bpe_dir=../../utils/BPE
user_dir=../../ofa_module

data_dir=../../dataset/caption_data
data=${data_dir}/caption_stage1_train.tsv,${data_dir}/caption_val.tsv
restore_file=../../checkpoints/ofa_base.pt
selected_cols=0,4,2

task=caption
arch=ofa_base
criterion=adjust_label_smoothed_cross_entropy
label_smoothing=0.1
lr=1e-5
max_epoch=5
warmup_ratio=0.06
batch_size=8
update_freq=4
resnet_drop_path_rate=0.0
encoder_drop_path_rate=0.1
decoder_drop_path_rate=0.1
dropout=0.1
attention_dropout=0.0
max_src_length=80
max_tgt_length=20
num_bins=1000
patch_image_size=480
drop_worst_after=6000
eval_cider_cached=${data_dir}/cider_cached_tokens/coco-valid-words.p
drop_worst_ratio=0.2


log_file=${log_dir}/${max_epoch}"_"${warmup_ratio}"_"${drop_worst_after}".log"
save_path=${save_dir}/${max_epoch}"_"${warmup_ratio}"_"${drop_worst_after}
mkdir -p $save_path

CUDA_VISIBLE_DEVICES=0 python3 ../../train.py \
  $data \
  --selected-cols=${selected_cols} \
  --bpe-dir=${bpe_dir} \
  --user-dir=${user_dir} \
  --restore-file=${restore_file} \
  --reset-optimizer --reset-dataloader --reset-meters \
  --save-dir=${save_path} \
  --task=${task} \
  --arch=${arch} \
  --criterion=${criterion} \
  --label-smoothing=${label_smoothing} \
  --batch-size=${batch_size} \
  --update-freq=${update_freq} \
  --encoder-normalize-before \
  --decoder-normalize-before \
  --share-decoder-input-output-embed \
  --share-all-embeddings \
  --layernorm-embedding \
  --patch-layernorm-embedding \
  --code-layernorm-embedding \
  --resnet-drop-path-rate=${resnet_drop_path_rate} \
  --encoder-drop-path-rate=${encoder_drop_path_rate} \
  --decoder-drop-path-rate=${decoder_drop_path_rate} \
  --dropout=${dropout} \
  --attention-dropout=${attention_dropout} \
  --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
  --lr-scheduler=polynomial_decay --lr=${lr} \
  --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
  --log-format=simple --log-interval=10 \
  --fixed-validation-seed=7 \
  --no-epoch-checkpoints --keep-best-checkpoints=1 \
  --save-interval=1 --validate-interval=1 \
  --save-interval-updates=500 --validate-interval-updates=500 \
  --eval-cider \
  --eval-cider-cached-tokens=${eval_cider_cached} \
  --eval-args='{"beam":5,"max_len_b":16,"no_repeat_ngram_size":3}' \
  --best-checkpoint-metric=cider --maximize-best-checkpoint-metric \
  --max-src-length=${max_src_length} \
  --max-tgt-length=${max_tgt_length} \
  --find-unused-parameters \
  --freeze-encoder-embedding \
  --freeze-decoder-embedding \
  --add-type-embedding \
  --scale-attn \
  --scale-fc \
  --scale-heads \
  --disable-entangle \
  --num-bins=${num_bins} \
  --patch-image-size=${patch_image_size} \
  --drop-worst-ratio=${drop_worst_ratio} \
  --drop-worst-after=${drop_worst_after} \
  --fp16 \
  --fp16-scale-window=512 \
  --num-workers=0

teenaxta · 2022-08-05T09:40:06Z

@logicwong I was able to run train_caption_stage1.sh with this script, but after an hour or so I got the following output:

logicwong · 2022-08-05T14:00:28Z

@teenaxta That means there is not enough GPU memory. You should decrease batch_size or patch_image_size.
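When lowering `batch_size`, it can help to see what happens to the effective batch size, since `--update-freq` accumulates gradients over several mini-batches before each optimizer step. The sketch below uses the values from the script earlier in this thread; the smaller batch size of 2 is only an example, and keeping the effective batch constant is a common practice rather than something the maintainers prescribe here.

```shell
#!/usr/bin/env bash
# Values taken from the script posted above.
batch_size=8
update_freq=4
num_gpus=1   # CUDA_VISIBLE_DEVICES=0 exposes a single GPU

# Samples contributing to each optimizer step.
effective_batch=$(( batch_size * update_freq * num_gpus ))

# Example: a smaller per-GPU batch to reduce memory use (illustrative value).
new_batch_size=2
# Raise update_freq so the effective batch stays the same.
new_update_freq=$(( effective_batch / (new_batch_size * num_gpus) ))

echo "effective batch: ${effective_batch}"
echo "batch_size=${new_batch_size} update_freq=${new_update_freq}"
```

With these numbers the effective batch is 32, and a per-GPU batch of 2 needs `update_freq=16` to match it. Each step then takes more forward and backward passes, so training is slower, but peak memory per pass is lower.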

teenaxta · 2022-08-06T03:42:17Z

@logicwong So I have now changed batch_size to 2 and patch_image_size to 200. Here's the output:

logicwong · 2022-08-06T03:51:17Z

@teenaxta Try adding --freeze-resnet and increasing --validate-interval-updates and --save-interval-updates. In addition, a patch_image_size of 200 may be too small; you can try setting it to 384 and training with more GPUs.
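As a rough sketch, these suggestions could be collected like this before being passed to `../../train.py`. The flag names come from this thread; the interval value of 2000 is an assumption chosen only for illustration (the script above uses 500), and `extra_args` is a hypothetical variable name.

```shell
#!/usr/bin/env bash
patch_image_size=384             # value suggested in the reply above
validate_interval_updates=2000   # assumed value, larger than the original 500
save_interval_updates=2000       # assumed value, larger than the original 500

# Hypothetical array of overrides to splice into the train.py command line.
extra_args=(
  --freeze-resnet
  --patch-image-size="${patch_image_size}"
  --validate-interval-updates="${validate_interval_updates}"
  --save-interval-updates="${save_interval_updates}"
)

# Print one flag per line to check the result before launching training.
printf '%s\n' "${extra_args[@]}"
```

In the script above, the existing `--patch-image-size`, `--validate-interval-updates`, and `--save-interval-updates` lines would be edited in place rather than duplicated, and `--freeze-resnet` added as a new line.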

JustinLin610 assigned logicwong Aug 3, 2022

JustinLin610 closed this as completed Aug 12, 2022
