Stable release (stronger baseline) for VisDial challenge 2019.

@kdexd kdexd released this 18 Mar 23:31
· 15 commits to master since this release
13ec28e

Summary of changes from PR #7:

A few bug fixes and tweaks for a stronger baseline.

These changes improve MRR from 0.5845 to 0.6155 and NDCG from 0.5070 to 0.5315 on the val split.

Changes:

  • Switched off dropout during evaluation on val in train.py.
  • Shuffling batches during training (passing shuffle=True to DataLoader).
  • Explicitly clearing the GPU memory cache with torch.cuda.empty_cache(). This adds negligible time on a single GPU and fits batch sizes of up to 32 × the number of GPUs; there is some speed gain when training with larger batch sizes.
  • Added a linear learning rate warm up (https://arxiv.org/abs/1706.02677), followed by multi-step decaying.
  • Using a multi-layer LSTM + dropout for the decoder.
  • Switched from dot-product attention to a richer attention: element-wise multiplication followed by an fc layer. (The network can still learn plain dot-product attention if it needs to.)
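
The first three fixes above (dropout off during val evaluation, shuffled training batches, and clearing the GPU cache) follow a standard PyTorch pattern. A minimal sketch, with a toy model and dataset standing in for the real ones in train.py:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model and dataset (hypothetical).
dataset = TensorDataset(torch.randn(8, 4))
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(0.5))

# Shuffle batches during training.
train_loader = DataLoader(dataset, batch_size=2, shuffle=True)

# Switch off dropout before evaluating on val, then restore training mode.
model.eval()
with torch.no_grad():
    for (batch,) in DataLoader(dataset, batch_size=2):
        _ = model(batch)
model.train()

# Explicitly release cached GPU memory between phases (no-op on CPU).
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```

Note that `model.eval()` disables dropout (and freezes batch-norm statistics), so val metrics are computed deterministically; forgetting it silently adds noise to evaluation.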
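
The warm-up schedule can be expressed with `LambdaLR`: the multiplier ramps linearly to 1 over the warm-up epochs, then decays by a factor at each milestone. The base LR, warm-up length, milestones, and gamma below are hypothetical placeholders, not the values used in the repo:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

warmup_epochs = 2      # hypothetical
milestones = [5, 8]    # hypothetical
gamma = 0.1            # hypothetical

def lr_multiplier(epoch):
    # Linear warm-up: multiplier goes 1/warmup_epochs, ..., up to 1.0.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    # Multi-step decay: multiply by gamma at each milestone passed.
    mult = 1.0
    for m in milestones:
        if epoch >= m:
            mult *= gamma
    return mult

scheduler = LambdaLR(optimizer, lr_lambda=lr_multiplier)
# Call scheduler.step() once per epoch after the optimizer updates.
```

Warm-up avoids the instability of large-batch training at a high initial LR, as described in the linked paper.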
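
The richer attention in the last bullet can be sketched as follows. This is a minimal illustrative module, not the repo's exact layer: the score is an fc projection of the element-wise product of query and keys, so with unit fc weights it reduces to a dot product:

```python
import torch
import torch.nn as nn

class ProductAttention(nn.Module):
    """Sketch of element-wise-multiplication + fc attention (hypothetical)."""

    def __init__(self, dim):
        super().__init__()
        # fc layer over the element-wise product; with all-ones weights and
        # zero bias this is exactly dot-product attention.
        self.fc = nn.Linear(dim, 1)

    def forward(self, query, keys):
        # query: (batch, dim), keys: (batch, n, dim)
        scores = self.fc(query.unsqueeze(1) * keys).squeeze(-1)   # (batch, n)
        weights = torch.softmax(scores, dim=-1)                   # (batch, n)
        return (weights.unsqueeze(-1) * keys).sum(dim=1)          # (batch, dim)
```

Because dot-product attention is a special case of this parameterization, the change strictly enlarges the hypothesis space at a small parameter cost.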