
VQA-Rad with 🤗 BERT

  • BERT-version Bilinear Attention Networks (BAN) on VQA-Rad.
  • ⚠️ This was a very quick revision (done in 5 days 😅), so the overall code structure may look ugly. Thanks for your understanding; if you find any bugs, make a PR or open an issue.

Model Architecture

  1. Bilinear Attention Networks (BAN): BERT version
    • Downstream task: VQA-Rad
    • Revised based on sarahESL/PubMedCLIP (2021)
    • Explanation: the original BAN model uses a plain nn.Embedding initialized with GloVe 300d vectors and a GRU as the text encoder; this repo swaps that text encoder for BERT.
  2. Use pretrained Bio-ClinicalBERT (emilyalsentzer/Bio_ClinicalBERT) as the text encoder.
  3. Train with 2 optimizers, because the BAN and BERT parts require very different learning rates (see the sketch below).
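A minimal sketch of the two-optimizer idea (the toy model, optimizer choices, and learning rates below are illustrative assumptions, not the repo's exact code; the real classes live in lib/BAN/multi_level_model.py):

```python
import torch
import torch.nn as nn

# Toy stand-in for the BERT text encoder + BAN fusion/answer head.
class ToyBertBan(nn.Module):
    def __init__(self, hidden=768, num_answers=100):
        super().__init__()
        self.bert = nn.Linear(hidden, hidden)       # placeholder for the pretrained BERT encoder
        self.ban = nn.Linear(hidden, num_answers)   # placeholder for BAN fusion + answer classifier

    def forward(self, x):
        return self.ban(self.bert(x))

model = ToyBertBan()

# Pretrained BERT weights get a small fine-tuning learning rate;
# the randomly initialized BAN part gets a much larger one.
bert_opt = torch.optim.AdamW(model.bert.parameters(), lr=2e-5)
ban_opt = torch.optim.Adamax(model.ban.parameters(), lr=2e-3)

x, y = torch.randn(4, 768), torch.randint(0, 100, (4,))
loss = nn.functional.cross_entropy(model(x), y)

bert_opt.zero_grad(); ban_opt.zero_grad()
loss.backward()
bert_opt.step(); ban_opt.step()
```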

Performance

  • Experiments show that the pretrained CLIP visual encoder RN50x4, combined with our BERT-BAN and the preprocessed images, outperforms the original PubMedCLIP ($71.62\% \rightarrow 73.17\%$). With the original images it achieves $72.28\%$. Note that $71.62\%$ is our reproduced score for the paper, not the paper's reported score ($71.8\%$).
  • For more details, see 2023 MIS: Final Presentation Slides.
  • ⚠️ Some settings could likely still be tuned to improve performance.

Running Experiments

Download Data

  • The images and image pickles can be found at Awenbocc/med-vqa/data.
  • If you'd like to pickle the data from the images on your own (see the sketch after this list):
    • Open lib/utils/run.sh.
    • Configure the IMAGEPATH.
    • Run the create_resized_images.py lines to put the new image pickles under DATARADPATH.
    • The VQA script reads the image pickles from your DATARADPATH, so be sure they are placed correctly.
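If you go the do-it-yourself route, the pickling step boils down to something like the sketch below. This is a rough assumption of what create_resized_images.py does; the paths, resize size, file pattern, and output filename are placeholders, not the script's actual values.

```python
import pickle
from pathlib import Path

import numpy as np
from PIL import Image

IMAGEPATH = Path("data/images")        # placeholder: where the raw VQA-RAD images live
DATARADPATH = Path("data/data_rad")    # placeholder: where the VQA script expects the pickles

resized = {}
for img_file in sorted(IMAGEPATH.glob("*.jpg")):
    img = Image.open(img_file).convert("RGB").resize((224, 224))  # placeholder size
    resized[img_file.name] = np.asarray(img)

DATARADPATH.mkdir(parents=True, exist_ok=True)
with open(DATARADPATH / "images_resized.pkl", "wb") as f:          # placeholder filename
    pickle.dump(resized, f)
```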

Prepare an Answer-type Classifier (closed/open)

  • This classifier is used during validation, where a question is first classified as Open or Closed and then sent to the corresponding answer pool for the 2nd-stage answer classification.

  • Please download and unzip type_classifier_rad_biobert_2023Jun03-155924.pth.zip for a pretrained type classifier. The BERT model for this type classifier checkpoint is emilyalsentzer/Bio_ClinicalBERT.

  • If the type classifier is corrupted (uploading it anywhere seems to corrupt it; only scp transfers it intact), run type_classifier.py in the repo again to train a new one.

  • ⚠️ The config passed should be the one you will be using for the VQA training. Specifically, make sure the config variable DATASET/EMBEDDER_MODEL is consistent with the config of the following experiments so that their vocab sizes match (see the sketch after this list).

  • ⚠️ If you'd like to try out other BERT-based models, feel free to change the config variable DATASET/EMBEDDER_MODEL to another Hugging Face model name, then train and use your own type classifier.
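A quick way to sanity-check that consistency requirement (the variable names below are illustrative; only the Hugging Face calls are real API):

```python
from transformers import AutoTokenizer

# The released type-classifier checkpoint was trained with Bio_ClinicalBERT;
# whatever you set as DATASET/EMBEDDER_MODEL in the VQA config must match it,
# otherwise the tokenizer vocab sizes (and token ids) will disagree.
type_clf_embedder = "emilyalsentzer/Bio_ClinicalBERT"   # model behind the .pth checkpoint
vqa_embedder = "emilyalsentzer/Bio_ClinicalBERT"        # DATASET/EMBEDDER_MODEL in your config

tok_a = AutoTokenizer.from_pretrained(type_clf_embedder)
tok_b = AutoTokenizer.from_pretrained(vqa_embedder)
assert tok_a.vocab_size == tok_b.vocab_size, "EMBEDDER_MODEL mismatch: vocab sizes differ"
```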

Run Training

  • Create a virtual env and then pip install -r requirements.txt.
  • Install the torch family of packages by following Start Locally | PyTorch.
  • Open the config you'd like to use and check what it references:
  • Copy the essentials to this folder from SarahESL/PubMedCLIP/QCR_PubMedCLIP if anything is missing.
  • Run python3 main.py --cfg={config_path}

Notes

  • Be sure to use modified configs, namely configs/qcr_pubmedclip{visual_encoder_name}_ae_rad_nondeterministic_typeatt_2lrs.yaml.
  • The changed files from BAN to BERT-BAN are:
    • configs/
    • lib/config/default.py
    • lib/BAN/multi_level_model.py
    • lib/lngauge/classify_question.py
    • lib/lngauge/language_model.py
    • lib/dataset/dataset_RAD_bert.py
    • (possibly more)
  • Watch your disk space: one model checkpoint is roughly 3.6 GB, and training stops once the disk is full.

Testing (Unsupported now)

We haven't written the test script (which is supposed to create the validation file). main/test.py is used for testing in the original repo, so you could modify its eval loop by following main/train.py, which should be workable; a rough sketch follows.
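The batch keys and model signature below are assumptions, not the repo's actual interfaces; adapt them to whatever main/train.py uses for validation.

```python
import torch

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Rough validation loop in the spirit of main/train.py (interfaces assumed)."""
    model.eval()
    correct, total = 0, 0
    for batch in loader:
        images = batch["image"].to(device)                                    # assumed key
        questions = {k: v.to(device) for k, v in batch["question"].items()}   # assumed key
        logits = model(images, questions)                                     # assumed forward signature
        preds = logits.argmax(dim=-1)
        correct += (preds == batch["answer"].to(device)).sum().item()
        total += preds.numel()
    return correct / max(total, 1)
```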

Extra

Make a PR or open an issue with your questions, and we may (or may not) deal with it if we find time.

About

111-2 Medical Imaging System Term Project: BERT version of PubMedCLIP
