Source: https://github.com/abachaa/VQA-Med-2019

# VQA-Med-2019

- Website: https://www.imageclef.org/2019/medical/vqa/
- Mailing list: https://groups.google.com/d/forum/imageclef-vqa-med 
- Paper: http://ceur-ws.org/Vol-2380/paper_272.pdf
- Results of the VQA-Med-2019 challenge on crowdAI: https://www.crowdai.org/challenges/imageclef-2019-vqa-med/leaderboards 

Task:
-------------------
VQA-Med 2019 focused on radiology images and four main categories of questions: Modality, Plane, Organ System, and Abnormality. The categories differ in difficulty and lend themselves to both classification and text generation approaches. In this second edition of the VQA challenge, we targeted medical questions that ask about one element only (e.g., "what is the organ principally shown in this mri?", "in what plane is this mammograph taken?", "is this a t1 weighted, t2 weighted, or flair image?", "what is most alarming about this ultrasound?") and that can be answered from the image content without additional medical knowledge or domain-specific inference.

VQA-Med-2019 Data:
-------------------
The VQA-Med-2019 dataset includes a training set of 3,200 medical images with 12,792 Question-Answer (QA) pairs, a validation set of 500 medical images with 2,000 QA pairs, and a test set of 500 medical images with 500 questions. 
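
As a quick sanity check on these figures, the short sketch below computes the average number of QA pairs per image in each split. The dictionary name `splits` is illustrative and not part of the dataset release; the counts are copied from the paragraph above.

```python
# Split sizes quoted in the dataset description: (images, QA pairs).
splits = {
    'train': (3200, 12792),
    'validation': (500, 2000),
    'test': (500, 500),
}

# Average number of QA pairs per image for every split.
qa_per_image = {name: pairs / images for name, (images, pairs) in splits.items()}

for name, ratio in qa_per_image.items():
    print(f'{name}: {ratio:.2f} QA pairs per image')
```

The training and validation splits average roughly four questions per image, which would be consistent with about one question per category, while the test split has a single question per image.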

1) The training set: https://github.com/abachaa/VQA-Med-2019/blob/master/ImageClef-2019-VQA-Med-Training.zip  

2) The validation set: https://github.com/abachaa/VQA-Med-2019/blob/master/ImageClef-2019-VQA-Med-Validation.zip

3) The VQA-Med-2019 test set and the reference answers are available here: https://github.com/abachaa/VQA-Med-2019/tree/master/VQAMed2019Test  

Please see the readme files for more detailed information about the dataset and the categories of questions and answers:
https://github.com/abachaa/VQA-Med-2019/blob/master/README-VQA-Med-2019-Data.txt
https://github.com/abachaa/VQA-Med-2019/blob/master/VQAMed2019Test/README-VQA-Med-2019-TestSet.txt

Reference: 
-------------------

If you use the VQA-Med 2019 dataset, please cite our paper:
"VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019". Asma Ben Abacha, Sadid A. Hasan, Vivek V. Datla, Joey Liu, Dina Demner-Fushman, Henning Müller. CLEF 2019 Working Notes.  

@Inproceedings{ImageCLEFVQA-Med2019,
        author = {Asma {Ben Abacha} and Sadid A. Hasan and Vivek V. Datla and Joey Liu and Dina Demner-Fushman and Henning M\"uller},
        title = {VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019},
        booktitle = {CLEF 2019 Working Notes},
        series = {{CEUR} Workshop Proceedings},
        year = {2019},
        publisher = {CEUR-WS.org $<$http://ceur-ws.org$>$},
        month = {September 9-12},
        address = {Lugano, Switzerland}
}
        
Contact Information:
-------------------
Asma Ben Abacha: asma.benabacha@nih.gov   https://sites.google.com/site/asmabenabacha/


In [None]:
import os
from IPython.display import Image as display_image 

display_image(filename=os.path.join('references', 'leaderboards.png')) 

In [None]:
# !pip install tensorflow_gpu sentence-transformers matplotlib nltk nlpaug scikit-learn pandas ipywidgets opencv-python

In [None]:
import tensorflow as tf
from tensorflow import keras
from sentence_transformers import SentenceTransformer
import nlpaug.augmenter.word as naw
import nltk

physical_devices = tf.config.list_physical_devices('GPU')
print(physical_devices)
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)


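# The NLTK stopword corpus has to be available locally before it can be read.
nltk.download('stopwords', quiet=True)
stopwords = set(nltk.corpus.stopwords.words('english'))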
aug = naw.SynonymAug(aug_src='wordnet', stopwords=stopwords)
sbert_model = SentenceTransformer('distilbert-base-nli-mean-tokens')

In [None]:
import os
import json
import random
from typing import List

import cv2
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
import matplotlib as mpl
import matplotlib.pyplot as plt

In [None]:
seed = 42
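
In the cells shown here, `seed` is only passed to `sklearn.utils.shuffle` later in the notebook. A minimal sketch of also seeding Python's built-in `random` module, so that later random picks become repeatable, is given below. The helper name `seed_python_rng` is hypothetical; seeding NumPy or TensorFlow as well (for example with `np.random.seed` and `tf.random.set_seed`) is an assumption about what a fully reproducible setup would need and is not done here.

```python
import random

def seed_python_rng(seed: int) -> None:
    """Seed Python's built-in RNG (hypothetical helper, not part of the notebook)."""
    random.seed(seed)

# Drawing after re-seeding with the same value repeats the same sequence.
seed_python_rng(42)
first = [random.randint(0, 99) for _ in range(3)]
seed_python_rng(42)
second = [random.randint(0, 99) for _ in range(3)]
print(first == second)
```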

In [None]:
train_val_set = {
    'train': {},
    'val': {},
}

df = pd.read_csv('train_val.csv')

for key in ['id', 'category', 'question', 'answer', 'part']:
    df[key] = df[key].str.lower().str.strip()

# Sort the categories so that the category-to-index mapping is the same on every run.
categories = sorted(df['category'].unique())
cat_map = {cat: idx for (idx, cat) in enumerate(categories)}
cat_map_rev = {idx: cat for (cat, idx) in cat_map.items()}

# prepare text data

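def augment_df(
    df: pd.DataFrame,
    aug_loops: int = 4,
    col: str = 'question',
    key_skip=None
) -> pd.DataFrame:
    """Append `aug_loops` synonym-augmented copies of each row that `key_skip` does not exclude."""
    records = df.to_dict('records')
    augmented = []
    count = len(records)

    for (row_idx, row) in enumerate(records):
        print(f'\rAugmenting {row_idx + 1} / {count}', end='')
        # key_skip is optional: rows are skipped only when a predicate is given and it returns True.
        if key_skip is not None and key_skip(row):
            continue
        for _ in range(aug_loops):
            new_row = row.copy()
            new_text = aug.augment(str(row[col]))
            # Some nlpaug releases return a list of strings instead of a single string.
            if isinstance(new_text, list):
                new_text = new_text[0] if new_text else str(row[col])
            new_row[col] = new_text
            augmented.append(new_row)
    print('')
    records.extend(augmented)
    return pd.DataFrame(records)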

def create_embedding(arr: List[str], key=None):
    if key is None:
        res = sbert_model.encode(arr)
    else:
        res = sbert_model.encode([key(x) for x in arr])
    # Each 768-dimensional sentence vector is reshaped to 64 x 12 (64 * 12 = 768).
    res = res.reshape(len(res), 64, 12)
    res = np.array(res, dtype=np.float64)
    return res

def avg(arr, round_key=lambda x: x):
    return round_key(sum(arr) / len(arr))

name_labels_map = {}
for (row_idx, row) in df.iterrows():
    name = row['id']
    cat = row['category']
    assert(cat in categories), 'unknown category in df'    
    if name not in name_labels_map:
        name_labels_map[name] = set()
    name_labels_map[name].add(cat)


def get_part(_df, part: str) -> List[dict]:
    res = _df[_df['part'] == part]
    res = shuffle(res, random_state=seed)
    res = res.to_dict('records')
    return res

augdf = augment_df(df, key_skip=lambda x: x['part'] == 'val')
train = get_part(augdf, 'train')
val = get_part(augdf, 'val')

print('Encoding data')
for part in [train, val]:
    for (idx, elem) in enumerate(part):
        txt = [elem['question']]
        part[idx]['embedding'] = create_embedding(txt)[0]

# X_train, y_train, X_val, y_val = 
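
The cell above ends with an unfinished `X_train, y_train, X_val, y_val` assignment. One possible way to turn the encoded records into arrays is sketched below, under the assumption that the prediction target is the question category; the notebook does not say which column is the target. The function `build_arrays`, the toy category map, and the toy records are illustrative only.

```python
import numpy as np

def build_arrays(records, cat_map):
    """Stack per-record embeddings into X and one-hot encode the category into y."""
    X = np.stack([rec['embedding'] for rec in records])
    y = np.zeros((len(records), len(cat_map)), dtype=np.float32)
    for i, rec in enumerate(records):
        y[i, cat_map[rec['category']]] = 1.0
    return X, y

# Toy stand-ins for `cat_map` and `train`; real records carry (64, 12) embeddings.
toy_cat_map = {'abnormality': 0, 'modality': 1, 'organ': 2, 'plane': 3}
toy_records = [
    {'category': 'plane', 'embedding': np.zeros((64, 12))},
    {'category': 'modality', 'embedding': np.ones((64, 12))},
]

X_toy, y_toy = build_arrays(toy_records, toy_cat_map)
print(X_toy.shape, y_toy.shape)
```

With the real data this would be called once per split, for example on `train` and on `val`.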

In [None]:
# load images into memory
image_map = {}
image_folder = os.path.join('.', 'images')

image_shapes = []
average_shape = None

# Collect all image shapes to find the average shape.
# cv2.imread returns arrays shaped (height, width, channels), or None for unreadable files.
for filename in os.listdir(image_folder):
    image_path = os.path.join(image_folder, filename)
    image = cv2.imread(image_path)
    assert image is not None, f'Could not read image: {image_path}'
    image_shapes.append(image.shape)

heights, widths, channels = zip(*image_shapes)
average_shape = (avg(heights, round), avg(widths, round), avg(channels, round))

for filename in os.listdir(image_folder):
    image_path = os.path.join(image_folder, filename)
    name, _ = os.path.splitext(os.path.basename(image_path))
    assert(os.path.isfile(image_path)), 'Missing image'
    image_map[name] = cv2.imread(image_path)
    # cv2.resize expects dsize as (width, height), the reverse of the array shape order.
    image_map[name] = cv2.resize(image_map[name], (average_shape[1], average_shape[0]))
    image_map[name] = np.array(image_map[name])
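
OpenCV image arrays are shaped `(height, width, channels)`, while `cv2.resize` takes its target size as `(width, height)`. The sketch below uses plain tuples instead of real images to show the average-shape computation and the reordering needed for the resize call. The names `average_shape_of` and `resize_dsize` are hypothetical helpers, and the toy shapes are made up for illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]

def average_shape_of(shapes: List[Shape]) -> Shape:
    """Round the per-axis mean of (height, width, channels) shapes."""
    heights, widths, channels = zip(*shapes)
    n = len(shapes)
    return (round(sum(heights) / n), round(sum(widths) / n), round(sum(channels) / n))

def resize_dsize(shape: Shape) -> Tuple[int, int]:
    """Convert an array shape to the (width, height) order that cv2.resize expects."""
    height, width, _ = shape
    return (width, height)

# Three made-up image shapes in (height, width, channels) order.
toy_shapes = [(400, 600, 3), (500, 700, 3), (300, 500, 3)]
toy_average = average_shape_of(toy_shapes)
print(toy_average, resize_dsize(toy_average))
```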


In [None]:
img_input_shape = average_shape[:-1]
txt_input_shape = train[0]['embedding'].shape
print(f'{img_input_shape=}; {txt_input_shape=}')
img_input_shape+txt_input_shape

In [None]:
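# Pick a random image to preview.
imf = random.choice(list(image_map.values()))
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the expected colours.
plt.imshow(cv2.cvtColor(imf, cv2.COLOR_BGR2RGB))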
plt.axis('off')
plt.show()

In [None]:
image_model = keras.applications.resnet50.ResNet50(weights='imagenet')

In [None]:
num_hidden_units = 1024
num_hidden_layers = 3
batch_size = 128
dropout = 0.5
activation = 'tanh'
img_dim = 4096
word2vec_dim = 300
num_epochs = 100
nb_classes = 12

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Activation, LSTM, Reshape, Concatenate

model = Sequential()
model.add(Dense(num_hidden_units, input_dim=word2vec_dim+img_dim,
                kernel_initializer='uniform'))
model.add(Dropout(dropout))
for i in range(num_hidden_layers):
    model.add(Dense(num_hidden_units, kernel_initializer='uniform'))
    model.add(Activation(activation))
    model.add(Dropout(dropout))
model.add(Dense(nb_classes, kernel_initializer='uniform'))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop')
# tensorboard = TensorBoard(log_dir='./Graph',
#                           histogram_freq=0,
#                           write_graph=True,
#                           write_images=True)
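
To get a feel for the size of this network, the trainable parameter count of a stack of `Dense` layers can be worked out by hand: each layer contributes `inputs * units` weights plus `units` biases, and the `Activation` and `Dropout` layers add none. The sketch below applies that formula to the layer sizes used above. `dense_param_count` is a hypothetical helper and does not call Keras.

```python
from typing import List

def dense_param_count(layer_sizes: List[int]) -> int:
    """Sum weights and biases over consecutive fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hyperparameters copied from the notebook cells above.
word2vec_dim, img_dim = 300, 4096
num_hidden_units, num_hidden_layers, nb_classes = 1024, 3, 12

# Input -> first Dense -> three hidden Dense layers -> output Dense.
sizes = [word2vec_dim + img_dim] + [num_hidden_units] * (1 + num_hidden_layers) + [nb_classes]
total_params = dense_param_count(sizes)
print(f'{total_params:,} trainable parameters')
```

The result should agree with the total reported by `model.summary()` for the model defined above.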

In [None]:
model.output

In [None]:
number_of_hidden_units_LSTM = 512
max_length_questions = 30

### Image model
model_image = Sequential()
model_image.add(Reshape((img_dim,), input_shape=(img_dim,)))

### Language Model
model_language = Sequential()
model_language.add(LSTM(number_of_hidden_units_LSTM,
                        return_sequences=True,
                        input_shape=(max_length_questions,
                                    word2vec_dim)))
model_language.add(LSTM(number_of_hidden_units_LSTM,
                        return_sequences=True))
model_language.add(LSTM(number_of_hidden_units_LSTM,
                        return_sequences=False))

### Merging the two models
# full_model = Sequential()
# full_model.add(Concatenate([model_language, model_image],
#                   mode='concat',
#                   concat_axis=1))

# full_model = Model([model_language, model_image], model)
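
The commented-out merge above passes Keras 1 style `mode='concat'` arguments, which the `Concatenate` layer in `tf.keras` does not take. A minimal sketch of the same idea with the functional API follows, assuming the image features arrive as a precomputed vector of length `img_dim`. The layer sizes are copied from the cells above; the dense head on top of the concatenation is an assumption, not the author's final design.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hyperparameters copied from the notebook cells above.
img_dim, word2vec_dim = 4096, 300
max_length_questions, number_of_hidden_units_LSTM = 30, 512
nb_classes = 12

# Language branch: three stacked LSTMs over the question token vectors.
question_in = keras.Input(shape=(max_length_questions, word2vec_dim), name='question')
x = layers.LSTM(number_of_hidden_units_LSTM, return_sequences=True)(question_in)
x = layers.LSTM(number_of_hidden_units_LSTM, return_sequences=True)(x)
x = layers.LSTM(number_of_hidden_units_LSTM)(x)

# Image branch: a precomputed feature vector passed through unchanged.
image_in = keras.Input(shape=(img_dim,), name='image')

# Fuse both branches along the feature axis, then classify (this head is an assumption).
merged = layers.Concatenate(axis=1)([x, image_in])
hidden = layers.Dense(1024, activation='tanh')(merged)
hidden = layers.Dropout(0.5)(hidden)
output = layers.Dense(nb_classes, activation='softmax')(hidden)

full_model = keras.Model(inputs=[question_in, image_in], outputs=output)
full_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Forward pass on zero-filled dummy inputs to confirm that the shapes line up.
dummy_q = np.zeros((2, max_length_questions, word2vec_dim), dtype='float32')
dummy_i = np.zeros((2, img_dim), dtype='float32')
probs = full_model([dummy_q, dummy_i])
print(probs.shape)
```

Building the graph from `keras.Input` tensors keeps both branches and the fusion step in one `Model`, so a single `fit` call can take the question and image arrays together as a list.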
