# Custom Medical VQA Test

### Goal: 
Using a pre-trained LLM and a scraped caption-image dataset, train a VQA model that accurately answers questions based on a medical textbook. Explore whether an encoder or a decoder model is more appropriate.

### Textbook Source:
https://drive.google.com/drive/u/2/folders/12mL45XMDRSxhkgMH_PIeQAAsAtbv-X2W

### TODO:
- Question forming: build {question : answer : image} pairings from the scraped dictionary using a coordinate classifier (potentially ask SRI experts).
- Vision + text modalities: use an LLM for the text modality and ??? (ResNet CV) for the image modality (potentially ask SRI experts; maybe ask whether they have any online resources).
- Once the modalities are in place, reference the MedBLIP / Med-PaLM papers and architectures.

### Contact:
- sookim@parc.com, shazarika@parc.com

### Text & Image Pairing + Question Forming

##### Text & Image Pair Classification

In [13]:
import os
import json
import re
import pandas as pd
import numpy as np

In [5]:
# IMPORTANT: Run the scrape notebook first
PDF_URL = "General - Mandell - Core Radiology (1e).pdf"

# The scrape output folder is named after the PDF without its extension
TEXT_DATA_FOLDER_URL = f"book-scrape/scrape_out/{PDF_URL.split('.pdf')[0]}"
assert os.path.exists(TEXT_DATA_FOLDER_URL), f"Missing scrape output: {TEXT_DATA_FOLDER_URL}"

In [53]:
columns = ['ch', 'section', 'pg_start', 'pg_end', 'text', 'image_refs']
frames = []  # one DataFrame per section; concatenated once after the loop

for fjson in os.listdir(TEXT_DATA_FOLDER_URL):
    ch, ftype = fjson.split('.')[0], fjson.split('.')[-1]
    if ftype != 'json':
        continue
    # Convert the number in the file name to a 0-based chapter index
    ch = int(re.search(r'\d+', ch)[0]) - 1
    with open(os.path.join(TEXT_DATA_FOLDER_URL, fjson)) as f:
        data = json.load(f)

    for sec_num, sec in enumerate(data):
        pg_start, pg_end = sec['pg_range'][0], sec['pg_range'][1]
        # Scalars broadcast against image_refs, giving one row per referenced image
        entry = pd.DataFrame({'ch': ch, 'section': sec_num, 'pg_start': pg_start,
                              'pg_end': pg_end, 'text': sec['body'],
                              'image_refs': np.array(sec['images'])})
        frames.append(entry)

# Concatenating once avoids repeatedly copying the growing DataFrame
text_image_df = pd.concat(frames) if frames else pd.DataFrame(columns=columns)
text_image_df = text_image_df.set_index(['ch', 'section']).sort_index()
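
The loop above expects each chapter JSON file to hold a list of sections with `pg_range`, `body`, and `images` keys, and it produces one row per referenced image. The sketch below illustrates that shape on made-up data. The section contents and image paths are hypothetical, and `DataFrame.explode` is used here as a compact stand-in for the per-section broadcast, not as the notebook's own method.

```python
import pandas as pd

# Hypothetical stand-in for one chapter file's parsed JSON
sample_chapter = [
    {"pg_range": [14, 16], "body": "Section A text", "images": ["img_0.png", "img_1.png"]},
    {"pg_range": [17, 18], "body": "Section B text", "images": ["img_2.png"]},
]

rows = pd.DataFrame(
    {
        "ch": 0,
        "section": list(range(len(sample_chapter))),
        "pg_start": [sec["pg_range"][0] for sec in sample_chapter],
        "pg_end": [sec["pg_range"][1] for sec in sample_chapter],
        "text": [sec["body"] for sec in sample_chapter],
        "image_refs": [sec["images"] for sec in sample_chapter],
    }
)

# explode() turns each list of images into one row per image
sample_df = rows.explode("image_refs").set_index(["ch", "section"]).sort_index()
print(sample_df)
```

Section A references two images, so it appears twice in `sample_df`, which matches the repeated rows in the real output.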

In [54]:
# TODO: add x-y classification
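
If "x-y classification" means pairing images with captions by their page coordinates, as the coordinate classifier in the TODO list suggests, one simple baseline is a nearest-center match. This is only a sketch under that assumption: the function names, the `(x0, y0, x1, y1)` box format, and the sample layout are all hypothetical, and the scrape output may not expose coordinates in this form.

```python
from math import hypot


def box_center(box):
    """Return the (x, y) center of an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def pair_images_to_captions(image_boxes, caption_boxes):
    """Map each image id to the caption id whose box center is nearest.

    Assumes caption_boxes is non-empty; several images may share one caption.
    """
    pairs = {}
    for image_id, image_box in image_boxes.items():
        ix, iy = box_center(image_box)

        def distance(caption_id):
            cx, cy = box_center(caption_boxes[caption_id])
            return hypot(cx - ix, cy - iy)

        pairs[image_id] = min(caption_boxes, key=distance)
    return pairs


# Hypothetical page layout: two images side by side, captions directly below
image_boxes = {"img_0": (50, 100, 250, 300), "img_1": (300, 100, 500, 300)}
caption_boxes = {"cap_a": (50, 310, 250, 340), "cap_b": (300, 310, 500, 340)}
pairs = pair_images_to_captions(image_boxes, caption_boxes)
print(pairs)
```

A nearest-center rule ignores page breaks and multi-panel figures, so it is best treated as a starting point for checking pairings by hand.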

In [55]:
text_image_df.head(20)

| ch | section | pg_start | pg_end | text | image_refs |
|---|---|---|---|---|---|
| 0 | 0 | 14 | 16 | Interlobar fi ssures Mechanisms of atelectasis... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 0 | 14 | 16 | Interlobar fi ssures Mechanisms of atelectasis... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 0 | 14 | 16 | Interlobar fi ssures Mechanisms of atelectasis... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 1 | 17 | 18 | Case courtesy Ritu R. Gill, MD, MPH, Brigham a... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 1 | 17 | 18 | Case courtesy Ritu R. Gill, MD, MPH, Brigham a... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 2 | 18 | 19 | Left lower lobe collapse: Frontal and lateral ... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 2 | 18 | 19 | Left lower lobe collapse: Frontal and lateral ... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 3 | 19 | 27 | Right middle lobe atelectasis: Frontal chest r... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 3 | 19 | 27 | Right middle lobe atelectasis: Frontal chest r... | scrape_out/General - Mandell - Core Radiology ... |
| 0 | 3 | 19 | 27 | Right middle lobe atelectasis: Frontal chest r... | scrape_out/General - Mandell - Core Radiology ... |


##### Question Forming (QF)

In [57]:
# TODO: decide how to form questions; reference SRI
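
Until a method is chosen, one possible baseline is template-based question forming: several scraped captions follow a "finding: description" pattern, so the text before the first colon can serve as the answer. The sketch below assumes that pattern. The function name, the fixed question wording, and the sample caption and image path are hypothetical.

```python
def form_question(caption, image_ref):
    """Build one {question, answer, image} record from a 'finding: description' caption.

    Returns None when the caption does not contain a colon-separated finding.
    """
    finding, sep, description = caption.partition(":")
    if not sep or not finding.strip() or not description.strip():
        return None
    return {
        "question": "What finding does this image show?",
        "answer": finding.strip(),
        "image": image_ref,
    }


# Hypothetical caption and image path, modeled on the caption format only
record = form_question(
    "Example finding: description of the imaging views.",
    "images/example.png",
)
print(record)
```

Captions without a colon, such as plain section text, return `None` and would need a different template or manual review.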

### Text & Vision Modalities