# Figure RAG Demo
This notebook demonstrates the project's main functionality. To use the GUI application instead, run the `main.py` file.

In [1]:
import os
import sys

# Add the parent directory to the import path so that the `radqg` package can be imported
sys.path.append("..")
from radqg.utils.rag_utils import prepare_pipeline
import radqg.settings.configs as configs
from radqg.utils.html_utils import retrieve_figures, retrieve_fulltexts
from radqg.utils.langchain_utils import get_all_chunks

### Extracting the figures and text from HTML files 

In [2]:
# List all the HTML files.
# Download the articles you want from the RadioGraphics website as HTML files and
# put them in the toy_data_dir folder. Keep the file and folder names exactly as the
# website saves them. Five sample articles are already provided.

toy_data_dir = configs.TOY_HTML_DATA_DIR

print("Name of the articles: \n")
for file in os.listdir(toy_data_dir):
    if file.endswith(".html"):
        print(file)

Name of the articles: 

Internal Hernias in the Era of Multidetector CT_ Correlation of Imaging and Surgical Findings _ RadioGraphics.html
Role of Multimodality Imaging in Gastroesophageal Reflux Disease and Its Complications, with Clinical and Pathologic Correlation _ RadioGraphics.html
Murphy’s Law_ What Can Go Wrong in the Gallbladder_ Resident and Fellow Education Feature _ RadioGraphics.html
Pearls and Pitfalls in Multimodality Imaging of Colonic Volvulus _ RadioGraphics.html
CT Findings of Acute Small-Bowel Entities _ RadioGraphics.html
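
The figure paths shown later in this notebook point into a `<article name>_files` folder next to each HTML file, which is presumably why the file and folder names must not be changed. The following is a minimal sketch of a sanity check for that layout. The helper `find_missing_asset_dirs` is hypothetical and not part of `radqg`, and the `_files` suffix is an assumption taken from the sample paths.

```python
import tempfile
from pathlib import Path


def find_missing_asset_dirs(data_dir):
    """Return the names of HTML files without a sibling '<stem>_files' folder.

    Hypothetical helper: the '_files' suffix is assumed from the sample paths.
    """
    missing = []
    for html_path in sorted(Path(data_dir).glob("*.html")):
        assets_dir = html_path.with_name(html_path.stem + "_files")
        if not assets_dir.is_dir():
            missing.append(html_path.name)
    return missing


# Demo on a throwaway directory with made-up article names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Article A _ RadioGraphics.html").write_text("<html></html>")
    (root / "Article A _ RadioGraphics_files").mkdir()
    (root / "Article B _ RadioGraphics.html").write_text("<html></html>")
    demo_missing = find_missing_asset_dirs(root)

print(demo_missing)
```

Running the same check on `toy_data_dir` before calling `retrieve_figures` would flag any article that was saved without its image folder.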


In [3]:
# Retrieve a dictionary of all article figures and their corresponding captions

fig_dict = retrieve_figures(toy_data_dir)
list(fig_dict.items())[:5]

[('/research/projects/m221279_Pouria/RadQG/data/html_articles/Internal Hernias in the Era of Multidetector CT_ Correlation of Imaging and Surgical Findings _ RadioGraphics_files/images_medium_rg.2016150113.fig1.gif',
  ['Internal Hernias in the Era of Multidetector CT_ Correlation of Imaging and Surgical Findings _ RadioGraphics.html',
   'Figure 1',
   'Figure 1.Drawing shows the anatomic sites of internal hernias:1= paraduodenal hernia,2= small bowel mesentery–related hernia,3= greater omentum–related hernia,4= lesser sac hernia,5= transverse mesocolon–related hernia,6= pericecal hernia,7= sigmoid mesocolon–related hernia,8= falciform ligament hernia,9= pelvic internal hernia. (Roux-en-Y anastomosis–related hernia is not shown.)']),
 ('/research/projects/m221279_Pouria/RadQG/data/html_articles/Internal Hernias in the Era of Multidetector CT_ Correlation of Imaging and Surgical Findings _ RadioGraphics_files/images_medium_rg.2016150113.fig2a.gif',
  ['Internal Hernias in the Era of Mu
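
Judging from the output above, each entry of `fig_dict` maps an image path to a three-item list: the article file name, the figure label, and the caption. As a rough sketch under that assumption, the figures can be regrouped per article. The dictionary `sample_fig_dict` and the helper `group_figures_by_article` are illustrative stand-ins with made-up values, not part of `radqg`.

```python
from collections import defaultdict

# Hypothetical data shaped like the output above:
# image path -> [article file name, figure label, caption]
sample_fig_dict = {
    "articles/A_files/fig1.gif": ["Article A _ RadioGraphics.html", "Figure 1", "Caption of figure 1."],
    "articles/A_files/fig2a.gif": ["Article A _ RadioGraphics.html", "Figure 2a", "Caption of figure 2a."],
    "articles/B_files/fig1.gif": ["Article B _ RadioGraphics.html", "Figure 1", "Caption of figure 1."],
}


def group_figures_by_article(fig_dict):
    """Group figure records by the article they belong to."""
    grouped = defaultdict(list)
    for image_path, (article, label, caption) in fig_dict.items():
        grouped[article].append(
            {"path": image_path, "label": label, "caption": caption}
        )
    return dict(grouped)


figures_by_article = group_figures_by_article(sample_fig_dict)
for article, figures in figures_by_article.items():
    print(article, [figure["label"] for figure in figures])
```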

In [4]:
# Retrieve the full text of all articles and preview the first 500 characters of the first one

text_dict = retrieve_fulltexts(toy_data_dir)
list(text_dict.items())[0][-1][:500]

'Internal Hernias in the Era of Multidetector CT: Correlation of Imaging and Surgical Findings Clinical diagnosis of internal hernias is challenging because of their nonspecific signs and symptoms. Many types of internal hernias have been defined: paraduodenal, small bowel mesentery–related, greater omentum–related, lesser sac, transverse mesocolon–related, pericecal, sigmoid mesocolon–related, falciform ligament, pelvic internal, and Roux-en-Y anastomosis–related. An internal hernia is a surgica'

### Finding the closest sections of the text to each figure caption using RAG

In [5]:
# Split each article's full text into chunks, keyed by article
docs_dict = {}
for key, article_text in text_dict.items():
    docs = get_all_chunks([article_text])
    print(len(docs))  # number of chunks produced for this article
    docs_dict[key] = docs

27
45
5
3
30
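
To illustrate what "closest sections of the text to a figure caption" means, the sketch below ranks text chunks against a caption using plain bag-of-words cosine similarity. This is a deliberately simple stand-in and an assumption for illustration: the project's pipeline (`prepare_pipeline`) most likely uses learned embeddings and a vector store instead. The functions `cosine_similarity` and `rank_chunks`, as well as the toy chunks and caption, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(caption, chunks, top_k=2):
    """Return the indices of the top_k chunks most similar to the caption."""
    scored = [(cosine_similarity(caption, chunk), i) for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [i for _, i in scored[:top_k]]


# Toy text for illustration only; not taken from the articles.
toy_chunks = [
    "Paraduodenal hernia appears as a cluster of small bowel loops.",
    "Gallbladder wall thickening is a nonspecific finding.",
    "Sigmoid volvulus shows a coffee bean sign on radiographs.",
]
toy_caption = "Axial CT image shows a left paraduodenal hernia."

best = rank_chunks(toy_caption, toy_chunks)
print(best)
```

Word overlap misses synonyms and paraphrases, which is the main reason embedding-based retrieval is usually preferred for this step.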
