# Report Data Extraction (RECIST v1.1)

This demo shows how to perform simple structured data extraction from a radiology report using `LangChain`.

## Example Data

Let's say we have the following radiology report.

In [1]:
ex_report = """
**MDCT OF THE CHEST AND WHOLE ABDOMEN (2nd study, 1st follow up scan)**

**HISTORY:** A 79-year-old woman, known history of NSCLC S/P right upper lobectomy with lung metastasis, was sent for evaluation.
**Technique:** Axial scans of the brain, chest, and whole abdomen were performed with IV contrast material according to the standard divisional protocol.  
**COMPARISON:** Prior baseline CT chest and whole abdomen obtained on 12/11/2023

**FINDINGS:**  
**Tube and line:** None.  
**Lungs and airways:** Evidence of post right upper lobectomy with no significant change of the heterogeneous enhancing soft tissue at the RML, adjacent to the bronchial stump, now measuring about 1.7x2.5 cm in transaxial dimension. The soft tissue mass involves the RML bronchus, causing complete atelectasis of the RML.  
&nbsp;&nbsp;&nbsp; There are slightly decreased in size of several peribronchial part-solid nodules and masses with spiculate borders at the RLL. The target lesion#1 decreases size from 2.84 cm to 2.62 cm in greatest transaxial diameter (target#1; Im 67 Se 210). There is also decreased size of the largest nodule with pleural tagging and peripheral subsegmental atelectasis/fibrosis in the superior segment of the RLL from 3.8x4.6 cm to 3.5x4.0 cm (Im 58 Se 210).  
&nbsp;&nbsp;&nbsp; There is a slight increase in size of the part-solid pulmonary nodule in the superior segment of the RLL (Im 59 Se 210) from 0.68 cm to 1.19 cm. No change of a tiny ground-glass nodule at the apicoposterior segment of the LUL. Also, no change of fibrosis at bilateral basal lungs and the inferior lingular segment of the LUL.  
**Pleura:** No significant change of nodular pleural thickening along the right 5th-6th posterolateral costal pleura, now measuring up to 1.5x3.1 cm (Im 54, Se 601). Minimal loculated right pleural effusion is seen. No pneumothorax is observed.  
**Mediastinum:** No significant mediastinal or hilar lymphadenopathy is seen. Normal heart size without pericardial effusion. Unremarkable visualization of the thoracic aorta and esophagus.  
**Thyroid gland:** No change of several small hypodense nodules in both thyroid glands.  
**Liver and biliary:** Normal attenuation, size, and contour. No intrahepatic duct dilatation. Portal and hepatic veins and IVC are patent. Stable 0.8-cm hypodense lesion at segment V, likely a small cyst. No calcified gallstones, gallbladder wall thickening, or mass. No biliary ductal dilatation.  
**Spleen:** No splenomegaly.  
**Pancreas:** No focal mass or ductal dilatation.  
**Adrenals:** No nodules.  
**Kidneys/ureters:** Normal size, parenchymal enhancement, and excretory function of both kidneys. No change of fat-containing nodules at the upper poles of both kidneys (RK = 1.0 cm, LK = 0.7 cm), likely angiomyolipomas (AMLs). A few tiny left renal cysts are observed. No focal mass, stone, or hydronephrosis.  
**Bladder and pelvic organs:** Unremarkable.  
**GI tract:** No distension or wall thickening.  
**Peritoneum/retroperitoneum:** No free fluid or free air.  
**Abdominal vessels:** Mild aortic atherosclerosis.  
**Chest wall and bony structures:** No change of a small sclerotic lesion at the left posteriolateral 4th rib. Dense sclerotic lesions at the bilateral pelvic bones are likely bone islands.

**Target lesions**  
1. Slightly decreased size of the pulmonary nodules at the posterior basal segment of the RLL, size = 2.62 cm from baseline 2.84 cm (Im 67 Se 210; target lesion#1)  
2. Slightly decreased size of the pulmonary nodules at the anterior basal segment of the RLL, size = 2.88 cm from baseline 3.20 cm (Im 81 Se 210; target lesion#2)

Sum of longest diameter of target lesions = 5.50 cm = 55.0 mm from baseline 60.4 mm (9% decrease in SLD of the two target lesions)  
**Evaluation of target lesions = Stable disease**

**Non-target lesions**  
1. Slightly decreased size of the large non-targeted pulmonary mass (from 3.8x4.6 cm to 3.5x4.0 cm [Im 58 Se 210]), but with interval increase in size of the 1.19-cm part-solid lung nodule in the RLL.  
2. No change of heterogeneous enhancing soft tissue at the RML, adjacent to the bronchial stump (presumable recurrent/residual cancer), size about 1.7x2.5 cm  
3. No change of nodular pleural thickening along the right 5th-6th posterolateral costal pleura (suspected pleural metastasis), size = 3.1 cm.  
**Evaluation of non-target lesions = Equivocal PD**

**Overall response = Stable disease**

**IMPRESSION:**  
- Status post right upper lobectomy with stable size of the soft tissue at the RML near the surgical site, causing atelectasis of the RML. Such findings remain concerning for residual/recurrent lung cancer.  
- Slight improvement of multiple metastatic pulmonary masses in the right lung.  
- Increase in size of the 1.19-cm part-solid nodule in the RLL, uncertain nature, likely reflecting progressive pulmonary metastasis (mixed response) or new primary lung cancer. Interval follow-up is recommended.  
- No change of a tiny ground-glass nodule at the LUL, indeterminate nature. Interval follow-up is helpful.  
- No significant change of nodular pleural thickening between the right 5th and 6th ribs, uncertain nature, possibly pleural metastasis or post-operative change. Interval follow-up is recommended.  
- No intra-thoracic or upper abdominal lymph node enlargement.  
- No change of a small liver cyst, left renal cysts, and bilateral renal AMLs.  
- No change of several small hypodense nodules in both thyroid glands.  
"""

## Goal

The goal is to extract the following RECIST v1.1 attributes from the report:

1. **Time point response:** `overall_response`, `target_response`, and `non_target_response`
2. **Target lesions:** `id`, `name`, `location`, `size`, and `unit`


## Define Pydantic Models

In [2]:
from pydantic import BaseModel, Field
from typing import Literal

### Time Point Response

In [3]:
# Time Point Response

class TimePointResponse(BaseModel):
    """Time point response according to RECIST v1.1"""

    overall_response: str = Field(
        description="Overall response (abbreviation can be used). "
        "For example: Complete response or CR, Partial response or PR, "
        "Stable disease or SD, Progressive disease or PD, Not evaluable or NE",
        default="",
    )
    target_response: str = Field(
        description="Evaluation of the target lesions (abbreviation can be used). "
        "For example: Complete response or CR, Partial response or PR, "
        "Stable disease or SD, Progressive disease or PD",
        default="",
    )
    non_target_response: str = Field(
        # RECIST v1.1 non-target categories differ from target ones (no PR/SD)
        description="Evaluation of the non-target lesions (abbreviation can be used). "
        "For example: Complete response or CR, Non-CR/Non-PD, Progressive disease or PD",
        default="",
    )

### Target Lesion

In [4]:
class TargetLesion(BaseModel):
    """Individual target lesion from a radiology report"""

    id: int | None = Field(
        description="Target lesion identifier number", examples=[1, 2, 3], default=None
    )
    name: str = Field(
        description="Name of the target lesion",
        default="",
        examples=["pulmonary nodule"],
    )
    location: str = Field(description="Location of the target lesion", default="")
    size: float | str | None = Field(
        description="Size of the target lesion", default=None
    )
    unit: Literal["mm", "cm", None] = Field(
        description="Measurement unit of the target lesion", default=None
    )

In [5]:

class TargetLesionCollection(BaseModel):
    """Collection of target lesions from a radiology report"""
    TargetLesions: list[TargetLesion] = Field(description="Array of target lesions")
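`TargetLesionCollection` accepts any number of lesions, whereas RECIST v1.1 limits selection to at most five target lesions in total and at most two per organ. One way to catch extractions that break these limits is a check on the plain-dict output of `model_dump()`. In the sketch below, `check_target_lesions`, `organ_of`, and the keyword table are hypothetical; the table is a deliberate simplification that only recognizes lung lobes and the liver.

```python
from collections import Counter

# Hypothetical location keyword -> organ table; extend for other organs
ORGAN_KEYWORDS = {"RUL": "lung", "RML": "lung", "RLL": "lung", "LUL": "lung", "LLL": "lung", "liver": "liver"}


def organ_of(location: str) -> str:
    """Guess the organ from a free-text location using ORGAN_KEYWORDS."""
    for keyword, organ in ORGAN_KEYWORDS.items():
        if keyword.lower() in location.lower():
            return organ
    return "unknown"


def check_target_lesions(lesions: list[dict]) -> list[str]:
    """Return warnings for RECIST v1.1 target-lesion selection limits."""
    warnings = []
    if len(lesions) > 5:
        warnings.append(f"{len(lesions)} target lesions; RECIST v1.1 allows at most 5")
    ids = [lesion["id"] for lesion in lesions]
    if len(ids) != len(set(ids)):
        warnings.append(f"duplicate target lesion ids: {ids}")
    per_organ = Counter(organ_of(lesion["location"]) for lesion in lesions)
    for organ, count in per_organ.items():
        if organ != "unknown" and count > 2:
            warnings.append(f"{count} target lesions in {organ}; RECIST v1.1 allows at most 2 per organ")
    return warnings
```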


## LLM

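Let's use an open-weight model, Meta's Llama 3.3 70B, served through Groq. Using `model_provider="groq"` requires the `langchain-groq` package and a `GROQ_API_KEY` environment variable.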

In [12]:
from langchain.chat_models import init_chat_model

llm = init_chat_model(model = "llama-3.3-70b-versatile", model_provider="groq", temperature=0)

## Prompt Template

In [7]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a radiology report data extraction assistant."
            "Only extract relevant information from the radiology report."
        ),
        ("human", "{text}"),
    ]
)

In [9]:
# Create Actual Prompt
prompt_report = prompt_template.invoke({"text": ex_report})
prompt_report

ChatPromptValue(messages=[SystemMessage(content='You are a radiology report data extraction assistant.Only extract relevant information from the radiology report.', additional_kwargs={}, response_metadata={}), HumanMessage(content='\n**MDCT OF THE CHEST AND WHOLE ABDOMEN (2nd study, 1st follow up scan)**\n\n**HISTORY:** A 79-year-old woman, known history of NSCLC S/P right upper lobectomy with lung metastasis, was sent for evaluation.\n**Technique:** Axial scans of the brain, chest, and whole abdomen were performed with IV contrast material according to the standard divisional protocol.  \n**COMPARISON:** Prior baseline CT chest and whole abdomen obtained on 12/11/2023\n\n**FINDINGS:**  \n**Tube and line:** None.  \n**Lungs and airways:** Evidence of post right upper lobectomy with no significant change of the heterogeneous enhancing soft tissue at the RML, adjacent to the bronchial stump, now measuring about 1.7x2.5 cm in transaxial dimension. The soft tissue mass involves the RML bro

## Perform Extraction

### Target Lesion

In [13]:
llm_target = llm.with_structured_output(schema=TargetLesionCollection)
target = llm_target.invoke(prompt_report)

In [15]:
target.model_dump()

{'TargetLesions': [{'id': 1,
   'name': 'pulmonary nodule',
   'location': 'posterior basal segment of the RLL',
   'size': 2.62,
   'unit': 'cm'},
  {'id': 2,
   'name': 'pulmonary nodule',
   'location': 'anterior basal segment of the RLL',
   'size': 2.88,
   'unit': 'cm'}]}
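A cheap sanity check on the extraction is to recompute the sum of diameters from the extracted lesions and compare it with the sum the radiologist reported (55.0 mm). The sketch below works on the plain dict returned by `model_dump()` above, so it does not need the LLM or pydantic. `sum_of_diameters_mm` is a hypothetical helper, and it raises rather than guessing when a size is non-numeric or a unit is missing.

```python
# Output of target.model_dump() shown above, copied as a plain dict
extracted = {
    "TargetLesions": [
        {"id": 1, "name": "pulmonary nodule", "location": "posterior basal segment of the RLL", "size": 2.62, "unit": "cm"},
        {"id": 2, "name": "pulmonary nodule", "location": "anterior basal segment of the RLL", "size": 2.88, "unit": "cm"},
    ]
}

MM_PER_UNIT = {"mm": 1.0, "cm": 10.0}


def sum_of_diameters_mm(collection: dict) -> float:
    """Sum lesion sizes in mm; raise if a size or unit cannot be converted."""
    total = 0.0
    for lesion in collection["TargetLesions"]:
        size, unit = lesion["size"], lesion["unit"]
        if not isinstance(size, (int, float)) or unit not in MM_PER_UNIT:
            raise ValueError(f"Lesion {lesion['id']}: cannot convert size={size!r} unit={unit!r}")
        total += size * MM_PER_UNIT[unit]
    return round(total, 1)


reported_sld_mm = 55.0  # "Sum of longest diameter of target lesions = 5.50 cm = 55.0 mm"
print(sum_of_diameters_mm(extracted) == reported_sld_mm)  # True
```

A mismatch here would point to a misread size or a missed lesion before the values are used for response assessment.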

### Time Point Response

In [16]:
llm_timepoint = llm.with_structured_output(schema=TimePointResponse)
timepoint = llm_timepoint.invoke(prompt_report)

In [17]:
timepoint.model_dump()

{'overall_response': 'Stable disease',
 'target_response': 'Stable disease',
 'non_target_response': 'Equivocal PD'}
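The three extracted values can be cross-checked with the RECIST v1.1 overall-response table for patients with target disease. The sketch below encodes a simplified version of that table. `overall_response` is a hypothetical helper, and two inputs are labeled assumptions: the demo does not extract new lesions, so `new_lesions=False` is assumed; and because RECIST requires *unequivocal* progression of non-target disease for PD, the report's "Equivocal PD" is treated here as Non-CR/Non-PD.

```python
def overall_response(target: str, non_target: str, new_lesions: bool) -> str:
    """Simplified RECIST v1.1 overall response for patients with target disease.

    target: "CR", "PR", "SD", "PD" or "NE"
    non_target: "CR", "NON-CR/NON-PD", "PD" or "NE"
    """
    if target == "PD" or non_target == "PD" or new_lesions:
        return "PD"  # progression in any component, or a new lesion
    if target == "NE":
        return "NE"  # target lesions not all evaluated
    if target == "CR":
        return "CR" if non_target == "CR" else "PR"
    return target  # PR or SD while non-target disease is not progressing


# Assumptions: "Equivocal PD" -> "NON-CR/NON-PD"; no new lesions reported
print(overall_response("SD", "NON-CR/NON-PD", new_lesions=False))  # SD
```

Under these assumptions the rule agrees with the extracted `overall_response` of "Stable disease".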