<a href="https://colab.research.google.com/github/christinium/Health/blob/main/EchoReports_to_CSV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Echo Reports to CSV Format
This notebook extracts the "Findings" section from echocardiography reports stored in a CSV file, calculates length statistics for the extracted text, and saves the processed data to a new CSV file.

The output file is used for model training in the notebook *Yet another copy of echo_note_training.ipynb*.

This data can be found at:<br>
https://physionet.org/content/echo-note-to-num/1.0.0/

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set up working directory
import os
WORKING_DIR = '/content/drive/MyDrive/echo_training/'  # Change this to your preferred location
os.makedirs(WORKING_DIR, exist_ok=True)
os.chdir(WORKING_DIR)

Mounted at /content/drive


In [6]:
import pandas as pd

In [2]:
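import re

# Capture everything between 'Findings:' and the first 'Conclusions:' that follows.
# re.DOTALL lets '.' match newlines, because both sections span multiple lines.
FINDINGS_PATTERN = re.compile(r'Findings:(.*?)Conclusions:', re.DOTALL)

def extract_findings(report_text):
    """
    Extract the text between 'Findings:' and 'Conclusions:' in an echo report.

    Returns the stripped findings text, or a placeholder message when either
    header is missing or the input is not a string (e.g. NaN).
    """
    if not isinstance(report_text, str):
        return "Could not find Findings and Conclusions sections"

    match = FINDINGS_PATTERN.search(report_text)
    if match:
        return match.group(1).strip()
    return "Could not find Findings and Conclusions sections"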

# Example usage
report = """
PATIENT/TEST INFORMATION: Indication: Congestive heart failure. Left ventricular function. Shortness of breath. Height: (in) 66 Weight (lb): 177 BSA (m2): 1.90 m2 BP (mm Hg): 150/80 HR (bpm): 60 Status: Inpatient Date/Time: [**2105-11-6**] at 14:57 Test: TTE (Complete) Doppler: Full Doppler and color Doppler Contrast: None Technical Quality: Adequate INTERPRETATION: Findings: LEFT ATRIUM: Mild LA enlargement. RIGHT ATRIUM/INTERATRIAL SEPTUM: Mildly dilated RA. Increased IVC diameter (>2.1cm) with <35% decrease during respiration (estimated RA pressure (10-20mmHg). LEFT VENTRICLE: Estimated cardiac index is normal (>=2.5L/min/m2). TDI E/e' >15, suggesting PCWP>18mmHg. No resting LVOT gradient. RIGHT VENTRICLE: Normal RV chamber size and free wall motion. Paradoxic septal motion consistent with prior cardiac surgery. AORTA: Normal diameter of aorta at the sinus, ascending and arch levels. Focal calcifications in aortic root. No 2D or Doppler evidence of distal arch coarctation. AORTIC VALVE: Mildly thickened aortic valve leaflets (3). No AS. No AR. MITRAL VALVE: Normal mitral valve leaflets with trivial MR. No MVP. TRICUSPID VALVE: Normal tricuspid valve leaflets. Mild [1+] TR. Moderate PA systolic hypertension. PULMONIC VALVE/PULMONARY ARTERY: Normal pulmonic valve leaflet. No PS. Physiologic PR. PERICARDIUM: No pericardial effusion. Conclusions: The left atrium is mildly dilated.
"""

findings = extract_findings(report)
print(findings)

LEFT ATRIUM: Mild LA enlargement. RIGHT ATRIUM/INTERATRIAL SEPTUM: Mildly dilated RA. Increased IVC diameter (>2.1cm) with <35% decrease during respiration (estimated RA pressure (10-20mmHg). LEFT VENTRICLE: Estimated cardiac index is normal (>=2.5L/min/m2). TDI E/e' >15, suggesting PCWP>18mmHg. No resting LVOT gradient. RIGHT VENTRICLE: Normal RV chamber size and free wall motion. Paradoxic septal motion consistent with prior cardiac surgery. AORTA: Normal diameter of aorta at the sinus, ascending and arch levels. Focal calcifications in aortic root. No 2D or Doppler evidence of distal arch coarctation. AORTIC VALVE: Mildly thickened aortic valve leaflets (3). No AS. No AR. MITRAL VALVE: Normal mitral valve leaflets with trivial MR. No MVP. TRICUSPID VALVE: Normal tricuspid valve leaflets. Mild [1+] TR. Moderate PA systolic hypertension. PULMONIC VALVE/PULMONARY ARTERY: Normal pulmonic valve leaflet. No PS. Physiologic PR. PERICARDIUM: No pericardial effusion.
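The regex has two behaviors worth checking before running it on the whole dataset: the non-greedy `(.*?)` stops at the *first* `Conclusions:`, and a report whose `Findings:` header is immediately followed by `Conclusions:` yields an empty string. The latter may be one source of the `min 0.0` in the length statistics further down. The sketch below uses a local copy of `extract_findings` and three made-up toy reports (not taken from the dataset) to show these cases.

```python
import re

# Local copy of the notebook's extract_findings so this cell runs on its own.
def extract_findings(report_text):
    match = re.search(r'Findings:(.*?)Conclusions:', report_text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return "Could not find Findings and Conclusions sections"

# Hypothetical toy reports, invented for illustration.
empty_section = "INTERPRETATION: Findings: Conclusions: Normal study."
no_conclusions = "INTERPRETATION: Findings: LEFT ATRIUM: Normal LA size."
two_conclusions = "Findings: LV normal. Conclusions: A. Conclusions: B."

print(repr(extract_findings(empty_section)))    # ''
print(repr(extract_findings(no_conclusions)))   # the placeholder message
print(repr(extract_findings(two_conclusions)))  # 'LV normal.'
```

Note that the placeholder message is an ordinary string, so `str.len()` counts it as findings text unless it is filtered out.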


In [7]:
# Load the source CSV from the working directory
df = pd.read_csv(os.path.join(WORKING_DIR, 'echo_dataset.csv'))
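The `df.head()` output at the end of this notebook contains an `Unnamed: 0` column. pandas produces that column when a CSV was written with its index (the `to_csv` default) and read back without `index_col`. Assuming that is how `echo_dataset.csv` was created, the sketch below reproduces the effect on a hypothetical two-row frame and shows how `index_col=0` avoids it.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for echo_dataset.csv.
original = pd.DataFrame({"text": ["report a", "report b"], "labels": ["[0, 1]", "[1, 0]"]})

# Writing with the default index=True stores the index as an unnamed first column.
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back without index_col turns that column into 'Unnamed: 0' ...
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))     # ['Unnamed: 0', 'text', 'labels']

# ... while index_col=0 restores it as the index instead.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(list(restored.columns))  # ['text', 'labels']
```

If that column carries no meaning for training, passing `index_col=0` when loading would keep it out of the file saved with `index=False`.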


In [9]:
# Apply the extract_findings function to create the new column
df['text_trunc'] = df['text'].apply(extract_findings)

# Optional: Check the results
print(f"Shape of dataframe: {df.shape}")
print(f"\nFirst few rows of the new column:")
print(df['text_trunc'].head())

# Optional: Check statistics on the length of extracted findings
df['findings_length'] = df['text_trunc'].str.len()
print(f"\nLength statistics of extracted findings:")
print(df['findings_length'].describe())

# Save the updated dataframe to a new CSV file
output_path = '/content/drive/MyDrive/echo_training/echo_dataset_with_findings.csv'
df.to_csv(output_path, index=False)
print(f"\nSaved updated dataframe to: {output_path}")

Shape of dataframe: (45794, 3)

First few rows of the new column:
0    LEFT ATRIUM: The left atrium is normal in size...
1    LEFT ATRIUM: Mild LA enlargement.\n\nLEFT VENT...
2    LEFT VENTRICLE: Normal regional LV systolic fu...
3    This study was compared to the report of the p...
4    LEFT ATRIUM: Normal LA and RA cavity sizes.\n\...
Name: text_trunc, dtype: object

Length statistics of extracted findings:
count    45794.000000
mean      1015.886535
std        420.304926
min          0.000000
25%        793.000000
50%       1007.000000
75%       1252.000000
max       3312.000000
Name: findings_length, dtype: float64

Saved updated dataframe to: /content/drive/MyDrive/echo_training/echo_dataset_with_findings.csv
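The summary above reports `min 0.000000`, so at least one report has an empty findings section. Any report missing the headers would instead receive the placeholder message, which `describe()` treats as normal text. The sketch below counts both groups on a hypothetical toy frame named `toy` (deliberately not `df`, so it does not overwrite the real data); the same two masks could be applied to `df`. Whether to drop these rows before training is a modeling choice, not something this notebook decides.

```python
import pandas as pd

PLACEHOLDER = "Could not find Findings and Conclusions sections"

# Hypothetical toy frame shaped like df after the extraction cell.
toy = pd.DataFrame({
    "text_trunc": ["LEFT ATRIUM: Normal.", "", PLACEHOLDER, "AORTA: Normal."],
})
toy["findings_length"] = toy["text_trunc"].str.len()

# Rows where extraction failed, or matched but captured nothing.
failed = toy["text_trunc"] == PLACEHOLDER
empty = toy["findings_length"] == 0
print(f"Extraction failed: {failed.sum()}, empty findings: {empty.sum()}")

# One option: drop both groups before saving the training file.
clean = toy[~(failed | empty)].reset_index(drop=True)
print(clean["text_trunc"].tolist())  # ['LEFT ATRIUM: Normal.', 'AORTA: Normal.']
```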


In [10]:
df.head()

| Unnamed: 0 | text | labels | text_trunc | findings_length |
|---|---|---|---|---|
| 0 | PATIENT/TEST INFORMATION:\nIndication: Left ve... | [0, 0, 0, 0, 0, 0, 0, 0, 0, -3, -3, -3, 0, 0, ... | LEFT ATRIUM: The left atrium is normal in size... | 680 |
| 1 | PATIENT/TEST INFORMATION:\nIndication: Endocar... | [1, 0, 0, 0, 1, 0, -3, 0, 0, 0, 0, -3, 0, 1, 0... | LEFT ATRIUM: Mild LA enlargement.\n\nLEFT VENT... | 976 |
| 2 | PATIENT/TEST INFORMATION:\nIndication: Left ve... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, ... | LEFT VENTRICLE: Normal regional LV systolic fu... | 567 |
| 3 | PATIENT/TEST INFORMATION:\nIndication: Endocar... | [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 3, 1, 0, ... | This study was compared to the report of the p... | 1017 |
| 4 | PATIENT/TEST INFORMATION:\nIndication: Left ve... | [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | LEFT ATRIUM: Normal LA and RA cavity sizes.\n\... | 1091 |
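The `labels` values above are stored in the CSV as quoted text such as `"[0, 0, 0, ...]"`, so reading the saved file back with `pd.read_csv` returns them as strings rather than lists. The training notebook's loading code is not shown here; assuming it needs numeric label vectors, a minimal sketch using `ast.literal_eval` on a hypothetical one-row CSV looks like this:

```python
import ast
import io
import pandas as pd

# Hypothetical CSV text mimicking the saved file's 'labels' column.
csv_text = 'text_trunc,labels\n"LEFT ATRIUM: Normal.","[0, 0, -3, 1]"\n'

loaded = pd.read_csv(io.StringIO(csv_text))
print(type(loaded.loc[0, "labels"]))  # <class 'str'>

# ast.literal_eval parses the list literal back into a Python list of ints
# without executing arbitrary code (unlike eval).
loaded["labels"] = loaded["labels"].apply(ast.literal_eval)
print(loaded.loc[0, "labels"])  # [0, 0, -3, 1]
```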
