# Rule-Based CDSS Summarization

This notebook generates medical summaries using a rule-based Clinical Decision Support System (CDSS) for comparison with RAG-based and ML-based approaches.

## Approach
- **Method**: Rule-based extraction using regex patterns and keyword matching
- **Components**: Symptom extraction, vital signs extraction, medication extraction
- **Assessment**: Clinical decision rules based on symptom combinations
- **Input**: Doctor-patient conversations
- **Output**: Plain-text medical summaries, comparable with the RAG and ML outputs
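
The assessment step can be pictured as an ordered rule table in which the first rule whose required symptoms are all present wins. The sketch below is illustrative only: `RULES`, `NON_SPECIFIC`, `NO_SYMPTOMS` and `first_matching_assessment` are hypothetical names, and only three of the notebook's rules are reproduced.

```python
# Illustrative sketch: all names here are hypothetical, not part of the notebook.
# Each rule pairs a set of required symptom keys with an assessment string.
RULES = [
    ({"chest_pain", "breathlessness"},
     "Symptoms suggest possible acute coronary syndrome or cardiac event."),
    ({"fever", "cough"},
     "Findings are consistent with a respiratory infection."),
    ({"fever", "diarrhea", "vomiting"},
     "Symptoms suggest gastroenteritis or gastrointestinal infection."),
]

NON_SPECIFIC = "Patient presents with non-specific symptoms requiring further evaluation."
NO_SYMPTOMS = "No major symptoms detected in the conversation."


def first_matching_assessment(symptoms):
    """Return the assessment of the first rule whose required symptoms are all present."""
    present = set(symptoms)
    for required, assessment in RULES:
        if required <= present:  # subset test: every required symptom was found
            return assessment
    # No rule fired: distinguish "some symptoms" from "no symptoms at all".
    return NON_SPECIFIC if present else NO_SYMPTOMS


print(first_matching_assessment(["fever", "vomiting", "diarrhea"]))
```

Because evaluation stops at the first match, rule order encodes priority: a patient with fever, cough, diarrhea and vomiting is assigned the respiratory assessment, not the gastrointestinal one.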


In [1]:
import pandas as pd
import numpy as np
import re
from tqdm import tqdm
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")


Libraries imported successfully!


## 1. Load Conversation Data


In [2]:
# Load the conversation data
df = pd.read_csv("/home/root495/Inexture/CDSS-RAG/data/raw/conversation_summary.csv")

# Extract first 15 rows (same as other notebooks)
df = df.head(15)

print(f"Loaded {len(df)} conversations")
print(f"Columns: {df.columns.tolist()}")
print("\nFirst conversation preview:")
print(df['conversation'].iloc[0][:200] + "...")


Loaded 15 conversations
Columns: ['conversation', 'summary']

First conversation preview:
Doctor: Hello? Hi. Um, should we start? Yeah, okay. Hello how um. Good morning sir, how can I help you this morning? Patient: Hello, how are you? Patient: Oh hey, um, I've just had some diarrhea for t...


## 2. Implement Symptom Extraction Rules


In [3]:
# Define symptom dictionary with keywords
SYMPTOMS = {
    "fever": ["fever", "high temperature", "pyrexia", "temperature", "hot", "feverish"],
    "cough": ["cough", "sputum", "productive cough", "coughing"],
    "chest_pain": ["chest pain", "tightness in chest", "pressure in chest", "chest discomfort"],
    "breathlessness": ["shortness of breath", "breathlessness", "dyspnea", "short of breath", "sob", "winded"],
    "fatigue": ["fatigue", "tired", "weakness", "lethargic", "weak", "tiredness"],
    "headache": ["headache", "migraine", "head pain"],
    "dizziness": ["dizzy", "dizziness", "lightheaded"],
    "night_sweats": ["night sweats", "sweating at night"],
    "vomiting": ["vomit", "vomiting", "nausea", "nauseous", "throwing up"],
    "diarrhea": ["diarrhea", "diarrhoea", "loose stools", "loose stool", "watery stool"],
    "abdominal_pain": ["tummy pain", "abdominal pain", "stomach pain", "belly pain", "lower abdomen", "abdo pain"],
    "skin_rash": ["rash", "red skin", "itchy skin", "dry skin", "sore skin", "cracked skin"],
    "sore_throat": ["sore throat", "throat pain", "painful throat"],
    "runny_nose": ["runny nose", "nasal congestion", "stuffy nose", "congested"],
    "muscle_pain": ["muscle pain", "myalgia", "joint pain", "arthralgia", "stiffness"]
}

def extract_symptoms(text):
    """Return symptom keys whose keywords occur in the text (case-insensitive substring match)."""
    text = text.lower()
    found = []
    for key, words in SYMPTOMS.items():
        for w in words:
            if w in text:
                found.append(key)
                break
    return found

def extract_vitals(text):
    """Extract vital signs from conversation text"""
    bp = re.findall(r'(\d{2,3}/\d{2,3})', text)
    hr = re.findall(r'heart rate\s*:?\s*(\d+)', text.lower())
    temp = re.findall(r'(\d{2}\.\d)\s?(?:c|°c|degrees)', text.lower())
    # Also look for temperature mentions without specific values
    temp_mentioned = bool(re.search(r'temperature|fever|feverish|hot', text.lower()))
    
    return {
        "bp": bp[0] if bp else None,
        "heart_rate": hr[0] if hr else None,
        # findall with a single capture group returns strings, so temp[0] is the full reading
        "temperature": temp[0] if temp else ("mentioned" if temp_mentioned else None)
    }

def extract_medications(text):
    """Extract medications from conversation text"""
    meds = []
    # Common medications with case-insensitive matching
    patterns = [
        r'\bmetformin\b', r'\binsulin\b', r'\baspirin\b', 
        r'\bparacetamol\b', r'\bmetoprolol\b', r'\bamlodipine\b',
        r'\bfurosemide\b', r'\bbisoprolol\b', r'\blisinopril\b',
        r'\binhaler\b', r'\binhalers\b', r'\bibuprofen\b',
        r'\bsteroid\b', r'\bantihistamine\b', r'\bantihistamines\b',
        r'\bco-codamol\b', r'\bnaproxen\b', r'\btrimethoprim\b',
        r'\bnitrofurantoin\b', r'\bamoxicillin\b', r'\bmicrogynon\b',
        r'\bimplanon\b', r'\bdiarolyte\b', r'\bdioralyte\b'
    ]
    for p in patterns:
        matches = re.findall(p, text.lower())
        if matches:
            meds.extend(matches)
    # Remove duplicates while preserving order
    return list(dict.fromkeys(meds))

# Test extraction functions
test_text = df['conversation'].iloc[0]
print("Testing extraction functions on first conversation:")
print(f"\nSymptoms found: {extract_symptoms(test_text)}")
print(f"Vitals found: {extract_vitals(test_text)}")
print(f"Medications found: {extract_medications(test_text)}")


Testing extraction functions on first conversation:

Symptoms found: ['fever', 'fatigue', 'night_sweats', 'vomiting', 'diarrhea', 'abdominal_pain']
Vitals found: {'bp': None, 'heart_rate': None, 'temperature': 'mentioned'}
Medications found: ['paracetamol', 'inhaler', 'inhalers', 'dioralyte']
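
Plain substring tests such as `w in text` also fire inside longer words: "hot" occurs in "shot", "sob" in "sober", and "rash" in "crash". A minimal sketch of a stricter matcher, assuming whole-word matching is wanted, compiles each keyword list into one regex with `\b` word boundaries. `SYMPTOM_KEYWORDS`, `compile_keywords` and `extract_symptoms_strict` are hypothetical names, and the dictionary is a small subset of `SYMPTOMS`.

```python
import re

# Hypothetical subset of the SYMPTOMS dictionary, for illustration only.
SYMPTOM_KEYWORDS = {
    "fever": ["fever", "high temperature", "hot", "feverish"],
    "breathlessness": ["shortness of breath", "short of breath", "sob"],
    "skin_rash": ["rash", "itchy skin"],
}


def compile_keywords(keywords):
    """Build one case-insensitive pattern that matches any keyword as a whole word."""
    # Longest first so multi-word phrases are tried before shorter alternatives.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


SYMPTOM_PATTERNS = {key: compile_keywords(words) for key, words in SYMPTOM_KEYWORDS.items()}


def extract_symptoms_strict(text):
    """Return symptom keys whose keywords occur as whole words in the text."""
    return [key for key, pattern in SYMPTOM_PATTERNS.items() if pattern.search(text)]


sample = "I had a flu shot last week and then a car crash, but I stayed sober."
print(extract_symptoms_strict(sample))  # [] because no keyword appears as a whole word
print(extract_symptoms_strict("I feel hot and short of breath."))
```

The trade-off is that whole-word matching no longer catches inflected forms implicitly, so variants such as "coughing" or "vomited" have to be listed as keywords.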


## 3. Implement Rule-Based Summary Generator


In [4]:
def rule_based_summary(conversation):
    """Generate rule-based medical summary from conversation"""
    symptoms = extract_symptoms(conversation)
    vitals = extract_vitals(conversation)
    medications = extract_medications(conversation)
    
    # Extract duration information
    duration_patterns = [
        r'(\d+)\s*(?:days?|d)\s*(?:ago|hx|history)',
        r'for\s*(?:the\s*)?(?:last\s*)?(\d+)\s*(?:days?|d)',
        r'(\d+)\s*(?:days?|d)\s*(?:of|with)'
    ]
    duration = None
    for pattern in duration_patterns:
        match = re.search(pattern, conversation.lower())
        if match:
            duration = match.group(1)
            break
    
    # Clinical assessments based on symptom combinations
    assessments = []
    
    if "chest_pain" in symptoms and "breathlessness" in symptoms:
        assessments.append("Symptoms suggest possible acute coronary syndrome or cardiac event.")
    elif "fever" in symptoms and "cough" in symptoms:
        assessments.append("Findings are consistent with a respiratory infection.")
    elif "fever" in symptoms and "diarrhea" in symptoms and "vomiting" in symptoms:
        assessments.append("Symptoms suggest gastroenteritis or gastrointestinal infection.")
    elif "night_sweats" in symptoms and "fatigue" in symptoms:
        assessments.append("Symptoms raise concern for tuberculosis or other systemic infection.")
    elif "headache" in symptoms and "vomiting" in symptoms:
        assessments.append("Headache with vomiting may indicate migraine or increased intracranial pressure.")
    elif "skin_rash" in symptoms and "itchy" in conversation.lower():
        assessments.append("Skin findings consistent with dermatitis or eczema flare-up.")
    elif "sore_throat" in symptoms and "runny_nose" in symptoms and "cough" in symptoms:
        assessments.append("Symptoms consistent with viral upper respiratory tract infection.")
    elif "abdominal_pain" in symptoms and "vomiting" in symptoms:
        assessments.append("Abdominal symptoms may indicate gastrointestinal issue requiring further evaluation.")
    elif symptoms:
        assessments.append("Patient presents with non-specific symptoms requiring further evaluation.")
    else:
        assessments.append("No major symptoms detected in the conversation.")
    
    # Build summary in clinical format
    summary_parts = []
    
    # Duration and history
    if duration:
        summary_parts.append(f"{duration}/7 hx of")
    else:
        summary_parts.append("Recent hx of")
    
    # Main symptoms
    if symptoms:
        symptom_list = ', '.join([s.replace('_', ' ') for s in symptoms])
        summary_parts.append(symptom_list + ".")
    else:
        summary_parts.append("no significant symptoms.")
    
    # Vital signs
    vitals_str = "Vital signs: "
    vitals_list = []
    if vitals['bp']:
        vitals_list.append(f"BP {vitals['bp']}")
    if vitals['heart_rate']:
        vitals_list.append(f"HR {vitals['heart_rate']}")
    if vitals['temperature']:
        if vitals['temperature'] == "mentioned":
            vitals_list.append("temperature elevated")
        else:
            vitals_list.append(f"temp {vitals['temperature']}°C")
    if vitals_list:
        summary_parts.append(vitals_str + ", ".join(vitals_list) + ".")
    else:
        summary_parts.append("Vital signs not recorded.")
    
    # Medications
    if medications:
        meds_str = "Current medications: " + ", ".join(medications) + "."
        summary_parts.append(meds_str)
    else:
        summary_parts.append("No medications reported.")
    
    # Assessment
    if assessments:
        summary_parts.append("Imp: " + assessments[0])
    
    # Plan (generic)
    summary_parts.append("Plan: Further evaluation and management as clinically indicated.")
    
    return " ".join(summary_parts)

# Test the summary generator
test_summary = rule_based_summary(df['conversation'].iloc[0])
print("Sample rule-based summary:")
print(test_summary)
print("\nGold summary for comparison:")
print(df['summary'].iloc[0][:200] + "...")


Sample rule-based summary:
Recent hx of fever, fatigue, night sweats, vomiting, diarrhea, abdominal pain. Vital signs: temperature elevated. Current medications: paracetamol, inhaler, inhalers, dioralyte. Imp: Symptoms suggest gastroenteritis or gastrointestinal infection. Plan: Further evaluation and management as clinically indicated.

Gold summary for comparison:
3/7 hx of diarrhea, mainly watery. No blood in stool. Opening bowels x6/day. Associated LLQ pain - crampy, intermittent, nil radiation. Also vomiting - mainly bilous. No blood in vomit. Fever on first...
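
The sample output opens with "Recent hx of" where the gold summary has "3/7 hx of", so none of the duration patterns matched this conversation. One possible cause (an assumption; the truncated preview does not confirm it) is that speakers give durations as words, such as "three days", which `\d+` cannot match. A minimal sketch that accepts digits or number words up to ten follows. `NUMBER_WORDS`, `DURATION_RE` and `extract_duration_days` are hypothetical names.

```python
import re

# Hypothetical helper table: spelled-out numbers from one to ten only.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Either digits or one of the number words, followed by "day" or "days".
_NUMBER = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"
DURATION_RE = re.compile(rf"\b{_NUMBER}\s+days?\b", re.IGNORECASE)


def extract_duration_days(text):
    """Return the first duration in days mentioned in the text, or None."""
    match = DURATION_RE.search(text)
    if not match:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


print(extract_duration_days("I've had diarrhea for the last three days."))
print(extract_duration_days("It started 4 days ago."))
print(extract_duration_days("It started this morning."))
```

The returned integer can be formatted as `f"{days}/7 hx of"` to match the summary style. Vague phrases ("a couple of days") and other units (weeks, months) are outside this sketch.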


## 4. Generate Summaries for All Conversations


In [5]:
# Generate rule-based summaries for all conversations
print("Generating rule-based summaries for all conversations...")

rulebased_summaries = []
for idx, conversation in enumerate(tqdm(df['conversation'], desc="Processing")):
    try:
        summary = rule_based_summary(conversation)
        rulebased_summaries.append(summary)
    except Exception as e:
        print(f"  Error processing conversation {idx + 1}: {e}")
        rulebased_summaries.append("")  # Add empty string on error

# Add rulebased_summary column to dataframe
df['rulebased_summary'] = rulebased_summaries

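# Count only non-empty summaries; conversations that raised an error were stored as ""
n_ok = sum(bool(s) for s in rulebased_summaries)
print(f"\nGenerated {n_ok} summaries successfully!")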

# Display sample results
print("\n" + "="*80)
print("Sample Results:")
print("="*80)
for i in range(min(5, len(df))):
    print(f"\n--- Conversation {i+1} ---")
    print("\nGold Summary:")
    print(df['summary'].iloc[i][:300] + "...")
    print("\nRule-Based Summary:")
    print(df['rulebased_summary'].iloc[i][:300] + "...")


Generating rule-based summaries for all conversations...


Processing: 100%|██████████| 15/15 [00:00<00:00, 80.95it/s]


Generated 15 summaries successfully!

Sample Results:

--- Conversation 1 ---

Gold Summary:
3/7 hx of diarrhea, mainly watery. No blood in stool. Opening bowels x6/day. Associated LLQ pain - crampy, intermittent, nil radiation. Also vomiting - mainly bilous. No blood in vomit. Fever on first day, nil since. Has been feeling lethargic and weak since. Takeaway 4/7 ago - Chinese restaurant. W...

Rule-Based Summary:
Recent hx of fever, fatigue, night sweats, vomiting, diarrhea, abdominal pain. Vital signs: temperature elevated. Current medications: paracetamol, inhaler, inhalers, dioralyte. Imp: Symptoms suggest gastroenteritis or gastrointestinal infection. Plan: Further evaluation and management as clinically...

--- Conversation 2 ---

Gold Summary:
4/7 hx of dry itchy skin, mainly on chest and back. Skin is also sore and cracked. No bleeding or discharge. No spread elsewhere. Not aware of any triggers. Used steroid cream OTC last night with minimal effect. Pt has also tried antihist
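
The side-by-side printout supports only a visual comparison. For a rough numeric check, a unigram-overlap F1 between a generated summary and its gold summary can be computed. This sketch is not the notebook's evaluation method: `tokenize` and `token_f1` are hypothetical helpers, and the tokenizer is a simple lowercase alphanumeric split.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection counts shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


gold = "3/7 hx of diarrhea, mainly watery."
generated = "Recent hx of fever, diarrhea."
print(round(token_f1(generated, gold), 3))  # 0.5: three shared tokens out of 5 and 7
```

Applied row by row to the `rulebased_summary` and `summary` columns, this gives one score per conversation. It rewards shared vocabulary only, so it says nothing about clinical correctness.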




## 5. Save Results


In [6]:
# Save to processed directory
output_path = Path("/home/root495/Inexture/CDSS-RAG/data/processed/cdss_rulebased_output.csv")

# Ensure directory exists
output_path.parent.mkdir(parents=True, exist_ok=True)

# Save dataframe
df.to_csv(output_path, index=False)

print(f"Results saved to: {output_path}")
print(f"Total conversations processed: {len(df)}")
print(f"Columns saved: {df.columns.tolist()}")
print("\nFile saved successfully!")


Results saved to: /home/root495/Inexture/CDSS-RAG/data/processed/cdss_rulebased_output.csv
Total conversations processed: 15
Columns saved: ['conversation', 'summary', 'rulebased_summary']

File saved successfully!
