# Healthcare Feedback Analysis: NLP Pipeline

**Objective**: Convert noisy, free-text patient feedback into structured insights.

| Step | Task | Method |
|------|------|--------|
| 1 | Overall Sentiment | VADER (fast) + Transformer (accurate) |
| 2 | Entity Extraction | Regex patterns for healthcare entities |
| 3 | Entity-wise Sentiment | Context-window + VADER/Transformer |
| 4 | Output | CSV / Excel with all results |

In [None]:
import re
import warnings
import pandas as pd
import numpy as np
import nltk
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

warnings.filterwarnings("ignore")
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

# VADER: fast, rule-based sentiment (no GPU needed)
vader = SentimentIntensityAnalyzer()

print("Setup complete.")

Setup complete.


In [None]:
INPUT_PATH = "C:/Users/HP/Downloads/healthcare_feedback.csv"

df = pd.read_csv(INPUT_PATH)
print(f"Loaded {len(df)} feedback records.")
print(f"Columns: {df.columns.tolist()}")
df.head(3)

Loaded 999 feedback records.
Columns: ['feedback_id', 'feedback_text']


Unnamed: 0,feedback_id,feedback_text
0,500,"I met Dr. Alan Moore, for laparoscopic gallbla..."
1,145,"I met Dr. Riya Patel, at Riverside Health Clin..."
2,785,"at Greenfield Medical Center, for hernia corre..."


In [None]:

ENTITY_PATTERNS = {
    "Doctor": [
        r"(?:Dr\.?|Doctor)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*",   # Dr. Alan Moore
    ],
    "Facility": [
        # "at <Name> Hospital/Clinic/Medical Center/Institute/Campus"
        r"(?:at\s+)?(?:[A-Z][a-z]+\s+){1,4}(?:Hospital|Clinic|Medical\s+Center|Institute|Campus|Health\s+Clinic|Surgery\s+Hospital)",
    ],
    "Surgery": [
        # "for <procedure>": captures multi-word procedure names
        r"(?:for\s+)([a-zA-Z]+(?:\s+[a-zA-Z]+){0,4}\s+(?:surgery|procedure|repair|removal|replacement|fusion|ablation|biopsy|correction|transplant))",
        r"\b(?:surgery|procedure|repair|removal|replacement|fusion|ablation|biopsy|correction|transplant)\b",
    ],
    "Appointment": [
        r"\b(?:appointment|check[\s-]?in|scheduling|reschedul\w+)\b",
    ],
    "Nurse": [
        r"\b(?:nurs\w+|nursing\s+staff|nursing\s+team|RN|registered\s+nurse)\b",
    ],
    "Parking": [
        r"\b(?:parking|parked|parking\s+lot|parking\s+garage|valet)\b",
    ],
}


def extract_entities(text: str) -> list[dict]:
    """Extract healthcare entities from text using regex patterns."""
    entities = []
    seen = set()  # avoid duplicates

    for entity_type, patterns in ENTITY_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                # Use group(1) if there's a capture group, else group(0)
                group = 1 if match.lastindex else 0
                entity_text = match.group(group).strip()

                # Clean up: remove leading "at "/"for "
                entity_text = re.sub(r"^(?:at|for)\s+", "", entity_text, flags=re.IGNORECASE).strip()

                # Skip very short or duplicate entities
                key = (entity_type, entity_text.lower())
                if len(entity_text) < 3 or key in seen:
                    continue
                seen.add(key)

                # Offsets of the cleaned entity text, not of the whole match
                # (the match may include a leading "at "/"for " that was stripped)
                end = match.end(group)
                entities.append({
                    "entity": entity_text,
                    "entity_type": entity_type,
                    "start": end - len(entity_text),
                    "end": end,
                })

    return entities


# --- Quick test ---
sample = df["feedback_text"].iloc[0]
print(f"Text: {sample}\n")
for e in extract_entities(sample):
    print(f"  {e['entity_type']:15s}  →  {e['entity']}")

Text: I met Dr. Alan Moore, for laparoscopic gallbladder removal, check in for the appointment was confusing, nursing staff followed standard checks, I left feeling confused and frustrated maybe I expected more

  Doctor           →  Dr. Alan Moore
  Surgery          →  laparoscopic gallbladder removal
  Surgery          →  removal
  Appointment      →  check in
  Appointment      →  appointment
  Nurse            →  nursing
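
One caveat about the patterns above: `extract_entities` applies `re.IGNORECASE` to every pattern, so `[A-Z][a-z]+` no longer requires a capital letter. The name part of the Doctor pattern can therefore keep consuming ordinary lowercase words whenever no comma stops it. The sketch below shows one possible adjustment, not the notebook's method: it limits case-insensitivity to the title with an inline `(?i:...)` group. `DOCTOR_LOOSE`, `DOCTOR_STRICT`, and the example sentence are made up for illustration.

```python
import re

# Hypothetical sentence (not from the dataset) with no comma after the name.
text = "I saw dr. Alan Moore yesterday and left early"

# Same Doctor pattern as above, compiled the way extract_entities applies it.
DOCTOR_LOOSE = re.compile(
    r"(?:Dr\.?|Doctor)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", re.IGNORECASE
)

# Assumed variant: only the title is case-insensitive; names must be capitalised.
DOCTOR_STRICT = re.compile(
    r"(?i:Dr\.?|Doctor)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"
)

loose = DOCTOR_LOOSE.search(text).group(0)
strict = DOCTOR_STRICT.search(text).group(0)

print("loose :", loose)   # runs on through the lowercase words
print("strict:", strict)  # stops after the capitalised name
```

The stricter variant would miss names typed entirely in lowercase, so which behaviour is preferable depends on how the feedback was entered.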


In [None]:

def vader_label(compound: float) -> str:
    """Convert VADER compound score to positive / negative / neutral."""
    if compound >= 0.05:
        return "positive"
    elif compound <= -0.05:
        return "negative"
    return "neutral"


def get_overall_sentiment(text: str) -> dict:
    """Return overall sentiment label + confidence for the full text."""
    scores = vader.polarity_scores(text)
    label = vader_label(scores["compound"])
    return {"label": label, "confidence": round(abs(scores["compound"]), 4)}


def get_entity_context(text: str, entity: str, window: int = 120) -> str:
    """Extract the sentence / window around where the entity appears."""
    # Try sentence-level first
    sentences = nltk.sent_tokenize(text)
    for sent in sentences:
        if entity.lower() in sent.lower():
            return sent

    # Fallback: character window around first occurrence
    idx = text.lower().find(entity.lower())
    if idx == -1:
        return text  # entity not found literally → use full text
    start = max(0, idx - window)
    end = min(len(text), idx + len(entity) + window)
    return text[start:end]


def get_entity_sentiment(text: str, entity: str) -> dict:
    """Return sentiment for the context surrounding a specific entity."""
    context = get_entity_context(text, entity)
    scores = vader.polarity_scores(context)
    label = vader_label(scores["compound"])
    return {
        "label": label,
        "confidence": round(abs(scores["compound"]), 4),
        "context": context,
    }


# --- Quick test ---
sample = df["feedback_text"].iloc[0]
print("Overall:", get_overall_sentiment(sample))
print("Entity:", get_entity_sentiment(sample, "appointment"))

Overall: {'label': 'negative', 'confidence': 0.7269}
Entity: {'label': 'negative', 'confidence': 0.7269, 'context': 'I met Dr. Alan Moore, for laparoscopic gallbladder removal, check in for the appointment was confusing, nursing staff followed standard checks, I left feeling confused and frustrated maybe I expected more'}
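
In the test above the entity context is the entire feedback text, so the entity score is identical to the overall score. The sample contains no sentence-ending punctuation, which leaves the sentence-level lookup with a single "sentence". One possible refinement, sketched here as an assumption rather than as part of the notebook, is to use comma- or semicolon-separated clauses as the context unit. `get_clause_context` is a hypothetical helper; its result could be scored with `vader.polarity_scores` in place of the sentence.

```python
import re


def get_clause_context(text: str, entity: str) -> str:
    """Return the first comma- or semicolon-delimited clause that mentions `entity`.

    Hypothetical helper; falls back to the full text when no clause matches.
    """
    needle = entity.lower()
    for clause in re.split(r"\s*[,;]\s*", text):
        if needle in clause.lower():
            return clause.strip()
    return text


# First feedback record, as printed in the extraction test.
sample = (
    "I met Dr. Alan Moore, for laparoscopic gallbladder removal, "
    "check in for the appointment was confusing, "
    "nursing staff followed standard checks, "
    "I left feeling confused and frustrated maybe I expected more"
)

for entity in ("Dr. Alan Moore", "appointment", "nursing"):
    print(f"{entity:15s} -> {get_clause_context(sample, entity)}")
```

Splitting on periods is deliberately avoided because it would cut `Dr. Alan Moore` in two. Short clauses may contain no opinion words at all, so this approach likely trades over-attribution for a larger share of neutral labels.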


In [None]:

def analyze_feedback(feedback_id: int, text: str) -> list[dict]:
    """
    For a single feedback record, return:
      - Overall sentiment
      - Each extracted entity with its type and sentiment
    """
    overall = get_overall_sentiment(text)
    entities = extract_entities(text)

    rows = []

    if not entities:
        # No entities found: still record overall sentiment
        rows.append({
            "feedback_id": feedback_id,
            "feedback_text": text,
            "overall_sentiment": overall["label"],
            "overall_confidence": overall["confidence"],
            "entity": None,
            "entity_type": None,
            "entity_sentiment": None,
            "entity_confidence": None,
            "entity_context": None,
        })
    else:
        for ent in entities:
            ent_sent = get_entity_sentiment(text, ent["entity"])
            rows.append({
                "feedback_id": feedback_id,
                "feedback_text": text,
                "overall_sentiment": overall["label"],
                "overall_confidence": overall["confidence"],
                "entity": ent["entity"],
                "entity_type": ent["entity_type"],
                "entity_sentiment": ent_sent["label"],
                "entity_confidence": ent_sent["confidence"],
                "entity_context": ent_sent["context"],
            })

    return rows


# --- Quick test on first 3 rows ---
for _, row in df.head(3).iterrows():
    results = analyze_feedback(row["feedback_id"], row["feedback_text"])
    print(f"\n--- Feedback {row['feedback_id']} ---")
    print(f"Overall: {results[0]['overall_sentiment']} ({results[0]['overall_confidence']})")
    for r in results:
        if r["entity"]:
            print(f"  {r['entity_type']:15s} | {r['entity']:35s} | {r['entity_sentiment']}")


--- Feedback 500 ---
Overall: negative (0.7269)
  Doctor          | Dr. Alan Moore                      | negative
  Surgery         | laparoscopic gallbladder removal    | negative
  Surgery         | removal                             | negative
  Appointment     | check in                            | negative
  Appointment     | appointment                         | negative
  Nurse           | nursing                             | negative

--- Feedback 145 ---
Overall: positive (0.2023)
  Doctor          | Dr. Riya Patel                      | positive
  Facility        | Riverside Health Clinic             | positive
  Surgery         | arthroscopic knee repair            | positive
  Surgery         | repair                              | positive
  Appointment     | appointment                         | positive
  Appointment     | rescheduled                         | positive
  Nurse           | nursing                             | positive

--- Feedback 785 ---
Overall: 

In [None]:

all_results = []

for idx, row in df.iterrows():
    results = analyze_feedback(row["feedback_id"], row["feedback_text"])
    all_results.extend(results)

    if (idx + 1) % 200 == 0:
        print(f"  Processed {idx + 1}/{len(df)} records...")

results_df = pd.DataFrame(all_results)
print(f"\nDone! {len(df)} feedback records → {len(results_df)} output rows.")
results_df.head(10)

  Processed 200/999 records...
  Processed 400/999 records...
  Processed 600/999 records...
  Processed 800/999 records...

Done! 999 feedback records → 4059 output rows.


Unnamed: 0,feedback_id,feedback_text,overall_sentiment,overall_confidence,entity,entity_type,entity_sentiment,entity_confidence,entity_context
0,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,Dr. Alan Moore,Doctor,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
1,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,laparoscopic gallbladder removal,Surgery,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
2,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,removal,Surgery,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
3,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,check in,Appointment,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
4,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,appointment,Appointment,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
5,500,"I met Dr. Alan Moore, for laparoscopic gallbla...",negative,0.7269,nursing,Nurse,negative,0.7269,"I met Dr. Alan Moore, for laparoscopic gallbla..."
6,145,"I met Dr. Riya Patel, at Riverside Health Clin...",positive,0.2023,Dr. Riya Patel,Doctor,positive,0.2023,"I met Dr. Riya Patel, at Riverside Health Clin..."
7,145,"I met Dr. Riya Patel, at Riverside Health Clin...",positive,0.2023,Riverside Health Clinic,Facility,positive,0.2023,"I met Dr. Riya Patel, at Riverside Health Clin..."
8,145,"I met Dr. Riya Patel, at Riverside Health Clin...",positive,0.2023,arthroscopic knee repair,Surgery,positive,0.2023,"I met Dr. Riya Patel, at Riverside Health Clin..."
9,145,"I met Dr. Riya Patel, at Riverside Health Clin...",positive,0.2023,repair,Surgery,positive,0.2023,"I met Dr. Riya Patel, at Riverside Health Clin..."
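
The rows above contain both `laparoscopic gallbladder removal` and the bare `removal` (likewise `arthroscopic knee repair` and `repair`), because the second Surgery pattern matches the keyword on its own. If one row per mention is preferred, a filter along these lines could drop any entity whose text is contained in a longer entity of the same type. `drop_nested_entities` is a hypothetical helper, and the input mirrors the dicts returned by `extract_entities` with the offsets omitted for brevity.

```python
def drop_nested_entities(entities: list[dict]) -> list[dict]:
    """Keep an entity only if no longer entity of the same type contains its text."""
    kept = []
    for ent in entities:
        text = ent["entity"].lower()
        nested = any(
            other is not ent
            and other["entity_type"] == ent["entity_type"]
            and len(other["entity"]) > len(ent["entity"])
            and text in other["entity"].lower()
            for other in entities
        )
        if not nested:
            kept.append(ent)
    return kept


# Entities reported for feedback 500 in the quick test.
found = [
    {"entity": "Dr. Alan Moore", "entity_type": "Doctor"},
    {"entity": "laparoscopic gallbladder removal", "entity_type": "Surgery"},
    {"entity": "removal", "entity_type": "Surgery"},
    {"entity": "check in", "entity_type": "Appointment"},
    {"entity": "appointment", "entity_type": "Appointment"},
    {"entity": "nursing", "entity_type": "Nurse"},
]

filtered = drop_nested_entities(found)
for ent in filtered:
    print(f"{ent['entity_type']:15s} | {ent['entity']}")
```

This is a text-based check. Comparing the `start` and `end` offsets instead would be stricter, since it would only drop a short entity that actually lies inside the longer match.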


In [None]:

CSV_OUTPUT  = "healthcare_feedback_results.csv"
XLSX_OUTPUT = "healthcare_feedback_results.xlsx"

# --- CSV ---
results_df.to_csv(CSV_OUTPUT, index=False)
print(f"Saved → {CSV_OUTPUT}")

# --- Excel (with formatting) ---
with pd.ExcelWriter(XLSX_OUTPUT, engine="openpyxl") as writer:
    # Sheet 1: Full results (one row per entity)
    results_df.to_excel(writer, sheet_name="Detailed Results", index=False)

    # Sheet 2: Summary per feedback (overall sentiment + entity count)
    summary = (
        results_df
        .groupby(["feedback_id", "overall_sentiment"])
        .agg(
            entities_found=pd.NamedAgg(column="entity", aggfunc="count"),
            entity_types=pd.NamedAgg(column="entity_type", aggfunc=lambda x: ", ".join(sorted(set(x.dropna())))),
        )
        .reset_index()
    )
    summary.to_excel(writer, sheet_name="Summary", index=False)

    # Sheet 3: Entity-type level aggregation
    entity_agg = (
        results_df[results_df["entity"].notna()]
        .groupby(["entity_type", "entity_sentiment"])
        .size()
        .unstack(fill_value=0)
        .reset_index()
    )
    entity_agg.to_excel(writer, sheet_name="Entity Sentiment Agg", index=False)

print(f"Saved → {XLSX_OUTPUT}  (3 sheets: Detailed Results, Summary, Entity Sentiment Agg)")

Saved → healthcare_feedback_results.csv
Saved → healthcare_feedback_results.xlsx  (3 sheets: Detailed Results, Summary, Entity Sentiment Agg)


In [None]:
print("=" * 60)
print("HEALTHCARE FEEDBACK ANALYSIS: SUMMARY")
print("=" * 60)

# 1) Overall sentiment distribution
print("\n📊 Overall Sentiment Distribution:")
overall_counts = results_df.drop_duplicates("feedback_id")["overall_sentiment"].value_counts()
for label, count in overall_counts.items():
    pct = count / overall_counts.sum() * 100
    print(f"  {label:10s}  {count:4d}  ({pct:.1f}%)")

# 2) Entity type frequency
print("\n🏷️ Entity Types Extracted:")
entity_counts = results_df[results_df["entity"].notna()]["entity_type"].value_counts()
for etype, count in entity_counts.items():
    print(f"  {etype:15s}  {count:4d}")

# 3) Entity-level sentiment breakdown
print("\n🔍 Sentiment by Entity Type:")
entity_sent = (
    results_df[results_df["entity"].notna()]
    .groupby(["entity_type", "entity_sentiment"])
    .size()
    .unstack(fill_value=0)
)
print(entity_sent.to_string())

# 4) Most negative entities (actionable insights)
print("\n⚠️ Most Mentioned Negative Entities (Top 10):")
neg = results_df[(results_df["entity_sentiment"] == "negative") & (results_df["entity"].notna())]
top_neg = neg.groupby(["entity_type", "entity"]).size().sort_values(ascending=False).head(10)
for (etype, ename), count in top_neg.items():
    print(f"  {etype:15s} | {ename:35s} | mentioned {count}x")

print("\n" + "=" * 60)
print("Analysis complete. Results saved to CSV and Excel.")

HEALTHCARE FEEDBACK ANALYSIS: SUMMARY

📊 Overall Sentiment Distribution:
  negative     498  (49.8%)
  positive     453  (45.3%)
  neutral       48  (4.8%)

🏷️ Entity Types Extracted:
  Surgery          1273
  Appointment       913
  Doctor            720
  Facility          523
  Nurse             392
  Parking           223

🔍 Sentiment by Entity Type:
entity_sentiment  negative  neutral  positive
entity_type                                  
Appointment            469       53       391
Doctor                 369       33       318
Facility               274       25       224
Nurse                  194       21       177
Parking                104       13       106
Surgery                633       50       590

⚠️ Most Mentioned Negative Entities (Top 10):
  Appointment     | appointment                         | mentioned 305x
  Nurse           | nurse                               | mentioned 107x
  Parking         | parking                             | mentio
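
Raw counts favour entity types that are mentioned often, so it can help to compare the share of negative mentions instead. The sketch below does that arithmetic on the counts printed under "Sentiment by Entity Type". The names `counts` and `negative_share` are illustrative; inside the notebook the same ratio could presumably be computed from `entity_sent` directly.

```python
# (negative, neutral, positive) counts copied from the printed table.
counts = {
    "Appointment": (469, 53, 391),
    "Doctor": (369, 33, 318),
    "Facility": (274, 25, 224),
    "Nurse": (194, 21, 177),
    "Parking": (104, 13, 106),
    "Surgery": (633, 50, 590),
}

# Fraction of each entity type's mentions that were labelled negative.
negative_share = {
    etype: neg / (neg + neu + pos) for etype, (neg, neu, pos) in counts.items()
}

# Print entity types from highest to lowest negative share.
for etype, share in sorted(negative_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{etype:15s} {share:.1%}")
```

On these counts Facility has the highest negative share and Parking the lowest, although all six types fall within a few percentage points of one another.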

In [None]:

# Show a nicely formatted sample for one feedback
sample_id = df["feedback_id"].iloc[0]
sample_rows = results_df[results_df["feedback_id"] == sample_id]

print(f"Feedback ID: {sample_id}")
print(f"Text: {sample_rows.iloc[0]['feedback_text']}")
print(f"Overall Sentiment: {sample_rows.iloc[0]['overall_sentiment']} "
      f"(confidence: {sample_rows.iloc[0]['overall_confidence']})")
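# Count only rows that carry an entity (a feedback with no entities still has one row)
print(f"\nEntities found: {sample_rows['entity'].notna().sum()}")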
print("-" * 80)
print(f"{'Entity Type':15s} | {'Entity':35s} | {'Sentiment':10s} | {'Confidence'}")
print("-" * 80)
for _, r in sample_rows.iterrows():
    if pd.notna(r["entity"]):  # also skips NaN, which a plain truthiness check would let through
        print(f"{r['entity_type']:15s} | {r['entity']:35s} | {r['entity_sentiment']:10s} | {r['entity_confidence']}")

Feedback ID: 500
Text: I met Dr. Alan Moore, for laparoscopic gallbladder removal, check in for the appointment was confusing, nursing staff followed standard checks, I left feeling confused and frustrated maybe I expected more
Overall Sentiment: negative (confidence: 0.7269)

Entities found: 6
--------------------------------------------------------------------------------
Entity Type     | Entity                              | Sentiment  | Confidence
--------------------------------------------------------------------------------
Doctor          | Dr. Alan Moore                      | negative   | 0.7269
Surgery         | laparoscopic gallbladder removal    | negative   | 0.7269
Surgery         | removal                             | negative   | 0.7269
Appointment     | check in                            | negative   | 0.7269
Appointment     | appointment                         | negative   | 0.7269
Nurse           | nursing                             | negative   | 0.7269
