# 🏥 CMC Vellore Health Package Summarizer

**My First LLM Project!**

This project scrapes health checkup package data from [DocOPD - CMC Vellore](https://www.docopd.com/en-in/lab/cmc-vellore) and uses an LLM to extract, organize, and rate all the packages with their prices.

### How it works:
1. **Scrape** the webpage using `requests` + `BeautifulSoup`
2. **Extract** the text content from the page
3. **Send** the content to an LLM with a system prompt
4. **Display** a clean summary with prices and ratings

### 📦 Cell 2: Imports & Setup
This cell imports all the libraries we need: `requests` and `BeautifulSoup` for web scraping, `dotenv` to load the API key from a `.env` file, `OpenAI` to talk to the LLM via OpenRouter, and `IPython.display` to render markdown output. It then loads and validates the API key and initializes the OpenAI client.

In [4]:
# Step 1: Imports and Setup

import os
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

# Load API key
load_dotenv(override=True)
api_key = os.getenv('OPENROUTER_API_KEY')

if not api_key:
    print("❌ No API key found - please add OPENROUTER_API_KEY to your .env file!")
elif not api_key.startswith("sk-or-"):
    print("⚠️ API key found but doesn't start with sk-or-; check your key")
else:
    print("✅ API key loaded successfully!")

# Initialize OpenAI client pointing to OpenRouter
openai = OpenAI(api_key=api_key, base_url="https://openrouter.ai/api/v1")
print("✅ OpenAI client ready!")

✅ API key loaded successfully!
✅ OpenAI client ready!


### 🌐 Cell 4: Scrape the Website
This cell defines a function `scrape_health_packages()` that:
1. Sends an HTTP GET request to the DocOPD URL (pretending to be a Chrome browser using a `User-Agent` header)
2. Parses the raw HTML using **BeautifulSoup**
3. Removes junk elements like scripts, styles, images, navbars, and footers
4. Extracts only the **visible text** from the page
5. Truncates the result to 10,000 characters to keep the prompt small (anything beyond that limit is dropped, so raise `max_chars` if a long page loses packages)

It then calls this function on the CMC Vellore URL and prints a preview of the scraped content.
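
To see what steps 2 to 4 mean without touching the network, here is a minimal, dependency-free sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it is not the scraper used in this notebook. The sample HTML, the `VisibleTextParser` class, and the `visible_text` helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents we treat as junk (a subset of the list used in the notebook)
SKIP_TAGS = {"script", "style", "nav", "footer"}

# Hypothetical sample page, invented for this illustration
SAMPLE_HTML = """
<html>
  <head><title>Demo Packages</title></head>
  <body>
    <nav>Home | Labs</nav>
    <script>console.log("tracking");</script>
    <h1>Basic Panel</h1>
    <p>799</p>
    <footer>Contact us</footer>
  </body>
</html>
"""

class VisibleTextParser(HTMLParser):
    """Collect text that is not nested inside any tag listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while we are inside a skipped tag
        self.chunks = []      # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)

def visible_text(html, max_chars=10000):
    """Return newline-joined visible text, truncated to max_chars."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_chars]

cleaned = visible_text(SAMPLE_HTML)
print(cleaned)
```

The navigation bar, the script, and the footer disappear, while the title, package name, and price survive. The BeautifulSoup version in the next cell follows the same idea with far less code.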

In [5]:
# Step 2: Scrape the CMC Vellore health packages page
# We use a custom scraper here (not the default one) to get MORE content (up to 10,000 chars)

def scrape_health_packages(url, max_chars=10000):
    """Fetch and clean webpage content, optimized for health package data."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    }
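    # Time out instead of hanging, and fail loudly on HTTP errors (e.g. 403 or 404)
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    
    # soup.title.string can be None (e.g. an empty <title>), so guard against that too
    title = soup.title.string if soup.title and soup.title.string else "No title found"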
    
    # Remove irrelevant elements
    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input", "nav", "footer"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    
    return (title + "\n\n" + text)[:max_chars]

url = "https://www.docopd.com/en-in/lab/cmc-vellore"
cmc_content = scrape_health_packages(url)

print(f"📄 Fetched {len(cmc_content)} characters from CMC Vellore page")
print("\n--- Preview (first 500 chars) ---\n")
print(cmc_content[:500])

📄 Fetched 10000 characters from CMC Vellore page

--- Preview (first 500 chars) ---

Full Body Health Checkup Packages in CMC (Vellore) - DOCOPD in CMC (Vellore)

100% Safe & Hygienic
Home Sample Pick Up
View Reports Online
Free Doctor Consultation
Best Prices Guaranteed
Best Health Packages in CMC (Vellore)
65% Off
Basic Panel in CMC (Vellore)
Test
• Include 83 Parameters
CBC,FBS,KFT,LFT,LIPID,TSH,URINE R/M
799
2270
Read More
Add to Cart
60% Off
Winter Plus Health Checkup in CMC (Vellore)
Test
• Include 93 Parameters
Cbc,Esr,Fasting Sugar,Tsh,Lipid,Lipid,Thyroid,Iron Profile,Hb


### 🤖 Cell 6: Send to LLM for Analysis
This is the core of the project! It:
1. Creates a **system prompt** that tells the LLM to act as a medical information assistant and specifies the output format we want (markdown table, star ratings, recommendations)
2. Creates a **user prompt** that pastes in the scraped website text and asks the LLM to extract the packages and rate them
3. Sends both prompts to the `openai/gpt-oss-120b:free` model through the OpenRouter API
4. Displays the LLM's response as rendered markdown with tables, ratings, and recommendations
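
The message structure from steps 1 to 3 can be sketched without calling the API. The `build_messages` helper below is hypothetical (the notebook builds the list inline), and the sample prompt and page text are placeholders.

```python
def build_messages(system_prompt, page_text):
    """Return the two-message list expected by a chat completion request."""
    user_prompt = (
        "Here is the scraped content from a health packages page.\n"
        "Please extract all health packages, list their prices, and rate them:\n\n"
        f"{page_text}"
    )
    return [
        # The system message sets the role and the output format
        {"role": "system", "content": system_prompt},
        # The user message carries the scraped page text
        {"role": "user", "content": user_prompt},
    ]

# Placeholder inputs, for illustration only
messages = build_messages("You are a helpful assistant.", "Basic Panel\n799\n2270")
for message in messages:
    print(message["role"], "->", len(message["content"]), "characters")
```

A helper like this could be passed as `messages=...` to `openai.chat.completions.create`, which would avoid repeating the list in the bonus cell.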

In [6]:
# Step 3: Send to LLM for analysis and summarization

system_prompt = """You are a helpful medical information assistant specializing in health checkup packages.

You will be given scraped text content from a hospital health packages webpage.
Your task is to:

1. Extract ALL health checkup packages mentioned on the page
2. Present them in a clean markdown table with these columns:
   - Package Name
   - Parameters/Tests Included (count)
   - Key Tests
   - Discounted Price (₹)
   - Original Price (₹)
   - Discount %
3. After the table, add a section rating each package:
   - ⭐⭐⭐⭐⭐ Best Value
   - ⭐⭐⭐⭐ Good Value
   - ⭐⭐⭐ Average
   - ⭐⭐ Below Average
   - ⭐ Not Recommended
4. End with a "💡 Recommendation" section suggesting which package is best for:
   - A young healthy adult (routine checkup)
   - A senior citizen
   - Someone on a budget

Be thorough - don't miss any package. Use ₹ symbol for all prices.
Respond in well-formatted markdown."""

user_prompt = f"""Here is the scraped content from CMC Vellore's health packages page on DocOPD.
Please extract all health packages, list their prices, and rate them:

{cmc_content}"""

print("🤖 Sending to LLM for analysis... (this may take a few seconds)")

response = openai.chat.completions.create(
    model="openai/gpt-oss-120b:free",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
)

print("✅ Response received!\n")
display(Markdown(response.choices[0].message.content))

🤖 Sending to LLM for analysis... (this may take a few seconds)
✅ Response received!



## 📋 Health Check-up Packages from CMC (Vellore) (DocOPD)

| # | Package Name | Parameters / Tests Included | Key Tests (representative) | Discounted Price (₹) | Original Price (₹) | Discount % |
|---|--------------|----------------------------|----------------------------|----------------------|--------------------|------------|
| 1 | **Basic Panel** | 83 | CBC, FBS, KFT, LFT, Lipid, TSH, Urine R/M | 799 | 2 270 | 65 % |
| 2 | **Winter Plus Health Checkup** | 93 | CBC, ESR, Fasting Sugar, TSH, Lipid, Thyroid, Iron Profile, HbA1c, Vitamin D, Vitamin B12, KFT, LFT, Urine R/M | 1 799 | 4 550 | 60 % |
| 3 | **Viral Fever Plan** | 50 | Complete Hemogram, Widal, MP, Urine R/M | 899 | 999 | 10 % |
| 4 | **Hormones Plan** | 4 | LH, FSH, Prolactin, TSH | 1 499 | 1 750 | 14 % |
| 5 | **Antenatal Profile Basic** | 34 | CBC, Blood Group & Rh, Blood Sugar, VDRL, HIV 1 & 2, HBsAg, Anti-HCV, TSH, Urine R/M | 2 250 | 3 170 | 29 % |
| 6 | **Coagulation Profile** | 3 | BT-CT, APTT, PT-INR | 1 249 | 1 249 | 0 % |
| 7 | **CMV Antibodies Panel (IgG & IgM)** | 1 | CMV IgG, CMV IgM | 1 063 | 1 063 | 0 % |
| 8 | **Hepatitis C Virus RNA Quantitative & Genotype Reflex Panel** | 1 | HCV RNA (quantitative) & genotype | 7 499 | 9 000 | 17 % |
| 9 | **Smart Full Body Checkup** | 82 | Comprehensive routine panel (CBC, LFT, KFT, Lipid, Thyroid, etc.) | 999 | 2 120 | 53 % |
|10| **Diabetes Plan** | 12 | Fasting Sugar, HbA1c, Lipid, Kidney profile, etc. | 999 | 1 299 | 23 % |
|11| **Sexual Wellness Plan** | 5 | Hormone profile, STI screening, Lipid, etc. | 1 899 | 2 450 | 22 % |
|12| **Fertility Panel – Male** | 5 | Testosterone, LH, FSH, Prolactin, PSA | 2 499 | 2 550 | 2 % |
|13| **Fertility Panel – Female (2)** | 7 | LH, FSH, Estradiol, Progesterone, Prolactin, TSH, AMH | 3 499 | 3 650 | 4 % |
|14| **Bride & Groom Panel** | 6 | CBC, Blood Group, HIV, VDRL, HBsAg, HCV | 2 999 | 3 300 | 9 % |
|15| **Kids Plan** | 29 | CBC, Growth hormones, Immunisation titers, Lipid, Thyroid, etc. | 1 399 | 2 450 | 43 % |
|16| **Smart Plus Full Body Checkup** | 85 | All tests of Smart Full Body + extra (Bone health, Vitamin D, Tumor markers, etc.) | 1 349 | 2 820 | 52 % |
|17| **Arthritis Profile-1** | 28 | Rheumatoid factor, Anti-CCP, ESR, CRP, Uric acid, Joint X-ray markers | 1 249 | 1 700 | 27 % |
|18| **Iron Study – 2** | 4 | Serum Iron, Ferritin, Total Iron-Binding Capacity, Transferrin Saturation | 1 249 | 1 400 | 11 % |
|19| **Bone & Joints Strength Package** | 34 | Calcium, Vitamin D, Vitamin K, Bone turnover markers, Joint health panel | 4 399 | 4 680 | 6 % |

> **Note:** “Key Tests” lists the most representative investigations mentioned in the package description. For packages that simply state “Includes X tests” the column shows a generic description of the type of tests covered.

---

## ⭐️ Package Ratings

| Rating | Meaning |
|--------|---------|
| ⭐⭐⭐⭐⭐ | **Best Value** – high number of tests + high discount + affordable price |
| ⭐⭐⭐⭐ | **Good Value** – solid test coverage and decent discount |
| ⭐⭐⭐ | **Average** – moderate discount or limited test set |
| ⭐⭐ | **Below Average** – low discount or high price for the number of tests |
| ⭐ | **Not Recommended** – poor price-to-test ratio |

| # | Package Name | Rating |
|---|--------------|--------|
| 1 | Basic Panel | ⭐⭐⭐⭐ |
| 2 | Winter Plus Health Checkup | ⭐⭐⭐ |
| 3 | Viral Fever Plan | ⭐⭐ |
| 4 | Hormones Plan | ⭐⭐⭐ |
| 5 | Antenatal Profile Basic | ⭐⭐⭐ |
| 6 | Coagulation Profile | ⭐ |
| 7 | CMV Antibodies Panel | ⭐ |
| 8 | Hepatitis C RNA Panel | ⭐⭐ |
| 9 | **Smart Full Body Checkup** | **⭐⭐⭐⭐⭐** |
|10| Diabetes Plan | ⭐⭐⭐ |
|11| Sexual Wellness Plan | ⭐⭐⭐ |
|12| Fertility Panel – Male | ⭐⭐ |
|13| Fertility Panel – Female (2) | ⭐⭐ |
|14| Bride & Groom Panel | ⭐⭐ |
|15| **Kids Plan** | ⭐⭐⭐⭐ |
|16| **Smart Plus Full Body Checkup** | **⭐⭐⭐⭐⭐** |
|17| Arthritis Profile-1 | ⭐⭐⭐ |
|18| Iron Study – 2 | ⭐⭐ |
|19| Bone & Joints Strength Package | ⭐⭐ |

**Why the top-rated packages got ⭐⭐⭐⭐⭐**

- **Smart Full Body Checkup** – 82 tests for just ₹999 (53 % off) – excellent breadth at a pocket-friendly price.
- **Smart Plus Full Body Checkup** – 85 tests for ₹1 349 (52 % off) – even more comprehensive while still under ₹1.5 k.

---

## 💡 Recommendation

| Target Group | Suggested Package | Why |
|--------------|-------------------|-----|
| **Young healthy adult (routine check-up)** | **Smart Full Body Checkup** (₹999) | Covers all essential routine labs (CBC, LFT, KFT, lipid, thyroid, vitamin D, etc.) at the best price-to-test ratio. |
| **Senior citizen** | **Winter Plus Health Checkup** (₹1 799) **or** **Smart Plus Full Body Checkup** (₹1 349) | Winter Plus adds ESR, iron profile, HbA1c, vitamin D/B12 – useful for age-related monitoring. Smart Plus adds bone-health markers, which are important for seniors. |
| **Someone on a budget** | **Basic Panel** (₹799) or **Kids Plan** (₹1 399) | Basic Panel gives the core 83 parameters at the lowest absolute cost. Kids Plan is the most economical full-family option with 29 tests for ₹1 399. |

> **Tip:** If you need a specific focus (e.g., diabetes, fertility, joint health), pick the dedicated plan that matches the clinical need; otherwise, the “Smart” full-body packages give the most comprehensive coverage for the money.
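
Since the discount column in the output above comes from an LLM, it is worth a quick arithmetic check against the two price columns. The sketch below assumes the discount is computed as (original − discounted) / original and rounded to a whole percent. The `discount_percent` helper is hypothetical, and the rows are a hand-copied sample of the table, not the full output.

```python
def discount_percent(discounted, original):
    """Percentage saved relative to the original price, rounded to a whole number."""
    return round((original - discounted) / original * 100)

# (package, discounted price, original price, discount % reported in the table)
rows = [
    ("Basic Panel", 799, 2270, 65),
    ("Winter Plus Health Checkup", 1799, 4550, 60),
    ("Smart Full Body Checkup", 999, 2120, 53),
    ("Fertility Panel - Male", 2499, 2550, 2),
]

# Collect any row where the recomputed discount disagrees with the reported one
mismatches = [
    (name, reported, discount_percent(discounted, original))
    for name, discounted, original, reported in rows
    if discount_percent(discounted, original) != reported
]

if mismatches:
    for name, reported, computed in mismatches:
        print(f"{name}: table says {reported}%, prices imply {computed}%")
else:
    print("All sampled discounts match the listed prices.")
```

For these four rows the recomputed values agree with the table. A check like this only confirms that the columns are internally consistent, not that the prices match the website.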

## 🔍 Bonus: Try a different URL!

You can use this summarizer for any hospital on DocOPD. Just change the URL below!

### 🎯 Cell 9: Bonus: Try Any Hospital
This cell lets you enter a **custom URL** from DocOPD (e.g., a different hospital or city). It reuses the same `scrape_health_packages()` function and `system_prompt` from earlier cells, so you can instantly get a package summary for any hospital on the platform. Press Enter to skip if you don't want to try another URL.
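
Because the system prompt is tuned for DocOPD pages, it may help to check the pasted URL before scraping it. The following is a minimal sketch using only the standard library. The `is_docopd_url` helper is hypothetical and is not part of the notebook, and the assumption that every valid page lives on `docopd.com` or one of its subdomains is mine.

```python
from urllib.parse import urlparse

def is_docopd_url(url):
    """Return True if url is an http(s) link on docopd.com or one of its subdomains."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == "docopd.com" or host.endswith(".docopd.com")
    )

print(is_docopd_url("https://www.docopd.com/en-in/lab/cmc-vellore"))  # True
print(is_docopd_url("https://example.com/en-in/lab/cmc-vellore"))     # False
print(is_docopd_url("not a url"))                                     # False
```

In the cell below, a guard such as `if custom_url and is_docopd_url(custom_url):` would skip anything that is not a DocOPD link.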

In [None]:
# Bonus: Try any other hospital or lab - just change the URL!

custom_url = input("Enter a DocOPD health packages URL (or press Enter to skip): ").strip()

if custom_url:
    custom_content = scrape_health_packages(custom_url)
    print(f"📄 Fetched {len(custom_content)} characters\n")
    
    response = openai.chat.completions.create(
        model="openai/gpt-oss-120b:free",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Extract and rate all health packages from this page:\n\n{custom_content}"}
        ]
    )
    display(Markdown(response.choices[0].message.content))
else:
    print("Skipped! You can run this cell again anytime.")