<a href="https://colab.research.google.com/github/2303a52449/GEN_AI_PROJECT/blob/main/GEN_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install dependencies
!pip install pdfplumber transformers spacy -q
!python -m spacy download en_core_web_sm

import pdfplumber
import spacy
from transformers import pipeline
from google.colab import files
from IPython.display import Markdown, display
import re

# Load spaCy NLP model
nlp = spacy.load("en_core_web_sm")

# Upload PDF file
uploaded = files.upload()

# Extract text from PDF
pdf_text = ""
for filename in uploaded:
    with pdfplumber.open(filename) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                pdf_text += page_text + "\n"

# Summarizer pipeline using BART
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Chunk text (transformer models have token limits; max_len counts characters as a rough proxy)
def split_text(text, max_len=1024):
    # Split after sentence-ending periods
    sentences = re.split(r'(?<=[.])\s+', text)
    chunks, chunk = [], ""
    for sentence in sentences:
        if len(chunk) + len(sentence) < max_len:
            chunk += " " + sentence
        else:
            # Skip the empty chunk produced when the first sentence alone reaches max_len
            if chunk.strip():
                chunks.append(chunk.strip())
            chunk = sentence
    if chunk.strip():
        chunks.append(chunk.strip())
    return chunks

chunks = split_text(pdf_text)
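# truncation=True guards against a chunk that still exceeds the model's input limit
summaries = [
    summarizer(chunk, max_length=130, min_length=30, do_sample=False, truncation=True)[0]['summary_text']
    for chunk in chunks
]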
full_summary = " ".join(summaries)

# Extract sections
def extract_section(summary, keywords):
    pattern = '|'.join([re.escape(k) for k in keywords])
    # [^.]* on each side extends the match to the full sentence containing a keyword
    matches = re.findall(rf"[^.]*\b(?:{pattern})\b[^.]*\.", summary, re.IGNORECASE)
    return " ".join(m.strip() for m in matches) if matches else "Not found."

# Section-specific keywords
attention_keywords = ['emergency', 'critical', 'immediate', 'urgent', 'unstable']
previous_keywords = ['treated', 'underwent', 'received', 'was diagnosed', 'previously']
suggested_keywords = ['recommend', 'advise', 'plan', 'consider', 'suggested', 'prescribed']

# Extract sections from the summary
immediate_attention = extract_section(full_summary, attention_keywords)
previous_treatment = extract_section(full_summary, previous_keywords)
suggested_treatment = extract_section(full_summary, suggested_keywords)

# Highlight key medical terms
def highlight_terms(text, terms):
    # Uppercase every case-insensitive occurrence of each term
    for term in terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text = pattern.sub(term.upper(), text)
    return text

medical_terms = ['cancer', 'tumor', 'diabetes', 'hypertension', 'stroke', 'infection', 'surgery', 'chemotherapy']
highlighted_summary = highlight_terms(full_summary, medical_terms)

# Display results
display(Markdown("## 🧠 AI BASED MEDICAL REPORT SUMMARIZER"))
display(Markdown("### 🔴 ISSUE THAT NEEDS IMMEDIATE ATTENTION"))
display(Markdown(immediate_attention))

display(Markdown("### 🕓 PREVIOUSLY RECEIVED TREATMENT"))
display(Markdown(previous_treatment))

display(Markdown("### 💊 SUGGESTED TREATMENT"))
display(Markdown(suggested_treatment))

display(Markdown("### 🌟 HIGHLIGHTED KEY CONTENT"))
display(Markdown(highlighted_summary[:3000]))  # limit preview
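
The chunking step above limits chunks by character count, while the model's limit is expressed in tokens. The following is a minimal sketch of a chunker that packs sentences against a budget measured by a pluggable counting function. The name `split_by_budget`, its parameters, and the word-count default are assumptions made for illustration and are not part of the notebook.

```python
import re
from typing import Callable, List


def split_by_budget(
    text: str,
    max_units: int,
    count: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack sentences into chunks whose measured size stays within max_units.

    `count` measures a piece of text. The default counts whitespace-separated
    words as a stand-in for a real tokenizer (an assumption of this sketch).
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count(candidate) > max_units:
            # Adding this sentence would exceed the budget: close the chunk
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine ten."
print(split_by_budget(sample, max_units=6))
```

With the notebook's pipeline, a token-based counter could plausibly be passed as `count=lambda s: len(summarizer.tokenizer(s)["input_ids"])`, with `max_units` set somewhat below the model's input limit. A single sentence larger than the budget still becomes its own oversized chunk, so truncation at the model remains a useful safeguard.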

Collecting en-core-web-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.




Saving Sample 1.pdf to Sample 1.pdf


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu
Your max_length is set to 130, but your input_length is only 106. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=53)


## 🧠 AI BASED MEDICAL REPORT SUMMARIZER

### 🔴 ISSUE THAT NEEDS IMMEDIATE ATTENTION

Not found.

### 🕓 PREVIOUSLY RECEIVED TREATMENT

Not found.

### 💊 SUGGESTED TREATMENT

Not found.

### 🌟 HIGHLIGHTED KEY CONTENT

Robert Williams, a 58-year-old male, presents with a complex history of multiple chronic conditions. He has Type 2 DIABETES Mellitus, HYPERTENSION, Chronic Kidney Disease (Stage 3), Coronary Artery Disease, and Obesity with a BMI of 34. Two years ago, he was diagnosed with chronic autoimmune kidney disease. He was maintained on dual antiplatelet therapy with aspirin and clopidogrel, along with beta-blockers and statins for secondary prevention. His blood pressure remains uncontrolled at 160/95 mmHg despite current therapy. Mr. Williams will be transitioned to a basal insulin pump to improve glycemic control. A GLP-1 receptors agonist like Semaglutide will be added to address both weight and blood sugar. Antihypertensive therapy will be intensified. The patient was counseled extensively on the importance of strict medication adherence, diet, and exercise. Family involvement was encouraged to support compliance with the management plan. Mr. Williams’ case is critical and demands urgent, coordinated intervention to prevent serious complications and improve overall prognosis.
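
The section extraction in this notebook runs one regular expression over the whole summary. A minimal alternative sketch splits the text into sentences first and then keeps those that mention a keyword, assuming punctuation-based sentence splitting is adequate for the input. The function name `sentences_with_keywords` is hypothetical, and the sample sentences are adapted from the summary above.

```python
import re
from typing import Iterable, List


def sentences_with_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the sentences of `text` that contain any keyword as a whole word or phrase."""
    alternation = "|".join(re.escape(k) for k in keywords)
    keyword_re = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    # Split after ., ! or ? followed by whitespace; abbreviations such as "Mr." also split here
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if keyword_re.search(s)]


summary = (
    "Two years ago, he was diagnosed with chronic autoimmune kidney disease. "
    "Antihypertensive therapy will be intensified. "
    "Family involvement was encouraged to support compliance with the management plan."
)

print(sentences_with_keywords(summary, ["was diagnosed", "previously"]))
print(sentences_with_keywords(summary, ["recommend", "plan"]))
print(sentences_with_keywords(summary, ["emergency", "unstable"]))
```

Working sentence by sentence keeps the keyword pattern simple and makes it easy to see why a sentence was or was not selected. The notebook already loads a spaCy pipeline as `nlp`, so iterating over `nlp(text).sents` could replace the regex split where abbreviations cause false sentence breaks.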