## Handling the PDF

In [1]:
!pip install pymupdf





In [1]:
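import fitz  # PyMuPDF

def extract_text(pdf_path):
    """Return the text of every page in the PDF, concatenated in page order."""
    # Open the document in a context manager so the file handle is released.
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)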

In [2]:
pdf_path = "sample_report.pdf"
text = extract_text(pdf_path)
print(text)

Report Status    
Male
25 Years
:
:
:
:
Age
Gender
Reported        
P
7/11/2023  11:08:00AM
SELF
WM17SPF
Mr.   DUMMY
:
:
:
:
:
Name        
Lab No.    
Ref By 
Collected       
A/c Status 
17/1/2024  10:55:48AM
Revised
Collected at    :
Processed at      :
LPL-ROHINI (NATIONAL REFERENCE LAB)
National Reference laboratory, Block E, Sector 
18, ROHINI
DELHI 110085
LPL-NATIONAL REFERENCE LAB
National Reference laboratory, Block E, 
Sector 18, Rohini, New Delhi -110085
Test Report 
Test Name
Results
Units
Bio. Ref. Interval
SWASTHFIT SUPER 4
LIVER & KIDNEY PANEL, SERUM
1.00
Creatinine
(Modified Jaffe,Kinetic)
 0.70 - 1.30 
mg/dL
107
GFR Estimated
(CKD EPI Equation 2021)
 >59 
mL/min/1.73m2
G1
GFR Category
(KDIGO Guideline 2012)
  
40.00
Urea
(Urease UV)
 13.00 - 43.00 
mg/dL
18.68
Urea Nitrogen  Blood
(Calculated)
 6.00 - 20.00 
mg/dL
19
BUN/Creatinine Ratio
(Calculated)
  
7.00
Uric Acid
(Uricase)
 3.50 - 7.20 
mg/dL
30.0
AST (SGOT)
(IFCC without P5P)
 15.00 - 40.00 
U/L
40.0
ALT (SGPT)
(

In [3]:
def clean_text(text):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())

text = clean_text(text)
print(text)

Report Status
Male
25 Years
:
:
:
:
Age
Gender
Reported
P
7/11/2023  11:08:00AM
SELF
WM17SPF
Mr.   DUMMY
:
:
:
:
:
Name
Lab No.
Ref By
Collected
A/c Status
17/1/2024  10:55:48AM
Revised
Collected at    :
Processed at      :
LPL-ROHINI (NATIONAL REFERENCE LAB)
National Reference laboratory, Block E, Sector
18, ROHINI
DELHI 110085
LPL-NATIONAL REFERENCE LAB
National Reference laboratory, Block E,
Sector 18, Rohini, New Delhi -110085
Test Report
Test Name
Results
Units
Bio. Ref. Interval
SWASTHFIT SUPER 4
LIVER & KIDNEY PANEL, SERUM
1.00
Creatinine
(Modified Jaffe,Kinetic)
0.70 - 1.30
mg/dL
107
GFR Estimated
(CKD EPI Equation 2021)
>59
mL/min/1.73m2
G1
GFR Category
(KDIGO Guideline 2012)
40.00
Urea
(Urease UV)
13.00 - 43.00
mg/dL
18.68
Urea Nitrogen  Blood
(Calculated)
6.00 - 20.00
mg/dL
19
BUN/Creatinine Ratio
(Calculated)
7.00
Uric Acid
(Uricase)
3.50 - 7.20
mg/dL
30.0
AST (SGOT)
(IFCC without P5P)
15.00 - 40.00
U/L
40.0
ALT (SGPT)
(IFCC without P5P)
10.00 - 49.00
U/L
50.0
GGTP
(IFCC)
0
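
The cleaned text above appears to list each test as a value line, a name line, a method line in parentheses, and then (for most tests) a reference range and a unit. Under that assumption, the rows can be parsed and classified directly against the ranges printed in the report. This is a minimal sketch: `parse_lab_rows` and `classify` are hypothetical helper names, and the layout is inferred only from the sample output, so other reports may need different rules.

```python
import re

# A plain number such as "1.00" or "107"
NUMBER_RE = re.compile(r"^\d+(?:\.\d+)?$")
# A closed interval such as "0.70 - 1.30"
RANGE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)$")

def classify(value, low, high):
    """Compare a value with its reference interval."""
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Normal"

def parse_lab_rows(cleaned_text):
    """Parse rows laid out as: value, name, (method), [range, unit].

    Assumption: this ordering is inferred from the sample report only.
    Rows without a closed 'low - high' range get status 'Unknown'.
    """
    lines = cleaned_text.splitlines()
    rows = []
    i = 0
    while i + 2 < len(lines):
        value, name, method = lines[i], lines[i + 1], lines[i + 2]
        looks_like_row = (
            NUMBER_RE.match(value)
            and method.startswith("(")
            and method.endswith(")")
        )
        if not looks_like_row:
            i += 1
            continue
        row = {"name": name, "value": float(value), "low": None,
               "high": None, "unit": None, "status": "Unknown"}
        match = RANGE_RE.match(lines[i + 3]) if i + 3 < len(lines) else None
        if match:
            row["low"], row["high"] = float(match.group(1)), float(match.group(2))
            if i + 4 < len(lines):
                row["unit"] = lines[i + 4]
            row["status"] = classify(row["value"], row["low"], row["high"])
            i += 5  # consumed value, name, method, range, unit
        else:
            i += 3  # consumed value, name, method only
        rows.append(row)
    return rows
```

Calling `parse_lab_rows(text)` on the cleaned text would give one dictionary per recognised test. Rows with a one-sided range such as `>59`, or with no range at all, are deliberately left as `Unknown` rather than guessed.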

## Initializing LLM

In [4]:
with open("apikey.txt", "r") as file:
    api_key = file.read().strip()

print("API Key loaded successfully ✅")

API Key loaded successfully ✅
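
The cell above fails with a `FileNotFoundError` if `apikey.txt` is missing, even when the key is already available in the environment. As a sketch of one alternative, the hypothetical helper `load_api_key` below checks the `GROQ_API_KEY` environment variable first and only then falls back to the file. The function name and the error message are assumptions, not part of the original notebook.

```python
import os
from pathlib import Path

def load_api_key(env_var="GROQ_API_KEY", key_file="apikey.txt"):
    """Return the API key from the environment, else from a local file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        key = path.read_text(encoding="utf-8").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"No API key found in ${env_var} or in the file '{key_file}'."
        )
    return key
```

With this in place, `api_key = load_api_key()` could replace the `open(...)` block, and the key itself is never printed.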


In [5]:
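import os

try:
    # Import inside the try block so a missing package is actually caught below.
    from langchain_groq import ChatGroq

    # Prefer the GROQ_API_KEY environment variable; fall back to the key read from apikey.txt.
    groq_api_key = os.getenv("GROQ_API_KEY", api_key)

    llm = ChatGroq(
        temperature=0,  # minimize randomness for extraction
        groq_api_key=groq_api_key,
        model_name="llama-3.3-70b-versatile",
    )

    # Optional smoke test:
    # response = llm.invoke("Tell me about Nike in 50 words")
    # print(response.content)

except ModuleNotFoundError:
    print("Error: 'langchain-groq' package not found. Install it using 'pip install langchain-groq'.")
except Exception as e:
    print(f"Error: {e}")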

In [6]:
from langchain_core.prompts import PromptTemplate
prompt_extract = PromptTemplate.from_template(
    """###
    MEDICAL REPORT TEXT:
    {page_data}
    
    ### INSTRUCTION:
    The above text is extracted from a medical lab report. Your tasks are:
    1. Extract important test values such as: Hemoglobin, WBC, RBC, Platelet Count, Blood Sugar, etc.
    2. For each value, identify whether it is 'Low', 'Normal', or 'High' based on standard medical ranges (for adults).
    3. If any value is low, suggest 2–3 **organic food items** (fruits, vegetables, or other foods) that can help improve it naturally.
    4. If a value is good, give a small **compliment or motivational message**.
    5. Finally, give an **overall health summary** based on the report.
    6. In the JSON itself, also describe what each test measures and what an abnormal value implies (e.g., what happens if WBC is low).
    7. At the top, include the patient's details.

    ### FORMAT (VALID JSON ONLY):
    {{
        "tests": [
            {{
                "name": "WBC",
                "value": "3.2",
                "status": "Low",
                "suggestion": ["Citrus fruits", "Garlic", "Green tea"],
                "comment": "Consider boosting your immunity with the suggested foods."
            }},
            {{
                "name": "Hemoglobin",
                "value": "14.2",
                "status": "Normal",
                "suggestion": [],
                "comment": "Excellent hemoglobin level! Keep it up!"
            }}
        ],
        "summary": "Most parameters are in a healthy range. WBC is slightly low, so immune-boosting foods are recommended."
    }}
    ### OUTPUT:
    """
)
chain_extract = prompt_extract | llm
res = chain_extract.invoke(input={"page_data": text})
print(res.content)

```json
{
    "patientDetails": {
        "name": "Mr. DUMMY",
        "age": 25,
        "gender": "Male"
    },
    "tests": [
        {
            "name": "Hemoglobin",
            "value": 15.0,
            "status": "Normal",
            "suggestion": [],
            "comment": "Excellent hemoglobin level! Keep it up!",
            "description": "Hemoglobin is a protein in red blood cells that carries oxygen to different parts of the body. Low hemoglobin levels can lead to anemia, while high levels can indicate dehydration or other conditions."
        },
        {
            "name": "WBC (White Blood Cell Count)",
            "value": 5.0,
            "status": "Low",
            "suggestion": ["Citrus fruits", "Garlic", "Green tea"],
            "comment": "Consider boosting your immunity with the suggested foods. Low WBC count can increase the risk of infections.",
            "description": "White blood cells are part of the immune system, helping to fight infections. A low

In [8]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
med_report = json_res
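
The model's reply above is wrapped in a Markdown code fence tagged `json`, so the fence has to be removed before the payload can be decoded. The `JsonOutputParser` call in this cell appears to tolerate that wrapper. The sketch below shows an equivalent manual step using only the standard library. `parse_llm_json` is a hypothetical helper and not LangChain's implementation.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker
FENCE_RE = re.compile(
    r"^" + re.escape(FENCE) + r"(?:json)?\s*\n(.*?)\n?" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)

def parse_llm_json(raw):
    """Decode JSON from a model reply, with or without a surrounding fence."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)  # keep only what is inside the fence
    return json.loads(text)  # raises json.JSONDecodeError on malformed input
```

A truncated reply still fails at `json.loads`, which is usually preferable to silently accepting partial data.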

In [9]:
med_report

{'patientDetails': {'name': 'Mr. DUMMY', 'age': 25, 'gender': 'Male'},
 'tests': [{'name': 'Hemoglobin',
   'value': 15.0,
   'status': 'Normal',
   'suggestion': [],
   'comment': 'Excellent hemoglobin level! Keep it up!',
   'description': 'Hemoglobin is a protein in red blood cells that carries oxygen to different parts of the body. Low hemoglobin levels can lead to anemia, while high levels can indicate dehydration or other conditions.'},
  {'name': 'WBC (White Blood Cell Count)',
   'value': 5.0,
   'status': 'Low',
   'suggestion': ['Citrus fruits', 'Garlic', 'Green tea'],
   'comment': 'Consider boosting your immunity with the suggested foods. Low WBC count can increase the risk of infections.',
   'description': 'White blood cells are part of the immune system, helping to fight infections. A low WBC count can make you more susceptible to illnesses.'},
  {'name': 'RBC (Red Blood Cell Count)',
   'value': 5.0,
   'status': 'Normal',
   'suggestion': [],
   'comment': 'Great RBC c
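
Because the dictionary above comes from a language model, its shape is not guaranteed to match the format requested in the prompt. A light structural check before the summary step can catch missing keys or unexpected status labels. This is a sketch only: `validate_report` is a hypothetical helper, and the required keys are taken from the example JSON in the extraction prompt.

```python
ALLOWED_STATUS = {"Low", "Normal", "High"}
REQUIRED_KEYS = {"name", "value", "status", "suggestion", "comment"}

def validate_report(report):
    """Return a list of human-readable problems; an empty list means OK."""
    tests = report.get("tests")
    if not isinstance(tests, list):
        return ["'tests' is missing or not a list"]
    problems = []
    for index, test in enumerate(tests):
        missing = REQUIRED_KEYS - set(test)
        if missing:
            problems.append(f"tests[{index}] is missing keys: {sorted(missing)}")
        status = test.get("status")
        if status not in ALLOWED_STATUS:
            problems.append(f"tests[{index}] has unexpected status: {status!r}")
    return problems
```

For example, `validate_report(med_report)` could be called right after parsing, and the pipeline stopped if the returned list is not empty.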

In [10]:
prompt_summary = PromptTemplate.from_template(
    """
    ### MEDICAL REPORT JSON:
    {med_report}
    
    ### INSTRUCTION:
    The above is structured data extracted from a medical lab report.
    Write a clear and structured health summary based on the report.
    Use a professional yet friendly tone.

    Your response should include:
    1. At the top, the available patient details (name, age, etc.), the name of the summarizer (Luv Nichat), the time of summarization, and a caution that this summary is for general purposes only and is not real medical advice.
    2. A short introduction acknowledging the report.
    3. A bullet-point summary of key test results (with value + status: Low, Normal, High), including each test's description from the report data.
    4. For low or high values:
       - Provide a short explanation of what it means.
       - Recommend 4-5 organic food items to improve it, and state what not to eat, in the same bullet point.
    5. For normal/good values:
       - Give a short positive comment or compliment.
    6. A final paragraph with a brief overall health analysis or advice.
    Add a line break after each line.

    ### OUTPUT FORMAT (DO NOT RETURN JSON OR MARKDOWN):
    - Just clean and structured text.
    - No preambles.
    - No code blocks.
    """
)

chain_report = prompt_summary | llm

res = chain_report.invoke({"med_report": str(med_report)})

print(res.content)

Patient Name: Mr. DUMMY
Age: 25
Gender: Male
Summarized by: Luv Nichat
Time of Summarization: Current Time
Caution: This summary is for general purposes only and should not be considered as real medical advice.

We have received your medical lab report, and based on the results, we have prepared a summary of your key test results.

 
- Hemoglobin: 15.0, Status: Normal
  Great hemoglobin level, keep it up, Hemoglobin is a protein in red blood cells that carries oxygen to different parts of the body, Low hemoglobin levels can lead to anemia, while high levels can indicate dehydration or other conditions.

 
- WBC (White Blood Cell Count): 5.0, Status: Low
  A low WBC count can make you more susceptible to illnesses, White blood cells are part of the immune system, helping to fight infections, to improve this, consider eating organic foods like Citrus fruits, Garlic, Green tea, and avoid processed foods, sugary drinks.

 
- RBC (Red Blood Cell Count): 5.0, Status: Normal
  Great RBC count
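
In the output above, the model printed "Current Time" for the time of summarization, since it has no access to a clock. One way around this is to build the header in Python and prepend it to the model's text. The sketch below assumes the `patientDetails` keys shown earlier (`name`, `age`, `gender`). `build_header` is a hypothetical helper.

```python
from datetime import datetime

def build_header(patient, summarizer, now=None):
    """Build the report header with a real timestamp."""
    now = now or datetime.now()
    lines = [
        f"Patient Name: {patient.get('name', 'N/A')}",
        f"Age: {patient.get('age', 'N/A')}",
        f"Gender: {patient.get('gender', 'N/A')}",
        f"Summarized by: {summarizer}",
        f"Time of Summarization: {now:%Y-%m-%d %H:%M}",
    ]
    return "\n".join(lines)
```

A call such as `build_header(med_report["patientDetails"], "Luv Nichat")` could then be joined with `res.content`. In that case the header instruction would be dropped from the prompt to avoid printing the details twice.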

In [12]:
from xml.sax.saxutils import escape

from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

def create_pdf(text, filename="output.pdf"):
    """Render plain text to an A4 PDF, one Paragraph per blank-line-separated block."""
    # Create a document template with 72 pt (1 inch) margins
    doc = SimpleDocTemplate(filename, pagesize=A4,
                            rightMargin=72, leftMargin=72,
                            topMargin=72, bottomMargin=72)

    styles = getSampleStyleSheet()

    # Customize style
    style = ParagraphStyle(
        name="CustomStyle",
        parent=styles["Normal"],
        fontName="Helvetica",
        fontSize=12,
        leading=18,
        textColor=colors.HexColor("#333333"),
        spaceAfter=12,
    )

    # Split into paragraphs; escape &, < and > so Paragraph's markup parser
    # does not misread characters in the model's text as tags
    content = []
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty blocks
        content.append(Paragraph(escape(para).replace("\n", "<br/>"), style))
        content.append(Spacer(1, 12))

    # Build PDF
    doc.build(content)
    print(f"PDF created: {filename}")

In [13]:
create_pdf(res.content, "report_summary0.pdf")

PDF created: report_summary0.pdf
