# Lab 06: Working with DeepSeek

This notebook explores the document analysis capabilities of DeepSeek models.

In [21]:
#!pip install PyPDF2
import os
import re

from dotenv import load_dotenv
from IPython.display import display, Markdown, HTML, Image
from openai import OpenAI
from PyPDF2 import PdfReader  # used below to read the biomarker PDF

In [22]:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "DeepSeek-R1"
token = os.environ["githubtoken"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)



In [28]:
structured_prompt = (
    "<instructions>"
    "You are an expert in medical data analysis and interpretation. "
    "Based on the provided biomarker report—which includes a structured dataset of 40 biomarkers and an unstructured note detailing the patient's background, symptoms, and lifestyle factors—perform the following tasks:\n"
    "1. Extract the key abnormal or noteworthy biomarkers, including their values and reference ranges.\n"
    "2. Identify any patterns or potential clinical correlations (e.g., high cholesterol, elevated inflammatory markers) based on the report.\n"
    "3. Propose a possible set of next steps or recommendations for further investigation or management, using only the information presented in the biomarker data.\n"
    "4. Present your findings in a detailed plain-text format, using simple language as if explaining to a non-medical person. Also suggest diet and exercise recommendations.\n"
    "Only use the information provided in the input data.\n"
    "Create a detailed report in nicely formatted HTML.\n"
    "</instructions>"
)
print(structured_prompt)


<instructions>You are an expert in medical data analysis and interpretation. Based on the provided biomarker report—which includes a structured dataset of 40 biomarkers and an unstructured note detailing the patient's background, symptoms, and lifestyle factors—perform the following tasks:
1. Extract the key abnormal or noteworthy biomarkers, including their values and reference ranges.
2. Identify any patterns or potential clinical correlations (e.g., high cholesterol, elevated inflammatory markers) based on the report.
3. Propose a possible set of next steps or recommendations for further investigation or management, using only the information presented in the biomarker data.
4. Present your findings in a detailed plain-text format, using simple language as if explaining to a non-medical person. Also suggest diet and exercise recommendations.
Only use the information provided in the input data.
Create a detailed report in nicely formatted HTML.
</instructions>


In [29]:
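# Read the biomarker report PDF and concatenate the text of every page
biomarker_report = PdfReader("Data/Biomarker Report.pdf")

pdf_text = ""
for page in biomarker_report.pages:
    # Guard against pages that yield no text (e.g. scanned images)
    pdf_text += (page.extract_text() or "") + "\n"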

print(pdf_text)


Biomarker Report  
Patient Name:  Jane Doe (Fictitious)  
Date of Birth:  02/15/1985 (Fictitious)  
Date of Report:  02/24/2025 (Fictitious)  
Sample ID:  JD-CP-0040  
1. Comprehensive Biomarker Panel  
# Biomarker  Result  Reference Range  Note  
1 Red Blood Cell (RBC) Count  4.4 x10^6/µL  4.0–5.5 x10^6/µL  Normal  
2 Hemoglobin (HGB)  13.8 g/dL  12.0–15.5 g/dL  Normal  
3 Hematocrit (HCT)  41 %  36–46 %  Normal  
4 Mean Corpuscular Volume (MCV)  90 fL  80–100 fL  Normal  
5 Mean Corpuscular Hemoglobin (MCH)  30 pg  27–33 pg  Normal  
6 Mean Corpuscular Hemoglobin Conc. 
(MCHC)  34 g/dL  32–36 g/dL  Normal  
7 Red Cell Distribution Width (RDW)  13 %  11–15 %  Normal  
8 White Blood Cell (WBC) Count  7.0 x10^3/µL  4.0–11.0 x10^3/µL  Normal  
9 Neutrophils  60 %  40–70 %  Normal  
10 Lymphocytes  30 %  20–40 %  Normal  
11 Monocytes  7 %  2–10 %  Normal  
12 Eosinophils  2 %  1–5 %  Normal  
13 Basophils  1 %  0–2 %  Normal  
14 Platelet Count  220 x10^3/µL  150–400 x10^3/µL  Normal  
1
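
The extracted text above keeps one table row per line, with columns separated by two or more spaces. As a minimal sketch, assuming that spacing holds for the rest of the report, the rows could be turned into dictionaries before prompting. `parse_biomarker_rows` and `ROW_PATTERN` are hypothetical names, not part of PyPDF2 or the original notebook, and rows that wrap across two lines (such as row 6 above) are skipped by this pattern.

```python
import re

# Hypothetical pattern: row index, then four columns separated by 2+ spaces.
ROW_PATTERN = re.compile(
    r"^(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(.+?)\s{2,}(\S.*?)\s*$"
)


def parse_biomarker_rows(text):
    """Return a list of dicts for lines that look like complete table rows."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # headers, wrapped rows, and free text are ignored
        index, name, result, reference, note = match.groups()
        rows.append(
            {
                "index": int(index),
                "name": name,
                "result": result,
                "reference": reference,
                "note": note,
            }
        )
    return rows


# One row copied from the extracted text above
sample_row = "2 Hemoglobin (HGB)  13.8 g/dL  12.0–15.5 g/dL  Normal  "
parsed_rows = parse_biomarker_rows(sample_row)
print(parsed_rows)
```

In the notebook this would be called as `parse_biomarker_rows(pdf_text)`; whether structured rows actually help the model more than the raw text is untested here.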

In [30]:
response = client.complete(
    messages=[
        UserMessage(content=structured_prompt + pdf_text),
    ],
    model=model_name
)

#print(response.choices[0].message.content)
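
The call above joins `structured_prompt` and `pdf_text` directly, so the closing `</instructions>` tag runs straight into the first line of the report. One possible variation, shown as a sketch only, wraps the report in its own tag so the two parts stay visually separate. `build_prompt` and the `<report>` tag are assumptions introduced for illustration; the original notebook does not use them.

```python
def build_prompt(instructions, report_text):
    """Join the instructions and the report with an explicit delimiter.

    The <report> tag is a hypothetical convention, chosen to mirror the
    <instructions> tag already used in the prompt.
    """
    return f"{instructions}\n<report>\n{report_text.strip()}\n</report>"


# Tiny stand-ins for structured_prompt and pdf_text
example_prompt = build_prompt(
    "<instructions>Summarize the report.</instructions>",
    "  Hemoglobin (HGB)  13.8 g/dL  \n",
)
print(example_prompt)
```

In the notebook, the result of `build_prompt(structured_prompt, pdf_text)` would be passed as the `content` of the `UserMessage`.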

In [31]:
display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"><h2>🔽 &nbsp; Markdown Output – Beginning</h2></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"><h2>🔼 &nbsp; Markdown Output – End</h2></div>'))

<think>
Okay, let me start by reading through the user's query and the provided biomarker report. The user wants an expert analysis of Jane Doe's medical data, including extracting abnormal biomarkers, identifying patterns, suggesting next steps, and providing lifestyle advice. They want this in plain text but formatted as HTML.

First, I need to parse the biomarker report. Looking through the 40 biomarkers, most are within normal ranges. The key ones flagged are Total Cholesterol, LDL, HDL, Triglycerides, and CRP. Each has their reference ranges and notes indicating they're high or low.

Total Cholesterol is 250 (ref <200), LDL 160 (ref <100), HDL 45 (ref >50 for females), Triglycerides 180 (ref <150), and CRP at 3 (ref <3). These are the abnormal values. The others are all normal.

Next, identifying patterns. High LDL and Total Cholesterol point to hyperlipidemia, which increases cardiovascular risk. Low HDL adds to that. Triglycerides being mildly elevated is another lipid issue. CRP at the upper limit suggests possible inflammation, which could be related to cardiovascular risk or other inflammation sources.

For next steps, since the lipid panel is the main issue, a lipid profile retest makes sense. Also, checking inflammation causes with more tests like hs-CRP or ESR. Maybe liver function tests since ALT is at the upper normal range, but the note says it's normal. Wait, ALT is 30, which is within 7-35, so actually normal. So maybe not necessary. Focus on lipids and inflammation.

Diet recommendations should target lowering LDL and triglycerides. Increase fiber, omega-3s, reduce saturated fats, sugars. Exercise advice: aerobic and resistance training to improve HDL and lower LDL. Weight management if needed, but the report doesn't mention weight.

Potential clinical correlations: High cholesterol (hyperlipidemia), inflammation, increased cardiovascular risk.

Need to make sure all info only comes from the data provided. No other assumptions. Present in simple language, HTML formatted. Avoid medical jargon as much as possible, but still accurate.

Check if all the abnormal biomarkers are correctly extracted. CRP is exactly at 3, which is the upper limit, so it's noted. HDL is slightly below female reference. Triglycerides are mildly elevated. LDL is elevated.

Possible next steps include confirming lipid levels, assessing inflammation further, and lifestyle changes. No mention of diabetes or kidney issues since glucose, HbA1c, eGFR are normal.

Diet: Emphasize whole grains, vegetables, healthy fats like avocado, nuts. Avoid trans fats, high-sugar foods. Exercise: 150 mins moderate aerobic, plus strength training.

Make sure the HTML is properly formatted with sections for each task, using headings, lists, maybe bold text for important points. Keep paragraphs concise. No markdown, just HTML tags.
</think>

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Biomarker Analysis Report for Jane Doe</title>
    <style>
        body { font-family: Arial, sans-serif; line-height: 1.6; padding: 20px; }
        h2 { color: #2c3e50; }
        .section { margin-bottom: 25px; }
        ul { list-style-type: none; padding-left: 0; }
        li { margin-bottom: 8px; }
        .highlight { color: #e74c3c; font-weight: bold; }
    </style>
</head>
<body>
    <h1>Biomarker Analysis Report for Jane Doe</h1>

    <div class="section">
        <h2>🔍 Key Abnormal Biomarkers</h2>
        <ul>
            <li><span class="highlight">Total Cholesterol:</span> 250 mg/dL (Normal: &lt;200 mg/dL)</li>
            <li><span class="highlight">LDL ("Bad" Cholesterol):</span> 160 mg/dL (Normal: &lt;100 mg/dL)</li>
            <li><span class="highlight">HDL ("Good" Cholesterol):</span> 45 mg/dL (Normal for women: &gt;50 mg/dL)</li>
            <li><span class="highlight">Triglycerides:</span> 180 mg/dL (Normal: &lt;150 mg/dL)</li>
            <li><span class="highlight">C-Reactive Protein (CRP):</span> 3 mg/L (Normal: &lt;3 mg/L)</li>
        </ul>
    </div>

    <div class="section">
        <h2>🧩 Patterns & Potential Concerns</h2>
        <ul>
            <li>⚠️ <strong>High Cholesterol:</strong> Elevated LDL and total cholesterol with low HDL suggest increased risk for heart disease.</li>
            <li>💡 <strong>Inflammation Marker:</strong> CRP at the upper limit of normal may indicate low-grade inflammation.</li>
            <li>🔗 <strong>Combined Risk:</strong> High triglycerides paired with low HDL further increase cardiovascular risk.</li>
        </ul>
    </div>

    <div class="section">
        <h2>🩺 Recommended Next Steps</h2>
        <ul>
            <li><strong>Repeat Lipid Panel:</strong> Confirm cholesterol levels with a follow-up test in 3 months.</li>
            <li><strong>Advanced Inflammation Testing:</strong> Consider hs-CRP or ESR to assess inflammation.</li>
            <li><strong>Cardiac Risk Assessment:</strong> Discuss Framingham Risk Score or coronary calcium scoring with your doctor.</li>
        </ul>
    </div>

    <div class="section">
        <h2>🥗 Diet & Lifestyle Recommendations</h2>
        <h3>Dietary Changes:</h3>
        <ul>
            <li>✅ Increase soluble fiber (oats, beans, apples)</li>
            <li>✅ Choose healthy fats (avocados, olive oil, fatty fish)</li>
            <li>❌ Reduce saturated fats (red meat, butter, fried foods)</li>
            <li>❌ Limit added sugars and refined carbs</li>
        </ul>

        <h3>Exercise Plan:</h3>
        <ul>
            <li>💪 <strong>Aerobic Exercise:</strong> 150 mins/week moderate activity (brisk walking, cycling)</li>
            <li>🏋️ <strong>Strength Training:</strong> 2 days/week (bodyweight exercises, resistance bands)</li>
            <li>🧘 <strong>Stress Management:</strong> Include yoga or meditation for inflammation reduction</li>
        </ul>
    </div>

    <div class="section">
        <h2>🔬 Important Notes</h2>
        <p>All other biomarkers (blood cells, liver/kidney function, blood sugar) are within normal ranges. No immediate concerns were found outside the cholesterol and inflammation markers highlighted above. Always consult your healthcare provider before making significant lifestyle changes.</p>
    </div>
</body>
</html>
```

In [32]:

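# The response holds the model's <think> reasoning followed by a fenced HTML document
html_content_deepseek = response.choices[0].message.content

# Remove the <think>...</think> block and its content
html_content_deepseek = re.sub(r'<think>.*?</think>', '', html_content_deepseek, flags=re.DOTALL).strip()

# Strip the surrounding Markdown code fence, if present, so the file contains plain HTML
html_content_deepseek = re.sub(r'^`{3}(?:html)?\s*|\s*`{3}$', '', html_content_deepseek).strip()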

# Define the file path where you want to save the HTML content
file_path = 'output_deepseek.html'

# Write the HTML content to the file
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(html_content_deepseek)

print(f"HTML content has been written to {file_path}")

HTML content has been written to output_deepseek.html
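
The cell above discards the `<think>` block. If the reasoning is worth keeping for inspection, a small helper can return both parts instead. This is a sketch under the assumption that the response contains at most one `<think>...</think>` block, as in the output shown earlier; `split_reasoning` and `THINK_PATTERN` are hypothetical names.

```python
import re

# Matches a <think>...</think> block, including newlines inside it
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_reasoning(content):
    """Return (reasoning, answer) from a response that may contain <think> tags."""
    match = THINK_PATTERN.search(content)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_PATTERN.sub("", content).strip()
    return reasoning, answer


# Shortened stand-in for response.choices[0].message.content
sample_content = "<think>\nCheck the lipid panel.\n</think>\n\n<p>Report</p>"
reasoning, answer = split_reasoning(sample_content)
print(reasoning)
print(answer)
```

With this split, the reasoning could be written to a separate file next to `output_deepseek.html` rather than thrown away.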
