# 🩺 Introduction to Health-LLM  

## 📌 Overview  
**Health-LLM** is a specialized large language model designed to **predict and analyze health measures** using **wearable sensor data** and **user demographics**. Developed by **MIT Media Lab**, this model enhances traditional LLMs by integrating physiological signals such as **heart rate, sleep patterns, activity levels, and stress indicators** to provide **personalized health assessments and insights**.  

🔗 **GitHub Repository**: [Health-LLM on GitHub](https://github.com/mitmedialab/Health-LLM/tree/main)  
📄 **Research Paper**: [Health-LLM Paper (PDF)](https://github.com/mitmedialab/Health-LLM/blob/main/pdf/paper.pdf)  

---

## 🛠 How Was Health-LLM Trained?  

Health-LLM was trained using a **fine-tuning approach** on multiple **public health datasets** to specialize in **consumer health prediction tasks**. The training process involved adapting **general-purpose LLMs** to better interpret **physiological and behavioral data**, improving their ability to generate **health insights and risk assessments**.  

### **1️⃣ Base Models Used**
Health-LLM evaluates and adapts existing large language models, including:  
- **GPT-3.5 / GPT-4** (OpenAI)  
- **Gemini-Pro** (Google)  
- **MedAlpaca** (an open-source medical LLM built on LLaMA)  

These models were **not originally trained for health prediction**, so Health-LLM adapts them through **prompting and fine-tuning on health datasets** to improve their performance on **health-related tasks**.  

### **2️⃣ Training Data**  
Health-LLM was trained on **four major health datasets**, each containing **physiological and behavioral data** from real-world users:  

| **Dataset** | **Tasks Covered** |
|------------|------------------|
| **PMData** | Fatigue, Stress, Readiness, Sleep Quality |
| **LifeSnaps** | Stress Resilience, Sleep Disorder |
| **GLOBEM** | Anxiety, Depression |
| **AW_FB** | Calories, Activity |

These datasets allow the model to **learn human health patterns** and make **more accurate, personalized health predictions**.  
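
The idea of combining wearable readings with user demographics can be illustrated with a small prompt builder. This is a minimal sketch, not the prompt template used by Health-LLM: the `DailySummary` fields, the `build_prompt` helper, and the wording of the prompt are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DailySummary:
    """One day of wearable readings (hypothetical fields, not a dataset schema)."""
    resting_heart_rate_bpm: float
    sleep_minutes: int
    steps: int


def build_prompt(age: int, sex: str, days: list[DailySummary], target: str) -> str:
    """Serialize demographics and daily sensor summaries into a text prompt."""
    lines = [f"User profile: {age}-year-old {sex}."]
    for i, day in enumerate(days, start=1):
        lines.append(
            f"Day {i}: resting heart rate {day.resting_heart_rate_bpm:.0f} bpm, "
            f"sleep {day.sleep_minutes} minutes, {day.steps} steps."
        )
    lines.append(f"Question: based on this data, estimate the user's {target}.")
    return "\n".join(lines)


# Illustrative values only; they do not come from any of the datasets above.
example = build_prompt(
    age=35,
    sex="male",
    days=[DailySummary(78, 360, 5000), DailySummary(76, 395, 6200)],
    target="stress level",
)
print(example)
```

The resulting string can be passed to any text-generation model. Keeping the serialization in one function makes it easy to experiment with how much context (demographics, number of days, units) the model receives.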

---

## 🚀 How to Use Health-LLM  

Health-LLM is **run locally** rather than called through a hosted **API**. To use it:  

- **Clone the GitHub repository** and install the required dependencies.  
- **Run the inference script locally** to input health-related queries.  
- **Ensure you have sufficient GPU memory** for efficient execution, or modify the settings for CPU compatibility.  

Running locally keeps health data on your own machine, which suits **privacy-focused and research applications**. Note that the cells in this notebook load the public **MedAlpaca** checkpoint (`medalpaca/medalpaca-7b`) from Hugging Face as the model under test, not a Health-LLM fine-tuned checkpoint.  
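
A rough way to judge whether a GPU has enough memory is to multiply the parameter count by the bytes stored per parameter. The sketch below does this for a 7B-parameter model such as the MedAlpaca checkpoint loaded later in this notebook. The round figure of 7 billion parameters and the helper name `weight_memory_gb` are assumptions, and the estimate covers the weights only.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 7e9  # assumed round figure for a "7B" model

fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # 32-bit floats: 4 bytes each
fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 16-bit floats: 2 bytes each

print(f"FP32 weights: ~{fp32_gb:.0f} GB")
print(f"FP16 weights: ~{fp16_gb:.0f} GB")
```

Activations and the generation cache come on top of this, so the FP16 figure is best read as a lower bound. The three weight shards listed in the download log further down total about 27 GB, which is consistent with a checkpoint of this size stored in 32-bit precision.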



In [3]:

!pip install -q transformers torch

# ✅ Verify installation
import torch  # PyTorch - Required for handling tensors and running models on GPU
import transformers  # Hugging Face Transformers library for loading pre-trained models

print("✅ Transformers and Torch installed successfully!")


✅ Transformers and Torch installed successfully!


In [1]:
!git clone https://github.com/mitmedialab/Health-LLM.git

%cd Health-LLM

!pip install -r requirements.txt


fatal: destination path 'Health-LLM' already exists and is not an empty directory.
/content/Health-LLM

In [4]:
# ⚡ Load the model (MedAlpaca, one of the base models listed above)

# ✅ Import PyTorch and the Hugging Face utilities for model & tokenizer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ✅ Define the model name: the public MedAlpaca 7B checkpoint on the Hugging Face Hub
#    (this is the base MedAlpaca model, not a Health-LLM fine-tuned checkpoint)
model_name = "medalpaca/medalpaca-7b"

# ✅ Load the tokenizer (converts text into token IDs for model processing)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ✅ Load the causal language model (used for generating text responses)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # FP16 halves memory use compared with FP32
    device_map="auto"  # Places the weights on the GPU when one is available
)

# 🔹 With device_map="auto", the model runs on the GPU if one is detected.
# 🔹 Without a GPU it falls back to CPU, where generation is much slower.


config.json: 542 B
model.safetensors.index.json: 28.1 kB
model-00001-of-00003.safetensors: 9.88 GB
model-00002-of-00003.safetensors: 9.89 GB
model-00003-of-00003.safetensors: 7.18 GB
generation_config.json: 137 B
Loading checkpoint shards: 3 shards

In [5]:
# 🏥 Define the Inference Function

def generate_response(prompt, max_length=200):
    """
    Generates a response from the loaded model based on a given prompt.

    Parameters:
    - prompt (str): The input question or statement for the model.
    - max_length (int): The maximum total length in tokens, counting the
      prompt as well as the generated continuation.

    Returns:
    - response (str): The decoded text, which begins with the original prompt.
    """

    # ✅ Tokenize the input and move it to the device that holds the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # ✅ Generate the model's response with sampling enabled
    outputs = model.generate(
        inputs.input_ids,  # Pass tokenized input IDs to the model
        attention_mask=inputs.attention_mask,  # Mark which positions are real tokens
        max_length=max_length,  # Cap on prompt + generated tokens
        do_sample=True  # Sample from the token distribution, so output varies between runs
    )

    # ✅ Decode the full token sequence (prompt + continuation) back into text
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response  # Return the final response
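
The effect of `do_sample=True` can be seen on a toy example. The sketch below is not how `transformers` implements generation. It is a hand-written illustration, with made-up logits and hypothetical helper names (`softmax`, `pick_token`), of the difference between always taking the highest-scoring token and drawing a token at random in proportion to its probability.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits: list[float], do_sample: bool, rng: random.Random) -> int:
    """Return a token index: the argmax when greedy, a random draw when sampling."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
rng = random.Random(0)  # fixed seed so the run is repeatable

greedy = [pick_token(toy_logits, do_sample=False, rng=rng) for _ in range(5)]
sampled = [pick_token(toy_logits, do_sample=True, rng=rng) for _ in range(200)]

print("greedy picks:", greedy)
print("distinct sampled picks:", sorted(set(sampled)))
```

Greedy selection returns the same token every time, while sampling spreads its choices across the vocabulary. This is why two calls to `generate_response` with the same prompt can return different text.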


In [8]:
# Example Usage of Health-LLM Inference

# Define a personalized test prompt
prompt = "I am a 25-year-old female experiencing frequent dizziness, fatigue, and low blood pressure (90/60 mmHg). What could be the possible causes, and how can I manage this condition?"

# Generate response using the model
response = generate_response(prompt)

# Print formatted output
print("Response:", response)


Response: I am a 25-year-old female experiencing frequent dizziness, fatigue, and low blood pressure (90/60 mmHg). What could be the possible causes, and how can I manage this condition?
Tiredness and dizziness are two of the most common symptoms of anemia. There are other causes of low blood pressure also, some of which are dangerous. You cannot tell your condition from symptoms alone. You need to get yourself thoroughly examined by a qualified medical practitioner to know the cause of your symptoms.
I would suggest you stop looking for causes of your problem over the internet and consult your doctor!
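
As the output above shows, the decoded text starts with the prompt itself, because a causal language model returns the input tokens followed by its continuation. One possible way to keep only the answer is a small post-processing helper. `strip_prompt` is a hypothetical name, and the sketch assumes the decoded text reproduces the prompt exactly, which may not hold if the tokenizer normalizes whitespace or characters.

```python
def strip_prompt(prompt: str, decoded: str) -> str:
    """Drop the echoed prompt from the start of a decoded generation, if present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    # Fall back to the full text when the prompt is not an exact prefix.
    return decoded.strip()


demo_prompt = "What can cause dizziness?"
demo_decoded = demo_prompt + "\nDehydration is one common cause."
answer = strip_prompt(demo_prompt, demo_decoded)
print(answer)
```

In this notebook the helper could be applied as `strip_prompt(prompt, generate_response(prompt))`.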


In [9]:
import pandas as pd  # Import pandas for table formatting
from IPython.display import display  # Import display for showing tables in Colab

# 🏥 Store refined prompts and responses in a list
data = []

# 🏃‍♂️ Fitness & Health Optimization
prompt = "I am a 35-year-old male with a resting heart rate of 78 bpm. I walk 5,000 steps daily and sleep for 6 hours. My diet consists mainly of processed foods. What are the best strategies to improve my fitness and reduce stress?"
response = generate_response(prompt)
data.append(["🏃‍♂️ Fitness & Stress", prompt, response])

# 🥗 Nutrition & Weight Management
prompt = "I am a 45-year-old female. My weight is 85 kg, and my height is 165 cm. I have a sedentary lifestyle and consume a high-carb diet. How can I improve my nutrition and achieve a healthier BMI?"
response = generate_response(prompt)
data.append(["🥗 Nutrition & Weight", prompt, response])

# 😴 Sleep & Recovery
prompt = "I sleep only 5 hours per night and wake up feeling exhausted. I often drink coffee late at night due to work stress. How can I improve my sleep quality and energy levels throughout the day?"
response = generate_response(prompt)
data.append(["😴 Sleep & Energy", prompt, response])

# 📊 Convert to DataFrame and Display
df = pd.DataFrame(data, columns=["Category", "Prompt", "Response"])

# ✅ Print the table in Colab
display(df)


| | **Category** | **Prompt** | **Response** |
|---|---|---|---|
| 0 | 🏃‍♂️ Fitness & Stress | I am a 35-year-old male with a resting heart r... | I am a 35-year-old male with a resting heart r... |
| 1 | 🥗 Nutrition & Weight | I am a 45-year-old female. My weight is 85 kg,... | I am a 45-year-old female. My weight is 85 kg,... |
| 2 | 😴 Sleep & Energy | I sleep only 5 hours per night and wake up fee... | I sleep only 5 hours per night and wake up fee... |
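
The table view cuts long text off with `...`, so the responses cannot be read in full. One option is to print each record as wrapped plain text instead. The sketch below uses only the standard library. `format_record` is a hypothetical helper, and the sample row is a placeholder rather than real model output.

```python
import textwrap


def format_record(category: str, prompt: str, response: str, width: int = 80) -> str:
    """Render one row as wrapped, untruncated text."""

    def wrap(text: str) -> str:
        return textwrap.fill(text, width=width, subsequent_indent="  ")

    return "\n".join([
        f"[{category}]",
        wrap(f"Prompt: {prompt}"),
        wrap(f"Response: {response}"),
    ])


# Placeholder row with the same [category, prompt, response] layout as `data`.
rows = [
    ["Sleep & Energy", "How can I improve my sleep quality?", "placeholder response " * 8],
]
for category, prompt_text, response_text in rows:
    print(format_record(category, prompt_text, response_text))
    print("-" * 80)
```

In this notebook, the same loop could iterate over the `data` list built in the cell above.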


In [30]:
questions = [
    "I am a 35-year-old male with a resting heart rate of 78 bpm. I walk 5,000 steps daily and sleep for 6 hours. My diet consists mainly of processed foods. What are the best strategies to improve my fitness and reduce stress?",
    "I am a 45-year-old female. My weight is 85 kg, and my height is 165 cm. I have a sedentary lifestyle and consume a high-carb diet. How can I improve my nutrition and achieve a healthier BMI?",
    "I sleep only 5 hours per night and wake up feeling exhausted. I often drink coffee late at night due to work stress. How can I improve my sleep quality and energy levels throughout the day?"
]


In [31]:
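from langchain.chat_models import ChatOpenAI
from langchain.llms import HuggingFaceEndpoint
from langchain.schema import HumanMessage  # Message wrapper used when calling the chat model

# GPT-4 setup (the API key is read from the OPENAI_API_KEY environment variable)
gpt4_llm = ChatOpenAI(model="gpt-4", temperature=0.7)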

# MedAlpaca (assuming HuggingFace Inference API or similar)
medalpaca_llm = HuggingFaceEndpoint(repo_id="medalpaca/medalpaca-7b", temperature=0.7)

# You can add more models if needed, e.g., Vicuna, Cohere, etc.


In [32]:
results = []

for q in questions:
    gpt4_response = gpt4_llm([HumanMessage(content=q)])

    results.append({
        "Question": q,
        "GPT-4": gpt4_response.content,
    })


In [33]:
import pandas as pd

df_results = pd.DataFrame(results)

display(df_results)


| | **Question** | **GPT-4** |
|---|---|---|
| 0 | I am a 35-year-old male with a resting heart r... | 1. Improved Diet: Start by gradually reducing ... |
| 1 | I am a 45-year-old female. My weight is 85 kg,... | Improving your nutrition and achieving a healt... |
| 2 | I sleep only 5 hours per night and wake up fee... | 1. Create a Sleep Schedule: A regular sleep sc... |
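
The loop above records answers from GPT-4 only. A side-by-side comparison across several models could be organized as below. This is a sketch under assumptions: `compare_models` is a hypothetical helper, and the two stand-in callables exist only so the example runs without API keys or model weights.

```python
from typing import Callable


def compare_models(
    questions: list[str],
    models: dict[str, Callable[[str], str]],
) -> list[dict[str, str]]:
    """Ask every model every question and return one row per question."""
    rows = []
    for question in questions:
        row = {"Question": question}
        for name, ask in models.items():
            try:
                row[name] = ask(question)
            except Exception as exc:  # keep going if one backend fails
                row[name] = f"ERROR: {exc}"
        rows.append(row)
    return rows


# Stand-in callables (not real models) used to exercise the helper.
def echo_model(question: str) -> str:
    return f"(echo) {question}"


def failing_model(question: str) -> str:
    raise RuntimeError("backend unavailable")


demo_rows = compare_models(
    ["How can I sleep better?"],
    {"Echo": echo_model, "Broken": failing_model},
)
print(demo_rows)
```

In this notebook the dictionary could map `"MedAlpaca"` to `generate_response` and `"GPT-4"` to a small function that wraps the `gpt4_llm` call and returns its `.content`. The list of rows can then be passed to `pd.DataFrame` exactly as `results` is.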
