# Local GPT Analysis for Medical Students

## 🤖 Analyzing Data with Natural Language Questions

This notebook demonstrates how to use **local large language models** to analyze medical data by asking questions in plain English. Unlike cloud-based AI services, everything runs on your computer, so the data is never uploaded to an outside provider.

### Why This Approach Matters for Medical Students:
- 🔒 **Complete Privacy**: Your data never leaves your computer
- 💬 **Natural Language**: Ask questions in plain English, no complex coding required
- 🏥 **Medical Context**: Well suited to exploring healthcare datasets without sending them off your machine
- 📊 **Quick Analysis**: Get summaries and charts within seconds once the model is loaded

### Important Notes:
- **Always consult your PI** about methods, devices, and approach before analyzing real data
- This method complements other data science techniques covered in separate notebooks
- The local model (qwen3:8b) can run on older hardware, but it must be downloaded through Ollama before first use

---

## 🛠️ Prerequisites

### Required Setup:

1. **Install Ollama** from https://ollama.ai
2. **Download the qwen3:8b model**:
   ```bash
   ollama pull qwen3:8b
   ```
3. **Install requirements** per the README.md file
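
Before starting, it can help to confirm that the Ollama server is running and the model has actually been pulled. The sketch below is a minimal check under two assumptions: Ollama listens on its default local address (`http://localhost:11434`), and its `/api/tags` endpoint lists the installed models. The helper names `installed_models` and `check_model` are hypothetical, not part of Ollama or PandasAI.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def installed_models(tags: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags (hypothetical helper)."""
    return [m["name"] for m in tags.get("models", [])]


def check_model(name: str = "qwen3:8b", base_url: str = OLLAMA_URL) -> bool:
    """Return True if the Ollama server is reachable and `name` has been pulled (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except OSError:  # URLError, refused connections and timeouts are all OSError subclasses
        print("Ollama server not reachable; is Ollama running?")
        return False
    if name in installed_models(tags):
        return True
    print(f"{name} not found; run: ollama pull {name}")
    return False
```

In a notebook cell, `check_model("qwen3:8b")` returns `True` when everything is in place.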

### ⏱️ First-Time Usage Note:
**The first query takes longer because Ollama has to load the model into memory** - don't be discouraged! Later queries are much faster while the model stays loaded; by default, Ollama unloads a model after about five minutes without requests.
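
If you want to absorb that first-load delay ahead of time (before a class or a meeting with your PI), one option is to ask Ollama to load the model in advance. The sketch below assumes Ollama's documented behavior that a `POST /api/generate` request with a model name and no prompt loads the model, and that the `keep_alive` field sets how long it stays in memory afterwards. `build_preload_request` and `preload` are hypothetical helper names.

```python
import json
import urllib.request


def build_preload_request(model: str = "qwen3:8b",
                          keep_alive: str = "30m",
                          base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a generate request with no prompt (hypothetical helper).

    Assumption: Ollama loads the model when it receives a generate request
    without a prompt, and keep_alive sets how long it stays loaded afterwards.
    """
    payload = {"model": model, "keep_alive": keep_alive, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str = "qwen3:8b", keep_alive: str = "30m") -> None:
    """Send the preload request; blocks until Ollama has loaded the model (hypothetical helper)."""
    request = build_preload_request(model, keep_alive)
    with urllib.request.urlopen(request, timeout=300) as resp:
        json.load(resp)  # read the reply so the call returns only after loading finishes
```

Calling `preload()` blocks until the model is in memory. Later requests that do not set `keep_alive` may fall back to Ollama's default timeout, so for a consistently longer window you can instead set the `OLLAMA_KEEP_ALIVE` environment variable before starting the Ollama server.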

### Why qwen3:8b?
- Runs efficiently on older hardware
- Good balance of performance and resource usage
- Capable enough for routine data-analysis questions
- Runs fully offline once downloaded

---

## 🚀 Setting Up Your Local Analysis Environment

In [None]:
# Install all requirements first per the README.md file
# You can use the same virtual environment as the one used for the local Exploring Data notebook.

from pandasai_litellm.litellm import LiteLLM
import pandasai as pai

print("✅ Libraries loaded successfully!")

In [None]:
# Load your dataset
df = pai.read_csv("data/predictdm.csv")

print(f"📊 Dataset loaded: {df.shape[0]} rows, {df.shape[1]} columns")
print("Ready for natural language analysis!")
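
Questions get more reliable answers when they use the dataset's actual column names and codings. As a minimal sketch (the helper name `column_overview` is hypothetical), plain pandas can summarize each column's type, missing values, and number of distinct values before you start asking questions:

```python
import pandas as pd


def column_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "distinct": frame.nunique(),  # NaN is not counted as a distinct value
    })
```

In the notebook, run `column_overview(df)`; since PandasAI's DataFrame builds on pandas (the cell above already uses `df.shape`), standard pandas methods should work on it.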

In [None]:
# Initialize the local language model
# Note: this only configures the connection; Ollama loads the model on the first query, which may take 30-60 seconds
model = LiteLLM(model="ollama/qwen3:8b")

pai.config.set({
    "llm": model,
    "save_charts": False
})

print("🤖 Local language model ready!")
print("💡 Tip: First query may be slow while model loads")

## 💬 Natural Language Data Analysis

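Now you can ask questions about your data in plain English! PandasAI sends your question, together with a description of the DataFrame, to the local model; the model writes analysis code, and PandasAI runs that code on your computer and returns the result.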
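
To make the idea concrete, here is a simplified sketch of the general approach, not PandasAI's actual prompt or code: describe the DataFrame's columns and first rows, append the question, and then pull the code out of the model's reply. It assumes the model wraps its answer in a fenced Python block and, as qwen3 models can, may emit its reasoning inside `<think>` tags first. `build_prompt` and `extract_code` are hypothetical names.

```python
import re
from typing import Optional

import pandas as pd

# Assumption: the model returns code in a fenced block, optionally tagged "python".
CODE_BLOCK = re.compile(r"```(?:python)?\s*\n(.*?)```", re.DOTALL)
# Assumption: qwen3-style reasoning is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_prompt(frame: pd.DataFrame, question: str, n_rows: int = 3) -> str:
    """Hypothetical helper: describe columns and first rows, then append the question."""
    schema = "\n".join(
        f"- {name}: {dtype}" for name, dtype in frame.dtypes.astype(str).items()
    )
    sample = frame.head(n_rows).to_string(index=False)
    return (
        "You are given a pandas DataFrame named df.\n"
        f"Columns:\n{schema}\n"
        f"First rows:\n{sample}\n"
        f"Write Python code that answers: {question}\n"
        "Return the code in a single fenced Python block."
    )


def extract_code(reply: str) -> Optional[str]:
    """Hypothetical helper: first fenced code block after removing <think> sections, or None."""
    visible = THINK_BLOCK.sub("", reply)
    match = CODE_BLOCK.search(visible)
    return match.group(1).strip() if match else None
```

Because prompts built this way include real rows from the dataset, this is one reason the local setup matters: with a cloud model, those rows would leave your computer.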

### Example Questions You Can Ask:
- "What is the average age of patients?"
- "Show me the distribution of glucose levels"
- "Compare BMI between diabetic and non-diabetic patients"
- "Create a scatter plot of age vs glucose"
- "What percentage of patients have diabetes?"
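
For a question like the last one, it helps to know the answer you expect. A minimal pandas sketch for checking it, assuming a hypothetical diabetes column; here missing values are excluded from the denominator, and the model's generated code may handle them differently:

```python
import pandas as pd


def percent_breakdown(frame: pd.DataFrame, col: str) -> pd.Series:
    """Hypothetical helper: percentage of each value in col among non-missing rows."""
    return frame[col].value_counts(normalize=True, dropna=True) * 100
```

For example, `percent_breakdown(df, "dm")`, where `"dm"` is a placeholder for whatever the diabetes column is called in `predictdm.csv`.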

### Let's Start with Basic Questions:

In [None]:
# Ask about mean glucose for men
# This demonstrates how natural language gets converted to data analysis
# The first question takes longer as the model loads
response = df.chat('What is the mean glucose for men?')
print(response)
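
The model decides which column and which coding represent "men", so it is worth checking the answer directly. A minimal pandas cross-check, where the column names `gender` and `glucose` and the value `"male"` are hypothetical placeholders to replace with the ones in `predictdm.csv`:

```python
import pandas as pd


def group_mean(frame: pd.DataFrame, value_col: str, group_col: str, group_value) -> float:
    """Hypothetical helper: mean of value_col where group_col == group_value (NaNs skipped)."""
    mask = frame[group_col] == group_value
    return float(frame.loc[mask, value_col].mean())
```

For example, `group_mean(df, "glucose", "gender", "male")`, or `df.groupby("gender")["glucose"].mean()` to see every group at once.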

In [None]:
# Create visualizations with natural language
chart_response = df.chat("Plot age distribution")
print(chart_response)

In [None]:
# More complex analysis with regression
plot_response = df.chat("Plot glucose vs age with a regression line")
print(plot_response)

## 🔬 Try Your Own Questions

Use the cells below to ask your own questions about the diabetes dataset. Remember, you can ask in natural language!

In [None]:
# Your question here - try asking about different variables or relationships
# Example: df.chat("What is the correlation between BMI and diabetes?")

your_response = df.chat("Your question here")
print(your_response)

In [None]:
# Another question - try asking for a different type of visualization
# Example: df.chat("Create a box plot of glucose levels by gender")

another_response = df.chat("Your visualization request here")
print(another_response)

## 🎯 Advanced Natural Language Queries

The local LLM can handle more sophisticated analysis requests:

In [None]:
# Complex statistical analysis
complex_analysis = df.chat("Calculate the diabetes prevalence by age groups: under 30, 30-50, 50-70, and over 70")
print(complex_analysis)
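
The age bands in this request overlap at the edges (does a 50-year-old belong to "30-50" or "50-70"?), so the model has to pick a convention and may not say which one it used. The sketch below makes the choice explicit with `pd.cut`, using lower-inclusive bands; it assumes a hypothetical diabetes column coded 1 for yes and 0 for no, and `prevalence_by_age` is a hypothetical helper name.

```python
import pandas as pd


def prevalence_by_age(frame: pd.DataFrame, age_col: str, outcome_col: str) -> pd.Series:
    """Hypothetical helper: percentage with outcome == 1 in each age band.

    Bands are lower-inclusive (right=False): [.., 30), [30, 50), [50, 70), [70, ..).
    Assumes outcome_col is coded 1 (yes) / 0 (no).
    """
    bins = [float("-inf"), 30, 50, 70, float("inf")]
    labels = ["<30", "30-49", "50-69", "70+"]
    bands = pd.cut(frame[age_col], bins=bins, labels=labels, right=False)
    # observed=False keeps empty bands in the output instead of dropping them
    return frame.groupby(bands, observed=False)[outcome_col].mean() * 100
```

`prevalence_by_age(df, "age", "dm")` returns the percentage with diabetes in each band; compare its band edges with the ones the model reports.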

In [None]:
# Multi-variable analysis
multi_var = df.chat("Show me a correlation heatmap of all numeric variables")
print(multi_var)
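
A heatmap is a picture of the correlation matrix, so the underlying numbers can be checked directly. A minimal pandas sketch (hypothetical helper name; Pearson by default, with `method="spearman"` as a rank-based alternative):

```python
import pandas as pd


def numeric_correlations(frame: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper: pairwise correlations between numeric columns only."""
    return frame.select_dtypes(include="number").corr(method=method)
```

Running `numeric_correlations(df).round(2)` gives a table you can compare cell by cell with the colors in the generated heatmap.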

## 🔒 Security and Privacy Benefits

### Why Local LLMs Matter for Medical Data:

✅ **Complete Data Privacy**: No data is transmitted to external servers  
✅ **HIPAA-Friendly**: Patient data is not shared with a third-party AI provider  
✅ **Institutional Control**: Easier to align with hospital/university data policies  
✅ **Auditability**: All analysis happens in an environment you control  
✅ **No Internet Required**: Works offline once model is downloaded  

### Best Practices:
- Always discuss your analysis approach with your Principal Investigator
- Ensure your device meets your institution's security requirements
- Keep your local environment updated and secure
- Document your analysis methods for reproducibility
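
One lightweight way to document a natural-language analysis is to record every question, the model used, and the answer. The sketch below appends one JSON line per query to a local file; the file name and the helper `log_query` are hypothetical, and since answers can contain patient-level values, the log belongs under the same access controls as the data.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_query(path, model: str, question: str, answer) -> dict:
    """Hypothetical helper: append one JSON record per question to a JSON Lines file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "question": question,
        "answer": str(answer),  # PandasAI responses may be numbers, tables, or chart paths
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

In the notebook: `answer = df.chat(question)` followed by `log_query("analysis_log.jsonl", "ollama/qwen3:8b", question, answer)`.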

---

## 💡 Tips for Effective Natural Language Queries

### Good Question Patterns:
- **Be specific**: "Show glucose distribution for diabetic patients" rather than "Show glucose"
- **Request context**: "Compare average BMI between groups" rather than "Calculate BMI"
- **Ask for visualizations**: "Plot X vs Y" or "Create a histogram of Z"
- **Specify groupings**: "Analyze by gender", "Group by age ranges"

### What the Local LLM Can Do:
- Generate statistical summaries
- Create various plot types (scatter, histogram, box plot, etc.)
- Perform correlation analysis
- Group and aggregate data
- Calculate percentages and ratios
- Filter and subset data

### Performance Notes:
- **First query**: May take 30-60 seconds (model loading)
- **Subsequent queries**: Much faster (2-10 seconds)
- **Complex requests**: May take longer but provide detailed analysis
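- **Model stays loaded**: Ollama keeps the model in memory for about five minutes after the last request (its default), independent of the notebook; after a longer pause, the next query reloads it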
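
To see these timings on your own machine, you can wrap any call with a simple timer. A minimal sketch using `time.perf_counter`; `timed` is a hypothetical helper:

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple:
    """Hypothetical helper: call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Running `answer, seconds = timed(df.chat, "What is the average age of patients?")` twice in a row should show the model-loading cost in the first measurement only.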

---

## 🎓 Next Steps

This notebook focused on **natural language data analysis** using local LLMs. To expand your data science skills:

1. **Explore other notebooks** in this collection for traditional data science methods
2. **Practice with different datasets** to see how the LLM adapts
3. **Combine approaches**: Use this for exploration, then traditional methods for detailed analysis
4. **Learn prompt engineering**: Better questions lead to better analysis

### Remember:
- This is one tool in your data science toolkit
- Always validate important findings with traditional statistical methods
- Consult with your PI about appropriate use in research contexts
- The local approach ensures your data remains secure and private
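
As an example of traditional validation, a question like "Compare BMI between diabetic and non-diabetic patients" can be checked with Welch's t-test. The sketch below computes only the t statistic with the standard library; `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns a p-value. `welch_t` is a hypothetical helper name.

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: Welch's t statistic for the difference in means of two samples."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error
```

With pandas, `a` and `b` could be `df.loc[df["dm"] == 1, "bmi"].dropna().tolist()` and `df.loc[df["dm"] == 0, "bmi"].dropna().tolist()`, where the column names are again hypothetical placeholders.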

**Happy analyzing with natural language! 🤖📊🔒**