LLM4Med: AI-Based Medical Educational Resource Recommendation System


Project Overview

LLM4Med is an AI-powered web application that analyzes patient data and recommends customized educational resources. The system uses a medical-specialized Large Language Model (LLM) to perform in-depth analysis of patient information and recommends relevant educational materials based on the results.
*This project was later integrated into the iKooB Healthcare platform in upgraded versions.

Key Features

  1. Multi-stage Patient Data Analysis:

    • Stage 1: Analysis of patient health status and key management areas
    • Stage 2: Development of disease management plans, complication prevention strategies, and lifestyle improvement recommendations
    • Stage 3: Provision of personalized educational topics and actionable advice
  2. Educational Resource Recommendation: TF-IDF-based keyword matching for relevant educational material recommendations (a minimal matching sketch follows this list)

  3. Multilingual Support: Automatic translation of English analysis results to Korean
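
The TF-IDF matching implemented in Recsys.py can be pictured with the minimal sketch below. The find_best_matches_tfidf name appears in this repository, but its exact signature and the column layout of the metadata DataFrame are assumptions made for illustration.

# Minimal sketch of TF-IDF keyword matching (signature and column layout are assumptions)
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_best_matches_tfidf(query_keywords, metadata: pd.DataFrame, top_k: int = 3):
    """Rank educational resources by TF-IDF cosine similarity to the query keywords."""
    corpus = metadata["Keywords"].astype(str).tolist()            # one keyword string per resource
    vectorizer = TfidfVectorizer()
    resource_matrix = vectorizer.fit_transform(corpus)
    query_vector = vectorizer.transform([" ".join(query_keywords)])
    scores = cosine_similarity(query_vector, resource_matrix)[0]
    best = scores.argsort()[::-1][:top_k]                         # indices of the highest-scoring rows
    return metadata.iloc[best][["No", "Title", "File_name"]]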

System Requirements

Hardware Requirements

  • GPU: 12GB+ VRAM (recommended for running Llama-3 models)
  • RAM: 16GB+
  • Storage: 15GB+ (for model and data storage)

Software Requirements

  • Python: 3.8+
  • CUDA: 11.7+ (for GPU usage)

Installation

1. Environment Setup

# Create a virtual environment
conda create -n llm4med python=3.8

# Activate the virtual environment
conda activate llm4med

# Install required packages
pip install -r requirements.txt

2. Directory and File Setup

# Create directories for images and educational resources
mkdir -p content/images

# Create ngrok configuration file (optional: for external access)
echo '{"auth_token": "your_ngrok_auth_token", "domain": "your_ngrok_domain"}' > ngrok_config.json

3. Papago API Key Setup (for Korean translation)

Set as environment variables:

export PAPAGO_CLIENT_ID="your_client_id"
export PAPAGO_CLIENT_SECRET="your_client_secret"

Alternatively, you can directly set them in utils.py.

File Structure and Required Modifications

Key Files

  • main.py: Main Flask application and API endpoint definitions
  • model.py: LLM model and processor class definitions
  • Recsys.py: TF-IDF-based recommendation algorithm for educational resources
  • template.py: LLM prompt templates and HTML template definitions
  • utils.py: Utility functions (translation, ngrok setup, etc.)

Required Modifications

1. Path Settings in main.py

The following variables need to be directly set:

# Directory containing content images
IMAGE_DIRECTORY = "/path/to/your/image/directory"

# Path to the educational resource metadata CSV file
RECOMMENDATION_PATH = "/path/to/your/recommendation.csv"

2. Papago API Key Setup in utils.py

def translate_to_korean(text, client_id="YOUR_PAPAGO_CLIENT_ID", client_secret="YOUR_PAPAGO_CLIENT_SECRET"):
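
A hedged sketch of what translate_to_korean might look like is shown below. The endpoint, header names, and response shape follow the public Papago NMT API documentation, while the environment-variable fallback and the use of the requests library are assumptions; the actual implementation in utils.py may differ.

# Sketch only: endpoint and response shape follow the public Papago NMT API docs
# and may differ from the actual utils.py implementation.
import os
import requests

def translate_to_korean(text,
                        client_id=os.environ.get("PAPAGO_CLIENT_ID", "YOUR_PAPAGO_CLIENT_ID"),
                        client_secret=os.environ.get("PAPAGO_CLIENT_SECRET", "YOUR_PAPAGO_CLIENT_SECRET")):
    headers = {
        "X-Naver-Client-Id": client_id,
        "X-Naver-Client-Secret": client_secret,
    }
    data = {"source": "en", "target": "ko", "text": text}
    response = requests.post("https://openapi.naver.com/v1/papago/n2mt",
                             headers=headers, data=data, timeout=10)
    response.raise_for_status()
    return response.json()["message"]["result"]["translatedText"]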

3. Patient Data CSV Format

The CSV file uploaded to the system should include the following fields:

  • Required fields:
    • Disease Classification - Primary, Disease Classification - Secondary, Disease Classification - Tertiary (disease classifications)
    • Department - Main, Department - Sub (related departments)
    • System, System.1, System.2, System.3, System.4 (affected systems)
    • Disease Name, Disease Name.1, Disease Name.2, Disease Name.3, Disease Name.4 (disease names/symptoms)
    • Gender, Age, BMI (patient characteristics)
    • top-1(ENG) (recommended educational topic)
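
Before uploading, it can help to confirm that a patient CSV contains every required column. The check below is a small standalone sketch (the filename is an example), not part of the repository.

# Quick sanity check for a patient CSV (sketch; not part of the repository)
import pandas as pd

REQUIRED_COLUMNS = [
    "Disease Classification - Primary", "Disease Classification - Secondary",
    "Disease Classification - Tertiary",
    "Department - Main", "Department - Sub",
    "System", "System.1", "System.2", "System.3", "System.4",
    "Disease Name", "Disease Name.1", "Disease Name.2", "Disease Name.3", "Disease Name.4",
    "Gender", "Age", "BMI", "top-1(ENG)",
]

df = pd.read_csv("patients.csv")              # example path
missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
if missing:
    raise ValueError(f"Patient CSV is missing required columns: {missing}")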

4. Educational Resource Metadata CSV Format

The CSV file specified in RECOMMENDATION_PATH should follow this format:

  • Required fields:
    • No (unique identifier)
    • Title (educational resource title)
    • File_name (image filename)
    • Keywords (keyword list - string or list format)
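
Because the Keywords field may hold either a plain string or a stringified Python list, a small normalization step such as the one below makes the metadata uniform before matching. This helper is illustrative only and is not part of the repository.

# Normalize the Keywords column, which may be a plain string or a stringified list
# (illustrative helper; not part of the repository)
import ast
import pandas as pd

def normalize_keywords(value):
    if isinstance(value, str) and value.strip().startswith("["):
        try:
            return " ".join(ast.literal_eval(value))   # e.g. "['diabetes', 'diet']"
        except (ValueError, SyntaxError):
            pass
    return str(value)

metadata = pd.read_csv("/path/to/your/recommendation.csv")
metadata["Keywords"] = metadata["Keywords"].apply(normalize_keywords)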

Execution

Basic Execution

python main.py

By default, the web application runs at http://localhost:5000.

External Access Setup via ngrok (Optional)

To create a URL accessible from outside, configure the ngrok_config.json file before starting the application:

{
  "auth_token": "your_ngrok_auth_token",
  "domain": "your_custom_domain"
}

The domain field is optional.
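
How main.py consumes this file is internal to the project, but with pyngrok (listed in the dependencies) the tunnel setup roughly amounts to the sketch below; the exact configuration handling is an assumption.

# Rough sketch of the ngrok tunnel setup with pyngrok (actual handling in main.py may differ)
import json
from pyngrok import ngrok

with open("ngrok_config.json") as f:
    config = json.load(f)

ngrok.set_auth_token(config["auth_token"])
tunnel = ngrok.connect(5000)                  # expose the Flask app on port 5000
print(f"Public URL: {tunnel.public_url}")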

Usage Guide

  1. Access the web interface (http://localhost:5000)
  2. Upload a CSV file containing patient data
  3. Select a patient to analyze from the patient list
  4. The multi-stage analysis will automatically proceed:
    • Stage 1: Patient information analysis
    • Stage 2: Management plan development
    • Stage 3: Detailed recommendation generation
  5. View the analysis results along with recommended educational resources

Dependencies

The requirements.txt file should include the following packages:

flask==2.0.2
vllm==0.2.0
pandas==1.5.3
numpy==1.24.3
pyngrok==5.2.1
langchain==0.0.230
scikit-learn==1.2.2
torch==2.1.0

Troubleshooting

Common Errors

  1. Model Loading Error

    • Issue: LLM object initialization failure
    • Solution: Check GPU VRAM, verify CUDA compatibility
  2. File Upload Error

    • Issue: CSV file format error
    • Solution: Ensure CSV field names match the expected format
  3. Image Loading Failure

    • Issue: Recommended images not displaying
    • Solution: Check IMAGE_DIRECTORY path, verify file extensions (.png)
  4. Translation API Error

    • Issue: Korean translation failure
    • Solution: Verify Papago API keys, check network connectivity

Model Information

This project uses the TsinghuaC3I/Llama-3-8B-UltraMedical model, which is an 8B parameter Llama-3 model fine-tuned on medical data.
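
Loading the model with vLLM (the inference engine listed in the dependencies) looks roughly like the sketch below; the sampling parameters and prompt are examples only, and the actual generation settings live in model.py.

# Rough sketch of loading the model with vLLM; actual settings live in model.py
from vllm import LLM, SamplingParams

llm = LLM(model="TsinghuaC3I/Llama-3-8B-UltraMedical")
sampling = SamplingParams(temperature=0.7, max_tokens=512)   # example values
outputs = llm.generate(["Summarize the key risk factors for type 2 diabetes."], sampling)
print(outputs[0].outputs[0].text)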

Code Structure and Workflow

Data Flow

  1. Patient CSV data upload (/upload endpoint)
  2. Patient list provision and pagination (/patients_list endpoint)
  3. Three-stage analysis of selected patient (/process_stage endpoint)
  4. Educational resource recommendation and display based on analysis results
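
For programmatic access, these endpoints can be exercised with a plain HTTP client. The calls below are only a sketch: the form-field and JSON parameter names (file, patient_id, stage) are assumptions, not documented API contracts.

# Illustrative client calls; form-field and parameter names are assumptions
import requests

BASE_URL = "http://localhost:5000"

# Upload the patient CSV (the field name "file" is an assumption)
with open("patients.csv", "rb") as f:
    requests.post(f"{BASE_URL}/upload", files={"file": f})

# Fetch the paginated patient list
patients = requests.get(f"{BASE_URL}/patients_list").json()

# Run one analysis stage for a selected patient (payload keys are assumptions)
result = requests.post(f"{BASE_URL}/process_stage",
                       json={"patient_id": 0, "stage": 1}).json()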

Class and Function Relationships

  • LangchainCOTProcessor: LLM interaction management
  • prepare_input_data(): Patient data preprocessing
  • extract_keywords_from_output(): Keyword extraction
  • find_best_matches_tfidf(): Educational resource matching
  • translate_to_korean(): Translation processing

Extension and Improvement Directions

  1. LLM Model Replacement:

    • Change the model_name parameter of the LangchainCOTProcessor class in model.py
    • Adjust tokenizer and generation parameters as needed
  2. Prompt Template Modification:

    • Modify STAGE_1_TEMPLATE, STAGE_2_TEMPLATE, STAGE_3_TEMPLATE in template.py
  3. UI Improvements:

    • Modify the HTML_TEMPLATE in template.py
    • Add CSS and JavaScript functionality
  4. Recommendation Algorithm Enhancement:

    • Consider implementing embedding-based recommendation algorithms instead of TF-IDF in Recsys.py (a sketch follows this list)
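
One way to move beyond TF-IDF, as suggested above, is cosine similarity over dense sentence embeddings. The sketch below uses sentence-transformers, which is not in the current requirements and is named here only as one possible choice; the function name and metadata columns mirror the TF-IDF sketch earlier in this document.

# Embedding-based matching sketch (sentence-transformers is an added dependency, not in requirements.txt)
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

encoder = SentenceTransformer("all-MiniLM-L6-v2")            # example general-purpose encoder

def find_best_matches_embedding(query_keywords, metadata, top_k=3):
    resource_texts = metadata["Keywords"].astype(str).tolist()
    resource_vectors = encoder.encode(resource_texts)
    query_vector = encoder.encode([" ".join(query_keywords)])
    scores = cosine_similarity(query_vector, resource_vectors)[0]
    best = scores.argsort()[::-1][:top_k]
    return metadata.iloc[best][["No", "Title", "File_name"]]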

License

Please review the Llama 3 license terms before use.

Contact

For additional information, please email jaheon555@g.skku.edu
