LLM4Med is an AI-powered web application that analyzes patient data and recommends customized educational resources. It uses a medical-specialized Large Language Model (LLM) to analyze patient information in depth, then recommends relevant educational materials based on the analysis results.

*This project has been integrated into the iKooB Healthcare platform in upgraded versions.

- Multi-stage Patient Data Analysis:
  - Stage 1: Analysis of patient health status and key management areas
  - Stage 2: Development of disease management plans, complication prevention strategies, and lifestyle improvement recommendations
  - Stage 3: Provision of personalized educational topics and actionable advice
- Educational Resource Recommendation: TF-IDF-based keyword matching for relevant educational material recommendations
- Multilingual Support: Automatic translation of English analysis results to Korean

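The three analysis stages can be read as a chain in which each stage's output is fed into the next prompt. The sketch below illustrates that flow only. The prompt strings, `STAGE_PROMPTS`, and `run_three_stages` are hypothetical names made up for this example; they are not the project's `STAGE_1_TEMPLATE`, `STAGE_2_TEMPLATE`, `STAGE_3_TEMPLATE`, or its `LangchainCOTProcessor`, and `generate` stands in for whatever function calls the LLM.

```python
from typing import Callable, Dict

# Hypothetical prompts for illustration; the real templates live in template.py.
STAGE_PROMPTS = {
    "stage_1": "Summarize the patient's health status and key management areas.\n"
               "Patient data: {patient}",
    "stage_2": "Draft a disease management plan, complication prevention strategies, "
               "and lifestyle recommendations.\nAssessment: {stage_1}",
    "stage_3": "List personalized educational topics and actionable advice.\n"
               "Plan: {stage_2}",
}


def run_three_stages(patient: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order, feeding each stage's output into the next prompt."""
    results: Dict[str, str] = {"patient": patient}
    for stage in ("stage_1", "stage_2", "stage_3"):
        # str.format ignores keys the template does not use.
        prompt = STAGE_PROMPTS[stage].format(**results)
        results[stage] = generate(prompt)
    return results


if __name__ == "__main__":
    # Stand-in for the LLM call so the sketch runs without a GPU.
    def fake_llm(prompt: str) -> str:
        return f"<answer to a {len(prompt)}-character prompt>"

    for name, text in run_three_stages("Age 60, BMI 31", fake_llm).items():
        print(name, "->", text)
```

Keeping the intermediate outputs in one dictionary makes it easy to show each stage's result in the UI as it completes.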

System requirements:

- GPU: 12GB+ VRAM (recommended for running Llama-3 models)
- RAM: 16GB+
- Storage: 15GB+ (for model and data storage)
- Python: 3.8+
- CUDA: 11.7+ (for GPU usage)

```bash
# Create a virtual environment
conda create -n llm4med python=3.8
# Activate the virtual environment
conda activate llm4med
# Install required packages
pip install -r requirements.txt
```

```bash
# Create directories for images and educational resources
mkdir -p content/images
# Create ngrok configuration file (optional: for external access)
echo '{"auth_token": "your_ngrok_auth_token", "domain": "your_ngrok_domain"}' > ngrok_config.json
```

Set the Papago API credentials as environment variables:

```bash
export PAPAGO_CLIENT_ID="your_client_id"
export PAPAGO_CLIENT_SECRET="your_client_secret"
```

Alternatively, you can set them directly in `utils.py`.

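As a minimal sketch of the environment-variable route, the helper below reads the two variables named above and fails early if either is missing. `load_papago_credentials` is a hypothetical helper written for this example, not a function from `utils.py`.

```python
import os
from typing import Mapping, Optional, Tuple


def load_papago_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (client_id, client_secret), or raise if either variable is unset."""
    env = os.environ if env is None else env
    client_id = env.get("PAPAGO_CLIENT_ID", "")
    client_secret = env.get("PAPAGO_CLIENT_SECRET", "")
    missing = [
        name
        for name, value in (
            ("PAPAGO_CLIENT_ID", client_id),
            ("PAPAGO_CLIENT_SECRET", client_secret),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return client_id, client_secret
```

Failing at startup gives a clearer message than a translation error that only appears after the analysis has finished.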
| Filename | Description |
|---|---|
| `main.py` | Main Flask application and API endpoint definitions |
| `model.py` | LLM model and processor class definitions |
| `Recsys.py` | TF-IDF-based recommendation algorithm for educational resources |
| `template.py` | LLM prompt templates and HTML template definitions |
| `utils.py` | Utility functions (translation, ngrok setup, etc.) |

The following variables need to be set directly:

```python
# Directory containing content images
IMAGE_DIRECTORY = "/path/to/your/image/directory"
# Path to the educational resource metadata CSV file
RECOMMENDATION_PATH = "/path/to/your/recommendation.csv"
```

```python
# utils.py: replace the placeholder defaults with your Papago credentials
def translate_to_korean(text, client_id="YOUR_PAPAGO_CLIENT_ID", client_secret="YOUR_PAPAGO_CLIENT_SECRET"):
    ...  # function body omitted in this excerpt
```

The CSV file uploaded to the system should include the following fields:
- Required fields:
  - `Disease Classification - Primary`, `Disease Classification - Secondary`, `Disease Classification - Tertiary` (disease classifications)
  - `Department - Main`, `Department - Sub` (related departments)
  - `System`, `System.1`, `System.2`, `System.3`, `System.4` (affected systems)
  - `Disease Name`, `Disease Name.1`, `Disease Name.2`, `Disease Name.3`, `Disease Name.4` (disease names/symptoms)
  - `Gender`, `Age`, `BMI` (patient characteristics)
  - `top-1(ENG)` (recommended educational topic)

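A quick way to catch a malformed upload is to compare the column names against the required fields listed above. The sketch below takes any iterable of column names (for example `pandas.read_csv(path).columns`) and returns the ones that are missing. `REQUIRED_PATIENT_COLUMNS` and `missing_patient_columns` are hypothetical names for this example, and the application's own validation may work differently.

```python
from typing import Iterable, List

# Column names copied from the required-fields list in this README.
REQUIRED_PATIENT_COLUMNS = [
    "Disease Classification - Primary",
    "Disease Classification - Secondary",
    "Disease Classification - Tertiary",
    "Department - Main",
    "Department - Sub",
    "System", "System.1", "System.2", "System.3", "System.4",
    "Disease Name", "Disease Name.1", "Disease Name.2",
    "Disease Name.3", "Disease Name.4",
    "Gender", "Age", "BMI",
    "top-1(ENG)",
]


def missing_patient_columns(columns: Iterable[str]) -> List[str]:
    """Return required column names that are absent, in README order."""
    present = {str(name).strip() for name in columns}
    return [name for name in REQUIRED_PATIENT_COLUMNS if name not in present]


if __name__ == "__main__":
    print(missing_patient_columns(["Gender", "Age", "System"]))
```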
The CSV file specified in `RECOMMENDATION_PATH` should follow this format:

- Required fields:
  - `No` (unique identifier)
  - `Title` (educational resource title)
  - `File_name` (image filename)
  - `Keywords` (keyword list - string or list format)

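Because `Keywords` may arrive either as a plain string or as a list, it helps to normalize the cell before matching. The sketch below is one way to do that. `parse_keywords` is a hypothetical helper, and two behaviors are assumptions of this example rather than documented project behavior: plain strings are treated as comma-separated, and keywords are lowercased.

```python
import ast
from typing import List, Sequence, Union


def parse_keywords(raw: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a Keywords cell into a list of lowercase keywords."""
    if isinstance(raw, (list, tuple)):
        items = list(raw)
    else:
        text = str(raw).strip()
        items = None
        if text.startswith("[") and text.endswith("]"):
            # A list stored as text, e.g. "['diabetes', 'diet']".
            try:
                parsed = ast.literal_eval(text)
                if isinstance(parsed, (list, tuple)):
                    items = list(parsed)
            except (ValueError, SyntaxError):
                items = None
        if items is None:
            # Assumption: plain strings are comma-separated.
            items = text.strip("[]").split(",")
    return [str(item).strip().lower() for item in items if str(item).strip()]


if __name__ == "__main__":
    print(parse_keywords("['Diabetes', 'Diet']"))
    print(parse_keywords("diabetes, diet"))
```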
```bash
python main.py
```

By default, the web application runs at http://localhost:5000.

To create a URL accessible from outside, configure the `ngrok_config.json` file as shown here before running the application (the `domain` entry is optional):

```json
{
  "auth_token": "your_ngrok_auth_token",
  "domain": "your_custom_domain"
}
```

To use the application:

- Access the web interface (http://localhost:5000)
- Upload a CSV file containing patient data
- Select a patient to analyze from the patient list
- The multi-stage analysis will automatically proceed:
  - Stage 1: Patient information analysis
  - Stage 2: Management plan development
  - Stage 3: Detailed recommendation generation
- View the analysis results along with recommended educational resources

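For the optional external-access setup, the sketch below shows one way the `ngrok_config.json` file could be read and checked before a tunnel is opened. `load_ngrok_config` is a hypothetical helper for this example. The project's actual ngrok setup lives in `utils.py` and is not reproduced here, and no ngrok call is made in this sketch.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def load_ngrok_config(path: str = "ngrok_config.json") -> Optional[Dict[str, str]]:
    """Return the ngrok settings, or None when no config file exists."""
    config_file = Path(path)
    if not config_file.exists():
        return None  # external access is optional
    config = json.loads(config_file.read_text(encoding="utf-8"))
    if not config.get("auth_token"):
        raise ValueError(f"{path} must contain a non-empty 'auth_token'")
    settings = {"auth_token": config["auth_token"]}
    if config.get("domain"):
        # The custom domain is optional; omit the key when it is not set.
        settings["domain"] = config["domain"]
    return settings
```

Returning `None` when the file is absent lets the application fall back to serving only on localhost.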
The `requirements.txt` file should include the following packages:

```text
flask==2.0.2
vllm==0.2.0
pandas==1.5.3
numpy==1.24.3
pyngrok==5.2.1
langchain==0.0.230
scikit-learn==1.2.2
torch==2.1.0
```

- Model Loading Error
  - Issue: `LLM` object initialization failure
  - Solution: Check GPU VRAM, verify CUDA compatibility
- File Upload Error
  - Issue: CSV file format error
  - Solution: Ensure CSV field names match the expected format
- Image Loading Failure
  - Issue: Recommended images not displaying
  - Solution: Check `IMAGE_DIRECTORY` path, verify file extensions (.png)
- Translation API Error
  - Issue: Korean translation failure
  - Solution: Verify Papago API keys, check network connectivity

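For the image loading case, a short script can report which recommended files are missing from `IMAGE_DIRECTORY` or do not have a `.png` extension. `find_missing_images` is a hypothetical helper for this example. It assumes the `File_name` values in the recommendation CSV are plain file names relative to that directory.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_images(image_directory: str, file_names: Iterable[str]) -> List[str]:
    """Return file names that are not present as .png files in the directory."""
    directory = Path(image_directory)
    missing = []
    for name in file_names:
        candidate = directory / name
        # Flag both wrong extensions and files that do not exist.
        if candidate.suffix.lower() != ".png" or not candidate.is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example call; the file names here are placeholders.
    print(find_missing_images("content/images", ["example_1.png", "example_2.png"]))
```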
This project uses the TsinghuaC3I/Llama-3-8B-UltraMedical model, which is an 8B parameter Llama-3 model fine-tuned on medical data.
- Model Source: Hugging Face - TsinghuaC3I/Llama-3-8B-UltraMedical
- License: Check the license before using the model

The application provides the following functionality:

- Patient CSV data upload (`/upload` endpoint)
- Patient list provision and pagination (`/patients_list` endpoint)
- Three-stage analysis of selected patient (`/process_stage` endpoint)
- Educational resource recommendation and display based on analysis results

Key classes and functions:

- `LangchainCOTProcessor`: LLM interaction management
- `prepare_input_data()`: Patient data preprocessing
- `extract_keywords_from_output()`: Keyword extraction
- `find_best_matches_tfidf()`: Educational resource matching
- `translate_to_korean()`: Translation processing

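To make the TF-IDF matching idea concrete, here is a small pure-Python sketch that scores each resource's keywords against the keywords extracted from the LLM output, using TF-IDF weights and cosine similarity. It is not the project's `find_best_matches_tfidf()`, which is not shown in this README. `rank_resources` and its helpers are hypothetical, the smoothed IDF formula is an assumption, and the query is included in the IDF corpus for simplicity.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def _tokenize(text: str) -> List[str]:
    return text.lower().replace(",", " ").split()


def _tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_resources(
    query: str, resources: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank resource titles by TF-IDF cosine similarity to the query keywords."""
    titles = list(resources)
    docs = [_tokenize(resources[title]) for title in titles] + [_tokenize(query)]
    vectors = _tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = [(title, _cosine(query_vec, vec)) for title, vec in zip(titles, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    catalog = {
        "Diabetes diet": "diabetes, diet, blood sugar",
        "Hypertension care": "blood pressure, salt, hypertension",
        "Exercise basics": "exercise, walking, weight",
    }
    print(rank_resources("diabetes diet", catalog, top_k=2))
```

In the actual project, scikit-learn (listed in `requirements.txt`) would be the natural tool for this; the pure-Python version is used here only so the example runs without extra dependencies.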
The system can be customized in the following ways:

- LLM Model Replacement:
  - Change the `model_name` parameter of the `LangchainCOTProcessor` class in `model.py`
  - Adjust tokenizer and generation parameters as needed
- Prompt Template Modification:
  - Modify `STAGE_1_TEMPLATE`, `STAGE_2_TEMPLATE`, `STAGE_3_TEMPLATE` in `template.py`
- UI Improvements:
  - Modify the `HTML_TEMPLATE` in `template.py`
  - Add CSS and JavaScript functionality
- Recommendation Algorithm Enhancement:
  - Consider implementing embedding-based recommendation algorithms instead of TF-IDF in `Recsys.py`

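As a rough sketch of what an embedding-based replacement could look like, the function below ranks resources by cosine similarity between dense vectors produced by a pluggable `embed` callable. Everything here is hypothetical: `rank_by_embedding` is not part of `Recsys.py`, and `toy_embed` is a letter-count placeholder so the example runs on its own. A real setup would pass in a sentence-embedding model instead.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_embedding(
    query: str,
    resources: Dict[str, str],
    embed: Callable[[str], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank resource titles by cosine similarity of their embeddings to the query."""
    query_vec = embed(query)
    scored = [
        (title, cosine_similarity(query_vec, embed(text)))
        for title, text in resources.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding (letter counts); swap in a real embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


if __name__ == "__main__":
    catalog = {"Diabetes diet": "diabetes diet", "Exercise basics": "exercise walking"}
    print(rank_by_embedding("diet for diabetes", catalog, toy_embed))
```

Passing the embedding function as a parameter keeps the ranking logic independent of whichever model is eventually chosen.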
Please check the Llama license before use.

For additional information, please email jaheon555@g.skku.edu.
