LLaVa3-Med

Model Link: LLaVa3-Med

We train our model in three stages:

  1. Pretraining: We use a dataset of 600k image-text pairs from PMC and 60k medical references based on Mayo Clinic guidelines.
  2. Instruction Fine-tuning: We use 60k LLaVA_Med instruction fine-tuning examples together with the PMC-VQA dataset for instruction learning (an example record format is sketched after this list).
  3. Fine-tuning: The model is then fine-tuned on downstream VQA datasets.
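
Instruction fine-tuning data in the LLaVA family is typically stored as image-grounded conversation records. The sketch below shows one hypothetical record in that style; the field names and the example question/answer are illustrative assumptions, not taken from the actual PMC / LLaVA_Med / PMC-VQA files used here.

```python
# Hypothetical instruction fine-tuning record in the LLaVA conversation style.
# Field names and contents are illustrative only; adjust them to match the
# actual data files used in this repo.
example_record = {
    "id": "pmc_000001",          # assumed sample identifier
    "image": "pmc_000001.jpg",   # image file resolved under --image-folder
    "conversations": [
        {"from": "human", "value": "<image>\nWhat imaging modality is shown?"},
        {"from": "gpt", "value": "The figure shows an axial CT scan of the chest."},
    ],
}
```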

Inference

```bash
CUDA_VISIBLE_DEVICES=0 python -m evaluation \
        --model-path model_path \
        --question-file data_path \
        --image-folder image_path \
        --answers-file result.jsonl \
        --temperature 0.7 \
        --conv-mode llama3
```
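
The evaluation script writes one JSON object per line to the `--answers-file`. Below is a minimal sketch of loading those predictions; the field names (`question_id`, `text`) follow common LLaVA-style output conventions and are assumptions, so adapt them to this repo's actual output format.

```python
import json

def load_answers(path="result.jsonl"):
    """Load predictions from a JSONL answers file into {question_id: answer}.

    Field names are assumptions based on common LLaVA-style outputs.
    """
    answers = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            answers[record["question_id"]] = record["text"]
    return answers

if __name__ == "__main__":
    preds = load_answers()
    print(f"Loaded {len(preds)} predictions")
```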

Results

Because GPT-4 has not been fine-tuned on these VQA tasks, the answers it generates for open-ended questions differ significantly in style from the reference answers. We therefore employed a few-shot approach and adjusted GPT-4's answers to match the style of the reference answers.

| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | GPT-4V | LLaVa3-Med |
|-----------|----------|------------|---------------|--------|------------|
| Slake-VQA | Token F1 | 87.5       | 89.3          | 76.8   | 89.8†      |
| Path-VQA  | Token F1 | 64.7       | 62.7          | 57.7   | 64.9†      |

Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
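
Token F1 in Table 1 is the standard token-overlap F1 commonly used to score open-ended VQA answers. The sketch below shows one typical way to compute it; the whitespace tokenization and lowercasing here are simplifying assumptions and may differ from the exact normalization behind the numbers above.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Whitespace tokenization and lowercasing are simplifying assumptions.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partially overlapping answers -> F1 ~= 0.833
print(round(token_f1("axial ct scan of the chest", "a ct scan of the chest"), 3))
```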
