BiomedRAG: A Retrieval-Augmented Large Language Model for Biomedicine

1) Overview

The architecture of our proposed PeTailor is depicted in the diagram below. It consists of three major steps: (1) constructing the diverse chunk database; (2) training the tailored chunk scorer to select, from the diverse chunk database, the document most relevant to the input sentence; and (3) incorporating the retrieved document into the LLM to generate the triple (or other output) for the given sentence.
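At inference time, the three steps reduce to: score every chunk in the database against the input sentence, keep the highest-scoring one, and prepend it to the LLM prompt. Below is a minimal sketch of that flow, assuming placeholder callables (chunk_scorer, llm_generate) and a list-of-strings chunk_database; these names are illustrative, not the repository's API.

    # Minimal sketch of the retrieval-augmented triple extraction flow.
    # chunk_scorer, llm_generate, and chunk_database are placeholders for the
    # components produced in steps (1)-(3); they are not this repo's function names.
    def extract_triple(sentence, chunk_database, chunk_scorer, llm_generate):
        # (2) tailored chunk scorer: rank every chunk against the input sentence
        scores = [chunk_scorer(sentence, chunk) for chunk in chunk_database]
        best_chunk = chunk_database[scores.index(max(scores))]
        # (3) prepend the retrieved chunk and let the LLM generate the triple as text
        prompt = f"Context: {best_chunk}\nSentence: {sentence}\nTriple:"
        return llm_generate(prompt)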

2) Baselines

For the triple extraction baselines, please see Triple Extraction Baselines.

3) GIT

Data format (one JSON object per example):

{"PREDICATE": "INTERACTS_WITH",
"SUBJECT_TEXT": "nalorphine",
"OBJECT_TEXT": "morphine",
"SENTENCE": "[on the effect of respiration of a combination of  morphine -like acting pethidine with the  morphine  antagonist  nalorphine ]."}

4) Code Structure

  • 0_make_relation_chuck_and_scorer_data (data pre-processing)

  • 1_train_scorer_model (chunk scorer training) -- 1) download the transformers source from Hugging Face into 1_train_scorer_model; 2) replace the trainer.py file in that transformers source, and add SplitInputsChunks.py and ChuckWeights.py to it (an illustrative scoring sketch is given after this list).

  • 2_relation_data_to_triple_train_data

  • 3_trainning_triple_model (LLM training for triple extraction)

  • 4_generation_triple_model (generation process)
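The chunk scorer mentioned in the 1_train_scorer_model item ranks database chunks against an input sentence. The repository's scorer is trained through the modified transformers trainer described above; purely as an illustration of "scoring chunks against a sentence", here is a sketch based on cosine similarity of sentence embeddings (it requires pip install sentence-transformers, which is not part of this repo's pipeline):

    # Illustrative chunk ranking by embedding cosine similarity; NOT the
    # tailored chunk scorer trained in 1_train_scorer_model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

    def rank_chunks(sentence, chunks, top_k=1):
        sent_emb = model.encode(sentence, convert_to_tensor=True)
        chunk_embs = model.encode(chunks, convert_to_tensor=True)
        scores = util.cos_sim(sent_emb, chunk_embs)[0]
        best = scores.topk(min(top_k, len(chunks)))
        return [(chunks[int(i)], float(scores[int(i)])) for i in best.indices]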

5) Configuration

  • Python 3.8.8

  • Transformers: pip install transformers==4.31.0

  • GPU: NVIDIA A100
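A quick sanity check that the environment matches the configuration above (PyTorch is assumed to be installed alongside transformers):

    # Print the versions the pipeline expects: Python 3.8.x, transformers 4.31.0,
    # and a visible CUDA GPU (an A100 in our setup).
    import sys
    import torch
    import transformers

    print("Python:", sys.version.split()[0])
    print("transformers:", transformers.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))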

6) Easy way to train and evaluate the model

Chunk scorer training

Please enter the 1_train_chuck_scorer_model directory and run:

CUDA_VISIBLE_DEVICES=0  python 0_train_retrievel_5.py

LLM training for triple extraction

Step 1: enter 3_trainning_triple_model (https://github.com/ToneLi/PETAILOR-for-bio-triple-extraction/tree/main/3_trainning_triple_model) and run:

CUDA_VISIBLE_DEVICES=0  nohup  python trainer.py >myout.trainer 2>&1 &   

Step 2 (triple generation): enter 4_generation_triple_model and run:

CUDA_VISIBLE_DEVICES=0  python chuck5_generation_8000.py

Step 3: the generated file is written to petailor_output_for_GM-CIHT; to evaluate it, run:

python 0_F1_triplet_evalution.py

Expected results: {'all-prec': 0.8177874186550976, 'all-recall': 0.810752688172043, 'all-f1': 0.8142548596112312}
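These metrics are precision, recall, and F1 over the extracted triples. A minimal exact-match version of that computation is sketched below; it is illustrative only, and 0_F1_triplet_evalution.py may differ in its normalization and matching details:

    # Precision/recall/F1 over exact-match (subject, predicate, object) triples.
    # Illustrative only; the repository's evaluation script may normalize text
    # or match triples differently.
    def triple_f1(gold_triples, pred_triples):
        gold, pred = set(gold_triples), set(pred_triples)
        correct = len(gold & pred)
        prec = correct / len(pred) if pred else 0.0
        rec = correct / len(gold) if gold else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return {"all-prec": prec, "all-recall": rec, "all-f1": f1}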

7) How to run (full steps)

  • Step 1: Please access the "0_make_relation_chuck_and_scorer_data" directory and execute the code, proceeding through the files sequentially according to their assigned numbers. Be sure to update the file names and locations as necessary.

  • Step 2: Please access the "1_train_scorer_model" directory and execute the code, updating the file names and locations as necessary based on Step 1 (please use the default parameters):

    CUDA_VISIBLE_DEVICES=1 python 0_train_retrievel_5.py

  • Step 3: Please access the "2_relation_data_to_triple_train_data" directory and execute the code (a sketch of the resulting training records is given after this list):

    python chuck_triplet_progress_train.py

  • Step 4: Please access the "3_trainning_triple_model" directory and execute the code:

    CUDA_VISIBLE_DEVICES=1 python 1_ourmethod_chuck5_sim_llama2_13b_right_2.py

  • Step 5: Please access the "4_generation_triple_model" directory and execute the code:

    CUDA_VISIBLE_DEVICES=1 python chuck5_generation.py

  • Step 6: Please access the "4_generation_triple_model" directory and execute the code for the evaluation:

    python 0_F1_triplet_evalution.py
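As noted in Step 3, the relation data and its retrieved chunk are converted into training records for the LLM. A hedged sketch of what one such record could look like (the prompt template and the "input"/"output" field names are illustrative, not the exact format produced by chuck_triplet_progress_train.py):

    # Illustrative conversion of one GIT record plus its retrieved chunk into an
    # instruction-style training example; field names and prompt wording are placeholders.
    def to_training_record(record, retrieved_chunk):
        prompt = (
            f"Context: {retrieved_chunk}\n"
            f"Sentence: {record['SENTENCE']}\n"
            "Extract the triple (subject, predicate, object):"
        )
        target = f"({record['SUBJECT_TEXT']}, {record['PREDICATE']}, {record['OBJECT_TEXT']})"
        return {"input": prompt, "output": target}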
