BiomedRAG: A Retrieval-Augmented Large Language Model for Biomedicine

1) Overview

The architecture of our proposed PeTailor is depicted in the diagram below. It consists of three major steps: (1) constructing the diverse chunk database; (2) training the tailored chunk scorer to select, from the diverse chunk database, the document most relevant to the input sentence; and (3) incorporating the retrieved document into the LLM to generate the triple (or other output) for the given sentence.
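At inference time, the three steps reduce to: score every chunk in the database against the input sentence, keep the highest-scoring one, and prepend it to the LLM prompt. Below is a minimal sketch of that flow, assuming placeholder callables (chunk_scorer, llm_generate) and a list-of-strings chunk_database; these names are illustrative, not the repository's API.

    # Minimal sketch of the retrieval-augmented triple extraction flow.
    # chunk_scorer, llm_generate, and chunk_database are placeholders for the
    # components produced in steps (1)-(3); they are not this repo's function names.
    def extract_triple(sentence, chunk_database, chunk_scorer, llm_generate):
        # (2) tailored chunk scorer: rank every chunk against the input sentence
        scores = [chunk_scorer(sentence, chunk) for chunk in chunk_database]
        best_chunk = chunk_database[scores.index(max(scores))]
        # (3) prepend the retrieved chunk and let the LLM generate the triple as text
        prompt = f"Context: {best_chunk}\nSentence: {sentence}\nTriple:"
        return llm_generate(prompt)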

2) Baselines

For the triple extraction baselines, please see Triple Extraction Baselines.

3) GIT

Data format (one JSON object per example):

{"PREDICATE": "INTERACTS_WITH",
"SUBJECT_TEXT": "nalorphine",
"OBJECT_TEXT": "morphine",
"SENTENCE": "[on the effect of respiration of a combination of  morphine -like acting pethidine with the  morphine  antagonist  nalorphine ]."}

4) Code Structure

  • 0_make_relation_chuck_and_scorer_data (data pre-processing)

  • 1_train_scorer_model (chunk scorer training) -- 1) download the transformers source from Hugging Face into 1_train_scorer_model; 2) replace the trainer.py file in that transformers source, and add SplitInputsChunks.py and ChuckWeights.py to it (an illustrative scoring sketch is given after this list).

  • 2_relation_data_to_triple_train_data

  • 3_trainning_triple_model (LLM training for triple extraction)

  • 4_generation_triple_model (generation process)
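The chunk scorer mentioned in the 1_train_scorer_model item ranks database chunks against an input sentence. The repository's scorer is trained through the modified transformers trainer described above; purely as an illustration of "scoring chunks against a sentence", here is a sketch based on cosine similarity of sentence embeddings (it requires pip install sentence-transformers, which is not part of this repo's pipeline):

    # Illustrative chunk ranking by embedding cosine similarity; NOT the
    # tailored chunk scorer trained in 1_train_scorer_model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

    def rank_chunks(sentence, chunks, top_k=1):
        sent_emb = model.encode(sentence, convert_to_tensor=True)
        chunk_embs = model.encode(chunks, convert_to_tensor=True)
        scores = util.cos_sim(sent_emb, chunk_embs)[0]
        best = scores.topk(min(top_k, len(chunks)))
        return [(chunks[int(i)], float(scores[int(i)])) for i in best.indices]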

5) Configuration

  • Python 3.8.8

  • Transformers: pip install transformers==4.31.0

  • GPU: NVIDIA A100
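A quick sanity check that the environment matches the configuration above (PyTorch is assumed to be installed alongside transformers):

    # Print the versions the pipeline expects: Python 3.8.x, transformers 4.31.0,
    # and a visible CUDA GPU (an A100 in our setup).
    import sys
    import torch
    import transformers

    print("Python:", sys.version.split()[0])
    print("transformers:", transformers.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))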

6) Easy way to train and evaluate the model

Chunk scorer training

Please enter the 1_train_chuck_scorer_model directory and run:

CUDA_VISIBLE_DEVICES=0  python 0_train_retrievel_5.py

LLM training for triple extraction

Step 1: enter 3_trainning_triple_model (https://github.com/ToneLi/PETAILOR-for-bio-triple-extraction/tree/main/3_trainning_triple_model) and run:

CUDA_VISIBLE_DEVICES=0  nohup  python trainer.py >myout.trainer 2>&1 &   

Step 2 (triple generation): enter 4_generation_triple_model and run:

CUDA_VISIBLE_DEVICES=0  python chuck5_generation_8000.py

Step 3: the generated file is written to petailor_output_for_GM-CIHT; to evaluate it, run:

python 0_F1_triplet_evalution.py

Expected results: {'all-prec': 0.8177874186550976, 'all-recall': 0.810752688172043, 'all-f1': 0.8142548596112312}
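These metrics are precision, recall, and F1 over the extracted triples. A minimal exact-match version of that computation is sketched below; it is illustrative only, and 0_F1_triplet_evalution.py may differ in its normalization and matching details:

    # Precision/recall/F1 over exact-match (subject, predicate, object) triples.
    # Illustrative only; the repository's evaluation script may normalize text
    # or match triples differently.
    def triple_f1(gold_triples, pred_triples):
        gold, pred = set(gold_triples), set(pred_triples)
        correct = len(gold & pred)
        prec = correct / len(pred) if pred else 0.0
        rec = correct / len(gold) if gold else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return {"all-prec": prec, "all-recall": rec, "all-f1": f1}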

7) How to run (full steps)

  • Step 1: Please access the "0_make_relation_chuck_and_scorer_data" directory and execute the code, proceeding through the files sequentially according to their assigned numbers. Be sure to update the file names and locations as necessary.

  • Step 2: Please access the "1_train_scorer_model" directory and execute the code, updating the file names and locations as necessary based on Step 1 (please use the default parameters):

    CUDA_VISIBLE_DEVICES=1 python 0_train_retrievel_5.py

  • Step 3: Please access the "2_relation_data_to_triple_train_data" directory and execute the code (a sketch of the resulting training records is given after this list):

    python chuck_triplet_progress_train.py

  • Step 4: Please access the "3_trainning_triple_model" directory and execute the code:

    CUDA_VISIBLE_DEVICES=1 python 1_ourmethod_chuck5_sim_llama2_13b_right_2.py

  • Step 5: Please access the "4_generation_triple_model" directory and execute the code:

    CUDA_VISIBLE_DEVICES=1 python chuck5_generation.py

  • Step 6: Please access the "4_generation_triple_model" directory and execute the code for the evaluation:

    python 0_F1_triplet_evalution.py
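As noted in Step 3, the relation data and its retrieved chunk are converted into training records for the LLM. A hedged sketch of what one such record could look like (the prompt template and the "input"/"output" field names are illustrative, not the exact format produced by chuck_triplet_progress_train.py):

    # Illustrative conversion of one GIT record plus its retrieved chunk into an
    # instruction-style training example; field names and prompt wording are placeholders.
    def to_training_record(record, retrieved_chunk):
        prompt = (
            f"Context: {retrieved_chunk}\n"
            f"Sentence: {record['SENTENCE']}\n"
            "Extract the triple (subject, predicate, object):"
        )
        target = f"({record['SUBJECT_TEXT']}, {record['PREDICATE']}, {record['OBJECT_TEXT']})"
        return {"input": prompt, "output": target}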
