BioTextSumm/BERT-based-Summ

BERT-based-Summ

BERT-based Biomedical Text Summarizer

  1. Download version 1 or version 2 of the BERT-based biomedical text summarizer.

    • Version 1 uses Euclidean distance in the clustering step.
    • Version 2 uses cosine similarity in the clustering step.
  2. Extract the zip file.

  3. Download the BERT repository from https://github.com/google-research/bert and copy its files into the BERT directory included with the summarizer.

  4. Download a pretrained BERT model from https://github.com/google-research/bert or a pretrained BioBERT model from https://github.com/naver/biobert-pretrained, and copy its files into the same BERT directory.

  5. Copy your input document (preferably a plain-text .txt file) into the INPUT directory included with the summarizer.

  6. Run the summarizer from the command line:

    • python Summarizer.py -i INPUT_FILE_NAME -o OUTPUT_FILE_NAME -c COMPRESSION_RATE -k NUMBER_OF_CLUSTERS
  7. Four parameters must be specified when running the script:

    • INPUT_FILE_NAME is the name of the input file already copied to the INPUT directory.
    • OUTPUT_FILE_NAME is the name of the output file, which is created in the OUTPUT directory and contains the summary.
    • COMPRESSION_RATE specifies the size of the summary relative to the input and takes a value in the open interval (0, 1); for example, 0.3 means 30 percent.
    • NUMBER_OF_CLUSTERS specifies the number of final clusters in the clustering step.
  8. When the summarization process finishes, the summary can be found in the OUTPUT directory included with the summarizer.
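
As a rough illustration of steps 6 and 7, the four options could be parsed and validated along the following lines. This is a hypothetical sketch, not the code in Summarizer.py: the function name parse_args, the attribute names, and the error messages are assumptions. Only the flags -i, -o, -c, -k and the (0, 1) range come from the instructions above.

```python
import argparse


def parse_args(argv=None):
    """Parse the four documented options (hypothetical re-creation, not Summarizer.py itself)."""
    parser = argparse.ArgumentParser(description="BERT-based biomedical text summarizer")
    parser.add_argument("-i", dest="input_file", required=True,
                        help="name of a file inside the INPUT directory")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="name of the file to create in the OUTPUT directory")
    parser.add_argument("-c", dest="compression_rate", type=float, required=True,
                        help="summary size as a fraction strictly between 0 and 1")
    parser.add_argument("-k", dest="num_clusters", type=int, required=True,
                        help="number of final clusters in the clustering step")
    args = parser.parse_args(argv)

    # The README gives the range (0, 1); rejecting the endpoints is this sketch's reading of it.
    if not 0.0 < args.compression_rate < 1.0:
        parser.error("COMPRESSION_RATE must lie in the open interval (0, 1)")
    # Requiring at least one cluster is an assumption of this sketch.
    if args.num_clusters < 1:
        parser.error("NUMBER_OF_CLUSTERS must be a positive integer")
    return args


# Equivalent to: python Summarizer.py -i Input.txt -o Output.txt -c 0.3 -k 4
args = parse_args(["-i", "Input.txt", "-o", "Output.txt", "-c", "0.3", "-k", "4"])
```

Validating the values up front means a bad compression rate is reported before any BERT model is loaded.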


Example
The following command reads Input.txt from the INPUT directory, runs the summarizer with a compression rate of 30 percent and 4 final clusters, and stores the summary in Output.txt:
 python Summarizer.py -i Input.txt -o Output.txt -c 0.3 -k 4
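
To make the compression rate in this example concrete, the sketch below converts a rate into a number of summary sentences. The README does not say how Summarizer.py measures size or how it rounds. This sketch assumes the rate is a fraction of the input's sentence count, rounds to the nearest integer, and keeps at least one sentence. The name summary_length is hypothetical.

```python
def summary_length(num_sentences: int, compression_rate: float) -> int:
    """Number of sentences to keep, assuming the rate is a fraction of the sentence count."""
    if not 0.0 < compression_rate < 1.0:
        raise ValueError("compression_rate must be in the open interval (0, 1)")
    # Rounding to nearest and the floor of one sentence are assumptions of this sketch.
    return max(1, round(num_sentences * compression_rate))


# Under these assumptions, a 50-sentence input at -c 0.3 yields a 15-sentence summary.
kept = summary_length(50, 0.3)
```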

Final evaluation results

| Method | ROUGE-1 | ROUGE-2 |
|---|---|---|
| BERT-based summarizer (BERT-large) | 0.7504 | 0.3312 |
| BERT-based summarizer (BioBERT-pubmed+pmc) | 0.7411 | 0.3228 |
| BERT-based summarizer (BioBERT-pubmed) | 0.7376 | 0.3203 |
| CIBS biomedical summarizer | 0.7345 | 0.3187 |
| BERT-based summarizer (BioBERT-pmc) | 0.7309 | 0.3164 |
| Bayesian biomedical summarizer | 0.7288 | 0.3143 |
| BERT-based summarizer (BERT-base) | 0.7257 | 0.3110 |
| SUMMA | 0.7098 | 0.3022 |
| TexLexAn | 0.6982 | 0.2979 |
| Lead baseline | 0.6116 | 0.2311 |
| Random baseline | 0.5667 | 0.1999 |
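
For readers unfamiliar with the metrics above, ROUGE-N measures n-gram overlap between a generated summary and a reference summary. The sketch below computes a simplified recall-oriented ROUGE-N with whitespace tokenization. It is not the official ROUGE toolkit, and the README does not state which ROUGE variant or settings produced the reported scores, so treat the function names and the choice of recall as assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int) -> float:
    """Fraction of reference n-grams that also appear in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the drug reduced tumor growth"
candidate = "the drug reduced growth"
r1 = rouge_n_recall(candidate, reference, 1)  # 4 of 5 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, 2)  # 2 of 4 reference bigrams matched
```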

Parameterization results (Euclidean distance)
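
K is the number of clusters; R-1 and R-2 are ROUGE-1 and ROUGE-2.

| K | BERT-base R-1 | BERT-base R-2 | BERT-large R-1 | BERT-large R-2 | BioBERT-pmc R-1 | BioBERT-pmc R-2 | BioBERT-pubmed R-1 | BioBERT-pubmed R-2 | BioBERT-pubmed+pmc R-1 | BioBERT-pubmed+pmc R-2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.7221 | 0.3087 | 0.7434 | 0.3264 | 0.7243 | 0.3094 | 0.7269 | 0.3122 | 0.7369 | 0.3195 |
| 3 | 0.7291 | 0.3133 | 0.7457 | 0.3285 | 0.7308 | 0.3172 | 0.7361 | 0.3186 | 0.7429 | 0.3265 |
| 4 | 0.7224 | 0.3107 | 0.7507 | 0.3329 | 0.7299 | 0.3189 | 0.7354 | 0.3187 | 0.7399 | 0.3234 |
| 5 | 0.7205 | 0.3114 | 0.7467 | 0.3302 | 0.7272 | 0.3138 | 0.7293 | 0.3183 | 0.7398 | 0.3229 |
| 6 | 0.7199 | 0.3099 | 0.7415 | 0.3249 | 0.7239 | 0.3134 | 0.7276 | 0.3146 | 0.7352 | 0.3199 |
| 7 | 0.7157 | 0.3075 | 0.7366 | 0.3208 | 0.7187 | 0.3097 | 0.7226 | 0.3111 | 0.7313 | 0.3170 |
| 8 | 0.7179 | 0.3079 | 0.7334 | 0.3183 | 0.7194 | 0.3089 | 0.7198 | 0.3074 | 0.7272 | 0.3122 |
| 9 | 0.7146 | 0.3084 | 0.7291 | 0.3173 | 0.7183 | 0.3099 | 0.7174 | 0.3062 | 0.7273 | 0.3087 |
| 10 | 0.7127 | 0.3054 | 0.7284 | 0.3137 | 0.7186 | 0.3102 | 0.7162 | 0.3036 | 0.7196 | 0.3080 |
| 11 | 0.7063 | 0.2990 | 0.7257 | 0.3089 | 0.7148 | 0.3161 | 0.7113 | 0.2992 | 0.7164 | 0.3027 |
| 12 | 0.7034 | 0.2968 | 0.7203 | 0.3101 | 0.7094 | 0.3088 | 0.7087 | 0.2995 | 0.7117 | 0.3006 |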
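
When choosing a value for -k, it can help to read off the best-scoring cluster count per model. The sketch below does this for the ROUGE-1 columns of the Euclidean-distance results, with the numbers copied from the table above. The names K_VALUES, ROUGE1_EUCLIDEAN, and best_k are hypothetical helpers for this illustration, and the best K on these results may not transfer to other documents.

```python
K_VALUES = list(range(2, 13))  # K = 2 .. 12, as in the table rows

# ROUGE-1 scores per model, copied from the Euclidean-distance table.
ROUGE1_EUCLIDEAN = {
    "BERT-base": [0.7221, 0.7291, 0.7224, 0.7205, 0.7199, 0.7157,
                  0.7179, 0.7146, 0.7127, 0.7063, 0.7034],
    "BERT-large": [0.7434, 0.7457, 0.7507, 0.7467, 0.7415, 0.7366,
                   0.7334, 0.7291, 0.7284, 0.7257, 0.7203],
    "BioBERT-pmc": [0.7243, 0.7308, 0.7299, 0.7272, 0.7239, 0.7187,
                    0.7194, 0.7183, 0.7186, 0.7148, 0.7094],
    "BioBERT-pubmed": [0.7269, 0.7361, 0.7354, 0.7293, 0.7276, 0.7226,
                       0.7198, 0.7174, 0.7162, 0.7113, 0.7087],
    "BioBERT-pubmed+pmc": [0.7369, 0.7429, 0.7399, 0.7398, 0.7352, 0.7313,
                           0.7272, 0.7273, 0.7196, 0.7164, 0.7117],
}


def best_k(scores):
    """Return (K, score) for the highest score in one table column."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return K_VALUES[best_index], scores[best_index]


best = {model: best_k(scores) for model, scores in ROUGE1_EUCLIDEAN.items()}
for model, (k, score) in best.items():
    print(f"{model}: best K = {k} (ROUGE-1 = {score:.4f})")
```

On these numbers, BERT-large peaks at K = 4 and the other four models peak at K = 3.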

Parameterization results (Cosine similarity)
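
K is the number of clusters; R-1 and R-2 are ROUGE-1 and ROUGE-2.

| K | BERT-base R-1 | BERT-base R-2 | BERT-large R-1 | BERT-large R-2 | BioBERT-pmc R-1 | BioBERT-pmc R-2 | BioBERT-pubmed R-1 | BioBERT-pubmed R-2 | BioBERT-pubmed+pmc R-1 | BioBERT-pubmed+pmc R-2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.7196 | 0.3092 | 0.7328 | 0.3224 | 0.7242 | 0.3117 | 0.7177 | 0.3095 | 0.7285 | 0.3163 |
| 3 | 0.7169 | 0.3102 | 0.7377 | 0.3275 | 0.7249 | 0.3131 | 0.7224 | 0.3089 | 0.7328 | 0.3204 |
| 4 | 0.7212 | 0.3107 | 0.7362 | 0.3249 | 0.7272 | 0.3107 | 0.7268 | 0.3184 | 0.7278 | 0.3202 |
| 5 | 0.7152 | 0.3068 | 0.7361 | 0.3259 | 0.7212 | 0.3082 | 0.7298 | 0.3165 | 0.7295 | 0.3199 |
| 6 | 0.7136 | 0.3026 | 0.7299 | 0.3205 | 0.7171 | 0.3071 | 0.7261 | 0.3157 | 0.7272 | 0.3160 |
| 7 | 0.7107 | 0.2984 | 0.7259 | 0.3162 | 0.7173 | 0.3008 | 0.7221 | 0.3126 | 0.7224 | 0.3136 |
| 8 | 0.7071 | 0.2988 | 0.7231 | 0.3127 | 0.7176 | 0.3049 | 0.7207 | 0.3102 | 0.7199 | 0.3135 |
| 9 | 0.7037 | 0.2968 | 0.7194 | 0.3094 | 0.7119 | 0.3001 | 0.7170 | 0.3072 | 0.7182 | 0.3099 |
| 10 | 0.6989 | 0.2917 | 0.7173 | 0.3068 | 0.7073 | 0.2965 | 0.7143 | 0.3056 | 0.7158 | 0.3074 |
| 11 | 0.6953 | 0.2905 | 0.7146 | 0.3046 | 0.7035 | 0.2954 | 0.7080 | 0.2986 | 0.7126 | 0.3069 |
| 12 | 0.6908 | 0.2879 | 0.7142 | 0.3018 | 0.6995 | 0.2882 | 0.7033 | 0.2967 | 0.7106 | 0.3034 |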
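
The two parameterization tables differ only in the measure used to compare sentence vectors during clustering (version 1 versus version 2 of the summarizer). The sketch below shows the two measures on toy vectors. How the summarizer derives sentence vectors from BERT, and which clustering algorithm consumes these measures, is not described in this README, so the vectors and function names here are purely illustrative.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors; ignores magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors pointing in the same direction but with different lengths.
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]

dist = euclidean_distance(u, v)  # 3.0: the vectors are far apart in space
sim = cosine_similarity(u, v)    # 1.0: the vectors have identical direction
```

Two vectors that Euclidean distance treats as far apart can still be maximally similar under cosine similarity, which is one reason the two versions can group sentences differently.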
