JobayerAhmmed/SueNes

Refactor SueNes Using the Hugging Face Transformers Library

SueNes paper

Team Members

  • Jobayer Ahmmed
  • Jahid Hasan

Run The Experiment Automatically

  • Open a Linux terminal
  • Clone the repo: git clone https://github.com/SigmaWe/SueNes_RE.git
  • Go to the SueNes_RE directory: cd SueNes_RE
  • Give run.sh execute permission: chmod +x run.sh
  • Finally, run the script: source run.sh

We trained two different models from the same checkpoint, one with TensorFlow and the other with PyTorch. The run.sh script runs all the Python files that train the two models and test them on sample data. For testing, we call each trained model on three document-summary pairs; the original scores and the predicted scores are printed in the terminal.
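As a rough illustration of the steps an automated run covers, the Python driver below executes the manual steps from the rest of this README in order and stops at the first failure. It is a hypothetical reconstruction, not the contents of run.sh: the step list, the working directories (pre and transformer assumed to sit at the repository root), and the dry_run flag are assumptions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical step list mirroring the manual instructions in this README:
# (working directory relative to the repo root, command to run there).
STEPS = [
    (".", ["mkdir", "-p", "exp/data", "exp/result"]),  # -p: no error if the dirs exist
    ("pre", ["python3", "sentence_scramble.py"]),
    ("transformer", ["python3", "bert_tiny_cnndm_tf.py"]),
    ("transformer", ["python3", "bert_tiny_cnndm_tf_wrap.py"]),
    ("transformer", ["python3", "bert_tiny_cnndm_pt.py"]),
    ("transformer", ["python3", "bert_tiny_cnndm_pt_wrap.py"]),
]


def run_all(repo_root: str = ".", dry_run: bool = False) -> list[str]:
    """Run every step in order, stopping at the first failure.

    Returns the commands as printable strings; with dry_run=True nothing is executed.
    """
    root = Path(repo_root)
    commands = []
    for workdir, cmd in STEPS:
        commands.append(f"(cd {workdir} && {' '.join(cmd)})")
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, cwd=root / workdir, check=True)
    return commands


if __name__ == "__main__":
    for line in run_all(dry_run=True):
        print(line)
```

Running each step with its own cwd avoids chaining cd commands, so a failed step cannot leave later steps running in the wrong directory.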

The rest of this README gives step-by-step instructions.

Repeat Transformer-based Experiments

The transformer directory contains code for training transformer-based models on different datasets. The datasets were generated with the sentence-delete or word-delete techniques described in the SueNes paper.
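In SueNes, training pairs are built by mutating a reference summary. The sketch below illustrates sentence delete and word delete on text that is already split into sentences or words. It assumes, as a simplification, that the target score is the fraction of sentences or words kept; the helper names are hypothetical, and the exact labeling scheme and deletion ratios should be taken from the SueNes paper and the pre scripts.

```python
from __future__ import annotations

import random


def _delete_units(units: list[str], ratio: float, rng: random.Random) -> tuple[list[str], float]:
    """Drop round(ratio * n) randomly chosen units, keeping the rest in order.

    Returns the kept units and an (assumed) target score = fraction kept.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if not units:
        raise ValueError("cannot mutate an empty summary")
    n = len(units)
    to_delete = set(rng.sample(range(n), round(ratio * n)))
    kept = [u for i, u in enumerate(units) if i not in to_delete]
    return kept, len(kept) / n


def sentence_delete(sentences: list[str], ratio: float, rng: random.Random) -> tuple[list[str], float]:
    """Sentence delete: remove a share of whole sentences from the summary."""
    return _delete_units(sentences, ratio, rng)


def word_delete(summary: str, ratio: float, rng: random.Random) -> tuple[str, float]:
    """Word delete: remove a share of whitespace-separated words from the summary."""
    kept, score = _delete_units(summary.split(), ratio, rng)
    return " ".join(kept), score


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the mutations are reproducible
    summary = ["A storm hit the coast.", "Thousands lost power.", "Repairs may take a week."]
    for ratio in (0.0, 0.33, 0.67):
        print(sentence_delete(summary, ratio, rng))
```

Each reference summary can be mutated at several ratios, so one (document, summary) pair yields several training examples with different target scores.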

Environment Setup

You can create a virtual environment using either Python's venv or Conda.

Python venv (CPU Only)

  • git clone https://github.com/JobayerAhmmed/SueNes.git
  • cd SueNes
  • python3 -m venv .venv
  • source .venv/bin/activate
  • pip install -r requirements.txt
  • python -m spacy download en_core_web_sm
  • pip install transformers datasets scikit-learn evaluate pyyaml h5py
  • Known issue: at line 39 of .venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py, replace from keras.saving.hdf5_format with from tensorflow.python.keras.saving.hdf5_format
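The manual edit in the known issue can be scripted. The sketch below replaces the old import prefix on any line that starts with it, rather than relying on the import sitting exactly at line 39, which may vary between transformers versions. The helper names are hypothetical, and the path assumes Python 3.10 inside .venv as in the instruction above.

```python
from __future__ import annotations

from pathlib import Path

OLD_IMPORT = "from keras.saving.hdf5_format"
NEW_IMPORT = "from tensorflow.python.keras.saving.hdf5_format"


def patch_source(text: str) -> tuple[str, bool]:
    """Swap the old Keras import prefix for the TensorFlow-internal one.

    Returns the (possibly) updated text and whether anything changed.
    Lines that already use the new import are left alone.
    """
    changed = False
    out = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(OLD_IMPORT):
            line = line.replace(OLD_IMPORT, NEW_IMPORT, 1)
            changed = True
        out.append(line)
    return "".join(out), changed


def patch_file(path: str) -> bool:
    """Patch a file in place, keeping a .bak copy of the original if it changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    patched, changed = patch_source(original)
    if changed:
        p.with_suffix(p.suffix + ".bak").write_text(original, encoding="utf-8")
        p.write_text(patched, encoding="utf-8")
    return changed


if __name__ == "__main__":
    target = ".venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py"
    if Path(target).exists():
        print("patched" if patch_file(target) else "nothing to patch")
```

Running it twice is harmless: the second run finds no line starting with the old prefix and reports nothing to patch.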

Conda venv (GPU)

  • Create a Conda environment following this documentation
  • pip install tensorflow tensorflow-datasets tensorflow_hub
  • Install PyTorch following this documentation
  • pip install joblib numpy nltk matplotlib bs4 spacy stanza
  • python -m spacy download en_core_web_sm
  • pip install transformers datasets scikit-learn evaluate pyyaml h5py

Generate Datasets

  • mkdir exp exp/data exp/result
  • cd pre
  • python3 sentence_scramble.py

Bert Tiny CNN Daily Mail TensorFlow

The model code is in bert_tiny_cnndm_tf.py. The model is fine-tuned from the prajjwal1/bert-tiny checkpoint. Training data is generated from the CNN Daily Mail dataset using SueNes, with only the sentence-delete technique defined in the SueNes paper. Only 10% of the CNN Daily Mail train split is used to generate the train split for our experiment.
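One way to take 10% of the train split is split slicing, a syntax accepted by both Hugging Face datasets and TensorFlow Datasets. The sketch below assumes the Hugging Face cnn_dailymail dataset with config 3.0.0 and takes the first 10% of rows; whether the repository slices, shuffles first, or loads the data through TensorFlow Datasets instead is not stated here, and the function names are hypothetical.

```python
def train_split_spec(percent: int = 10) -> str:
    """Build a split-slicing string such as 'train[:10%]' (first `percent` % of train)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"train[:{percent}%]"


def load_cnndm_train_subset(percent: int = 10):
    """Load a slice of the CNN/Daily Mail train split (needs `datasets` and network access)."""
    # Imported lazily so train_split_spec works even without the package installed.
    from datasets import load_dataset

    # Rows have "article" (the document), "highlights" (the reference summary) and "id".
    return load_dataset("cnn_dailymail", "3.0.0", split=train_split_spec(percent))
```

Under this setup, "article" would play the role of the document and "highlights" the reference summary that sentence delete mutates.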

Train Model

  • cd transformer (from the repository root)
  • python3 bert_tiny_cnndm_tf.py

Test Model

  • python3 bert_tiny_cnndm_tf_wrap.py
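The test scripts print original and predicted scores for three document-summary pairs. A minimal sketch of such a comparison, which also reports the mean absolute error, follows; the function name and the choice of error metric are assumptions, not the actual output format of bert_tiny_cnndm_tf_wrap.py.

```python
from __future__ import annotations


def compare_scores(original: list[float], predicted: list[float]) -> tuple[list[str], float]:
    """Format one row per pair and return the rows plus the mean absolute error."""
    if len(original) != len(predicted) or not original:
        raise ValueError("need two non-empty score lists of equal length")
    rows = []
    total_error = 0.0
    for i, (y, y_hat) in enumerate(zip(original, predicted), start=1):
        error = abs(y - y_hat)
        total_error += error
        rows.append(f"pair {i}: original={y:.3f} predicted={y_hat:.3f} abs_error={error:.3f}")
    return rows, total_error / len(original)
```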

Bert Tiny CNN Daily Mail PyTorch

The model code is in bert_tiny_cnndm_pt.py. The model is fine-tuned from the prajjwal1/bert-tiny checkpoint. Training data is generated from the CNN Daily Mail dataset using SueNes, with only the sentence-delete technique defined in the SueNes paper. Only 10% of the CNN Daily Mail train split is used to generate the train split for our experiment.
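The checkpoint can be turned into a document-summary scorer by adding a single-output head. The sketch below uses the transformers classes AutoTokenizer and AutoModelForSequenceClassification with num_labels=1 and problem_type="regression", and feeds the document and summary as a sentence pair. This is an assumed setup, not necessarily how bert_tiny_cnndm_pt.py builds its model. Loaded directly from prajjwal1/bert-tiny, the regression head is randomly initialized, so for meaningful scores pass the directory where your fine-tuned model was saved (not specified here) as checkpoint.

```python
def load_scorer(checkpoint: str = "prajjwal1/bert-tiny"):
    """Load a tokenizer and a single-output (regression) sequence classification model."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # num_labels=1 + problem_type="regression": MSE loss is used when labels are passed.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=1, problem_type="regression"
    )
    return tokenizer, model


def score_pair(tokenizer, model, document: str, summary: str, max_length: int = 512) -> float:
    """Encode (document, summary) as one sequence pair and return the predicted score."""
    import torch

    # "only_first" truncates the document, keeping the summary intact
    # (the summary alone must therefore fit within max_length).
    inputs = tokenizer(
        document, summary, truncation="only_first", max_length=max_length, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1)
    return logits.item()
```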

Train Model

  • cd transformer (from the repository root)
  • python3 bert_tiny_cnndm_pt.py

Test Model

  • python3 bert_tiny_cnndm_pt_wrap.py

About

Document summary evaluation model using the Hugging Face Transformers library.

Languages

  • Python 53.3%
  • Jupyter Notebook 24.3%
  • Perl 22.1%
  • Shell 0.3%