Refactorize SueNes Using HuggingFace Transformer Library

Team Members

Jobayer Ahmmed
Jahid Hasan

Run The Experiment Automatically

Open a Linux Terminal
Clone the repo: git clone https://github.com/SigmaWe/SueNes_RE.git
Go to SueNes_RE directory: cd SueNes_RE
Give execute permission to the run.sh file: chmod +x run.sh
Finally, run the script: source run.sh

We trained two models from the same checkpoint, one using TensorFlow and the other using PyTorch. The run.sh script runs all the Python files that train the two models and test them with sample data. For testing, we call our trained models with three document-summary pairs. The original scores and the predicted scores are printed in the terminal.

The remaining sections give step-by-step instructions.

Repeat Transformer-based Experiments

The transformer directory contains code for training transformer-based models on different datasets. The datasets were generated using the sentence delete or word delete techniques described in the SueNes paper.
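As a rough illustration of the sentence delete idea mentioned above, the sketch below removes a random subset of sentences from a summary and labels the result with the fraction of sentences kept. It is not the repo's data generation code: the function name sentence_delete, the naive split on periods, and the kept-fraction label are all assumptions made for this example. Refer to the SueNes paper and the scripts in the pre directory for the actual procedure.

```python
import random


def sentence_delete(summary, n_delete, seed=0):
    """Return (mutated_summary, score) after deleting n_delete sentences.

    Hypothetical helper for illustration only. Sentences are split naively
    on periods, and the score is the fraction of sentences that remain.
    """
    # Naive sentence split (assumption); a real tokenizer would be more robust.
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    if not 0 <= n_delete <= len(sentences):
        raise ValueError("n_delete must be between 0 and the sentence count")

    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    drop = set(rng.sample(range(len(sentences)), n_delete))
    kept = [s for i, s in enumerate(sentences) if i not in drop]

    mutated = " ".join(s + "." for s in kept)
    score = len(kept) / len(sentences) if sentences else 0.0
    return mutated, score


if __name__ == "__main__":
    text = "The cat sat. The dog barked. The bird flew. The fish swam."
    for n in range(5):
        mutated, score = sentence_delete(text, n)
        print(f"{score:.2f}  {mutated}")
```

Each (document, mutated summary, score) triple produced this way could serve as one training example for a regression-style scorer, under the assumptions stated above.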

Environment Setup

You can create a virtual environment using either Python venv or Conda.

Python venv (CPU Only)

git clone https://github.com/JobayerAhmmed/SueNes.git
cd SueNes
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
pip install transformers datasets scikit-learn evaluate pyyaml h5py
Issue: at line 39 of .venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py, replace "from keras.saving.hdf5_format" with "from tensorflow.python.keras.saving.hdf5_format".
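If you prefer not to edit the file by hand, the replacement described in the issue above can be scripted. The following is a minimal sketch: the helper name patch_hdf5_import and the .bak backup are assumptions of this example, not part of the repo, and the fix may be unnecessary with other transformers or Keras versions.

```python
from pathlib import Path

OLD_IMPORT = "from keras.saving.hdf5_format"
NEW_IMPORT = "from tensorflow.python.keras.saving.hdf5_format"


def patch_hdf5_import(path):
    """Rewrite the keras hdf5_format import in the file at `path`.

    Hypothetical helper. Returns True if the file was changed and
    False if the old import was not found.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    if OLD_IMPORT not in text:
        return False
    # Keep a backup next to the original before editing an installed package.
    Path(str(file) + ".bak").write_text(text, encoding="utf-8")
    file.write_text(text.replace(OLD_IMPORT, NEW_IMPORT), encoding="utf-8")
    return True
```

With the virtual environment from the steps above, it would be called as patch_hdf5_import(".venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py"). Running it a second time returns False because the old import is no longer present.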

Conda venv (GPU)

Create a Conda environment by following this documentation
pip install tensorflow tensorflow-datasets tensorflow_hub
Install PyTorch following this documentation
pip install joblib numpy nltk matplotlib bs4 spacy stanza
python -m spacy download en_core_web_sm
pip install transformers datasets scikit-learn evaluate pyyaml h5py
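After either setup, a quick way to confirm that the packages are importable is to look each one up without importing it. The sketch below is an optional check, not part of the repo. The module lists are assumptions derived from the pip commands above, using the usual import names (scikit-learn imports as sklearn, pyyaml as yaml).

```python
import importlib.util

# Import names for packages installed in the steps above (assumed list).
REQUIRED = ["transformers", "datasets", "sklearn", "evaluate", "yaml", "h5py", "spacy"]
# Deep learning backends; each experiment needs only one of them.
BACKENDS = ["tensorflow", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("backend", BACKENDS)):
        missing = missing_modules(names)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

The check only reports whether a module can be located. It does not verify versions or GPU support.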

Generate Datasets

mkdir exp exp/data exp/result
cd pre
python3 sentence_scramble.py

Bert Tiny CNN Daily Mail TensorFlow

The code for this model is in the bert_tiny_cnndm_tf.py file. The model is trained from the prajjwal1/bert-tiny checkpoint. The training data is generated from the CNN Daily Mail dataset using SueNes, with only the sentence delete technique defined in the SueNes paper. Only 10% of the CNN Daily Mail train split is used to generate the train split for our experiment.
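The text above does not say how the 10% subset is chosen. As one possible reading, the sketch below keeps the first 10% of a list of examples. The helper name take_percent and the placeholder records are assumptions of this example, and the repo may instead sample randomly or slice at load time.

```python
def take_percent(examples, percent=10):
    """Return the first `percent` percent of `examples` (hypothetical helper)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer arithmetic avoids floating point rounding surprises.
    n = len(examples) * percent // 100
    return examples[:n]


if __name__ == "__main__":
    # Placeholder records standing in for CNN Daily Mail train examples.
    train = [{"id": i} for i in range(50)]
    subset = take_percent(train, 10)
    print(len(subset), "of", len(train), "examples kept")
```

When a dataset is loaded with the Hugging Face datasets library, a split string such as train[:10%] passed to load_dataset gives a similar slice. Whether this repo loads its data that way is not stated here.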

Train Model

cd transformer
python3 bert_tiny_cnndm_tf.py

Test Model

python3 bert_tiny_cnndm_tf_wrap.py

Bert Tiny CNN Daily Mail PyTorch

The code for this model is in the bert_tiny_cnndm_pt.py file. The model is trained from the prajjwal1/bert-tiny checkpoint. The training data is generated from the CNN Daily Mail dataset using SueNes, with only the sentence delete technique defined in the SueNes paper. Only 10% of the CNN Daily Mail train split is used to generate the train split for our experiment.

Train Model

cd transformer
python3 bert_tiny_cnndm_pt.py

Test Model

python3 bert_tiny_cnndm_pt_wrap.py
