OpenLTG-MLM

OpenLTG-MLM is a code repository for open-ended text generation with bidirectional pre-trained language models such as the BERT family (e.g., BERT and RoBERTa), using a non-autoregressive generation paradigm.

Our work (accepted at ACL 2023!) offers two main advantages: enhanced diversity of the generated text and faster generation of long text. We aim to bring renewed attention to bidirectional attention models, as they still hold strong potential for text generation!

🧩 Installation

The codebase relies on Fairseq and PyTorch. As of March 12, 2024, Fairseq version 0.12.2 has been verified to be compatible.

pip install fairseq==0.12.2

🧱 Reproducibility

We conducted experiments on the open-domain WritingPrompts task and report results on datasets of different sizes.

Dataset                          Test Set Size
WritingPrompts (Slim)            26k
WritingPrompts (.tar.bz2)        272k
WritingPromptsX                  587k

🎮 Inference

We support two methods of generation, Direct Generation and Recursive Span Generation:

  • DirectGen generates the target text directly in its entirety.

  • RecSpanGen generates the target text span by span, with the number of spans specified (see the sketch below).

Recursive span generation helps our model remain competitive in scenarios involving longer text generation.
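
To make the recursive procedure concrete, below is a minimal sketch of span-by-span iterative masked decoding. It uses Hugging Face transformers rather than the Fairseq code in this repository, and every name and parameter (generate_span, span_len, the number of refinement iterations) is an illustrative assumption, not the repository's API. The greedy argmax is only a placeholder; the sampling parameters described below would typically replace it.

# Illustrative sketch only: span-by-span masked decoding with an off-the-shelf RoBERTa MLM.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

def generate_span(prefix_ids, span_len, iterations=4):
    # Append `span_len` <mask> tokens after the prefix and refine them iteratively.
    masks = torch.full((span_len,), tokenizer.mask_token_id, dtype=torch.long)
    ids = torch.cat([prefix_ids, masks]).unsqueeze(0)
    span = slice(prefix_ids.size(0), prefix_ids.size(0) + span_len)
    for _ in range(iterations):
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0, span]
        ids[0, span] = logits.argmax(dim=-1)  # greedy refill; sampling could be used instead
    return ids[0]

prompt = tokenizer("A lone ship drifted toward the storm", return_tensors="pt").input_ids[0]
out = prompt
for _ in range(3):  # RecSpanGen: generate three consecutive spans
    out = generate_span(out, span_len=16)
print(tokenizer.decode(out, skip_special_tokens=True))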

Sampling Parameters:

  • -DSWAttn (Dynamic Sliding Window Attention) helps the attention mechanism focus on crucial information within a broader local context, preventing interference from distant noise.
  • -NSampling (Nucleus Sampling) helps mitigate the degeneration issues of language models in open-domain tasks.
  • -LTD (Linear Temperature Decay) is a crucial technique for keeping the model's outputs high quality throughout the iterative process.
bash openltg_mlm/scripts/tasks/xsum/run_inf.sh
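
For illustration, here is a hedged sketch of nucleus sampling combined with a linear temperature decay over refinement iterations. The function names, the decay schedule, and the default values (top_p=0.9, temperatures from 1.0 down to 0.1) are assumptions made for this example, not the repository's exact settings.

# Illustrative sketch only: NSampling (nucleus sampling) with LTD (linear temperature decay).
import torch

def nucleus_sample(logits, top_p=0.9, temperature=1.0):
    # Sample one token id per position from the top-p (nucleus) of the distribution.
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = cumulative - sorted_probs < top_p          # always keeps at least the top-1 token
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice).squeeze(-1)

def linear_temperature(step, total_steps, t_start=1.0, t_end=0.1):
    # Linearly decay the temperature from t_start to t_end over the iterations.
    return t_start + (t_end - t_start) * step / max(total_steps - 1, 1)

# Example: early iterations sample diversely, later iterations become near-greedy.
logits = torch.randn(16, 50265)                       # (positions, vocab) dummy logits
for step in range(8):
    t = linear_temperature(step, total_steps=8)
    tokens = nucleus_sample(logits, top_p=0.9, temperature=t)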

🚀 Training

We can extend the maximum encoding length of the RoBERTa model with --hierarchical-pos to support input sequences longer than 1k tokens.
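
As a rough illustration of the idea (not the repository's exact implementation of --hierarchical-pos), an absolute position beyond the original 512-position table can be factored into a block index and a within-block offset whose embeddings are summed:

# Illustrative sketch only: hierarchical position embeddings that cover long sequences.
import torch
import torch.nn as nn

class HierarchicalPositionEmbedding(nn.Module):
    def __init__(self, dim, base=512):
        super().__init__()
        self.base = base
        self.segment_emb = nn.Embedding(base, dim)    # which 512-token block
        self.offset_emb = nn.Embedding(base, dim)     # position inside the block

    def forward(self, positions):
        # positions: LongTensor of absolute positions, e.g. 0 .. 4095
        segment = positions // self.base
        offset = positions % self.base
        return self.segment_emb(segment) + self.offset_emb(offset)

emb = HierarchicalPositionEmbedding(dim=768)
pos = torch.arange(2048)                              # well beyond the original 512 limit
print(emb(pos).shape)                                 # torch.Size([2048, 768])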

# Prepare Data
bash openltg_mlm/scripts/process/xsum/binarize.sh
# DirectGen
bash openltg_mlm/scripts/tasks/xsum/run_train.sh
# or RecSpanGen
# bash openltg_mlm/scripts/tasks/xsum/run_rec_train.sh

🧷 Citing

@inproceedings{liang-etal-2023-open,
    title = "Open-ended Long Text Generation via Masked Language Modeling",
    author = "Liang, Xiaobo  and
      Tang, Zecheng  and
      Li, Juntao  and
      Zhang, Min",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.13",
    doi = "10.18653/v1/2023.acl-long.13",
    pages = "223--241",
}
