This project produces the non-anonymized version of the Gigaword summarization dataset, as used in the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks". It processes the dataset into the binary format expected by the code for the TensorFlow model.
python mkdir.py
Download the data via the URL and unzip it. Then move the unzipped data into the empty directory ./data/datafiles.
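If you would rather script the unzip-and-move step than do it by hand, something like the following could work. It is only a sketch: the function name unpack_into and the archive name in the usage note are hypothetical, and it assumes the download is an ordinary .zip file.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_into(archive_path, target_dir="./data/datafiles"):
    """Unzip archive_path and move its top-level entries into target_dir.

    Hypothetical helper, not part of this project. Assumes a plain .zip
    archive and a target directory that is empty (or does not exist yet).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    # Extract into a scratch directory first, so a failed extraction
    # does not leave a half-filled target directory behind.
    with tempfile.TemporaryDirectory() as staging:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(staging)
        for entry in sorted(Path(staging).iterdir()):
            destination = target / entry.name
            shutil.move(str(entry), str(destination))
            moved.append(destination)
    return moved
```

For example, unpack_into("downloaded_archive.zip") would leave the archive's contents under ./data/datafiles; replace the placeholder file name with whatever the download is actually called.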
python ./data/data.py
python ./makedatafile/make_datafiles.py
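For orientation, the pointer-generator code reads its binary files as a sequence of length-prefixed records: an 8-byte length followed by that many bytes of a serialized example. The sketch below illustrates only that framing, using arbitrary byte strings in place of serialized examples. It assumes this project keeps the same layout; write_records and read_records are hypothetical names, not functions from make_datafiles.py.

```python
import struct


def write_records(path, payloads):
    """Write each bytes payload as an 8-byte length followed by the payload."""
    with open(path, "wb") as out_file:
        for payload in payloads:
            # "q" packs a signed 8-byte integer in native byte order.
            out_file.write(struct.pack("q", len(payload)))
            out_file.write(payload)


def read_records(path):
    """Yield the payloads back from a file written by write_records."""
    with open(path, "rb") as reader:
        while True:
            len_bytes = reader.read(8)
            if not len_bytes:
                break  # clean end of file
            (length,) = struct.unpack("q", len_bytes)
            yield reader.read(length)
```

A quick way to check a generated file under this assumption is to count the records it yields, for instance sum(1 for _ in read_records(path)), and compare that against the number of examples you expect.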