Ethanscuter/gigaword

This project produces the non-anonymized version of the Gigaword summarization dataset, as used in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks. It converts the dataset into the binary format expected by the TensorFlow code for that model.

Before running the main project, run the following script to set up the required directories:

python mkdir.py
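The setup step only needs to create the working directories before any data is written. The sketch below is a hypothetical stand-in for mkdir.py, not its actual contents: it assumes the only directory required is ./data/datafiles (the directory the next step refers to), and the names REQUIRED_DIRS and make_dirs are illustrative.

```python
import os

# Hypothetical: the real mkdir.py may create additional directories.
REQUIRED_DIRS = [os.path.join("data", "datafiles")]


def make_dirs(root="."):
    """Create every required directory under root; existing ones are left alone."""
    created = []
    for rel in REQUIRED_DIRS:
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        created.append(path)
    return created


if __name__ == "__main__":
    for path in make_dirs():
        print("ready:", path)
```

Using exist_ok=True makes the step safe to re-run.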

Download Gigaword and process data

Download the data from the URL and unzip it. Move the extracted files into the empty directory ./data/datafiles, then run:

python ./data/data.py
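What data.py does internally is not documented here. As a rough illustration only, the sketch below assumes the downloaded data stores articles and their summaries in two line-aligned text files (line i of one file matches line i of the other) and pairs them up, skipping blank examples. The function read_pairs is hypothetical and is not taken from data.py.

```python
from itertools import zip_longest


def read_pairs(article_path, summary_path):
    """Yield (article, summary) pairs from two line-aligned text files.

    Pairs where either side is blank are skipped; a ValueError is raised
    if the two files have different numbers of lines.
    """
    with open(article_path, encoding="utf-8") as arts, \
         open(summary_path, encoding="utf-8") as sums:
        for article, summary in zip_longest(arts, sums):
            if article is None or summary is None:
                raise ValueError("article and summary files have different line counts")
            article, summary = article.strip(), summary.strip()
            if article and summary:
                yield article, summary
```

zip_longest is used instead of zip so that a length mismatch is reported rather than silently truncated.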

Convert the data into binary (.bin) files

python ./makedatafile/make_datafiles.py
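The pointer-generator TensorFlow code reads a .bin file as a sequence of records: an 8-byte length header packed with struct format "q", followed by that many bytes holding a serialized tf.train.Example with "article" and "abstract" features. The sketch below assumes this repository's make_datafiles.py writes the same layout. To stay runnable without TensorFlow it uses plain byte strings as payloads; write_record and read_records are illustrative names, not functions from this repository.

```python
import io
import struct


def write_record(writer, payload: bytes) -> None:
    """Write one record: an 8-byte length header ('q'), then the raw payload."""
    writer.write(struct.pack("q", len(payload)))
    writer.write(payload)


def read_records(reader):
    """Yield payloads from a stream of length-prefixed records until EOF."""
    while True:
        len_bytes = reader.read(8)
        if not len_bytes:
            break  # clean end of file
        if len(len_bytes) < 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("q", len_bytes)
        payload = reader.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for text in ["first example", "second example"]:
        write_record(buf, text.encode("utf-8"))
    buf.seek(0)
    for payload in read_records(buf):
        print(payload.decode("utf-8"))
```

In the real files each payload would be decoded with tf.train.Example.FromString rather than treated as text.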

About

Code to obtain the Gigaword dataset (non-anonymized) for summarization