This repository parses the New York Times (NYT) corpus and creates summarization training data from it. Use unzip.py to extract all files in the NYT corpus, XMLparser.py to parse the articles and extract abstract/full-text pairs, and make_datafiles.py to tokenize the pairs and split them into train (90%), validation (5%), and test (5%) sets. (Credit to https://github.com/abisee/pointer-generator.)
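The parsing and splitting steps can be sketched as follows. This is a minimal illustration, not the code in XMLparser.py or make_datafiles.py. It assumes the NITF layout used by the NYT Annotated Corpus (abstract under `<abstract>`, article body under `<block class="full_text">`). The helper names `extract_pair` and `split_90_5_5`, the shuffle, and the `seed` parameter are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _paragraphs(el: ET.Element) -> List[str]:
    # Collect the text of every <p> under el, including inline children like <b>.
    texts = ("".join(p.itertext()).strip() for p in el.iter("p"))
    return [t for t in texts if t]


def extract_pair(xml_text: str) -> Optional[Tuple[str, str]]:
    """Return (abstract, full_text) for one article, or None if either is missing.

    Hypothetical helper; ASSUMES the NITF element layout of the NYT Annotated Corpus.
    """
    root = ET.fromstring(xml_text)
    abstract_el = root.find(".//abstract")
    body_el = root.find(".//block[@class='full_text']")
    if abstract_el is None or body_el is None:
        return None  # skip articles that lack either side of the pair
    abstract = " ".join(_paragraphs(abstract_el))
    full_text = "\n".join(_paragraphs(body_el))
    if not abstract or not full_text:
        return None
    return abstract, full_text


def split_90_5_5(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into train (90%), val (5%), test (remaining ~5%).

    Hypothetical helper; the shuffle and seed are assumptions, not the repo's behavior.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 90 // 100  # integer math avoids float rounding surprises
    n_val = n * 5 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any rounding remainder lands in test
    return train, val, test
```

Integer division keeps the split sizes deterministic, and any remainder from rounding goes to the test set. Whether the original scripts shuffle before splitting is not stated in this README.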
boya-song/nyt_extract
About
This repository creates the NYT dataset for summarization.