nyt_extract

This repo parses the NYT corpus and creates training data from it. Use unzip.py to extract all files in the NYT corpus, XMLparser.py to parse them and extract abstract and full-text pairs, and make_datafiles.py to tokenize the pairs and split them into train (90%), val (5%), and test (5%) sets. (Credit to https://github.com/abisee/pointer-generator)
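The 90/5/5 split described above can be sketched as follows. This is a minimal illustration, not the code in make_datafiles.py: the function name `split_pairs`, the shuffle, and the fixed seed are all assumptions made for the example.

```python
import random


def split_pairs(pairs, train_pct=90, val_pct=5, seed=0):
    """Hypothetical helper: shuffle (abstract, full_text) pairs and split them.

    The test set receives whatever remains after train and val are taken,
    so the three parts always cover every pair exactly once.
    """
    items = list(pairs)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(items)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100

    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


if __name__ == "__main__":
    # Toy data standing in for parsed (abstract, full_text) pairs.
    demo = [(f"abstract {i}", f"full text {i}") for i in range(100)]
    for name, part in split_pairs(demo).items():
        print(name, len(part))
```

Giving the remainder to the test set means that corpus sizes not divisible by 100 still leave no pair unassigned.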

README.md
XMLparser.py
find_all_tar.py
make_datafiles.py
make_datafiles_pytorch.py
unzip.py

boya-song/nyt_extract