GitHub - dfm/data.arxiv.io: Code and website for my arxiv abstract dataset

A little script that scrapes the metadata from the arXiv and saves it in a form that is useful for statistical analysis.

Usage

You'll need to install NLTK first, and then run:

python scrape.py

This will take many hours to run. It saves files of the form data/astro-ph/2007-05-10.txt.gz, with one abstract per line. Each line has four tab-separated columns: arXiv id, space-separated categories, tokenized title, and tokenized abstract.
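To show how the output format described above might be consumed, here is a minimal sketch of a reader for one of the saved files. It is not part of scrape.py: the names `Record`, `parse_line`, and `read_abstracts` are hypothetical, and UTF-8 encoding is an assumption. The title and abstract are kept as plain strings because the token separator is not stated here.

```python
import gzip
from typing import Iterator, List, NamedTuple


class Record(NamedTuple):
    """One line of a saved file (hypothetical container, not from scrape.py)."""
    arxiv_id: str
    categories: List[str]
    title: str
    abstract: str


def parse_line(line: str) -> Record:
    # The four columns are tab-separated; split at most 3 times so that
    # any stray tab in the last column stays inside the abstract.
    arxiv_id, categories, title, abstract = line.rstrip("\n").split("\t", 3)
    # Categories are space-separated within their column.
    return Record(arxiv_id, categories.split(), title, abstract)


def read_abstracts(path: str) -> Iterator[Record]:
    # Assumption: the files are UTF-8 encoded text inside the gzip container.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield parse_line(line)
```

With the example path above, `for rec in read_abstracts("data/astro-ph/2007-05-10.txt.gz"): print(rec.arxiv_id, rec.categories)` would list each abstract's id and categories.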

Credits

Licensed under the terms of the MIT License (see LICENSE).

