ilovescience

A set of scripts for text mining arxiv.org articles

articles_crawl.py loads articles. This may take several hours.

annotations_crawl.py loads annotations

lda.py extracts topics with Latent Dirichlet Allocation

terms_cn.py counts keywords across the article collection
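
As an illustration of the kind of keyword counting described here, the following is a minimal sketch using only the standard library. It is not the code in terms_cn.py; the function name `count_keywords`, the simple alphabetic tokenizer, and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter


def count_keywords(texts, keywords):
    """Count how often each keyword occurs across an iterable of article texts."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for text in texts:
        # Lowercase and split into alphabetic tokens (a deliberately simple tokenizer).
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


# Hypothetical sample data, standing in for the contents of article .txt files.
sample = ["Neural networks learn features.", "A neural model of learning."]
print(count_keywords(sample, ["neural", "model"]))
```

Matching is on whole tokens, so "learning" does not count toward "learn". A real pipeline would likely add stemming or multi-word terms.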

cites.py counts references and shows the most cited articles
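
One way to count citations is sketched below. It assumes that references appear in the article text as new-style arXiv identifiers such as `arXiv:1706.03762`. That assumption, the `ARXIV_ID` pattern, and the `most_cited` helper are illustrative only and may differ from what cites.py actually does.

```python
import re
from collections import Counter

# Assumed citation format: new-style arXiv identifiers, e.g. arXiv:1706.03762.
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})", re.IGNORECASE)


def most_cited(texts, top=3):
    """Return the `top` most frequently cited arXiv identifiers."""
    counts = Counter()
    for text in texts:
        # Count each cited identifier at most once per citing article.
        counts.update(set(ARXIV_ID.findall(text)))
    return counts.most_common(top)
```

Deduplicating within each article keeps a paper that mentions the same reference many times from inflating its count.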

word_vec.py builds word2vec model

Scripts are stored in the src directory. Articles are stored in .txt format under arxiv/<section>/<year>/<month>/. Results are stored in the stat directory.
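
Given the arxiv/<section>/<year>/<month>/ layout, the articles for one section and year could be enumerated roughly as follows. This is a sketch under that layout only; the helper name `article_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path


def article_paths(root, section, year):
    """Yield .txt article files under root/<section>/<year>/<month>/, sorted by path."""
    base = Path(root) / section / str(year)
    # One subdirectory per month; the pattern matches the .txt files inside each.
    yield from sorted(base.glob("*/*.txt"))
```

For example, `list(article_paths("arxiv", "cs", 2017))` would collect every article stored for a section named `cs` in 2017, where `cs` is only an example section name.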

discover.py <section>.<year> runs all analysis scripts
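
The `<section>.<year>` argument combines both values in one token. A minimal sketch of how such an argument could be split is shown below. `parse_target` is a hypothetical helper, not the parser used by discover.py, and `cs.2017` is only an example value.

```python
def parse_target(arg):
    """Split a '<section>.<year>' argument into (section, year)."""
    # Split on the last dot so that any dots inside the section name are preserved.
    section, sep, year = arg.rpartition(".")
    if not sep or not section or not year.isdigit():
        raise ValueError(f"expected <section>.<year>, got {arg!r}")
    return section, int(year)


print(parse_target("cs.2017"))
```

Rejecting malformed input early avoids launching the long-running analysis on a directory that does not exist.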

notes.py <section>.<year> generates and opens a Jupyter notebook with the calculated statistics
