dimared/article-parser

ilovescience

A set of scripts for text mining arxiv.org articles.

Scripts

articles_crawl.py downloads the articles. This can take several hours.

annotations_crawl.py downloads the article annotations.

lda.py extracts topics using Latent Dirichlet Allocation (LDA).

terms_cn.py counts keyword occurrences across the article collection.

cites.py counts references and shows the most cited articles.

word_vec.py builds a word2vec model.
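The README does not show how terms_cn.py and cites.py do their counting. The following is a minimal sketch under stated assumptions, not the scripts' actual code: keywords come from a fixed, hypothetical list and are matched as case-insensitive substrings, and a reference is assumed to be a new-style arXiv identifier (arXiv:YYMM.NNNN or arXiv:YYMM.NNNNN), counted at most once per citing article. The names KEYWORDS, ARXIV_ID, count_keywords and count_citations are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical keyword list; the keywords terms_cn.py actually counts are not documented.
KEYWORDS = ["neural network", "reinforcement learning", "graph"]

# New-style arXiv identifiers such as "arXiv:1706.03762" (assumed reference format).
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})", re.IGNORECASE)


def count_keywords(texts: Iterable[str], keywords: Iterable[str]) -> Counter:
    """Count case-insensitive substring occurrences of each keyword across all texts."""
    lowered_keywords = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        for keyword in lowered_keywords:
            counts[keyword] += lowered.count(keyword)
    return counts


def count_citations(texts: Iterable[str]) -> Counter:
    """Count how many articles cite each arXiv identifier (once per citing article)."""
    counts: Counter = Counter()
    for text in texts:
        # set() so an article citing the same paper twice contributes only one citation
        counts.update(set(ARXIV_ID.findall(text)))
    return counts


if __name__ == "__main__":
    articles = [
        "We train a neural network, see arXiv:1706.03762 and arXiv:1512.03385.",
        "Graph methods building on arXiv:1706.03762.",
    ]
    print(count_keywords(articles, KEYWORDS).most_common())
    print(count_citations(articles).most_common(1))  # [('1706.03762', 2)]
```

Substring matching is deliberately crude (for example, "graph" also matches "graphics"); tokenizing the text first would avoid that at the cost of a little more code.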

Scripts are stored in the src directory. Articles are stored as .txt files in arxiv/<section>/<year>/<month>/. Results are stored in the stat directory.
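Given that layout, a helper along these lines could collect all articles of one section for a year, or for a single month. It is a sketch that assumes UTF-8 text and month directories directly under the year directory; iter_articles is a hypothetical name, and "cs" and "2017" are example values only.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple


def iter_articles(root: Path, section: str, year: str,
                  month: Optional[str] = None) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) for each .txt article under root/<section>/<year>/<month>/.

    If month is None, all month directories of the year are scanned.
    """
    base = Path(root) / section / year
    pattern = f"{month}/*.txt" if month else "*/*.txt"
    for path in sorted(base.glob(pattern)):
        # errors="replace" keeps a single badly encoded file from aborting the scan.
        yield path, path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Example values only: section "cs", year "2017".
    for path, text in iter_articles(Path("arxiv"), "cs", "2017"):
        print(path.parent.name, path.name, len(text))
```

Yielding lazily means a whole year of articles never has to sit in memory at once, which matters when the crawl has run for hours.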

Usage

discover.py <section>.<year> runs all analysis scripts.
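A sketch of how a driver like discover.py might interpret its argument and chain the analysis scripts. It assumes, hypothetically, that every analysis script lives in src/ and accepts the same <section>.<year> argument. It splits on the last dot so that a section name containing a dot would still parse. ANALYSIS_SCRIPTS, parse_target, build_commands and run_all are illustrative names, not the repository's code.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# Hypothetical list and order of analysis scripts; discover.py may differ.
ANALYSIS_SCRIPTS = ["lda.py", "terms_cn.py", "cites.py", "word_vec.py"]


def parse_target(arg: str) -> Tuple[str, str]:
    """Split '<section>.<year>' on the last dot and check the year looks like YYYY."""
    section, sep, year = arg.rpartition(".")
    if not sep or not section or not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"expected <section>.<year>, got {arg!r}")
    return section, year


def build_commands(target: str, src_dir: str = "src") -> List[List[str]]:
    """Build one command per script, each receiving the same <section>.<year> argument."""
    parse_target(target)  # fail early on a malformed argument
    return [[sys.executable, str(Path(src_dir) / script), target]
            for script in ANALYSIS_SCRIPTS]


def run_all(target: str) -> None:
    """Run the scripts in order; check=True stops at the first non-zero exit status."""
    for cmd in build_commands(target):
        subprocess.run(cmd, check=True)
```

For example, run_all("cs.2017") would invoke each script in turn with the current Python interpreter. Building the command list separately from running it keeps the argument handling easy to check without launching anything.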

notes.py <section>.<year> generates and opens a Jupyter notebook with the computed statistics.
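A Jupyter notebook is a JSON document, so a notebook like the one notes.py produces can be generated without extra dependencies. The sketch below writes a minimal nbformat 4 notebook with a title cell and a code cell that lists result files. The stat/ file-name pattern <section>.<year>* and all function names are assumptions, and opening the file relies on the jupyter notebook command being installed.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List


def markdown_cell(text: str) -> Dict:
    """Return a minimal nbformat-4 markdown cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> Dict:
    """Return a minimal nbformat-4 code cell with no outputs yet."""
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": code}


def build_notebook(target: str, stat_dir: str = "stat") -> Dict:
    """Assemble a notebook that lists the result files for <section>.<year>."""
    pattern = f"{target}*"  # hypothetical naming of result files in stat/
    cells: List[Dict] = [
        markdown_cell(f"# Statistics for {target}"),
        code_cell(
            "from pathlib import Path\n"
            f"results = sorted(Path({stat_dir!r}).glob({pattern!r}))\n"
            "results"
        ),
    ]
    # nbformat_minor 4 does not require per-cell "id" fields (4.5 does).
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(nb: Dict, path: Path) -> Path:
    """Serialize the notebook as JSON and return the written path."""
    path = Path(path)
    path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return path


def open_notebook(path: Path) -> None:
    """Launch Jupyter on the notebook (requires Jupyter to be installed)."""
    subprocess.Popen(["jupyter", "notebook", str(path)])
```

Writing raw JSON keeps the sketch dependency-free; the nbformat package could be used instead to build and validate the notebook against the official schema.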

About

Text mining of arxiv.org articles

Languages

  • Jupyter Notebook 94.4%
  • Python 5.6%