MLG - Visual Machine Learning arXiv Graph and Textual Explorer

MLG (Machine Learning Graph) is a visual representation of ML researchers and papers, and the connections between them. Each node in the graph (/citations_network) is an author or a paper, and an edge can represent a citation, a reference or an authorship.
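As an illustration of the node and edge types described above, here is a minimal sketch of such a graph in plain Python. The class names, field names and sample ids (`Node`, `Edge`, `Graph`, `a1`, `p1`, ...) are hypothetical and are not taken from the MLG code base, which stores its data in MongoDB.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    AUTHOR = "author"
    PAPER = "paper"


class EdgeKind(Enum):
    CITATION = "citation"
    REFERENCE = "reference"
    AUTHORSHIP = "authorship"


@dataclass(frozen=True)
class Node:
    id: str          # hypothetical identifier
    kind: NodeKind
    label: str


@dataclass(frozen=True)
class Edge:
    source: str      # id of the node the edge starts from
    target: str      # id of the node the edge points to
    kind: EdgeKind


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Refuse edges whose endpoints are unknown.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbors(self, node_id):
        """Return the ids of all nodes connected to node_id, in either direction."""
        found = set()
        for edge in self.edges:
            if edge.source == node_id:
                found.add(edge.target)
            elif edge.target == node_id:
                found.add(edge.source)
        return found


# Made-up sample data: one author, two papers.
graph = Graph()
graph.add_node(Node("a1", NodeKind.AUTHOR, "Example Author"))
graph.add_node(Node("p1", NodeKind.PAPER, "Example Paper"))
graph.add_node(Node("p2", NodeKind.PAPER, "Another Paper"))
graph.add_edge(Edge("a1", "p1", EdgeKind.AUTHORSHIP))
graph.add_edge(Edge("p1", "p2", EdgeKind.CITATION))
```

Calling `graph.neighbors("p1")` roughly corresponds to expanding a node's connections in the UI.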

Note: There is an old version of the graph (/network) in which edges represent co-authorship of papers, based solely on arXiv.org.

A live demo is available at Lyrn.ai.

MLG allows you to:

Search for papers or authors.
Navigate between related papers and authors.
Click on a node to view its list of papers and double-click to expand its connections.
Re-organize the network after expanding.

The backend is based on arxiv-sanity but with a lot of modifications:

The paper data is collected from arxiv.org and semanticscholar.org. Everything is stored in MongoDB.
Rebuilt the Twitter daemon - it now collects tweets from a list of prominent ML accounts, in addition to searching for arxiv.org links on Twitter.

The project includes three parts:

/ - arXiv text explorer.
/citations_network - The new visual network graph explorer.
/network - The old arXiv visual graph explorer.

Example of the old version:

Dependencies

$ virtualenv env                # optional: use virtualenv
$ source env/bin/activate       # optional: use virtualenv
$ pip install -r requirements.txt

There is still some legacy code from arxiv-sanity that requires some of the packages in requirements.txt.

Processing pipeline

Install and start MongoDB
Optional - Run fetch_papers.py to collect all papers from arXiv. Run fetch_citations_and_references.py to collect data from semanticscholar.org.
Create twitter.txt with your Twitter API credentials (the values of the consumer key and secret, on separate lines). You can also add accounts to the twitter_users.json file.
Run run_background_tasks.py to start background tasks scheduler.
Run the flask server with serve.py.
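The twitter.txt layout from the steps above (consumer key on one line, consumer secret on the next) can be sketched as follows. The helper names are hypothetical, and the order of the two values is an assumption based on the wording above. The way twitter_daemon.py actually parses the file may differ.

```python
from pathlib import Path


def write_twitter_credentials(path, consumer_key, consumer_secret):
    """Write the key and the secret on separate lines (assumed order: key first)."""
    Path(path).write_text(f"{consumer_key}\n{consumer_secret}\n", encoding="utf-8")


def read_twitter_credentials(path):
    """Return (consumer_key, consumer_secret), ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected the consumer key and secret on separate lines")
    return lines[0], lines[1]
```

For example, `write_twitter_credentials("twitter.txt", "<your key>", "<your secret>")` creates the file, and `read_twitter_credentials("twitter.txt")` is a quick way to check it before starting the background tasks.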

Old version - Generating the network graph

After fetching papers from arXiv, you can build the network graph by running the notebook graph_generator.ipynb. It will overwrite static/network_data.json.

Note: Calculating the physics of the network (the node positions) is very slow. The current hack is to run it once (by changing the physics settings in network.js) and store the calculated positions. I tried using NetworkX to calculate the positions, but the results weren't pleasing...
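The "run once and store the positions" idea can be sketched as merging precomputed coordinates back into the node list. This is only an illustration: the `nodes`/`edges` layout, the `id`, `x` and `y` field names and the `apply_positions` helper are assumptions, not the actual format of static/network_data.json.

```python
import copy


def apply_positions(network_data, positions):
    """Return a copy of network_data with stored x/y coordinates added to each node.

    positions maps a node id (as a string) to {"x": ..., "y": ...}.
    Nodes without a stored position are left untouched.
    """
    result = copy.deepcopy(network_data)
    for node in result["nodes"]:
        pos = positions.get(str(node["id"]))
        if pos is not None:
            node["x"] = pos["x"]
            node["y"] = pos["y"]
    return result


# Made-up sample data in the assumed layout.
example = {
    "nodes": [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}],
    "edges": [{"from": 1, "to": 2}],
}
stored = apply_positions(example, {"1": {"x": 10.0, "y": -5.0}})
```

With coordinates stored this way, the browser would not need to recompute the layout on every page load.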

Running online

If you'd like to run the flask server online (e.g. on AWS), run it as python serve.py --prod.

You will also want to create a secret_key.txt file and fill it with random text (see the top of serve.py).
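One way to produce the random text is Python's standard `secrets` module. This is a sketch, not part of the project: the `create_secret_key_file` helper and the 32-byte length are assumptions, so check the top of serve.py for how the file is actually read.

```python
import secrets
from pathlib import Path


def create_secret_key_file(path="secret_key.txt", nbytes=32):
    """Write a random hex string to path and return it."""
    key = secrets.token_hex(nbytes)  # 2 hex characters per random byte
    Path(path).write_text(key, encoding="utf-8")
    return key
```

Calling `create_secret_key_file()` from the project root writes secret_key.txt. Keep the file out of version control.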

ranihorev/arxiv-network-graph
