arxiv-connections is a Python package that provides a CLI (arxiv-connector) for finding academics related to other academics, based on their closeness through arXiv co-authorships.
You can interact with the notebook and code here:
- Given a person, find the people who are most related to the things they work on
- Create a visualization of the authors that surround a particular individual
- Not yet accomplished: a framework where one can provide a name and get back a generated report of where to look next.
As with all development, I recommend using a virtual environment to prevent dependency issues.
Python 3 comes with venv, and virtualenv also works. This step is optional but recommended:
- Create a fresh Python virtual environment, which looks something like
    python3.7 -m venv my_env37
  or
    virtualenv --python=/usr/bin/python3.7 my_env37
- Then enter the environment with
    source my_env37/bin/activate
  (to leave this Python environment, run deactivate at any time)
To install from PyPI:
pip install arxiv-connections
Or, to install from source:
git clone git@github.com:RoyRin/arxiv_connections.git
cd arxiv_connections && poetry install
Once you have completed the steps from 'How To Install', you can either call the package from your own Python code or use the CLI from the terminal.
There is 1 library (arxiv_connections) and 2 ways to use it: from the command line (call arxiv-connector) and by calling the code directly from Python (using import arxiv_connections). The Jupyter notebook arxiv_connections/arxiv_explorer.ipynb serves as an example of this.
Generally, the current code will pull articles from Arxiv. It will generate graph-nodes for each author of each article. It will create edges between authors if they have co-authored something, with the edge weights proportional to the number of papers they have co-authored.
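The graph construction described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the package's actual implementation: the function names build_coauthor_graph and closest_coauthors and the toy author names are hypothetical, and it assumes each article is already reduced to a list of author names.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_graph(articles):
    """Return {frozenset({a, b}): weight} for every pair of co-authors.

    `articles` is an iterable of author-name lists, one list per article.
    The weight of an edge is the number of articles the pair shares.
    """
    edge_weights = Counter()
    for authors in articles:
        # De-duplicate so each unordered pair is counted once per article.
        unique_authors = sorted(set(authors))
        for a, b in combinations(unique_authors, 2):
            edge_weights[frozenset((a, b))] += 1
    return edge_weights


def closest_coauthors(edge_weights, author, top_n=5):
    """Rank the direct co-authors of `author` by number of shared papers."""
    neighbours = Counter()
    for pair, weight in edge_weights.items():
        if author in pair:
            (other,) = pair - {author}
            neighbours[other] += weight
    return neighbours.most_common(top_n)


# Hypothetical toy data, not real arXiv records.
articles = [
    ["A. Author", "B. Author", "C. Author"],
    ["A. Author", "B. Author"],
    ["C. Author", "D. Author"],
]
graph = build_coauthor_graph(articles)
print(graph[frozenset(("A. Author", "B. Author"))])
print(closest_coauthors(graph, "A. Author"))
```

Storing each edge under a frozenset key keeps the graph undirected, so (A, B) and (B, A) share one weight.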
For new author investigations, you can scrape directly from arXiv. To make future investigations easier, you can save the articles discovered on arXiv to a CSV, and then read from it in later iterations of the investigation. For example, run
arxiv-connector crawl-and-plot 'fname lastname' -s name.csv -m 7 -d 3
and then
arxiv-connector crawl-and-plot 'fname lastname' -r name.csv
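The save-then-reload idea can be sketched with the standard csv module. This only illustrates the caching pattern under stated assumptions: the column names (title, authors), the ';' separator, and the helper names save_articles and load_articles are hypothetical, and the CSV layout that arxiv-connector actually writes may differ.

```python
import csv
import os
import tempfile


def save_articles(path, articles):
    """Write one row per article: a title and a ';'-separated author list."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title", "authors"])  # hypothetical column names
        for title, authors in articles:
            writer.writerow([title, ";".join(authors)])


def load_articles(path):
    """Read the rows back into (title, [authors]) tuples."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row["title"], row["authors"].split(";")) for row in reader]


# Hypothetical toy data standing in for the result of a crawl.
crawled = [
    ("Paper one", ["A. Author", "B. Author"]),
    ("Paper two", ["C. Author"]),
]
cache_path = os.path.join(tempfile.mkdtemp(), "name.csv")
save_articles(cache_path, crawled)
restored = load_articles(cache_path)
print(restored == crawled)
```

Reading from the cached file avoids repeating the network requests on every run.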
Note that you can easily make a very large number of requests, which can keep the program from finishing. If you are pulling data from arXiv, you should estimate the number of requests you make as [max-results-per-search ** max-depth].
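As a quick sanity check before a crawl, the estimate above can be computed directly. The helper name estimate_requests is hypothetical, and the sketch assumes that the -m and -d values in the example commands correspond to max-results-per-search and max-depth.

```python
def estimate_requests(max_results_per_search, max_depth):
    """Rough request count: max-results-per-search ** max-depth."""
    return max_results_per_search ** max_depth


# (max results, depth) pairs taken from the example commands in this README,
# assuming -m is max-results-per-search and -d is max-depth.
for max_results, depth in [(7, 3), (8, 3), (15, 3)]:
    total = estimate_requests(max_results, depth)
    print(f"-m {max_results} -d {depth}: about {total} requests")
```

With a depth of 3, going from 7 to 15 results per search raises the estimate from 343 to 3375, so small changes to either value grow the crawl quickly.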
If you have installed arxiv-connector properly, it should tab-complete. Calling it with no arguments should produce a list of sub-commands. As of 8/10/20, there is only 1 sub-command, crawl-and-plot. To understand how it works, run
arxiv-connector crawl-and-plot --help
Quick-start from terminal:
- Quick-start example #1 (from a file):
arxiv-connector crawl-and-plot 'dmitry rinberg' -r example_data/dmitry_rinberg.csv
- Quick-start example #2 (pull from arxiv):
arxiv-connector crawl-and-plot 'dmitry rinberg' -s dmitry_rinberg_read.csv -d 3 -m 8 -D
(this example code is in the git repo)
- Run arxiv-connector --help for an explanation of the CLI's commands
Quick-start from Jupyter:
- You can explore the data directly in a Jupyter notebook using arxiv_connector/arxiv_explorer.ipynb
- To run the Jupyter notebook, call jupyter notebook (you may need to pip install jupyter) and navigate to the notebook file
arxiv-connector crawl-and-plot 'stuart russell' -m 15 -d 3
arxiv-connector crawl-and-plot 'stuart russell' -m 7 -d 3
[Images: How to Zoom; Hover; concentric-view]
Note, this has been developed on Ubuntu 18.04.
To test, run
pytest tests
Commit hooks (following guidance from https://codeinthehole.com/tips/tips-for-using-a-git-pre-commit-hook/ and https://githooks.com/):
- To set up easy formatting practices (i.e. yapf + stripping Jupyter notebooks), run
    ln -s ../../hooks/pre-commit.sh .git/hooks/pre-commit
  from the arxiv_connector home directory
- run
To-do:
- Add tests!
- Read through all the TODO comments, and actually fix them!
- Be able to create a metric between 2 different authors
- Investigate ways to define a few metrics on a graph, then rank individuals
- Generate reports on other academics, as a way to investigate who to look into next
- Be able to process more information than just co-authoring when considering new academics (e.g. topics)
- Add some images and put them in the git repo, for others to understand the tool better
- Define some kind of bottleneck distance, or a distance based on the number of paths to another individual (in order to avoid bottlenecks)
- Make the distance between nodes proportional to the weight between them