Cross-domain joint visualization of documents and keywords
A Jupyter notebook to visualize learning paths on research paper collections using Latent Semantic Analysis (LSA) performed on user-authored keywords.
These are the materials for the paper:
A. Benito-Santos and R. T. Sánchez, “Cross-Domain Visual Exploration of Academic Corpora via the Latent Meaning of User-Authored Keywords,” IEEE Access, pp. 1–1, 2019. https://ieeexplore.ieee.org/document/8766090/
This code implements a visualization scheme for exploring a collection of research papers with a predefined aim. It presents the target dataset from a perspective that reveals the kind of knowledge you are interested in extracting.
The method uses keyword associations obtained from an auxiliary bag of words that represents the aim of your research. This auxiliary collection can be composed freely from your personal library or from any other collection of your choosing that is linked to the theme you want to explore.
In the paper we showcase how a large collection of visualization research papers can be explored using a Digital Humanities query corpus.
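As a rough illustration of what Latent Semantic Analysis on keywords involves, the sketch below builds a keyword-by-document matrix from toy keyword lists, factorizes it with a truncated SVD, and ranks keywords by cosine similarity in the resulting latent space. It is not the notebook's implementation: the toy documents, the function names, the binary weighting, and the choice of k=2 latent dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: each document is the list of keywords its authors supplied.
docs = [
    ["visualization", "text", "shakespear"],
    ["visualization", "network", "graph"],
    ["text", "corpus", "shakespear"],
    ["graph", "network", "layout"],
]


def keyword_doc_matrix(docs):
    """Build a binary keyword-by-document occurrence matrix (rows: keywords)."""
    vocab = sorted({k for d in docs for k in d})
    index = {k: i for i, k in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for k in d:
            X[index[k], j] = 1.0
    return vocab, X


def lsa_keyword_vectors(X, k):
    """Project keywords into a k-dimensional latent space via truncated SVD.

    X is approximated by U_k S_k V_k^T; each keyword is represented by its row of U_k S_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine_sim(A):
    """Pairwise cosine similarity between the rows of A (zero rows stay zero)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    B = A / norms
    return B @ B.T


vocab, X = keyword_doc_matrix(docs)
vecs = lsa_keyword_vectors(X, k=2)
sim = cosine_sim(vecs)

# Rank the other keywords by latent similarity to a query keyword.
q = vocab.index("shakespear")
ranked = sorted(
    ((sim[q, j], vocab[j]) for j in range(len(vocab)) if j != q), reverse=True
)
print(ranked[:3])
```

Keywords that never co-occur directly can still end up close in the latent space when they share neighbours, which is the property that makes LSA attractive for linking keywords across two collections.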
You have three options for running the code: Google Colaboratory, CodeOcean, or locally on your own machine. The instructions below are adapted from the example notebook Jeffrey Perkel prepared for: Perkel, J. M. (2018). Why Jupyter is data scientists' computational notebook of choice. Nature, 563(7729), 145.
To use Colab:
- Click the 'Open in Colab' button above. It will launch the notebook directly.
- Make the notebook live by clicking 'Connect' in the Colab toolbar.
- Uncomment the two cells under 'Google Colab only' by removing the leading # on each line.
- Select Runtime > Run All in the menu to execute the notebook. (You may get a warning that the notebook was not authored by Google.)
- Go to the bottom of the notebook to see the visualization that was just created.
To use CodeOcean:
- Click the 'Open in CodeOcean' button above. It will launch the notebook directly. Then either:
- Click the 'Run' button at the top right (then click the keywords_vis.html file in the Results pane on the right to see the output); or
- Launch an interactive Jupyter session within Code Ocean.
To run the notebook locally (Python 3.6 and pip required):
- Clone the repository with git clone https://github.com/ale0xb/keywords-vis and change into the new keywords-vis directory.
- Create a virtual environment with mkvirtualenv keywords-vis (if you have virtualenvwrapper installed); otherwise, create one as usual.
- With the virtual environment activated, install the dependencies with pip install -r requirements.txt.
- Register the environment as a Jupyter kernel with python -m ipykernel install --user --name=keywords-vis.
- Launch Jupyter with jupyter notebook and open keywords_vis.ipynb.
- Select the kernel registered in the ipykernel step: Kernel > Change Kernel > keywords-vis
- Run all cells: Kernel > Restart & Run All
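Before running pip install, it can help to confirm that you are inside the virtual environment and on a suitable Python version. The helper below is a hypothetical sketch, not part of the repository; treating 3.6 as a minimum version (rather than an exact requirement) is an assumption.

```python
import sys

# Assumption: the README's "python 3.6" is treated here as a minimum version.
MIN_PYTHON = (3, 6)


def in_virtualenv():
    """Return True when running inside a virtualenv or venv.

    venv (and virtualenv >= 20) make sys.prefix differ from sys.base_prefix;
    legacy virtualenv (< 20) sets sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            "Python %d.%d+ expected, found %d.%d" % (MIN_PYTHON + tuple(sys.version_info[:2]))
        )
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("environment looks OK")
```

Run it with the same python executable you will use for pip install, so that both checks describe the environment the kernel will actually use.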
Visualizing other queries
You can try other queries by modifying the variable path_keys, which is set to show the first use case of the paper:
path_keys = ['shakespear']
Any other combination of query keywords is accepted. You can find all available query keywords in
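A query that mentions a keyword absent from the vocabulary is easy to miss. The sketch below checks a path_keys list against a set of available keywords before running the notebook. The validate_query helper and the available set are hypothetical; in practice you would load the notebook's own list of query keywords. The lowercasing and whitespace stripping are assumptions too, as is the inference that keywords are stored in a stemmed form ('shakespear' rather than 'shakespeare').

```python
def validate_query(path_keys, vocabulary):
    """Split path_keys into (found, missing) with respect to vocabulary.

    Assumption: keywords are compared after lowercasing and stripping whitespace.
    """
    vocab = {v.strip().lower() for v in vocabulary}
    found, missing = [], []
    for key in path_keys:
        norm = key.strip().lower()
        (found if norm in vocab else missing).append(norm)
    return found, missing


# Hypothetical vocabulary; replace it with the notebook's list of available keywords.
available = {"shakespear", "poetri", "text", "visual"}

path_keys = ["shakespear", "shakespeare"]
found, missing = validate_query(path_keys, available)
if missing:
    print("not in vocabulary:", missing)
```

Here the unstemmed 'shakespeare' is reported as missing while 'shakespear' is accepted, which is the kind of mismatch worth catching before a long notebook run.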
Regenerating the model / Using your own data
If you want to use a different set of keywords (or two entirely new collections), you must regenerate the model built in the notebook section "Generating the model". To do so, delete ./model/all-paths.pkl and run the notebook again. (This may take several hours, depending on the size of your data and your processing power.)
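Deleting the pickle forces regeneration, which suggests the notebook caches the model on disk. The sketch below shows a typical load-or-build cache and a helper that removes the cached file. The function names and the build callable are hypothetical, and the notebook's actual caching code may differ. The exists() check is used instead of Path.unlink(missing_ok=True), which requires Python 3.8.

```python
import pickle
from pathlib import Path

# Path named in this README.
MODEL_PATH = Path("./model/all-paths.pkl")


def remove_cached_model(path=MODEL_PATH):
    """Delete the cached model so the next run regenerates it.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    if path.exists():
        path.unlink()
        return True
    return False


def load_or_build(path, build):
    """Hypothetical cache pattern: load the pickle if present, else build and save it."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Only unpickle files you created yourself, since pickle.load can execute arbitrary code from an untrusted file.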