Open in Colab Open in Code Ocean

Cross-domain joint visualization of documents and keywords

A Jupyter notebook to visualize learning paths on collections of research papers using Latent Semantic Analysis performed on user-authored keywords.

These are the materials for the paper:

A. Benito-Santos and R. T. Sánchez, “Cross-Domain Visual Exploration of Academic Corpora via the Latent Meaning of User-Authored Keywords,” IEEE Access, pp. 1–1, 2019.

This code implements a visualization scheme to explore a collection of research papers with a pre-defined aim. It offers a perspective of the target dataset that reveals the kind of knowledge you are interested in extracting.

The method employs keyword associations obtained from an auxiliary bag-of-words that represents the aim of your research. These associations can be freely composed from your personal collection or from any other collection of your liking that is linked to the theme you want to explore.
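As a rough illustration of how keyword associations can be derived with Latent Semantic Analysis, the following minimal sketch applies a truncated SVD to a toy document-by-keyword matrix using NumPy. The keywords, counts, rank, and the helper names lsa_embed and cosine are invented for this example; it is not the notebook's implementation, which may weight and process keywords differently.

```python
import numpy as np

# Toy document-by-keyword count matrix (rows: papers, columns: keywords).
# Keywords and counts are invented for illustration only.
keywords = ["network", "graph", "text", "corpus", "layout"]
doc_keyword = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
], dtype=float)


def lsa_embed(matrix, rank):
    """Return rank-k document and keyword coordinates from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    docs = u[:, :rank] * s[:rank]       # one row per document
    terms = vt[:rank, :].T * s[:rank]   # one row per keyword
    return docs, terms


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


docs, terms = lsa_embed(doc_keyword, rank=2)
sim_related = cosine(terms[0], terms[1])    # "network" vs "graph"
sim_unrelated = cosine(terms[0], terms[2])  # "network" vs "text"
print(f"network~graph: {sim_related:.2f}, network~text: {sim_unrelated:.2f}")
```

Keywords that co-occur in the same documents end up close together in the reduced space, and documents are placed in that same space. That shared space is what makes a joint view of documents and keywords possible.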

In the paper we showcase how a large collection of visualization research papers can be explored using a Digital Humanities query corpus.



You have three options to run the code: Google's Colaboratory, Code Ocean, or locally on your own machine. The instructions are adapted from Jeffrey Perkel's example notebook made for Perkel, J. M. (2018). Why Jupyter is data scientists' computational notebook of choice. Nature, 563(7729), 145.

To use Colab:

  1. Click the Open in Colab button above. It will launch the notebook directly.
  2. Make the notebook live by clicking 'Connect' in the Colab toolbar.
  3. Uncomment the two cells under 'Google Colab only' by removing the leading # on each line.
  4. Select Runtime > Run All in the menu to execute the notebook. (You may get a warning that the page was not authored by Google.)
  5. Go to the bottom of the notebook and check the visualization that was just created.

To use Code Ocean:

  1. Click the Open in Code Ocean button above. It will launch the notebook directly.
  2. Then either click the 'Run' button at the top right (open the keywords_vis.html file in the Results pane on the right to see the output), or launch an interactive Jupyter session within Code Ocean.

To run the notebook locally (Python 3.6 and pip required):

  1. git clone
  2. cd keywords-vis
  3. mkvirtualenv keywords-vis (if you have virtualenvwrapper installed). Otherwise create a virtual environment as usual.
  4. Once the virtual environment has been activated, run pip install -r requirements.txt
  5. Register a Jupyter kernel for the environment: python -m ipykernel install --user --name=keywords-vis
  6. Launch Jupyter with jupyter notebook and open keywords_vis.ipynb
  7. Select the kernel that was created in step 5: Kernel > Change Kernel > keywords-vis
  8. Run all cells: Kernel > Restart & Run All

Visualizing other queries

You can try other queries by modifying the variable path_keys, which is set by default to the first use case of the paper: path_keys = ['shakespear']. Any other combination of query keywords is accepted. You can find all available query keywords in merged_paths_dict.
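Before running a new query, it can help to check that every keyword is actually available. The sketch below assumes merged_paths_dict behaves like a dictionary keyed by query keyword; the stand-in dictionary, its extra keys ('poetri', 'theatr'), the keyword 'opera', and the helper check_path_keys are hypothetical and only illustrate the idea.

```python
# Hypothetical stand-in: in the notebook, merged_paths_dict is built by the
# code itself and its values may have a different type. Only the keys
# (the query keywords) matter for this check.
merged_paths_dict = {"shakespear": [], "poetri": [], "theatr": []}


def check_path_keys(path_keys, available):
    """Split the requested keywords into those present and those missing."""
    found = [k for k in path_keys if k in available]
    missing = [k for k in path_keys if k not in available]
    return found, missing


path_keys = ["shakespear", "opera"]
found, missing = check_path_keys(path_keys, merged_paths_dict)
print("usable:", found)
print("not in merged_paths_dict:", missing)
```

Listing sorted(merged_paths_dict) in a notebook cell is a quick way to browse the keywords that can be used, under the same assumption about its structure.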

Regenerating the model / Using your own data

If you want to use a different set of keywords (or two completely new collections), you must regenerate the model, which is created under the notebook section Generating the model. To do so, remove ./model/all-paths.pkl from the environment and run the code again. This may take several hours, depending on the size of your data and your processing power.
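Removing the cached model can also be done from a notebook cell. The following is a small sketch using only the standard library; the helper name remove_cached_model is hypothetical, and the default path is the one given above, relative to the directory the notebook runs in.

```python
from pathlib import Path


def remove_cached_model(model_path="./model/all-paths.pkl"):
    """Delete the cached model file if present; return True if it was removed."""
    path = Path(model_path)
    if path.is_file():
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    if remove_cached_model():
        print("Cached model removed; rerun the notebook to regenerate it.")
    else:
        print("No cached model found; it will be generated on the next run.")
```

The function does nothing when the file is absent, so it is safe to run more than once.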

