A tool to visually browse co-occurrence of MeSH terms in PubMed

MeSHgram

A tool to visually browse co-occurrence of MeSH terms in PubMed.

Publications indexed in PubMed have human curated MeSH terms associated with them. We leverage these MeSH terms and create a visual search tool to find articles in PubMed. The idea is that a visual inspection of co-occurrences is helpful for exploratory queries to PubMed.

We recently launched our website!

To check out MeSHgram in action, go to meshgram.org. The site is still under development. Please leave your comments and issues here.

Citation details will be posted soon. For now please cite the repository or the website directly.

Software Artifacts

The server code was tested with Python 3.5, and the web client was tested in all major browsers except Firefox.

url_gen.py - generates PubMed XML archive URLs that can be fed to wget for download.

pm2mdb.py - parses the downloaded PubMed XML archives and loads them into MongoDB.

server.py - CherryPy-based server that provides JSON endpoints for the web front end.

config.txt - CherryPy config file.

terms.txt - list of all MeSH terms, alphabetically sorted, extracted from the database.

mesh_stopwords.txt - "Stop words" among MeSH terms. We calculated the 100 most frequent MeSH terms across the entire corpus and manually curated some terms out.
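
The parsing and stop-word steps described above can be sketched roughly as follows. This is an illustration only, not the code in pm2mdb.py: it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages, the sample XML content is made up, and the function names extract_records and stopword_candidates are hypothetical. The element names (PubmedArticle, MedlineCitation/PMID, MeshHeading/DescriptorName) are assumed to follow the PubMed XML format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny made-up sample shaped like a PubMed XML archive (not real records).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Asthma</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Child</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>3</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Asthma</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Return one {'pmid': ..., 'mesh': [...]} dict per PubmedArticle element."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        terms = [d.text for d in article.findall(".//MeshHeading/DescriptorName")]
        records.append({"pmid": pmid, "mesh": terms})
    return records


def stopword_candidates(records, top_n=100):
    """Return the top_n MeSH terms ranked by the number of articles using them."""
    counts = Counter(term for rec in records for term in set(rec["mesh"]))
    return [term for term, _ in counts.most_common(top_n)]


records = extract_records(SAMPLE_XML)
print(records[0])
print(stopword_candidates(records, top_n=2))
```

Dicts of this shape could then be written to MongoDB, for example with PyMongo's insert_many on a collection of your choosing. The candidate list is only a starting point; as noted above, the final stop-word list was curated by hand.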

External Libraries / Packages

lxml - Python bindings for the C libraries libxml2 and libxslt, used for fast XML parsing.

MongoDB - Scalable NoSQL database.

PyMongo - Python driver for MongoDB.

CherryPy - A lightweight Python web framework with a built-in HTTP server. Used for REST/JSON in our project.

NVD3 - D3-based JavaScript charting library.

jQCloud - jQuery plug-in for drawing word clouds.
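
To give a feel for how these pieces could connect, here is a minimal sketch of counting the MeSH terms that co-occur with a query term and shaping the result as a list of text/weight entries, which is the word format jQCloud expects. It is not the logic in server.py or wordcloud.py: the function name cooccurring_terms, the record layout, and the sample records are all assumptions made for illustration.

```python
import json
from collections import Counter

# Made-up records; each holds the MeSH terms attached to one article.
SAMPLE_RECORDS = [
    {"pmid": "1", "mesh": ["Humans", "Asthma", "Child"]},
    {"pmid": "2", "mesh": ["Humans", "Asthma", "Adult"]},
    {"pmid": "3", "mesh": ["Humans", "Diabetes Mellitus"]},
    {"pmid": "4", "mesh": ["Asthma", "Child"]},
]


def cooccurring_terms(records, query, stopwords=frozenset(), limit=50):
    """Count terms that appear on the same article as `query`.

    Returns a list of {"text": term, "weight": count} dicts, most frequent
    first, with the query itself and any stop words left out.
    """
    excluded = set(stopwords) | {query}
    counts = Counter()
    for rec in records:
        terms = set(rec["mesh"])
        if query in terms:
            counts.update(terms - excluded)
    return [{"text": t, "weight": n} for t, n in counts.most_common(limit)]


words = cooccurring_terms(SAMPLE_RECORDS, "Asthma", stopwords={"Humans"})
# A JSON endpoint could return this payload to the web client.
print(json.dumps(words))
```

Filtering stop words before ranking keeps very common terms such as "Humans" from dominating the cloud, which is the purpose of mesh_stopwords.txt.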

System Components

Data Source

FTP download from NLM bulk distribution for MEDLINE/PubMed
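
As a rough illustration of the download step, the sketch below builds a list of numbered archive URLs that could be saved to a file and passed to wget -i. It is not the code in url_gen.py. The base URL, file prefix, zero-padding width, and suffix are placeholders, not the actual NLM directory layout or file naming; check the current NLM distribution for the real values.

```python
def archive_urls(base_url, prefix, first, last, width=4, suffix=".xml.gz"):
    """Build URLs for sequentially numbered archive files.

    All arguments are placeholders chosen by the caller; nothing here
    encodes the real NLM naming scheme.
    """
    if first > last:
        raise ValueError("first must not be greater than last")
    base = base_url.rstrip("/")
    return [
        "{}/{}{}{}".format(base, prefix, str(i).zfill(width), suffix)
        for i in range(first, last + 1)
    ]


# Hypothetical host and prefix, used only to show the output shape.
urls = archive_urls("ftp://ftp.example.org/baseline/", "sample", 1, 3)
print("\n".join(urls))
```

Writing one URL per line keeps the output directly usable as a wget input file.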