Important
Branches were renamed as shown below on 2023-11-05 (Sunday).
- original_only -> main
- master -> both_versions
The two branches differ as described below.
- In the main branch, data is fetched directly from the Pubmed API.
- In the both_versions branch, data can be fetched either directly from the Pubmed API or from a local database which contains copies of those data.
This is a data visualisation website. When the user types in a keyword (e.g. psychology) or an author's name, the website identifies the top authors on Pubmed (i.e. the authors with the most publications) who (1) published on that keyword or (2) published together with the specified author. Each top author's publication counts are then visualised in interactive plots (1) per year and (2) per journal.
1. Make an account at Cloud9, and create a workspace choosing Python as a template.
Now, copy the command in each of the following steps and paste it into the terminal.
git clone https://github.com/gknam/pubmed-top-authors.git
cd pubmed-top-authors
sudo pip3 install -r requirements.txt
export FLASK_APP=application.py
export FLASK_DEBUG=1
flask run --host=0.0.0.0 --port=8080
Go to Preview -> Preview Running Application.
To kill the server, do the following in the terminal in which the flask command is running.
- Click anywhere in the terminal and press Ctrl+C.
- Close the terminal.
If you want to restart the server, follow the instructions in the Start server section above.
You might want to kill the server and restart it if you want to stop a search and start a new one. Searches are slower when larger numbers are entered in "Max number of days from today" and/or "Max number of articles to check" on the website.
- Backend
- Python
- SQLite3 (via Python's SQLAlchemy)
- xml.etree.ElementTree
- Frontend
- HTML
- CSS
- JavaScript
The user types in a keyword (e.g. psychology) or an author's name in the search bar. The user can also specify (1) the data fetching method (explained below), (2) the number of top authors to identify, (3) the date range going backwards from today, and (4) the maximum number of articles (i.e. publications) to check. The set of search criteria is sent to the back-end in JSON format.
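As an illustration of the search criteria described above, here is a minimal sketch of how a back-end could decode and check such a JSON payload. The field names (`term`, `fetch_method`, `num_top_authors`, `max_days`, `max_articles`) and the values `"api"` and `"database"` are assumptions made for this example; they are not taken from the project's code.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchCriteria:
    """Search criteria sent by the front-end (field names are hypothetical)."""
    term: str             # keyword or author name typed in the search bar
    fetch_method: str     # assumed values: "api" or "database"
    num_top_authors: int  # number of top authors to identify
    max_days: int         # date range going backwards from today
    max_articles: int     # maximum number of articles to check


def parse_criteria(payload: str) -> SearchCriteria:
    """Decode a JSON request body and reject unusable values."""
    raw = json.loads(payload)
    criteria = SearchCriteria(
        term=str(raw["term"]).strip(),
        fetch_method=str(raw.get("fetch_method", "api")),
        num_top_authors=int(raw["num_top_authors"]),
        max_days=int(raw["max_days"]),
        max_articles=int(raw["max_articles"]),
    )
    if not criteria.term:
        raise ValueError("search term must not be empty")
    if criteria.fetch_method not in ("api", "database"):
        raise ValueError("unknown fetch method")
    if min(criteria.num_top_authors, criteria.max_days, criteria.max_articles) < 1:
        raise ValueError("numeric criteria must be positive")
    return criteria


example = json.dumps({
    "term": "psychology",
    "fetch_method": "api",
    "num_top_authors": 5,
    "max_days": 365,
    "max_articles": 1000,
})
criteria = parse_criteria(example)
print(criteria)
```

Checking the values on arrival keeps an empty search term or a zero-sized range from reaching the slower fetching step.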
jQuery is used for simplified syntax.
In the back-end, publications that match the search criteria are identified.
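One way to identify matching publications is a search request to NCBI's E-utilities. The sketch below only builds the request URL and does not contact the network. It assumes the ESearch endpoint with its `db`, `term`, `reldate`, `datetype` and `retmax` parameters; whether this project builds its queries in exactly this way is an assumption, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, max_days, max_articles, is_author=False):
    """Build an ESearch URL for a keyword or an author name (hypothetical helper)."""
    # The [Author] field tag restricts the search to the author field.
    query = f"{term}[Author]" if is_author else term
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": max_days,     # only items dated within the last N days
        "datetype": "pdat",      # apply the date limit to the publication date
        "retmax": max_articles,  # upper bound on the number of IDs returned
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


keyword_url = build_search_url("psychology", max_days=365, max_articles=1000)
author_url = build_search_url("Smith J", max_days=365, max_articles=1000, is_author=True)
print(keyword_url)
print(author_url)
```

Both user-set limits (days from today and maximum number of articles) map directly onto query parameters here, which is why larger values lead to slower searches.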
Data are fetched for each identified publication. The fetched data include various details of each publication (e.g. author name, publication year and journal title).
Data are downloaded from Pubmed's database via Pubmed API as XML files. From the XML files, relevant elements (i.e. information) are Extracted using xml.etree.ElementTree, then Transformed into a Python dictionary.
Pro: Fetched data are reliable because they come from the original database.
Cons: Retrieving data via the Pubmed API can be slow, especially when the query range (date range, maximum number of articles) is large. Also, the Pubmed API sets a rather tight limit on query range.
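The extract-and-transform step of the API method can be sketched as follows. The XML below is a hand-written, heavily trimmed sample in the style of Pubmed's article records, not real data, and the dictionary layout (`year`, `journal`, `authors` keyed by PMID) is an assumption chosen for this example rather than the project's actual structure.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real records contain many more elements.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2021</Year></PubDate></JournalIssue>
          <Title>Example Journal A</Title>
        </Journal>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2022</Year></PubDate></JournalIssue>
          <Title>Example Journal B</Title>
        </Journal>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_articles(xml_text):
    """Return {pmid: {"year", "journal", "authors"}} from article XML."""
    root = ET.fromstring(xml_text.strip())
    articles = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        authors = []
        for author in article.iter("Author"):
            last = author.findtext("LastName")
            fore = author.findtext("ForeName")
            if last:  # skip author entries that carry no LastName
                authors.append(f"{last}, {fore}" if fore else last)
        articles[pmid] = {
            "year": article.findtext(".//PubDate/Year"),      # None if absent
            "journal": article.findtext(".//Journal/Title"),  # None if absent
            "authors": authors,
        }
    return articles


articles = extract_articles(SAMPLE_XML)
print(articles)
```

`findtext` returns `None` when an element is missing, so records without a publication year or journal title can be detected and handled later instead of raising an error during parsing.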
Within the fetched data, the top authors are identified, i.e. those who have the most publications (1) on the specified keyword or (2) together with the specified author. The top authors' data are then sent to the front-end in JSON format.
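The counting step can be sketched with `collections.Counter`. The input dictionary and the shape of the JSON result are assumptions made for illustration, and `top_authors` is a hypothetical function name, not one taken from the project.

```python
import json
from collections import Counter, defaultdict

# Made-up records in an assumed layout: {pmid: {"year", "journal", "authors"}}.
ARTICLES = {
    "1": {"year": "2021", "journal": "Journal A", "authors": ["Doe, Jane", "Roe, Richard"]},
    "2": {"year": "2022", "journal": "Journal B", "authors": ["Doe, Jane"]},
    "3": {"year": "2022", "journal": "Journal A", "authors": ["Doe, Jane", "Poe, Pat"]},
    "4": {"year": "2022", "journal": "Journal A", "authors": ["Roe, Richard"]},
}


def top_authors(articles, n):
    """Return the n authors with the most publications, with per-year and per-journal counts."""
    totals = Counter()
    per_year = defaultdict(Counter)
    per_journal = defaultdict(Counter)
    for record in articles.values():
        # set() guards against the same name appearing twice on one article.
        for author in set(record["authors"]):
            totals[author] += 1
            per_year[author][record["year"]] += 1
            per_journal[author][record["journal"]] += 1
    return [
        {
            "author": author,
            "total": total,
            "per_year": dict(per_year[author]),
            "per_journal": dict(per_journal[author]),
        }
        for author, total in totals.most_common(n)
    ]


result = top_authors(ARTICLES, n=2)
payload = json.dumps(result)  # string that would be sent to the front-end
print(payload)
```

Keeping the per-year and per-journal counts next to each author's total means the front-end receives everything it needs for both plots in a single response.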
Data are reorganised. Then, using D3.js, each author's publication counts are visualised in interactive plots (1) per year and (2) per journal.