PaperBot

PaperBot is a configurable, modular, open-source, web-based crawler that automatically finds and efficiently annotates peer-reviewed publications based on periodic full-text searches across publisher portals. Without user interaction, PaperBot retrieves and stores article information (full reference, corresponding email contact, and full-text keyword hits) based on pre-set search logic from disparate sources including Wiley, ScienceDirect, Springer/Nature/Frontiers, HighWire, PubMed/PubMedCentral, and GoogleScholar. Although different portals require different search configurations, the common interface of PaperBot unifies the process from the user perspective. Once saved, all information becomes web accessible, allowing efficient triage of articles based on their actual relevance to the project goals and seamless annotation of suitable metadata dimensions.

Some portals are accessed through a scraper. Read and understand the terms of use of those portals before activating that portion of the tool. We are not responsible for any misuse of it:

https://www.google.com/policies/terms/
http://olabout.wiley.com/WileyCDA/Section/id-826542.html

1. Database

1.1. Install & launch MongoDB

Follow the instructions: https://docs.mongodb.com/manual/administration/install-community/

1.2. Get an API key for ScienceDirect, SpringerLink and CrossRef (Wiley)

The ScienceDirect and SpringerLink portals require you to register and obtain an API key to use their APIs. You can register and find the key at https://dev.elsevier.com/user/registration and https://dev.springer.com/signup
CrossRef provides an option to retrieve the PDF URLs. Some portals are completely open, but Wiley, for example, requires the CrossRef key to download its articles. Obtain the key by following the instructions at http://olabout.wiley.com/WileyCDA/Section/id-829772.html

1.3. Upload the portals configuration to the Portal Database

This is needed if you want to use the automated search (Elsevier/ScienceDirect, Springer, Nature, Wiley, PubMed/PubMed Central, and GoogleScholar). The manual PubMed search does not use the Portal Database.

  • token is the API key obtained in 1.2. Replace the placeholder in "token": "replace with your token"
    with your API key.
  • searchPeriod is defined in months.
  • active can be set to true if you want to launch the specific portal or false otherwise. For example, you may want to launch only one of the portals for a given time range and set the others to false.
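
The three fields above can be checked before the data is inserted. The following is a minimal sketch, not part of PaperBot: the function names are hypothetical, and the assumptions that searchPeriod is a positive whole number and that the search window ends at the current date are ours, not the source's.

```javascript
// Hypothetical helpers for illustration; they are not part of PaperBot.

// Report obvious problems in portal documents before inserting them.
function findConfigProblems(portals) {
  const problems = [];
  for (const portal of portals) {
    const label = portal.name || "(unnamed portal)";
    if (typeof portal.active !== "boolean") {
      problems.push(`${label}: "active" must be true or false`);
    }
    if (portal.token === "replace with your token") {
      problems.push(`${label}: "token" still holds the placeholder`);
    }
    // Assumption: searchPeriod, when present, is a positive whole number of months.
    if ("searchPeriod" in portal &&
        !(Number.isInteger(portal.searchPeriod) && portal.searchPeriod > 0)) {
      problems.push(`${label}: "searchPeriod" must be a positive number of months`);
    }
  }
  return problems;
}

// Names of the portals that would be launched (active set to true).
function activePortalNames(portals) {
  return portals.filter((p) => p.active === true).map((p) => p.name);
}

// Start of a window that ends at `now` and spans `searchPeriod` months (UTC).
// The day is clamped so that, for example, May 31 minus 3 months gives the
// last day of February instead of rolling over into March.
function searchWindowStart(now, searchPeriod) {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth() - searchPeriod;
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(now.getUTCDate(), lastDay);
  return new Date(Date.UTC(year, month, day));
}

// Example documents (trimmed to the fields discussed above).
const examplePortals = [
  { name: "Nature", searchPeriod: 3, active: true },
  { name: "ScienceDirect", active: true, token: "replace with your token" },
  { name: "Wiley", active: false },
];

console.log(findConfigProblems(examplePortals));
console.log(activePortalNames(examplePortals));
console.log(searchWindowStart(new Date(Date.UTC(2024, 4, 31)), 3).toISOString());
```

Running a check like this first makes it easier to catch a forgotten placeholder token before a portal is launched.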

a) Insert the data from the terminal by copying and pasting the following:
mongo
use portal
db.portal.insertMany([
  {
    "name": "PubMed",
    "apiUrl": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "active": true,
    "db": "pubmed"
  },
  {
    "name": "PubMedCentral",
    "apiUrl": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "active": true,
    "db": "pmc"
  },
  {
    "name": "ScienceDirect",
    "apiUrl": "https://api.elsevier.com/content/search/scidir?",
    "active": true,
    "token": "replace with your token"
  },
  {
    "name": "Nature",
    "apiUrl": "http://api.nature.com/content/opensearch/request?",
    "searchPeriod": 3,
    "active": true
  },
  {
    "name": "Wiley",
    "url": "https://onlinelibrary.wiley.com/action/doSearch",
    "active": false,
    "base": "http://onlinelibrary.wiley.com"
  },
  {
    "name": "SpringerLink",
    "apiUrl": "http://api.springer.com/metadata/json?",
    "active": true,
    "token": "replace with your token"
  },
  {
    "name": "GoogleScholar",
    "url": "https://scholar.google.com/scholar?l=es&",
    "base": "https://scholar.google.com",
    "active": false
  }
]);

If everything works well you should see the following response. Of course the ids will be different:

{
"acknowledged" : true,
"insertedIds" : [
ObjectId("57c709dcf139a309cc559a81"),
ObjectId("57c709dcf139a309cc559a82"),
ObjectId("57c709dcf139a309cc559a83"),
ObjectId("57c709dcf139a309cc559a84"),
ObjectId("57c709dcf139a309cc559a85"),
ObjectId("57ceca1e14896407206e3d82"),
ObjectId("59272282f139a31a3a033501")
]
}

Close mongo console:
exit

2. Boot Microservices

The microservices are Spring Boot applications (.jar) that each run an embedded Tomcat. All of them are independent and can be launched in any order.

Pre-requisites: Java 8 and Maven to compile and build the code. Download Maven from https://maven.apache.org/download.cgi and install it following https://maven.apache.org/install.html

2.1. Download the code

Download the code using the download button, or, if git is installed on your system, from the terminal by typing the following:
git clone https://github.com/NeuroMorphoOrg/PaperBot.git

2.2. Compile

From the terminal, navigate to the main folder (PaperBot if you cloned with git, PaperBot-master if you used the download button) and compile:
cd PaperBot
mvn clean install

This will compile all the services and you should see the SUCCESS for all the services at the end:

Reactor Summary:
[INFO]
[INFO] CrossRef ...................... SUCCESS [ 21.782 s]
[INFO] Metadata ...................... SUCCESS [ 0.996 s]
[INFO] PubMed ........................ SUCCESS [ 1.003 s]
[INFO] Literature .................... SUCCESS [ 1.607 s]
[INFO] Search ........................ SUCCESS [ 1.029 s]
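
If the build output is captured to a file, the summary can also be checked programmatically. The following is a minimal sketch under our own assumptions: the helper names are hypothetical, the FAILURE line in the sample is invented for illustration, and the pattern only covers summary lines in the layout shown above.

```javascript
// Hypothetical helpers for illustration; they are not part of PaperBot or Maven.

// Map each module name in a Reactor Summary to its reported status.
function parseReactorSummary(text) {
  const statusByModule = {};
  // Matches lines such as "[INFO] CrossRef ......... SUCCESS [ 21.782 s]".
  const linePattern = /^\[INFO\]\s+(\S.*?)\s+\.{3,}\s+(SUCCESS|FAILURE|SKIPPED)\b/;
  for (const line of text.split("\n")) {
    const match = linePattern.exec(line.trim());
    if (match) {
      statusByModule[match[1]] = match[2];
    }
  }
  return statusByModule;
}

// List the modules whose status is anything other than SUCCESS.
function modulesNotSuccessful(text) {
  return Object.entries(parseReactorSummary(text))
    .filter(([, status]) => status !== "SUCCESS")
    .map(([name]) => name);
}

// Sample input; the FAILURE line is made up to show a non-empty result.
const sampleLog = [
  "Reactor Summary:",
  "[INFO]",
  "[INFO] CrossRef ...................... SUCCESS [ 21.782 s]",
  "[INFO] Metadata ...................... SUCCESS [ 0.996 s]",
  "[INFO] PubMed ........................ FAILURE [ 1.003 s]",
].join("\n");

console.log(modulesNotSuccessful(sampleLog));
```

An empty list means every module in the summary reported SUCCESS.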

2.3. Launch

If using Linux or Mac you can launch it typing: ./launch.sh

This will launch the required services with nohup and java -jar. Any error will be written to the corresponding log.

NOTE: Although the services can be used on your local machine, they are designed to run on a server. If you run them locally and restart your computer, this step needs to be executed again. The same applies to a server. Servers are not rebooted that often, but I highly encourage you to create Unix/Linux services following the Spring instructions summarized at https://springjavatricks.blogspot.com/2017/11/installing-spring-boot-services-in.html

3. Frontend

Pre-requisites: Apache web server installed & running: https://httpd.apache.org

3.1. Copy the frontend to apache folder & launch

Apache default directory is:
- MacOS: /Library/WebServer/Documents/
- Linux: /var/www/html
- Windows v2.2 and up (replace 2.2 with the version you had installed): C:\Program Files\Apache Software Foundation\Apache2.2\htdocs
- Windows v2: C:\Program Files\Apache Group\Apache2\htdocs

In the following commands, replace /Library/WebServer/Documents/ with your Apache folder:

sudo mkdir /Library/WebServer/Documents/PaperBot
sudo cp -r NMOLiteratureWeb/app/ /Library/WebServer/Documents/PaperBot

In your browser type: http://[ipAddress]/PaperBot

3.2. If running on a server rather than on localhost, remember to update the IP address

Update NMOLiteratureWeb/communications/articlesCommunicationService.js

var url_literature = 'http://<serverIP>:8443/literature';
var url_metadata = 'http://<serverIP>:8443/metadata';
var url_pubmed = 'http://<serverIP>:8443/pubmed';
...
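
One way to avoid editing every line by hand is to derive the URLs from a single host value. The following is a minimal sketch, not the project's code: buildServiceUrls is a hypothetical helper, the address 203.0.113.10 is a documentation placeholder, and only the three URLs shown above are covered (the elided ones are left out).

```javascript
// Hypothetical helper for illustration; it is not part of PaperBot.
// The property names mirror the variable names in the snippet above.
function buildServiceUrls(serverHost, port = 8443) {
  const base = `http://${serverHost}:${port}`;
  return {
    url_literature: `${base}/literature`,
    url_metadata: `${base}/metadata`,
    url_pubmed: `${base}/pubmed`,
  };
}

// 203.0.113.10 is a placeholder address; use your own server IP.
const urls = buildServiceUrls("203.0.113.10");
console.log(urls.url_literature);
console.log(urls.url_metadata);
console.log(urls.url_pubmed);
```

With this approach, moving the services to another server means changing one value instead of several.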

3.3. Update metadata html to your desired metadata properties

Edit PaperBot/article/metadata.html. Any kind of object is supported because the metadataService receives type Object in Java, so you can add Strings, Booleans, and Lists. If you want to use Lists, you have to update the frontend controller accordingly.

Let's update the name of a given tag. For example:

<tr>
   <td><strong>Category 1:</strong></td>
   <td><span e-style="width:600px;" editable-text="metadata.category1">{{metadata.category1}}</span></td>
</tr>

Replace Category 1 with your desired name. You can also rename category1 if you want the field name in the database to match, but this is not required. You can add as many <tr> groups as you want.

The metadataFinished flag is a useful feature that records whether you have finished reviewing a paper. If it is set to false, a red flag will remind you that there is pending work when you navigate to the Positive group of articles.
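
To make the data shape concrete, the following minimal sketch shows what a metadata object with a String, a Boolean, a List, and the metadataFinished flag might look like, along with the pending-work check described above. Everything except category1 and metadataFinished is a hypothetical name chosen for illustration, and the article shape is our assumption, not the project's schema.

```javascript
// Hypothetical example data; only category1 and metadataFinished come from
// the text above, the other field names are invented for illustration.
const exampleMetadata = {
  category1: "example value",          // String
  curated: true,                       // Boolean
  tags: ["first tag", "second tag"],   // List (requires frontend controller changes)
  metadataFinished: false,             // review of this paper is still pending
};

// True when the reminder flag should be shown, i.e. metadataFinished is false.
function needsReviewFlag(metadata) {
  return Boolean(metadata) && metadata.metadataFinished === false;
}

// Count the articles that still have pending metadata work.
// Assumption: each article carries its metadata under a `metadata` property.
function countPending(articles) {
  return articles.filter((article) => needsReviewFlag(article.metadata)).length;
}

const exampleArticles = [
  { title: "Article A", metadata: exampleMetadata },
  { title: "Article B", metadata: { category1: "other", metadataFinished: true } },
];

console.log(needsReviewFlag(exampleMetadata));
console.log(countPending(exampleArticles));
```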

3.4. Go to the Wiki to learn how to update the keywords, the portals configuration, launch your first search, and add an article manually

3.5. Go to the browser & refresh

You will see the web page populate. It is ready to use.
