Scalable Interoperable Annotation Server (SIA)

Project description

SIA is an annotation service for the TIPS task of the BioCreative V.5 BeCalm track. Mutation mentions are annotated using SETH, microRNA mentions using mirNer, and disease mentions using a dictionary lookup. Results are returned as JSON according to the task's format definitions.

Citation

To cite SIA, please use the following reference:

@Article{Kirschnick2018,
  title     = {{SIA:} a scalable interoperable annotation server for biomedical named entities},
  author    = {Johannes Kirschnick and Philippe Thomas and Roland Roller and Leonhard Hennig},
  journal   = {Journal of Cheminformatics},
  volume    = {10},
  number    = {1},
  pages     = {63:1--63:7},
  year      = {2018},
  month     = {Dec},
  url       = {https://doi.org/10.1186/s13321-018-0319-2},
  doi       = {10.1186/s13321-018-0319-2}
}

A PDF version of the paper is freely available via its DOI, https://doi.org/10.1186/s13321-018-0319-2.

Getting Started

Note

The system uses RabbitMQ for load balancing, so make sure RabbitMQ is running locally before starting the application. Refer to the RabbitMQ installation guide for help.

For convenience, you can skip the RabbitMQ installation and start it via Maven instead (this might not work on every machine):

./mvnw rabbitmq:start

The management interface is available at http://localhost:15672/ (default login: guest/guest).

To tear down RabbitMQ afterwards, issue:

./mvnw rabbitmq:stop
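If you want to check programmatically whether RabbitMQ is up before launching SIA, a minimal Java sketch can probe the broker's TCP port. It assumes RabbitMQ's default AMQP port 5672 (distinct from the management port 15672 mentioned above); the class and method names are hypothetical, and a successful connection only shows that something is listening on that port, not that it is RabbitMQ.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: probes a TCP port to see whether anything accepts connections.
// 5672 is RabbitMQ's default AMQP port; a successful connect does not prove the
// listener is actually RabbitMQ.
final class RabbitMqCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or unknown host
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 5672, 1000);
        System.out.println(up
                ? "Something is listening on localhost:5672 (RabbitMQ default port)"
                : "Nothing is listening on localhost:5672; start RabbitMQ first");
    }
}
```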

To start the system in development mode, issue:

./mvnw spring-boot:run

This starts the backend without submitting results to the TIPS server; instead, results are printed to the console. By default, the server listens on port 8080.

getAnnotation

Issue the following curl request to trigger a new annotation request with a sample payload

curl -vX POST http://localhost:8080/call -d @src/test/resources/samplepayloadGetannotations.json --header "Content-Type: application/json"

and watch the console for results.

getStatus

To request a status report, use the following curl request:

curl -vX POST http://localhost:8080/call -d @src/test/resources/sampleplayloadGetStatus.json --header "Content-Type: application/json"
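Both requests can also be issued from Java. The sketch below uses the JDK's built-in java.net.http client (Java 11+) to POST a payload file to the /call endpoint with the same Content-Type header as the curl commands above. SiaCallClient and its methods are hypothetical names, not part of SIA.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical client: sends a JSON payload file to SIA's /call endpoint,
// mirroring: curl -X POST <endpoint> -d @<file> --header "Content-Type: application/json"
final class SiaCallClient {

    static final URI DEFAULT_ENDPOINT = URI.create("http://localhost:8080/call");

    // Builds the POST request without sending it, so it can be inspected or reused.
    static HttpRequest buildCallRequest(URI endpoint, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(body)
                .build();
    }

    // Sends the file contents verbatim and returns the response body as a string.
    static HttpResponse<String> send(URI endpoint, Path payloadFile)
            throws IOException, InterruptedException {
        HttpRequest request = buildCallRequest(endpoint, HttpRequest.BodyPublishers.ofFile(payloadFile));
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SiaCallClient <payload.json>");
            return;
        }
        HttpResponse<String> response = send(DEFAULT_ENDPOINT, Path.of(args[0]));
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

For example, `java SiaCallClient.java src/test/resources/samplepayloadGetannotations.json` runs it as a single-file program while the server is up. Unlike curl's -d, which strips newlines from the file, this sketch sends the file unchanged; for well-formed JSON payloads that difference should not matter.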

Adding custom annotators

To extend SIA with additional named entity recognition (NER) tools, you have to:

Implement the Annotator interface

Consult the examples in the corresponding package for implementation details. Afterwards, for correct message routing, you need to define the annotator's input channel. Input channels can be named freely, but we recommend using the name of the annotator. For example:

@Transformer(inputChannel = "yourAnnotator")

Placed on the annotator, this annotation specifies that its inputs come from the yourAnnotator channel. Internally, channels are mapped to queues automatically.

Add your annotator as a recipient in FlowHandler and define the set of PredictionType values your annotator responds to.

For example:

.recipientMessageSelector("yourAnnotator", message -> headerContains(message, CHEMICAL) && enabledAnnotators.yourAnnotator)

Here, yourAnnotator has to match the inputChannel of the @Transformer definition. The selector specifies that all requests that need to be tagged with CHEMICAL are sent to the yourAnnotator channel. headerContains(message, CHEMICAL) is a helper method that checks whether the header field called types contains the enum value CHEMICAL. The header is populated automatically from the request message, which lists the requested annotation types.

Furthermore, enabledAnnotators is an injected configuration bean that specifies which annotators are enabled.

Adding a new boolean property named yourAnnotator to this class lets you switch your annotator on or off. Check application.properties for the existing flags (for example, sia.annotators.seth).
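To make the routing concrete, here is a plain-Java stand-in for the recipient selectors described above, without Spring Integration. Everything in it is hypothetical: the two-value PredictionType enum, the EnabledAnnotators fields, the reimplemented headerContains helper, and routing SETH on MUTATION are illustrative and may differ from SIA's actual FlowHandler.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical plain-Java model of recipient-list routing; SIA itself uses Spring Integration.
final class RoutingSketch {

    enum PredictionType { CHEMICAL, MUTATION } // illustrative subset only

    // Stand-in for the injected enabledAnnotators configuration bean.
    static final class EnabledAnnotators {
        boolean seth = true;
        boolean yourAnnotator = false;
    }

    // Mirrors the idea of headerContains(message, TYPE): is TYPE in the "types" header?
    static boolean headerContains(Map<String, Object> headers, PredictionType type) {
        Object types = headers.get("types");
        return types instanceof Set<?> set && set.contains(type);
    }

    // Returns the channels a message would be sent to, in registration order.
    static List<String> route(Map<String, Object> headers, EnabledAnnotators enabled) {
        Map<String, Predicate<Map<String, Object>>> recipients = new LinkedHashMap<>();
        recipients.put("seth", h -> headerContains(h, PredictionType.MUTATION) && enabled.seth);
        recipients.put("yourAnnotator", h -> headerContains(h, PredictionType.CHEMICAL) && enabled.yourAnnotator);

        List<String> channels = new ArrayList<>();
        recipients.forEach((channel, selector) -> {
            if (selector.test(headers)) channels.add(channel);
        });
        return channels;
    }

    public static void main(String[] args) {
        Map<String, Object> headers =
                Map.of("types", EnumSet.of(PredictionType.CHEMICAL, PredictionType.MUTATION));
        EnabledAnnotators enabled = new EnabledAnnotators();
        System.out.println(route(headers, enabled)); // [seth]
        enabled.yourAnnotator = true;
        System.out.println(route(headers, enabled)); // [seth, yourAnnotator]
    }
}
```

Each selector combines a check on the types header with the matching enabled flag, so a message reaches yourAnnotator only if CHEMICAL was requested and the flag is set.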

Available Annotators

BannerNER
DiseasesNER
Linnaeus
MirNER
SETH
ChemSpot (external)
DNorm (external)

BannerNER

BANNER is a named entity recognition system, primarily intended for biomedical text.

http://banner.sourceforge.net/

DiseasesNER

DiseasesNER uses a large dictionary of disease mentions.
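A dictionary lookup of this kind can be sketched as a greedy longest-match search over token sequences. This illustrates the general technique, not DiseasesNER's implementation: the toy dictionary, the \w+ tokenization, case-insensitive matching and the non-overlapping, longest-match-first policy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative dictionary-based NER: greedy longest match over word tokens.
final class DictionaryLookupSketch {

    record Match(int begin, int end, String text) {}

    private static final Pattern TOKEN = Pattern.compile("\\w+");

    // Dictionary entries are expected in lower case; maxTokens bounds the phrase length tried.
    static List<Match> annotate(String text, Set<String> dictionary, int maxTokens) {
        List<int[]> tokens = new ArrayList<>(); // [start, end) character offsets per token
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(new int[] {m.start(), m.end()});

        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedLast = -1;
            // Try the longest phrase first, so "breast cancer" wins over the nested "cancer".
            for (int j = Math.min(tokens.size(), i + maxTokens) - 1; j >= i; j--) {
                String candidate = text.substring(tokens.get(i)[0], tokens.get(j)[1]);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    matchedLast = j;
                    break;
                }
            }
            if (matchedLast >= 0) {
                int begin = tokens.get(i)[0], end = tokens.get(matchedLast)[1];
                matches.add(new Match(begin, end, text.substring(begin, end)));
                i = matchedLast + 1; // continue after the match: no overlapping mentions
            } else {
                i++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("breast cancer", "cancer", "alzheimer's disease"); // toy dictionary
        String text = "Patients with breast cancer or Alzheimer's disease were excluded.";
        annotate(text, dict, 4).forEach(System.out::println);
        // Match[begin=14, end=27, text=breast cancer]
        // Match[begin=31, end=50, text=Alzheimer's disease]
    }
}
```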

Linnaeus

Species name recognition and normalization software.

http://linnaeus.sourceforge.net/

MirNER

mirNer is a simple regex-based tool to detect microRNA mentions in text, following the miRNA definition of Victor Ambros et al. (2003). A uniform system for microRNA annotation. RNA 2003, 9(3):277-279.

https://github.com/Erechtheus/mirNer
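To illustrate what regex-based miRNA detection looks like, the sketch below uses one simplified pattern loosely modeled on the Ambros et al. naming scheme: an optional species prefix such as hsa-, miR/mir (or the let/lin exceptions), a number, an optional family letter, an optional locus suffix, and an optional -5p/-3p or * arm. It is not mirNer's pattern set, which may cover more variants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy miRNA matcher; NOT mirNer's actual patterns.
final class MirnaRegexSketch {

    static final Pattern MIRNA = Pattern.compile(
            "\\b(?:[a-z]{3}-)?"          // optional species prefix, e.g. "hsa-"
            + "(?:miR|mir|let|lin)-"     // mature/precursor marker or let/lin exceptions
            + "\\d+[a-z]?"               // number plus optional family letter, e.g. "26a"
            + "(?:-\\d+)?"               // optional distinct-locus suffix, e.g. "-1"
            + "(?:-[35]p|\\*)?"          // optional arm suffix: -5p, -3p or *
            + "(?!\\w)");                // do not stop in the middle of a word

    static List<String> find(String text) {
        List<String> mentions = new ArrayList<>();
        Matcher m = MIRNA.matcher(text);
        while (m.find()) mentions.add(m.group());
        return mentions;
    }

    public static void main(String[] args) {
        System.out.println(find("Levels of hsa-miR-155-5p, miR-21 and let-7a were elevated."));
        // [hsa-miR-155-5p, miR-21, let-7a]
    }
}
```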

SETH

SNP Extraction Tool for Human Variations.

SETH is software that performs named entity recognition (NER) of genetic variants (with an emphasis on single nucleotide polymorphisms (SNPs) and other short sequence variations) in natural language texts.

https://rockt.github.io/SETH/

ChemSpot (external)

ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities.

https://www.informatik.hu-berlin.de/de/forschung/gebiete/wbi/resources/chemspot/chemspot

DNorm (external)

DNorm is an automated method for determining which diseases are mentioned in biomedical text, the task of disease normalization. Diseases have a central role in many lines of biomedical research, making this task important for many lines of inquiry, including etiology (e.g. gene-disease relationships) and clinical aspects (e.g. diagnosis, prevention, and treatment).

https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/DNorm.html

External annotators

DNorm and ChemSpot are integrated out of process, which means you need to start these annotators separately before you can use them. Communication with each of them is handled via a dedicated queue.

Start DNorm

./mvnw -f tools/dnorm/pom.xml -DskipTests package
java -Xmx8g -jar tools/dnorm/target/dnorm-0.0.1-SNAPSHOT.jar

Start ChemSpot

./mvnw -f tools/chemspot/pom.xml package
java -Xmx16g -jar tools/chemspot/target/chemspot-0.0.1-SNAPSHOT.jar

Tagging PubMed Dumps

You can tag PubMed articles from ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ by placing the downloaded files in the directory tools/pubmedcache.

Configure which annotators to use by creating an application.properties file in the current directory and enabling the annotators you want. Then start any external annotators you plan to use.

If you don't customize the annotators, the following default configuration is applied:

sia.annotators.banner=false
sia.annotators.diseaseNer=false
sia.annotators.mirNer=false
sia.annotators.linnaeus=false
sia.annotators.seth=true

# external
sia.annotators.dnorm=false
sia.annotators.chemspot=false
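The defaults above behave like an overlay: assuming Spring Boot's usual precedence, where an application.properties in the current directory overrides the packaged one, any sia.annotators key you set wins and every other annotator keeps its default. The minimal Java sketch below mimics that overlay and prints the resulting set of enabled annotators. It is not Spring Boot's binding logic, the class name is hypothetical, and only the value true (case-insensitive) is treated as enabled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: overlays user settings on the documented annotator defaults.
final class AnnotatorConfigSketch {

    // Defaults as listed in the README, in the same order.
    static final Map<String, Boolean> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("banner", false);
        DEFAULTS.put("diseaseNer", false);
        DEFAULTS.put("mirNer", false);
        DEFAULTS.put("linnaeus", false);
        DEFAULTS.put("seth", true);
        DEFAULTS.put("dnorm", false);    // external
        DEFAULTS.put("chemspot", false); // external
    }

    // Returns the annotators that end up enabled after applying the user's overrides.
    static List<String> enabledAnnotators(Reader userProperties) throws IOException {
        Properties props = new Properties();
        props.load(userProperties);
        List<String> enabled = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : DEFAULTS.entrySet()) {
            String value = props.getProperty("sia.annotators." + entry.getKey());
            boolean on = (value == null) ? entry.getValue() : Boolean.parseBoolean(value.trim());
            if (on) enabled.add(entry.getKey());
        }
        return enabled;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("application.properties"); // looked up in the current directory
        try (Reader reader = Files.exists(file) ? Files.newBufferedReader(file) : new StringReader("")) {
            System.out.println("Enabled annotators: " + enabledAnnotators(reader));
        }
    }
}
```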

Finally, start the SiaPubmedAnnotator class with the driver and backend profiles enabled. The driver profile collects the output into the directory annotated, while the backend profile ensures that the internal annotators are started as well.

./mvnw -DskipTests package
java -cp target/sia-0.0.1-SNAPSHOT.jar \
     -Dloader.main=de.dfki.nlp.SiaPubmedAnnotator \
     org.springframework.boot.loader.PropertiesLauncher \
     --spring.profiles.active=backend,driver

Example output

$ ls -lh annotated
1.0K Jun 28 23:15 annotation-results_2018-06-28_11-15-07.json 
$ head annotated/a*
{"predictionResults":[{"document_id":"10022392","section":"A","init":1085,"end":1090,"score":1.0,"annotated_text":"T337A","type":"MUTATION"} ....
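Each entry in predictionResults carries the fields shown above. A small Java record that models one entry and serializes it back into the same JSON shape is sketched below. The record and method names are hypothetical, and reading init and end as character offsets with an exclusive end is an assumption, consistent with the sample (1090 - 1085 = 5, the length of T337A).

```java
// Hypothetical model of one "predictionResults" entry; field names copied from the sample output.
record PredictionResult(String documentId, String section, int init, int end,
                        double score, String annotatedText, String type) {

    // Span length, assuming init/end are [begin, end) character offsets.
    int length() {
        return end - init;
    }

    // Serializes the entry in the same field order as the sample output.
    String toJson() {
        return "{\"document_id\":" + quote(documentId)
                + ",\"section\":" + quote(section)
                + ",\"init\":" + init
                + ",\"end\":" + end
                + ",\"score\":" + score
                + ",\"annotated_text\":" + quote(annotatedText)
                + ",\"type\":" + quote(type) + "}";
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        PredictionResult r = new PredictionResult("10022392", "A", 1085, 1090, 1.0, "T337A", "MUTATION");
        System.out.println(r.toJson());
        System.out.println(r.length() == r.annotatedText().length()); // true
    }
}
```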

