sifrproject/docker-compose-bioportal

A docker compose for BioPortal API and the Annotator Proxy

Services provided

This docker compose launches all the services necessary to run ontologies_api and ncbo_cron. The Dockerfile of each of the services listed below configures it to work with the others, and all persistent data is bound in the data/ directory:

  • 4store
  • redis
    • Data in data/redis
    • Port 6379 (except for tests, where 6380 and 6381 are also used)
    • 3 servers:
      • redis-goo
      • redis-annotator
      • redis-http
  • solr
  • mgrep
  • bioportal-api
    • ontologies_api accessible on http://localhost:8080
    • data/bioportal/repositories contains the ontologies processed by bioportal
    • data/bioportal/reports contains the reports generated by the processing of ontologies
    • data/ncbo_logs contains the logs from ncbo_cron and other processes in ontologies-api
    • data/submit is the location for ontologies to be submitted by the admin/populate script
  • bioportal-annotator-proxy
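
Once the containers are up, a quick way to check that the stack is healthy (a sketch; with the default build security is disabled, so no apikey is needed):

# List the state of all the compose services
docker-compose ps

# Check that ontologies_api answers on port 8080
curl http://localhost:8080/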

Test environment

If you wish to run the ncbo_cron integration tests to check that the containers are set up properly, run the 00_run_test_containers.sh script. It will start only the containers needed for the tests and expose the right ports; nothing more is needed.

Quick setup

A quick guide with commands to easily set up a BioPortal appliance on your machine (with security disabled):

./0_purge_data_and_reset.sh
./2_build_containers.sh
./3_initialize.sh

Before running the ./3_initialize.sh script, please put the ontologies you want to submit in data/submit as ACRONYM.[ttl/owl/skos], where ACRONYM is the acronym under which the ontology will be submitted and later accessible through the API.
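
For example (a hypothetical staging step; the source file names are placeholders):

# Copy ontology files under their acronyms before initializing
cp ~/ontologies/movie.owl data/submit/MOVIE.owl
cp ~/ontologies/semantic-types.ttl data/submit/STY.ttl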

An "admin" user is created with the following apikey: 61297daf-147c-40f3-b9b1-a3a2d6b744fa

You can stop the containers with docker-compose down and start them again with docker-compose up.

Enable Security

By default, the docker build has security disabled, making it easier to use for tests and development (no apikey is required when interacting with the API). Security can be enabled by changing config.enable_security = false to true in the bioportal-api/config-ontologies-api.rb file.
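
One way to flip the flag from the command line (a sketch; you can also edit the file by hand):

# Switch enable_security from false to true in the ontologies_api config
sed -i 's/config.enable_security = false/config.enable_security = true/' bioportal-api/config-ontologies-api.rb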

  • Now that security is enabled, the admin user is no longer created automatically. We need to create it through the NCBO Cron console, with the apikey 61297daf-147c-40f3-b9b1-a3a2d6b744fa:
# Access to the NCBO Cron console
./admin/ncbo-console
# Create the user
LinkedData::Models::User.new({
  :username => "admin",
  :email => "admin@god.org",
  :role => [LinkedData::Models::Users::Role.find("ADMINISTRATOR").include(:role).first],
  :password => "password",
  :apikey => "61297daf-147c-40f3-b9b1-a3a2d6b744fa"
}).save
  • To add a new ontology and submission manually, you may use the REST API as follows:
# using pullLocation (here Movie Ontology)
curl -X PUT -H "Content-Type: application/json" -H "Authorization: apikey token=61297daf-147c-40f3-b9b1-a3a2d6b744fa" -d '{ "acronym": "TEST", "name": "Test Ontology", "administeredBy": ["admin"]}' http://localhost:8080/ontologies/TEST

curl -X POST -H "Content-Type: application/json" -H "Authorization: apikey token=61297daf-147c-40f3-b9b1-a3a2d6b744fa" -d '{"contact": [{"name": "Admin","email": "admin@god.org"}], "ontology": "http://localhost:8080/ontologies/TEST", "hasOntologyLanguage": "OWL", "released": "2016-01-01", "pullLocation": "http://www.movieontology.org/2010/01/movieontology.owl"}' http://localhost:8080/ontologies/TEST/submissions

# The STY ttl file has previously been put in data/submit, so it is available under /srv/submit
# in the container (for the uploadFilePath param). Note: this currently does not work
# (the ontology file is not properly copied to the submission directory)
curl -X PUT -H "Content-Type: application/json" -H "Authorization: apikey token=61297daf-147c-40f3-b9b1-a3a2d6b744fa" -d '{ "acronym": "STY", "name": "UMLS Semantic Network", "administeredBy": ["admin"]}' http://localhost:8080/ontologies/STY

# Copy the ontology file into the submit directory
cp data/bioportal/umls_semantictypes_2015AA.ttl data/submit/STY.ttl

curl -X POST -H "Content-Type: application/json" -H "Authorization: apikey token=61297daf-147c-40f3-b9b1-a3a2d6b744fa" -d '{"contact": [{"name": "Admin","email": "admin@god.org"}], "ontology": "http://localhost:8080/ontologies/STY", "hasOntologyLanguage": "UMLS", "released": "2016-01-01", "uploadFilePath": "/srv/submit/STY.ttl"}' http://localhost:8080/ontologies/STY/submissions
  • You may also use the admin/populate script to achieve the same (see the sketch below). Please put the ontology files to submit in data/submit, as ACRONYM.[ttl/owl/skos], where ACRONYM is the acronym under which the ontology will be submitted and later accessible through the API. Then run admin/populate, wait for the submission logs to show the end of the indexing for all the ontologies, interrupt with Ctrl + C and finally run admin/regenerate-dictionary.
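
A sketch of that workflow (MYONTO and myonto.ttl are placeholders):

# Stage the file, submit everything in data/submit, then regenerate the dictionary
cp myonto.ttl data/submit/MYONTO.ttl
./admin/populate
# ...wait until the logs show the indexing has finished, then interrupt with Ctrl + C...
./admin/regenerate-dictionary

# Optional sanity check: list the registered ontologies through the REST API
curl -H "Authorization: apikey token=61297daf-147c-40f3-b9b1-a3a2d6b744fa" http://localhost:8080/ontologies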

Script details and preparing data

The first step in deploying this docker compose is to clone this repository:

git clone https://github.com/sifrproject/docker-compose-bioportal.git

Subsequently, administration scripts are provided to set up the environment. You should run them in the following order:

0_purge_data_and_reset.sh (Optional)

Erases all the persistent data from the data directory. This is useful if you want to reset an already set-up BioPortal instance.

1_prepare_data.sh

This script retrieves the ontologies you need from NCBO BioPortal (English) or LIRMM (French), restricted to ontologies with no licence restrictions.

The script takes three arguments (an example follows the list):

  • The first argument is the portal from which to fetch the ontologies (lirmm or ncbo).
  • The second argument is your api-key for the selected portal (you must create a free account on the portal to obtain the api-key).
  • The last argument is the list of ontology acronyms to retrieve from the selected portal (you may find the list of ontologies on the portal).
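
For example (a sketch; YOUR-APIKEY is a placeholder, and the acronym list is assumed to be passed as a single quoted argument):

# Fetch two ontologies from the NCBO portal
./1_prepare_data.sh ncbo YOUR-APIKEY "DOID STY"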

If the process is interrupted, you can run the script again; it will not redownload ontologies that were already downloaded. The downloaded ontologies are saved in the data/bioportal/repository/ directory; if you mistakenly included an ontology you do not need, you may delete it directly from this directory.

Alternatively, instead of running this script, you may manually put the ontologies in the data/bioportal/repository/ directory if you already have them, named ACRONYM.ttl, where ACRONYM is the acronym of each ontology.

2_build_containers.sh

This script will generate the data directory layout, reinitialize the 4store triple store and run docker-compose build to build all the containers prior to running them. WARNING: Only run this initially; if you run it after indexing ontologies, it will purge the contents of the 4store triple store. Please manually rebuild individual containers with docker-compose build instead of running the build script again.
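
For example, to rebuild a single container without touching the triple store:

# Rebuild only the API container (safe after ontologies have been indexed)
docker-compose build bioportal-api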

3_initialize.sh

This script will run docker-compose up -d --force-recreate to start all the containers and services, and will then proceed to submit all the ontologies located in the data/bioportal/repository/ directory.

The script will show you the submission logs; monitor them during the submission process until no more ontologies have been processed for more than a few minutes. At that point you will see the Finished ontology process queue check message repeat without any new ontologies starting to be processed.

You can then interrupt the script.
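
If you prefer to watch the logs directly, they are also bound on the host (a sketch; the exact file names under data/ncbo_logs may differ):

# Follow the ncbo_cron logs from the host
tail -f data/ncbo_logs/*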

Day-to-day operations

Once the appliance is populated and the initial set-up is finished, you may stop your containers as usual with docker-compose down. To start the containers again, you can use docker-compose up.

If you wish to start from scratch, you may use the 0_purge_data_and_reset.sh script to purge all the data and then repeat the initial setup process by running scripts 1 to 3 from the start.

If you wish to update the containers in the docker compose to use the latest versions of ontologies-api, ncbo-cron and annotator-proxy, you can manually rebuild the containers with docker-compose build --no-cache bioportal-api and docker-compose build --no-cache bioportal-annotator-proxy.

You then need to recreate and restart the containers with docker-compose down followed by docker-compose up --force-recreate.
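
Put together, a typical update looks like this:

# Rebuild both application containers from scratch, then recreate the stack
docker-compose build --no-cache bioportal-api
docker-compose build --no-cache bioportal-annotator-proxy
docker-compose down
docker-compose up --force-recreate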

Requirements and dependencies on the host machine

Use the latest version of Docker on a Linux host with an up-to-date kernel (preferably the latest stable release of the upstream branch).

Warning: The native version of Docker for macOS contains active bugs that cause the Docker daemon to hang during the indexation process. If you wish to use this docker-compose on a macOS host, you may want to use Docker Toolbox and docker-machine to create a virtualized Docker environment. Alternatively, you may install Docker in a virtual machine and deploy the docker compose inside the virtual machine. The same may be true on a Windows machine with the native Windows version of Docker.

Utilities required for the deployment process

The deployment and set-up process requires a number of basic utilities to run:

  • curl
  • wget

curl is required by 1_prepare_data.sh and 3_initialize.sh.

wget is only required for 1_prepare_data.sh.
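
A quick way to check that both are installed:

# Prints the path of each utility and fails if one is missing
command -v curl wget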
