TOP Backend

Spring Boot-based backend of the TOP framework. Please see top-deployment for additional documentation.


Running the Spring Server

  1. Set up environment variables:

    • APP_PORT: the port the Spring application will run on, e.g. 8080
    • APP_PATH: the context path, e.g. "/" for root
    • DB_TYPE: type of the DB to be used, defaults to postgresql
    • DB_HOST: host running the database server, defaults to localhost
    • DB_NAME: name of the database, defaults to postgres
    • DB_PORT: port of the database host, defaults to 5432
    • DB_USER: username for connecting to the database, defaults to postgres
    • DB_PASS: password for connecting to the database, required
    • DATA_SOURCE_CONFIG_DIR: location of data source configuration files, defaults to config/data_sources
    • DOCUMENT_DATA_SOURCE_CONFIG_DIR: location of document data source configuration files, defaults to config/data_sources/nlp
    • QUERY_RESULT_DIR: location where query results are stored to, defaults to config/query_results
    • QUERY_RESULT_DOWNLOAD_ENABLED: whether users with write permission for a repository can download query results, defaults to true
    • TERMINOLOGY_SERVICE_ENDPOINT: endpoint of the Ontology Lookup Service to be used for code search, defaults to https://www.ebi.ac.uk/ols4/api (OLS4 is currently supported)
    • IMPORT_DEMO_DATA: whether a demo organisation with a small BMI phenotype model should be imported on startup, defaults to false
    • MAX_BATCH_SIZE: max number of entities that can be uploaded to the server in one batch

    Document related:
    (The following variables will be overwritten by their respective adapter values if specified)

    • DB_NEO4J_USER: username for neo4j database, defaults to neo4j
    • DB_NEO4J_PASS: password for neo4j database (should be declared here and not written into an adapter)
    • DB_ELASTIC_USER: username for elasticsearch database, defaults to elastic
    • DB_ELASTIC_PASS: password for elasticsearch database (should be declared here and not written into an adapter)

    (These are general configuration variables for the database that won't be declared in an adapter)

    • DB_ELASTIC_CONNECTION_TIMEOUT: timeout in seconds, defaults to 1s
    • DB_ELASTIC_SOCKET_TIMEOUT: timeout in seconds, defaults to 30s
    • DB_NEO4J_CONNECTION_TIMEOUT: timeout in seconds, defaults to 30s

    OAuth2 related:

    • OAUTH2_ENABLED: enable or disable OAuth2, defaults to false
    • OAUTH2_URL: base URL of the OAuth2 server, defaults to http://127.0.0.1:8081
    • OAUTH2_REALM: name of the OAuth2 realm to be used for authentication
  2. Start the PostgreSQL database (see Docker Hub). Please review the documentation for production use.

    docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=password postgres
  3. Start the Neo4J database (see Docker Hub). Please review the documentation for production use.

    docker run --rm -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j
  4. Start a default document index service (Elasticsearch) on the address specified in the adapter (if no adapter file is found, this defaults to localhost:9008).

  5. Start the concept graphs service on the address specified in the adapter (if no adapter file is found, this defaults to localhost:9007).

  6. Start the OAuth2 server (see Docker Hub).

If you run the TOP Framework with an OAuth2 server, the first user that is created will have the admin role.
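For a quick local run, the variables from step 1 can simply be exported in the shell before starting the server. All values below are illustrative examples, not recommendations; DB_PASS and DB_NEO4J_PASS must match the credentials passed to the respective containers in steps 2 and 3:

```shell
# Example environment for a local development run (values are illustrative)
export APP_PORT=8080
export APP_PATH=/
export DB_HOST=localhost
export DB_PORT=5432
export DB_NAME=postgres
export DB_USER=postgres
export DB_PASS=password          # must match POSTGRES_PASSWORD of the PostgreSQL container
export DB_NEO4J_USER=neo4j
export DB_NEO4J_PASS=password    # must match NEO4J_AUTH of the Neo4j container
export OAUTH2_ENABLED=false
```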

NLP/Document related configuration

To utilize the document search of the framework, one needs three different services running:

  1. Elasticsearch or something similar (default: http://localhost:9008)
  2. A Neo4j cluster (default: bolt://localhost:7687)
  3. And the concept graphs service (default: http://localhost:9007)

The document search is adapter-centric: you need a working configuration file (YML) that specifies the addresses of these services, placed in the folder declared by the environment variable DOCUMENT_DATA_SOURCE_CONFIG_DIR. If no DOCUMENT_DEFAULT_ADAPTER is specified, the first adapter found in the folder is used for setup.
If no such adapter is present, or it is otherwise faulty, the framework falls back to the default connection values listed above. You can find more information about the adapter specification under top-document-query.
The concept graphs service is responsible for generating graphs of related phrases from a document source (either via upload or an external data/document server like the one used for the framework). These graphs in turn are then represented as concept nodes, phrase nodes and document nodes on a Neo4j cluster where they serve as a way to search/explore documents.
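The adapter is an ordinary YML file in DOCUMENT_DATA_SOURCE_CONFIG_DIR. The actual schema is defined in top-document-query; the sketch below only illustrates the idea, and every key name in it is an assumption, not the real specification:

```yml
# config/data_sources/nlp/example_adapter.yml
# Illustrative only -- key names are assumptions; see top-document-query for the real schema.
elasticsearch:
  url: http://localhost:9008
neo4j:
  url: bolt://localhost:7687
conceptgraphs:
  url: http://localhost:9007
```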

Plugins

Any plugin you want to provide must be a member of the package care.smith.top. It is sufficient to build JAR files and place them on the classpath of this application.

Please make sure to set the scope of all TOP Framework dependencies in your plugins to provided, e.g.:

<dependencies>
    <dependency>
        <groupId>care.smith.top</groupId>
        <artifactId>top-api</artifactId>
        <version>${version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

Currently supported plugin types:

  • phenotype importer: implement care.smith.top.top_phenotypic_query.converter.PhenotypeImporter
  • phenotype exporter: implement care.smith.top.top_phenotypic_query.converter.PhenotypeExporter

Development

Coding Standard

The code in this repository, and in contributions provided via pull requests, should conform to Google Java Style.

We use the flag --skip-reflowing-long-strings for google-java-format, as it is currently not supported by all IDEs.

If your IDE does not support file formatting, you can get a JAR release of google-java-format and run the following command from the root of this repository:

java -jar google-java-format-CURRENT_VERSION-all-deps.jar --skip-reflowing-long-strings --replace $(git ls-files *.java)
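To check formatting without rewriting files (e.g. in CI), the same JAR can be run with google-java-format's --dry-run and --set-exit-if-changed flags instead of --replace. The JAR file name is a placeholder, as above:

```shell
# Fail (non-zero exit) if any tracked .java file is not properly formatted.
GJF_JAR=google-java-format-CURRENT_VERSION-all-deps.jar   # placeholder file name
GJF_FLAGS="--dry-run --set-exit-if-changed --skip-reflowing-long-strings"
# java -jar "$GJF_JAR" $GJF_FLAGS $(git ls-files '*.java')
echo "$GJF_FLAGS"
```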

Database Migrations

The application uses Liquibase in combination with liquibase-maven-plugin to manage migrations (changelog files).

This section describes how to generate new changelogs based on modifications applied to JPA entities. To generate new changelogs, a local HSQL database is used to reflect the state prior to changes.

  1. Run mvn liquibase:update to apply all changelogs to the local HSQL database.
  2. Make desired modifications to the JPA entities in care.smith.top.backend.model.
  3. Recompile all JPA entities with mvn compile.
  4. Run the following command to generate new changelogs:
    mvn liquibase:diff \
      -Dliquibase.diffChangeLogFile=src/main/resources/db/changelog/changesets/<timestamp>-<changelog name>.yaml
    You can set user.name=<change author> before running the above command to change the changelog author name.
  5. Review the generated changelog file!

There is a bug in liquibase-maven-plugin that causes some constraints and the Hibernate sequence to be recreated in the generated changelog. You should manually remove these changes from the generated changelog file.

NLP related Tests

On newer JDK versions, you might need the following arguments to run Neo4j tests:
--add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED
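If the tests are run through Maven, these flags must reach the forked test JVM. With the Surefire plugin this can be done via its standard argLine property (assuming the project does not already configure argLine in the POM, which -DargLine would override):

```shell
# JVM --add-opens flags needed by the Neo4j test harness on newer JDKs
NEO4J_TEST_OPENS="--add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED"
# mvn test -DargLine="$NEO4J_TEST_OPENS"
echo "$NEO4J_TEST_OPENS"
```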

Docker

Before building the Docker image, copy .env.dist to .env and fill in your GitHub username and Maven package registry authentication token.

License

The code in this repository and the package care.smith.top:top-backend are licensed under MIT.