Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for deploying via Helm chart #23

Closed
chgl opened this issue Dec 20, 2020 · 8 comments
Closed

Support for deploying via Helm chart #23

chgl opened this issue Dec 20, 2020 · 8 comments

Comments

@chgl
Copy link

chgl commented Dec 20, 2020

I've created a Helm chart for deploying WebAPI & Atlas to Kubernetes: https://github.com/chgl/charts/tree/master/charts/ohdsi. By default, it also creates a Postgres instance (see https://github.com/bitnami/charts/tree/master/bitnami/postgresql). It uses the recently added containers build from the Dockerfiles in the atlas/webapi repo.

A few things could be added to the Atlas and WebAPI to make it a bit more cloud-native (such as dedicated liveness and readiness probes, exposed metrics, tracing support). A great future addition to this chart would be an automated way to setup the source an source_daimon tables in a declarative way.

https://github.com/chgl/charts uses GitHub pages to host the charts, deployed via https://github.com/helm/chart-releaser-action. I'd be happy to migrate this to an official OHDSI repo if there's interest.

@chgl
Copy link
Author

chgl commented Dec 28, 2020

ping @blootsvoets since this may be interesting to you and your containerization efforts.

@t-abdul-basser
Copy link

Thanks @chgl! We deploy and manage Atlas via k8s at the OHDSI Coordinating Center so adding support for release of Atlas a Helm chart is a useful contribution. I suggest that you

  • submit as PR to this rep OHDSI/Broadsea or alternatively at OHDSI/Atlas. Thoughts on which @leeevans @OHDSI/odysseus?
  • Adding OMOP data source on Postgres has been discussed elsewhere and would also be useful. @fdefalco

@chgl
Copy link
Author

chgl commented Dec 30, 2020

submit as PR to this rep OHDSI/Broadsea or alternatively at OHDSI/Atlas. Thoughts on which @leeevans @OHDSI/odysseus?

I think OHDSI/Broadsea is probably more appropriate: the current chart includes Atlas, WebAPI, and an optional CronJob for Achilles, so it's slightly broader in scope than just Atlas. Another alternative would be a new charts repo dedicated to OHDSI charts.

Adding OMOP data source on Postgres has been discussed elsewhere and would also be useful. @fdefalco

I've been using a custom container containing the schemas and vocabs (as CSVs) which would initialize the Postgres DB remotely using psql and also setup the daimon tables for the WebAPI. It's run as a Kubernetes Job. It's not very configurable at the moment, but at least the basic idea works.

@ssaranathan
Copy link

Hi @chgl ,
Thanks for creating the helm chart. Does the cdm-init-job work ? I am not sure if the image referenced in the chart is correct one. Please let me know. Thank you.

@chgl
Copy link
Author

chgl commented Feb 15, 2021

@ssaranathan the docker/whalesay container is indeed just an example :) You will have to provide your own container image to use for initialization: https://github.com/chgl/charts/tree/master/charts/ohdsi#initialize-the-cdm-using-a-custom-container

I don't think I'll be able to fully share my solution (or at least the downloaded vocabularies) given the SNOMED license terms. But basically it's a container image which contains the Postgres schema SQLs from https://github.com/OHDSI/CommonDataModel/tree/v5.3.1/PostgreSQL and Vocabularies downloaded from Athena as CSVs. The entrypoint script of this container uses the psql binary to initialize the schema (basically running psql -f OMOP-CDM-postgresql-ddl.txt) and import the vocabulary. Here's the basic gist of the container's entrypoint.sh:

#!/bin/bash
set -e

OMOP_INIT_BASE_DIR="/opt/omop-init"

SOURCES_DIR="$OMOP_INIT_BASE_DIR/sources"
VOCABS_DIR="$OMOP_INIT_BASE_DIR/vocabs"
SCHEMAS_DIR="$OMOP_INIT_BASE_DIR/schemas"
POSTINIT_DIR="$OMOP_INIT_BASE_DIR/postinit"

WEBAPI_URL=${WEBAPI_URL:?"WEBAPI_URL required but not set"}
PGPASSWORD=${PGPASSWORD:?"PGPASSWORD required but not set"}
PGHOST=${PGHOST:?"PGHOST required but not set"}

export PGDATABASE=${PGDATABASE:-"ohdsi"}
export PGUSER=${PGUSER:-"postgres"}
export PGPORT=${PGPORT:-"5432"}

CDM_DIR="$VOCABS_DIR/cdm_20201110"

mkdir -p "$VOCABS_DIR"

echo "$(date): Extracting CDM vocabs"
tar -xzvf "$SOURCES_DIR/cdm_20201110.tar.gz" -C "$VOCABS_DIR/"

echo "$(date): Checking if Postgres @ $PGHOST:$PGPORT is up"
until psql -c "select 1"; do
    echo "$(date): Waiting for Postgres to be up"
    sleep 5
done
echo "$(date): Postgres is up"

echo "$(date): Creating Schema"
for SQL_FILE in init_omop/*; do
    echo "$(date): Applying $SQL_FILE"
    psql -f "$SQL_FILE"
done

echo "$(date): Copying vocabulary"

psql <<-EOSQL
    \COPY cdm.drug_strength FROM '$CDM_DIR/DRUG_STRENGTH.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.concept FROM '$CDM_DIR/CONCEPT.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.concept_relationship FROM '$CDM_DIR/CONCEPT_RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.concept_ancestor FROM '$CDM_DIR/CONCEPT_ANCESTOR.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.concept_synonym FROM '$CDM_DIR/CONCEPT_SYNONYM.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.vocabulary FROM '$CDM_DIR/VOCABULARY.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.relationship FROM '$CDM_DIR/RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.concept_class FROM '$CDM_DIR/CONCEPT_CLASS.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
    \COPY cdm.domain FROM '$CDM_DIR/DOMAIN.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b';
EOSQL

echo "$(date): Creating DB indices"
PGOPTIONS=--search_path=cdm psql -f "$POSTINIT_DIR/indexes.sql"

echo "$(date): Creating DB constraints"
PGOPTIONS=--search_path=cdm psql -f "$POSTINIT_DIR/constraints.sql"

echo "$(date): Creating CDM results tables"
PGOPTIONS=--search_path=cdm psql -f "$POSTINIT_DIR/restabs_cdm.sql"

echo "$(date): Waiting for WebAPI @ $WEBAPI_URL to be up"
until [ "$(curl -s -o /dev/null -L -w '%{http_code}' "$WEBAPI_URL/info")" == "200" ]; do
    echo "$(date): Waiting for WebAPI to be up"
    sleep 5
done


# This is better solved by invoking the WebAPI dynamically instead
echo "$(date): Updating OHDSI WebAPI CDM sources"
psql <<-EOSQL
    INSERT INTO ohdsi.source(source_id, source_name, source_key, source_connection, username, password, source_dialect)
    VALUES (1, 'CDM V5.3.1 Database', 'CDMV5', 'jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}', '$PGUSER', '$PGPASSWORD', 'postgresql');

    INSERT INTO ohdsi.source_daimon(source_daimon_id, source_id, daimon_type, table_qualifier, priority)
    VALUES (5, 1, 0, 'cdm', 2);

    INSERT INTO ohdsi.source_daimon(source_daimon_id, source_id, daimon_type, table_qualifier, priority)
    VALUES (6, 1, 1, 'cdm', 2);

    INSERT INTO ohdsi.source_daimon(source_daimon_id, source_id, daimon_type, table_qualifier, priority)
    VALUES (7, 1, 2, 'cdm_results', 2);

    INSERT INTO ohdsi.source_daimon(source_daimon_id, source_id, daimon_type, table_qualifier, priority)
    VALUES (8, 1, 3, 'cdm_results', 2);
EOSQL

echo "$(date): Refreshing sources"
curl -s -L -o /dev/null "$WEBAPI_URL/source/refresh"

echo "$(date): Completed initialization."
exit 0

There's room for improvement: less hard-coding of schemas, checking if the db is already initialized to avoid duplicate entries, etc. It might make sense to provide an end-to-end example which fully initializes the DB using something like the SynPUF sample data.

@ssaranathan
Copy link

Thanks a lot @chgl . Will look into this.

@ssaranathan
Copy link

ssaranathan commented Feb 17, 2021

Hi @chgl ,
I just tried to install the ohdsi chart. I did set the following in my values:

postgresql:
enabled: true

But I don't see a Postgres deployment created in the K8s namespace. So without the Postgres, WebAPI is still in waiting status.

Anything else that I need to add in the values to create a Postgres Deployment ?

@chgl
Copy link
Author

chgl commented Feb 17, 2021

@ssaranathan what k8s distribution do you use? My hunch is that it might be related to the storageClass/persistence settings. the Bitnami Postgres is deployed as a StatefulSet by default, can you check if you can find any of its Pods? They might be in a "waiting" state in case there's an issue with creating a volume.

(Maybe an issue over at the chart repo is a better place to discuss this)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants