
AnyVar

AnyVar provides Python and REST interfaces to validate, normalize, generate identifiers, and register biological sequence variation according to the GA4GH Variation Representation standards.

Quickstart

(temporary)

Clone the repo and navigate to it:

git clone https://github.com/biocommons/anyvar
cd anyvar

Point ANYVAR_STORAGE_URI to an available PostgreSQL database:

export ANYVAR_STORAGE_URI=postgresql://anyvar:anyvar-pw@localhost:5432/anyvar

Set SEQREPO_DATAPROXY_URI to local SeqRepo files or to a REST service instance:

export SEQREPO_DATAPROXY_URI=seqrepo+file:///usr/local/share/seqrepo/latest
# or
export SEQREPO_DATAPROXY_URI=seqrepo+http://localhost:5000/seqrepo

Start the AnyVar server:

uvicorn anyvar.restapi.main:app --reload

Developer installation

git clone https://github.com/biocommons/anyvar.git
cd anyvar
python3 -m venv venv
source venv/bin/activate
pip install -U setuptools pip
pip install -e '.[dev]'

Or, more simply:

make devready
source venv/3.11/bin/activate

Then, start the REST server with:

uvicorn anyvar.restapi.main:app

In another terminal:

curl http://localhost:8000/info

Setting up Postgres

A Postgres-backed AnyVar installation may use any Postgres instance, local or remote. The following instructions set up a Docker-based Postgres instance.

First, run the commands in README-pg.md. This will create and start a local Postgres docker instance.

Next, run the commands in postgres_init.sql. This will create the anyvar user with the appropriate permissions and create the anyvar database.

Setting up Snowflake

A Snowflake-backed AnyVar installation may use any Snowflake database schema. The Snowflake database and schema must exist prior to starting AnyVar. To point AnyVar at Snowflake, specify a Snowflake URI in the ANYVAR_STORAGE_URI environment variable. For example:

snowflake://my-sf-acct/?database=sf_db_name&schema=sd_schema_name&user=sf_username&password=sf_password
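Assembling the URI from its parts with the standard library avoids manual string errors. The sketch below uses the same placeholder account, database, schema, and credentials as the example above; substitute your own values.

```python
# Illustrative only: build the ANYVAR_STORAGE_URI for Snowflake from its parts.
# All values below are placeholders, matching the example URI in this README.
from urllib.parse import urlencode

params = {
    "database": "sf_db_name",
    "schema": "sd_schema_name",
    "user": "sf_username",
    "password": "sf_password",
}
uri = "snowflake://my-sf-acct/?" + urlencode(params)
print(uri)
```

Because `urlencode` percent-escapes reserved characters, this also handles passwords containing `&`, `=`, or spaces safely.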

Snowflake connection parameter reference

When running interactively and connecting to a Snowflake account that utilizes federated authentication or SSO, add the parameter authenticator=externalbrowser. Non-interactive execution in a federated authentication or SSO environment requires a service account to connect. Connections using an encrypted or unencrypted private key are also supported by specifying the parameter private_key=path/to/file.p8. The key material may be URL-encoded and inlined in the connection URI, for example: private_key=-----BEGIN+PRIVATE+KEY-----%0AMIIEvAIBA...
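To produce the URL-encoded inline form of the key material shown above, `urllib.parse.quote_plus` gives the right escaping (spaces become `+`, newlines become `%0A`). The PEM text below is a truncated stand-in, not a real key:

```python
# Sketch: URL-encode PEM key material so it can be inlined in the private_key
# connection parameter. The key text here is a truncated placeholder.
from urllib.parse import quote_plus

pem = "-----BEGIN PRIVATE KEY-----\nMIIEvAIBA..."
encoded = quote_plus(pem)
print(encoded)  # -----BEGIN+PRIVATE+KEY-----%0AMIIEvAIBA...
```

In practice you would read the full `.p8` file's contents and append the result to the connection URI as `private_key=<encoded>`.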

Environment variables that can be used to modify Snowflake database integration:

  • ANYVAR_SNOWFLAKE_STORE_BATCH_LIMIT - in batch mode, limit VRS object upsert batches to this number; defaults to 100,000
  • ANYVAR_SNOWFLAKE_STORE_TABLE_NAME - the name of the table that stores VRS objects; defaults to vrs_objects
  • ANYVAR_SNOWFLAKE_STORE_MAX_PENDING_BATCHES - the maximum number of pending batches to allow before blocking; defaults to 50
  • ANYVAR_SNOWFLAKE_STORE_PRIVATE_KEY_PASSPHRASE - the passphrase for an encrypted private key
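The settings above follow the usual environment-variable pattern: each falls back to its documented default when unset. A minimal sketch of that pattern (illustrative, not AnyVar's actual code):

```python
# Illustrative: read the Snowflake store settings with their documented defaults.
import os

batch_limit = int(os.environ.get("ANYVAR_SNOWFLAKE_STORE_BATCH_LIMIT", "100000"))
table_name = os.environ.get("ANYVAR_SNOWFLAKE_STORE_TABLE_NAME", "vrs_objects")
max_pending = int(os.environ.get("ANYVAR_SNOWFLAKE_STORE_MAX_PENDING_BATCHES", "50"))
print(batch_limit, table_name, max_pending)
```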

NOTE: If you choose to create the VRS objects table in advance, the minimal table specification is as follows:

CREATE TABLE ... (
    vrs_id VARCHAR(500) COLLATE 'utf8',
    vrs_object VARIANT
)

NOTE: The Snowflake database connector utilizes a background thread to write VRS objects to the database when operating in batch mode (e.g. annotating a VCF file). Queries and statistics query only against the already committed database state. Therefore, queries issued immediately after a batch operation may not reflect all pending changes.
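The pattern described in the note can be sketched generically: a background thread drains batches from a queue and commits them, so a read issued immediately after enqueueing may not yet see the pending writes. This is a conceptual sketch, not AnyVar's implementation:

```python
# Conceptual sketch of a background batch writer: reads only see committed
# state, so a query right after a batch put may miss pending writes.
import queue
import threading
import time

committed = {}           # stands in for the committed database state
pending = queue.Queue()  # batches awaiting the writer thread

def writer():
    while True:
        batch = pending.get()
        if batch is None:    # sentinel: shut down
            break
        time.sleep(0.05)     # simulate commit latency
        committed.update(batch)
        pending.task_done()

threading.Thread(target=writer, daemon=True).start()

pending.put({"ga4gh:VA.abc": {"type": "Allele"}})
# At this point "ga4gh:VA.abc" is likely NOT in committed yet.

pending.join()  # block until the writer has drained the queue
print("ga4gh:VA.abc" in committed)  # True once the batch is committed
```

Waiting for the writer to drain (here via `Queue.join`) is the analogue of letting a batch operation complete before issuing queries or statistics requests.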

Deployment

NOTE: The authoritative and sole source for version tags is the repository. When a commit is tagged, that tag is automatically used as the Python __version__, the docker image tag, and the version reported at the /info endpoint.

Testing

Run with pytest:

% pytest

or the Makefile target:

% make test

Use the environment variable ANYVAR_TEST_STORAGE_URI to specify the database to use for tests, e.g.:

% export ANYVAR_TEST_STORAGE_URI=postgresql://postgres:postgres@localhost/anyvar_test

Currently, there is some interdependency between test modules: tests that read data from storage assume the data from test_variation has already been uploaded. A pytest hook ensures correct test order, but some test modules may not pass when run in isolation. By default, the tests use a Postgres database installation. To run the tests against a Snowflake database, set ANYVAR_TEST_STORAGE_URI to a Snowflake URI and run the tests.