Added section for developers in manual (#267). (#271)
stolpeo committed Dec 13, 2021
1 parent 084c929 commit 0c83fa8
Showing 7 changed files with 363 additions and 0 deletions.
2 changes: 2 additions & 0 deletions HISTORY.rst
@@ -22,6 +22,7 @@ End-User Summary
- Fixing SODAR Core template inconsistency (#150).
- Imports via API now are only allowed for projects of type ``PROJECT`` (#237).
- Fixing ensembl gene link-out to wrong genome build (#156).
- Added section for developers in manual (#267).

Full Change List
================
@@ -43,6 +44,7 @@ Full Change List
- Fixing SODAR Core template inconsistency (#150).
- Imports via API now are only allowed for projects of type ``PROJECT`` (#237).
- Fixing ensembl gene link-out to wrong genome build (#156).
- Added section for developers in manual (#267).

-------
v0.23.9
90 changes: 90 additions & 0 deletions docs_manual/developer_database.rst
@@ -0,0 +1,90 @@
.. _developer_database:

===============
Database Import
===============

To prepare the VarFish database, follow `the instructions for the VarFish DB Downloader <https://github.com/bihealth/varfish-db-downloader>`_.
Downloading and processing the data can take multiple days.

The VarFish DB Downloader working folder consumes 1.7 TB for GRCh37 and 5.4 TB for GRCh38.
The pre-computed tables for VarFish consume 208 GB and the final
Postgres database consumes 500 GB. Please make sure that enough free
space is available. To save space, we recommend excluding the largest
components: the frequency tables, the extra annotations, and dbSNP. Also keep in mind that
importing the whole database takes more than 24 hours, depending on the speed of your HDD.

In the future, we plan to provide a pre-built package for import.

This is a list of the possible imports, sorted by size:

=================== ==== ================== ===================================
Component Size Exclude Function
=================== ==== ================== ===================================
gnomAD_genomes 80G highly recommended frequency annotation
extra_annos 57G highly recommended diverse
dbSNP 56G highly recommended SNP annotation
gnomAD_exomes 6.0G highly recommended frequency annotation
knowngeneaa 4.5G highly recommended multiz alignment of 100 vertebrates
clinvar 2.4G highly recommended pathogenicity classification
ExAC 1.9G highly recommended frequency annotation
dbVar 623M recommended SNP annotation
thousand_genomes 312M recommended frequency annotation
gnomAD_SV 218M recommended SV frequency annotation
DGV 88M yes, import broken SV annotation
ensembl_regulatory 68M yes, import broken frequency annotation
gnomAD_constraints 13M yes, import broken frequency annotation
ensembltorefseq 8.6M identifier mapping
hgmd_public 6.3M yes, import broken gene annotation
ExAC_constraints 4.8M yes, import broken frequency annotation
hgnc 3.3M yes, import broken gene annotation
ensembltogenesymbol 1.8M yes, import broken identifier mapping
ensembl_genes 1.3M gene annotation
HelixMTdb 1.1M yes, import broken MT frequency annotation
MITOMAP 1.1M yes, import broken MT frequency annotation
refseq_genes 1.1M gene annotation
mtDB 514K yes, import broken MT frequency annotation
tads_hesc 258K domain annotation
tads_imr90 258K domain annotation
=================== ==== ================== ===================================

You can find the ``import_versions.tsv`` file in the root folder of the
package. This file determines which components (called ``table_group`` and
represented as folders in the package) get imported when the import command is
issued. To exclude a component, simply comment out its line with ``#`` or delete it.

A space-consumption-friendly version of the file would look like this::

build table_group version
#GRCh37 clinvar 20210728
#GRCh37 dbSNP b155
#GRCh37 dbVar 20210728
#GRCh37 DGV 2016
#GRCh37 DGV 2020
GRCh37 ensembl_genes r104
#GRCh37 ensembl_regulatory 20210728
GRCh37 ensembltogenesymbol 20210728
#GRCh37 ensembltorefseq 20210728
#GRCh37 ExAC r1
#GRCh37 ExAC_constraints r0.3.1
#GRCh37 extra_annos 20210728
#GRCh37 gnomAD_constraints v2.1.1
#GRCh37 gnomAD_exomes r2.1.1
#GRCh37 gnomAD_genomes r2.1.1
#GRCh37 gnomAD_SV v2.1
#GRCh37 HelixMTdb 20200327
#GRCh37 hgmd_public ensembl_r104
#GRCh37 hgnc 20210728
#GRCh37 knowngeneaa 20210728
#GRCh37 MITOMAP 20210728
#GRCh37 mtDB 20210728
GRCh37 refseq_genes r105
GRCh37 tads_hesc dixon2012
GRCh37 tads_imr90 dixon2012
#GRCh37 thousand_genomes phase3
#GRCh37 vista 20210728

To perform the import, issue::

$ python manage.py import_tables --tables-path varfish-db-downloader
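Editing ``import_versions.tsv`` by hand works fine, but the commenting-out can also be scripted. The following is an illustrative Python sketch (not part of VarFish); the ``EXCLUDE`` set mirrors only the rows marked "highly recommended" or "recommended" for exclusion in the table above, and whitespace-separated columns are assumed:

```python
# Illustrative helper (not part of VarFish): comment out space-hungry
# table_groups in import_versions.tsv.  EXCLUDE mirrors the rows marked
# "highly recommended" / "recommended" for exclusion in the table above.
EXCLUDE = {
    "gnomAD_genomes", "extra_annos", "dbSNP", "gnomAD_exomes",
    "knowngeneaa", "clinvar", "ExAC", "dbVar", "thousand_genomes",
    "gnomAD_SV",
}

def trim_import_versions(lines):
    """Return the lines with excluded table_groups commented out."""
    out = []
    for line in lines:
        fields = line.split()  # header and data rows each have 3 columns
        if len(fields) == 3 and fields[1] in EXCLUDE and not line.startswith("#"):
            line = "#" + line
        out.append(line)
    return out
```

Note that this only handles the largest components; the space-friendly example above additionally comments out components whose import is broken.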

15 changes: 15 additions & 0 deletions docs_manual/developer_development.rst
@@ -0,0 +1,15 @@
.. _developer_development:

===========
Development
===========

VarFish is based on the SODAR Core framework, which has a `developer manual <https://sodar-core.readthedocs.io/en/latest/development.html>`_
of its own that is worth a look. The following parts are particularly useful:

- `Models <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#models>`_
- `Rules <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#rules-file>`_
- `Views <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#views>`_
- `Templates <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#templates>`_
- `Icons <https://sodar-core.readthedocs.io/en/latest/dev_general.html#using-icons>`_
- `Forms <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#forms>`_
113 changes: 113 additions & 0 deletions docs_manual/developer_installation.rst
@@ -0,0 +1,113 @@
.. _developer_installation:

============
Installation
============

The VarFish installation for developers should be set up differently from the
installation for production use.

The reason is that the production installation runs completely in
a Docker environment. All containers are attached to a Docker network that the
host by default has no access to, except for the reverse proxy that exposes
the VarFish web interface.

The developer installation is intended not to carry the full VarFish database,
so that it stays light-weight and fits on a laptop. We advise installing the
services directly on the host rather than in Docker containers.

----------------
Install Postgres
----------------

Follow the instructions for your operating system to install `Postgres <https://www.postgresql.org>`_.
For Ubuntu, this would be::

sudo apt install postgresql

-------------
Install Redis
-------------

`Redis <https://redis.io>`_ is the broker that Celery uses to manage its queues.
Follow the instructions for your operating system to install Redis.
For Ubuntu, this would be::

sudo apt install redis-server

-----------------
Install miniconda
-----------------

miniconda helps to set up encapsulated Python environments.
This step is optional. You can also use pipenv, but in our experience,
resolving the dependencies with pipenv is terribly slow::

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
$ source ~/miniconda3/bin/activate
$ conda init
$ conda create -n varfish python=3.8 pip
$ conda activate varfish

--------------------
Clone git repository
--------------------

Clone the VarFish Server repository and switch into the checkout::

$ git clone https://github.com/bihealth/varfish-server
$ cd varfish-server


---------------------------
Install Python Requirements
---------------------------

With the conda/Python environment activated, install all the requirements::

$ for i in requirements/*; do pip install -r $i; done

--------------
Setup Database
--------------

Use the tool provided in ``utility/`` to set up the database. The name for the
database should be ``varfish``::

$ bash utility/setup_database.sh
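If you prefer not to run the interactive script, its essence boils down to four ``psql`` statements (shown here with the suggested database and user name ``varfish``; choose your own password):

```shell
# Equivalent of utility/setup_database.sh for a fresh database/user "varfish";
# replace the password 'varfish' with one of your own choosing.
sudo su - postgres -c "psql -c \"CREATE DATABASE varfish;\""
sudo su - postgres -c "psql -c \"CREATE USER varfish WITH PASSWORD 'varfish';\""
sudo su - postgres -c "psql -c \"GRANT ALL PRIVILEGES ON DATABASE varfish TO varfish;\""
sudo su - postgres -c "psql -c \"ALTER USER varfish CREATEDB;\""
```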

-------------
Setup VarFish
-------------

First, create a ``.env`` file with the following content::

export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local

If you wish to enable structural variants, add the following line::

export VARFISH_ENABLE_SVS=1
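The ``DATABASE_URL`` value is a standard URL-style connection string. If the server later fails to connect, a quick stdlib-only check of the URL's parts can help spot typos (illustrative only, not VarFish code):

```python
from urllib.parse import urlparse

# Parse the connection string from the .env file into its components.
url = urlparse("postgres://varfish:varfish@127.0.0.1/varfish")
print(url.scheme)            # -> postgres
print(url.username)          # -> varfish    (database user)
print(url.password)          # -> varfish    (database password)
print(url.hostname)          # -> 127.0.0.1  (database host)
print(url.path.lstrip("/"))  # -> varfish    (database name)
```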

To create the tables in the VarFish database, run the ``migrate`` command.
This step can take a few minutes::

$ python manage.py migrate

Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named ``root`` (the
setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable)::

$ python manage.py createsuperuser

Last, download the icon sets for VarFish and make scripts, stylesheets and icons available::

$ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
$ python manage.py collectstatic

When done, open two terminals and start the VarFish server and the celery server::

terminal1$ make server
terminal2$ make celery
39 changes: 39 additions & 0 deletions docs_manual/developer_kiosk.rst
@@ -0,0 +1,39 @@
.. _developer_kiosk:

=====
Kiosk
=====

The Kiosk mode in VarFish enables users to upload VCF files.
It is not intended for production use, as every upload creates its own project, so there is no way of
organizing your cases properly. The mode only serves as a way for external users to try out VarFish.

-------------
Configuration
-------------

First, you need to download the VarFish annotator data (11 GB) and the transcript data, then unpack both::

$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-20191129.tar.gz
$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-transcripts-20191129.tar.gz
$ tar xzvf varfish-annotator-20191129.tar.gz
$ tar xzvf varfish-annotator-transcripts-20191129.tar.gz

If you want to enable Kiosk mode, add the following lines to the ``.env`` file::

export VARFISH_KIOSK_MODE=1
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_ENSEMBL_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_ensembl.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFERENCE_PATH=/path/to/unpacked/varfish-annotator-20191129/hs37d5.fa
export VARFISH_KIOSK_VARFISH_ANNOTATOR_DB_PATH=/path/to/unpacked/varfish-annotator-20191129/varfish-annotator-db-20191129.h2.db
export VARFISH_KIOSK_CONDA_PATH=/path/to/miniconda/bin/activate

---
Run
---

To run Kiosk mode, simply (re)start the web server and the celery server::

terminal1$ make serve
terminal2$ make celery

16 changes: 16 additions & 0 deletions docs_manual/index.rst
@@ -117,6 +117,22 @@ Currently, the main focus is on small/sequence variants called from high-througp
sop_supporting
sop_filtration

.. raw:: latex

\part{Developer's Manual}

.. toctree::
:maxdepth: 1
:caption: Developer's Manual
:name: developers_manual
:hidden:
:titlesonly:

developer_installation
developer_database
developer_development
developer_kiosk

.. raw:: latex

\part{Notes}
88 changes: 88 additions & 0 deletions utility/setup_database.sh
@@ -0,0 +1,88 @@
#!/usr/bin/env bash

echo "***********************************************"
echo "Setting up Database and User for PostgreSQL"
echo "***********************************************"

# while loops to ensure input not empty

# -------------
# Database Name
# -------------
while [[ -z "$db_name" ]]
do
echo -n "Database Name: "
read db_name
done

# ----------------
# User Name Choice
# ----------------

# Choice to create a new user or use an existing one
while [ "$use_existing_user" != "y" ] && [ "$use_existing_user" != "n" ]
do
echo -n "Do you want to use an existing db-user? [y|n] "
read use_existing_user
done

# Existing user
if [ "$use_existing_user" == "y" ]
then
# Get and parse existing users

# All users in the psql database
users="$( sudo su - postgres -c "psql -c \"SELECT u.usename FROM pg_catalog.pg_user u;\"")"
# Regex to remove table header and row count
regex="usename\ -+ \K(.*)(?=\ \(\d*\ rows?\))"
# Use grep to execute regex
users=$(echo $users | grep -Po "$regex")
# Create user array by
# Splitting users string at spaces
IFS=' '
read -ra user_array <<< "$users"

# Choose existing user
echo ""
choice=-1
# Choice has to be in range and an integer
while (( $choice < 0 )) || (( $choice > ${#user_array[@]} - 1 ))
do
echo "Choose an existing user"
idx=0
for user in "${user_array[@]}"; do # access each element of array
echo "[$idx] $user"
idx=$(( idx + 1 ))
done
echo -n "> "
read choice
done
username=${user_array[$choice]}

# New user
else
while [[ -z "$username" ]]
do
echo -n "New username: "
read username
done

# ----------------
# Password Choice
# ----------------
while [[ -z "$password" ]]
do
echo -n "Password: "
read -s password
done
echo ""
fi


sudo su - postgres -c "psql -c \"CREATE DATABASE $db_name;\""
if [ "$use_existing_user" == "n" ]
then
sudo su - postgres -c "psql -c \"CREATE USER $username WITH PASSWORD '$password';\""
fi
sudo su - postgres -c "psql -c \"GRANT ALL PRIVILEGES ON DATABASE $db_name to $username;\""
sudo su - postgres -c "psql -c \"ALTER USER $username CREATEDB;\""
