Added section for developers in manual (#267). (#271)
stolpeo committed Dec 13, 2021
1 parent 084c929 commit 0c83fa8
Showing 7 changed files with 363 additions and 0 deletions.
2 changes: 2 additions & 0 deletions HISTORY.rst
@@ -22,6 +22,7 @@ End-User Summary
- Fixing SODAR Core template inconsistency (#150).
- Imports via API now are only allowed for projects of type ``PROJECT`` (#237).
- Fixing ensembl gene link-out to wrong genome build (#156).
- Added section for developers in manual (#267).

Full Change List
================
@@ -43,6 +44,7 @@ Full Change List
- Fixing SODAR Core template inconsistency (#150).
- Imports via API now are only allowed for projects of type ``PROJECT`` (#237).
- Fixing ensembl gene link-out to wrong genome build (#156).
- Added section for developers in manual (#267).

-------
v0.23.9
90 changes: 90 additions & 0 deletions docs_manual/developer_database.rst
@@ -0,0 +1,90 @@
.. _developer_database:

===============
Database Import
===============

To prepare the VarFish database, follow `the instructions for the VarFish DB Downloader <https://github.com/bihealth/varfish-db-downloader>`_.
Downloading and processing the data can take multiple days.

The VarFish DB Downloader working folder consumes 1.7 TB for GRCh37 and 5.4 TB for GRCh38.
The pre-computed tables for VarFish consume 208 GB and the final
Postgres database consumes 500 GB. Please make sure that enough free
space is available. To save space, we recommend excluding the largest
components: the frequency tables, the extra annotations, and dbSNP. Also keep in mind that
importing the whole database takes more than 24 hours, depending on the speed of your HDD.

In the future, we plan to provide a pre-built package for import.

This is a list of the possible imports, sorted by size:

=================== ==== ================== ===================================
Component Size Exclude Function
=================== ==== ================== ===================================
gnomAD_genomes 80G highly recommended frequency annotation
extra_annos 57G highly recommended diverse
dbSNP 56G highly recommended SNP annotation
gnomAD_exomes 6.0G highly recommended frequency annotation
knowngeneaa 4.5G highly recommended multiz alignment of 100 vertebrates
clinvar 2.4G highly recommended pathogenicity classification
ExAC 1.9G highly recommended frequency annotation
dbVar 623M recommended SNP annotation
thousand_genomes 312M recommended frequency annotation
gnomAD_SV 218M recommended SV frequency annotation
DGV 88M yes, import broken SV annotation
ensembl_regulatory 68M yes, import broken frequency annotation
gnomAD_constraints 13M yes, import broken frequency annotation
ensembltorefseq 8.6M identifier mapping
hgmd_public 6.3M yes, import broken gene annotation
ExAC_constraints 4.8M yes, import broken frequency annotation
hgnc 3.3M yes, import broken gene annotation
ensembltogenesymbol 1.8M yes, import broken identifier mapping
ensembl_genes 1.3M gene annotation
HelixMTdb 1.1M yes, import broken MT frequency annotation
MITOMAP 1.1M yes, import broken MT frequency annotation
refseq_genes 1.1M gene annotation
mtDB 514K yes, import broken MT frequency annotation
tads_hesc 258K domain annotation
tads_imr90 258K domain annotation
=================== ==== ================== ===================================

You can find the ``import_versions.tsv`` file in the root folder of the
package. This file determines which components (called ``table_group`` and
represented as folders in the package) get imported when the import command is
issued. To exclude a component, simply comment out its line with ``#`` or delete it.

A space-consumption-friendly version of the file would look like this::

build table_group version
#GRCh37 clinvar 20210728
#GRCh37 dbSNP b155
#GRCh37 dbVar 20210728
#GRCh37 DGV 2016
#GRCh37 DGV 2020
GRCh37 ensembl_genes r104
#GRCh37 ensembl_regulatory 20210728
GRCh37 ensembltogenesymbol 20210728
#GRCh37 ensembltorefseq 20210728
#GRCh37 ExAC r1
#GRCh37 ExAC_constraints r0.3.1
#GRCh37 extra_annos 20210728
#GRCh37 gnomAD_constraints v2.1.1
#GRCh37 gnomAD_exomes r2.1.1
#GRCh37 gnomAD_genomes r2.1.1
#GRCh37 gnomAD_SV v2.1
#GRCh37 HelixMTdb 20200327
#GRCh37 hgmd_public ensembl_r104
#GRCh37 hgnc 20210728
#GRCh37 knowngeneaa 20210728
#GRCh37 MITOMAP 20210728
#GRCh37 mtDB 20210728
GRCh37 refseq_genes r105
GRCh37 tads_hesc dixon2012
GRCh37 tads_imr90 dixon2012
#GRCh37 thousand_genomes phase3
#GRCh37 vista 20210728

To perform the import, issue::

$ python manage.py import_tables --tables-path varfish-db-downloader
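Editing ``import_versions.tsv`` by hand works fine, but the commenting-out can also be scripted. The following is an illustrative Python sketch (not part of VarFish); the ``EXCLUDE`` set mirrors only the rows marked "highly recommended" or "recommended" for exclusion in the table above, and whitespace-separated columns are assumed:

```python
# Illustrative helper (not part of VarFish): comment out space-hungry
# table_groups in import_versions.tsv.  EXCLUDE mirrors the rows marked
# "highly recommended" / "recommended" for exclusion in the table above.
EXCLUDE = {
    "gnomAD_genomes", "extra_annos", "dbSNP", "gnomAD_exomes",
    "knowngeneaa", "clinvar", "ExAC", "dbVar", "thousand_genomes",
    "gnomAD_SV",
}

def trim_import_versions(lines):
    """Return the lines with excluded table_groups commented out."""
    out = []
    for line in lines:
        fields = line.split()  # header and data rows each have 3 columns
        if len(fields) == 3 and fields[1] in EXCLUDE and not line.startswith("#"):
            line = "#" + line
        out.append(line)
    return out
```

Note that this only handles the largest components; the space-friendly example above additionally comments out components whose import is broken.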

15 changes: 15 additions & 0 deletions docs_manual/developer_development.rst
@@ -0,0 +1,15 @@
.. _developer_development:

===========
Development
===========

VarFish is based on the SODAR Core framework, which has a `developer manual <https://sodar-core.readthedocs.io/en/latest/development.html>`_
of its own that is worth a look. The following parts are particularly useful:

- `Models <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#models>`_
- `Rules <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#rules-file>`_
- `Views <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#views>`_
- `Templates <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#templates>`_
- `Icons <https://sodar-core.readthedocs.io/en/latest/dev_general.html#using-icons>`_
- `Forms <https://sodar-core.readthedocs.io/en/latest/dev_project_app.html#forms>`_
113 changes: 113 additions & 0 deletions docs_manual/developer_installation.rst
@@ -0,0 +1,113 @@
.. _developer_installation:

============
Installation
============

The VarFish installation for developers should be set up differently from the
installation for production use.

The reason is that the production installation runs completely in
a Docker environment. All containers are attached to a Docker network that the
host by default has no access to, except for the reverse proxy that exposes
the VarFish web interface.

The developer installation is intended not to carry the full VarFish database,
so that it stays light-weight and fits on a laptop. We advise installing the
services directly on the host rather than in Docker containers.

----------------
Install Postgres
----------------

Follow the instructions for your operating system to install `Postgres <https://www.postgresql.org>`_.
For Ubuntu, this would be::

sudo apt install postgresql

-------------
Install Redis
-------------

`Redis <https://redis.io>`_ is the broker that Celery uses to manage its queues.
Follow the instructions for your operating system to install Redis.
For Ubuntu, this would be::

sudo apt install redis-server

-----------------
Install miniconda
-----------------

miniconda helps to set up encapsulated Python environments.
This step is optional. You can also use pipenv, but in our experience,
resolving the dependencies with pipenv is terribly slow::

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
$ source ~/miniconda3/bin/activate
$ conda init
$ conda create -n varfish python=3.8 pip
$ conda activate varfish

--------------------
Clone git repository
--------------------

Clone the VarFish Server repository and switch into the checkout::

$ git clone https://github.com/bihealth/varfish-server
$ cd varfish-server


---------------------------
Install Python Requirements
---------------------------

With the conda/Python environment activated, install all the requirements::

$ for i in requirements/*; do pip install -r $i; done

--------------
Setup Database
--------------

Use the tool provided in ``utility/`` to set up the database. The name for the
database should be ``varfish``::

$ bash utility/setup_database.sh
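If you prefer not to run the interactive script, its essence boils down to four ``psql`` statements (shown here with the suggested database and user name ``varfish``; choose your own password):

```shell
# Equivalent of utility/setup_database.sh for a fresh database/user "varfish";
# replace the password 'varfish' with one of your own choosing.
sudo su - postgres -c "psql -c \"CREATE DATABASE varfish;\""
sudo su - postgres -c "psql -c \"CREATE USER varfish WITH PASSWORD 'varfish';\""
sudo su - postgres -c "psql -c \"GRANT ALL PRIVILEGES ON DATABASE varfish TO varfish;\""
sudo su - postgres -c "psql -c \"ALTER USER varfish CREATEDB;\""
```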

-------------
Setup VarFish
-------------

First, create a ``.env`` file with the following content::

export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local

If you wish to enable structural variants, add the following line::

export VARFISH_ENABLE_SVS=1
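The ``DATABASE_URL`` value is a standard URL-style connection string. If the server later fails to connect, a quick stdlib-only check of the URL's parts can help spot typos (illustrative only, not VarFish code):

```python
from urllib.parse import urlparse

# Parse the connection string from the .env file into its components.
url = urlparse("postgres://varfish:varfish@127.0.0.1/varfish")
print(url.scheme)            # -> postgres
print(url.username)          # -> varfish    (database user)
print(url.password)          # -> varfish    (database password)
print(url.hostname)          # -> 127.0.0.1  (database host)
print(url.path.lstrip("/"))  # -> varfish    (database name)
```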

To create the tables in the VarFish database, run the ``migrate`` command.
This step can take a few minutes::

$ python manage.py migrate

Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named ``root`` (the
setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable)::

$ python manage.py createsuperuser

Last, download the icon sets for VarFish and make scripts, stylesheets and icons available::

$ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
$ python manage.py collectstatic

When done, open two terminals and start the VarFish server and the celery server::

terminal1$ make server
terminal2$ make celery
39 changes: 39 additions & 0 deletions docs_manual/developer_kiosk.rst
@@ -0,0 +1,39 @@
.. _developer_kiosk:

=====
Kiosk
=====

The Kiosk mode in VarFish enables users to upload VCF files.
It is not intended for production use, as every upload creates its own project, so there is no way of
organizing your cases properly. The mode only serves as a way for external users to try out VarFish.

-------------
Configuration
-------------

First, you need to download the VarFish annotator data (11 GB) and the transcript data, then unpack both::

$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-20191129.tar.gz
$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-transcripts-20191129.tar.gz
$ tar xzvf varfish-annotator-20191129.tar.gz
$ tar xzvf varfish-annotator-transcripts-20191129.tar.gz

If you want to enable Kiosk mode, add the following lines to the ``.env`` file::

export VARFISH_KIOSK_MODE=1
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_ENSEMBL_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_ensembl.ser
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFERENCE_PATH=/path/to/unpacked/varfish-annotator-20191129/hs37d5.fa
export VARFISH_KIOSK_VARFISH_ANNOTATOR_DB_PATH=/path/to/unpacked/varfish-annotator-20191129/varfish-annotator-db-20191129.h2.db
export VARFISH_KIOSK_CONDA_PATH=/path/to/miniconda/bin/activate

---
Run
---

To run Kiosk mode, simply (re)start the web server and the celery server::

terminal1$ make serve
terminal2$ make celery

16 changes: 16 additions & 0 deletions docs_manual/index.rst
@@ -117,6 +117,22 @@ Currently, the main focus is on small/sequence variants called from high-througp
sop_supporting
sop_filtration

.. raw:: latex

\part{Developer's Manual}

.. toctree::
:maxdepth: 1
:caption: Developer's Manual
:name: developers_manual
:hidden:
:titlesonly:

developer_installation
developer_database
developer_development
developer_kiosk

.. raw:: latex

\part{Notes}
88 changes: 88 additions & 0 deletions utility/setup_database.sh
@@ -0,0 +1,88 @@
#!/usr/bin/env bash

echo "***********************************************"
echo "Setting up Database and User for PostgreSQL"
echo "***********************************************"

# while loops to ensure input not empty

# -------------
# Database Name
# -------------
while [[ -z "$db_name" ]]
do
echo -n "Database Name: "
read db_name
done

# ----------------
# User Name Choice
# ----------------

# Choice to create a new user or use an existing one
while [ "$use_existing_user" != "y" ] && [ "$use_existing_user" != "n" ]
do
echo -n "Do you want to use an existing db-user? [y|n] "
read use_existing_user
done

# Existing user
if [ "$use_existing_user" == "y" ]
then
# Get and parse existing users

# All users in the psql database
users="$( sudo su - postgres -c "psql -c \"SELECT u.usename FROM pg_catalog.pg_user u;\"")"
# Regex to remove table header and row count
regex="usename\ -+ \K(.*)(?=\ \(\d*\ rows?\))"
# Use grep to execute regex
users=$(echo $users | grep -Po "$regex")
# Create user array by
# Splitting users string at spaces
IFS=' '
read -ra user_array <<< "$users"

# Choose existing user
echo ""
choice=-1
# Choice has to be in range and an integer
while (( $choice < 0 )) || (( $choice > ${#user_array[@]} - 1 ))
do
echo "Choose an existing user"
idx=0
for user in "${user_array[@]}"; do # access each element of array
echo "[$idx] $user"
idx=$(( idx + 1 ))
done
echo -n "> "
read choice
done
username=${user_array[$choice]}

# New user
else
while [[ -z "$username" ]]
do
echo -n "New username: "
read username
done

# ----------------
# Password Choice
# ----------------
while [[ -z "$password" ]]
do
echo -n "Password: "
read -s password
done
echo ""
fi


sudo su - postgres -c "psql -c \"CREATE DATABASE $db_name;\""
if [ "$use_existing_user" == "n" ]
then
sudo su - postgres -c "psql -c \"CREATE USER $username WITH PASSWORD '$password';\""
fi
sudo su - postgres -c "psql -c \"GRANT ALL PRIVILEGES ON DATABASE $db_name to $username;\""
sudo su - postgres -c "psql -c \"ALTER USER $username CREATEDB;\""
