Instructions for SV variant annotation.
holtgrewe committed Jan 12, 2022
1 parent 157654f commit 1d2a291
Showing 1 changed file with 73 additions and 2 deletions.
75 changes: 73 additions & 2 deletions docs_manual/admin_ingest.rst
@@ -68,6 +68,9 @@ First, obtain some tests data for annotation and later import into VarFish Serve
$ sha256sum --check varfish-test-data-v0.22.2-20210212.tar.gz.sha256
$ tar -xf varfish-test-data-v0.22.2-20210212.tar.gz
Annotating Small Variant VCFs
-----------------------------

Next, you can use the ``varfish-annotator`` command:

.. code-block:: bash
@@ -83,7 +86,7 @@ Next, you can use the ``varfish-annotator`` command:
--ref-path varfish-annotator-20201006/hs37d5.fa \
--input-vcf "INPUT.vcf.gz" \
--release "GRCh37" \
--output-db-info "FAM_name.db-info.tsv" \
--output-db-info "FAM_name.db-infos.tsv" \
--output-gts "FAM_name.gts.tsv" \
--case-id "FAM_name"
@@ -181,6 +184,72 @@ For example, if you have genotypes for two siblings but none for the parents:
FAM_index father 0 0 1 1
FAM_index mother 0 0 2 1
Annotating Structural Variant VCFs
----------------------------------

Structural variants can be annotated as follows.


.. code-block:: bash
    :linenos:

    $ varfish-annotator \
        annotate-svs \
        -XX:MaxHeapSize=10g \
        -XX:+UseConcMarkSweepGC \
        \
        --default-sv-method=YOURCALLERvVERSION \
        --release GRCh37 \
        \
        --db-path varfish-annotator-20201006/varfish-annotator-db-20191129.h2.db \
        --ensembl-ser-path varfish-annotator-20201006/hg19_ensembl.ser \
        --refseq-ser-path varfish-annotator-20201006/hg19_refseq_curated.ser \
        \
        --input-vcf FAM_sv_calls.vcf.gz \
        --output-db-info FAM_sv_calls.db-info.tsv \
        --output-gts FAM_sv_calls.gts.tsv \
        --output-feature-effects FAM_sv_calls.feature-effects.tsv

.. note::

    ``varfish-annotator annotate-svs`` writes the ``INFO/SVMETHOD`` value of each record to the output file.
    If this value is empty, the value from ``--default-sv-method`` is used instead.
    You **must** provide either ``INFO/SVMETHOD`` or ``--default-sv-method``.
    Otherwise, you will get errors in the import step (visible in the case import background task view).

    You can use the following shell snippet to add ``INFO/SVMETHOD`` to your VCF file.
    Replace ``YOURCALLERvVERSION`` with the value that you want to provide to VarFish.

    .. code-block:: shell

        # Write the header line that declares the new INFO field.
        cat >"$TMPDIR/header.txt" <<"EOF"
        ##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
        EOF

        # Add the header line, then append SVMETHOD to the INFO column (column 8)
        # of every record; an empty INFO column (".") is replaced instead.
        bcftools annotate \
            --header-lines "$TMPDIR/header.txt" \
            INPUT.vcf.gz \
        | awk -F $'\t' '
            BEGIN { OFS = FS; }
            /^#/ { print $0; }
            /^[^#]/ { $8 = ($8 == "." ? "" : $8 ";") "SVMETHOD=YOURCALLERvVERSION"; print $0; }
        ' \
        | bgzip -c \
        > OUTPUT.vcf.gz
        tabix -f OUTPUT.vcf.gz

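Before running the annotation, it can help to confirm that every record actually carries ``INFO/SVMETHOD``. The following is a minimal sketch, not part of ``varfish-annotator``: the function name ``count_missing_svmethod`` is hypothetical, and the check assumes a gzip- or bgzip-compressed VCF with the INFO field in column 8.

```shell
# Hypothetical helper (not part of varfish-annotator): count the records in a
# gzip/bgzip-compressed VCF whose INFO column (column 8) has no SVMETHOD entry.
count_missing_svmethod() {
    gzip -dc "$1" | awk -F '\t' '
        /^#/ { next }                          # skip header lines
        $8 !~ /(^|;)SVMETHOD=/ { missing++ }   # INFO lacks an SVMETHOD key
        END { print missing + 0 }              # print 0 when nothing is missing
    '
}
```

Calling ``count_missing_svmethod OUTPUT.vcf.gz`` prints the number of records that still lack the field. If the count is not zero, pass ``--default-sv-method`` or rerun the snippet above.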
Again, you have to compress the output TSV files with ``gzip`` and compute MD5 sums.

.. code-block:: bash

    $ gzip -c FAM_sv_calls.db-info.tsv >FAM_sv_calls.db-info.tsv.gz
    $ md5sum FAM_sv_calls.db-info.tsv.gz >FAM_sv_calls.db-info.tsv.gz.md5
    $ gzip -c FAM_sv_calls.gts.tsv >FAM_sv_calls.gts.tsv.gz
    $ md5sum FAM_sv_calls.gts.tsv.gz >FAM_sv_calls.gts.tsv.gz.md5
    $ gzip -c FAM_sv_calls.feature-effects.tsv >FAM_sv_calls.feature-effects.tsv.gz
    $ md5sum FAM_sv_calls.feature-effects.tsv.gz >FAM_sv_calls.feature-effects.tsv.gz.md5

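The compress-and-checksum steps follow the same pattern for every file, so they can be wrapped in a loop. The sketch below is one possible way to do this. The function name ``compress_and_checksum`` is hypothetical, and GNU ``md5sum`` is assumed to be available.

```shell
# Hypothetical helper: gzip each given TSV file and write an MD5 sum file
# next to the compressed output. Stops at the first failing step.
compress_and_checksum() {
    for tsv in "$@"; do
        gzip -c "$tsv" >"$tsv.gz" || return 1
        md5sum "$tsv.gz" >"$tsv.gz.md5" || return 1
    done
}
```

For the SV output files this would be called as ``compress_and_checksum FAM_sv_calls.db-info.tsv FAM_sv_calls.gts.tsv FAM_sv_calls.feature-effects.tsv``. Because ``md5sum`` records the path as given, run it from the directory that contains the files so that ``md5sum --check`` works later from the same place.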
--------------
Variant Import
--------------
@@ -214,11 +283,13 @@ When executing the import as shown above, you have to specify:
- a pedigree file with suffix ``.ped``,
- a genotype annotation file as generated by ``varfish-annotator`` ending in ``.gts.tsv.gz``,
- a database info file as generated by ``varfish-annotator`` ending in ``.db-infos.tsv.gz``.
- a database info file as generated by ``varfish-annotator`` ending in ``.db-info.tsv.gz``.
Optionally, you can also specify a TSV file with BAM quality control metrics ending in ``.bam-qc.tsv.gz``.
The format is not documented yet, but documentation and supporting tools are forthcoming.
If you want to import structural variants for your case, submit the output files from the SV annotation step together with the ``.feature-effects.tsv.gz`` and ``.gts.tsv.gz`` files from the small variant annotation step.
Running the import command through VarFish CLI will create a background import job as shown below.
Once the job is done, the created or updated case will appear in the case list.
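Before starting the import, a quick check that the required files are present can save a failed background job. The following is a rough sketch only: the function name ``check_import_files`` is hypothetical, and the suffix list (in particular ``.db-info.tsv.gz``) is an assumption that you should adjust to the file names your annotation step actually produced.

```shell
# Hypothetical pre-import check: succeed only if DIR holds at least one file
# for every required suffix. The suffix list is an assumption; edit as needed.
check_import_files() {
    dir=$1
    status=0
    for suffix in .ped .gts.tsv.gz .db-info.tsv.gz; do
        # An unmatched glob stays literal, so ls fails when no file matches.
        if ! ls "$dir"/*"$suffix" >/dev/null 2>&1; then
            echo "missing: *$suffix" >&2
            status=1
        fi
    done
    return "$status"
}
```

``check_import_files path/to/case`` returns a non-zero exit status and lists the missing suffixes on standard error if any required file is absent.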