Documenting new Clinvar aggregation (#640).
Related-Issue: #640
Closes: #640
Projected-Results-Impact: require-revalidation
holtgrewe committed Sep 9, 2022
1 parent db01542 commit db8f83f
Showing 4 changed files with 91 additions and 141 deletions.
10 changes: 8 additions & 2 deletions README.rst
@@ -68,7 +68,9 @@ The following combinations have been validated / are supported to work.
============== =========== =================
VarFish Server VarFish CLI VarFish Annotator
============== =========== =================
v1.*.*         v0.3.0      v0.21
v1.2.2         v0.3.0      v0.21
v1.2.1         v0.3.0      v0.21
v1.2.0         v0.3.0      v0.21
============== =========== =================

----------------------------------------
@@ -80,5 +82,9 @@ The following combinations have been validated / are supported to work.
============== ============ =====================
VarFish Server Data Release VarFish DB Downloader
============== ============ =====================
v1.*.*         20210728     v0.3.*
v1.2.2         20210728c    v0.3.*
v1.2.1         20210728     v0.3.*
v1.2.1         20210728b    v0.3.*
v1.2.0         20210728     v0.3.*
v1.2.0         20210728b    v0.3.*
============== ============ =====================
13 changes: 13 additions & 0 deletions docs_manual/admin_install.rst
@@ -161,6 +161,19 @@ You will observe that the database came with some demo data sets of public IGSR
:align: center
:width: 80%

--------------------------
Applying Necessary Updates
--------------------------

Depending on the VarFish version that you installed, you might need to apply some data patches as described in :ref:`admin_upgrade`.
Notably:

- If you are using the original data version ``20210728``, then you need to apply the patch to ``20210728b`` as described in :ref:`admin_upgrade_data_release_20210728`.
- When installing ``v1.2.2``, you need to apply the patch to data version ``20210728c``.
  This is done by first upgrading to ``20210728b`` (if you are not there already) and then upgrading to ``20210728c`` as described in :ref:`admin_update_1_2_1_to_1_2_2`.
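
The rule in the list above amounts to walking a linear chain of data releases (``20210728`` → ``20210728b`` → ``20210728c``).
As a minimal sketch, assuming no further releases beyond those named here (``DATA_PATCH_CHAIN`` and ``patches_to_apply`` are hypothetical names, not part of VarFish), the patches to apply can be derived like this:

```python
from __future__ import annotations

# Hypothetical helper, not part of VarFish: the data releases named above,
# in the order in which the patches have to be applied.
DATA_PATCH_CHAIN = ["20210728", "20210728b", "20210728c"]


def patches_to_apply(current: str, target: str) -> list[str]:
    """Return the data versions to patch to, in order, to go from current to target."""
    start = DATA_PATCH_CHAIN.index(current)  # raises ValueError for unknown versions
    end = DATA_PATCH_CHAIN.index(target)
    if end < start:
        raise ValueError(f"cannot downgrade data from {current} to {target}")
    return DATA_PATCH_CHAIN[start + 1 : end + 1]


print(patches_to_apply("20210728", "20210728c"))  # ['20210728b', '20210728c']
```

Starting from ``20210728b`` only the ``20210728c`` patch remains, which matches the second bullet.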

In case of any issues, contact us in the `GitHub Discussions <https://github.com/bihealth/varfish-server/discussions>`__ or directly by email.

---------------------
Updating the Database
---------------------
58 changes: 58 additions & 0 deletions docs_manual/admin_upgrade.rst
@@ -6,6 +6,8 @@ Upgrade Varfish Installation

This section contains upgrade instructions for upgrading your VarFish Server installation using `VarFish Docker Compose <https://github.com/bihealth/varfish-docker-compose>`__.

.. _admin_upgrade_data_release_20210728:

-------------------------------------------------
Problem with Data Release ``20210728`` and GRCh37
-------------------------------------------------
@@ -30,6 +32,9 @@ You can find out more details, give feedback, and ask for help `in this Github d
v1.2.* to v2.*.*
----------------

**ClinVar Changes.**
Please follow the instructions described in :ref:`admin_update_1_2_1_to_1_2_2` if you are starting from v1.2.1.

**In-House Background Database.**
A number of changes were made to the implementation of the background database.
The upgrade will re-create the in-house database as empty.
@@ -70,6 +75,59 @@ You can fill the fields with reasonable values (that will work well for all case

This is not strictly necessary and it is recommended to re-annotate and re-import.

.. _admin_update_1_2_1_to_1_2_2:

----------------
v1.2.1 to v1.2.2
----------------

**ClinVar Updates.**
First, make sure that you have upgraded the data to ``20210728b`` following :ref:`admin_upgrade_data_release_20210728`.
Then, upgrade by just updating your ``varfish-docker-compose`` repository clone and calling ``docker-compose down && docker-compose up -d``.

Next, patch to data version ``20210728c`` using the following instructions.

We have made a larger change to the ClinVar database.
You will have to re-import the ClinVar database after the upgrade, as follows.

Download the appropriate data patch from our file server:

GRCh37
    `varfish-server-background-db-20210728c-grch37.tar.gz <https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-server-background-db-20210728c-grch37.tar.gz>`__

GRCh38
    `varfish-server-background-db-20210728c-grch38.tar.gz <https://file-public.cubi.bihealth.org/transient/varfish/anthenea/varfish-server-background-db-20210728c-grch38.tar.gz>`__

Extract the downloaded archive to a folder on your VarFish server, e.g., ``/data/varfish-data/varfish-server-background-db-20210728c-grch37``, such that this folder contains a file ``import_versions.tsv``.
Next, edit the ``docker-compose.yml`` file of your ``varfish-docker-compose`` clone such that the ``varfish-web`` entry's ``volumes`` field reads as follows.

::

    volumes:
      - "/data:/data:ro"

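Because this mounts the host's ``/data`` directory at the same path inside the container, the ``--tables-path`` argument in the commands below must point to the folder that you extracted the archive to.
Then, restart VarFish by calling ``docker-compose down && docker-compose up -d``.
After startup, run the following command if you use GRCh37: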

::

    docker exec -it varfish-docker-compose_varfish-web_1 python manage.py \
        import_tables --force --truncate --tables-path /data/varfish-server-background-db-20210728c-grch37 \
        --threads=0

If you use GRCh38, run the following command instead:

::

    docker exec -it varfish-docker-compose_varfish-web_1 python manage.py \
        import_tables --force --truncate --tables-path /data/varfish-server-background-db-20210728c-grch38 \
        --threads=0
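
The two commands differ only in the genome build suffix of the tables path.
If you script this step, a minimal sketch could build the argument list once; ``import_tables_argv`` is a hypothetical helper, and the container name and ``/data`` layout are simply the values used above, which may differ on your host:

```python
from __future__ import annotations

import shlex

# Values taken from the commands above; adjust them if your container name or
# data directory differ. import_tables_argv() itself is a hypothetical helper.
CONTAINER = "varfish-docker-compose_varfish-web_1"
RELEASE = "20210728c"


def import_tables_argv(build: str, data_root: str = "/data") -> list[str]:
    """Build the docker exec argument list that re-imports the patched tables."""
    build = build.lower()  # accept "GRCh37" as well as "grch37"
    if build not in ("grch37", "grch38"):
        raise ValueError(f"unsupported genome build: {build}")
    tables_path = f"{data_root}/varfish-server-background-db-{RELEASE}-{build}"
    return [
        "docker", "exec", "-it", CONTAINER,
        "python", "manage.py", "import_tables",
        "--force", "--truncate",
        "--tables-path", tables_path,
        "--threads=0",
    ]


# Prints the GRCh38 command shown above as a single line.
print(shlex.join(import_tables_argv("GRCh38")))
```

The list can be passed to ``subprocess.run(argv, check=True)``; drop ``-it`` when running it from a script without an attached terminal.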

This will import the ClinVar version from the 20210728 release in the fixed format compatible with ``v1.2.2``.
Note that this will also import a patch to the TAD data in

In case of any issues, contact us in the `GitHub Discussions <https://github.com/bihealth/varfish-server/discussions>`__ or directly by email.

------------------
v0.23.0 to v1.2.1
------------------
151 changes: 12 additions & 139 deletions docs_manual/notes_clinvar.rst
@@ -5,146 +5,19 @@ ClinVar Notes
=============

This section contains notes regarding ClinVar and its integration into VarFish.
It outlines issues with the interpretation of variants, how VarFish resolves them, and the rationale for the decisions taken.
Starting with VarFish ``v1.2.2`` and data version ``20210728c``, VarFish provides two different annotation modes.
These differ in how submissions with and without assessment criteria are weighted during aggregation.

Normal ClinVar Mode (default)
    Records are merged as in ClinVar.
    That is, "practice guideline" is preferred over all others.
    Next in precedence is "reviewed by expert panel".
    In all other cases, if there are submissions with assessment criteria, then only these are interpreted.
"Paranoid" Mode
    In this mode, submissions with and without assessment criteria are considered to be on the same level.

ClinVar entries have two major labels:

variant assertion
    The assertion about the pathogenicity of a variant, e.g., *likely benign* or *pathogenic*.
review status
    A grading of how well a variant is reviewed.
    This is shown as a star rating on the ClinVar website.

Some reference ClinVar records (RCV identifiers) refer to a single submission (SCV identifiers).
Multiple reference ClinVar records are summarised in variant ClinVar records (VCV identifiers).
When using "Normal ClinVar Mode", you should get the same aggregated summary as in ClinVar VCV records.
The only expected source of differences is that the ClinVar data in your local VarFish installation is older than the data on the ClinVar website.
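
The two modes differ in which submissions enter the aggregation.
A minimal sketch of that selection step, assuming hypothetical ``Submission`` records with simplified status labels (VarFish's actual data model is not shown here) and assuming that both modes keep the precedence of *practice guideline* and *reviewed by expert panel*:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical labels and types for illustration; VarFish's data model may differ.
PRACTICE_GUIDELINE = "practice guideline"
EXPERT_PANEL = "reviewed by expert panel"
CRITERIA_PROVIDED = "criteria provided"
NO_CRITERIA = "no assertion criteria provided"


@dataclass(frozen=True)
class Submission:
    review_status: str
    significance: str


def select_submissions(subs: list[Submission], mode: str = "normal") -> list[Submission]:
    """Return the submissions that take part in aggregation for the given mode."""
    # Assumption: both modes keep the precedence of the two top tiers.
    for top in (PRACTICE_GUIDELINE, EXPERT_PANEL):
        top_subs = [s for s in subs if s.review_status == top]
        if top_subs:
            return top_subs
    if mode == "normal":
        # Submissions with assessment criteria shadow those without.
        with_criteria = [s for s in subs if s.review_status == CRITERIA_PROVIDED]
        return with_criteria or list(subs)
    if mode == "paranoid":
        # Submissions with and without assessment criteria form one level.
        return list(subs)
    raise ValueError(f"unknown mode: {mode}")


subs = [Submission(CRITERIA_PROVIDED, "benign"), Submission(NO_CRITERIA, "pathogenic")]
print([s.significance for s in select_submissions(subs, "normal")])    # ['benign']
print([s.significance for s in select_submissions(subs, "paranoid")])  # ['benign', 'pathogenic']
```

The second call shows why the "Paranoid" mode surfaces more pathogenic assertions: the submission without assessment criteria is no longer shadowed.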

----------------------------
Review Status Interpretation
----------------------------

The interpretation of the status of a ClinVar record can be challenging.
This has two causes, which are described after the following tables.

The following table shows how often each review status component occurs in ClinVar assertions (as of June 4, 2020).
Note that some components only make sense in combination with others (e.g., "no conflicts" only makes sense if there is more than one submission).

======== ================================
Count    ClinVar Status
======== ================================
12,342   conflicting interpretations
839,966  criteria provided
55,467   multiple submitters
71,858   no assertion criteria provided
17,068   no assertion provided
55,467   no conflicts
5,751    practice guideline
11,172   reviewed by expert panel
772,157  single submitter
======== ================================

**In ClinVar**, the star ratings are assigned as follows:

===== ===========================================================
Stars Description
===== ===========================================================
none  no assertion criteria provided **OR**
      single submitter, no assertion provided
one   single submitter, criteria provided **OR**
      criteria provided &
      multiple submitters, conflicting interpretations
two   criteria provided, multiple submitters, no conflicts
three reviewed by expert panel
four  practice guideline
===== ===========================================================
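
Written as a lookup table, the rating shows that two different review statuses end up with zero stars.
This is a sketch only: ``CLINVAR_STARS`` and ``clinvar_stars`` are hypothetical names, and the exact status strings are assumed to match ClinVar's wording, which may change between releases.

```python
# ClinVar star rating per review status, transcribed from the table above.
# The exact status strings are an assumption and may differ between ClinVar releases.
CLINVAR_STARS = {
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, multiple submitters, no conflicts": 2,
    "reviewed by expert panel": 3,
    "practice guideline": 4,
}


def clinvar_stars(review_status: str) -> int:
    """Look up the ClinVar star rating of a review status."""
    return CLINVAR_STARS[review_status]


# Two different statuses collapse onto the same rating:
print(clinvar_stars("no assertion criteria provided") == clinvar_stars("no assertion provided"))  # True
```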

In particular, the star rating does not distinguish between "no assertion criteria provided" and "no assertion provided", which is misleading.
Also, it can be misleading that records with assertion criteria override those without.
Several records curate good literature evidence without providing assertion criteria, while many records from clinical testing companies provide assertion criteria but no phenotype, and were prepared with less diligence than good research submissions.

==========================
Merging of ClinVar Records
==========================

The algorithm that ClinVar uses for merging multiple records into the aggregated VCV records is not public.
Also, given the issues with ClinVar's star rating described above, VarFish's display deviates from ClinVar's.
Instead of ClinVar's gold stars, VarFish assigns points.

======== ==================================================
Points   Condition
======== ==================================================
none     origin is somatic **OR** no assertion provided
one      single submitter **OR**
         multiple submitters, conflicting interpretations
two      multiple submitters, no conflicting interpretation
three    reviewed by expert panel **OR**
         practice guideline
======== ==================================================
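
The points table can be read as a short decision cascade.
A minimal sketch, assuming a hypothetical ``varfish_points`` function with boolean inputs (``top_tier`` standing for *reviewed by expert panel* or *practice guideline*), and assuming that the somatic/no-assertion rule takes priority over the top tier:

```python
def varfish_points(*, somatic: bool, has_assertion: bool, top_tier: bool,
                   n_submitters: int, conflicting: bool) -> int:
    """Return VarFish points following the table above.

    top_tier means "reviewed by expert panel" or "practice guideline".
    Assertion criteria are deliberately not an input.
    """
    if somatic or not has_assertion:
        return 0
    if top_tier:
        return 3
    if n_submitters > 1 and not conflicting:
        return 2
    return 1  # single submitter, or multiple submitters with conflicts
```

Note that, unlike the star rating, nothing in this cascade depends on whether assertion criteria were provided.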

Importantly, VarFish will still display all ClinVar records in the variant display and link out to ClinVar so the user can make their own assessment.
The role of ClinVar in VarFish is to help the user quickly find variants present in ClinVar, not to override the user's judgement in any way.

The rationale:

- ClinVar entries for somatic variants and those without a variant assessment are of little interest.
- Multiple submitters are better than one submitter, regardless of the assertion criteria.
  Requiring assertion criteria or expert panel status is good for ClinVar, as it encourages submitters to provide assertion criteria or to apply for expert panel status, but it is less important for VarFish users.
- The distinction between *practice guideline* and *reviewed by expert panel* is less important for VarFish's use case.
  Thus, collapsing the two should not cause problems.

VarFish merges ClinVar records based on the following algorithm.

0. Generally, *benign* and *likely benign* are merged into *likely benign/benign*; likewise, *pathogenic* and *likely pathogenic* are merged into *likely pathogenic/pathogenic*.
   Records with *uncertain significance* are ignored in merging if there is at least one *(likely) benign/pathogenic* assessment.
1. Records flagged with *practice guideline* or *expert panel* are assigned three points and override any other assessment.
   Among three-point records, *practice guideline* takes precedence over *expert panel*.
2. If there is only one record, that record's assessment is used and one point is assigned.
   Note that this includes RCV records in ClinVar that are already merged.
3. In the case of two or more records:

   - Ignore *uncertain significance* records as outlined in (0).
   - If there are conflicting interpretations, mark the record as such.
   - Otherwise, merge *likely* and non-*likely* assertions and add *no conflicting interpretation* if more than one record other than *uncertain significance* remains.
   - Assign one point in case of conflicts and two points in case of consistency.
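
The steps above can be sketched in Python as follows.
This is an illustration under explicit assumptions rather than VarFish's actual code: ``Record``, ``Merged``, ``merge_labels`` and ``merge_records`` are hypothetical, top-tier records are assumed to agree with each other, and a record left alone after dropping *uncertain significance* entries is scored with one point, as in the examples below.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical types and labels for illustration only.
VUS = "uncertain significance"
BENIGN_SIDE = ("likely benign", "benign")
PATHOGENIC_SIDE = ("likely pathogenic", "pathogenic")
TOP_TIERS = ("practice guideline", "reviewed by expert panel")  # in order of precedence


@dataclass(frozen=True)
class Record:
    review_status: str  # e.g. "practice guideline" or "single submitter"
    significance: str  # "benign", "likely benign", VUS, "likely pathogenic", "pathogenic"


@dataclass(frozen=True)
class Merged:
    significance: str
    points: int
    conflicting: bool


def merge_labels(labels: set[str]) -> str:
    """Step (0): collapse "likely X" and "X" into "likely X/X"."""
    if len(labels) == 1:
        return next(iter(labels))
    for side in (BENIGN_SIDE, PATHOGENIC_SIDE):
        if labels <= set(side):
            return "/".join(label for label in side if label in labels)
    raise ValueError(f"labels cannot be merged: {sorted(labels)}")


def merge_records(records: list[Record]) -> Merged:
    if not records:
        raise ValueError("no records to merge")
    # Step (1): practice guideline beats expert panel; both override everything else.
    for tier in TOP_TIERS:
        top = [r for r in records if r.review_status == tier]
        if top:
            # Assumes that top-tier records agree with each other.
            return Merged(merge_labels({r.significance for r in top}), 3, False)
    # Step (2): a single record is used as-is.
    if len(records) == 1:
        return Merged(records[0].significance, 1, False)
    # Step (3): ignore VUS records if a (likely) benign/pathogenic one exists.
    kept = [r for r in records if r.significance != VUS] or records
    labels = {r.significance for r in kept}
    if labels & set(BENIGN_SIDE) and labels & set(PATHOGENIC_SIDE):
        return Merged(VUS, 1, True)  # conflicting interpretations
    # Consistent: two points only if more than one record remains.
    return Merged(merge_labels(labels), 2 if len(kept) > 1 else 1, False)


print(merge_records([
    Record("single submitter", "pathogenic"),
    Record("multiple submitters, no conflict", "likely pathogenic"),
]))
# Merged(significance='likely pathogenic/pathogenic', points=2, conflicting=False)
```

Returning a small immutable result instead of a formatted string keeps the label, the points, and the conflict flag separate, so a display layer can render them independently.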

Further, each variant is annotated with an ACMG-style rating.
A merged *likely benign/benign* assertion is assigned ACMG:1.5 and a merged *likely pathogenic/pathogenic* assertion is assigned ACMG:4.5.
In the case of conflicting assertions, an ACMG score of 3 is assigned, but the variant is flagged with a "C" to indicate conflicting interpretations.
Note that neither *uncertain significance* vs. *benign* nor *uncertain significance* vs. *pathogenic* creates a conflict.
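
A minimal sketch of the ACMG-style rating, assuming a hypothetical ``acmg_rating`` function.
The 1.5, 3 and 4.5 values come from the text above; the scores for single labels follow the conventional five-tier order (1 = benign to 5 = pathogenic, consistent with ACMG:4 for *likely pathogenic* in the examples below) and are an assumption.

```python
from __future__ import annotations

# Five-tier ACMG classes (1 = benign ... 5 = pathogenic, an assumption); the
# values for the merged "likely X/X" labels are the ones given in the text above.
ACMG_SCORE = {
    "benign": 1.0,
    "likely benign/benign": 1.5,
    "likely benign": 2.0,
    "uncertain significance": 3.0,
    "likely pathogenic": 4.0,
    "likely pathogenic/pathogenic": 4.5,
    "pathogenic": 5.0,
}


def acmg_rating(significance: str, conflicting: bool) -> tuple[float, bool]:
    """Return (score, conflict flag); conflicts score 3 and carry the "C" flag."""
    if conflicting:
        return 3.0, True
    return ACMG_SCORE[significance], False
```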

========
Examples
========

1. INPUT
      - practice guideline, likely pathogenic
      - reviewed by expert panel, likely pathogenic
      - single submitter, pathogenic
   OUTPUT
      - reviewed by expert panel, likely pathogenic
      - three points; ACMG:4-LP

2. INPUT
      - single submitter, pathogenic
      - multiple submitters, no conflict, likely pathogenic
   OUTPUT
      - multiple submitters, no conflict, likely pathogenic/pathogenic
      - two points; ACMG:4.5-LP-P

3. INPUT
      - single submitter, pathogenic
      - single submitter, uncertain significance
      - single submitter, likely pathogenic
   OUTPUT
      - multiple submitters, no conflict, likely pathogenic/pathogenic
      - two points; ACMG:4.5-LP-P

4. INPUT
      - single submitter, pathogenic
      - multiple submitters, uncertain significance
   OUTPUT
      - single submitter, likely pathogenic
      - one point; ACMG:4-LP

5. INPUT
      - single submitter, pathogenic
      - multiple single submitters, likely benign
   OUTPUT
      - multiple submitters, conflicting interpretations, uncertain significance
      - one point; ACMG:3

In "Paranoid Mode" you will get many more conflicts and pathogenic variants because the submissions without assessment criteria are sometimes of lower quality and generate noise.
