Fix #4469 - parse TRGT STR VCF #4566

dnil · 2024-04-16T12:18:24Z

This PR adds a functionality or fixes a bug.
OR
This PR marks a new Scout release. We apply semantic versioning. This is a major/minor/patch release for reasons.

Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

Make sure the PR is pushed and available on Docker Hub
First, book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource you are going to call dibs on is scout-stage and the server is cg-vm1.
ssh <USER.NAME>@cg-vm1.scilifelab.se
sudo -iu hiseq.clinical
ssh localhost
(optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
Stop the service with current deployed branch: systemctl --user stop scout.target
Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
Make sure the branch is deployed: systemctl --user status scout.target
After testing is done, repeat procedure at https://pax.scilifelab.se/, which will release the allocated resource (scout-stage) to be used for testing by other users.

Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

ssh <USER.NAME>@hasta.scilifelab.se
Book your testing time using the Pax software: us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
(optional) Find out which scout branch is currently deployed on hasta: conda activate S_scout; pip freeze | grep scout-browser
Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
Make sure the branch is deployed: us; scout --version
After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

how to test it, possibly with real cases/data

Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.

Review:

code approved by
tests executed by

codecov · 2024-04-16T12:27:19Z

Codecov Report

Attention: Patch coverage is 63.88889%, with 26 lines in your changes missing coverage. Please review.

Project coverage is 84.53%. Comparing base (33dd4b5) to head (db9e50f).

Files	Patch %	Lines
scout/parse/variant/genotype.py	53.33%	21 Missing ⚠️
scout/server/blueprints/variants/controllers.py	72.22%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4566      +/-   ##
==========================================
- Coverage   84.61%   84.53%   -0.09%     
==========================================
  Files         310      310              
  Lines       18679    18744      +65     
==========================================
+ Hits        15805    15845      +40     
- Misses       2874     2899      +25


dnil · 2024-05-02T14:34:51Z

Ok, actually working! There are a couple of small things missing in Stranger now (Clinical-Genomics/stranger#58), but once they are in place we can release it, and make sure we are parsing the release version ok with this PR.

northwestwitch · 2024-05-06T05:56:02Z

Would be nice to add this one to the new release. Who is handling the missing things in Stranger? Otherwise, should it go into the next release?

dnil · 2024-05-06T06:01:21Z

> Would be nice to add this one to the new release. Who is handling the missing things in Stranger? Otherwise, should it go into the next release?

Well, feel free to review: it would be nice to get some input. In my mind right now, the further additions would be in Stranger and possibly the reference files, but I conservatively kept this on hold since having things like the REF count visible has been useful in the past. Not that it’s strictly needed.

northwestwitch

Looks good to me. Fine to merge since it works, I think. I have a few minor suggestions.

northwestwitch · 2024-05-06T06:14:52Z

CHANGELOG.md

@@ -11,9 +11,11 @@ About changelog [here](https://keepachangelog.com/en/1.0.0/)
 - STR variant information card with database links, replacing empty frequency panel
 - Display paging and number of HPO terms available in the database on Phenotypes page
 - On case page, typeahead hints when searching for a disease using substrings containing source ("OMIM:", "ORPHA:")
+t


Suggested change

t

northwestwitch · 2024-05-06T06:17:10Z

CHANGELOG.md

 - Button to monitor the status of submissions on ClinVar Submissions page
 - Option to filter cancer variants by number of observations in somatic and germline archived database
 - Documentation for integrating chanjo2
+- Parse TRGT STR VCF


Suggested change

- Parse TRGT STR VCF

- Parse Tandem repeat genotyping (TRGT) tags from STR VCFs

northwestwitch · 2024-05-06T06:21:46Z

scout/build/variant/variant.py

@@ -199,7 +199,12 @@ def build_variant(
    variant_obj["str_pathologic_min"] = variant.get("str_pathologic_min")
    variant_obj["str_ref"] = variant.get("str_ref")
    variant_obj["str_repid"] = variant.get("str_repid")
+    variant_obj["str_trid"] = variant.get("str_trid")


I know that we have long lists of key/values in this build_variant function, but without making huge changes to it, what about putting all these STR keys into a constant and then calling a specific function (outside this one) to assign the values in a loop? It would be less code and more readable.

I think this kind of transformation should be done using a class; why not a Pydantic one, since we have started using them? If it's OK with you, I would prefer to do that as a separate PR, knowing that we tend to introduce some issues with empty and missing values when we convert to Pydantic.
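
For illustration, the constant-plus-loop idea from this thread could be sketched roughly as below. This is only a sketch under assumptions: the names `STR_VARIANT_KEYS` and `set_str_fields` are hypothetical and not part of Scout, and only the four keys visible in the diff hunk above are listed, whereas the real `build_variant` handles more.

```python
# Hypothetical constant: only the STR keys visible in the diff hunk are listed.
STR_VARIANT_KEYS = (
    "str_pathologic_min",
    "str_ref",
    "str_repid",
    "str_trid",
)


def set_str_fields(variant_obj: dict, variant: dict) -> None:
    """Copy STR keys from the parsed variant into variant_obj.

    Mirrors the existing `variant_obj[key] = variant.get(key)` lines:
    a key missing from `variant` is stored as None.
    """
    for key in STR_VARIANT_KEYS:
        variant_obj[key] = variant.get(key)


# Usage with a made-up parsed variant (values are illustrative only)
parsed = {"str_ref": 12, "str_trid": "HTT"}
variant_obj = {}
set_str_fields(variant_obj, parsed)
```

As with the current code, missing keys end up as `None` in `variant_obj`, so this sketch does not change the empty/missing value behaviour that the Pydantic conversion mentioned above would need to handle separately.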

northwestwitch · 2024-05-06T06:43:56Z