Fix #4469 - parse TRGT STR VCF #4566
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4566 +/- ##
==========================================
- Coverage 84.61% 84.53% -0.09%
==========================================
Files 310 310
Lines 18679 18744 +65
==========================================
+ Hits 15805 15845 +40
- Misses 2874 2899 +25
Ok, actually working! There are a couple of small things missing in Stranger now (Clinical-Genomics/stranger#58), but once they are in place we can release it and make sure that this PR parses the release version correctly.
Would be nice to add this one to the new release. Who is working on the missing things in Stranger? Otherwise, it could go into the next release.
Well, feel free to review; some input would be nice. As I see it right now, the further additions would be in STRanger and possibly the reference files, but I conservatively kept this on hold, since having things like the REF count visible has been useful in the past. Not that it’s strictly needed.
Looks good to me. Fine to merge since it works, I think. I have a few minor suggestions.
@@ -11,9 +11,11 @@ About changelog [here](https://keepachangelog.com/en/1.0.0/)
- STR variant information card with database links, replacing empty frequency panel
- Display paging and number of HPO terms available in the database on Phenotypes page
- On case page, typeahead hints when searching for a disease using substrings containing source ("OMIM:", "ORPHA:")
- Button to monitor the status of submissions on ClinVar Submissions page
- Option to filter cancer variants by number of observations in somatic and germline archived database
- Documentation for integrating chanjo2
- Parse TRGT STR VCF
- Parse TRGT STR VCF
- Parse Tandem repeat genotyping (TRGT) tags from STR VCFs
@@ -199,7 +199,12 @@ def build_variant(
    variant_obj["str_pathologic_min"] = variant.get("str_pathologic_min")
    variant_obj["str_ref"] = variant.get("str_ref")
    variant_obj["str_repid"] = variant.get("str_repid")
    variant_obj["str_trid"] = variant.get("str_trid")
I know that we have long lists of keys/values in this build_variant function, but without making huge changes to it, what about putting all these STR keys into a constant and then calling a dedicated function (outside this one) that assigns the values in a loop? It would be less code and more readable.
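One way to read this suggestion, as a rough sketch only: keep the STR keys in a module-level constant and copy them over in a small helper. The names `STR_KEYS` and `add_str_info` are made up for illustration and are not part of Scout, and the constant lists only the four keys visible in this hunk.

```python
from typing import Any, Dict

# Hypothetical constant: only the STR keys visible in this diff hunk.
STR_KEYS = ("str_pathologic_min", "str_ref", "str_repid", "str_trid")


def add_str_info(variant_obj: Dict[str, Any], variant: Dict[str, Any]) -> None:
    """Copy STR-specific values from the parsed variant to the variant object.

    Keys missing from the parsed variant are stored as None, which matches
    the behaviour of the existing variant.get(key) assignments.
    """
    for key in STR_KEYS:
        variant_obj[key] = variant.get(key)


# Made-up example values, just to show the call.
parsed = {"str_repid": "HTT", "str_trid": "HTT", "str_ref": 19}
variant_obj: Dict[str, Any] = {}
add_str_info(variant_obj, parsed)
```

Inside build_variant the whole run of assignments would then shrink to a single `add_str_info(variant_obj, variant)` call.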
I think this kind of transformation should be done using a class. Why not a Pydantic one, since we have started using them? If it's ok with you, I would prefer to do that as a separate PR, knowing that we tend to introduce some issues with empty and missing values when we convert to Pydantic.
Ok!
@@ -103,6 +103,14 @@ def parse_genotype(variant, ind, pos):
    (flanking_ref, flanking_alt) = _parse_format_entry(variant, pos, "ADFL")
    (inrepeat_ref, inrepeat_alt) = _parse_format_entry(variant, pos, "ADIR")

    # TRGT long read STR specific
    (mc_ref, mc_alt) = _parse_format_entry_trgt_mc(variant, pos)
    (mc_ref, mc_alt) = _parse_format_entry_trgt_mc(variant, pos)
    (_, mc_alt) = _parse_format_entry_trgt_mc(variant, pos)
Seems like it's not used downstream?
@@ -395,14 +403,15 @@ def get_str_so(variant, pos):
    return str_so


def _parse_format_entry(variant, pos, format_entry_name):
def _parse_format_entry(variant, pos, format_entry_name, number_format=int):
def _parse_format_entry(variant, pos, format_entry_name, number_format=int):
def _parse_format_entry(variant: cyvcf2.Variant, pos: int, format_entry_name: str, number_format: Type[Union[float, int]] = int) -> Tuple[Union[float, int], ...]:
        values = list(value.split("/"))
        values = variant.format(format_entry_name)[pos]

        new_values = []
Otherwise this could also work
values = re.split("/|,", values)
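A quick check of the pattern suggested above, assuming the raw FORMAT value is a single string (`re.split` does not accept a list). The two sample values are invented for illustration.

```python
import re

# Hypothetical FORMAT values: one "/"-separated, one ","-separated.
slash_separated = "12/34"
comma_separated = "12,34"

# "/|," is an alternation, so the string is split on either "/" or ",".
slash_parts = re.split("/|,", slash_separated)
comma_parts = re.split("/|,", comma_separated)

print(slash_parts)
print(comma_parts)
```

Both calls give the same two-element list of strings, so a single split can cover both separator conventions.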
        ref_value = None
        alt_value = None

        if len(values) > 1:
            ref_value = int(values[0])
            alt_value = int(values[1])
            ref_value = (number_format)(values[0])
👍🏻
        if ref_value >= 0:
            ref = ref_value
        if alt_value >= 0:
            alt = alt_value
    except (ValueError, TypeError) as _ignore_error:
        pass
    return (ref, alt)


def _parse_format_entry_trgt_mc(variant, pos):
Add type hints and a return type instead of having the long docstring. The docstring could instead contain a better explanation of what MC is.
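A sketch of the shape this comment asks for: types in the signature, and a docstring that explains MC rather than restating argument types. `parse_trgt_mc` is a hypothetical standalone helper that works on the raw string; it is not the PR's `_parse_format_entry_trgt_mc`. The MC layout described in the docstring is an assumption based on my reading of the TRGT VCF description and should be checked against the TRGT documentation.

```python
from typing import List


def parse_trgt_mc(mc_value: str) -> List[List[int]]:
    """Parse a TRGT MC (motif count) FORMAT value.

    Assumed layout (verify against the TRGT VCF documentation): one entry
    per allele, separated by ","; within an allele, one count per motif,
    separated by "_". For example "12_3,15_3" describes two alleles with
    two motifs each.

    An allele entry that is not made of integers (for example ".") is
    returned as an empty list.
    """
    alleles: List[List[int]] = []
    for allele in mc_value.split(","):
        try:
            alleles.append([int(count) for count in allele.split("_")])
        except ValueError:
            # Missing or malformed entry for this allele.
            alleles.append([])
    return alleles


# Made-up sample value, just to show the call.
counts = parse_trgt_mc("12_3,15_3")
```

With the types in the signature, the docstring is free to carry the domain explanation, which is the part a reader cannot get from the code alone.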
    pathologic_struc = variant.INFO.get("PathologicStruc", None)

    pathologic_counts = 0
    pathologic_struc = variant.INFO.get("PathologicStruc", None)
    pathologic_counts = 0
These 2 could be moved down, after line 468 perhaps?
Quality Gate passed
This PR adds a functionality or fixes a bug.
OR
This PR marks a new Scout release. We apply semantic versioning. This is a major/minor/patch release for reasons.
Testing on cg-vm1 server (Clinical Genomics Stockholm)
Prepare for testing
scout-stage and the server is cg-vm1.
ssh <USER.NAME>@cg-vm1.scilifelab.se
sudo -iu hiseq.clinical
ssh localhost
podman ps
systemctl --user stop scout.target
systemctl --user start scout@<this_branch>
systemctl --user status scout.target
(scout-stage) to be used for testing by other users.
Testing on hasta server (Clinical Genomics Stockholm)
Prepare for testing
ssh <USER.NAME>@hasta.scilifelab.se
us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
conda activate S_scout; pip freeze | grep scout-browser
bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
us; scout --version
paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.
How to test:
Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.
Review: