Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small fixes for TRGT #59

Merged
merged 8 commits into from
May 17, 2024
Merged

Conversation

fellen31
Copy link
Contributor

This PR adds | fixes:

How to prepare for test:

  • ssh to ...
  • Install on stage:
    bash servers/resources/SERVER.scilifelab.se/update-[THIS_TOOL]-stage.sh [THIS-BRANCH-NAME]

How to test:

Expected outcome:

  • [ ]

Review:

  • Code approved by
  • Tests executed by
  • "Merge and deploy" approved by

This version is a:

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

@fellen31 fellen31 marked this pull request as ready for review May 16, 2024 08:56
@fellen31 fellen31 requested a review from dnil May 16, 2024 08:56
Copy link
Collaborator

@dnil dnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you kindly! Sorry for the state of the onion; we haven't had a contribution here in a while. Cool, we can try to clean up a bit in time to make it easier to piece in new stuff!

if not repeat_id:
repeat_id = variant_info['info_dict'].get('TRID').split('_')[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like (its only two repetitions so far) a function like

def get_repeat_id(variant_info):

would be welcome, now that its a little bit more code. Besides it lowers apparent complexity with the less C-like and PEPpy.

if this:
    do this thing
    return this_thing
if that:
   do that thing
    return that_thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely!

@@ -236,6 +249,10 @@ def get_trgt_repeat_res(variant_info, repeat_info):
for allele in mc.split(","):
mcs = allele.split('_')
# GT would have the index of the MC in the ALT field list if we wanted to be specific...

# What should we do if MC is . ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't seen that one? Is that from TRGT or sth from STRdust? Do they mean completely missing from motifs, or is it a shorthand for as-reference? ExHu was kind enough to calculate and pass the reference size given the current locus definitions, but I guess for TRGT we would perhaps pass it from the repeat definition file.
We should check what the callers intended here I think!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for TRGT. Probably when the repeat could not be called. Some examples:

chr16   17470908        .       GCCGCCGCCGCCGCCC        .       0       .       TRID=XYLT1;END=17470923;MOTIFS=GCC;STRUC=(GCC)n        GT:AL:ALLR:SD:MC:MS:AP:AM       .:.:.:.:.:.:.:.
chr16   24613440        .       TTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTAT    .       0       .       TRID=TNRC6A;END=24613530;MOTIFS=TTTTA,TTTCA;STRUC=(TTTTA)n(TTTCA)n(TTTTA)n      GT:AL:ALLR:SD:MC:MS:AP:AM      .:.:.:.:.:.:.:.
chr16   66490399        .       TAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAATAAAATAAAAA   .       0              .       TRID=BEAN1;END=66490467;MOTIFS=TGGAA,TAAAA;STRUC=(TGGAA)n(TAAAA)n       GT:AL:ALLR:SD:MC:MS:AP:AM              .:.:.:.:.:.:.:.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahoy! Yes, could be missing or reality saying hi with some unexpected alleles. I guess we also have to treat it as no-call rather than reference. It's unfortunate that TRGT does seem to set QUAL==0 and no FILTER=='.' for everything. For ExHu we did set the size to 0 for . so I guess we might as well here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run a test-dataset with reads that only covers a few of the pathogenic repeats.

@fellen31
Copy link
Contributor Author

I have made some updates the code @dnil, let me know what you think.

Copy link
Collaborator

@dnil dnil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

stranger/utils.py Outdated Show resolved Hide resolved
repeat_id = variant_info['info_dict'].get('REPID')
if not repeat_id:
repeat_id = variant_info['info_dict'].get('TRID').split('_')[1]
repeat_id = get_repeat_id(variant_info)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯


# What should we do if MC is . ?
if allele == ".":
repeat_res.extend([0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

fellen31 and others added 2 commits May 17, 2024 14:22
Co-authored-by: Daniel Nilsson <daniel.k.nilsson@gmail.com>
@fellen31
Copy link
Contributor Author

Let me know if it's ok to merge.

Would be great if we could get a bioconda update as well, then I could add Stranger to Nallo (with TRGT for now).

@dnil
Copy link
Collaborator

dnil commented May 17, 2024

I think this is good to go, but I noticed we have no Changelog-missing warning automation! 😁
Perhaps add a sentence to the unreleased section [x.x.x] here, then merge when you are done, and I can make a new release and update on PyPi. I will have to investigate how that works on bioconda: not sure if they will immediately update as well, or if we have to trigger it.

@fellen31 fellen31 merged commit d0afca3 into Clinical-Genomics:master May 17, 2024
@fellen31 fellen31 deleted the trgt-bugfixes branch May 17, 2024 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants