Fix 3'-rule errors for certain sequence patterns #75

Merged on Apr 27, 2023 (4 commits)

Conversation

@federkasten (Member) commented Apr 17, 2023

Resolve #72 and #74.

This PR addresses the problem of 3' shifting reported in issue #74 and the out-of-bounds exception reported in issue #72.

I have made some refinements to varity.vcf-to-hgvs. Specifically, I have fixed the varity.vcf-to-hgvs.common/backward-shift function so that it works correctly for certain sequence patterns, and added error handling to varity.vcf-to-hgvs.common/apply-3'-rule for rare cases.

Additionally, I have fixed the exon boundary handling in varity.vcf-to-hgvs.protein/read-sequence-info, which was implemented in #68.
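
For readers unfamiliar with 3' shifting, here is a minimal Python sketch of the general idea for a deletion. It is not varity's Clojure implementation: the function name `shift_deletion_3prime` and its 0-based coordinates are assumptions made for illustration. The bounds check in the loop shows the kind of guard that keeps a shift from running past the end of the sequence.

```python
def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Return the most 3' (rightmost) 0-based start of an equivalent deletion.

    Hypothetical helper for illustration only; not part of varity.
    """
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("deletion lies outside the sequence")
    # Deleting seq[start:start+length] gives the same result as deleting the
    # window one base to the right whenever the base leaving the window on the
    # left equals the base entering it on the right.
    # The first condition stops the loop at the end of the sequence, so the
    # comparison never indexes out of bounds.
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


def apply_deletion(seq: str, start: int, length: int) -> str:
    """Return seq with seq[start:start+length] removed."""
    return seq[:start] + seq[start + length:]
```

For example, deleting one `T` at index 2 of `GATTTTC` shifts to index 5, the last `T` of the run, and both positions yield the same deleted sequence.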

The added test cases originate from #71 and #73 by @nokara26. Thanks! ❤️

codecov bot commented Apr 17, 2023

Codecov Report

Merging #75 (1854562) into master (e51d56c) will decrease coverage by 0.03%.
The diff coverage is 26.66%.

@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
- Coverage   45.63%   45.60%   -0.03%     
==========================================
  Files          16       16              
  Lines        2003     2004       +1     
  Branches       64       64              
==========================================
  Hits          914      914              
- Misses       1025     1026       +1     
  Partials       64       64              
Impacted Files Coverage Δ
src/varity/vcf_to_hgvs/protein.clj 28.05% <0.00%> (-0.16%) ⬇️
src/varity/vcf_to_hgvs/common.clj 66.25% <66.66%> (+0.41%) ⬆️

@federkasten changed the title from "Fix 3'-rule errors in rare cases" to "Fix 3'-rule errors for certain sequence patterns" on Apr 22, 2023
Comment on lines +311 to +312
(if (= alt \*)
(protein-substitution (+ ppos offset) (str ref) (str alt)) ; eventually fs-ter-substitution
Member Author

📝 In certain cases, applying the 3'-rule to the protein sequence turns a frameshift into an fs-ter-substitution, so I have added the fs-ter-substitution check here.
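
A rough Python sketch of the check described above, under stated assumptions: `describe_frameshift` is a hypothetical name, sequences use one-letter amino acid codes with `*` for a stop, and the output is a simplified notation rather than full HGVS. It skips residues that are unchanged, then reports a plain substitution when the first changed residue is a stop, mirroring the `(= alt \*)` branch in the diff.

```python
from typing import Optional


def describe_frameshift(ref_prot: str, alt_prot: str, ppos: int) -> Optional[str]:
    """Describe the first protein change at or after 1-based position ppos.

    Hypothetical helper for illustration only; not part of varity.
    ref_prot and alt_prot both start at protein position ppos.
    """
    for offset, (ref, alt) in enumerate(zip(ref_prot, alt_prot)):
        if ref == alt:
            # Unchanged residue: move the description toward the 3' end.
            continue
        pos = ppos + offset
        if alt == "*":
            # The first changed residue is already a stop codon, so the
            # change reads as a substitution to Ter, not as a frameshift.
            return f"p.{ref}{pos}*"
        return f"p.{ref}{pos}{alt}fs"
    # No difference found within the compared window.
    return None
```

With `ref_prot="KQLW"`, `alt_prot="KQ*"`, and `ppos=10`, the first two residues match and the third becomes a stop, so the sketch returns `p.L12*` instead of a frameshift description.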

Member

Indeed, such cases can occur, as you said, though they are extremely rare.

Do you have an example? If you already have one, I would like you to add it as a test case.

No problem if you do not. I will approve this PR either way, because I think preparing such a test case is a bit difficult.

Member Author

Thanks for pointing this out! I think my REPL history still contains the details. I will try to retrieve it.

Member Author

Sorry, the variant I tried no longer seems to reach this code path after the backward-shift fix. 🤔

Instead, I have added test cases for varity.vcf-to-hgvs.protein/mutation in 1854562, which include inputs before and after applying the 3'-rule to the coding DNA. The 3'-rule for the protein sequence and the fs-ter-substitution check work correctly in both cases.

Member

Okay, thank you very much. The test case using varity.vcf-to-hgvs.protein/mutation looks good to me.

@federkasten self-assigned this on Apr 22, 2023
@federkasten marked this pull request as ready for review on April 22, 2023 08:06
@nokara26 (Contributor) left a comment

Thank you for the fix! LGTM👍

@totakke (Member) left a comment

Thank you for fixing the 3'-rule. The implementation looks good to me. I have left just one comment, about a test case for fs-ter-substitution after the 3'-rule.

@totakke merged commit a155c38 into master on Apr 27, 2023
31 checks passed
@totakke deleted the fix/3-prime-rule branch on April 27, 2023 14:49
Successfully merging this pull request may close these issues.

StringIndexOutOfBoundsException in apply-3'-rule
3 participants