Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix upstream and downstream sequence of sequence-info and delins process #81

Merged
merged 7 commits into from
Dec 25, 2023

Conversation

nokara26
Copy link
Contributor

I fixed the sequence-info and delins because error occurred when gene is transcribed from reverse strand and variant includes stop codon and upstream sequence from cds.

I found upstream and downstream sequence is not corrected when deletion is extended to outside of cds.
So I fix upstream sequence from cds-start and downstream sequence from cds-end, protein alteration type determinning process and delins process.

I executed lein run and lein run :slow and I confirmed all tests are passed.

@codecov
Copy link

codecov bot commented Oct 31, 2023

Codecov Report

Attention: 71 lines in your changes are missing coverage. Please review.

Comparison is base (eade95f) 45.64% compared to head (bbd85b6) 46.45%.

Files Patch % Lines
src/varity/vcf_to_hgvs/protein.clj 41.80% 66 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
+ Coverage   45.64%   46.45%   +0.81%     
==========================================
  Files          16       16              
  Lines        2007     2090      +83     
  Branches       66       70       +4     
==========================================
+ Hits          916      971      +55     
- Misses       1025     1049      +24     
- Partials       66       70       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@federkasten federkasten linked an issue Dec 21, 2023 that may be closed by this pull request
@nokara26 nokara26 marked this pull request as draft December 21, 2023 03:28
@nokara26 nokara26 marked this pull request as ready for review December 22, 2023 08:55
Copy link
Member

@federkasten federkasten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks PR. The strategy looks good to me! 🙆‍♂️

I pointed out some trivial points from a refactoring perspective.
Please check the comments.

Comment on lines 230 to 240
(defn- get-first-diff-aa-info
[pos ref-seq alt-seq]
(->> (map vector ref-seq alt-seq)
(drop (dec pos))
(map-indexed vector)
(drop-while (fn [[_ [r a]]] (= r a)))
first
((fn [[i [r a]]]
{:ppos (+ pos i)
:pref (str r)
:palt (str a)}))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prettier to use varity.vcf-to-hgvs.common/diff-bases:

Suggested change
(defn- get-first-diff-aa-info
[pos ref-seq alt-seq]
(->> (map vector ref-seq alt-seq)
(drop (dec pos))
(map-indexed vector)
(drop-while (fn [[_ [r a]]] (= r a)))
first
((fn [[i [r a]]]
{:ppos (+ pos i)
:pref (str r)
:palt (str a)}))))
(defn- get-first-diff-aa-info
[pos ref-seq alt-seq]
(let [[ref-only alt-only offset _] (diff-bases ref-seq alt-seq (dec pos))]
(when (and (seq ref-only)
(seq alt-only))
{:ppos (+ pos offset)
:pref (str (first ref-only))
:palt (str (first alt-only))})))

with extending as follows:

(defn diff-bases
  ([s1 s2]
   ;; diff-bases original implementation
   )
  ([s1 s2 offset]
   (diff-bases (subs s1 offset) (subs s2 offset))))

[pref palt ppos])
[del ins offset _] (diff-bases pref palt)
ndel (count del)
include-ter-site? (if ref-include-ter-site
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
include-ter-site? (if ref-include-ter-site
alt-retain-ter-site? (if ref-include-ter-site

Do you think renaming it would make more sense?

Comment on lines 368 to 369
(-> (pstring/split-at seq (dec pos))
last
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use subs:

Suggested change
(-> (pstring/split-at seq (dec pos))
last
(-> (subs seq (dec pos))

Comment on lines 292 to 294
(if ref-include-ter-site
:indel
:frame-shift))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try using cond to expand nesting from L287.

(some (fn [[_ e]] (when (<= e %) e)) (reverse alt-exon-ranges*)))]
{:ref-exon-seq ref-exon-seq1
:ref-prot-seq (codon/amino-acid-sequence (cond-> ref-exon-seq1
(if ref-include-ter-site
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The apply-offset function has gotten a bit too complicated. Please split it into a new function and simplify the implementation 🙏

[ref alt]
(or (and (= 1 (count alt))
(= (first ref) (first alt)))
(and (not (= 1 (count ref) (count alt)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(and (not (= 1 (count ref) (count alt)))
(and (not= 1 (count ref) (count alt))

Equivalent?

Comment on lines 84 to 99
(defn- is-deletion-variant?
[ref alt]
(or (and (= 1 (count alt))
(= (first ref) (first alt)))
(and (not (= 1 (count ref) (count alt)))
(not= (first ref) (first alt)))))

(defn- cds-start-upstream-to-cds-variant?
[cds-start pos ref]
(and (< pos cds-start)
(<= cds-start (dec (+ pos (count ref))))))

(defn- cds-to-cds-end-downstream-variant?
[cds-end pos ref]
(and (<= pos cds-end)
(< cds-end (dec (+ pos (count ref))))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add simple tests for these functions 🙏

ter-site-adjusted-alt-seq (make-ter-site-adjusted-alt-seq alt-exon-seq alt-up-exon-seq alt-down-exon-seq
strand cds-start cds-end pos ref)
ref-include-ter-site (include-ter-site? rg pos ref)
apply-offset* (apply-offset pos ref alt alt-exon-ranges* ref-include-ter-site)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply-offset and apply-offset* become more readable when you use the partial function.

(defn- is-deletion-variant?
[ref alt]
(or (and (= 1 (count alt))
(= (first ref) (first alt)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tricky condition. How about checking the length of ref?

Copy link
Member

@federkasten federkasten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. LGTM 🙆‍♂️

@federkasten federkasten merged commit a9d0829 into master Dec 25, 2023
31 checks passed
@federkasten federkasten deleted the fix/seq-info-and-delins branch December 25, 2023 09:28
@nokara26 nokara26 mentioned this pull request Dec 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect translation for unaffected stop codon
2 participants