
Annotate fusion genes #52

Merged: totakke merged 9 commits into master from feature/annotate-fusion on Jun 29, 2022

alumi (Member) commented Jun 27, 2022

This PR adds a new namespace varity.fusion to annotate fusion genes for breakpoints.

varity.fusion expects a VCF-style variant as input.
varity.fusion/variant->breakpoints converts a variant into breakpoint entries, one per allele.

(defn variant->breakpoints
"Converts each allele in a VCF-style variant into a map used in
`varity.fusion`. Returns a lazy sequence of the same length with alleles."
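
For orientation only: the map layout produced by `variant->breakpoints` is not shown in this conversation, so the following is a minimal sketch of how a single VCF breakend ALT (one of the notations VCF uses for breakpoints) could be decomposed. The function name `parse-breakend` and the keys `:mate-extends` and `:joined` are hypothetical and are not taken from varity.

```clojure
;; Hypothetical helper, NOT part of varity.fusion.
;; VCF breakend ALTs take one of four forms: t[p[  t]p]  ]p]t  [p[t
;; where t is the local base(s) and p is the mate position "chr:pos".
(defn parse-breakend
  "Parses a VCF breakend ALT string such as \"G]17:198982]\".
  Returns nil when `alt` is not in breakend notation."
  [alt]
  (when-let [[_ pre bracket chr pos _post]
             (re-matches
              #"([ACGTNacgtn.]*)([\[\]])([^:\[\]]+):(\d+)[\[\]]([ACGTNacgtn.]*)"
              alt)]
    {:chr chr
     :pos (Long/parseLong pos)
     ;; "[" means the mate piece extends to the right of pos, "]" to the left
     :mate-extends (if (= bracket "[") :right :left)
     ;; bases written before the bracket mean the mate is joined after them
     :joined (if (seq pre) :after :before)}))

(parse-breakend "G]17:198982]")
;; => {:chr "17", :pos 198982, :mate-extends :left, :joined :after}
```

The sketch ignores symbolic alleles and contig names containing colons; it only illustrates the kind of information a breakpoint entry has to carry.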

Then, varity.fusion/fusion-transcripts annotates fusion transcripts for the breakpoint.
It is a convenience function that simply combines the following three functions.

(defn fusion-transcripts
"Like `fusion-genes` but returns only in-frame and actually fused genes. An
amino acid sequence is added as `:transcript` for each candidates."

  1. varity.fusion/fusion-genes searches for all possible gene pairs from refGene and determines if each pair can form a valid fusion transcript.

    (defn fusion-genes
    "Returns a lazy sequence of annotations of all possible combinations of fusion
    genes. An annotation is a map containing the following key-values:
    - `:breakpoints` Given breakpoints in the order of 5'-part, 3'-part.
    - `:genes` Candidate genes for each of the `:breakpoints`.
    - `:fusion?` `true` iff the genes form a fused transcript.
    - `:breakpoint-regions` Genic regions for each of the `:breakpoints`.
    - `:transcript-regions` A sequence of regions corresponding to each exon of a
    resulting transcript. May not be fused or may be `nil` if no transcription is
    expected."

  2. varity.fusion/in-frame? checks whether the fused sequence preserves the reading frame.

    (defn in-frame?

  3. varity.fusion/transcript gives an amino acid sequence for a fusion gene.

    (defn transcript
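
The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from this PR: the names `in-frame-sketch?` and `in-frame-pairs`, and the candidate maps with `:name` and `:cds-len`, are hypothetical. It assumes that a fusion keeps the reading frame when the coding bases kept from the 5' partner plus any inserted bases leave the 3' partner in the same phase as the coding bases skipped from it, and it ignores UTR handling entirely.

```clojure
;; Hypothetical helpers, NOT the varity.fusion implementation.
(defn in-frame-sketch?
  "True when the junction preserves the 3' partner's reading frame.
  kept-5p-cds-len    - coding bases kept from the 5' partner
  inserted-len       - bases inserted at the junction
  skipped-3p-cds-len - coding bases of the 3' partner upstream of the breakpoint"
  [kept-5p-cds-len inserted-len skipped-3p-cds-len]
  (= (mod (+ kept-5p-cds-len inserted-len) 3)
     (mod skipped-3p-cds-len 3)))

(defn in-frame-pairs
  "Enumerates every 5'/3' candidate pair and keeps only the in-frame ones.
  Each candidate is a map with :name and :cds-len (both assumed keys)."
  [five-prime-candidates three-prime-candidates inserted-len]
  (for [g5 five-prime-candidates
        g3 three-prime-candidates
        :when (in-frame-sketch? (:cds-len g5) inserted-len (:cds-len g3))]
    [(:name g5) (:name g3)]))

(in-frame-sketch? 300 0 150) ;; => true
(in-frame-sketch? 301 0 150) ;; => false
(in-frame-sketch? 301 2 150) ;; => true, the 2 inserted bases restore the frame
```

The second and third calls show why inserted sequences matter: an insertion at the junction can turn an out-of-frame pair into an in-frame one, and vice versa.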


The annotation algorithm was originally developed at the Cancer Precision Medicine Center, Japanese Foundation for Cancer Research. I re-implemented it with some additional features, such as:

  • support for inserted sequences
  • specialized handling of UTRs
  • wider support of VCF notations

Thankfully, @athos, @r6eve and @k-kom have already reviewed this PR internally.

codecov bot commented Jun 27, 2022

Codecov Report

Merging #52 (9a0dfdb) into master (f84c720) will increase coverage by 1.75%.
The diff coverage is 65.77%.

@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   43.49%   45.24%   +1.75%     
==========================================
  Files          15       16       +1     
  Lines        1805     1954     +149     
  Branches       39       61      +22     
==========================================
+ Hits          785      884      +99     
- Misses        981     1009      +28     
- Partials       39       61      +22     
| Impacted Files | Coverage Δ |
|---|---|
| src/varity/fusion.clj | 65.77% <65.77%> (ø) |
| src/varity/ref_gene.clj | 80.75% <0.00%> (+0.29%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

totakke (Member) left a comment

Thank you for the great feature. The implementation looks good to me. I have left a few trivial comments.

@@ -24,7 +24,8 @@
 :1.8 {:dependencies [[org.clojure/clojure "1.8.0"]
 [clojure-future-spec "1.9.0"]]}
 :1.9 {:dependencies [[org.clojure/clojure "1.9.0"]]}
-:1.10 {:dependencies [[org.clojure/clojure "1.10.3"]]}}
+:1.10 {:dependencies [[org.clojure/clojure "1.10.3"]]}
+:1.11 {:dependencies [[org.clojure/clojure "1.11.1"]]}}

Please also add 1.11 to .github/workflows/build.yml.

clojure: ['1.8', '1.9', '1.10']

README.md Outdated
@@ -130,3 +130,7 @@ To convert a genomic coordinate between assemblies,
Copyright 2017-2021 [Xcoo, Inc.](https://xcoo.jp/)

Licensed under the [Apache License, Version 2.0](LICENSE).

## Acknowledgement

The plural, Acknowledgements, is generally used.

alumi (Member, Author) commented Jun 29, 2022

@totakke Thank you for your review! 🙏 I fixed the problems you pointed out in bf7445b and 9a0dfdb.

totakke (Member) left a comment

Speedy. Thank you.

totakke merged commit af3c5a3 into master on Jun 29, 2022
totakke deleted the feature/annotate-fusion branch on June 29, 2022 09:37