GFF3 I/O #143

Merged
merged 7 commits into from Aug 28, 2018

alumi (Member) commented Aug 27, 2018

Summary

This PR adds basic reader/writer support for GFF3 (General Feature Format version 3).
I'm not sure whether this PR belongs in cljam or varity: both refGene and GFF3 are widely used for analyzing transcripts, but most of the I/O modules are implemented in cljam. I'll reopen this PR in varity if that is more suitable. Thank you.

Tests

  • lein check 🆗
  • lein test :all 🆗

codecov bot commented Aug 27, 2018

Codecov Report

Merging #143 into master will increase coverage by 0.23%.
The diff coverage is 90.2%.

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage    85.5%   85.74%   +0.23%     
==========================================
  Files          68       69       +1     
  Lines        4580     4825     +245     
  Branches      444      468      +24     
==========================================
+ Hits         3916     4137     +221     
  Misses        220      220              
- Partials      444      468      +24

| Impacted Files | Coverage Δ |
|---|---|
| src/cljam/io/gff.clj | 90.2% <90.2%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0a5b86c...dc33a76.

totakke (Member) left a comment

Thank you for the enhancement. I think GFF I/O can stay in cljam for now. We should probably separate I/O and algorithms at some point.

I added a few comments, please check them.

                 :encoder encode-db, :decoder decode-db},
"Ontology_term" {:index 9, :key :ontology-term,
                 :encoder encode-db, :decoder decode-db},
"Is_circular" {:index 10, :key :is-circular?,

I feel :circular? is more Clojure-ish.

:end (p/as-long end)
:score (p/as-double score)
;; +: forward, -: reverse, ?: unknown, nil: not-stranded
:strand (some-> strand dot->nil first (case \+ \+ \- \- \? \?))

This strand representation differs from the one in varity. Even a slight difference like this might cause confusion, so please choose one of the following for consistency:

  1. string (e.g. "+", "-") - varity style
  2. character (e.g. \+, \-) - varity change required
  3. keyword (e.g. :forward, :reverse) - varity change required

Option 2 is incomplete in terms of both compatibility and predefined values. I think option 3 is better, though it is a breaking change for varity.

athos (Member) left a comment

Hi, @alumi. Thank you for another nice contribution!

I added some minor comments.

(when-not version-directive
  (throw
   (ex-info
    "GFF3 must starts with a `##gff-version 3.#.#` directive"

"must starts" -> "must start"

Also, "with the ##gff-version ..." feels more appropriate (at least to me) than "with a ##gff-version ..." since the GFF spec says "the ##gff-version pragma only appears once in a file."
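
The check under discussion can be sketched roughly as follows. This is a minimal illustration rather than cljam's actual implementation: `parse-version-directive` and its regular expression are hypothetical, and the result keys simply reuse the `:version`, `:major-revision`, and `:minor-revision` names from the writer docstring in this PR. The error message uses the wording suggested above.

```clojure
;; Hypothetical helper: validates the first line of a GFF3 file.
(defn parse-version-directive
  "Returns a map of version numbers parsed from `line`, or throws when
  `line` is not a `##gff-version 3` directive. Revisions are optional."
  [line]
  (if-let [[_ major minor]
           (re-matches #"##gff-version\s+3(?:\.(\d+))?(?:\.(\d+))?\s*"
                       (str line))]
    {:version 3
     ;; Missing revisions stay nil instead of defaulting to 0.
     :major-revision (some-> major Long/parseLong)
     :minor-revision (some-> minor Long/parseLong)}
    (throw
     (ex-info "GFF3 must start with the `##gff-version 3.#.#` directive"
              {:line line}))))

(parse-version-directive "##gff-version 3.1.26")
;=> {:version 3, :major-revision 1, :minor-revision 26}
```

Throwing `ex-info` with the offending line in the data map keeps the failure inspectable by callers.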


(defn ^GFFWriter writer
  "Returns an open `cljam.io.gff.GFFWriter` instance of `f`. Should be used
  inside `with-open` to ensure the writer is properly closed. Can take a optional

"a optional" -> "an optional"

  inside `with-open` to ensure the writer is properly closed. Can take a optional
  argument `options`, a map containing `:version`, `:major-revision`,
  `:minor-revision` and `:encoding`. Currently supporting only `:version` 3.
  To compress outputs, set `:gzip` or `:bzip2` to `:encoding`."

"set ... to :encoding" -> "set :encoding to ..."


(defn- ^String encode-target [{:keys [chr start end reverse?]}]
  (cstr/join \space (cond-> [(encode escape-in-target? chr) start end]
                      (some? reverse?) (conj (if reverse? \- \+)))))

(some? reverse?)

I think this is more specific than necessary. Just `reverse?` will do, right?


Ah, sorry! I misunderstood this code.

reverse? is three-valued, so it's necessary to distinguish nil from false, isn't it?

alumi (Member, Author) replied:

Yeah, you're right. I added a comment there to clarify the meaning. Thanks!
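
To make the point of this thread concrete, here is a small sketch that isolates the `cond->` clause from `encode-target`. The two helper functions are hypothetical and exist only for illustration; they contrast the `some?` test with a plain truthiness test on a three-valued `reverse?`.

```clojure
;; Hypothetical helpers, not part of cljam.

(defn strand-suffix
  "Appends a strand character only when `reverse?` is non-nil."
  [reverse?]
  (cond-> []
    (some? reverse?) (conj (if reverse? \- \+))))

(defn strand-suffix-truthy
  "Same as above, but tests `reverse?` for truthiness instead."
  [reverse?]
  (cond-> []
    reverse? (conj (if reverse? \- \+))))

(strand-suffix true)         ;=> [\-]
(strand-suffix false)        ;=> [\+]
(strand-suffix nil)          ;=> []
(strand-suffix-truthy false) ;=> [], the forward strand is silently dropped
```

With the truthiness test, `false` (forward) and `nil` (unspecified) become indistinguishable, which is why `some?` is needed here.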

alumi (Member, Author) commented Aug 28, 2018

@totakke @athos Thanks for your comments! I added some commits 😃

athos (Member) commented Aug 28, 2018

LGTM 👍

alumi (Member, Author) commented Aug 28, 2018

Sorry for pushing a commit after review 🙇
I changed the strandedness representation of Target to make it consistent with the strand column.

| GFF3 tag | GFF3 value | cljam key | cljam value | semantics |
|---|---|---|---|---|
| strand column | `+` | `:strand` | `:forward` | positive strand |
| strand column | `-` | `:strand` | `:reverse` | minus strand |
| strand column | `?` | `:strand` | `:unknown` | strandedness is relevant but unknown |
| strand column | `.` | `:strand` | `nil` | not-stranded |
| target attr | `+` | `:strand` | `:forward` | positive strand |
| target attr | `-` | `:strand` | `:reverse` | minus strand |
| target attr | | | | unspecified |
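
The strand-column rows of the table could be expressed as a pair of lookup maps. This is only a sketch of the mapping: the var and function names below are hypothetical and may differ from what cljam actually defines, and the Target attribute rows are not covered.

```clojure
;; Hypothetical names; only the value mapping comes from the table.
(def strand->keyword
  {"+" :forward
   "-" :reverse
   "?" :unknown
   "." nil})

(def keyword->strand
  {:forward "+"
   :reverse "-"
   :unknown "?"
   nil      "."})

(defn decode-strand
  "Converts a GFF3 strand column value to its cljam-style value.
  Throws on any value other than the four listed in the table."
  [s]
  (if (contains? strand->keyword s)
    (strand->keyword s)
    (throw (ex-info "Unexpected strand value" {:value s}))))

(defn encode-strand
  "Converts a cljam-style strand value back to the GFF3 column value."
  [k]
  (if (contains? keyword->strand k)
    (keyword->strand k)
    (throw (ex-info "Unexpected strand keyword" {:value k}))))
```

Using `contains?` rather than a plain lookup matters because `nil` is a legitimate value on both sides: it is what `.` decodes to, and it must encode back to `.`.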

totakke merged commit 1d898ee into master Aug 28, 2018
totakke deleted the feature/gff3-io branch August 28, 2018 09:38

totakke (Member) commented Aug 28, 2018

Thanks! Good job!
