New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
GFF3 I/O #143
GFF3 I/O #143
Conversation
Codecov Report
@@ Coverage Diff @@
## master #143 +/- ##
==========================================
+ Coverage 85.5% 85.74% +0.23%
==========================================
Files 68 69 +1
Lines 4580 4825 +245
Branches 444 468 +24
==========================================
+ Hits 3916 4137 +221
Misses 220 220
- Partials 444 468 +24
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for enhancement. I think gff i/o may be in cljam for now. We probably should separate i/o and algo.
I added a few comments, please check them.
src/cljam/io/gff.clj
Outdated
:encoder encode-db, :decoder decode-db}, | ||
"Ontology_term" {:index 9, :key :ontology-term, | ||
:encoder encode-db, :decoder decode-db}, | ||
"Is_circular" {:index 10, :key :is-circular?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel :circular?
is more Clojure-ish.
src/cljam/io/gff.clj
Outdated
:end (p/as-long end) | ||
:score (p/as-double score) | ||
;; +: forward, -: reverse, ?: unknown, nil: not-stranded | ||
:strand (some-> strand dot->nil first (case \+ \+ \- \- \? \?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strand expression differs from varity. Such a slight difference might cause confusion, so you should choose a way from the following for consistency:
- string (e.g.
"+"
,"-"
) - varity style - character (e.g.
\+
,\-
) - varity change required - keyword (e.g.
:forward
,:reverse
) - varity change required
2 is incomplete in terms of both compatibility and predefined value. I think 3 is better, though it is breaking change for varity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @alumi. Thank you for another nice contribution!
I added some minor comments.
src/cljam/io/gff.clj
Outdated
(when-not version-directive | ||
(throw | ||
(ex-info | ||
"GFF3 must starts with a `##gff-version 3.#.#` directive" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"must starts" -> "must start"
Also, "with the #gff-version ..." feels more appropriate (at least to me) rather than "with a #gff-version ..." since the GFF spec says "the ##gff-version
pragma only appears once in a file."
src/cljam/io/gff.clj
Outdated
|
||
(defn ^GFFWriter writer | ||
"Returns an open `cljam.io.gff.GFFWriter` instance of `f`. Should be used | ||
inside `with-open` to ensure the writer is properly closed. Can take a optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a optional" -> "an optional"
src/cljam/io/gff.clj
Outdated
inside `with-open` to ensure the writer is properly closed. Can take a optional | ||
argument `options`, a map containing `:version`, `:major-revision`, | ||
`:minor-revision` and `:encoding`. Currently supporting only `:version` 3. | ||
To compress outputs, set `:gzip` or `:bzip2` to `:encoding`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"set ... to :encoding" -> "set :encoding to ..."
src/cljam/io/gff.clj
Outdated
|
||
(defn- ^String encode-target [{:keys [chr start end reverse?]}] | ||
(cstr/join \space (cond-> [(encode escape-in-target? chr) start end] | ||
(some? reverse?) (conj (if reverse? \- \+))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(some? reverse?)
I think it's too more specific than necessary. Just reverse?
will do, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry! I misunderstood this code.
reverse?
is three-valued, so it's necessary to distinguish nil
from false
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. I added a comment there to clarify the meaning. Thanks!
LGTM 馃憤 |
Sorry for pushing a commit after review 馃檱
|
Thanks! Good job! |
Summary
This PR adds basic reader/writer support for GFF3: general feature format version 3.
I'm not sure which repo this PR should go to, cljam or varity, since both refGene and gff3 are widely used for analyzing transcripts but most of I/O modules are implemented in cljam. I'll reopen this PR in varity if it is more suitable. Thank you.
Tests
lein check
馃啑lein test :all
馃啑