Add support for CSI reader. #182
Codecov Report
@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   86.48%   86.64%   +0.15%
==========================================
  Files          75       76       +1
  Lines        5926     6011      +85
  Branches      497      499       +2
==========================================
+ Hits         5125     5208      +83
  Misses        304      304
- Partials      497      499       +2
Thanks for working on this PR! I've added some comments. I think we need to check the implementation of get-spans carefully.
src/cljam/io/util/bin.clj
Outdated
(range (+
t
(bit-shift-right beg s))
(+ t 1 (bit-shift-right (- end 1) s)))))
Suggested change:
- (+ t 1 (bit-shift-right (- end 1) s)))))
+ (+ t 1 (bit-shift-right (dec end) s)))))
Thank you for the new CSI feature!
Sorry, I'm only halfway through, but I've left some minor comments.
Force-pushed from 2d54548 to 9f3df7a.
Thank you for pointing that out.
Thank you for the fix.
I left comments about unused imports.
src/cljam/io/util/bin.clj
Outdated
@@ -1,44 +1,45 @@
 (ns cljam.io.util.bin
-  (:require [cljam.io.util.chunk :as util-chunk]))
+  (:require [cljam.io.util.chunk :as util-chunk])
+  (:import [cljam.io.util.chunk Chunk]))
This import is unused. clj-kondo can catch that.
test/cljam/io/csi_test.clj
Outdated
[cljam.io.csi :as csi])
(:import
[cljam.io.csi CSI]
[cljam.io.util.chunk Chunk]))
Ditto.
I've added more comments.
src/cljam/io/csi.clj
Outdated
(->> (filter #(< % (last bins)) oct-cumulative-sums)
last
inc)
oct-cumulative-sums is sorted in ascending order, so take-while is better.
(->> (take-while #(< % (last bins)) oct-cumulative-sums)
last
inc)
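A minimal sketch of the point above, assuming depth 5 and a last bin of 4681 purely for illustration: on ascending input, take-while returns the same prefix as filter but stops at the first element that fails the predicate, so it even terminates on an unbounded sequence.

```clojure
;; Assumed value for illustration only; the real depth comes from the CSI header.
(def depth 5)

;; Cumulative sums of 8^0, 8^1, ..., 8^depth. Ascending by construction.
(def oct-cumulative-sums
  (->> (range (inc depth))
       (map #(bit-shift-left 1 (* % 3)))
       (reductions +)))

;; Same result as filter on sorted input, but stops scanning at the first failure.
(def below-last-bin
  (take-while #(< % 4681) oct-cumulative-sums))

;; take-while also terminates on an infinite ascending sequence; filter would not.
(def small-sums
  (take-while #(< % 100)
              (reductions + (map #(bit-shift-left 1 (* % 3)) (range)))))
```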
src/cljam/io/csi.clj
Outdated
target-bin
(first
(filter #(< min-index %) bins))
bins is also sorted, so the leading bins can be skipped with drop-while.
target-bin (first (drop-while #(<= % min-index) bins))
src/cljam/io/csi.clj
Outdated
res (get (loffset ref-idx) target-bin)]
(if res res 0)))
This could be (get (loffset ref-idx) target-bin 0)
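A small sketch of the three-argument get, using a hypothetical loffset map (reference index to a map of bin to l-offset) and a hypothetical helper name bin-offset. Note one subtle difference from (if res res 0): the default applies only when the key is absent, so a key stored with a nil value would still yield nil.

```clojure
;; Hypothetical data: reference index -> {bin -> l-offset}.
(def loffset {0 {4681 1024}})

(defn bin-offset
  "L-offset stored for target-bin, or 0 when the bin (or the reference) is missing."
  [ref-idx target-bin]
  (get (loffset ref-idx) target-bin 0))
```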
test/cljam/io/util/bin_test.clj
Outdated
(util-bin/get-spans csi* 0 1 100000)))))

(deftest reg->bins-test
(are [x] (= [1 9 73 585 4681]
Suggested change:
- (are [x] (= [1 9 73 585 4681]
+ (are [x] (= [0 1 9 73 585 4681]
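To see why bin 0 belongs in the expected value, here is a rough sketch of the bin enumeration. It is not the PR's reg->bins: the name reg->bins-sketch, the helper definition, and the 1-based inclusive coordinate convention are all assumptions made for illustration. The root bin 0 covers the whole coordinate range, so it overlaps every region.

```clojure
(defn first-bin-of-level
  "First bin number at `level`, i.e. (8^level - 1) / 7."
  [level]
  (quot (dec (bit-shift-left 1 (* 3 level))) 7))

(defn reg->bins-sketch
  "Bins overlapping the 1-based inclusive region [beg, end] (assumed convention)."
  [beg end min-shift depth]
  (let [beg0 (dec beg)] ; 0-based start; `end` doubles as the 0-based exclusive end
    (vec
     (mapcat (fn [level]
               (let [t (first-bin-of-level level)
                     ;; width of a bin at this level is 2^s bases
                     s (+ min-shift (* 3 (- depth level)))]
                 (range (+ t (bit-shift-right beg0 s))
                        (+ t 1 (bit-shift-right (dec end) s)))))
             (range (inc depth))))))
```

With min-shift 14 and depth 5, a region covering only the first base yields one bin per level, starting with the root bin 0.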
src/cljam/io/csi.clj
Outdated
refs (range n-ref)
bins (vec (repeatedly n-ref #(read-bin-index rdr)))
bidx (zipmap refs (map #(into {} (map (juxt :bin :chunks)) %) bins))
max-bin
Redundant line break.
Suggested change:
- max-bin
+ max-bin (apply + (map #(bit-shift-left 1 (* % 3)) (range (inc depth))))
src/cljam/io/csi.clj
Outdated
bidx (zipmap refs (map #(into {} (map (juxt :bin :chunks)) %) bins))
max-bin
(apply + (map #(bit-shift-left 1 (* % 3)) (range (inc depth))))
loffset
ditto
src/cljam/io/csi.clj
Outdated
(get-chunks [_ ref-idx bins]
(into [] (mapcat (get bidx ref-idx) bins)))
(get-min-offset [_ ref-idx beg bins]
(let [oct-cumulative-sums
Redundant line break.
Suggested change:
- (let [oct-cumulative-sums
+ (let [oct-cumulative-sums (->> (range (inc depth))
src/cljam/io/csi.clj
Outdated
util-bin/IBinaryIndex
(get-chunks [_ ref-idx bins]
(into [] (mapcat (get bidx ref-idx) bins)))
(get-min-offset [_ ref-idx beg bins]
I'm sorry, I forgot that CSI doesn't always store l-offsets for all bins the way BAI does.
So, for the case in this comment #182 (comment), when bin 12 doesn't exist in the CSI, we have to check the left bin 11, then the parent bin 5, and so on recursively until we find an existing bin.
Note that bin 11 is not contained in the result of reg->bins in this case.
The current implementation will return the l-offset of bin 13 if 12 doesn't exist, which unintentionally filters out necessary chunks.
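The fallback described above could be sketched roughly as follows. This is an illustration, not the PR's code: find-loffset, parent-bin, and first-sibling are hypothetical names, and the sketch assumes the usual scheme with eight children per bin, so the concrete bin numbers differ from the 11/12/5 example in the comment.

```clojure
(defn parent-bin
  "Parent of `bin`, assuming eight children per bin."
  [bin]
  (bit-shift-right (dec bin) 3))

(defn first-sibling
  "Leftmost bin sharing a parent with `bin`."
  [bin]
  (inc (bit-shift-left (parent-bin bin) 3)))

(defn find-loffset
  "Walks left through siblings, then up to the parent, until a bin with a
  stored l-offset is found. Returns 0 if nothing is found down to the root."
  [loffsets bin]
  (loop [b bin]
    (cond
      (contains? loffsets b) (get loffsets b)
      (zero? b) 0
      (> b (first-sibling b)) (recur (dec b))
      :else (recur (parent-bin b)))))
```

For example, with l-offsets stored only for bins 9 and 1, a lookup starting at bin 12 walks 12, 11, 10 and stops at 9.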
Thank you for pointing that out.
Added some more trivial comments on styling issues.
src/cljam/io/tabix.clj
Outdated
(get (get (.lidx this) ref-idx)
util-bin/IBinningIndex
(get-chunks [_ ref-idx bins]
(into [] (mapcat (get bidx ref-idx) bins)))
Use the mapcat transducer:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (into [] (mapcat (get bidx ref-idx)) bins))
or vec:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (vec (mapcat (get bidx ref-idx) bins)))
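A small sketch of the two suggested forms, using a hypothetical bidx map (reference index to a map of bin to chunks; keywords stand in for Chunk records). The transducer form pours results straight into the vector without building an intermediate lazy sequence; both forms return the same vector, and bins missing from the index contribute nothing.

```clojure
;; Hypothetical index: reference 0 has chunks in bins 1 and 9 only.
(def bidx {0 {1 [:c1 :c2], 9 [:c3]}})

(defn get-chunks-xf
  "Transducer form: (mapcat f) is applied while filling the vector."
  [ref-idx bins]
  (into [] (mapcat (get bidx ref-idx)) bins))

(defn get-chunks-vec
  "Sequence form: builds a lazy seq first, then copies it into a vector."
  [ref-idx bins]
  (vec (mapcat (get bidx ref-idx) bins)))
```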
src/cljam/io/bam_index/core.clj
Outdated
(util-bin/pos->lidx-offset beg common/linear-index-shift) 0)))
util-bin/IBinningIndex
(get-chunks [_ ref-idx bins]
(into [] (mapcat (get bidx ref-idx) bins)))
Use the mapcat transducer:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (into [] (mapcat (get bidx ref-idx)) bins))
or vec:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (vec (mapcat (get bidx ref-idx) bins)))
src/cljam/io/util/bin.clj
Outdated
(fn [^long d]
(let [t
(apply + (map (fn [^long x] (bit-shift-left 1 (* x 3)))
(range (+ d 1))))
Suggested change:
- (range (+ d 1))))
+ (range (inc d))))
src/cljam/io/util/bin.clj
Outdated
(range (+ d 1))))
s (+ min-shift (* 3 (- depth d 1)))]
(range (+ t (bit-shift-right beg s))
(+ t 1 (bit-shift-right (- end 1) s))))))
Suggested change:
- (+ t 1 (bit-shift-right (- end 1) s))))))
+ (+ t 1 (bit-shift-right (dec end) s))))))
src/cljam/io/csi.clj
Outdated
(deftype CSI [n-ref min-shift depth bidx loffset]
util-bin/IBinningIndex
(get-chunks [_ ref-idx bins]
(into [] (mapcat (get bidx ref-idx) bins)))
Use the mapcat transducer:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (into [] (mapcat (get bidx ref-idx)) bins))
or vec:
- (into [] (mapcat (get bidx ref-idx) bins)))
+ (vec (mapcat (get bidx ref-idx) bins)))
src/cljam/io/csi.clj
Outdated
last)]
(or (->> bins
(drop-while #(> min-index %))
(keep #(get (loffset ref-idx) %))
Suggested change:
- (keep #(get (loffset ref-idx) %))
+ (keep (loffset ref-idx))
src/cljam/io/csi.clj
Outdated
(let [oct-cumulative-sums (->> (range (inc depth))
(map #(bit-shift-left 1 (* % 3)))
(reductions +))
min-index (->> (take-while #(<= % (last bins)) oct-cumulative-sums)
Prefer (foo bar) over a threading macro with only two arguments, (->> bar foo):
- min-index (->> (take-while #(<= % (last bins)) oct-cumulative-sums)
+ min-index (last (take-while #(<= % (last bins)) oct-cumulative-sums))
or more concisely
(let [min-index (->> (range (inc depth))
(map #(bit-shift-left 1 (* % 3)))
(reductions +)
(take-while #(<= % (last bins)))
last)]
src/cljam/io/csi.clj
Outdated
min-index (->> (take-while #(<= % (last bins)) oct-cumulative-sums)
last)]
(or (->> bins
(drop-while #(> min-index %))
Extra spaces between > and min-index. Also, partial is preferable: https://guide.clojure.style/#partial
Suggested change:
- (drop-while #(> min-index %))
+ (drop-while (partial > min-index))
Force-pushed from e6b11b6 to 0706f5c.
Sorry for the late reply.
Thanks for fixing. I left more minor comments.
src/cljam/io/util/bin.clj
Outdated
(defn bin-beg ^long [^long bin ^long min-shift ^long depth]
(let [level (bin-level bin)]
(inc (* (- bin (first-bin-of-level level))
(bin-width-of-level level min-shift depth)))))

(def ^:private reg->bins (memoize reg->bins*))
The memoized version is defined a little too far from the original function.
src/cljam/io/csi.clj
Outdated
(comp (map (juxt :bin :loffset))
(filter #(<= (first %) max-bin))
(map
#(vector
(util-bin/bin-beg
(first %)
min-shift
depth)
(second %))))
Prefer literal collection syntax https://guide.clojure.style/#literal-col-syntax
(comp (map (juxt :bin :loffset))
(filter #(<= (first %) max-bin))
(map (fn [[bin loffset]]
[(util-bin/bin-beg bin
min-shift
depth)
loffset])))
test/cljam/io/util/bin_test.clj
Outdated
(is (= 1 (util-bin/bin-beg 4681 index-shift 5)))
(is (= (+ 1 min-size) (util-bin/bin-beg 4682 index-shift 5)))
(is (= (+ 1 (* min-size 2)) (util-bin/bin-beg 4683 index-shift 5)))
(is (= 1 (util-bin/bin-beg 585 14 5)))
(is (= (+ 1 (* min-size 8)) (util-bin/bin-beg 586 index-shift 5)))
(is (= (+ 1 (* min-size 16)) (util-bin/bin-beg 587 index-shift 5)))
(is (= (+ 1 (* min-size 24)) (util-bin/bin-beg 588 index-shift 5))))))
Extra spaces here.
src/cljam/io/csi.clj
Outdated
(defn- read-chunks!
[rdr]
(->> #(Chunk. (lsb/read-long rdr) (lsb/read-long rdr))
Prefer constructor functions. https://guide.clojure.style/#record-constructors
src/cljam/io/csi.clj
Outdated
(->> #(hash-map
:bin (lsb/read-int rdr)
:loffset (lsb/read-long rdr)
:chunks (read-chunks! rdr))
In the current implementation of Clojure, function arguments are indeed evaluated from left to right, but relying on that order is not a well-known style.
I think binding the values in a let form before calling hash-map would be more reasonable for side effects. Please correct me if I'm wrong.
https://clojure.org/reference/evaluation
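A minimal sketch of that suggestion. make-reader is a hypothetical stand-in for the binary reader (it simply hands out the next value from a list), and read-bin-entry! is an illustrative name. Because let bindings are evaluated sequentially, the read order is spelled out in the code instead of depending on argument evaluation order.

```clojure
(defn make-reader
  "Hypothetical stand-in for a binary reader: returns a fn yielding the next value."
  [xs]
  (let [remaining (atom (seq xs))]
    (fn []
      (let [x (first @remaining)]
        (swap! remaining next)
        x))))

(defn read-bin-entry!
  "Reads bin, l-offset, and chunks in that order, one side effect per binding."
  [read-next!]
  (let [bin     (read-next!)
        loffset (read-next!)
        chunks  (read-next!)]
    {:bin bin :loffset loffset :chunks chunks}))
```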
Thank you for pointing that out and for the suggestions.
src/cljam/io/csi.clj
Outdated
(defn- read-chunks!
[rdr]
(->> #(chunk/->Chunk (lsb/read-long rdr) (lsb/read-long rdr))
Sorry for nagging, but would you use a let form for the side effects here as well?
Force-pushed from c0387f0 to 4d08f46.
I'm sorry I overlooked that.
Thanks for the update 👍 The seeking logic looks good now. Added some trivial comments.
Force-pushed from 474d681 to 2c5dd95.
Thank you for pointing that out.
Force-pushed from 2c5dd95 to be67998.
LGTM 👍
LGTM.
Would you squash the commits into a single one? Then I'll merge it.
Force-pushed from be67998 to a29c922.
Thank you so much for your cooperation!
This PR adds a CSI reader to support random access to BCF files.
I've modified the functions for the other index files so that CSI can be handled by io/util/bin.
The referenced specification is CSIv1.