Refactor BAI I/O #96
Codecov Report
@@ Coverage Diff @@
## master #96 +/- ##
==========================================
+ Coverage 83.28% 83.86% +0.57%
==========================================
Files 60 60
Lines 4129 4122 -7
Branches 444 430 -14
==========================================
+ Hits 3439 3457 +18
+ Misses 246 235 -11
+ Partials 444 430 -14
Continue to review full report at Codecov.
Good refinement. I've pointed out a few bugs and improvements. Please check them.
src/cljam/io/bam_index/writer.clj
[cljam.io.bam-index.chunk :as chunk])
(:import [java.nio ByteBuffer ByteOrder]))
[cljam.io.bam-index.chunk :as chunk]
[cljam.io.bam.decoder])
`cljam.io.bam.decoder` does not seem to be used.
I noticed that the decoder was used to decode in parallel for better performance, so I restored the decoding part.
Thank you.
src/cljam/io/bam/core.clj
(throw (IOException. "Could not find BAM Index file"))))))
(defn- bam-index
"Load an index file (BAI) for the given BAM file."
[bam-path & [{:keys [ignore]}]]
`& []` allows meaningless arguments. `[bam-path {:keys [ignore]}]` is simple and strict.
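To illustrate the reviewer's point, here is a minimal sketch with two hypothetical functions, `loose-index` and `strict-index`, standing in for `bam-index` (they are not cljam code and only echo their inputs). The variadic `& [...]` form silently accepts extra trailing arguments, while the fixed-arity form suggested above rejects them with an `ArityException`:

```clojure
;; Hypothetical stand-ins for bam-index; they only echo their inputs.

;; Variadic form: the rest args are destructured, so any extra
;; trailing arguments are silently ignored.
(defn loose-index
  [bam-path & [{:keys [ignore]}]]
  {:path bam-path :ignore ignore})

;; Fixed-arity form (the reviewer's suggestion): exactly two arguments.
(defn strict-index
  [bam-path {:keys [ignore]}]
  {:path bam-path :ignore ignore})

;; Accepted, even though :meaningless and :args mean nothing here.
(loose-index "foo.bam" {:ignore true} :meaningless :args)

;; Rejected at call time with clojure.lang.ArityException.
(try
  (strict-index "foo.bam" {:ignore true} :meaningless)
  (catch clojure.lang.ArityException e
    (println "rejected:" (.getMessage e))))
```

The strict version turns a silent mistake into an immediate error at the call site.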
src/cljam/io/bam/core.clj
(let [bai-path (->> ["$1.bai" ".bai" "$1.BAI" ".BAI"]
(eduction
(comp
(map #(cstr/replace bam-path #"(?i)(\.bam)" %))
`$` should be added to this regexp: `#"(?i)(\.bam)$"`. Otherwise the parent path is affected too. For example, `path/to/foo.bam_dir/foo.bam` is replaced by `path/to/foo.bam.bai_dir/foo.bam.bai`.
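A minimal sketch of the difference, using `clojure.string/replace` with the same templates as the diff; `bai-templates` and `bai-candidates` are hypothetical names introduced for illustration. With a pattern and a string replacement, `clojure.string/replace` substitutes `$1` with the first captured group and rewrites every match, which is why the unanchored pattern also touches the directory name:

```clojure
(require '[clojure.string :as cstr])

;; Index-name templates from the diff; "$1" refers to the captured
;; ".bam" extension (pattern/string replacement semantics).
(def bai-templates ["$1.bai" ".bai" "$1.BAI" ".BAI"])

(defn bai-candidates
  "Candidate index paths for bam-path, built by replacing matches of re."
  [bam-path re]
  (map #(cstr/replace bam-path re %) bai-templates))

;; Unanchored: every ".bam" in the path is rewritten, including the directory.
(first (bai-candidates "path/to/foo.bam_dir/foo.bam" #"(?i)(\.bam)"))
;; => "path/to/foo.bam.bai_dir/foo.bam.bai"

;; Anchored with $: only the trailing extension is rewritten.
(bai-candidates "path/to/foo.bam_dir/foo.bam" #"(?i)(\.bam)$")
;; => ("path/to/foo.bam_dir/foo.bam.bai" "path/to/foo.bam_dir/foo.bai"
;;     "path/to/foo.bam_dir/foo.bam.BAI" "path/to/foo.bam_dir/foo.BAI")
```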
Thanks for pointing that out! I didn't notice that. 🤕
src/cljam/io/bam/core.clj
(map #(cstr/replace bam-path #"(?i)(\.bam)" %))
(filter #(.isFile (cio/file %)))))
first)]
(if (and bai-path (not ignore))
(bam-index "test-resources/bam/test.bam" {:ignore true})
;;=> FileNotFoundException
`:ignore true` throws an exception regardless of whether the index exists or not. Please ignore the index when `:ignore true` is specified.
Thanks for your comment.
As you said, I've changed the behavior from the master version in this PR.
The `:ignore` option was originally introduced to skip loading an index file before constructing BAMReader.
But because `cljam.io.bam.core/bam-index` is now always called inside a `delay`, the index file is actually loaded (or an exception is thrown) only the first time a random access is requested.
So it's just a matter of where the exception is thrown:
- master: the nil-handling part in the BAM reader
- this PR: inside the bam-index function
And I think the latter is simpler and more consistent.
What do you think of that?
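The effect of the `delay` can be sketched as follows, assuming hypothetical `load-index` and `make-reader` functions (not cljam's API) in which the index load always fails. Constructing the reader does not throw; the exception surfaces only when the delay is first dereferenced, i.e. on the first random access:

```clojure
(import '[java.io IOException])

;; Hypothetical stand-in for bam-index: always fails, as if no index exists.
(defn load-index [bam-path]
  (throw (IOException. (str "Could not find BAM Index file for " bam-path))))

;; Constructing the reader only wraps the load in a delay; nothing runs yet.
(defn make-reader [bam-path]
  {:path bam-path
   :index (delay (load-index bam-path))})

(def rdr (make-reader "foo.bam"))   ; no exception here
(realized? (:index rdr))            ; => false

;; The first deref runs load-index, and only then does the exception surface.
(try
  @(:index rdr)
  (catch IOException e
    (println "thrown on first access:" (.getMessage e))))
```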
I understand your intention and agree to the change. But in that case, the `:ignore` option no longer seems necessary. Is it kept for compatibility?
Yes, for a while it has served only as a kind of assertion...
I don't think the compatibility issues in my other products will be hard to handle.
I'll remove the option if that's okay with you 🤓
I think there is no problem. Please go ahead. 😄
I added a commit that removes the `:ignore` option, and I confirmed that it passes `test :all`.
src/cljam/io/bam_index/writer.clj
@@ -76,11 +77,11 @@
(defn- update-bin-index
[bin-index aln]
(let [bin (sam-util/compute-bin aln)
achunk (:chunk (:meta aln))]
achunk ^Chunk (chunk/map->Chunk (:chunk (:meta aln)))]
A record's `map->Foo` function is slow. I recommend using the `->Foo` function or the Java constructor `Foo.` instead.
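For reference, the three ways to build a record are sketched below with a hypothetical two-field `Chunk` (the field names `beg` and `end` are assumptions, not necessarily cljam's definition). All three yield equal records; `map->Chunk` presumably costs more because it reads each field out of the input map and keeps any leftover keys, whereas `->Chunk` and the constructor pass fields positionally, which matters in a hot path like `update-bin-index`:

```clojure
;; Hypothetical record; cljam's actual Chunk fields may differ.
(defrecord Chunk [beg end])

(def m {:beg 10 :end 20})

;; map->Chunk: builds the record from a map, looking up each field
;; and collecting any extra keys into the record's extension map.
(map->Chunk m)

;; ->Chunk: positional factory function.
(->Chunk (:beg m) (:end m))

;; Java constructor: the same positional construction, called directly.
(Chunk. (:beg m) (:end m))
```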
Thanks for your advice. I replaced it with `Chunk.`.
d61c9d7 to 36364d8
Thank you for reviewing! I have a question about ignoring bai: #96 (comment)
e029b45 to 67bbf35
LGTM. Thank you!
I appreciate your helpful comments 😄
summary
Overall refactoring of the BAI module, including performance improvements.
changes
- For "foo.bam", the index files ["foo.bam.bai", "foo.bai", "foo.bam.BAI", "foo.BAI"] are now looked up, in that order.
Thanks.
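The index lookup can be sketched as a standalone function reconstructed from the diff fragments in this thread, with the reviewer's `$` anchor applied. `find-bai` is a hypothetical name, and this is not necessarily the merged code. It returns the first candidate that exists as a regular file, or `nil`:

```clojure
(require '[clojure.java.io :as cio]
         '[clojure.string :as cstr])

(defn find-bai
  "Returns the first existing index path for bam-path, or nil.
  Candidates are tried in order: foo.bam.bai, foo.bai, foo.bam.BAI, foo.BAI."
  [bam-path]
  (->> ["$1.bai" ".bai" "$1.BAI" ".BAI"]
       (eduction
        (comp
         ;; Replace only the trailing .bam extension (case-insensitive).
         (map #(cstr/replace bam-path #"(?i)(\.bam)$" %))
         ;; Keep only candidates that exist as regular files.
         (filter #(.isFile (cio/file %)))))
       first))

;; Usage with a temporary directory containing foo.bam and foo.bai only.
(let [dir (.toFile (java.nio.file.Files/createTempDirectory
                    "bai-demo" (make-array java.nio.file.attribute.FileAttribute 0)))
      bam (cio/file dir "foo.bam")]
  (spit bam "")
  (spit (cio/file dir "foo.bai") "")
  (println (find-bai (.getPath bam))))
```

Because the candidates are tried in vector order, foo.bam.bai takes precedence over foo.bai when both exist.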