-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of deep decoding of BAM files. #77
Conversation
Codecov Report
@@ Coverage Diff @@
## master #77 +/- ##
==========================================
- Coverage 83.04% 82.76% -0.29%
==========================================
Files 59 59
Lines 3833 3840 +7
Branches 409 422 +13
==========================================
- Hits 3183 3178 -5
+ Misses 241 240 -1
- Partials 409 422 +13
Continue to review full report at Codecov.
|
90f12a1
to
7ec9d15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good performance, but some revisions are required. Please check comments.
PS. I tested performance of direct linking before. As a result, there were no significant difference in speed and size (size slightly reduced). So I didn't enable it. Of course, enabling it does not matter.
src/cljam/bam/decoder.clj
Outdated
[(.toString sb) ref-length])))) | ||
|
||
(defrecord SAMAlignment [^String qname ^int flag ^String rname ^int pos ^int end ^int mapq ^String cigar | ||
^String rnext ^int pnext ^int tlen ^String seq ^String qual options]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cljam.bam.reader/read-alignments
returns SAMAlignment
s, but cljam.sam.reader/read-alignments
still returns maps. Both read-alignments
should return SAMAlignment
consistently. Other defrecord
location should be considered.
Additionally, type hints of record fields seem to have no effect in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed cljam.sam.reader/read-alignments
to return SAMAlignment
s.
SAMAlignment
is now defined in cljam.io
without ^String
type hints.
[refs id] | ||
(if (<= 0 id (dec (count refs))) | ||
[refs ^long id] | ||
(when (<= 0 id (dec (count refs))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using when
? when
should be used for multi forms or forms including side effects. I think if
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer when
because one can tell at a glance that the function may return nil
.
In clojure.core.clj, if
s without third arg are less than 1% of entire use of if
, while when
with single body is commonly used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think existence of side effects is more important than whether the function return nil
. The third argument of if
is not a serious matter. Well, OK in this case because style guide about if/when
hasn't been established yet.
src/cljam/bam/decoder.clj
Outdated
l-read-name | ||
n-cigar-op | ||
l-seq)) | ||
pos (inc (.getInt buffer)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes about buffer reading reduces readability and component merit. I think enhancing each lsb/read-***
function with protocol is better. If so, performance of cljam.bcf.reader
will improve as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced lsb/read-***
functions with a protocol and extensions.
Thank you for the reviewing! I confirmed that effects on the performance by these changes are limited. |
Thank you for fixes :) |
Thank you so much! |
This PR adds some performance optimization for reading BAM files.
defrecord
for alignment instead of maplazy-seq
case
:end
while decoding cigar code and cache themByteBuffer
.Sequential reading will get 50~60% faster when
:options
are not referred and 30~40% faster with:options
.