reverse_complement: Do line wrapping and reverse complement in one step #55

mbrubeck · 2017-03-04T01:19:42Z

This combines newline handling and the main "reverse complement" step into a single loop. The downside is that it can no longer treat the input as u16 in order to transform two bytes with one table lookup. The advantages are a speedup from making a single pass over the data, and no more unsafe code!

This runs about 18% faster than the master branch on my computer.
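
For readers following along, here is a minimal sketch of the single-pass idea: reverse and complement in one loop while leaving every \n at its original position. It is a simplified two-pointer variant written for illustration, not the chunk-based loop in this PR. The names `complement` and `reverse_complement_in_place` are made up for the example, and only A/C/G/T are handled.

```rust
/// Complement one nucleotide byte. Hypothetical helper: only A/C/G/T are
/// mapped here; every other byte (including b'\n') is returned unchanged.
fn complement(b: u8) -> u8 {
    match b {
        b'A' | b'a' => b'T',
        b'C' | b'c' => b'G',
        b'G' | b'g' => b'C',
        b'T' | b't' => b'A',
        other => other,
    }
}

/// Reverse-complement `seq` in place in a single pass, keeping each b'\n'
/// at the index where it started.
fn reverse_complement_in_place(seq: &mut [u8]) {
    if seq.is_empty() {
        return;
    }
    let mut i = 0;
    let mut j = seq.len() - 1;
    while i < j {
        // Step over line breaks on either side without moving them.
        if seq[i] == b'\n' {
            i += 1;
            continue;
        }
        if seq[j] == b'\n' {
            j -= 1;
            continue;
        }
        // Swap the two bases and complement both.
        let tmp = complement(seq[i]);
        seq[i] = complement(seq[j]);
        seq[j] = tmp;
        i += 1;
        j -= 1;
    }
    // With an odd number of bases, the middle one is complemented in place.
    if i == j {
        seq[i] = complement(seq[i]);
    }
}
```

With this sketch, the 8-byte input ACGT\nAA\n becomes TTAC\nGT\n: the six bases are reverse-complemented and both line breaks stay at indices 4 and 7.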

mbrubeck · 2017-03-04T03:16:14Z

Pushed some minor stylistic fixes.

mbrubeck · 2017-03-04T04:28:59Z

As an extra bonus, the sequence of operations in this version more closely resembles the C program, for anyone who wants to do side-by-side comparisons of the two.

mbrubeck · 2017-03-04T07:52:30Z

src/reverse_complement.rs

-/// Compute the reverse complement with the sequence split into two equal-sized slices.
-fn reverse_complement_left_right(left: &mut [u16], right: &mut [u16], tables: &Tables) {
+/// Compute the reverse complement for two contiguous chunks without line breaks.
+fn reverse_complement_chunk(left: &mut [u8], right: &mut [u8], tables: &Tables) {


We could reintroduce the u16-based lookup inside this function... I'll try this and see what impact it has.

I tried this and it did not change performance at all. The code is at mbrubeck@e133026.
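
To illustrate what "transform two bytes with one table lookup" can mean, here is a safe sketch of a 65536-entry table indexed by a byte pair. It is an assumption about the general technique, not the code from master or from e133026; `build_table16` and `reverse_complement_pair` are hypothetical names, and the sketch assumes each entry also swaps the two bytes.

```rust
/// Complement one nucleotide byte. Hypothetical helper: only A/C/G/T are
/// mapped; all other bytes are returned unchanged.
fn complement(b: u8) -> u8 {
    match b {
        b'A' | b'a' => b'T',
        b'C' | b'c' => b'G',
        b'G' | b'g' => b'C',
        b'T' | b't' => b'A',
        other => other,
    }
}

/// Build a table with one entry per possible pair of bytes. The entry for
/// the in-memory pair [a, b] holds [complement(b), complement(a)], i.e. the
/// pair reversed and complemented.
fn build_table16() -> Vec<u16> {
    (0..=u16::MAX)
        .map(|pair| {
            let [a, b] = pair.to_le_bytes();
            u16::from_le_bytes([complement(b), complement(a)])
        })
        .collect()
}

/// Reverse-complement two adjacent bytes with a single table lookup.
fn reverse_complement_pair(table: &[u16], pair: [u8; 2]) -> [u8; 2] {
    table[u16::from_le_bytes(pair) as usize].to_le_bytes()
}
```

The trade-off in this sketch is a 128 KiB table against half as many lookups, and pairs have to be read at two-byte granularity.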

TeXitoi

I like it.

I think I can write a comment with ASCII art to explain the loop; that would make it easier to understand. I'll try to write it today.

cristicbz · 2017-03-04T15:11:07Z

This is brilliant, I'd love to see it merged!

TeXitoi · 2017-03-04T22:19:52Z

I can't find a better explanation than what is already done. As Isaac closed the previous submission, do you prefer that I submit or that you submit?

And maybe we should wait a few days without updating the benchmark before submitting it. That's too many updates of the same benchmark for Isaac ;-)

cristicbz · 2017-03-04T22:27:22Z

src/reverse_complement.rs

+    for (x, y) in left.iter_mut().zip(right.iter_mut().rev()) {
+        let tmp = tables.cpl8(*x);
+        *x = tables.cpl8(*y);
+        *y = tmp;


FWIW, I think you could write this as *y = tables.cpl8(mem::replace(x, tables.cpl8(*y)))
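
A small sketch comparing the two forms side by side. A free function `complement` stands in for `tables.cpl8` here (an assumption for the sake of a self-contained example), and the function names are hypothetical.

```rust
use std::mem;

/// Stand-in for `tables.cpl8`: only A/C/G/T are mapped in this sketch.
fn complement(b: u8) -> u8 {
    match b {
        b'A' | b'a' => b'T',
        b'C' | b'c' => b'G',
        b'G' | b'g' => b'C',
        b'T' | b't' => b'A',
        other => other,
    }
}

/// The loop as written in the diff, using an explicit temporary.
fn swap_with_tmp(left: &mut [u8], right: &mut [u8]) {
    for (x, y) in left.iter_mut().zip(right.iter_mut().rev()) {
        let tmp = complement(*x);
        *x = complement(*y);
        *y = tmp;
    }
}

/// The suggested one-liner: `mem::replace` stores the new value in `x` and
/// returns the old one, which is then complemented and written to `y`.
fn swap_with_replace(left: &mut [u8], right: &mut [u8]) {
    for (x, y) in left.iter_mut().zip(right.iter_mut().rev()) {
        *y = complement(mem::replace(x, complement(*y)));
    }
}
```

Both functions turn the halves AA and CG into CG and TT, which is the reverse complement of AACG.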

cristicbz · 2017-03-04T22:28:43Z

src/reverse_complement.rs

-const SEQUENTIAL_SIZE: usize = 1024;
+/// Length of a normal line including the terminating \n.
+const LINE_LEN: usize = 61;
+const SEQUENTIAL_SIZE: usize = 2048;


Do you think it'd be worth making this some multiple of 61?

> Do you think it'd be worth making this some multiple of 61?

I don't think it would make any noticeable difference.

cristicbz · 2017-03-04T22:39:39Z

src/reverse_complement.rs

@@ -10,12 +10,11 @@ extern crate rayon;
 extern crate memchr;

 use std::io::{Read, Write};
-use std::{io, ptr, slice};
+use std::{cmp, io, mem};
 use std::fs::File;

 struct Tables {


What do you think about removing this struct and just having a table: &[u8; 256] passed around and a

fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (i, v) in table.iter_mut().enumerate() {
        *v = match i as u8 {
            b'A' | b'a' => b'T',
            b'C' | b'c' => b'G',
            b'G' | b'g' => b'C',
            b'T' | b't' => b'A',
            b'U' | b'u' => b'A',
            b'M' | b'm' => b'K',
            b'R' | b'r' => b'Y',
            b'W' | b'w' => b'W',
            b'S' | b's' => b'S',
            b'Y' | b'y' => b'R',
            b'K' | b'k' => b'M',
            b'V' | b'v' => b'B',
            b'H' | b'h' => b'D',
            b'D' | b'd' => b'H',
            b'B' | b'b' => b'V',
            b'N' | b'n' => b'N',
            // Any other byte maps to itself.
            i => i,
        };
    }
    table
}

called once in main. Or we could even go one better and do what C does:

const TABLE: &'static [u8] = b" \ TVGH CD M KN YSAABW R TVGH CD M KN YSAABW R";

The last one must generate bounds checks, so it's better to have a [u8; 256] so that the bounds checks can be elided.

That's true: you could fill it up all the way to 256 and do const TABLE: &[u8; 256]; it's only 4 lines of 64 chars. But yeah, it's a bit silly.

I used the fn build_table() -> [u8; 256] approach, because I found it the most readable.
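
As a rough illustration of the bounds-check point above: when the table is a [u8; 256] and the index is a u8, every possible index is in range, so the compiler should be able to drop the check. The sketch below uses a shortened, hypothetical `PAIRS` list (A/C/G/T only) and an identity default; it is not the table from this PR.

```rust
/// Hypothetical, abbreviated list of (base, complement) pairs.
const PAIRS: &[(u8, u8)] = &[
    (b'A', b'T'),
    (b'C', b'G'),
    (b'G', b'C'),
    (b'T', b'A'),
];

/// Build a 256-entry table: identity by default, with the listed bases
/// (upper and lower case) mapped to their upper-case complements.
fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (i, v) in table.iter_mut().enumerate() {
        *v = i as u8;
    }
    for &(from, to) in PAIRS {
        table[from as usize] = to;
        table[from.to_ascii_lowercase() as usize] = to;
    }
    table
}

/// Look up one byte. The array length (256) is part of the type and a u8
/// index can never reach it, so this access cannot go out of bounds.
fn complement(table: &[u8; 256], b: u8) -> u8 {
    table[b as usize]
}
```

A plain &[u8] carries its length only at run time, which is why the shorter C-style string table would need a check on each lookup.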

mbrubeck · 2017-03-04T23:18:17Z

> As Isaac closed the previous submission, do you prefer I submit or you submit?

I'll create a new submission for this after it's stabilized for at least a couple of days.

mbrubeck · 2017-03-06T16:07:40Z

I haven't come up with any new improvements and am no longer actively working on this benchmark, so I'll submit it later today if there are no other comments or changes posted.

TeXitoi · 2017-03-06T16:20:56Z

OK for me, I'll merge when you provide the link to the submission.

mbrubeck · 2017-03-06T17:21:04Z

Pushed my final version, which contains comment edits and a few minor style changes since the previous push.

mbrubeck · 2017-03-06T20:13:54Z

https://alioth.debian.org/tracker/index.php?func=detail&aid=315641&group_id=100815&atid=413122

mbrubeck force-pushed the reverse_complement_bytes branch 3 times, most recently from 77e473d to d0625cc Compare March 4, 2017 03:15

mbrubeck force-pushed the reverse_complement_bytes branch 2 times, most recently from 84e26be to 4ce4f91 Compare March 4, 2017 05:05

TeXitoi approved these changes Mar 4, 2017

cristicbz reviewed Mar 4, 2017

mbrubeck force-pushed the reverse_complement_bytes branch from 4ce4f91 to 0ce7d17 Compare March 4, 2017 23:14

reverse_complement: Do line wrapping and reverse complement in one step

1135a92

mbrubeck force-pushed the reverse_complement_bytes branch from 0ce7d17 to 1135a92 Compare March 6, 2017 16:59

TeXitoi approved these changes Mar 6, 2017

TeXitoi merged commit 8077f4f into TeXitoi:master Mar 6, 2017

mbrubeck mentioned this pull request Mar 9, 2017

reverse_complement: Process input in chunks #59

Merged
