New mandelbrot.rs #2
Conversation
Thanks for contributing! LLVM is not autovectorising here if you are 2.5 times slower; SIMD can only double the performance. We can be only 2 times slower than C, and I hope your implementation will be that fast :)
//
// contributed by the Rust Project Developers
// contributed by TeXitoi
// contributed by Matt Watson
If you do not use the original version, you can remove the Rust Project Developers line.
My contribution must be the last one, as, according to the rules of the benchmark game, the last contributor must be the person who submits the benchmark. Or you can keep this order, but then you must submit the benchmark yourself.
Thanks for the tips; as I said, my understanding of Rust is nearly nil (and I wouldn't consider myself a very experienced programmer in general). When I said this (and n_body) were being vectorized, it was because I was looking at the emitted asm. LLVM is often making decent use of SSE3 instructions to save itself time (although it appears to be quite fragile and broke at some point, as it's now only storing one variable in each xmm). I'm pretty sure most of the 2.5x slowdown is coming from other overhead, such as not caching the xy values, not exploiting symmetry, and generally writing code that is a bit noobish.
std::io::stdout().write(&rows[i]).ok().expect("Could not write to stdout");
    }
}
io::stdout().flush().ok().expect("Could not flush stdout");
stdout() will be called and the handle locked several times. Even if it's probably not the bottleneck, we can avoid that.
Calling unwrap() on a Result is most of the time as good as calling ok().unwrap().
And this loop can be written to avoid array indexing.
let stdout_unlocked = std::io::stdout();
let mut stdout = stdout_unlocked.lock();
for row in handles.into_iter().flat_map(|h| h.join().unwrap().into_iter()) {
stdout.write_all(&row).unwrap();
}
stdout.flush().unwrap();
Sure, LLVM does a good job here. The benchmark rules disallow exploiting the symmetry of the image on this benchmark. Unrolling the loop to compute the Mandelbrot sequence two elements at a time using SIMD has yielded a speedup of about 2 in the past.
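The two-elements-at-a-time unrolling described above can be sketched like this (a minimal illustration, not code from the actual submission; mbrot2, the bitmask convention, and the 50-iteration limit are made up for the example):

```rust
// Iterate z = z^2 + c for two points in lock step, so LLVM can pack both
// lanes into one SIMD register. Returns a 2-bit mask: bit i is set if
// lane i is still inside the set after `iterations` steps.
fn mbrot2(cr: [f64; 2], ci: f64, iterations: u32) -> u8 {
    let mut zr = [0.0f64; 2];
    let mut zi = [0.0f64; 2];
    let mut bits: u8 = 0b11; // one bit per lane, cleared once that point escapes
    for _ in 0..iterations {
        for i in 0..2 {
            let (r, im) = (zr[i], zi[i]);
            zr[i] = r * r - im * im + cr[i];
            zi[i] = 2.0 * r * im + ci;
            if zr[i] * zr[i] + zi[i] * zi[i] > 4.0 {
                bits &= !(1 << i); // lane i has escaped
            }
        }
        if bits == 0 {
            break; // both lanes escaped, nothing left to do
        }
    }
    bits
}

fn main() {
    // lane 0 = point (0, 0), which is in the set; lane 1 = point (2, 0), which escapes
    let bits = mbrot2([0.0, 2.0], 0.0, 50);
    println!("{:02b}", bits); // prints "01"
}
```

The bitmask shape mirrors how the real benchmark packs 8 pixels into one output byte, just narrowed to 2 lanes.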
Ah, that would make a lot more sense. I was trying to convince it to use the vector instructions on the individual complex numbers (it was working more like this when I last looked at the asm), but as you can see it doesn't really save much time, and LLVM happily ignores the possibility when it sees ways to optimize elsewhere.
let yloc = &yvals;

let handles: Vec<_> = (0..THREADS).map(|e| {
    let xloc = xloc.to_vec();
I still don't really understand the whole borrowing/ownership thing, but this is still only day 3 of actually having written any Rust.
What happens is that thread::spawn wants to take ownership of xloc and yloc, because it can't take references: it does not know whether the references will live long enough. Your solution is to give a copy of the vector to each thread. This extra allocation should not be a bottleneck.
The other solution is to put xloc and yloc in an Arc (Atomic Reference Counter); then you can clone() the Arc (which increments the ref count but does not copy the inner object). That way, you avoid the allocation and copy of xloc and yloc, and just increment a ref count.
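A minimal sketch of the Arc pattern just described (the data, thread count, and sum_in_threads helper are placeholders for illustration, not code from the PR):

```rust
use std::sync::Arc;
use std::thread;

// Share read-only data across threads without a per-thread copy of the Vec.
fn sum_in_threads(data: Vec<f64>, nthreads: usize) -> Vec<f64> {
    let data = Arc::new(data); // one allocation, shared by all threads
    let handles: Vec<_> = (0..nthreads)
        .map(|t| {
            // clone() only bumps the reference count; the Vec is not copied
            let data = Arc::clone(&data);
            thread::spawn(move || data.iter().sum::<f64>() + t as f64)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let sums = sum_in_threads((0..8).map(|i| i as f64).collect(), 4);
    println!("{:?}", sums); // [28.0, 29.0, 30.0, 31.0]
}
```

Arc works here because the threads only read the data; mutation would additionally need a Mutex or similar.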
Good! That's what I intended to do. Although I didn't really understand why I wasn't allowed to use the copies I had.
xloc and yloc are small; given that I'm copying around size*size data, 2*size*THREADS didn't seem worth worrying about.
Then you can remove the xloc and yloc definitions outside of the loop and do let xloc = xvals.clone(); inside it.
Another pattern that can be used is to declare let xvals = xvals; outside of the loop; it makes xvals immutable. clone() must still be used.
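The rebinding-plus-clone() pattern can be sketched as follows (lengths_per_thread is a made-up helper for illustration, not code from the PR):

```rust
use std::thread;

// Per-thread clone() pattern: rebind to freeze the data, then clone inside
// the closure so each thread owns its own copy.
fn lengths_per_thread(mut xvals: Vec<f64>, nthreads: usize) -> Vec<usize> {
    xvals.push(0.0); // last mutation before sharing
    let xvals = xvals; // rebinding: xvals is immutable from here on
    let handles: Vec<_> = (0..nthreads)
        .map(|_| {
            // one copy per thread, defined inside the loop body
            let xvals = xvals.clone();
            thread::spawn(move || xvals.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", lengths_per_thread(vec![0.0; 3], 2)); // [4, 4]
}
```

Compared to the Arc version, this pays one allocation per thread, which is fine while the vectors stay small.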
Wrote it again with vectorising across multiple points in mind. Strangely, enabling AVX doesn't seem to improve things notably over SSE3, despite the hot loops being noticeably shorter.
It looks great!

fn mul2 (x: Vecf64, y: Vecf64) -> Vecf64 {
    let mut res = ZEROS;
    for i in 0..VLEN {res[i] = x[i] * y[i];}
Idiomatic indentation is:
for i in 0..VLEN { res[i] = x[i] * y[i]; }
(with spaces just inside the braces).
There is a downside here in that it uses a lot of memory.
thread::spawn(move || {
    let mut rows = vec![vec![0 as u8; size / 8]; size / THREADS];
    for y in 0..size / THREADS {
        for x in 0..size / 8 {
It's not idiomatic to loop on an index and then use []; better to do:
for (y, row) in rows.iter_mut().enumerate() {
for (x, byte) in row.iter_mut().enumerate() {
...
*byte = mbrot8(cr, ci);
}
}
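The suggestion above, filled out into a self-contained sketch (mbrot8 is stubbed here and the coordinate mapping is elided; only the iterator structure matters):

```rust
const SIZE: usize = 16;
const THREADS: usize = 4;

// Stub standing in for the real computation that packs 8 pixels into a byte.
fn mbrot8(cr: f64, ci: f64) -> u8 {
    ((cr + ci) as u8).wrapping_mul(2) // placeholder, not the real formula
}

fn render_rows() -> Vec<Vec<u8>> {
    let mut rows = vec![vec![0u8; SIZE / 8]; SIZE / THREADS];
    // enumerate() yields the index alongside a mutable reference,
    // so no [] indexing is needed in the loop body.
    for (y, row) in rows.iter_mut().enumerate() {
        for (x, byte) in row.iter_mut().enumerate() {
            let (cr, ci) = (x as f64, y as f64); // real coordinate mapping elided
            *byte = mbrot8(cr, ci);
        }
    }
    rows
}

fn main() {
    let rows = render_rows();
    println!("{} rows of {} bytes", rows.len(), rows[0].len()); // 4 rows of 2 bytes
}
```

Besides being idiomatic, this shape lets the compiler skip bounds checks that the indexed version may still pay for.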
I think I'll go away for a bit and do a bit of reading -- there's still something I'm not getting about & and * properly. You're most welcome to make the changes you're suggesting (and any others) and submit.
I really like your version :) I couldn't find any useless allocation (except xloc and yloc, but that seems negligible).
time -v indicates the maximum resident set size is almost exactly double what is needed (to simply store the pbm). Does our use of stdout create a copy before it gets flushed?
According to http://doc.rust-lang.org/stable/std/io/struct.Stdout.html, stdout is buffered.
I'll submit the bench a little bit later, so if you have any idea, don't hesitate to submit a new PR. Thanks again!
I was thinking I'd have a look at n_body and then maybe the spectrum one.
I'm looking at spectralnorm right now.
Still completely green at Rust; I wound up creating a new version rather than fixing the old one because it was easier to understand this way. LLVM seems to vectorise it reasonably well, and it runs in about 2.5x the C time on my system.