(In short, more data sharing and rewriting code to flow with the data structures.)

Let's recap. We began the evening hacking session with the union.scad example rendering, without parallelism, in:

    real 0m2.256s
    user 0m2.116s
    sys  0m0.140s

Our first improvement turned this into:

    real 0m1.691s
    user 0m1.612s
    sys  0m0.080s

Then we got:

    real 0m1.372s
    user 0m1.300s
    sys  0m0.072s

... and now we're getting ... *drumroll*

    real 0m0.442s
    user 0m0.388s
    sys  0m0.052s

Similarly, the car_rim example began at:

    real 0m9.298s
    user 0m8.957s
    sys  0m0.340s

It now renders at:

    real 0m2.196s
    user 0m2.088s
    sys  0m0.108s
We generate all obj values and use them for calculating mids.

Recall the union.scad example from the last commit. Before that commit, normal, non-parallel performance looked like this:

    real 0m2.256s
    user 0m2.116s
    sys  0m0.140s

The previous commit turned this into:

    real 0m1.691s
    user 0m1.612s
    sys  0m0.080s

Now, we achieve:

    real 0m1.372s
    user 0m1.300s
    sys  0m0.072s
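The sharing idea above can be sketched roughly as follows. This is a hypothetical illustration, not ImplicitCAD's actual code: the names `sampleGrid` and `mid`, and the 2D grid, are invented for the example. The point is that the object function is evaluated once per grid point, and midpoint calculations reuse those cached values instead of re-evaluating obj.

```haskell
import qualified Data.Map.Strict as Map

type Point = (Double, Double)
type Obj = Point -> Double

-- Sample the object function at every grid point exactly once.
sampleGrid :: Obj -> [Point] -> Map.Map Point Double
sampleGrid obj pts = Map.fromList [ (p, obj p) | p <- pts ]

-- Midpoint value from the cached samples; obj is never called again.
-- (Assumes both endpoints were sampled.)
mid :: Map.Map Point Double -> Point -> Point -> Double
mid vals a b = (vals Map.! a + vals Map.! b) / 2

main :: IO ()
main = do
  let obj (x, y) = x*x + y*y - 1   -- unit circle as an implicit object
      pts  = [ (x, y) | x <- [0, 1], y <- [0, 1] ]
      vals = sampleGrid obj pts
  print (mid vals (0, 0) (1, 0))   -- prints -0.5
```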
After a lot of work and fun with ParallelArrays, we make our functions flow along the lists we're working with... This leads to performance improvements. However, if we do this for calculating segs*, it hurts performance! Your guess is as good as mine, here... Magic?

Some benchmarks:

No parallelism:

    (union.scad, new)
    real 0m1.691s
    user 0m1.612s
    sys  0m0.080s

    (union.scad, old)
    real 0m2.256s
    user 0m2.116s
    sys  0m0.140s

    (car_rim.scad, new)
    real 0m7.308s
    user 0m7.044s
    sys  0m0.260s

    (car_rim.scad, old)
    real 0m9.298s
    user 0m8.957s
    sys  0m0.340s

With parallelism (ec2, +RTS -N8 -RTS):

    (car_rim.scad, new)
    real 0m2.572s
    user 0m12.509s
    sys  0m2.676s

    (car_rim.scad, old)
    real 0m3.256s
    user 0m17.113s
    sys  0m3.784s

Progress, yay!!
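For readers unfamiliar with GHC's spark-based parallelism referenced by the +RTS -N8 flags above, here is a minimal self-contained sketch (not the project's actual code) of parallel evaluation over a list, using only `par`/`pseq` from base. Build with -threaded and run with +RTS -N8 -RTS to use 8 cores; the `parMap'` name is illustrative.

```haskell
import GHC.Conc (par, pseq)

-- Map f over a list, sparking the evaluation of each element so the
-- runtime may evaluate them in parallel on available cores.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ []     = []
parMap' f (x:xs) =
  let y  = f x
      ys = parMap' f xs
  in y `par` (ys `pseq` (y : ys))

main :: IO ()
main = print (sum (parMap' (\x -> x * x) [1 .. 1000 :: Int]))  -- prints 333833500
```

Note the wall-clock vs. CPU-time tradeoff visible in the benchmarks: with -N8, `real` time drops while `user` time (summed across cores) grows, which is the expected signature of this style of parallelism.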
…gCubes

Conflicts:
	Graphics/Implicit/Export/MarchingCubes.hs