hcat is really terrible performance-wise #28

fryguybob · 2012-08-04T18:32:44Z

(Imported from http://code.google.com/p/diagrams/issues/detail?id=57. Original issue from felipe.l...@gmail.com on November 2, 2011, 06:27:13 PM UTC)

Hello!

I'm sorry that I didn't do my homework to investigate the "why", but I've included a very simple test case. You may compile and run the test as follows:

$ ghc -O --make -fforce-recomp -rtsopts bug.hs
$ ./bug -o t.png --selection=??? +RTS -s

where ??? may be "hcat", "foldl1", "foldr1" or "foldTree". The program just concatenates 1,000 (rect 1 1)s. On my computer, I get the following:

hcat: 23s, 130 MiB
foldr1: 30s, 69 MiB
foldl1: 27s, 62 MiB
foldTree: 1s, 7 MiB

Concatenating one thousand unit squares shouldn't take more than 20 seconds =). This is a showstopper for me, so I've reimplemented hcat (see hcatB at the end of [1])

Cheers!

[1] https://patch-tag.com/r/felipe/hierarchical-clustering-diagrams/snapshot/hash/20111102171621-0b5ec-1f98cd6a796d46b46f54dc2e168e826f459572ea/content/pretty/src/Diagrams/Dendrogram.hs

fryguybob · 2012-08-04T18:32:45Z

(Imported. Original comment by felipe.l...@gmail.com on November 2, 2011, 06:33:47 PM UTC)

Perhaps the problem is with (|||), which behaves has these worst cases. But fixing hcat may be easier, and maybe even profitable on a better world where (|||) too has better performance.

fryguybob · 2012-08-04T18:32:45Z

(Imported. Original comment by byor...@gmail.com on November 3, 2011, 02:35:31 AM UTC)

Thanks for the report! And nice dendrogram package. =)

I've changed the implementation of cat' to use a balanced fold so now cat', cat, hcat, and vcat are all much faster (running bug.hs with 'hcat' is down to 0.5s / 7MB on my machine, the same time+memory as 'foldTree'). I also improved the implementation of 'beside' (and hence of (===) and (|||)), which brought the running time of 'foldr1' on my machine down from about 30s to 15s.

Wed Nov 2 22:12:46 EDT 2011 Brent Yorgey byorgey@cis.upenn.edu

D.Combinators: implement cat' using a balanced folding scheme (fixes Add NullBackend for Ellipsoid #57)

Thanks to Felipe Lessa for the idea. The usual linear fold was O(n^2)
because appending each new diagram to the accumulated list has to
query its bounding function, which is built up as the iterated maximum
of the O(n) bounding functions of the component diagrams. Doing a
balanced fold is thus only O(n log n) (and seems to give a big speedup
in practice). There is also probably more room for optimizing
computation of bounding functions (using memoization and/or retaining
a bounding box for the common special cases of querying in a direction
parallel to an axis) but that can be left to future work.

The new implementation is also considerably prettier and shorter than
the old, so it's a win all around. =)

Wed Nov 2 17:43:21 EDT 2011 Brent Yorgey byorgey@cis.upenn.edu

improve performance of 'beside'
Instead of doing an alignment on both arguments, superimposing, then translating
the origin back, we just compute the proper offset, translate the second diagram,
and superimpose. Leads to much better performance when the bounding functions
are consulted later.

fryguybob · 2012-08-04T18:32:45Z

(Imported. Original comment by felipe.l...@gmail.com on November 3, 2011, 02:46:27 AM UTC)

Thanks a lot for being so fast on fixing this issue! =)

fryguybob closed this as completed Aug 4, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hcat is really terrible performance-wise #28

hcat is really terrible performance-wise #28

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

hcat is really *terrible* performance-wise #28

hcat is really *terrible* performance-wise #28

Comments

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

fryguybob commented Aug 4, 2012

hcat is really terrible performance-wise #28

hcat is really terrible performance-wise #28