The terminating fold changes (PR #488) cause regressions in several benchmarks. One of them (copy/read/rawToNull) was found to improve with the -fspec-constr-keen option. The following code:

    import qualified Streamly.Prelude as S
    import qualified Streamly.FileSystem.Handle as FH
    import System.IO (openFile, IOMode(..))

    main :: IO ()
    main = do
        inh <- openFile "benchmark-tmpin-100MB.txt" ReadMode
        outh <- openFile "/dev/null" WriteMode
        S.fold (FH.write outh) (S.unfold FH.read inh)

generates Core in which a Word8 is boxed (W8#) in a function that does not inspect it; the value is inspected only in a function called by that one. As a result, the Word8 is not unboxed in a tight loop processing bytes in an array. It is boxed in:

    jump $wstep3_s9yD
      sc10_s9Rn
      sc9_s9Ro
      sc8_s9Rp
      sc7_s9Rq
      sc6_s9Rr
      (W8# ipv8_a8uK)
      sc_s9Rx

It is unboxed in:

    exit1_X19 ww_s9yr
              ww1_s9yw
              ww2_s9yx
              ww3_s9yy
              ww4_s9yB
              w_s9yh
              w1_s9yi
      = case w_s9yh of { W8# x_a8tC ->

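The shape of the problem can be sketched in plain Haskell without streamly. This is only an illustration, not a reproduction: the names `inspect`, `forward` and `sumBytes` are hypothetical, a list stands in for the array, and for code this small GHC will most likely inline everything and unbox the byte on its own.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Word (Word8)

-- Hypothetical callee: the only place the byte is inspected, analogous to
-- the `case w of { W8# x -> ...` in the Core above.
inspect :: Int -> Word8 -> Int
inspect !acc w = acc + fromIntegral w

-- Hypothetical caller: forwards the byte without looking at it, analogous
-- to the function that is handed `(W8# ipv8)` as an argument.
forward :: Int -> Word8 -> Int
forward !acc w = inspect acc w

-- Tight loop over bytes; a list stands in for the array.
sumBytes :: [Word8] -> Int
sumBytes = go 0
  where
    go !acc []       = acc
    go !acc (w : ws) = go (forward acc w) ws

main :: IO ()
main = print (sumBytes (replicate 1000 1))
```

In the real benchmark the forwarding function is a join point produced by fusion, which is why the argument is not scrutinised there and the default SpecConstr heuristics skip it.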
-fspec-constr-keen removes this boxing. However, some benchmarks get worse due to this option. For example (allocations):

    Benchmark                                                                                                   default(0)(MiB) default(1) - default(0)(%)
    ----------------------------------------------------------------------------------------------------------- --------------- --------------------------
    FileSystem.Handle/o-1-space/copy/read/group-ungroup/S.interposeSuffix . S.splitOnSuffix(Array Word8) (1/10)           76.23                    +740.47
    FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unlines . UA.lines (Array Char) (1/10)                        225.39                    +250.78
    FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "\n" FL.drain                                                   101.44                     +37.00
    FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "a" FL.drain                                                    101.44                     +36.89
    FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unwords . UA.words (Array Char) (1/10)                       1574.39                     +22.09
    Data.Fold/o-n-heap/serially/elimination/writeN                                                                         0.00                  +Infinity
    Data.Fold/o-n-heap/serially/elimination/lastN.Max                                                                1602136.00                    +195.57
    Memory.Array/o-1-space/elimination/toStreamRev                                                                         0.00                  +Infinity
    Memory.Array/o-1-space/elimination/length . IsList.toList                                                              0.00                  +Infinity
    Memory.Array/o-1-space/elimination/min                                                                                 0.00                  +Infinity
    Memory.Array/o-1-space/elimination/<                                                                                   0.00                  +Infinity
    Memory.Array/o-1-space/elimination/id                                                                                  0.00                  +Infinity
    Memory.Array/o-1-space/generation/writeN . unfoldr                                                                     0.00                  +Infinity
    Memory.Array/o-1-space/generation/writeN . intFromTo                                                                   0.00                  +Infinity
    Memory.Array/o-1-space/generation/writeN . fromList                                                              6989224.00                     +29.78
    Prelude.Serial/o-1-space/Applicative/(<*) (sqrt n x sqrt n)                                                      9401184.00                     +32.68
    Prelude.Serial/o-n-stack/iterated/filterEven (n/10 x 10)                                                         4766320.00                     +22.07
    Prelude.Serial/o-1-space/Monad/(>>) (sqrt n x sqrt n)                                                           10392416.00                     +10.23
    Prelude.Serial/o-1-space/Applicative/(*>) (sqrt n x sqrt n)                                                     10392416.00                     +10.23
    Prelude.Serial/o-n-heap/buffered/reverse                                                                         3129384.00                     +66.91
    Prelude.Serial/o-n-space/Applicative/(<*) (n times)                                                             37705048.00                     +14.59

To enable this option by default we need to make sure that there are no significant regressions. This may require tweaking GHC or adding some support in fusion-plugin. We could do the following:
- Allow keen if the argument is getting unboxed in a child call
- Is it possible to detect if spec constr actually helped in reducing allocations, and only then allow it?
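For the second point, one manual way to check whether the flag helped a given benchmark is to build it twice, with and without -fspec-constr-keen, and compare the allocation counter reported by the RTS. The following is a minimal sketch using `GHC.Stats` from base; `allocatedBy` and `workload` are hypothetical names, and the workload is a placeholder rather than a streamly benchmark. It assumes the program is built with -rtsopts and run with +RTS -T.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Data.List (foldl')
import Data.Word (Word64, Word8)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Bytes allocated while running an action, according to the RTS counters.
-- A GC is forced before each reading on the assumption that the counters
-- are only guaranteed to be current after a collection.
allocatedBy :: IO a -> IO Word64
allocatedBy act = do
    performGC
    before <- allocated_bytes <$> getRTSStats
    _ <- act
    performGC
    after <- allocated_bytes <$> getRTSStats
    pure (after - before)

-- Hypothetical workload standing in for a benchmark body.
workload :: Int -> Int
workload n =
    foldl' (\acc w -> acc + fromIntegral (w :: Word8)) 0
           (map fromIntegral [1 .. n])

main :: IO ()
main = do
    enabled <- getRTSStatsEnabled
    if not enabled
        then putStrLn "RTS stats are off; rerun with +RTS -T"
        else do
            bytes <- allocatedBy (evaluate (workload 1000000))
            putStrLn ("allocated bytes: " ++ show bytes)
```

Comparing the printed figure between the two builds gives a per-benchmark answer, but it does not by itself tell the compiler when to apply the specialization; that would still need support in GHC or fusion-plugin.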