Use -fspec-constr-keen GHC option #703

@harendra-kumar

The terminating fold changes (PR #488) cause regressions in several benchmarks. One of them (copy/read/rawToNull) was found to improve with the -fspec-constr-keen option. The following code:

import qualified Streamly.Prelude as S
import qualified Streamly.FileSystem.Handle as FH
import System.IO (openFile, IOMode(..))

main :: IO ()
main = do
    inh <- openFile "benchmark-tmpin-100MB.txt" ReadMode
    outh <- openFile "/dev/null" WriteMode
    S.fold (FH.write outh) (S.unfold FH.read inh)

generates Core in which a Word8 is boxed (W8#) in a function that never inspects it; the value is inspected only in a function called from there. As a result, the Word8 is not unboxed in the tight loop that processes the bytes of an array. The boxing happens at this call:

                jump $wstep3_s9yD
                  sc10_s9Rn
                  sc9_s9Ro
                  sc8_s9Rp
                  sc7_s9Rq
                  sc6_s9Rr
                  (W8# ipv8_a8uK)
                  sc_s9Rx

The value is unboxed only in the callee:

                  exit1_X19 ww_s9yr
                            ww1_s9yw
                            ww2_s9yx
                            ww3_s9yy
                            ww4_s9yB
                            w_s9yh
                            w1_s9yi
                    = case w_s9yh of { W8# x_a8tC ->
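
The shape of the problem can be sketched with base alone. This is a minimal, hypothetical stand-in for the streamly loop, not the code from the issue: `lastByte`, `loop` and `finish` are illustrative names. `loop` is called with a freshly built Word8 on every iteration but never inspects it; only `finish` does. According to the GHC User's Guide, -fspec-constr-keen lets SpecConstr specialise a call with an explicit constructor argument even when the function body does not scrutinise that argument. Whether GHC produces the same Core for this sketch as for the benchmark depends on the GHC version and flags, so it should be checked with -ddump-simpl.

```haskell
import Data.Word (Word8)

-- | Hypothetical stand-in for the byte loop: returns (last byte seen) + 1,
-- where the bytes are 0 .. n-1 truncated to Word8.
lastByte :: Int -> Int
lastByte n = loop 0 0
  where
    -- 'loop' carries 'prev' as state but never inspects it.
    loop :: Word8 -> Int -> Int
    loop prev i
      | i >= n    = finish prev                    -- forwarded untouched
      | otherwise = loop (fromIntegral i) (i + 1)  -- new Word8 built per call

    -- 'finish' is the only place where the Word8 is actually inspected.
    finish :: Word8 -> Int
    finish w = fromIntegral w + 1
    {-# NOINLINE finish #-}
```

Loading this in GHCi and evaluating `lastByte 10` exercises the loop; comparing the -O2 Core with and without -fspec-constr-keen shows whether the boxed argument to `loop` survives.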

-fspec-constr-keen removes this boxing. However, some benchmarks get worse due to this option. For example (allocations):

Benchmark                                                                                                     default(0)(MiB) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------------------- --------------- --------------------------
FileSystem.Handle/o-1-space/copy/read/group-ungroup/S.interposeSuffix . S.splitOnSuffix(Array Word8) (1/10)             76.23                    +740.47
FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unlines . UA.lines (Array Char) (1/10)                          225.39                    +250.78
FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "\n" FL.drain                                                     101.44                     +37.00
FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "a" FL.drain                                                      101.44                     +36.89
FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unwords . UA.words (Array Char) (1/10)                         1574.39                     +22.09
Data.Fold/o-n-heap/serially/elimination/writeN                                                                           0.00                  +Infinity
Data.Fold/o-n-heap/serially/elimination/lastN.Max                                                                  1602136.00                    +195.57
Memory.Array/o-1-space/elimination/toStreamRev                                                                           0.00                  +Infinity
Memory.Array/o-1-space/elimination/length . IsList.toList                                                                0.00                  +Infinity
Memory.Array/o-1-space/elimination/min                                                                                   0.00                  +Infinity
Memory.Array/o-1-space/elimination/<                                                                                     0.00                  +Infinity
Memory.Array/o-1-space/elimination/id                                                                                    0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . unfoldr                                                                       0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . intFromTo                                                                     0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . fromList                                                                6989224.00                     +29.78
Prelude.Serial/o-1-space/Applicative/(<*) (sqrt n x sqrt n)                                                        9401184.00                     +32.68
Prelude.Serial/o-n-stack/iterated/filterEven (n/10 x 10)                                                           4766320.00                     +22.07
Prelude.Serial/o-1-space/Monad/(>>) (sqrt n x sqrt n)                                                             10392416.00                     +10.23
Prelude.Serial/o-1-space/Applicative/(*>) (sqrt n x sqrt n)                                                       10392416.00                     +10.23
Prelude.Serial/o-n-heap/buffered/reverse                                                                           3129384.00                     +66.91
Prelude.Serial/o-n-space/Applicative/(<*) (n times)                                                               37705048.00                     +14.59

To enable this option by default, we need to make sure that it causes no significant regressions. This may require tweaking GHC or adding some support in fusion-plugin. We could do the following:

  1. Allow keen specialization only if the argument gets unboxed in a child call.
  2. Is it possible to detect whether SpecConstr actually helped reduce allocations, and allow it only then?
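
Until one of these is available, one way to experiment without changing the default is to scope the flag to individual modules with an OPTIONS_GHC pragma, so that modules that regress keep the package-wide options. The following is a minimal sketch under that assumption; the function `runLoop` and its helpers are hypothetical and only mimic the shape that keen mode targets (a constructor built at the call site and inspected only in a callee). Whether the flag helps a given module still has to be confirmed from allocation figures.

```haskell
{-# OPTIONS_GHC -fspec-constr-keen #-}
-- The pragma above applies to this module only. It has an effect only when
-- SpecConstr runs, which GHC enables at -O2.

import Data.Word (Word8)

-- | Hypothetical loop: returns the last byte seen (0 .. n-1 truncated to
-- Word8), or -1 if the loop body never ran.
runLoop :: Int -> Int
runLoop n = go Nothing 0
  where
    -- 'go' rebuilds 'st' with an explicit constructor but never inspects it.
    go :: Maybe Word8 -> Int -> Int
    go st i
      | i >= n    = done st
      | otherwise = go (Just (fromIntegral i)) (i + 1)

    -- 'done' is the only function that pattern matches on the state.
    done :: Maybe Word8 -> Int
    done Nothing  = -1
    done (Just w) = fromIntegral w
    {-# NOINLINE done #-}
```

Keeping the pragma in the module rather than in the package's ghc-options makes it easy to compare benchmark runs module by module.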
