Use -fspec-constr-keen GHC option

The terminating fold changes (PR #488) cause regressions in several benchmarks. One of them (copy/read/rawToNull) was found to improve with the `-fspec-constr-keen` option. The following code:

```
import qualified Streamly.Prelude as S
import qualified Streamly.FileSystem.Handle as FH
import System.IO (openFile, IOMode(..))

main :: IO ()
main = do
    inh <- openFile "benchmark-tmpin-100MB.txt" ReadMode
    outh <- openFile "/dev/null" WriteMode
    S.fold (FH.write outh) (S.unfold FH.read inh)
```

generates Core in which a `Word8` is boxed (`W8#`) in a function that does not inspect it; it is inspected only in a function called from there. As a result the `Word8` is not unboxed in a tight loop that processes the bytes of an array. The boxing happens at this call:

```
                jump $wstep3_s9yD
                  sc10_s9Rn
                  sc9_s9Ro
                  sc8_s9Rp
                  sc7_s9Rq
                  sc6_s9Rr
                  (W8# ipv8_a8uK)
                  sc_s9Rx
```

The boxed value is unboxed only later, in the callee:

```
                  exit1_X19 ww_s9yr
                            ww1_s9yw
                            ww2_s9yx
                            ww3_s9yy
                            ww4_s9yB
                            w_s9yh
                            w1_s9yi
                    = case w_s9yh of { W8# x_a8tC ->
```
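The shape of the problem can be sketched in a few lines of source Haskell. This is a minimal illustration under stated assumptions, not the streamly code: `finish` and `loop` are hypothetical names standing in for `exit1` and `$wstep3` above. `loop` receives a freshly constructed `Word8` on every recursive call but never pattern-matches on it; only `finish` does. By default SpecConstr specialises on a constructor argument only when the function scrutinises it, while `-fspec-constr-keen` also specialises when the argument is merely passed along. GHC may well optimise an example this small without the flag, so it shows the pattern rather than reproducing the regression.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for the exit path: the only place that
-- scrutinises (unboxes) the byte.
finish :: Int -> Word8 -> Int
finish acc w = acc + fromIntegral w

-- Hypothetical stand-in for the loop: the Word8 argument is passed
-- through untouched, while each recursive call builds a new boxed
-- Word8 via fromIntegral.
loop :: Int -> Word8 -> Int -> Int
loop acc w n
  | n <= 0    = finish acc w
  | otherwise = loop (acc + 1) (fromIntegral n) (n - 1)
```

For instance, `loop 0 0 3` evaluates to `4`: three iterations increment the accumulator, and the last byte passed along is `1`.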

`-fspec-constr-keen` removes this boxing. However, some benchmarks get worse due to this option. For example (allocations):

```
Benchmark                                                                                                     default(0)(MiB) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------------------- --------------- --------------------------
FileSystem.Handle/o-1-space/copy/read/group-ungroup/S.interposeSuffix . S.splitOnSuffix(Array Word8) (1/10)             76.23                    +740.47
FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unlines . UA.lines (Array Char) (1/10)                          225.39                    +250.78
FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "\n" FL.drain                                                     101.44                     +37.00
FileSystem.Handle/o-1-space/reduce/read/S.splitOnSeq "a" FL.drain                                                      101.44                     +36.89
FileSystem.Handle/o-1-space/copy/read/group-ungroup/UA.unwords . UA.words (Array Char) (1/10)                         1574.39                     +22.09
```

```
Data.Fold/o-n-heap/serially/elimination/writeN                        0.00                  +Infinity
Data.Fold/o-n-heap/serially/elimination/lastN.Max               1602136.00                    +195.57
```

```
Memory.Array/o-1-space/elimination/toStreamRev                              0.00                  +Infinity
Memory.Array/o-1-space/elimination/length . IsList.toList                   0.00                  +Infinity
Memory.Array/o-1-space/elimination/min                                      0.00                  +Infinity
Memory.Array/o-1-space/elimination/<                                        0.00                  +Infinity
Memory.Array/o-1-space/elimination/id                                       0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . unfoldr                          0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . intFromTo                        0.00                  +Infinity
Memory.Array/o-1-space/generation/writeN . fromList                   6989224.00                     +29.78
```

```
Prelude.Serial/o-1-space/Applicative/(<*) (sqrt n x sqrt n)                     9401184.00                     +32.68
Prelude.Serial/o-n-stack/iterated/filterEven (n/10 x 10)                        4766320.00                     +22.07
Prelude.Serial/o-1-space/Monad/(>>) (sqrt n x sqrt n)                          10392416.00                     +10.23
Prelude.Serial/o-1-space/Applicative/(*>) (sqrt n x sqrt n)                    10392416.00                     +10.23
Prelude.Serial/o-n-heap/buffered/reverse                                        3129384.00                     +66.91
Prelude.Serial/o-n-space/Applicative/(<*) (n times)                            37705048.00                     +14.59
```
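Since the effect differs from benchmark to benchmark, one way to experiment is to turn the flag on for individual modules rather than globally. The sketch below assumes a standalone module compiled with `-O2` (SpecConstr runs only at that level); `sumBytes` is a hypothetical function used purely to give the module some content and is not part of streamly.

```haskell
{-# OPTIONS_GHC -fspec-constr-keen #-}
-- The pragma above applies the flag to this module only; other
-- modules keep the default SpecConstr behaviour.
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical byte-processing function, for illustration only.
sumBytes :: [Word8] -> Int
sumBytes = foldl' (\acc w -> acc + fromIntegral w) 0
```

Comparing allocations for the same module with and without the pragma isolates the flag's effect on that module.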

To enable this option by default we need to make sure that it causes no significant regressions. This may require tweaking GHC or adding support in fusion-plugin. We could do the following:

1) Allow keen specialization only if the argument is unboxed in a child call
2) Is it possible to detect if spec constr actually helped in reducing allocations, and only then allow it?
