Skip to content

optin optimizations

b3b00 edited this page Apr 23, 2024 · 4 revisions

Opt-In optimizations (from v3.1.0)

CSLY comes with some optimizations that can be enabled for parsers. Theses optimizations are not always relevant according to your parser structure. An optimization may produce interesting results for an expression grammar and causes performance losses for a JSON parser (JSON is relatively easy to parse and does not need any special optimization to be fast enough)

Optimizations are enabled with the help of C# attributes on the parser class. Many optimizations can be used simultaneously.

For now 2 optimizations are available.

Use Memoization

The idea of this optimization is to cache intermediary parsing result to limit backtracking, at the cost of memory use. When a grammar does not need too much bactracking this optimization might be a bad fit.

The C# attribute is UseMemoization

[UseMemoization]
public class MyParser {
    // parser definition
}

Broaden token window

Natively CSLY only use one token to choose production rules. With this optimization it can look ahead 2 tokens in some cases which might limit backtracking. Once again this optimization must be tested as it may not be a good choice

This optimization may be dwarfed by the Use Memoization optimization.

The C# attribute is BroadenTokenWindow

[BroadenTokenWindow]
public class MyParser {
    // parser definition
}

sample benchmarking

here are the results of some benchmarking with both optimizations. It clearly shows that optimizations produce different results depending on grammar.

json parser

Method UseMemoization BroadenTokenWindow Mean Error StdDev Gen0 Gen1 Gen2 Allocated
TestJson False False 88.18 ms 1.689 ms 1.579 ms 9833.3333 4000.0000 1500.0000 56.26 MB
TestJson False True 87.36 ms 1.686 ms 2.524 ms 10000.0000 3500.0000 1333.3333 58.68 MB
TestJson True False 152.28 ms 5.975 ms 17.143 ms 12000.0000 4750.0000 2250.0000 68.88 MB
TestJson True True 164.18 ms 4.283 ms 12.495 ms 12333.3333 4666.6667 2000.0000 71.29 MB

JSON is a highly structured language. Some tokens like {or [ uniquely denotes structures which leads to limited backtracking. Hence both Use Memoization and [toekn window broadening] do not add any performance. In fact they cause performance loss due to memory consumption (memoization) or processing time (token window)

indented while

Method Memoize Broaden Mean Error StdDev Median Gen0 Gen1 Gen2 Allocated
TestWhile False False 2.864 ms 0.1012 ms 0.2788 ms 2.780 ms 964.8438 304.6875 - 4.2 MB
TestWhile False True 2.963 ms 0.1412 ms 0.3816 ms 2.883 ms 949.2188 304.6875 - 4.24 MB
TestWhile True False 7.963 ms 0.2302 ms 0.6677 ms 7.777 ms 921.8750 453.1250 218.7500 5.62 MB
TestWhile True True 9.692 ms 0.7434 ms 2.1330 ms 9.460 ms 937.5000 421.8750 203.1250 5.65 MB

As for JSON While language is a structured language. So the same as JSON is observed.

parser from #414

Method UseMemoization BroadenTokenWindow Mean Error StdDev Median Gen0 Gen1 Gen2 Allocated
TestBackTrack False False 927.96 ms 33.831 ms 94.307 ms 898.35 ms 183000.0000 39000.0000 7000.0000 978.48 MB
TestBackTrack False True 890.39 ms 27.614 ms 79.673 ms 878.01 ms 180000.0000 39000.0000 8000.0000 971.88 MB
TestBackTrack True False 32.14 ms 1.680 ms 4.849 ms 31.75 ms 2000.0000 2000.0000 1666.6667 23.81 MB
TestBackTrack True True 32.33 ms 1.964 ms 5.698 ms 31.63 ms 1785.7143 1714.2857 1428.5714 23.8 MB

This parser is mainly an expression parser that does not use the expression parsing feature. It causes heavy backtracking because of the many precednece levels. Use Memoization limits the baktracking and toekn window broadening does not add evident benefice.