subrule: Fixed a huge use after scope bug #284

Kojoley · 2017-11-20T21:26:58Z

It took me 3 days of pulling my hair to debug and fix the bug. The subrule's author put a big bomb under it and I caught the exposition only in a single test, with modern versions of Clang/GCC and enabled optimizations (most of boost regression test runners does not use variant=release). Valgrind did not catch any problems. Enabling address sanitizer causes the bug to disappear (now I understand why, it places big gaps between stack pointers). -fsanitize-address-use-after-scope catches the bug, but I have discovered that it is turned off in Clang by default after already having a repro.

The bug could be easily reproduced if you use any parser that invalidates it's state in the destructor.

I used literal_char with added ~literal_char() { ch = char_type(0); }.

Reproduction code:

    rule<char const*> r;
    subrule<0> entry;
    r = (entry = 'a');
    BOOST_TEST(test("a", r));

It will fail because after assignment to r a temporary subrule_group is destroyed and rule is binded to the already destroyed object.

The cause is in usage of reference parser. I am 100% sure the code was copy-pasted from rule/grammar. They store an expression within their body, while subrule actually used only as a placeholder.

I have split subrule_group into an actual parser/generator that stores the expression and a Proto terminal that contains the parser/generator.

Tested on:

VS 2008, 2010, 2015, 2017
Clang 3.8, 3.9, 4.0, 5.0
GCC 4.8, 7

It took me 3 days of pulling my hair to debug and fix the bug. The subrule's author put a big bomb under it and I caught the exposition only in a single test, with modern versions of Clang/GCC and enabled optimizations (most of boost regression test runners does not use `variant=release`). Valgrind did not catch any problems. Enabling address sanitizer causes the bug to disappear (now I understand why, it places big gaps between stack pointers). `-fsanitize-address-use-after-scope` catches the bug, but I have discovered that it is turned off in Clang by default after already having a repro. The bug could be easily reproduced if you use any parser that invalidates it's state in the destructor. I used `literal_char` with added `~literal_char() { ch = char_type(0); }`. Reproduction code: ```cpp rule<char const*> r; subrule<0> entry; r = (entry = 'a'); BOOST_TEST(test("a", r)); ``` It will fail because after assignment to `r` a temporary `subrule_group` is destroyed and rule is binded to the already destroyed object. The cause is in usage of `reference` parser. I am 100% sure the code was copy-pasted from `rule`/`grammar`. They store an expression within their body, while `subrule` actually used only as a placeholder. I have split `subrule_group` into an actual parser/generator that stores the expression and a Proto terminal that contains the parser/generator. Tested on: - VS 2008, 2010, 2015, 2017 - Clang 3.8, 3.9, 4.0, 5.0 - GCC 4.8, 7

Kojoley · 2017-11-21T00:02:03Z

I am going to ping the original subrules author @frabar666. Maybe he can help in reviewing the PR.

djowel · 2017-11-21T07:10:51Z

Oh my... Hmmm... do you think the subrule is still worth keeping? I wonder if anyone is using it. I don't.

frabar666 · 2017-11-21T07:31:45Z

Ouch... Indeed, much of the code was copied from grammar, and also from classic's subrule. I haven't worked with Spirit for years, and I'm afraid I don't remember enough about Spirit and Proto to give any useful feedback, sorry.

I remember @djowel already mentioning years ago that the subrule mechanism was probably no longer useful in Qi/Karma (it was there in Classic for performance reasons, which no longer apply in Qi/Karma)... no objection here if you want to retire it.

Kojoley · 2017-11-21T16:14:11Z

I have made some benchmarks on VS 2017.

`mini_xml2.cpp` vs `mini_xml2_sr.cpp`

290Kb

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
BM_rule      12134563 ns   12535795 ns         56   22.0134M items/s
BM_subrule   12367610 ns   11700075 ns         56   23.5858M items/s

114Mb xml

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
BM_rule    9128121686 ns 9001257700 ns          1   12.6617M items/s
BM_subrule 9123333133 ns 9032457900 ns          1   12.6179M items/s

It looks like subrules do not improve perfromance, but mini_xml2 heavily uses dynamic allocations and memory copy.

`start = (entry = *(a | b | c), a = 'a', b = 'b', c = 'c');`

--------------------------------------------------------
Benchmark                 Time           CPU Iterations
--------------------------------------------------------
BM_rule/8               554 ns        558 ns    1258407   123.087M items/s rules
BM_rule/64             2970 ns       2774 ns     213674   197.999M items/s rules
BM_rule/512           22194 ns      22389 ns      32051   196.277M items/s rules
BM_rule/4096         176862 ns     172102 ns       4079   204.276M items/s rules
BM_rule/32768       1403790 ns    1375560 ns        499   204.462M items/s rules
BM_rule/65536       2846764 ns    2836382 ns        264   198.316M items/s rules
BM_rule_BigO           2.41 N       2.39 N  rules
BM_rule_RMS               1 %          2 %  rules
BM_subrule/8            298 ns        337 ns    2039614   204.033M items/s subrules
BM_subrule/64          1051 ns       1064 ns     747858    516.35M items/s subrules
BM_subrule/512         7063 ns       7092 ns     112179   619.622M items/s subrules
BM_subrule/4096       55553 ns      55625 ns      11218    632.02M items/s subrules
BM_subrule/32768     447610 ns     447664 ns       1603   628.262M items/s subrules
BM_subrule/65536     893906 ns     875941 ns        748   642.166M items/s subrules
BM_subrule_BigO        0.76 N       0.75 N  subrules
BM_subrule_RMS            0 %          2 %  subrules

This simple test shows that rules have significant call overhead. So if you have a lot of rules the performance boost from subrules could be huge, especially if your parser just validates the input.

Kojoley · 2017-11-21T16:40:31Z

I wonder if anyone is using it.

Subrules were broken for about 2 years and never worked on c++11 (#259) so will not expect they are exist, but I think many people tried:

I just found these links and can confirm the problems described in 4 and 5 before this patch, while with it they are fixed.

djowel · 2017-11-21T21:23:07Z

Very thoughtful of you to do all these benchmarks! I wonder how X3 fares in comparison and I think for performance concerns such as call overhead, I'd go for X3 instead. The design of X3 is based on the subrules anyway.

Kojoley · 2017-11-21T22:46:18Z

I wonder how X3 fares in comparison

-----------------------------------------------------------
Benchmark                    Time           CPU Iterations
-----------------------------------------------------------
BM_qi_rule/8               366 ns        350 ns    1869646   195.936M items/s qi_rule
BM_qi_rule/64             2753 ns       2753 ns     249286   199.499M items/s qi_rule
BM_qi_rule/512           21786 ns      21903 ns      32051   200.638M items/s qi_rule
BM_qi_rule/4096         171558 ns     172102 ns       4079   204.276M items/s qi_rule
BM_qi_rule/32768       1371057 ns    1344297 ns        499   209.217M items/s qi_rule
BM_qi_rule/65536       2742350 ns    2756644 ns        249   204.052M items/s qi_rule
BM_qi_rule_BigO           2.32 N       2.33 N  qi_rule
BM_qi_rule_RMS               0 %          2 %  qi_rule
BM_qi_subrule/8            105 ns        106 ns    7478585   645.438M items/s qi_subrule
BM_qi_subrule/64           755 ns        765 ns     897430   718.196M items/s qi_subrule
BM_qi_subrule/512         6001 ns       5980 ns     112179     734.9M items/s qi_subrule
BM_qi_subrule/4096       47801 ns      47281 ns      11218   743.553M items/s qi_subrule
BM_qi_subrule/32768     381998 ns     382398 ns       1795    735.49M items/s qi_subrule
BM_qi_subrule/65536     761447 ns     765222 ns        897   735.081M items/s qi_subrule
BM_qi_subrule_BigO        0.65 N       0.65 N  qi_subrule
BM_qi_subrule_RMS            0 %          0 %  qi_subrule
BM_x3/8                    100 ns        100 ns    7478585   685.778M items/s x3
BM_x3/64                   748 ns        747 ns     897430   734.898M items/s x3
BM_x3/512                 5941 ns       5980 ns     112179     734.9M items/s x3
BM_x3/4096               47482 ns      47978 ns      14957    732.76M items/s x3
BM_x3/32768             379852 ns     383746 ns       1870   732.907M items/s x3
BM_x3/65536             760884 ns     765222 ns        897   735.081M items/s x3
BM_x3_BigO                0.64 N       0.65 N  x3
BM_x3_RMS                    0 %          0 %  x3

The numbers are different, because for previous benchmark I used grammar and because x3 does not have it I have rewritten the benchmark to use just rules.

Benchmark code

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_subrule.hpp>

#include <benchmark/benchmark.h>

#include <string>


struct Helper
{
    Helper(benchmark::State & state_, std::string && label)
      : state(state_)
      , data()
    {
        for (auto i = 0; i < state.range(0); ++i)
            data = data + "abcbcacab";

        state.SetLabel(std::move(label));
    }

    ~Helper()
    {
        // complexity for "abc" in compare (`lit`) calls is
        // a: 1, b: 2, c: 3; "abc" repeats 3 times
        state.SetComplexityN(state.range(0) * (1 + 2 + 3) * 3);
        state.SetItemsProcessed(state.iterations() * data.size());
    }


    benchmark::State & state;
    std::string data;
};

static void BM_qi_subrule(benchmark::State & state)
{
    using boost::spirit::qi::rule;
    using boost::spirit::repository::qi::subrule;
    using boost::spirit::qi::parse;

    Helper helper(state, "qi_subrule");
    auto const iter = helper.data.begin();
    auto const end = helper.data.end();
    using Iterator = decltype(helper.data.end());

    subrule<99> entry;
    subrule<42> a;
    subrule<48> b;
    subrule<16> c;
    rule<Iterator> start;

    start = (
        entry = *(a | b | c)
      , a = 'a'
      , b = 'b'
      , c = 'c'
    );

    for (auto _ : state) {
        benchmark::DoNotOptimize(parse(iter, end, start));
    }
}

static void BM_qi_rule(benchmark::State & state)
{
    using boost::spirit::qi::rule;
    using boost::spirit::qi::parse;

    Helper helper(state, "qi_rule");
    auto const iter = helper.data.begin();
    auto const end = helper.data.end();
    using Iterator = decltype(helper.data.end());

    rule<Iterator> a, b, c, start;

    a = 'a';
    b = 'b';
    c = 'c';

    start = *(a | b | c);

    for (auto _ : state) {
        benchmark::DoNotOptimize(parse(iter, end, start));
    }
}

static void BM_x3(benchmark::State & state)
{
    using boost::spirit::x3::lit;
    using boost::spirit::x3::rule;
    using boost::spirit::x3::parse;

    auto a = lit('a');
    auto b = lit('b');
    auto c = lit('c');

    rule<class r> r;
    auto start =
        r = *(a | b | c);

    Helper helper(state, "x3");
    auto const iter = helper.data.begin();
    auto const end = helper.data.end();
    using Iterator = decltype(helper.data.end());

    for (auto _ : state) {
        benchmark::DoNotOptimize(parse(iter, end, start));
    }
}

#define MAKE_BENCH(name) \
  BENCHMARK(name) \
    /*->RangeMultiplier(2)*/ \
    ->Range(8, 8 << 13) \
    ->Complexity(benchmark::oN) \
  /**/

MAKE_BENCH(BM_qi_rule);
MAKE_BENCH(BM_qi_subrule);
MAKE_BENCH(BM_x3);

BENCHMARK_MAIN()

Note: Code will note compile because of #257, I will open a PR for it on the week.

djowel · 2017-11-21T22:59:58Z

Very nice! You should add that in the spirit/workbench directory along with the other tests.

Kojoley · 2017-11-21T23:21:36Z

You should add that in the spirit/workbench directory along with the other tests.

It uses google benchmark library. Is this okay?

And what about this PR? I think it is safe to merge because tests pass and even if somehow the PR introduces a regression no one will notice this because subrules are currently in a broken state (what PR should fix) and no-one uses them 😃.

djowel · 2017-11-21T23:27:25Z

Hmmm.. probably not. Boost should not require any dependencies.

Kojoley force-pushed the fix-subrule-use-after-scope branch from a4b5c74 to e357244 Compare November 20, 2017 21:31

Kojoley force-pushed the fix-subrule-use-after-scope branch from e357244 to b174d1f Compare November 20, 2017 21:38

djowel merged commit 2a6e8f7 into boostorg:develop Nov 21, 2017

Kojoley deleted the fix-subrule-use-after-scope branch November 21, 2017 23:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subrule: Fixed a huge use after scope bug #284

subrule: Fixed a huge use after scope bug #284

Kojoley commented Nov 20, 2017 •

edited

Loading

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

frabar666 commented Nov 21, 2017 •

edited

Loading

Kojoley commented Nov 21, 2017

Kojoley commented Nov 21, 2017 •

edited

Loading

djowel commented Nov 21, 2017

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

subrule: Fixed a huge use after scope bug #284

subrule: Fixed a huge use after scope bug #284

Conversation

Kojoley commented Nov 20, 2017 • edited Loading

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

frabar666 commented Nov 21, 2017 • edited Loading

Kojoley commented Nov 21, 2017

mini_xml2.cpp vs mini_xml2_sr.cpp

114Mb xml

start = (entry = *(a | b | c), a = 'a', b = 'b', c = 'c');

Kojoley commented Nov 21, 2017 • edited Loading

djowel commented Nov 21, 2017

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

Kojoley commented Nov 21, 2017

djowel commented Nov 21, 2017

Kojoley commented Nov 20, 2017 •

edited

Loading

frabar666 commented Nov 21, 2017 •

edited

Loading

`mini_xml2.cpp` vs `mini_xml2_sr.cpp`

`start = (entry = *(a | b | c), a = 'a', b = 'b', c = 'c');`

Kojoley commented Nov 21, 2017 •

edited

Loading