Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support nothrow expect #242

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

wanghan02
Copy link
Contributor

@wanghan02 wanghan02 commented Sep 5, 2017

Throwing expectation_failure is so much slower (around 1000 times with clang -O2 on my mac) than the equivalent parser with sequence operator. It will increase the parsing performance if we are giving the option to use expect operator without throwing exceptions yet still keep its semantics. It has the same speed in a simple straightforward parser with the equivalent parser using sequence operator. It becomes faster when the parser has alternatives.

  1. Allow user to use expect operator more aggressively without any performance loss brought by throwing expectation_failure. When nothrow expect is used, an expectation failure simply fails the whole parser without throwing expectation_failure.
  2. Expect operator throwing expectation_failure is still the default way to handle expectation failures. Users can inject a false as default value to your parser's context to enable nothrow expect.
    x3::parse(input.begin(), input.end(), x3::with<x3::expectation_failure_tag>(false)[...]);
  3. If expect operator is used and fails in skipper parser, the main parser should not fail (The skipper parser is a totally different parser with a different context. We should use it only to skip. If the skipper parser fails, it should mean no more skipping and should return to the main parser). We use nothrow expect as the default way to call skipper parser.
  4. The default implementation injects a bool into the context and it carries the expectation failure information. Users can also overload handle_expectation_failure_impl and has_expectation_failure_impl functions with e.g. boost::optional to carry more information.
  5. Add unit test for nothrow expect to test all existing parsers.

@djowel
Copy link
Member

djowel commented Sep 5, 2017

What's the reason for using -O2 instead of -O3? Can you provide a benchmark test program that we can try that shows these numbers? How about other compilers, e.g. g++?

@djowel
Copy link
Member

djowel commented Sep 5, 2017

Also, if you are going to just fail the parse anyway, what's the advantage of using a > b instead of just a >> b ?

@wanghan02
Copy link
Contributor Author

#include <iostream>
#include <cassert>
#include <chrono>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

constexpr int len = 10;

template<typename Iterator, typename Expr>
bool parse_(Iterator& first, Iterator const& last, Expr const& expr) {
    try {
        return x3::parse(first, last, expr);
    } catch (x3::expectation_failure<Iterator> const&) {}
    return false;
}

template<typename Expr>
int Test(std::string const& input, Expr const& expr) {
    int n = 0;
    auto const last = boost::end(input);
    for(int i = 0; i < len; ++i) {
        auto first = boost::begin(input) + i;
        if(parse_(first, last, expr))
            ++n;
    }
    return n;
}


int main() {
    std::string const input(len, 'a');
    auto t0 = std::chrono::high_resolution_clock::now();
    int n1 = Test(input, x3::lit('a') >> 'b');
    auto t1 = std::chrono::high_resolution_clock::now();
    int n2 = Test(input, x3::lit('a') > 'b');
    auto t2 = std::chrono::high_resolution_clock::now();
    
    std::chrono::duration<int64_t,std::nano> dur0 = t1 - t0;
    std::chrono::duration<int64_t,std::nano> dur1 = t2 - t1;
	
    std::cout<<"Test with sequence    takes "<<dur0.count()<<" nano seconds"<<std::endl;
    std::cout<<"Test with expectation takes "<<dur1.count()<<" nano seconds"<<std::endl;
    assert(n1 == n2);
}

@wanghan02
Copy link
Contributor Author

wanghan02 commented Sep 6, 2017

The example above is a simple case that will fail len times at the second part lit('b'). I tried it with different compilers and platforms.

  1. VC2017 with release mode: expect is around 200-300 times slower on my pc.
  2. clang with -O3: expect is around 500 times slower on my mac (1000 times slower if len is 100), 80-100 times slower on coliru.
  3. g++ with -O3: expect is around 80-100 times slower on coliru.

http://coliru.stacked-crooked.com/a/e87a4d5490ee949e

@wanghan02
Copy link
Contributor Author

wanghan02 commented Sep 6, 2017

The advantage of expectation over sequence is when we have many alternatives. For example in a simple parser below,

x3::char_("afkp") > '1'
|
x3::char_("bglq") > '2'
|
x3::char_("chmr") > '3'
|
x3::char_("dins") > '4'
|
x3::char_("ejot") > '5';

if the first '1' fails, we don't have to try the other 4 branches at all. If the expectation failure doesn't throw, this parser will be faster than the one with sequence operator.

We also found that in our practice, most of the sequence operators are actually expectations.

Why are there so many failures in the code? We use spirit x3 to implement erase_all/replace_all and it performs better than the boost regex equivalents.

@djowel
Copy link
Member

djowel commented Sep 8, 2017

Looks good to me. I'll need some time to try and study the example code.

@djowel
Copy link
Member

djowel commented Sep 8, 2017

OK, the speed issue is really disconcerting. On one hand this fix is a step in the right direction. On the other hand, previous code using the throw can't take advantage of the "fix". For example, I rely on the exceptions for error handling, by catching the expectation errors in the grammar via error handlers. Is there a way to also change the X3 error handling scheme to not require any code changes with this fix?

@wanghan02
Copy link
Contributor Author

@djowel Sorry for the late reply.
The error handling scheme in guard and rule_parser::parse_rhs_main can also been done using the proposed method (should inject a boost::optional<expectation_failure<Iterator>> instead of a bool). But before going any further maybe I will list the pros and cons of the proposals first.

1. throwing expectation_failure

  • pros:
  • nothing to be changed.
  • easier to write user defined parsers
  • cons:
  • becomes slow when expectation_failure is thrown.

2. inject a bool to the context

  • pros:
  • when expectation fails, it's as fast as sequence operator or even faster (when alternatives exist).
  • cons:
  • it dose not carry detailed failure information.
  • users have to be careful when writing user defined parser structs (when another parser's parse function is called inside the current parse function, the user should know how the expectation failure will change the parse result, e.g. optional operator)

3. inject a boost::optional<expectation_failure<Iterator>> to the context

  • pros:
  • when expectation fails, it's a bit slower than sequence operator (due to filling info string to expectation_failure) but faster than throwing exceptions.
  • it carries detailed failure information for error handling.
  • cons:
  • users have to be careful when writing user defined parser structs (when another parser's parse function is called inside the current parse function, the user should know how the expectation failure will change the parse result, e.g. optional operator)

I will put option 2 (inject a bool to the context) into the nice to have category. It should be given as an option to the users but not the default one. Because in many places we care more about the processing speed, not the reason why the parser fails (e.g. replace_all/erase_all). We can still use the error handling scheme based on whether the expectation failure exists (without detailed information such as iterator and info string). The users who write user defined parser structs should review their parse function if they decide to use this option (it doesn't matter if they don't because this is not the default option given to them).

Functionality-wise option 1 and option 3 are the same and one of them should be the default option given to the users. Both of them have full error handling capability. I will assume error handling based on full expectation_failure information is the reason to use these options (otherwise option 2 is a better choice). It then depends on what do we do in the error handling code. Option 3's advantage becomes nothing if we always print out the error information (because normally printing is even slower than throwing exceptions). But if some users don't print out the error information in error handling code, they can still benefit from option 3. All users who write user defined parser struct should review their parse function if option 3 is used as the default one given to users.

The code change for option 2 and 3 in error handling (guard and parse_rhs_main) and parser structs (operators, directives) are mostly the same except for some overloaded functions for option 3.

To sum up, option 2 is something nice to have. whether or not using option 3 instead of option 1 depends on what do we normally do in the error handling code and how many users need to review their user defined parser structs.

@djowel
Copy link
Member

djowel commented Sep 18, 2017

Option 3 sounds good. I'm not sure I understand "users have to be careful when writing user defined parser structs"... could you elaborate a bit with an example, like the optional operator?

@wanghan02
Copy link
Contributor Author

wanghan02 commented Sep 18, 2017

For example we have a user defined ascii_no_case parser struct (in this case a user defined directive). In it's parse function it calls subject's parse function and simply return its result. It's safe for nothrow expect because the parse function will return false when expectation failure happens.

But optional operator is different (take it as an example). The parse(_subject) function tries to match subject's parse function but will always return true even if the subject's parse function fails. Then we have to be careful and change the return statement to return !has_expectation_failure(context); instead of simply return true; because the parser should fail when expectation failure happens.

So if we choose option 3 as the default option, the users who write their own parser structs (like ascii_no_case or more complex ones like optional operator) should carefully review these structs.

	template <typename Subject>
	struct ascii_no_case_directive : unary_parser<Subject, ascii_no_case_directive<Subject>>
	{
		typedef unary_parser<Subject, ascii_no_case_directive<Subject> > base_type;
		static bool const is_pass_through_unary = true;
		static bool const handles_container = Subject::handles_container;
		
		ascii_no_case_directive(Subject const& subject)
		  : base_type(subject) {}
		
		template <typename Iterator, typename Context
		  , typename RContext, typename Attribute>
		bool parse(Iterator& first, Iterator const& last
		  , Context const& context, RContext& rcontext, Attribute& attr) const
		{
			return this->subject.parse(
				first, last
			  , make_context<no_case_tag>(ascii_no_case_compare_, context)
			  , rcontext
			  , attr);
		}
	};

@djowel
Copy link
Member

djowel commented Sep 20, 2017

OK, I like option 3. This one needs documentation though. Are you willing to go all the way and include documentation updates? Many thanks!

@wanghan02
Copy link
Contributor Author

Sure. I will change the code and do some profiling first.

1. x3::expect directive and operator have the same semantics as before.
2. When expect failure happens, the whole parser will not try any alternative paths just like before. It will simply fail without throwing any exceptions.
3. x3::guard and ID::on_error can still get the expectation_failure struct when expect failure happens. The error handling scheme is also not changed. 'rethrow' means return false and keep expect failure information visible to the parent parser. 'fail' means return false but remove the expect failure information.
4. Allow user to use expect operator more aggressively without any performance loss brought by throwing exceptions.
5. If x3::expect is used and fails in skipper parser, the main parser should not fail (The skipper parser is a totally different parser with a different context. We should use it only to skip).
6. Add unit test cases to test all existing parsers.
7. Users who care more about the processing speed and always ignore the detailed expectation_failure information can accelerate your parser by injecting a 'false' into your parser/skipper using x3::with<x3::expectation_failure_tag>(false)[parser/skipper].
@wanghan02
Copy link
Contributor Author

@djowel

x3::expect doesn't throw expectation_failure anymore

  1. x3::expect directive and operator have the same semantics as before.
  2. When expect failure happens, the whole parser will not try any alternative paths just like before. It will simply fail without throwing any exceptions.
  3. x3::guard and ID::on_error can still get the expectation_failure struct when expect failure happens. The error handling scheme is also not changed. 'rethrow' means return false and keep expect failure information visible to the parent parser. 'fail' means return false but remove the expect failure information.
  4. Allow user to use expect operator more aggressively without any performance loss brought by throwing exceptions.
  5. If x3::expect is used and fails in skipper parser, the main parser should not fail (The skipper parser is a totally different parser with a different context. We should use it only to skip).
  6. Add unit test cases to test all existing parsers.
  7. Users who care more about the processing speed and always ignore the detailed expectation_failure information can accelerate your parser by injecting a 'false' into your parser/skipper using x3::withx3::expectation_failure_tag(false)[parser/skipper].
  8. Document has been updated.

@djowel
Copy link
Member

djowel commented Oct 22, 2017

I might be missing something here. This is a major change and can definitely affect current users' code. I think the code should therefore be optional, using a preprocessor macro, with the default having the original exception behavior. We should not force this change on the users. With this option on, some parts of the API, such as 'rethrow', do not make sense any more, and perhaps should have a different behavior. We need to be very careful. I think we also need to discuss this over at the Spirit list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants