basic_parser_impl: make sentinel() return a unique pointer #814
Force-pushed from 9562730 to 91de423.
Codecov Report

```diff
@@           Coverage Diff           @@
##           develop    #814   +/-   ##
=======================================
  Coverage    99.00%   99.00%
=======================================
  Files           71       71
  Lines         6832     6832
=======================================
  Hits          6764     6764
  Misses          68       68
```
Very interesting change :) @sdkrystian what do you think about this?
This should work, but may affect performance. I wanted to see automatic benchmark results. @MaxKellermann also, please add a test that fails without your change (use one commit).
Btw, an alternative implementation would be to return the address of any of the parser's members except the first.
That's fragile because it can break when you reorder the members. But you could add
Force-pushed from 91de423 to 7e2dfa8.
I added a failing unit test and I changed the implementation to return
`this + 1` will end up within the handler. So, I wonder if someone may try feeding the parser with something within the handler.
True; but note that without this PR, it was already within the handler, so this PR is an improvement for sure. What about using
Can you please squash the commits?
Force-pushed from 7e2dfa8 to 062b93d.
Done.
@vinniefalco this reason why
Interesting!!
I had the same idea yesterday, but
If you confirm there isn't any reason, I'll happily change this PR to
try
That fails lots of unit tests:
... and many more. I did not investigate further.
Another possibility is to privately derive
You'd need to have at least two chars, and return a pointer to the second one, to ensure that no preceding buffer can end at the returned pointer, which is the whole problem here.
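To make the two-byte idea concrete, here is a minimal sketch. `toy_parser`, `sentinel_storage_`, `buffer_then_parser` and `sentinel_is_distinct` are hypothetical names chosen for illustration; this is not Boost.JSON code. The point is that the one-past-the-end pointer of a buffer placed directly in front of the parser can coincide with the parser's first byte, but never with the second byte of a dedicated two-byte member.

```cpp
#include <cassert>

// Hypothetical parser fragment (not Boost.JSON) that reserves two bytes
// purely so it can hand out the address of the second one as a sentinel.
class toy_parser
{
    char sentinel_storage_[2] = {};

public:
    // A caller's buffer could end exactly at &sentinel_storage_[0] if it sat
    // directly in front of this member. Ending at &sentinel_storage_[1] would
    // require the buffer's last byte to be sentinel_storage_[0], which belongs
    // to the parser, so this address is never the end of a foreign buffer.
    const char* sentinel() const noexcept
    {
        return &sentinel_storage_[1];
    }
};

// Hypothetical caller-side layout: an input buffer right before the parser.
struct buffer_then_parser
{
    char buffer[8] = {'{', '"', '1', '2', '3', '4', '5', '"'};
    toy_parser p;
};

// True if the end of the caller's buffer is distinguishable from the sentinel.
inline bool sentinel_is_distinct(const buffer_then_parser& s)
{
    const char* end = s.buffer + sizeof(s.buffer);
    return end != s.p.sentinel();
}
```

Compared with returning the address of an arbitrary existing member, a dedicated member like this keeps the guarantee independent of how the other members are ordered, which is the fragility concern raised in this thread.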
Well no, because
They have the same address in your example (most likely, but that may depend on the compiler-specific C++ ABI).
No. Let me godbolt this for you :-) https://godbolt.org/z/63T8c4vx8
:( That is rather unfortunate
If you want it watertight, I can change this PR to add a
How would that be better than just returning the address of any member? Or `this + 1`?
If you return the address of any member, it's only watertight if you can guarantee that this member is not directly preceded by a buffer to be parsed. You can look at the current layout of the class and speculate it's not, but that's fragile; if eventually the code gets refactored and members get reordered (which is a perfectly reasonable thing to do), this assumption may no longer be true and the code may break. Having a
It was you who wrote: "this + 1 will end up within the handler. So, I wonder if someone may try feeding the parser with something within the handler."
```
@@ -217,8 +217,10 @@ const char*
basic_parser<Handler>::
sentinel()
{
    // the "+1" ensures that the returned pointer is unique even if
    // the given input buffer borders on this object
    return reinterpret_cast<
```
```cpp
static_assert(sizeof(*this) > 1);
```
`class basic_parser` has lots of fields and is always larger than 1 byte, unless this class gets refactored and all fields get removed, which isn't going to happen. I don't think this `static_assert` adds real value; it's trivial to see it's always true.
Agreed, lol
The worst part of making …
I would stick to returning a pointer into the parser, even if that means some reduction in performance. I also don't like adding unnecessary fields just for this purpose; I would prefer returning a pointer to an existing field. We already keep layout of …
On the other hand, I'm not necessarily against returning …
@vinniefalco your thoughts?
@MaxKellermann can you please rebase on current develop and re-push? I will merge it then.
Right now, `sentinel()` casts the `basic_parser` pointer (`this`) to `const char *`, but that pointer is not unique if the input buffer happens to be placed right before the `basic_parser_impl` instance: the end of that buffer then has the same address as `basic_parser`. Example code:

```
struct {
    char buffer[8]{'{', '"', '1', '2', '3', '4', '5', '"'};
    boost::json::stream_parser p;
} s;
s.p.write(s.buffer, sizeof(s.buffer));
s.p.write(":0}", 3);
```

This stops parsing at the end of the buffer, and then the `incomplete()` check in `parse_string()` will return true; the second `write()` call will crash with an assertion failure:

> boost/json/basic_parser_impl.hpp:1016: const char* boost::json::basic_parser<Handler>::parse_unescaped(const char*, std::integral_constant<bool, StackEmpty_>, std::integral_constant<bool, AllowComments_>, bool) [with bool StackEmpty_ = true; bool IsKey_ = true; Handler = boost::json::detail::handler]: Assertion `*cs == '\x22'' failed.

This changes `sentinel()` by adding 1 to guarantee that the sentinel pointer is unique even if the input buffer borders on this object.
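The "+1" variant described in this commit message can be sketched as follows. `plus_one_parser`, `buffer_then_plus_one_parser` and `new_sentinel_is_distinct` are hypothetical stand-ins, not the actual `basic_parser`; the `static_assert` mirrors the review suggestion that the object must be larger than one byte, so that `this` plus one byte still points inside the object.

```cpp
#include <cassert>

// Hypothetical stand-in for basic_parser (not Boost.JSON code).
class plus_one_parser
{
    int depth_ = 0;              // placeholder state
    const char* end_ = nullptr;  // placeholder state

public:
    // The first byte of *this may coincide with the end of a buffer placed
    // directly in front of the object; the second byte cannot, because a
    // buffer ending there would have to include the object's own first byte.
    const char* sentinel() const noexcept
    {
        static_assert(sizeof(plus_one_parser) > 1,
                      "this + 1 byte must still lie inside the object");
        return reinterpret_cast<const char*>(this) + 1;
    }
};

// Hypothetical layout matching the example above: buffer, then parser.
struct buffer_then_plus_one_parser
{
    char buffer[8] = {'{', '"', '1', '2', '3', '4', '5', '"'};
    plus_one_parser p;
};

// True if the end of the buffer no longer compares equal to the sentinel.
inline bool new_sentinel_is_distinct(const buffer_then_plus_one_parser& s)
{
    return s.buffer + sizeof(s.buffer) != s.p.sentinel();
}
```

Note that this only rules out buffers that end at the object's first byte; as pointed out in the review, the returned address still lies inside the object itself (for example inside a handler member).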
Force-pushed from 062b93d to 0f987f0.
done
Thank you for your contribution.
Right now, `sentinel()` casts the `basic_parser_impl` pointer (`this`) to `const char *`, but that pointer is not unique if the input buffer happens to be placed right before the `basic_parser_impl` instance: the end of that buffer then has the same address as `basic_parser_impl`.

Example code:

This stops parsing at the end of the buffer, and then the `incomplete()` check in `parse_string()` will return true; the second `write()` call will crash with an assertion failure:

This changes `sentinel()` to return the address of a static variable instead, which is guaranteed to be unique.