Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic_parser_impl: make sentinel() return a unique pointer #814

Merged
merged 1 commit into from Dec 30, 2022

Conversation

MaxKellermann
Copy link
Contributor

@MaxKellermann MaxKellermann commented Nov 21, 2022

Right now, sentinel() casts the basic_parser_impl pointer (this) to const char *, but that pointer is not unique if the input buffer happens to be placed right before the basic_parser_impl instance - the end of that buffer then has the same address as basic_parser_impl.

Example code:

  struct {
    char buffer[8]{"{\"12345\""};
    boost::json::stream_parser p;
  } s;
  s.p.write(s.buffer, sizeof(s.buffer));
  s.p.write(":0}", 3);

This stops parsing at the end of the buffer, and then the incomplete() check in parse_string() will return true; the second write() call will crash with assertion failure:

boost/json/basic_parser_impl.hpp:1016: const char* boost::json::basic_parser::parse_unescaped(const char*, std::integral_constant<bool, StackEmpty_>, std::integral_constant<bool, AllowComments_>, bool) [with bool StackEmpty_ = true; bool IsKey_ = true; Handler = boost::json::detail::handler]: Assertion `*cs == '\x22'' failed.

This changes sentinel() to return the address of a static variable instead, which is guaranteed to be unique.

@codecov
Copy link

codecov bot commented Nov 21, 2022

Codecov Report

Merging #814 (0f987f0) into develop (6af3a5e) will not change coverage.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #814   +/-   ##
========================================
  Coverage    99.00%   99.00%           
========================================
  Files           71       71           
  Lines         6832     6832           
========================================
  Hits          6764     6764           
  Misses          68       68           
Impacted Files Coverage Δ
include/boost/json/basic_parser_impl.hpp 98.33% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6af3a5e...0f987f0. Read the comment docs.

@vinniefalco
Copy link
Member

vinniefalco commented Nov 21, 2022

Very interesting change :) @sdkrystian what do you think about this?

@grisumbras
Copy link
Member

This should work, but may affect performance. I wanted to see automatic benchmark results. @MaxKellermann also please add a test that's failing without your change (use one commit).

@grisumbras
Copy link
Member

Btw, alternative implementation would be to return the address of any parser's member except the first.

@MaxKellermann
Copy link
Contributor Author

Btw, alternative implementation would be to return the address of any parser's member except the first.

That's fragile because it can break when you reorder the members. But you could add +1 to the char-casted this pointer.

@MaxKellermann
Copy link
Contributor Author

I added a failing unit test and I changed the implementation to return (char*)this + 1; this was necessary because my first approach was still faulty - theoretically (though unlikely), this global variable could be the end of a global buffer, leading to the same crash as before.

@grisumbras
Copy link
Member

this + 1 will end up within the handler. So, I wonder if someone may try feeding the parser with something within the handler.

@MaxKellermann
Copy link
Contributor Author

MaxKellermann commented Nov 21, 2022

this + 1 will end up within the handler

True; but note that without this PR, it was already within the handler, so this PR is an improvement for sure.

What about using nullptr as sentinel? That would be cheaper than comparing pointers with this (i.e. with a CPU register). Are there corner cases where a valid non-sentinel value can be nullptr? (This idea is so obvious that I thought there must be a reason why this wasn't already done.)

@cppalliance-bot
Copy link

@grisumbras
Copy link
Member

Can you please squash the commits?

@MaxKellermann
Copy link
Contributor Author

Can you please squash the commits?

Done.

@cppalliance-bot
Copy link

@sdkrystian
Copy link
Member

@vinniefalco this reason why sentinel returned this was because it would be already loaded in some register when the function was called, so it could simply be moved into rax via a register-to-register move (which don''t require any execution resources on modern CPUs). So while this fix may work, it may just be better to return nullptr (which I believe should work; possibly even better than the old implementation).

@vinniefalco
Copy link
Member

Interesting!!

@MaxKellermann
Copy link
Contributor Author

it may just be better to return nullptr (which I believe should work; possibly even better than the old implementation).

I had the same idea yesterday, but

This idea is so obvious that I thought there must be a reason why this wasn't already done

If you confirm there isn't any reason, I'll happily change this PR to nullptr!

@vinniefalco
Copy link
Member

try nullptr

@MaxKellermann
Copy link
Contributor Author

try nullptr

That fails lots of unit tests:

====== BEGIN OUTPUT ======
boost.json.parse
#3 parse.cpp(49) failed: ! ec
#6 parse.cpp(49) failed: ! ec
#9 parse.cpp(49) failed: ! ec
#12 parse.cpp(49) failed: ! ec
#15 parse.cpp(49) failed: ! ec
#18 parse.cpp(49) failed: ! ec
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  incomplete JSON [boost.json:3 at ../../boost/json/basic_parser_impl.hpp:2753 in function 'write_some']
Aborted (core dumped)

EXIT STATUS: 134
====== END OUTPUT ======
====== BEGIN OUTPUT ======
boost.json.value
#1023 value.cpp(2283) failed: jv == 23
#1024 value.cpp(2288) failed: jv == 23
#1026 value.cpp(2294) failed: jv == 23
#1028 value.cpp(2300) failed: jv == 23
#1030 value.cpp(2307) failed: jv == 23
#1033 value.cpp(2315) failed: jv == 23
3ms, 1 suites, 6 failures, 1034 total.

EXIT STATUS: 1
====== END OUTPUT ======

... and many more. I did not investigate further.

@vinniefalco
Copy link
Member

Another possibility is to privately derive basic_parser_impl from a struct that has a single private char in it:

class detail::pad
{
  char c__ = 0;
};

@MaxKellermann
Copy link
Contributor Author

Another possibility is to privately derive basic_parser_impl from a struct that has a single private char in it:

You'd need to have at least two chars, and return a pointer to the second one, to ensure that no preceding buffer can end at the returned pointer, which is the whole problem here.

@vinniefalco
Copy link
Member

You'd need to have at least two chars, and return a pointer to the second one, to ensure that no preceding buffer can end at the returned pointer, which is the whole problem here.

Well no, because sentinel() is a member of basic_parser_impl so this would refer to the basic_parser_impl part and not the detail::pad part.

struct pad
{
  char c;
};

struct parser : private pad
{
  char const* sentinel() const noexcept
  {
    return this; // one past `pad`
  }
};

@MaxKellermann
Copy link
Contributor Author

Well no, because sentinel() is a member of basic_parser_impl so this would refer to the basic_parser_impl part and not the detail::pad part.

They have the same address in your example (most likely, but that may depend on the compiler-specific C++ ABI).

one past pad

No. Let me godbolt this for you :-) https://godbolt.org/z/63T8c4vx8

@vinniefalco
Copy link
Member

:( That is rather unfortunate

@MaxKellermann
Copy link
Contributor Author

If you want it watertight, I can change this PR to add a char sentinel[2] field and return &sentinel[1]. Do you want that?
For my taste, the current state or the PR is "good enough", but it's not (strictly) watertight.
I'd love to have nullptr, but that may require fixing some existing quirks - if you want that, I can give that a try.
Let me know your preference.

@grisumbras
Copy link
Member

How would that be better than just returning an address to any member? Or this + 1?

@MaxKellermann
Copy link
Contributor Author

How would that be better than just returning an address to any member?

If you return the address of any member, it's only watertight if you can guarantee that this member is not directly preceded by a buffer to be parsed. You can look at the current layout of the class and speculate it's not, but that's fragile; if eventually the code gets refactored and members get reordered (which is a perfectly reasonable thing to do), this assumption may no longer be true and the code may break.

Having a char sentinel[2] and using &sentinel[1] guarantees that the pointer is never preceded by a buffer, because &sentinel[1] is guaranteed to be preceded by &sentinel[0], no matter which C++ ABI, and no matter how you reorder fields. This is the strictly watertight solution.

Or this + 1?

It was you who wrote: "this + 1 will end up within the handler. So, I wonder if someone may try feeding the parser with something within the handler."
It's not whatertight. It's only safe against parsing buffers outside of the parser object, but not against buffers which happen to be inside.

@vinniefalco
Copy link
Member

vinniefalco commented Nov 24, 2022

( reinterpret_cast<char const*>(this) + 1 )

@@ -217,8 +217,10 @@ const char*
basic_parser<Handler>::
sentinel()
{
// the "+1" ensures that the returned pointer is unique even if
// the given input buffer borders on this object
return reinterpret_cast<

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_assert(sizeof(*this) > 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class basic_parser has lots of fields, and is always larger than 1 byte, unless this class gets refactored and all fields get removed, which isn't going to happen. I don't think this static_assert adds real value, it's trivial to see it's always true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, lol

@grisumbras
Copy link
Member

The worst part of making sentinel() return nullptr is that it will put a precondition on write_some (you wouldn't be able to do parser.write_some(true, nullptr, 0, ec). This isn't necessarily bad, but when GCC started optimising around a similar precondition in memcpy, it broke a lot of code.

I would stick to returning a pointer into parser, even if that means some reduction in performance. I also don't like adding unnecessary fields just for this purpose, I would prefer returning a pointer to an existing field. We already keep layout of value et al. fixed, doing something similar for basic_parser shouldn't be a problem.

On the other hand, I'm not necessarily against returning this + 1, but we might want to document the fact that people shouldn't feed to the parser buffers inside handler.

@vinniefalco your thoughts?

@grisumbras
Copy link
Member

@MaxKellermann can you please rebase on current develop and re-push. I will merge it then.

Right now, sentinel() casts the `basic_parser` pointer (`this`) to
`const char *`, but that pointer is not unique if the input buffer
happens to be placed right before the `basic_parser_impl` instance -
the end of that buffer then has the same address as `basic_parser`.

Example code:

```
  struct {
    char buffer[8]{"{\"12345\""};
    boost::json::stream_parser p;
  } s;
  s.p.write(s.buffer, sizeof(s.buffer));
  s.p.write(":0}", 3);
```

This stops parsing at the end of the buffer, and then the
`incomplete()` check in `parse_string()` will return true; the second
`write()` call will crash with assertion failure:

> boost/json/basic_parser_impl.hpp:1016: const char* boost::json::basic_parser<Handler>::parse_unescaped(const char*, std::integral_constant<bool, StackEmpty_>, std::integral_constant<bool, AllowComments_>, bool) [with bool StackEmpty_ = true; bool IsKey_ = true; Handler = boost::json::detail::handler]: Assertion `*cs == '\x22'' failed.

This changes `sentinel()` by adding 1 to guaranteed that the sentinel
pointer is unique even if the input buffers borders on this object.
@MaxKellermann
Copy link
Contributor Author

@MaxKellermann can you please rebase on current develop and re-push. I will merge it then.

done

@cppalliance-bot
Copy link

@grisumbras grisumbras merged commit 0f987f0 into boostorg:develop Dec 30, 2022
@grisumbras
Copy link
Member

Thank you for your contribution.

@MaxKellermann MaxKellermann deleted the unique_sentinel branch October 5, 2023 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants