
Conversation


@D0zee D0zee commented Jun 27, 2025

No description provided.

@D0zee D0zee marked this pull request as draft June 27, 2025 19:38
@D0zee D0zee force-pushed the 263-generic-flat-response branch from 0d454ef to 7e7e3dd on June 27, 2025 19:47
Collaborator

@anarthal anarthal left a comment


I've reviewed the flat_response_impl part. It looks like what I'd expect. Marcelo is better suited to review the rest, since I don't really know how adapters work. Thanks for submitting the patch.

@D0zee
Author

D0zee commented Jun 27, 2025

@anarthal It's not really ready yet; I just wanted to make sure I'm heading in the right direction. Thank you for your review!

@mzimbres
Collaborator

Hi @D0zee, I've looked at this PR again and think it is going in a good direction, so I endorse further development. The two issues I mentioned in the issue are minor and can be addressed in the future, if at all, i.e.

  1. offset_node vs offset_string.
  2. node_view at(std::size_t) vs node_view const& at(std::size_t)

Thanks again for the time invested.

@D0zee
Author

D0zee commented Jun 30, 2025

Hi @mzimbres, I'm going to look at the suggested issues in detail and implement them to improve performance and make the structure more user-friendly. Thank you for your comments in the issue.

@D0zee D0zee marked this pull request as ready for review July 2, 2025 18:40
struct impl_t {
fn_type adapt_fn;
adapt_fn_type adapt_fn;
done_fn_type prepare_done_fn;
Collaborator


I would like to avoid creating two std::function objects for each request (i.e. per async_exec call), since each one represents a potential dynamic allocation. In this case, however, I think we can't do much better, because inside async_exec we only have access to the type-erased adapter, i.e. the std::function, and therefore we can't call adapter.set_done(), for example. I think the prepare_done_fn callback is not critical though, since it is small and the dynamic allocation is likely to be optimized away with SOO (small object optimization). @anarthal What do you think?

Author

@D0zee D0zee Jul 6, 2025


sizeof(generic_flat_response*) = 8
sizeof(generic_flat_response&) = 64
sizeof([](){}) = 1

  • If the response type is generic_flat_response, we return a lambda that captures the response ([res]() mutable {}). The size of this lambda is 64 bytes, so std::function will definitely keep it on the heap. I suggest capturing and passing a pointer, generic_flat_response*. Its size is 8 bytes, and in that case SOO will take place during creation of the std::function in any_adapter. As a result we have no heap allocations in this case.

  • If the response type is anything else, SOO takes place by default, because the size of an empty lambda is 1 byte.

Source: https://stackoverflow.com/a/57049013
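
For illustration, here is a tiny standalone check of the sizes involved; the 64-byte struct is just a stand-in matching the measurement above, and the exact SOO threshold depends on the standard library:

#include <cstdio>
#include <functional>

struct generic_flat_response { char storage[64]; };  // stand-in with the size measured above

int main()
{
    generic_flat_response res;

    auto by_object  = [res]() mutable { (void)res; };     // closure holds a 64-byte copy
    auto by_pointer = [p = &res]() mutable { (void)p; };  // closure holds only an 8-byte pointer

    std::printf("object capture:  %zu bytes\n", sizeof(by_object));   // typically 64
    std::printf("pointer capture: %zu bytes\n", sizeof(by_pointer));  // typically 8

    // Whether std::function allocates depends on its implementation-defined SOO buffer,
    // but the pointer-sized closure is far more likely to fit it than the 64-byte one.
    std::function<void()> f = by_pointer;
    f();
}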

Collaborator

@mzimbres mzimbres Jul 6, 2025


I have been thinking about this and there seems to be a simpler way of avoiding the extra std::function that I hadn't considered earlier. First we define a special node value in node.hpp

constexpr resp3::node_view done_node{resp3::type::invalid, -1, -1, {}};

then call the adapter with this node when the parser is done here

template <class Adapter>
bool parse(resp3::parser& p, std::string_view const& msg, Adapter& adapter, system::error_code& ec)
{
   while (!p.done()) {
      ...
   }

   // ---> New adapter call here to signal that parsing is finished.
   adapter(std::make_optional<resp3::parser::result>(done_node), ec);

   return true;
}

Then call set_views() on the adapter when this node is detected here

template <>
class general_aggregate<result<flat_response_value>> {
private:
   result<flat_response_value>* result_;

public:
   template <class String>
   void operator()(resp3::basic_node<String> const& nd, system::error_code&)
   {
      // ---> Check whether done here.
      if (nd == done_node) {
         result_->value().set_views();
         return;
      }

      ...
   }
};

I totally missed this possibility earlier, but it looks cleaner than adding a new std::function, doesn't it? Do you mind trying this?

Author


Yes, it looks much better, and we don't have to worry about heap allocations. I will try it!

Author


Am I correct that this case must be handled by the other adapters as well?

Collaborator


Hi @D0zee, yes other adapters will have to handle this as well, for example

if (nd == done_node)
    return;

I forgot to say this in my previous comment.

Collaborator


Hi again, there is still one problem to solve. The way I suggested above will cause set_views() to be called multiple times, once for each request in the pipeline. That is because the same flat response can be used to store the responses of multiple requests. I am still not sure what the simplest way to avoid that is; it could be solved by adding state to set_views() so that it does not traverse what has already been traversed.

Or perhaps there is a way to wrap the adapters at another location, such as here

   template <class T>
   static auto create_impl(T& resp) -> impl_t
   {
      using namespace boost::redis::adapter;
      auto adapter = boost_redis_adapt(resp);
      std::size_t size = adapter.get_supported_response_size();
      return {std::move(adapter), size};
   }

or here

   adapter_ = [this, adapter](resp3::node_view const& nd, system::error_code& ec) {
      auto const i = req_->get_expected_responses() - remaining_responses_;
      adapter(i, nd, ec);
   };

so that only the last call to adapter(done_node, ec) triggers a set_views(). I think, however, this might be too complicated, and perhaps it is just simpler to let each call to set_views() traverse only what has not been set yet, as suggested above.

If I have time I will review all of this to see if there is any simplification possible.

@anarthal
Collaborator

I might be late to the discussion, but the new API looks more inconvenient than what was proposed initially, leaking many implementation details and requiring workarounds in the adapters. I guess there's a strong performance reason to go this way. Do we have measurements of how much faster this approach is vs. the more convenient one?

@mzimbres
Collaborator

@anarthal There are two problems to solve:

  1. The generic_flat_response API.
  2. Where to construct the string_views from the offsets.

Number 1. is the easier part, and I would like it to be the same as generic_response's API. The current PR is almost there but currently requires calling resp.view(). I am still not sure whether this is ok, but it is easy to change.

Number 2. is what is causing problems. I am still trying to understand what is cleanest from the design and performance point of view, in a way that doesn't break existing code. We already have two callbacks associated with async_exec: the adapter (and its wrapper) and the set_done_callback. The latter is the best place to call flat_resp.set_view(), but type erasure prevents that. Adding a new callback is a poor workaround that makes the code messier. The third option is to let the parser notify the adapter that it is done.

I think the third option is the only clean and sound one, because it makes sense for the adapter to know when parsing is complete. And once we have that, the set_done_callback perhaps becomes unnecessary.

Given this complexity, I think it would be simpler to split this PR in two: @D0zee works only on 1. and I work on 2., so that he has a sane way of calling set_views() that does not mess with the code. After I am done he can rebase and it should work.

@D0zee
Author

D0zee commented Jul 10, 2025

@mzimbres Please let me know when your part is ready. I think I will finish my part (the generic_flat_response API) tomorrow and over the weekend.

@mzimbres
Collaborator

@D0zee I will. I am currently finishing this PR and after it is merged I will investigate how to solve this one.

@anarthal
Collaborator

We already have two callbacks associated with async_exec: the adapter (and its wrapper) and the set_done_callback.

Note that the done callback currently belongs to the I/O world (it interacts with a channel) rather than the parser world. IMO long-term the callback should be replaced by a reader action (as I think was your intention according to this comment).

Semantically, having adapters support a "done" callback looks sound to me. But as you said, there is the type erasing issue.

If you want, I can try to write a type-erased adapter type that encapsulates both functions, something like:

class any_adapter_impl {
    // stores the underlying adapter, akin to what std::function does for functions
public:
    void on_node(std::size_t, resp3::node_view const&, system::error_code&);
    void on_finished();
};

Then you can use this type in the parser.
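
One possible shape for such a wrapper, sketched with hand-rolled type erasure; std::error_code and the local resp3::node_view stub are stand-ins for the Boost types, and none of this is the library's actual code:

#include <cstddef>
#include <memory>
#include <system_error>  // std::error_code as a stand-in for boost::system::error_code
#include <utility>

namespace resp3 { struct node_view { /* type, aggregate size, depth, value... */ }; }  // stand-in

class any_adapter_impl {
    // Manual type erasure, akin to what std::function does for a single call
    // operator, but exposing the two entry points discussed above.
    struct iface {
        virtual ~iface() = default;
        virtual void on_node(std::size_t, resp3::node_view const&, std::error_code&) = 0;
        virtual void on_finished() = 0;
    };

    template <class Adapter>
    struct model final : iface {
        Adapter adapter;
        explicit model(Adapter a) : adapter(std::move(a)) {}
        void on_node(std::size_t i, resp3::node_view const& nd, std::error_code& ec) override
        {
            adapter.on_node(i, nd, ec);
        }
        void on_finished() override { adapter.on_finished(); }
    };

    std::unique_ptr<iface> impl_;

public:
    template <class Adapter>
    explicit any_adapter_impl(Adapter a)
    : impl_(std::make_unique<model<Adapter>>(std::move(a)))
    { }

    void on_node(std::size_t i, resp3::node_view const& nd, std::error_code& ec)
    {
        impl_->on_node(i, nd, ec);
    }

    void on_finished() { impl_->on_finished(); }
};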

@mzimbres
Collaborator

Note that the done callback currently belongs to the I/O world (it interacts with a channel) rather than the parser world. IMO long-term the callback should be replaced by a reader action (as I think was your intention according to this comment).

Yeah, that is an important realization. Extending the adapter with on_finished is only meant to aid building the response from the wire protocol.

Semantically, having adapters support a "done" callback looks sound to me.

I think we need three functions: on_start(), on_node() and on_finish(). We actually already have on_start(); it is currently called on_value_available.

But as you said, there is the type erasing issue.

There is a simple way to deal with that: we can add a new parameter to the adapter. Currently we have

template <class String>
void operator()(resp3::basic_node<String> const&, system::error_code&);

which could be changed to

enum class event { start, node, finish };

template <class String>
void operator()(event ev, resp3::basic_node<String> const& nd, system::error_code& ec)
{
   switch (ev) {
      case event::start: adapter.on_start(); return;
      case event::node: adapter.on_node(nd, ec); return;
      case event::finish: adapter.on_finish(); return;
   }
}

That would be a breaking change but probably nobody is writing adapters.

If you want, I can try to write a type-erased adapter type that encapsulates both functions, something like:

You are welcome to; the adapter module has its complexity, so feel free to ask. Also, please create a sub-issue in the corresponding ticket.

class any_adapter_impl {
    // stores the underlying adapter, akin to what std::function does for functions
public:
    void on_node(std::size_t, resp3::node_view const&, system::error_code&);
    void on_finished();
};

Note that on_finished has to be called when each response is finished, not only once when all the responses have been received. For example:

request req;
req.push("COMMAND1", ...);
req.push("COMMAND2", ...);
req.push("COMMAND3", ...);

response<T1, T2, T3> resp;

co_await conn.async_exec(req, resp);

// on_finish will have been called three times when we get here, once
// for each response element.

I am noticing how much background knowledge this implementation requires; I should not have expected @D0zee to go through all these details. Apologies.

@D0zee
Author

D0zee commented Aug 11, 2025

Hi @mzimbres, thank you for letting me know! I will tweak the PR this week.

@D0zee D0zee requested review from anarthal and mzimbres August 18, 2025 19:58
@D0zee
Author

D0zee commented Aug 18, 2025

Hi guys, can you please have one more look at the PR? After the rebase I've replaced almost all generic_response occurrences with generic_flat_response and added an implementation of the on_init()/on_done()/on_node() methods. All affected tests pass. All tweaked examples work correctly, if I didn't miss anything. Sorry for the delay!

void on_done()
{
   if (result_->has_value()) {
      result_->value().set_view();
Collaborator


If a request contains multiple commands, on_done() and consequently set_view() will be called multiple times, although it should be called only once, on the last on_done() call. I have to review the code again to understand whether we know upfront how many times on_done() will be called. Traversing the whole vector of nodes every time is prohibitively expensive.

This might get more complex when reading server pushes with async_receive since we don't really know how many we are expecting.

Author


Actually I had not thought about that; I will consider this case as well.

@mzimbres
Collaborator

Hi @D0zee, thanks again. I did not have the time to review last week. While reading the code now it occurs to me that we need a way to avoid calling set_view() on every on_done(), since that would be prohibitively expensive. I have to review the code again.

I might not have the time next week, but after that I do, so please have some patience. Thanks.

@mzimbres
Collaborator

Hi again, regarding the problem of calling set_view() only at the last on_done() parse event: this is probably simple to solve if we pass the number of expected responses to the adapter. However, when dealing with server pushes we don't know upfront how many will be received, so the library would have to call set_view() right before resuming async_receive(), which would probably be here, but we can't since we only have a type-erased push-receiver adapter.

I think I also overlooked this problem in the first stages of this PR, which used the set_done_callback; that only works for request/response but not for server pushes.

In summary, it might be better to delegate the set_view() call to the user instead of finding a way to make the connection or adapter call it on the user's behalf. If we can agree on that, I would like to revert the changes to the examples that replace all generic_response occurrences with generic_flat_response. We should use only one example for now and comment on the need to call set_view().
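
For concreteness, the user-facing flow under that proposal might look roughly like the fragment below. It mirrors the earlier pipeline example; the iteration API and the node field names are assumptions, not the final interface.

request req;
req.push("HGETALL", "myhash");

generic_flat_response resp;

co_await conn.async_exec(req, resp);

// The user builds the string_views once, after all reads into resp are done.
resp.value().set_view();

for (auto const& nd : resp.value())   // assumed: the flat response is iterable
   std::cout << nd.value.data << "\n";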

Thoughts?

@anarthal
Collaborator

anarthal commented Sep 1, 2025

Pushing the responsibility onto the user seems hostile. The library seems like the right place to do it.

If the on_finished adapter callback does not suit this use case, what's its use then? Shouldn't it be either removed or replaced by something that does suit it (e.g. on_all_finished)?

On the other hand, why was the idea of creating the string_view objects on demand (i.e. a custom range) ruled out? Has the "less efficient" call been backed by any actual data?

@criatura2

After some days of thinking, I believe there is a good solution that just requires making add_node and set_view smarter.

  1. Add a new field std::size_t pos_ = 0 to flat_response_value.

  2. Let set_view use pos_ to set the views incrementally when it gets called by on_done

void set_view()
{
   for (; pos_ < view_.size(); ++pos_) {
      auto& v = view_.at(pos_).value;
      v.data = std::string_view{data_.data() + v.offset, v.size};
   }
}

  3. Move the current add_node implementation to add_node_impl and let add_node detect any memory allocations

void add_node(resp3::basic_node<String> const& nd)
{
   auto capacity_before = data_.capacity();
   add_node_impl(nd);
   auto capacity_after = data_.capacity();

   if (capacity_after > capacity_before)
      pos_ = 0;
}

That means every time on_done calls set_view it will only traverse nodes whose views haven't been set yet. When a memory reallocation is detected, pos_ is set to 0 so that all views get rebuilt on the next on_done call. Reallocation, however, will either not occur at all (if the user reserves memory upfront) or occur only a couple of times at program startup, since responses are reused, meaning that asymptotically it allocates no memory.
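
Putting the pieces together, here is a self-contained sketch of that idea; the type and member names are illustrative only, not the PR's actual flat_response_value:

#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct flat_node {
    std::size_t offset = 0;   // where the payload starts inside data_
    std::size_t size = 0;     // payload length
    std::string_view data;    // set lazily by set_view()
};

class flat_response_value {
    std::string data_;             // contiguous payload storage
    std::vector<flat_node> view_;  // one entry per RESP3 node
    std::size_t pos_ = 0;          // first node whose view has not been set yet

public:
    void add_node(std::string_view payload)
    {
        auto const capacity_before = data_.capacity();
        view_.push_back({data_.size(), payload.size(), {}});
        data_.append(payload);
        // A reallocation of data_ invalidates every previously built view,
        // so force the next set_view() call to rebuild them all.
        if (data_.capacity() > capacity_before)
            pos_ = 0;
    }

    void set_view()
    {
        for (; pos_ < view_.size(); ++pos_) {
            auto& n = view_[pos_];
            n.data = std::string_view{data_.data() + n.offset, n.size};
        }
    }

    std::vector<flat_node> const& nodes() const { return view_; }
};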

@D0zee Thanks for keeping up so far; this PR has been very helpful in detecting open points in the original design. If you don't have the time to implement the suggestion above I can take it over; I think, however, that it is not much work. Thanks.

@D0zee
Author

D0zee commented Sep 4, 2025

@criatura2 Thank you for your proposal, but I think a similar approach was discussed at some point in this PR and we rejected it. The intention is to call set_view() only once, to be efficient. That's why we need to know at which point to call this method.

I agree with @anarthal that the responsibility for setting the views should rest with the library rather than the client. I will spend some time thinking about this problem on the FSM side.

@criatura2

@criatura2 Thank you for your proposal, but I think a similar approach was discussed at some point in this PR and we rejected it. The intention is to call set_view() only once, to be efficient.

What is not ok is to traverse the whole vector on each on_done call, which is an O(n^2) solution. It is ok, however, to traverse only the newly added items, which is an amortized O(n) solution. And as said previously, since reallocations are very rare or non-existent, the solution would be close to optimal.

@D0zee
Author

D0zee commented Sep 4, 2025

What is not ok is to traverse the whole vector on each on_done call, which is an O(n^2) solution.

You are right, that is the next step I'm going to look at. It was noticed here #278 (comment).

Ideally we would like to call set_view() only once, and I think it is possible; I just don't have time this week to dive deeper.

I suspect your solution is O(n * log(n)), where n is the number of response strings in generic_flat_response, because of resetting pos_ when a reallocation happens. We can assume that this happens log(n) times in the general case (the growth factor may differ between string implementations):

  if (capacity_after > capacity_before)
    pos_ = 0;

@criatura2 If you would like to help, you can check the comments below and think about the case of server pushes, where we don't know how many messages to expect (details are here: #278 (comment)). Next week I'm going to dig deeper and try to come up with a solution.

@criatura2

I suspect your solution is O(n * log(n)), where n is the number of response strings in generic_flat_response, because of resetting pos_ when a reallocation happens.

If the response is reused, such as here, only the first read will be O(n * log(n)); after that it becomes O(n). Also, something like

generic_flat_response resp;
resp.value().reserve(some_number);

will bring reallocations down to negligible levels.

@D0zee
Author

D0zee commented Sep 4, 2025

You have run into the same problem: how to learn the number of responses we are going to handle. The problem is that in your approach we also need to know the average or expected size of each response. In my opinion it is impossible to know that in advance, because it depends on the content stored in the redis-server.

In the approach suggested below, calling set_view on the last response, we only need to know whether the response is the last one. I believe that can be derived somehow, unlike the content of the redis-server.

@criatura2

There is no real difference between calling set_view() only once and calling it incrementally on each on_done call. Knowing the number of responses, i.e. the number of on_done calls, is actually not necessary (and not available for server pushes).

Do you agree that the number of reallocations is negligible?

@D0zee
Author

D0zee commented Sep 8, 2025

What if we indeed move the responsibility for calling the set_view method to the user, but do it implicitly? In each API method of generic_flat_response we check whether the structure is ready for use; if it is not, we call set_view. The API is used only by clients and only when all operations with Redis are done. Thus, the first API call will automatically set up the views.

@mzimbres @anarthal What do you think?

@anarthal
Collaborator

anarthal commented Sep 8, 2025

Are you proposing the following scheme?

iterator begin() const {
   if (!views_set_) {
      set_views();
      views_set_ = true;
   }
   return /* whatever */
}

@D0zee
Author

D0zee commented Sep 9, 2025

@anarthal Yes, exactly. In our case this check will be at the beginning of these methods of generic_flat_response:

[Screenshot showing the generic_flat_response member functions where the check would be added]

@anarthal
Collaborator

Hm. This has a number of problems:

  • begin() and end() are now either non-const, or need to use a mutable variable underneath.
  • begin(), end() and similar functions are no longer guaranteed to be "cheap".
  • I don't know if this is a concern, but all of these functions now have an extra branch, which may hurt performance.

I want to understand why this can't be done after parsing is finished - is it because of pushes? For regular requests, it looks like the points where parsing starts and ends are clearly defined.

@anarthal
Collaborator

Also, if the problem is indeed pushes, what are the plans for implementing consume_one? It looks like the recommended way to manage pushes, although the implementation for generic_response already looks inefficient.

@mzimbres
Collaborator

I want to understand why this can't be done after parsing is finished - is it because of pushes?

Yes. The default size of the channel used to deliver pushes is 256; that means on_done() can be called that many times, and you never know which one is the last call, so you can't tell when set_view() can be called. We might want to call set_view right after try_send and just before async_send here, but that does not work either because it would be a race condition, i.e. the receiver side might suspend, allowing a further push to be processed after set_view has been called.

@mzimbres
Collaborator

mzimbres commented Sep 13, 2025

@D0zee Sorry for the late reply, I was very busy these days. As I said earlier (from my other account, criatura2), detecting allocations is close to optimal IMO. I don't like the idea of safeguarding every function with an if.

@D0zee
Author

D0zee commented Sep 13, 2025

Hi @mzimbres, I didn't know that account is yours! I need to think about it one more time. Sorry, I was busy this week as well. Thank you for bearing with me!

@D0zee
Author

D0zee commented Oct 1, 2025

If you don't have the time to implement the suggestion above I can take it over; I think, however, that it is not much work. Thanks.

Hi @mzimbres, can you please take over that change? I realize I've been extremely busy recently, and that is preventing this PR from being merged.

@mzimbres
Collaborator

mzimbres commented Oct 3, 2025

Hi @mzimbres, can you please take over that change? I realize I've been extremely busy recently, and that is preventing this PR from being merged.

Hi @D0zee, no problem. I will cherry-pick your commits to properly attribute the work you did. I will also pull you into the PR when it is ready. Thanks.
