New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify Message.Size functionality #109

Closed
jtaubensee opened this Issue Mar 24, 2017 · 39 comments

Comments

Projects
None yet
5 participants
@jtaubensee
Contributor

jtaubensee commented Mar 24, 2017

Actual Behavior

  1. I have reason to believe that Message.Size will only calculate the size of a stream, and not the headers.

Expected Behavior

  1. Message.Size should include the size of the headers

Versions

All

@jtaubensee jtaubensee added the bug label Mar 24, 2017

@jtaubensee jtaubensee added this to the 0.0.2-preview milestone Mar 24, 2017

@jtaubensee jtaubensee referenced this issue Mar 24, 2017

Merged

(Brokered)Message refactoring #108

5 of 7 tasks complete
@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Mar 24, 2017

Collaborator

👍
This is an old pain. The current (we should really find a name to call it) client does a poor job, especially when querying the messages that were not sent yet.

Collaborator

SeanFeldman commented Mar 24, 2017

👍
This is an old pain. The current (we should really find a name to call it) client does a poor job, especially when querying the messages that were not sent yet.

@jtaubensee

This comment has been minimized.

Show comment
Hide comment
@jtaubensee

jtaubensee Apr 7, 2017

Contributor

So, after looking into this one a little bit more, we are wondering where would be the best place to put this functionality. I think rather than being a property on the message, this should be a "utility function" somewhere (perhaps on the MessageSender). The reason being is that the size cannot be cached (as the body and properties are mutable), and in order to provide the true message size, we need to convert the message to an AMQP message first. Of course, there is some overhead associated with this, hence the reason to have a method called CalculateMessageSize.

Any thoughts about where this should go? Also, keep in mind that we will throw an exception on the client side before the message is sent if the size is too big. So in most cases, users can just handle the exception on send.

Contributor

jtaubensee commented Apr 7, 2017

So, after looking into this one a little bit more, we are wondering where would be the best place to put this functionality. I think rather than being a property on the message, this should be a "utility function" somewhere (perhaps on the MessageSender). The reason being is that the size cannot be cached (as the body and properties are mutable), and in order to provide the true message size, we need to convert the message to an AMQP message first. Of course, there is some overhead associated with this, hence the reason to have a method called CalculateMessageSize.

Any thoughts about where this should go? Also, keep in mind that we will throw an exception on the client side before the message is sent if the size is too big. So in most cases, users can just handle the exception on send.

@jtaubensee jtaubensee modified the milestones: 0.0.3-preview, 0.0.4-preview Apr 10, 2017

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 17, 2017

Collaborator

The message size is important to be assessed prior to sending it. Batching allows up to 100 messages or less than the maximum size for the tier used. If there are less than 100 messages, one has to calculate the overall size of the messages in order to make a conscious decision to split the batch into "chunks".

Utility method sounds off to be frank. The message is the one that should know how to calculate its size, not some method on an arbitrary class that might (or might not) be used later. Yes, calculation of the message time is a perf hit, but it's a necessary evil. And when someone opts into it, they probably do that for a good reason.

To sum it up:

  • No to a utility method on something like MessageSender
  • No to an assumption at what point that calculated size will be needed
  • Yes for the Message to provide this method and proper documentation of what it's doing

If you really oppose the idea of Message providing the calculation, you could implement it as an extension method on the root namespace (along with the Message).

Collaborator

SeanFeldman commented Apr 17, 2017

The message size is important to be assessed prior to sending it. Batching allows up to 100 messages or less than the maximum size for the tier used. If there are less than 100 messages, one has to calculate the overall size of the messages in order to make a conscious decision to split the batch into "chunks".

Utility method sounds off to be frank. The message is the one that should know how to calculate its size, not some method on an arbitrary class that might (or might not) be used later. Yes, calculation of the message time is a perf hit, but it's a necessary evil. And when someone opts into it, they probably do that for a good reason.

To sum it up:

  • No to a utility method on something like MessageSender
  • No to an assumption at what point that calculated size will be needed
  • Yes for the Message to provide this method and proper documentation of what it's doing

If you really oppose the idea of Message providing the calculation, you could implement it as an extension method on the root namespace (along with the Message).

@jtaubensee jtaubensee modified the milestones: 0.0.4-preview, 0.0.5-preview May 2, 2017

@jtaubensee jtaubensee modified the milestones: Backlog, 0.0.6-preview May 25, 2017

@nemakam nemakam referenced this issue Jun 30, 2017

Merged

API documentation + Several minor API changes #200

7 of 7 tasks complete
@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Jul 26, 2017

Collaborator

@Azure/azure-service-bus-write this hasn't see any action. Should it be addressed?

Collaborator

SeanFeldman commented Jul 26, 2017

@Azure/azure-service-bus-write this hasn't see any action. Should it be addressed?

@poveilleux

This comment has been minimized.

Show comment
Hide comment
@poveilleux

poveilleux Aug 21, 2017

Any news on that?

poveilleux commented Aug 21, 2017

Any news on that?

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Aug 21, 2017

Collaborator

Has been backloged @poveilleux 🙁
What's the use case you're having?

Collaborator

SeanFeldman commented Aug 21, 2017

Has been backloged @poveilleux 🙁
What's the use case you're having?

@poveilleux

This comment has been minimized.

Show comment
Hide comment
@poveilleux

poveilleux Aug 21, 2017

I am trying to send 500 messages through ITopicClient.SendAsync(IList<Message> messages), but it throws a NotImplementedException saying "MessageSizeExceededException". I tried separating the messages into smaller batches that would fit into the 256 KB constraint per batch.

The problem I have is when I sum the Size property of all my messages, I get a result of 157 KB, but when I call SendAsync with my list of messages, the exception message says I have over 450 KB of data.

My question is: how can I calculate the size of a batch based on the message size?

poveilleux commented Aug 21, 2017

I am trying to send 500 messages through ITopicClient.SendAsync(IList<Message> messages), but it throws a NotImplementedException saying "MessageSizeExceededException". I tried separating the messages into smaller batches that would fit into the 256 KB constraint per batch.

The problem I have is when I sum the Size property of all my messages, I get a result of 157 KB, but when I call SendAsync with my list of messages, the exception message says I have over 450 KB of data.

My question is: how can I calculate the size of a batch based on the message size?

@poveilleux

This comment has been minimized.

Show comment
Hide comment
@poveilleux

poveilleux Sep 15, 2017

@SeanFeldman Are you able to help me with that problem?

poveilleux commented Sep 15, 2017

@SeanFeldman Are you able to help me with that problem?

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Sep 15, 2017

Collaborator

@poveilleux I've written a post for the old client that used to serialize message. With this library, you should be able to get away with a much simpler approach:

message total size = Message.Size + Message.UserProperties + Message.SystemProperties

Does that work for you?

Collaborator

SeanFeldman commented Sep 15, 2017

@poveilleux I've written a post for the old client that used to serialize message. With this library, you should be able to get away with a much simpler approach:

message total size = Message.Size + Message.UserProperties + Message.SystemProperties

Does that work for you?

@poveilleux

This comment has been minimized.

Show comment
Hide comment
@poveilleux

poveilleux Sep 15, 2017

Thank you for providing me with that article. I will test it when I got and I'll let you know if it worked, but I assume it will!

poveilleux commented Sep 15, 2017

Thank you for providing me with that article. I will test it when I got and I'll let you know if it worked, but I assume it will!

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Sep 21, 2017

Collaborator

@Azure/azure-service-bus-write could you please point out if there's a flaw in my assumption about message size calculation?

Collaborator

SeanFeldman commented Sep 21, 2017

@Azure/azure-service-bus-write could you please point out if there's a flaw in my assumption about message size calculation?

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Feb 18, 2018

Collaborator

Current Message.Size just returns number of bytes for Body.
Full size would be determined by Message.Body.Length + size of Message.SystemProperties + size of Message.UserProperties.
When a Message is sent it's converted into an AmqpMessage, which has SerializedMessageSize. I have tried to see if SerializedMessageSize can be used to report Message size, but there'a a significant discrepency between what AmqpMessage.SerializedMessageSize and actual mesasge size when it gets to the entity.

Would like to send a PR to address this issue and would appreciate any feedback on what I'm doing wrong. Thanks.

Collaborator

SeanFeldman commented Feb 18, 2018

Current Message.Size just returns number of bytes for Body.
Full size would be determined by Message.Body.Length + size of Message.SystemProperties + size of Message.UserProperties.
When a Message is sent it's converted into an AmqpMessage, which has SerializedMessageSize. I have tried to see if SerializedMessageSize can be used to report Message size, but there'a a significant discrepency between what AmqpMessage.SerializedMessageSize and actual mesasge size when it gets to the entity.

Would like to send a PR to address this issue and would appreciate any feedback on what I'm doing wrong. Thanks.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Feb 20, 2018

Member

@SeanFeldman
Hitting the perfect solution to this problem is tricky. We might have to discuss and come up with the approach to deal with few things.

For example, the way SendAsync(Message) and SendAsync(IList<Message>) gets serialized is a bit different. Serialized size of list of messages might not be same as sum of sizes of individual messages.

Also, we don't want to serialize the message twice (once to calculate size, and once during actual Send. So we might end up caching it. And now, there are problems around caching. How do we deal with someone who modifies some property after the size calculation has been done? As a scenario, lets say I add a user property List<string> myList;. And now after calculating the size I add few more elements to the list. How does caching work in such scenarios?

Member

nemakam commented Feb 20, 2018

@SeanFeldman
Hitting the perfect solution to this problem is tricky. We might have to discuss and come up with the approach to deal with few things.

For example, the way SendAsync(Message) and SendAsync(IList<Message>) gets serialized is a bit different. Serialized size of list of messages might not be same as sum of sizes of individual messages.

Also, we don't want to serialize the message twice (once to calculate size, and once during actual Send. So we might end up caching it. And now, there are problems around caching. How do we deal with someone who modifies some property after the size calculation has been done? As a scenario, lets say I add a user property List<string> myList;. And now after calculating the size I add few more elements to the list. How does caching work in such scenarios?

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Feb 20, 2018

Collaborator

I was wondering about the differences between single message and a collection.
There's no perfect solution TBH. That also could be the reason the old library never provided a size other that for the body.

Personally, I'm against caching. Partially for the reason you've outlined. And partially due to introdction of yet another problem to the already existing one - cache invalidation 🙂

Also, we don't want to serialize the message twice (once to calculate size, and once during actual Send

This is where I'm not entirely sure why would it be so bad? Yes someone could modify the message or its headers right after they've triggered estimated size calculation, rendering results to be incorrect. But that's something a user would control anyways. Unless a message is in some sort of "locked for sending" mode, anything can change, right? So 👎 for size calculation caching.

Collaborator

SeanFeldman commented Feb 20, 2018

I was wondering about the differences between single message and a collection.
There's no perfect solution TBH. That also could be the reason the old library never provided a size other that for the body.

Personally, I'm against caching. Partially for the reason you've outlined. And partially due to introdction of yet another problem to the already existing one - cache invalidation 🙂

Also, we don't want to serialize the message twice (once to calculate size, and once during actual Send

This is where I'm not entirely sure why would it be so bad? Yes someone could modify the message or its headers right after they've triggered estimated size calculation, rendering results to be incorrect. But that's something a user would control anyways. Unless a message is in some sort of "locked for sending" mode, anything can change, right? So 👎 for size calculation caching.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 25, 2018

Collaborator

Would like to resurrect this conversation.

Do you have any ideas besides the one I had to implement in the past, that would work? Given that the new client supports outgoing pipeline, this could be done as a plugin. The benefits would be A) not the client's concern B) as user/developer who's adding the plugin you'll bare the responsibility of the impact it will have on the overall performance.

Collaborator

SeanFeldman commented Apr 25, 2018

Would like to resurrect this conversation.

Do you have any ideas besides the one I had to implement in the past, that would work? Given that the new client supports outgoing pipeline, this could be done as a plugin. The benefits would be A) not the client's concern B) as user/developer who's adding the plugin you'll bare the responsibility of the impact it will have on the overall performance.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 25, 2018

Member

@SeanFeldman
If I understood it correctly, are you proposing that the plugin will do size calculation by serializing the whole message? Can you think of how the API surface would look like?

If we are going away from caching, then the main thing we need to worry about is the explanation part. We need to make the user understand 2 things:

  1. Calculating the message size is not a very straighforward job and requires some CPU cycles.
  2. Sizeof(MessageA + MessageB) is not same as Sizeof(MessageA) + Sizeof(MessageB)

We also need to understand what is the user-case scenario we are targeting.

Member

nemakam commented Apr 25, 2018

@SeanFeldman
If I understood it correctly, are you proposing that the plugin will do size calculation by serializing the whole message? Can you think of how the API surface would look like?

If we are going away from caching, then the main thing we need to worry about is the explanation part. We need to make the user understand 2 things:

  1. Calculating the message size is not a very straighforward job and requires some CPU cycles.
  2. Sizeof(MessageA + MessageB) is not same as Sizeof(MessageA) + Sizeof(MessageB)

We also need to understand what is the user-case scenario we are targeting.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 25, 2018

Collaborator

Before I can even propose anything, need to understand the options. Don't think I'm done with that yet 😃

We also need to understand what is the user-case scenario we are targeting.

This is where we should probably focus first.
A user wants to send a batch of messages using await sender.SendAsync(IList<Message>);

Every message in the collection will go via registered plugins and only after that sent over the wire (AmqpMessageConverter.BatchSBMessagesAsAmqpMessage(IList<Message>)). Assuming there are no plugins attached, collection of messages might exceed the maximum batch size. For scenarios like this, users would want to "chunk" the batch into multiple batches to ensure operation doesn't fail.

Plugin route would not work here since outgoing pluins are applied per-message base, not the whole collection.

Collaborator

SeanFeldman commented Apr 25, 2018

Before I can even propose anything, need to understand the options. Don't think I'm done with that yet 😃

We also need to understand what is the user-case scenario we are targeting.

This is where we should probably focus first.
A user wants to send a batch of messages using await sender.SendAsync(IList<Message>);

Every message in the collection will go via registered plugins and only after that sent over the wire (AmqpMessageConverter.BatchSBMessagesAsAmqpMessage(IList<Message>)). Assuming there are no plugins attached, collection of messages might exceed the maximum batch size. For scenarios like this, users would want to "chunk" the batch into multiple batches to ensure operation doesn't fail.

Plugin route would not work here since outgoing pluins are applied per-message base, not the whole collection.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 25, 2018

Member

One thing to keep in mind is that we can always open up more plugin entry points in the code. If we want one more point that gets called at some other point, we can always add a new method. The whole reason why ServiceBusPlugin was made abstract.

If the use case we are targeting is being able to send a list of messages, then we can also do the way event hub does. Make MessageBatch as a first class citizen and expose TryAdd(Message msg).
Need to make sure each of these messages should go through BeforeSend() part of the plugin before getting added to the batch. This batch should be documented as 'immutable'.

Member

nemakam commented Apr 25, 2018

One thing to keep in mind is that we can always open up more plugin entry points in the code. If we want one more point that gets called at some other point, we can always add a new method. The whole reason why ServiceBusPlugin was made abstract.

If the use case we are targeting is being able to send a list of messages, then we can also do the way event hub does. Make MessageBatch as a first class citizen and expose TryAdd(Message msg).
Need to make sure each of these messages should go through BeforeSend() part of the plugin before getting added to the batch. This batch should be documented as 'immutable'.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 26, 2018

Collaborator

@nemakam I presume you're referring to EventHub's EventDataBatch. I like that idea of immutability and TryAdd(Message). There's more to to ask about thsi approach:

  1. How would the client know what's the batch size limit?
  2. ServiceBusPlugin evaluates a single message rather than MessageBatch. Whatever the extension will be, it would need to take work side-by-side with all the single message plugins. For example: if an attachment plugin is registered and configured for any message with a body >50KB to go over Storage, that should be respected.
Collaborator

SeanFeldman commented Apr 26, 2018

@nemakam I presume you're referring to EventHub's EventDataBatch. I like that idea of immutability and TryAdd(Message). There's more to to ask about thsi approach:

  1. How would the client know what's the batch size limit?
  2. ServiceBusPlugin evaluates a single message rather than MessageBatch. Whatever the extension will be, it would need to take work side-by-side with all the single message plugins. For example: if an attachment plugin is registered and configured for any message with a body >50KB to go over Storage, that should be respected.
@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 26, 2018

Member
  1. When an amqp link is created between client and the entity (gateway), the gateway provides the max message size as a link property. Client already uses this to decide when to throw MessageSizeExceededException
  2. So the thing I was proposing is batch.TryAdd(msg) will first send the message through all the plugins, and then use the final mutated message and tries to add it to the batch. Basically, ensure that no mutation happens after the message has been added.

The design still needs to thought over. Couple of pain points:

  1. The Batch should take the MessageSender in the constructor. Else it wouldn't know what the batch size is.
  2. Lets say the message went through attachment-plugin. What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment? Also, do we expect the plugin to undo the attachment? If that is an immediate requirement, then we need to somehow work on Revert semantics for the plugin.
Member

nemakam commented Apr 26, 2018

  1. When an amqp link is created between client and the entity (gateway), the gateway provides the max message size as a link property. Client already uses this to decide when to throw MessageSizeExceededException
  2. So the thing I was proposing is batch.TryAdd(msg) will first send the message through all the plugins, and then use the final mutated message and tries to add it to the batch. Basically, ensure that no mutation happens after the message has been added.

The design still needs to thought over. Couple of pain points:

  1. The Batch should take the MessageSender in the constructor. Else it wouldn't know what the batch size is.
  2. Lets say the message went through attachment-plugin. What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment? Also, do we expect the plugin to undo the attachment? If that is an immediate requirement, then we need to somehow work on Revert semantics for the plugin.
@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 26, 2018

Collaborator

So a batch would be instantiated independently from MessageSender, and MessageSender would need to be injected into the Batch. Not ideal, but let's assume we go with that.

What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment?

image
(2nd att1 should be att3)

Assumptions about ServiceBusPlugin was that its BeforeMessageSend is invoked just before message is sent out. With the idea of a batch, we're using "store and forward" kind of concept, where we build up a batch first and only then send messages. The attachment plugin doesn't perform any cleanup today. It could, if the client API have a method to implement to revert the work done for the send operation.

Putting customer hat on, I could live with a blob that is created and not utilized. I don't like the idea of reverting as it would mean double charge for a message that wasn't even sent. First to store the payload in Blob storage and second to remove it. As Attachment plugin implementor I also cannot guarantee that Revert would be successful. At most it would be a best effort with a Result returned to the MessageSender.

Collaborator

SeanFeldman commented Apr 26, 2018

So a batch would be instantiated independently from MessageSender, and MessageSender would need to be injected into the Batch. Not ideal, but let's assume we go with that.

What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment?

image
(2nd att1 should be att3)

Assumptions about ServiceBusPlugin was that its BeforeMessageSend is invoked just before message is sent out. With the idea of a batch, we're using "store and forward" kind of concept, where we build up a batch first and only then send messages. The attachment plugin doesn't perform any cleanup today. It could, if the client API have a method to implement to revert the work done for the send operation.

Putting customer hat on, I could live with a blob that is created and not utilized. I don't like the idea of reverting as it would mean double charge for a message that wasn't even sent. First to store the payload in Blob storage and second to remove it. As Attachment plugin implementor I also cannot guarantee that Revert would be successful. At most it would be a best effort with a Result returned to the MessageSender.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 26, 2018

Member

MessageSender would need to be injected into the Batch

Could you explain this? We need the message sender before we add any message into the batch.

I could live with a blob that is created and not utilized.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.
There are definitely some different things we can do here.
Force the developers of the plugin to make their functions idempotent. Let the attachment sit in the cloud and customer will basically get that post-attachment message back. The customer can just create a new batch and add this message there. The attachment plugin in this case wouldn't really do anything as the size is small. Just expose some method on the plugin/message that would allow the customer to retrieve the message back in case he wants to.

Or worst case, we can always say attachment plugin is not supported with batching. Batching is to be used when you are super critical of performance. Having attachment plugin isnt really in the same use case scenario. So I think it would not be that bad to say the plugin is not supported with batching.

Member

nemakam commented Apr 26, 2018

MessageSender would need to be injected into the Batch

Could you explain this? We need the message sender before we add any message into the batch.

I could live with a blob that is created and not utilized.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.
There are definitely some different things we can do here.
Force the developers of the plugin to make their functions idempotent. Let the attachment sit in the cloud and customer will basically get that post-attachment message back. The customer can just create a new batch and add this message there. The attachment plugin in this case wouldn't really do anything as the size is small. Just expose some method on the plugin/message that would allow the customer to retrieve the message back in case he wants to.

Or worst case, we can always say attachment plugin is not supported with batching. Batching is to be used when you are super critical of performance. Having attachment plugin isnt really in the same use case scenario. So I think it would not be that bad to say the plugin is not supported with batching.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 26, 2018

Collaborator

Could you explain this? We need the message sender before we add any message into the batch.

Yes. Internally, a Batch has to have access to the sender to know size limit, doesn't it? Therefore sender.CreateBatch() would create the batch with a link to the sender? Perhaps I didn't quite understand your idea.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.

You're making an assumption that users will add while they can. Depends on the scenario. Some scenarios you just have a finate number of messages and you'd be sending those only (i.e. not a stream). But I get your concern which is valid.

I like the idempotency idea. If a message has gone through a attachment plugin and has already saved the payload in Storage, the message will contain all the meta data in the headers. Therefore re-processing by attachment plugin could be shorcutted.

Or worst case, we can always say attachment plugin is not supported with batching.

I'd like to avoid that if possible. Sending batches of messages with payloads that could exceed max size is not so exceptional. Saying that, you're 100% right about reminding that "Batching is to be used when you are super critical of performance.". 👍

So I think it would not be that bad to say the plugin is not supported with batching.

I'll make sure it will be 🙂 If not right away, then eventually.

So it feels like a good candidate path for batches. Once you address my question, feels like it would be possible to get a PR going.

Collaborator

SeanFeldman commented Apr 26, 2018

Could you explain this? We need the message sender before we add any message into the batch.

Yes. Internally, a Batch has to have access to the sender to know size limit, doesn't it? Therefore sender.CreateBatch() would create the batch with a link to the sender? Perhaps I didn't quite understand your idea.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.

You're making an assumption that users will add while they can. Depends on the scenario. Some scenarios you just have a finate number of messages and you'd be sending those only (i.e. not a stream). But I get your concern which is valid.

I like the idempotency idea. If a message has gone through a attachment plugin and has already saved the payload in Storage, the message will contain all the meta data in the headers. Therefore re-processing by attachment plugin could be shorcutted.

Or worst case, we can always say attachment plugin is not supported with batching.

I'd like to avoid that if possible. Sending batches of messages with payloads that could exceed max size is not so exceptional. Saying that, you're 100% right about reminding that "Batching is to be used when you are super critical of performance.". 👍

So I think it would not be that bad to say the plugin is not supported with batching.

I'll make sure it will be 🙂 If not right away, then eventually.

So it feels like a good candidate path for batches. Once you address my question, feels like it would be possible to get a PR going.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 27, 2018

Member

Yes. I think we are on the same page.
I was just mentioning that sender.CreateBatch() needs to be the API instead of var batch = new Batch()

Member

nemakam commented Apr 27, 2018

Yes. I think we are on the same page.
I was just mentioning that sender.CreateBatch() needs to be the API instead of var batch = new Batch()

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 27, 2018

Collaborator

👍
@nemakam I'm still not clear about how we'll gather the size of the batch.
Assuming that all messages will go through the plugins (leaving out attachment plugin concerrns), how we'll know the batch size?

Collaborator

SeanFeldman commented Apr 27, 2018

👍
@nemakam I'm still not clear about how we'll gather the size of the batch.
Assuming that all messages will go through the plugins (leaving out attachment plugin concerrns), how we'll know the batch size?

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 27, 2018

Member

I'm still not clear about how we'll gather the size of the batch.

@SeanFeldman Take a look at this:

if (amqpLink.Settings.MaxMessageSize.HasValue)
{
var size = (ulong)amqpMessage.SerializedMessageSize;
if (size > amqpLink.Settings.MaxMessageSize.Value)
{
throw new MessageSizeExceededException(Resources.AmqpMessageSizeExceeded.FormatForUser(amqpMessage.DeliveryId.Value, size, amqpLink.Settings.MaxMessageSize.Value));
}
}

Member

nemakam commented Apr 27, 2018

I'm still not clear about how we'll gather the size of the batch.

@SeanFeldman Take a look at this:

if (amqpLink.Settings.MaxMessageSize.HasValue)
{
var size = (ulong)amqpMessage.SerializedMessageSize;
if (size > amqpLink.Settings.MaxMessageSize.Value)
{
throw new MessageSizeExceededException(Resources.AmqpMessageSizeExceeded.FormatForUser(amqpMessage.DeliveryId.Value, size, amqpLink.Settings.MaxMessageSize.Value));
}
}

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 27, 2018

Member

Also, just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

Member

nemakam commented Apr 27, 2018

Also, just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 27, 2018

Collaborator

just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

Correct.

@nemakam the code you refer to in #109 (comment) check if a single message size exceeds the maximum size. My question is not that. I'm asking how will we get the size of a message itself once it has gone through plugins to know if the overall size of the batch is less than the max or not.

Collaborator

SeanFeldman commented Apr 27, 2018

just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

Correct.

@nemakam the code you refer to in #109 (comment) check if a single message size exceeds the maximum size. My question is not that. I'm asking how will we get the size of a message itself once it has gone through plugins to know if the overall size of the batch is less than the max or not.

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 27, 2018

Member

Yeah just went through the code, and it is trickier than I assumed.
Will need to do something fancy like what you had mentioned initially.. :) Back to square one..

Member

nemakam commented Apr 27, 2018

Yeah just went through the code, and it is trickier than I assumed.
Will need to do something fancy like what you had mentioned initially.. :) Back to square one..

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 27, 2018

Collaborator
Collaborator

SeanFeldman commented Apr 27, 2018

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam Apr 27, 2018

Member

Haha. You need to tell me if we need batch. That's currently the only known use-case for calculating message size. And given that size(a) + size(b) <> size (a+b), its hard to let users figure it out themselves.

Member

nemakam commented Apr 27, 2018

Haha. You need to tell me if we need batch. That's currently the only known use-case for calculating message size. And given that size(a) + size(b) <> size (a+b), its hard to let users figure it out themselves.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Apr 27, 2018

Collaborator

Without a batch, the API that would be invoked is sender.Send(IList<Message>). Meaning that by the time all messages are aggregated and passed, the size could already exceed the maximum with the estimates. The benefit of having a batch is that it would guard from a size "overflow". I.e. if you have a flow of messages, you can bundle those into batches that will go through, assuming estimated size is ok.

So I still see a value in having a Batch concept in addition to sender.Send(IList<Message>).

Collaborator

SeanFeldman commented Apr 27, 2018

Without a batch, the API that would be invoked is sender.Send(IList<Message>). Meaning that by the time all messages are aggregated and passed, the size could already exceed the maximum with the estimates. The benefit of having a batch is that it would guard from a size "overflow". I.e. if you have a flow of messages, you can bundle those into batches that will go through, assuming estimated size is ok.

So I still see a value in having a Batch concept in addition to sender.Send(IList<Message>).

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman May 2, 2018

Collaborator

@nemakam yay/nay?

Collaborator

SeanFeldman commented May 2, 2018

@nemakam yay/nay?

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam May 2, 2018

Member

If you are asking regarding usefulness of Batch, definitely! I agree..
If you are asking me the best approach to implement this, it needs some time to analyze the entire code path and I haven't done that yet. If you are interested, you should be able to scan through this codebase (sender.sendAsync) and Microsoft.Azure.Amqp library to see what is the best way to solve it.

Member

nemakam commented May 2, 2018

If you are asking regarding usefulness of Batch, definitely! I agree..
If you are asking me the best approach to implement this, it needs some time to analyze the entire code path and I haven't done that yet. If you are interested, you should be able to scan through this codebase (sender.sendAsync) and Microsoft.Azure.Amqp library to see what is the best way to solve it.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman May 3, 2018

Collaborator

I'm not asking from the implementation POV, but the Batch idea with estimation baked into it.
I'd go with a PR to review and validate the approch. My questions was related to that. Or simply put - should I send a PR?

Collaborator

SeanFeldman commented May 3, 2018

I'm not asking from the implementation POV, but the Batch idea with estimation baked into it.
I'd go with a PR to review and validate the approch. My questions was related to that. Or simply put - should I send a PR?

@nemakam

This comment has been minimized.

Show comment
Hide comment
@nemakam

nemakam May 3, 2018

Member

If you are asking regarding usefulness of Batch, definitely! I agree..

Yup. The idea is good and it will be great if you could ahead and send the PR.

Member

nemakam commented May 3, 2018

If you are asking regarding usefulness of Batch, definitely! I agree..

Yup. The idea is good and it will be great if you could ahead and send the PR.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman May 3, 2018

Collaborator

Ok. I'll get to it. This will be a WIP PR, so no high expectations 🙂

Collaborator

SeanFeldman commented May 3, 2018

Ok. I'll get to it. This will be a WIP PR, so no high expectations 🙂

@binzywu binzywu added the on hold label Jun 26, 2018

@SeanFeldman SeanFeldman changed the title from Verify Message.Size functionality to Support for safe-size batch Aug 2, 2018

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Aug 2, 2018

Collaborator

Getting a size of an individual message was never a true concern, but a symptom of another problem - sending multiple messages w/o knowing if the whole batch does or does not exceed the maximum size. #538 repalces this issue and PR #539 will introduce a new API to resolve this.

Collaborator

SeanFeldman commented Aug 2, 2018

Getting a size of an individual message was never a true concern, but a symptom of another problem - sending multiple messages w/o knowing if the whole batch does or does not exceed the maximum size. #538 repalces this issue and PR #539 will introduce a new API to resolve this.

@SeanFeldman

This comment has been minimized.

Show comment
Hide comment
@SeanFeldman

SeanFeldman Aug 2, 2018

Collaborator

Closing. See reason above.

Collaborator

SeanFeldman commented Aug 2, 2018

Closing. See reason above.

@SeanFeldman SeanFeldman closed this Aug 2, 2018

@SeanFeldman SeanFeldman changed the title from Support for safe-size batch to Verify Message.Size functionality Aug 2, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment