Skip to content
This repository has been archived by the owner. It is now read-only.

Verify Message.Size functionality #109

Closed
jtaubensee opened this issue Mar 24, 2017 · 39 comments

Comments

Projects
None yet
5 participants
@jtaubensee
Copy link
Contributor

commented Mar 24, 2017

Actual Behavior

  1. I have reason to believe that Message.Size will only calculate the size of a stream, and not the headers.

Expected Behavior

  1. Message.Size should include the size of the headers

Versions

All

@jtaubensee jtaubensee added the bug label Mar 24, 2017

@jtaubensee jtaubensee added this to the 0.0.2-preview milestone Mar 24, 2017

@jtaubensee jtaubensee referenced this issue Mar 24, 2017

Merged

(Brokered)Message refactoring #108

5 of 7 tasks complete
@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Mar 24, 2017

👍
This is an old pain. The current (we should really find a name to call it) client does a poor job, especially when querying the messages that were not sent yet.

@jtaubensee

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2017

So, after looking into this one a little bit more, we are wondering where would be the best place to put this functionality. I think rather than being a property on the message, this should be a "utility function" somewhere (perhaps on the MessageSender). The reason being is that the size cannot be cached (as the body and properties are mutable), and in order to provide the true message size, we need to convert the message to an AMQP message first. Of course, there is some overhead associated with this, hence the reason to have a method called CalculateMessageSize.

Any thoughts about where this should go? Also, keep in mind that we will throw an exception on the client side before the message is sent if the size is too big. So in most cases, users can just handle the exception on send.

@jtaubensee jtaubensee modified the milestones: 0.0.3-preview, 0.0.4-preview Apr 10, 2017

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 17, 2017

The message size is important to be assessed prior to sending it. Batching allows up to 100 messages or less than the maximum size for the tier used. If there are less than 100 messages, one has to calculate the overall size of the messages in order to make a conscious decision to split the batch into "chunks".

Utility method sounds off to be frank. The message is the one that should know how to calculate its size, not some method on an arbitrary class that might (or might not) be used later. Yes, calculation of the message time is a perf hit, but it's a necessary evil. And when someone opts into it, they probably do that for a good reason.

To sum it up:

  • No to a utility method on something like MessageSender
  • No to an assumption at what point that calculated size will be needed
  • Yes for the Message to provide this method and proper documentation of what it's doing

If you really oppose the idea of Message providing the calculation, you could implement it as an extension method on the root namespace (along with the Message).

@jtaubensee jtaubensee modified the milestones: 0.0.4-preview, 0.0.5-preview May 2, 2017

@jtaubensee jtaubensee modified the milestones: Backlog, 0.0.6-preview May 25, 2017

@nemakam nemakam referenced this issue Jun 30, 2017

Merged

API documentation + Several minor API changes #200

7 of 7 tasks complete
@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2017

@Azure/azure-service-bus-write this hasn't see any action. Should it be addressed?

@poveilleux

This comment has been minimized.

Copy link

commented Aug 21, 2017

Any news on that?

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Aug 21, 2017

Has been backloged @poveilleux 🙁
What's the use case you're having?

@poveilleux

This comment has been minimized.

Copy link

commented Aug 21, 2017

I am trying to send 500 messages through ITopicClient.SendAsync(IList<Message> messages), but it throws a NotImplementedException saying "MessageSizeExceededException". I tried separating the messages into smaller batches that would fit into the 256 KB constraint per batch.

The problem I have is when I sum the Size property of all my messages, I get a result of 157 KB, but when I call SendAsync with my list of messages, the exception message says I have over 450 KB of data.

My question is: how can I calculate the size of a batch based on the message size?

@poveilleux

This comment has been minimized.

Copy link

commented Sep 15, 2017

@SeanFeldman Are you able to help me with that problem?

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Sep 15, 2017

@poveilleux I've written a post for the old client that used to serialize message. With this library, you should be able to get away with a much simpler approach:

message total size = Message.Size + Message.UserProperties + Message.SystemProperties

Does that work for you?

@poveilleux

This comment has been minimized.

Copy link

commented Sep 15, 2017

Thank you for providing me with that article. I will test it when I got and I'll let you know if it worked, but I assume it will!

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Sep 21, 2017

@Azure/azure-service-bus-write could you please point out if there's a flaw in my assumption about message size calculation?

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 25, 2018

@SeanFeldman
If I understood it correctly, are you proposing that the plugin will do size calculation by serializing the whole message? Can you think of how the API surface would look like?

If we are going away from caching, then the main thing we need to worry about is the explanation part. We need to make the user understand 2 things:

  1. Calculating the message size is not a very straighforward job and requires some CPU cycles.
  2. Sizeof(MessageA + MessageB) is not same as Sizeof(MessageA) + Sizeof(MessageB)

We also need to understand what is the user-case scenario we are targeting.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 25, 2018

Before I can even propose anything, need to understand the options. Don't think I'm done with that yet 😃

We also need to understand what is the user-case scenario we are targeting.

This is where we should probably focus first.
A user wants to send a batch of messages using await sender.SendAsync(IList<Message>);

Every message in the collection will go via registered plugins and only after that sent over the wire (AmqpMessageConverter.BatchSBMessagesAsAmqpMessage(IList<Message>)). Assuming there are no plugins attached, collection of messages might exceed the maximum batch size. For scenarios like this, users would want to "chunk" the batch into multiple batches to ensure operation doesn't fail.

Plugin route would not work here since outgoing pluins are applied per-message base, not the whole collection.

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 25, 2018

One thing to keep in mind is that we can always open up more plugin entry points in the code. If we want one more point that gets called at some other point, we can always add a new method. The whole reason why ServiceBusPlugin was made abstract.

If the use case we are targeting is being able to send a list of messages, then we can also do the way event hub does. Make MessageBatch as a first class citizen and expose TryAdd(Message msg).
Need to make sure each of these messages should go through BeforeSend() part of the plugin before getting added to the batch. This batch should be documented as 'immutable'.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2018

@nemakam I presume you're referring to EventHub's EventDataBatch. I like that idea of immutability and TryAdd(Message). There's more to to ask about thsi approach:

  1. How would the client know what's the batch size limit?
  2. ServiceBusPlugin evaluates a single message rather than MessageBatch. Whatever the extension will be, it would need to take work side-by-side with all the single message plugins. For example: if an attachment plugin is registered and configured for any message with a body >50KB to go over Storage, that should be respected.
@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 26, 2018

  1. When an amqp link is created between client and the entity (gateway), the gateway provides the max message size as a link property. Client already uses this to decide when to throw MessageSizeExceededException
  2. So the thing I was proposing is batch.TryAdd(msg) will first send the message through all the plugins, and then use the final mutated message and tries to add it to the batch. Basically, ensure that no mutation happens after the message has been added.

The design still needs to thought over. Couple of pain points:

  1. The Batch should take the MessageSender in the constructor. Else it wouldn't know what the batch size is.
  2. Lets say the message went through attachment-plugin. What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment? Also, do we expect the plugin to undo the attachment? If that is an immediate requirement, then we need to somehow work on Revert semantics for the plugin.
@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2018

So a batch would be instantiated independently from MessageSender, and MessageSender would need to be injected into the Batch. Not ideal, but let's assume we go with that.

What if the TryAdd() fails? How do we tell the attachment plugin to delete the attachment?

image
(2nd att1 should be att3)

Assumptions about ServiceBusPlugin was that its BeforeMessageSend is invoked just before message is sent out. With the idea of a batch, we're using "store and forward" kind of concept, where we build up a batch first and only then send messages. The attachment plugin doesn't perform any cleanup today. It could, if the client API have a method to implement to revert the work done for the send operation.

Putting customer hat on, I could live with a blob that is created and not utilized. I don't like the idea of reverting as it would mean double charge for a message that wasn't even sent. First to store the payload in Blob storage and second to remove it. As Attachment plugin implementor I also cannot guarantee that Revert would be successful. At most it would be a best effort with a Result returned to the MessageSender.

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 26, 2018

MessageSender would need to be injected into the Batch

Could you explain this? We need the message sender before we add any message into the batch.

I could live with a blob that is created and not utilized.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.
There are definitely some different things we can do here.
Force the developers of the plugin to make their functions idempotent. Let the attachment sit in the cloud and customer will basically get that post-attachment message back. The customer can just create a new batch and add this message there. The attachment plugin in this case wouldn't really do anything as the size is small. Just expose some method on the plugin/message that would allow the customer to retrieve the message back in case he wants to.

Or worst case, we can always say attachment plugin is not supported with batching. Batching is to be used when you are super critical of performance. Having attachment plugin isnt really in the same use case scenario. So I think it would not be that bad to say the plugin is not supported with batching.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 26, 2018

Could you explain this? We need the message sender before we add any message into the batch.

Yes. Internally, a Batch has to have access to the sender to know size limit, doesn't it? Therefore sender.CreateBatch() would create the batch with a link to the sender? Perhaps I didn't quite understand your idea.

I'm not so sure here. Once in a while if it happens, its fine. But I believe customers will basically keep on adding messages till the batch size has reached. That would mean for every batch that is sent to ASB, one attachment is leaked. So I am not in favor of leaking.

You're making an assumption that users will add while they can. Depends on the scenario. Some scenarios you just have a finate number of messages and you'd be sending those only (i.e. not a stream). But I get your concern which is valid.

I like the idempotency idea. If a message has gone through a attachment plugin and has already saved the payload in Storage, the message will contain all the meta data in the headers. Therefore re-processing by attachment plugin could be shorcutted.

Or worst case, we can always say attachment plugin is not supported with batching.

I'd like to avoid that if possible. Sending batches of messages with payloads that could exceed max size is not so exceptional. Saying that, you're 100% right about reminding that "Batching is to be used when you are super critical of performance.". 👍

So I think it would not be that bad to say the plugin is not supported with batching.

I'll make sure it will be 🙂 If not right away, then eventually.

So it feels like a good candidate path for batches. Once you address my question, feels like it would be possible to get a PR going.

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 27, 2018

Yes. I think we are on the same page.
I was just mentioning that sender.CreateBatch() needs to be the API instead of var batch = new Batch()

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 27, 2018

👍
@nemakam I'm still not clear about how we'll gather the size of the batch.
Assuming that all messages will go through the plugins (leaving out attachment plugin concerrns), how we'll know the batch size?

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 27, 2018

I'm still not clear about how we'll gather the size of the batch.

@SeanFeldman Take a look at this:

if (amqpLink.Settings.MaxMessageSize.HasValue)
{
var size = (ulong)amqpMessage.SerializedMessageSize;
if (size > amqpLink.Settings.MaxMessageSize.Value)
{
throw new MessageSizeExceededException(Resources.AmqpMessageSizeExceeded.FormatForUser(amqpMessage.DeliveryId.Value, size, amqpLink.Settings.MaxMessageSize.Value));
}
}

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 27, 2018

Also, just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 27, 2018

just to clarify, the "Batch" is no longer a plugin or an extension. It needs to be internal to the library itself.

Correct.

@nemakam the code you refer to in #109 (comment) check if a single message size exceeds the maximum size. My question is not that. I'm asking how will we get the size of a message itself once it has gone through plugins to know if the overall size of the batch is less than the max or not.

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 27, 2018

Yeah just went through the code, and it is trickier than I assumed.
Will need to do something fancy like what you had mentioned initially.. :) Back to square one..

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 27, 2018

@nemakam

This comment has been minimized.

Copy link
Member

commented Apr 27, 2018

Haha. You need to tell me if we need batch. That's currently the only known use-case for calculating message size. And given that size(a) + size(b) <> size (a+b), its hard to let users figure it out themselves.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Apr 27, 2018

Without a batch, the API that would be invoked is sender.Send(IList<Message>). Meaning that by the time all messages are aggregated and passed, the size could already exceed the maximum with the estimates. The benefit of having a batch is that it would guard from a size "overflow". I.e. if you have a flow of messages, you can bundle those into batches that will go through, assuming estimated size is ok.

So I still see a value in having a Batch concept in addition to sender.Send(IList<Message>).

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented May 2, 2018

@nemakam yay/nay?

@nemakam

This comment has been minimized.

Copy link
Member

commented May 2, 2018

If you are asking regarding usefulness of Batch, definitely! I agree..
If you are asking me the best approach to implement this, it needs some time to analyze the entire code path and I haven't done that yet. If you are interested, you should be able to scan through this codebase (sender.sendAsync) and Microsoft.Azure.Amqp library to see what is the best way to solve it.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented May 3, 2018

I'm not asking from the implementation POV, but the Batch idea with estimation baked into it.
I'd go with a PR to review and validate the approch. My questions was related to that. Or simply put - should I send a PR?

@nemakam

This comment has been minimized.

Copy link
Member

commented May 3, 2018

If you are asking regarding usefulness of Batch, definitely! I agree..

Yup. The idea is good and it will be great if you could ahead and send the PR.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented May 3, 2018

Ok. I'll get to it. This will be a WIP PR, so no high expectations 🙂

@binzywu binzywu added the on hold label Jun 26, 2018

@SeanFeldman SeanFeldman changed the title Verify Message.Size functionality Support for safe-size batch Aug 2, 2018

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Aug 2, 2018

Getting a size of an individual message was never a true concern, but a symptom of another problem - sending multiple messages w/o knowing if the whole batch does or does not exceed the maximum size. #538 repalces this issue and PR #539 will introduce a new API to resolve this.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator

commented Aug 2, 2018

Closing. See reason above.

@SeanFeldman SeanFeldman closed this Aug 2, 2018

@SeanFeldman SeanFeldman changed the title Support for safe-size batch Verify Message.Size functionality Aug 2, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.