Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom manage the body to avoid copying #19996

Merged
merged 6 commits into from
Apr 3, 2021

Conversation

danielmarbach
Copy link
Contributor

@danielmarbach danielmarbach commented Mar 31, 2021

Alternative to #19985

This checklist is used to make sure that common guidelines for a pull request are followed.

  • Please open PR in Draft mode if it is:
    • Work in progress or not intended to be merged.
    • Encountering multiple pipeline failures and working on fixes.
  • If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.
  • I have read the contribution guidelines.
  • The pull request does not introduce breaking changes.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

SDK Generation Guidelines

  • The generate.cmd file for the SDK has been updated with the version of AutoRest, as well as the commitid of your swagger spec or link to the swagger spec, used to generate the code. (Track 2 only)
  • The *.csproj and AssemblyInfo.cs files have been updated with the new version of the SDK. Please double check nuget.org current release version.

Additional management plane SDK specific contribution checklist:

Note: Only applies to Microsoft.Azure.Management.[RP] or Azure.ResourceManager.[RP]

  • Include updated management metadata.
  • Update AzureRP.props to add/remove version info to maintain up to date API versions.

Management plane SDK Troubleshooting

  • If this is very first SDK for a services and you are adding new service folders directly under /SDK, please add new service label and/or contact assigned reviewer.
  • If the check fails at the Verify Code Generation step, please ensure:
    • Do not modify any code in generated folders.
    • Do not selectively include/remove generated files in the PR.
    • Do use generate.ps1/cmd to generate this PR instead of calling autorest directly.
      Please pay attention to the @microsoft.csharp version output after running generate.ps1. If it is lower than current released version (2.3.82), please run it again as it should pull down the latest version,

Old outstanding PR cleanup

Please note:
If PRs (including draft) has been out for more than 60 days and there are no responses from our query or followups, they will be closed to maintain a concise list for our reviewers.

@ghost ghost added Service Bus customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Mar 31, 2021
@ghost
Copy link

ghost commented Mar 31, 2021

Thank you for your contribution @danielmarbach! We will review the pull request and get back to you soon.

@ghost ghost added the Community Contribution Community members are working on the issue label Mar 31, 2021
@@ -140,7 +140,7 @@ public BinaryData Body
get => AmqpMessage.GetBody();
set
{
AmqpMessage.Body = new AmqpMessageBody(new ReadOnlyMemory<byte>[] { value });
AmqpMessage.Body = new AmqpMessageBody(new BodyMemory(new ReadOnlyMemory<byte>[] { value }));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct body modifications from outside are not yet covered. I wanted to first see what you think about this

@danielmarbach
Copy link
Contributor Author

This locally passed all Receiver, Sender and Processor tests. It is still missing to address the direct modifications of the underlying annotated message

@danielmarbach
Copy link
Contributor Author

This would also account for the more lazy memory allocation for the sender path when the body of the sent message is actually required vs the receive path where we anyway always need to copy but with this change we would copy once only

@@ -147,7 +94,7 @@ public static BinaryData GetBody(this AmqpAnnotatedMessage message)
{
if (message.Body.TryGetData(out IEnumerable<ReadOnlyMemory<byte>> dataBody))
{
return dataBody.ConvertAndFlattenData();
return BinaryData.FromBytes((BodyMemory)dataBody);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course this is a bit brute force right now especially since users can smuggle in stuff, but I figured it is good enough to get the conversation started even if it makes some test bomb with an InvalidCastException

Copy link
Member

@JoshLove-msft JoshLove-msft Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if users directly modify the AmqpAnnotatedMessage, this would not be BodyMemory, but I think this approach will work for the vast majority of cases.

@jsquire
Copy link
Member

jsquire commented Apr 1, 2021

It's an interesting approach. I'm curious as to how you would bridge the gap for the case where the AmqpAnnotatedMessage is manipulated. I suspect that it may have something to do with type sniffing. If it's BodyMemory then we know it hasn't been directly accessed otherwise we can assume that it was?

@jsquire
Copy link
Member

jsquire commented Apr 1, 2021

I was thinking about this scenario a bunch, and wanted to bounce some of those thoughts.

Assumptions

  • Creation of the AmqpAnnotatedMessage is owned by the Service Bus message types; it is not possible for callers to create them directly.

  • All access to the AmqpAnnotatedMessage must run through the GetRawAmqpMessage method on the Service Bus message types.

  • Both Service Bus message types allow access to the AmqpAnnotatedMessage and must account for mutation.

  • Once GetRawAmqpMessage is called, developers may keep a reference to the AmqpAnnotatedMessage and manipulate it without the Service Bus message types being aware.

  • We expect the vast majority of developers (90%+) will never call GetRawAmqpMessage and will instead read the Body property of the Service Bus Message types.

Possible Approach

Since the message types create the AmqpAnnotatedMessage and access is only possible through the GetRawAmqpMessage method, then we can be sure that the underlying AMQP message was not manipulated if GetRawAmqpMessage was not called. Daniel's previous proposal of using a Lazy<T> to cache the body (or even eagerly creating the body) would work in the majority use case. Doing so would allow the Body property to avoid unnecessary allocations and work when accessed repeatedly. In our 90%+ case, we'd always be able to use the cached value and never have to recompute.

It's only when GetRawAmqpMessage is called that we enter a state where we cannot safely use the cached value. Since developers could mutate the raw body repeatedly without our knowledge, we would lose our ability to repopulate and cache, having to use the current approach of rebuilding the Body each time the property is accessed. This would impact the edge case for advanced callers, but a type/reference sniffing approach like what I think Daniel is proposing here would potentially help to offset.

As a very rough sketch, I'm thinking something like:

public class ServiceBusMessage
{
    private BinaryData _cachedBody = null;
    private bool _rawMessageAccessed = false;

    public BinaryData Body
    {
       get
       {
            if (!_rawMessageAccessed)
            {
                return _cachedBody ??= ReadDataBody();
            }

           return ReadDataBody();
       }
       set
       {
           WriteDataBody(value);
            
           if (!_rawMessageAccessed)
           {
               _cachedBody = value;
           }
       }
    }

    public AmqpAnnotatedMessage GetRawAmqpMessage()
    {
        _rawMessageAccessed = true;
        _cachedBody = null;
        
        return AmqpMessage;
    }

    private BinaryData ReadDataBody()
    {
        // ... code here
    }

    private void WriteDataBody()
    {
        // ... code here
    }
}

Considerations

  • We're still unable to detect changes in the AmqpAnnotatedMessage and react; once accessed, we cannot make assumptions about its state.

  • The advanced scenario when developers are manipulating the AMQP message directly would still offer a misleading Body accessor.

@JoshLove-msft
Copy link
Member

When users do modify the body through AmqpAnnotatedMessage, I think we could still cache the underlying data reference and thereby avoid recomputing.

@jsquire
Copy link
Member

jsquire commented Apr 1, 2021

When users do modify the body through AmqpAnnotatedMessage, I think we could still cache the underlying data reference and thereby avoid recomputing.

We'd need a way to detect the change. I agree it's possible and worth doing if we have a design that isn't overly complex or high effort. Given the very small set of developers that would fall into this scenario, I question if it's worth pursing anything complicated.

@danielmarbach
Copy link
Contributor Author

Or maybe just bypass any type of optimizations for those seldom used scenarios given that basically anything is mutable there at any point I'm wondering if we can safely discover the cases. I'll think about this a bit more after my easter break or if I have a moment of motivation even during

@JoshLove-msft
Copy link
Member

When users do modify the body through AmqpAnnotatedMessage, I think we could still cache the underlying data reference and thereby avoid recomputing.

We'd need a way to detect the change. I agree it's possible and worth doing if we have a design that isn't overly complex or high effort. Given the very small set of developers that would fall into this scenario, I question if it's worth pursing anything complicated.

Yeah I agree it may not be worth it, but I think the way to detect the change would be checking if the references are equal, no?

@jsquire
Copy link
Member

jsquire commented Apr 1, 2021

When users do modify the body through AmqpAnnotatedMessage, I think we could still cache the underlying data reference and thereby avoid recomputing.

We'd need a way to detect the change. I agree it's possible and worth doing if we have a design that isn't overly complex or high effort. Given the very small set of developers that would fall into this scenario, I question if it's worth pursing anything complicated.

Yeah I agree it may not be worth it, but I think the way to detect the change would be checking if the references are equal,

Potentially, but if there's a list or other backing store for the IEnumerable passed into the AmqpMessageBody, we can't assume that the reference changes. Callers may hold the list and just manipulate it in place without replacing the Body of the AmqpAnnotatedMessage.

@JoshLove-msft
Copy link
Member

When users do modify the body through AmqpAnnotatedMessage, I think we could still cache the underlying data reference and thereby avoid recomputing.

We'd need a way to detect the change. I agree it's possible and worth doing if we have a design that isn't overly complex or high effort. Given the very small set of developers that would fall into this scenario, I question if it's worth pursing anything complicated.

Yeah I agree it may not be worth it, but I think the way to detect the change would be checking if the references are equal,

Potentially, but if there's a list or other backing store for the IEnumerable passed into the AmqpMessageBody, we can't assume that the reference changes. Callers may hold the list and just manipulate it in place without replacing the Body of the AmqpAnnotatedMessage.

Right, but we would still need to apply the same conversion.

@jsquire
Copy link
Member

jsquire commented Apr 1, 2021

Potentially, but if there's a list or other backing store for the IEnumerable passed into the AmqpMessageBody, we can't assume that the reference changes. Callers may hold the list and just manipulate it in place without replacing the Body of the AmqpAnnotatedMessage.

Right, but we would still need to apply the same conversion.

I may not be following... unless you're thinking of doing a multi-level check, looking first at the IEnumerable itself and if that hasn't changed then inspecting each bucket of the enumerable for changes, I'm not sure that you'd be able to tell. Am I overlooking what you're actually suggesting? I feel like I probably am.

Just because I'm thinking of it, the multi-level approach is interesting and may work, but runs afoul of the "don't assume that it is safe to enumerate an IEnumerable multiple times" guidance.

@JoshLove-msft
Copy link
Member

Potentially, but if there's a list or other backing store for the IEnumerable passed into the AmqpMessageBody, we can't assume that the reference changes. Callers may hold the list and just manipulate it in place without replacing the Body of the AmqpAnnotatedMessage.

Right, but we would still need to apply the same conversion.

I may not be following... unless you're thinking of doing a multi-level check, looking first at the IEnumerable itself and if that hasn't changed then inspecting each bucket of the enumerable for changes, I'm not sure that you'd be able to tell. Am I overlooking what you're actually suggesting? I feel like I probably am.

Just because I'm thinking of it, the multi-level approach is interesting and may work, but runs afoul of the "don't assume that it is safe to enumerate an IEnumerable multiple times" guidance.

Well if the underlying buffer changes values (but the reference remains the same), those changes would be reflected in the BinaryData instance provided that there was only one Data section. If there were multiple data sections, then we would still need to recombine them.

@danielmarbach
Copy link
Contributor Author

danielmarbach commented Apr 2, 2021

            await using var serviceBusClient = new ServiceBusClient(connectionString, new ServiceBusClientOptions
            {
                RetryOptions = new ServiceBusRetryOptions
                {
                    TryTimeout = TimeSpan.FromSeconds(60)
                }
            });

            await using var sender = serviceBusClient.CreateSender(destination);
            var messages = new List<ServiceBusMessage>(10000);
            for (int i = 0; i < 10000; i++)
            {
                messages.Add(new ServiceBusMessage(UTF8.GetBytes($"Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i} Deep Dive {i}")));

                if (i % 1000 == 0)
                {
                    await sender.SendMessagesAsync(messages);
                    messages.Clear();
                }
            }

            await sender.SendMessagesAsync(messages);

            WriteLine("Message sent");
            Console.WriteLine("Take snapshot");
            Console.ReadLine();

            var countDownEvent = new CountdownEvent(10000);

            var processorOptions = new ServiceBusProcessorOptions
            {
                AutoCompleteMessages = false,
                MaxConcurrentCalls = 100,
                MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(10),
                ReceiveMode = ServiceBusReceiveMode.PeekLock,
            };

            await using var receiver = serviceBusClient.CreateProcessor(destination, processorOptions);
            receiver.ProcessMessageAsync += async messageEventArgs =>
            {
                var message = messageEventArgs.Message;
                await Out.WriteLineAsync(
                    $"Received message with '{message.MessageId}' and content '{UTF8.GetString(message.Body)}' / binary {message.Body}");
                await messageEventArgs.CompleteMessageAsync(message);
                countDownEvent.Signal();
            };
            receiver.ProcessErrorAsync += async errorEventArgs =>
            {
                await Out.WriteLineAsync($"Exception: {errorEventArgs.Exception}");
                await Out.WriteLineAsync($"FullyQualifiedNamespace: {errorEventArgs.FullyQualifiedNamespace}");
                await Out.WriteLineAsync($"ErrorSource: {errorEventArgs.ErrorSource}");
                await Out.WriteLineAsync($"EntityPath: {errorEventArgs.EntityPath}");
            };

            await receiver.StartProcessingAsync();

            countDownEvent.Wait();

            Console.WriteLine("Take snapshot");
            Console.ReadLine();

            await receiver.StopProcessingAsync();

I deliberately assessed the body twice to demonstrate the effect of the changes. So also the allocations of GetBytArray are gone

Before

image

After

image


public static BodyMemory From(IEnumerable<ReadOnlyMemory<byte>> segments)
{
return segments as BodyMemory ?? new BodyMemory(segments);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the user smuggled in IEnumerable<ReadOnlyMemory<byte>> over the lower level APIs it will be freshly wrapped and materialized when needed.

@danielmarbach
Copy link
Contributor Author

danielmarbach commented Apr 2, 2021

I think this looks promising #19996 (comment)

So with these changes, the body is materialized ones on the receive path and no longer redundantly copied. On the send path it is only materialized when the body is accessed but also only once.

@danielmarbach danielmarbach marked this pull request as ready for review April 2, 2021 19:04
@danielmarbach
Copy link
Contributor Author

Can also do a run with single body access if you want

@JoshLove-msft
Copy link
Member

Can also do a run with single body access if you want

That would be great! Out of curiosity, is that dotTrace?

@danielmarbach
Copy link
Contributor Author

Single body access run

Before

image

After

Hasn't changed (of course what is missing in the picture above though is this)

image

Although they are not relevant in comparison to the byte array reduction IMHO

@danielmarbach
Copy link
Contributor Author

Out of curiosity, is that dotTrace?

DotMemory

@JoshLove-msft
Copy link
Member

/azp run net - servicebus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@danielmarbach
Copy link
Contributor Author

🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Bus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants