Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Binary Xml (XmlDictionaryWriter) performance #71478

Merged
merged 30 commits into from Apr 1, 2023

Conversation

Daniel-Svensson
Copy link
Contributor

@Daniel-Svensson Daniel-Svensson commented Jun 30, 2022

Improves performance the binary XmlDictionaryWriter mainly by using Span and Unsafe

  • Do larger writes using Unsafe instead of writing bytes at a time and avoid bounds checks for each byte written
  • Use Span instead of pointers in a few places
    • For array writes the usage of Spans means that it is no longer necessary to do extra copies separate byte arrays
  • Remove allocations when writing Guids
  • Behaviour change: 32bit floats (and doubles) will no longer be encoded as 64bit (or 32bit) int. Instead it will only ecode it as an int if it saves space.

Other:

  • I did not find the tests for the Binary XmlDictionaryWriter so instead I did some manual tests to ensure double and long round tripped as well as ran OpenRiaServices tests (which includes tests roundtriping classes with most property types with the changes on the server
  • update 2022-07-25: added tests cases for binary xml
  • I considered adding a tests to verify that (float)(1L << 63) is serialized as float instead of long, if you think it should be added then please advice where such a test should be written.
  • I also did som experriments just using Span and Cast instead of Unsafe but the improvements was smaller compared to using Unsafe (ratio 0.78 for span and 0.54 for Unsafe for calling WriteInt64Text directly with memorystream backing).

Benchmarks

"Before" is the official net7 preview.5 bits (with r2r) while After is local release build of System.Xml and "System.Private.DataContract..* dll.

*Benchmarks Code:*

using BenchmarkDotNet.Attributes; 
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Xml;

namespace ConsoleApp1;

[MemoryDiagnoser]
public class BinaryXmlBenchmarks
{
   private XmlDictionaryWriter _writer;
   private Guid _guid = Guid.NewGuid();
   const int ArraySize = 100;
   private Guid[] _guids = Enumerable.Repeat(Guid.NewGuid(), ArraySize).ToArray();
   private Double[] _doubles = Enumerable.Repeat(4.0/3.0, ArraySize).ToArray();

   [GlobalSetup]
   public void Setup()
   {
      _writer = XmlDictionaryWriter.CreateBinaryWriter(new NullWriteStream());
      _writer.WriteStartDocument();
      _writer.WriteStartElement("root");

      var asm = _writer.GetType().Assembly;
      var version = FileVersionInfo.GetVersionInfo(asm.Location);

      Console.WriteLine($"Assembly: {asm.Location}");
      Console.WriteLine($"Version: {version.FileVersion}");
   }

   [Benchmark]
   public void WriteInt32()
   {
      _writer.WriteValue(unchecked ((int)0xdeadbeef));
   }

   [Benchmark]
   public void WriteInt64()
   {
      _writer.WriteValue(long.MaxValue);
   }

   [Benchmark]
   public void WriteDouble()
   {
      _writer.WriteValue(1.0 / 3.0);
   }

   [Benchmark]
   public void WriteDecimal()
   {
      _writer.WriteValue(1.2m);
   }

   [Benchmark]
   public void WriteGuid()
   {
      _writer.WriteValue(_guid);
   }

   [Benchmark]
   public void WriteGuidArray()
   {
      _writer.WriteArray(null, "a", null, _guids, 0, _guids.Length);
   }

   [Benchmark]
   public void WriteDoubleArray()
   {
      _writer.WriteArray(null, "a", null, _doubles, 0, _doubles.Length);
   }
}

internal class NullWriteStream : Stream
{
   public override bool CanRead => false;

   public override bool CanSeek => false;

   public override bool CanWrite => true;

   public override long Length => throw new NotSupportedException();

   public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }

   public override void Flush() { }

   public override int Read(byte[] buffer, int offset, int count)
   {
      throw new NotSupportedException();
   }

   public override long Seek(long offset, SeekOrigin origin)
   {
      throw new NotSupportedException();
   }

   public override void SetLength(long value)
   {
      throw new NotSupportedException();
   }

   public override void Write(byte[] buffer, int offset, int count) { }

   public override void Write(ReadOnlySpan<byte> buffer) { }

   public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
   {
      return Task.CompletedTask;
   }

   public override void WriteByte(byte value) { }

   public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default)
   {
      return ValueTask.CompletedTask;
   }
}

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 5 5600U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  After  : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  7.0-preview5 : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

MaxRelativeError=0.01  IterationTime=250.0000 ms  
Method Job Mean Error StdDev Median Ratio RatioSD TotalIssues/Op TotalCycles/Op InstructionRetired/Op Timer/Op Gen 0 Allocated
WriteInt32 7.0-preview5 16.26 ns 0.186 ns 0.207 ns 16.21 ns 1.00 0.00 88 29 95 0 - -
WriteInt32 Unsafe 13.55 ns 0.149 ns 0.204 ns 13.52 ns 0.84 0.02 69 24 74 0 - -
WriteInt32 Span 14.06 ns 0.163 ns 0.326 ns 13.99 ns 0.85 0.03 79 25 74 0 -
WriteInt64 7.0-preview5 21.66 ns 0.237 ns 0.446 ns 21.57 ns 1.00 0.00 117 41 126 0 - -
WriteInt64 Unsafe 12.45 ns 0.150 ns 0.125 ns 12.47 ns 0.58 0.01 62 25 66 0 - -
WriteInt64 Span 13.05 ns 0.154 ns 0.183 ns 13.02 ns 0.60 0.01 74 24 69 0 - -
WriteDouble 7.0-preview5 24.43 ns 0.261 ns 0.290 ns 24.35 ns 1.00 0.00 120 47 128 0 - -
WriteDouble Unsafe 14.15 ns 0.162 ns 0.233 ns 14.19 ns 0.58 0.01 68 26 72 0 - -
WriteDouble Span 17.00 ns 0.682 ns 1.958 ns 16.34 ns 0.64 0.04 79 30 74 0 - -
WriteDecimal 7.0-preview5 43.56 ns 0.688 ns 1.918 ns 42.75 ns 1.00 0.00 249 77 270 0 - -
WriteDecimal Unsafe 20.44 ns 0.219 ns 0.234 ns 20.40 ns 0.44 0.01 91 40 96 0 - -
WriteDecimal Span 21.81 ns 0.173 ns 0.200 ns 21.82 ns 0.50 0.01 103 45 97 0 - -
WriteGuid 7.0-preview5 45.66 ns 0.451 ns 0.646 ns 45.69 ns 1.00 0.00 209 89 224 0 0.0047 40 B
WriteGuid* Unsafe* 19.75 ns 0.180 ns 0.159 ns 19.80 ns 0.43 0.01 82 39 86 0 - -
WriteGuid* Span* 16.90 ns 0.189 ns 0.266 ns 16.84 ns 0.38 0.01 88 31 83 0 - -
WriteTimespan 7.0-preview5 22.05 ns 0.219 ns 0.195 ns 22.01 ns 1.00 0.00 119 40 128 0 - -
WriteTimespan Unsafe 12.59 ns 0.150 ns 0.307 ns 12.59 ns 0.56 0.01 61 23 65 0 - -
WriteTimespan Span 14.49 ns 0.167 ns 0.245 ns 14.49 ns 0.65 0.01 72 28 68 0 - -
WriteGuidArray 7.0-preview5 3,417.86 ns 33.230 ns 34.125 ns 3,426.72 ns 1.00 0.00 16,152 6,654 17,066 34 0.4704 4,000 B
WriteGuidArray Unsafe 595.89 ns 5.973 ns 7.767 ns 594.17 ns 0.18 0.00 3,153 1,116 3,414 6 - -
WriteGuidArray Span 614.04 ns 5.926 ns 6.824 ns 612.00 ns 0.18 0.00 3,417 1,167 3,167 6 - -
WriteDoubleArray 7.0-preview5 1,040.62 ns 10.192 ns 10.467 ns 1,039.25 ns 1.00 0.00 5,954 1,896 6,504 10 - -
WriteDoubleArray Unsafe 130.87 ns 1.337 ns 1.830 ns 131.20 ns 0.13 0.00 666 249 715 1 - -
WriteDoubleArray Span 130.76 ns 1.242 ns 1.526 ns 130.69 ns 0.13 0.00 715 249 666 1 - -

Remarks: * The span and unsafe version of WriteGuid and Write...Array are identical so I do not know why there is a timing differecnce.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jun 30, 2022
@@ -624,19 +604,19 @@ public unsafe override void WriteText(string value)
}
}

public unsafe override void WriteText(char[] chars, int offset, int count)
private unsafe void WriteText(Span<char> chars)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try using ReadOnlySpan but then it looked like to many methods called this overload instead of the string version so I used span instead to avoid extra string allocations.

However it might be possibly to merge this implementation with the string based method if using ReadOnlyMemory instead since it allows trying to retrieve the original string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many methods called this overload instead of the string version

That does not sound right. If you have a string and there is a string overload, the compiler should pick the string overload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect might have been a VisualStudio bug, but I did not want to take any chances in case it had anything to do with one method being an override or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the code, will have to decompile the code and have a second look. Visual studio (17.2) still reports many references to the ReadOnlySpan version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unexpected calls are still there in the compiled assembly, should I keep it?
You might know who to involve to find out why it is called instead of the string overload.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this surprises me:
SharpLab
For some reason the WriteText(string) overload being an override is causing it to lose out to the span-based overload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I solved this by creating a separate "implementation metod" which is not an override (I called it WriteTextImpl to follow what seems to be the current naming convention for implementation methods)

@@ -72,6 +75,12 @@ private void WriteTextNode(XmlBinaryNodeType nodeType)
_textNodeOffset = this.BufferOffset - 1;
}

private ref byte GetBufferRef(int size)
Copy link
Member

@jkotas jkotas Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was using safe C# with proper boundschecks. You are switching this to use unsafe code without boundschecks. It is increasing security risk profile of this code.

Can this stay with safe C# and Spans? You should be still able to get nice performance gains by doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there was quite much unsafe code and points present in the file so I assumed it was ok to be unsafe for the best performance sine GetBuffer does the bounds check.

I've refactored the code so that all new "unsafe" code is now only present in 2 methods to make it much easier to reason about and updated the benchmark with timings for both Unsafe and Span.

The span version does .AsSpan(offset, size) on the buffer and then uses MemoryMarshal.Write<T> which I just foud out about.. I hope that method is considered safe enough and works well with "unaligned writes".

Please have a look and let me know based on the new code and the benchmark results which version you want.
I can push the updated span version instead if that version is the prefered one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: A side effect of moving to the span version is that it will probably be necessary to keep separate implementationa for writing 8bit (and maybe 16bit) nodes in order to not regress performance for them.

Some benchmarks would be required to determine the performance numbers for smaller elements first

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there sufficient test coverage to fully exercise all of these changes and root out any corner-case issues?

Co-authored-by: Stephen Toub <stoub@microsoft.com>
@Daniel-Svensson
Copy link
Contributor Author

Daniel-Svensson commented Jul 8, 2022

I think I fixed all the review comments for now so I think next up it to determine what to do with tests.

Is there sufficient test coverage to fully exercise all of these changes and root out any corner-case issues?

@stephentoub I am a bit unsure about that.

I did not find the tests for the Binary XmlDictionaryWriter so instead I did some manual tests to ensure double and long round tripped as well as ran OpenRiaServices tests (which includes tests roundtriping classes with most property types with the changes on the server

I can look into adding a new tests I found some tests for the text version of dictionary writer I could write some tests for the binary writer (write primitive types and some arrays ?).
I am thinking about serializing some data and then checking the binary representation against expected binary, or do you prefer round trip kind tests (where the same data is read by binary reader)?

update: I've added a couple of testcases so I think relevant scenarios are covered.

@Daniel-Svensson
Copy link
Contributor Author

Daniel-Svensson commented Jul 25, 2022

I've added a couple of tests that I think cover relevant cases.

I was a bit unsure how to handle any potential big endian platform since it is only handled for single integers so the array test as it is now will fail on big endian platforms. it can easily be made to pass but it will not make the binary xml the same for different endiannes (this pr does not make any changes to the representation apart from the fix for floats described in the description).

Daniel-Svensson and others added 2 commits August 16, 2022 12:10
…nOnly/System.Runtime.Serialization.Xml.ReflectionOnly.Tests.csproj
@Daniel-Svensson
Copy link
Contributor Author

Daniel-Svensson commented Aug 24, 2022

The test will give problem on big-endian architectures just as @uweigand  found out #73332 (comment)

I can take a look and add a commit to this PR (maybe even this week) to try to fix the implementation for big endian architectures if you think it is a good idea? @jkotas @mconnew

If you prefer this PR as is without any behavioral changes is there a recommended way of skipping the "problematic" test on big endian architectures?

@HongGit
Copy link
Contributor

HongGit commented Feb 2, 2023

@StephenMolloy and @mconnew can you please take a look?

@StephenMolloy
Copy link
Member

Test failure appears to be unrelated. #64227

Copy link
Member

@StephenMolloy StephenMolloy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@StephenMolloy StephenMolloy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@StephenMolloy StephenMolloy merged commit 390c2d5 into dotnet:main Apr 1, 2023
104 of 107 checks passed
WCF Owned Areas automation moved this from Open PRs to Completed Apr 1, 2023
@Daniel-Svensson Daniel-Svensson deleted the dev branch April 5, 2023 17:45
@dotnet dotnet locked as resolved and limited conversation to collaborators May 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Serialization community-contribution Indicates that the PR has been added by a community member
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants