Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown in MagickImage.ToByteArray on arm64 Mac #1447

Closed
schnoberts1 opened this issue Oct 4, 2023 · 19 comments
Closed

Slowdown in MagickImage.ToByteArray on arm64 Mac #1447

schnoberts1 opened this issue Oct 4, 2023 · 19 comments
Milestone

Comments

@schnoberts1
Copy link

schnoberts1 commented Oct 4, 2023

Magick.NET version

Magick.NET-Q8-arm64 13.1.2

Environment (Operating system, version and so on)

arm64, Mac M1, .NET 7.0.11/7.0.401

Description

            using var image = new MagickImage(stream, new MagickReadSettings { Format = MagickFormat.Psd });
            image.ToByteArray();

takes minutes in 13.1.2 when it used to take seconds for a 3k/4k file of reasonable image complexity. All the additional time is in ToByteArray. In 13.1.2 there's an overflow exception and in 13.1.0 the speed is as I expect.

I scaled the image down so I can attach it (if I can work out how to do that) and the perforce is 25ms for 13.1.0 and 4123ms for 13.1.2. Note: the slow down is less extreme when there is less detail in the image.

Steps to Reproduce

Create a PSD (RGB/8 is fine), make it say 1000 x 1000 and ensure it has some detail in it. Time the code above. Increase the image size and complexity to make the timing difference larger. My Mac has 64GB of RAM and an Apple M1 Max processor.

@dlemstra
Copy link
Owner

dlemstra commented Oct 4, 2023

Have you tested this with the most recent version?

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 4, 2023 via email

@dlemstra
Copy link
Owner

dlemstra commented Oct 4, 2023

Do you get the "old" performance when you do this instead:

using var stream = new MemoryStream();
image.Write(stream);
var bytes = stream.ToArray();

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 4, 2023

Writing the image into a MemoryStream and then getting the array takes < 1ms on both new and old version. [CORRECTION: ~20-25ms]

@dlemstra
Copy link
Owner

dlemstra commented Oct 4, 2023

I really wonder what kind of measurements you are using. This is what was happening inside the old version of ToByteArray and that should not be less than 1ms when you are writing a PSD image. Can you create a small project with an image on GitHub that reproduces what you are seeing?

@schnoberts1
Copy link
Author

namespace Ezen.Uploader.Tests;

using ImageMagick;
using System.IO;
using System.Diagnostics;
public class Tests
{
    [SetUp]
    public void Setup()
    {
    }

    [Test]
    public void Test1()
    {
        FileStream stream = File.OpenRead("/Users/andy/code/ezen/ezen-uploader/JPN485.psd");
        using var image = new MagickImage(stream, new MagickReadSettings { Format = MagickFormat.Psd });

        var s = Stopwatch.StartNew();
        var ba = image.ToByteArray();        
        Console.WriteLine($"ToByteArray {s.ElapsedMilliseconds}ms {ba.Length} bytes");

        s = Stopwatch.StartNew();
        using var mstream = new MemoryStream();
        image.Write(mstream);
        var bytes = mstream.ToArray();
        Console.WriteLine($"ToArray {s.ElapsedMilliseconds}ms {bytes.Length} bytes");

        Assert.Pass();
    }
}

csproj snippet:

  <Choose>
    <When Condition="'$(Platform)' == 'x64'">
      <ItemGroup>
        <PackageReference Include="Magick.NET-Q8-x64" Version="13.3.0"/>
      </ItemGroup>
    </When>
  </Choose>
  <Choose>
    <When Condition="'$(Platform)' == 'arm64'">
      <ItemGroup>
        <PackageReference Include="Magick.NET-Q8-arm64" Version="13.3.0"/>
      </ItemGroup>
    </When>
  </Choose>

Run test with 13.3.0

(base) andy@ice ezen-uploader % dotnet test -c release Ezen.Uploader.Tests  -p:Platform=arm64 -l console
  Determining projects to restore...
  All projects are up-to-date for restore.
  Ezen.Uploader -> /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader/bin/arm64/release/net7.0/Ezen.Uploader.dll
  Ezen.Uploader.Tests -> /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader.Tests/bin/arm64/release/net7.0/Ezen.Uploader.Tests.dll
Test run for /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader.Tests/bin/arm64/release/net7.0/Ezen.Uploader.Tests.dll (.NETCoreApp,Version=v7.0)
Microsoft (R) Test Execution Command Line Tool Version 17.7.1 (arm64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
ToByteArray 4167ms 7842052 bytes
ToArray 22ms 7842052 bytes

Run test with 13.1.0

(base) andy@ice ezen-uploader % dotnet test -c release Ezen.Uploader.Tests  -p:Platform=arm64 -l console
  Determining projects to restore...
  Restored /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader.Tests/Ezen.Uploader.Tests.csproj (in 148 ms).
  Restored /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader/Ezen.Uploader.csproj (in 148 ms).
  Ezen.Uploader -> /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader/bin/arm64/release/net7.0/Ezen.Uploader.dll
  Ezen.Uploader.Tests -> /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader.Tests/bin/arm64/release/net7.0/Ezen.Uploader.Tests.dll
Test run for /Users/andy/code/ezen/ezen-uploader/Ezen.Uploader.Tests/bin/arm64/release/net7.0/Ezen.Uploader.Tests.dll (.NETCoreApp,Version=v7.0)
Microsoft (R) Test Execution Command Line Tool Version 17.7.1 (arm64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
ToByteArray 27ms 7842052 bytes
ToArray 23ms 7842052 bytes

The output of file on the DLLs is PE32+ executable (console) Aarch64 Mono/.Net assembly, for MS Windows

I'll pull this into a project if this isn't enough, let me know. I'll send you the image.

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 4, 2023

My mistake, not 1ms, 23ish. I guess I must have read 21 as '1' :(

@schnoberts1
Copy link
Author

... just so I can follow on at home:

So versions that perform differently are post a change where ToByteArray was changed to look a lot like the Stream writer:

   public byte[] ToByteArray()
    {
        _settings.FileName = null;

        using var wrapper = new ByteArrayWrapper();
        var writer = new ReadWriteStreamDelegate(wrapper.Write);
        var seeker = new SeekStreamDelegate(wrapper.Seek);
        var teller = new TellStreamDelegate(wrapper.Tell);
        var reader = new ReadWriteStreamDelegate(wrapper.Read);

        _nativeInstance.WriteStream(_settings, writer, seeker, teller, reader);

        return wrapper.GetBytes();
    }

... what you are alluding to is the memory stream is how ToByteArray used to look:

            using var stream = new MemoryStream();
            Write(stream);
            return stream.ToArray();

and Write looks a lot like what ToByteArray was changed to:

            Throw.IfNull(nameof(stream), stream);

            _settings.FileName = null;

            using var wrapper = StreamWrapper.CreateForWriting(stream);
            var writer = new ReadWriteStreamDelegate(wrapper.Write);
            ReadWriteStreamDelegate? reader = null;
            SeekStreamDelegate? seeker = null;
            TellStreamDelegate? teller = null;

            if (stream.CanSeek)
            {
                seeker = new SeekStreamDelegate(wrapper.Seek);
                teller = new TellStreamDelegate(wrapper.Tell);
            }

            if (stream.CanRead)
                reader = new ReadWriteStreamDelegate(wrapper.Read);

            _nativeInstance.WriteStream(_settings, writer, seeker, teller, reader);

I have to agree, these are very very similar things.

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 4, 2023

I ran both versions through a profiler. In my test code using version 13.3.0 it's spending a lot of time in Seek because Seek is spending its time calling System.Array.Resize 7269 times. I can't find any calls to Resize in the profile for 13.1.0.

ByyeArrayWrapper.Seek in 13.3.0 looks like:

    public long Seek(long offset, IntPtr whence, IntPtr user_data)
    {
        AppendBufferToBytes();

        var newOffset = (int)((SeekOrigin)whence switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _offset + offset,
            SeekOrigin.End => _bytes.Length - offset,
            _ => -1,
        });

        if (_offset == newOffset)
            return _offset;

        if (newOffset < 0)
            return -1;
        else if (newOffset > _bytes.Length)
            ResizeBytes(newOffset);

        _offset = newOffset;
        return _offset;
    }

... could the interaction of this with the calling function be the smoking gun. In 13.1.0 StreamWrapper is used and there Seek delegates to the Stream passed in, which presumably is doing something quite different in Seek than the ByteArrayWrapper as it’s a core .NET library and likely doesn’t increment in the way ByteArrayWrapper does.

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 5, 2023

... this also explains why the slow down is related to the complexity of the image (something I noticed yesterday). If the image is entirely white (one of my test cases that didn't exhibit as bad a slowdown) then perhaps it's uncompressed in a very small number of chunks as the information content is low, so the array size is grown to size in a small number of Resize calls.

I don't know the codebase very well at all, am I heading in the right direction here?

If I am right then I should be able to replicate this in a test in the Native repo. I'll give that a go in the next 3-4 weeks. If I replicate it then I'll think about a fix if you would like.

@dlemstra
Copy link
Owner

dlemstra commented Oct 5, 2023

Thanks for all the research. I was suspecting this was the issue but I was not sure about it. The ToByteArray method was changed to reduce memory allocations.

Would it be possible to share an image that I can use to reproduce the issue? I might be able to find a workaround for this myself then. Maybe this would work ResizeBytes(newOffset + BufferSize);?

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 6, 2023 via email

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 6, 2023

I would imagine so. It'd likely reduce the number of Resize calls which would also have the handy feature of reducing the chances of heap fragmentation requiring the whole array is moved in memory as part of resizing so it can maintain a contiguous chunk of memory (assuming that's how C# arrays work, I've only been coding in C# for less than a month part time so I know very little.

I would think you don't need to allocate memory on a Seek as no Read or Write is happening. You could consider Seek() a declaration of intent. If you took that approach then you only need the array resized when something happens, e.g. when reads or writes overlap a region that was intended to be resized via a previous seek and has yet to be resized. Write already handles something similar to this case but it'd need adjusting and you'd need to add Resize support to Read to handle when Read is reading off the end of the allocated area into the "to be allocated" area. It also means you can Seek around as you wish without incurring cost until you do something. That being said, I don't know what the call pattern of the Readers/Writers is in this library.

Re: reproduction, a simple test that just seeks forward 1 byte at a time is going to produce this problem on arm64 (I guess). I'd implement this test myself but with my inexperience I can't get the stuff building on osx-arm64. dotnet build -r osx-arm64 Magick.NET.sln is just generating some complaint that NETSDK1100: To build a project targeting Windows on this operating system, set the EnableWindowsTargeting property to true. which is broadly equivalent to "pilot does not know what he is doing" :)

@schnoberts1
Copy link
Author

I've just checked the C# Memory Stream implementation and it takes the approach I describe above, which is why that performs in a more consistent fashion. Look at Seek() in https://github.com/microsoft/referencesource/blob/master/mscorlib/system/io/memorystream.cs.

@dlemstra
Copy link
Owner

Could you please share a test file I could use for this? I would like to investigate this but having no test file is blocking me.

@schnoberts1
Copy link
Author

schnoberts1 commented Oct 13, 2023

...sent to the email address on your github profile

@dlemstra
Copy link
Owner

Thanks for the file and I can see what is happening. I really think I should simplify the implementation of the ByteArrayWrapper.

@dlemstra
Copy link
Owner

I made some changes to the implementation and this should fix the performance again in the next release.

@dlemstra dlemstra added this to the 13.4.0 milestone Oct 16, 2023
@schnoberts1
Copy link
Author

schnoberts1 commented Oct 16, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants