Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce buffered crc32 algorithm to calculate a checksum #436

Closed
wants to merge 1 commit into from
Closed

introduce buffered crc32 algorithm to calculate a checksum #436

wants to merge 1 commit into from

Conversation

novak-as
Copy link

Fixes #255, based on this commit

Few more thoughts/questions:

  • I'm not sure if TestBufferedChecksum should be renamed. I can assume (because of the existence of the LuceneNetSpecificAttribute) that it could break some inner stuff.
  • CRC32 is left as proof of the same test results. Probably it could be dropped and replaced with implementation from crc32.net
  • renaming of BufferedChecksumIndexInput is a breaking change, so I wasn't sure if it should be introduced in the scope of this PR

add missed LuceneNetSpecific attribute
@NightOwl888
Copy link
Contributor

Thanks for the PR, and for the great work.

You managed to get the update functionality working with .NET's HashAlgorithm class, which is great. Unfortunately, through no fault of yours, the HashAlgorithm class creates a significant number of allocations compared to the current implementation based on IChecksum even though it outperforms in raw speed. The allocations would certainly more than negate any improvement due to the additional amount of garbage collection.

    [MemoryDiagnoser]
    public class BenchmarkCrc32
    {
        private const int RandomSeed = 1234;
        private const int Iterations = 100000;

        [Benchmark]
        public void UnbufferedCRC32_RunSequence()
        {
            var Random = new Random(RandomSeed);
            var data = new byte[Random.Next(1024)];
            Random.NextBytes(data);

            var c1 = new CRC32();

            for (int i = 0; i < Iterations; i++)
            {

                c1.Update(data);

                c1.Reset();

                Random.NextBytes(data);
                c1.Update(data);

                var _ = c1.Value;
            }
        }

        [Benchmark]
        public void BufferedCRC32_RunSequence()
        {
            var Random = new Random(RandomSeed);
            var data = new byte[Random.Next(1024)];
            Random.NextBytes(data);

            var c1 = new BufferedChecksum(new CRC32());

            for (int i = 0; i < Iterations; i++)
            {

                c1.Update(data);

                c1.Reset();

                Random.NextBytes(data);
                c1.Update(data);

                var _ = c1.Value;
            }
        }

        [Benchmark]
        public void BufferedCrc32Algorithm_RunSequence()
        {
            var Random = new Random(RandomSeed);
            var data = new byte[Random.Next(1024)];
            Random.NextBytes(data);

            var c2 = new BufferedCrc32Algorithm();

            for (int i = 0; i < Iterations; i++)
            {
                c2.TransformFinalBlock(data, 0, data.Length);

                c2.Initialize();

                Random.NextBytes(data);
                c2.Update(data);

                var _ = c2.Value;
            }
        }
    }
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.104
  [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
  DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
UnbufferedCRC32_RunSequence 539.0 ms 10.54 ms 16.42 ms - - - 736 B
BufferedCRC32_RunSequence 531.9 ms 10.57 ms 21.84 ms - - - 1056 B
BufferedCrc32Algorithm_RunSequence 429.8 ms 8.50 ms 15.54 ms 11000.0000 - - 52800800 B

I attempted to modify the Crc32Algorithm class to reduce allocations, but most of them appear to be coming from the HashAlgorithm base class.

So by exploring this, it is now clear that:

  1. We cannot make a BufferedHashAlgorithm class that works with any HashAlgorthm because it doesn't expose APIs to update the algorithm.
  2. The raw speed of BuffferedCrc32Algorithm is better than a buffered CRC32, but the large number of allocations by the former would cause performance degradation.

So, while this seemed to be sensible when writing up #255, our understanding of the issues with doing this have changed as we are neither gaining in API usability or in performance. In short, there is no benefit to replacing the IChecksum interface with HashAlgorthm and it is best to leave it marked internal.

Thanks for putting this together so we could explore this option. You definitely deserve all of the credit for closing #255 and we'd welcome your further participation in the project.

asfgit pushed a commit that referenced this pull request Mar 11, 2021
… are not making this class public as it was in Lucene. See #436.
@novak-as novak-as deleted the impr-crc32 branch March 12, 2021 17:06
@novak-as novak-as restored the impr-crc32 branch March 12, 2021 17:06
@novak-as novak-as deleted the impr-crc32 branch March 15, 2021 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Factor out ICheckSum and replace with System.Security.Cryptography.HashAlgorithm
2 participants