Optimize SlugService #10923

Merged: hishamco merged 6 commits into main from hishamco/slug-service-benchmark on Jan 6, 2022

Conversation

hishamco (Member) commented Dec 24, 2021

Addresses #10922

hishamco (Member, Author) commented Dec 24, 2021

The benchmark results show a slight improvement after the new modification:

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|
| NewSlugify | 1.409 us | 0.0174 us | 0.0163 us | 0.97 | 0.02 | 0.2174 | 456 B |
| OldSlugify | 1.453 us | 0.0202 us | 0.0256 us | 1.00 | 0.00 | 0.2327 | 488 B |

deanmarcussen (Member) left a comment

Looks like you removed features, which seems odd?

lahma (Contributor) commented Jan 1, 2022

OK, I went through it, and the problem was that the ZString builder was created without using, so it was leaking on every call, hence the bad perf. Here are my numbers (I changed the benchmark to have private static readonly SlugService _slugService; instead of ISlugService because I don't want to measure interface dispatch):

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.101
  [Host]     : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
  DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT

Original (main branch)

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| EvaluateSlugify | 407.6 ns | 1.76 ns | 1.37 ns | 0.0291 | 488 B |

StringBuilder (this PR)

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| EvaluateSlugify | 410.3 ns | 0.97 ns | 0.81 ns | 0.0272 | 456 B |

ZString (only StringBuilder changed to using ZString)

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| EvaluateSlugify | 419.6 ns | 2.07 ns | 1.84 ns | 0.0148 | 248 B |

ZString v2

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| EvaluateSlugify | 353.6 ns | 1.47 ns | 1.30 ns | 0.0100 | 168 B |
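
On the missing using mentioned above: a minimal sketch of why disposal matters for a pooled builder. It uses ArrayPool<char> from the BCL as a stand-in and is not ZString's actual implementation; PooledCharBuffer and PooledBufferDemo are hypothetical names made up for illustration. The assumption is that a pooled builder rents its buffer and only hands it back in Dispose, so skipping using means the buffer is never reused.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for a pooled string builder (not ZString's real code).
public sealed class PooledCharBuffer : IDisposable
{
    private char[] _buffer;
    private int _length;
    private bool _disposed;

    public PooledCharBuffer(int capacity)
    {
        // Rent from the shared pool instead of allocating a fresh array.
        _buffer = ArrayPool<char>.Shared.Rent(Math.Max(capacity, 16));
    }

    public void Append(char c)
    {
        if (_length == _buffer.Length)
        {
            // Grow: rent a larger array, copy, and return the old one to the pool.
            var larger = ArrayPool<char>.Shared.Rent(_buffer.Length * 2);
            _buffer.AsSpan(0, _length).CopyTo(larger);
            ArrayPool<char>.Shared.Return(_buffer);
            _buffer = larger;
        }

        _buffer[_length++] = c;
    }

    public override string ToString() => new string(_buffer, 0, _length);

    public void Dispose()
    {
        // If Dispose is never called, the rented array never goes back to the pool.
        if (!_disposed)
        {
            ArrayPool<char>.Shared.Return(_buffer);
            _disposed = true;
        }
    }
}

public static class PooledBufferDemo
{
    public static string Build(string text)
    {
        // The using declaration guarantees Dispose runs when the method exits.
        using var buffer = new PooledCharBuffer(16);
        foreach (var c in text)
        {
            buffer.Append(c);
        }

        return buffer.ToString();
    }
}
```

Calling PooledBufferDemo.Build("some text") copies the input through the rented buffer and returns it to the pool on exit; dropping the using keyword would still produce the right string but would rent a new array on every call.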

Code for ZString v2

This was my experiment to show how to optimize the implementation. I might have introduced a bug with the ToLowerInvariant change, but it should still be faster even without it. The output stayed the same for the given example, though. I don't know if this PR has the correct implementation either, just testing 😉

I've commented the code to show what's changed. An easy general rule (when implicit usings aren't being used): if there's using System.Linq; at the top of performance-critical code, try to find a way to get rid of it.

using System;
using System.Globalization;
using System.Text;
using Cysharp.Text;

public class SlugService : ISlugService
{
    private const char Hyphen = '-';
    private const int MaxLength = 1000;

    public string Slugify(string text)
    {
        if (String.IsNullOrEmpty(text))
        {
            return text;
        }

        // one thing to consider here, if text < 20 or some other threshold should the code do pre-check
        // with foreach if all the chars are valid already (letter, digit or hyphen) and skip whole process?
        // this is why it's important to study common input and optimize usual cases

        // You need the using declaration here so the builder is disposed; otherwise the pooling doesn't work
        using var slug = ZString.CreateStringBuilder();
        var appendHyphen = false;

        // removed ToLowerInvariant from here, I'm unsure if I break something here
        var normalizedText = text.Normalize(NormalizationForm.FormKD);

        for (var i = 0; i < normalizedText.Length; i++)
        {
            // do ToLowerInvariant for each char so I don't need to create a new string above
            var currentChar = Char.ToLowerInvariant(normalizedText[i]);

            if (CharUnicodeInfo.GetUnicodeCategory(currentChar) == UnicodeCategory.NonSpacingMark)
            {
                continue;
            }

            if (Char.IsLetterOrDigit(currentChar))
            {
                slug.Append(currentChar);
                appendHyphen = true;
            }
            // split the Contains to two different branches, you shouldn't use LINQ method Contains for an
            // array, it's really bad for performance
            else if (currentChar is Hyphen)
            {
                if (appendHyphen && i != normalizedText.Length - 1)
                {
                    slug.Append(currentChar);
                    appendHyphen = false;
                }
            }
            // fast char equality
            else if (currentChar is '_' or '~')
            {
                slug.Append(currentChar);
            }
            else
            {
                if (appendHyphen)
                {
                    slug.Append(Hyphen);
                    appendHyphen = false;
                }
            }
        }

        // old code was doing a ToString() and then a Substring (two string allocations)
        // we can get a span from the builder and construct new string based on that
        return new string(slug.AsSpan()[..Math.Min(slug.Length, MaxLength)]).Normalize(NormalizationForm.FormC);
    }
}
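
The pre-check idea from the first comment in Slugify could look roughly like the sketch below. SlugFastPath, IsAlreadySlug and the threshold of 20 are hypothetical (the comment only says "20 or some other threshold"), and the check is deliberately conservative: it accepts only lowercase ASCII letters, digits, and single hyphens between them, because for that subset normalization, lowercasing, and the hyphen rules above should leave the text unchanged. Whether this pays off depends on what typical input looks like, so it would need benchmarking.

```csharp
using System;

public static class SlugFastPath
{
    // Assumed threshold; the original comment leaves the exact value open.
    private const int ShortInputThreshold = 20;

    // Returns true when a short input already has slug form, so the caller
    // could return it as-is and skip normalization and the builder entirely.
    public static bool IsAlreadySlug(string text)
    {
        if (string.IsNullOrEmpty(text) || text.Length > ShortInputThreshold)
        {
            return false;
        }

        // Starts as true so that a leading hyphen is rejected.
        var previousWasHyphen = true;

        foreach (var c in text)
        {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                previousWasHyphen = false;
            }
            else if (c == '-' && !previousWasHyphen)
            {
                previousWasHyphen = true;
            }
            else
            {
                // Uppercase, non-ASCII, doubled hyphens, or any other char:
                // fall back to the full Slugify path.
                return false;
            }
        }

        // A trailing hyphen would be dropped by Slugify, so reject it here.
        return !previousWasHyphen;
    }
}
```

A caller would check if (SlugFastPath.IsAlreadySlug(text)) return text; right after the null-or-empty guard. Characters such as '_' and '~' are left to the slow path here only to keep the sketch simple.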

hishamco (Member, Author) commented Jan 1, 2022

Thanks a lot @lahma. I need to test your suggested changes with a big string. Could you please let me know the CLI command you use, so that I don't get different results?

lahma (Contributor) commented Jan 1, 2022

I just run dotnet run -c Release --framework net6.0 in the benchmark project folder, with the virus scanner disabled and minimal apps running.

hishamco (Member, Author) commented Jan 1, 2022

dotnet run -c Release --framework net6.0

I thought you used --job short; anyhow, I will make your changes and benchmark again.

hishamco (Member, Author) commented Jan 1, 2022

How is else if (currentChar is '_' or '~') working for you? Did you change the language version?
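
For context on that line: the or in currentChar is '_' or '~' is a pattern combinator that was introduced in C# 9, so it needs a project compiled with that language version or later. A minimal sketch of the two equivalent forms follows; the class and method names are hypothetical and only there to make the comparison compile.

```csharp
using System;

public static class UnderscoreOrTilde
{
    // Pattern-combinator form, as used in the snippet above (requires C# 9 or later).
    public static bool WithPattern(char c) => c is '_' or '~';

    // Equivalent form with plain operators; compiles on older language versions too.
    public static bool WithOperators(char c) => c == '_' || c == '~';
}
```

Both methods return the same result for every char, so swapping one for the other should not change Slugify's output.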

hishamco (Member, Author) commented Jan 1, 2022

I got the results below using a regular || condition:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| EvaluateSlugifyWithShortSlug | 1.286 us | 0.0255 us | 0.0389 us | 0.0801 | 168 B |
| EvaluateSlugifyWithLongSlug | 21.137 us | 0.3821 us | 0.5356 us | 1.1902 | 2,528 B |

@hishamco hishamco merged commit 15bde53 into main Jan 6, 2022
@hishamco hishamco deleted the hishamco/slug-service-benchmark branch January 6, 2022 20:04
4 participants