Port RegularExpressions benchmarks #84

Merged on Aug 6, 2018 (9 commits)


adamsitnik
Member

Fixes #71 and #76

TestData:

To consume the TestData, I added a new NuGet feed and configured the project to copy the files:

<add key="dotnet-core" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" />

  <ItemGroup>
    <Content Include="$(NuGetPackageRoot)\system.text.regularexpressions.testdata\1.0.2\content\regexredux\*.*">
      <Link>corefx\System.Text.RegularExpressions\content\%(Filename)%(Extension)</Link>
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
  </ItemGroup>

However, I wonder whether I should simply copy the single file used by the benchmarks into the repo instead. It is a 2 MB file, and that would simplify the setup.

@jorive @ViktorHofer any thoughts on this?

@adamsitnik
Member Author

I have also removed the managed heap allocations from the test data of Perf_Regex. Every item yielded by Match_TestData allocated a managed array (new object[]) and boxed a value (RegexOptions is an enum, so storing it in an object array boxes it).

public static IEnumerable<object[]> Match_TestData()
{
    yield return new object[] { "[abcd-[d]]+", "dddaabbccddd", RegexOptions.None };
}

The difference:

Before:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Match | 177.9 ms | 9.831 ms | 11.32 ms | 175.1 ms | 161.4 ms | 202.4 ms | 30000.0000 | 181.99 MB |

After:

| Method | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|
| Match | 158.7 ms | 2.299 ms | 2.038 ms | 158.5 ms | 155.8 ms | 163.2 ms | 29000.0000 | 179.63 MB |
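
For readers who want to see the two data-source shapes side by side, here is a minimal sketch. It assumes the tuple-returning signature visible in the diff; the class name, method names, and tuple element names are made up for illustration and are not taken from the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical class; only the two return types mirror the PR.
public static class MatchDataSketch
{
    // Before: every yielded item allocates an object[] and boxes the RegexOptions enum.
    public static IEnumerable<object[]> BoxedData()
    {
        yield return new object[] { "[abcd-[d]]+", "dddaabbccddd", RegexOptions.None };
    }

    // After: a value tuple is a struct, so yielding it needs no array and no boxing.
    public static IEnumerable<(string Pattern, string Input, RegexOptions Options)> TupleData()
    {
        yield return ("[abcd-[d]]+", "dddaabbccddd", RegexOptions.None);
    }

    public static void Main()
    {
        // Deconstruct each tuple directly; no casts from object are required.
        foreach ((string pattern, string input, RegexOptions options) in TupleData())
        {
            Console.WriteLine(new Regex(pattern, options).IsMatch(input));
        }
    }
}
```

The consumer of the tuple version also avoids the unboxing cast `(RegexOptions)item[2]` that the object[] version would need.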

}

// A series of patterns (all valid and non pathological) and inputs (which they may or may not match)
public static IEnumerable<(string, string, RegexOptions)> Match_TestData()
Member

We copied that over from our Match unit test. Should we remove some permutations?

Member Author

Today, if this benchmark regresses, we cannot tell which test case is responsible.

Ideally, we would have one benchmark for each of the most commonly used regular expressions, so that we could monitor each one's performance separately.

@ViktorHofer Do we have that data/knowledge somewhere to split it into many benchmarks?
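
One way the split into per-pattern benchmarks could look is sketched below, assuming BenchmarkDotNet's `[ArgumentsSource]` attribute is available in the version in use. The class name and the single test case are placeholders: the thread does not say which patterns are the most common ones.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical class name, not part of the PR.
public class Perf_Regex_PerPattern
{
    // Placeholder case list; a real one would hold the most commonly used patterns.
    public IEnumerable<object[]> Cases()
    {
        yield return new object[] { "[abcd-[d]]+", "dddaabbccddd", RegexOptions.None };
    }

    // Each argument set is measured and reported on its own,
    // so a regression points at one specific pattern.
    [Benchmark]
    [ArgumentsSource(nameof(Cases))]
    public bool Match(string pattern, string input, RegexOptions options)
        => new Regex(pattern, options).IsMatch(input);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<Perf_Regex_PerPattern>();
}
```

The object[] here is only created while the arguments are enumerated, outside the measured method, so it should not bring back the per-iteration allocations discussed earlier.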

Path.Combine(
Path.GetDirectoryName(typeof(RegexRedux).Assembly.Location),
"corefx", "System.Text.RegularExpressions", "content",
"200_000.in"));
Member

Should this path be read from some sort of config file for this benchmark, so we do not need to rebuild to test against different content?

Member Author

It could be. However, I decided to port it as is: the source benchmark has the file name hardcoded, see https://github.com/dotnet/corefx/blob/master/src/System.Text.RegularExpressions/tests/Performance/Perf.RegexRedux.cs#L19
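
If the path were made configurable later, one lightweight option would be an environment variable with the hardcoded file as the fallback. The sketch below is only an illustration: the variable name `REGEXREDUX_INPUT` and the helper are hypothetical and do not exist in the PR or in corefx.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the PR.
public static class InputPathSketch
{
    // Assumed variable name; nothing in the PR defines it.
    private const string OverrideVariable = "REGEXREDUX_INPUT";

    public static string ResolveInputPath(string baseDirectory)
    {
        // Prefer an explicit override so different content can be tested without a rebuild.
        string overridePath = Environment.GetEnvironmentVariable(OverrideVariable);
        if (!string.IsNullOrEmpty(overridePath))
        {
            return overridePath;
        }

        // Otherwise fall back to the file name that the ported benchmark hardcodes.
        return Path.Combine(
            baseDirectory,
            "corefx", "System.Text.RegularExpressions", "content",
            "200_000.in");
    }

    public static void Main()
    {
        Console.WriteLine(ResolveInputPath(AppContext.BaseDirectory));
    }
}
```

Keeping the hardcoded default means the benchmark behaves exactly as ported when the variable is not set.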


namespace System.Text.RegularExpressions.Tests
{
[BenchmarkCategory(Categories.CoreFX)]
Member

@jorive, Jul 9, 2018

Do we know whether this duplicates the regex-redux benchmark from the Benchmarks Game in CoreCLR?

Member Author

@ViktorHofer you are the person who is familiar with both benchmarks and the regex code. Could you answer this question?

/// Performance tests for Regular Expressions
/// </summary>
[BenchmarkCategory(Categories.CoreFX)]
public class Perf_Regex2
Member

Does changing the name here mean new benchmarks?

Member Author

If I change it, the ID exported for the BenchView integration will change too ;(

Member

Perf_Regex2 was the original name then?

Member Author

I double checked: no, it's my mistake. Will revert now. Good catch!

Member Author

@jorive reverted!

Member

@jorive left a comment

LGTM

# Conflicts:
#	src/benchmarks/Benchmarks.csproj