-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port RegularExpressions benchmarks #84
Conversation
I have also removed the managed heap allocations from TestData of Perf_Regex. Every iteration through public static IEnumerable<object[]> Match_TestData()
{
yield return new object[] { "[abcd-[d]]+", "dddaabbccddd", RegexOptions.None };
} The difference: Before:
After:
|
} | ||
|
||
// A series of patterns (all valid and non pathological) and inputs (which they may or may not match) | ||
public static IEnumerable<(string, string, RegexOptions)> Match_TestData() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We copied that over from our Match unit test. Should we remove some permutations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today if this benchmark fails we don't know which test case was regressed.
In the perfect scenario, we would have a single benchmark per most commonly used regex expression to monitor its performance.
@ViktorHofer Do we have that data/knowledge somewhere to split it into many benchmarks?
Path.Combine( | ||
Path.GetDirectoryName(typeof(RegexRedux).Assembly.Location), | ||
"corefx", "System.Text.RegularExpressions", "content", | ||
"200_000.in")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this path be read from some sort of config file for this benchmark, so we do not need to rebuild to test against different content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, however I decided to port it as is, the source benchmark has the file name hardcoded https://github.com/dotnet/corefx/blob/master/src/System.Text.RegularExpressions/tests/Performance/Perf.RegexRedux.cs#L19
|
||
namespace System.Text.RegularExpressions.Tests | ||
{ | ||
[BenchmarkCategory(Categories.CoreFX)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Categories.CoreFX [](start = 23, length = 17)
Do we know if there is any duplication compared with the regex-redux benchmarks from Benchmarks Game in CoreClr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ViktorHofer you are the person that is similar with both benchmarks and the regexp code. Could you answer this question?
/// Performance tests for Regular Expressions | ||
/// </summary> | ||
[BenchmarkCategory(Categories.CoreFX)] | ||
public class Perf_Regex2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perf_Regex2 [](start = 17, length = 11)
Does changing the name here mean new benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change it the ID exported for BenchView integration purpose is going to change too ;(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perf_Regex2
was the original name then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked: no, it's my mistake. Will revert now. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorive reverted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
# Conflicts: # src/benchmarks/Benchmarks.csproj
Fixes #71 and #76
TestData:
To consume the TestData I added new NuGet feed and configured the project to copy the files:
However, I am not sure if I should not simply copy-paste the single file used by benchmarks. It's 2 MB file, it would simplify it.
@jorive @ViktorHofer any thoughts on this?