High differences between runs for GC Diagnoser #133

Closed
xoofx opened this Issue Apr 1, 2016 · 21 comments

@xoofx
Member

xoofx commented Apr 1, 2016

Trying to play with the GCDiagnoser, I'm getting weird results on consecutive runs: the Gen0/Gen1/Gen2/Bytes Allocated/Op figures vary a lot (30-40%), which makes them quite useless for comparison... I would not expect the figures to change that much (AFAIK, there is no such nondeterministic GC behaviour).

Have you noticed this issue, or am I doing something wrong?

Member

adamsitnik commented Apr 1, 2016

I have been running our integration tests plenty of times; they contain a simple test for the GC Diagnoser, and the results were always the same.

The .NET GC is self-tuning (changing generation sizes, etc.), but we always start a new process, so the results should be almost the same as long as the OS has similar resources available.

Could you give a sample benchmark with the config that you use?

Contributor

mattwarren commented Apr 1, 2016

Can you post the logs from two runs with different numbers of GC collections? It'll be easier to see what's going on.



Member

xoofx commented Apr 1, 2016

The output I'm getting while running multiple times is like this:

               Method |    Median |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------------- |---------- |---------- |------- |------ |------ |------------------- |
    TestCommonMarkNet | 4.8020 ms | 0.0326 ms | 208.54 | 14.34 | 89.28 |       1,117,194.48 |
 TestCommonMarkNetNew | 5.7072 ms | 0.0541 ms | 169.87 | 84.31 | 72.44 |       1,124,892.65 |
          TestMarkdig | 6.4552 ms | 0.0454 ms | 115.00 | 75.00 | 55.00 |         851,150.77 |
     TestMarkdownDeep | 8.4201 ms | 0.5206 ms | 176.43 | 76.75 | 56.46 |       1,039,481.68 |

               Method |    Median |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------------- |---------- |---------- |------- |------ |------ |------------------- |
    TestCommonMarkNet | 4.7802 ms | 0.0365 ms | 117.28 |  8.10 | 50.15 |         738,831.91 |
 TestCommonMarkNetNew | 5.7256 ms | 0.0317 ms | 176.00 | 87.42 | 75.26 |       1,371,935.85 |
          TestMarkdig | 6.2702 ms | 0.0673 ms |  91.59 | 59.74 | 43.81 |         797,976.80 |
     TestMarkdownDeep | 7.9212 ms | 0.0405 ms | 164.00 | 71.00 | 52.00 |       1,134,079.22 |


               Method |    Median |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------------- |---------- |---------- |------- |------ |------ |------------------- |
    TestCommonMarkNet | 4.8237 ms | 0.0504 ms | 152.82 | 10.51 | 65.42 |       1,117,194.48 |
 TestCommonMarkNetNew | 5.6900 ms | 0.0490 ms | 138.97 | 68.97 | 59.27 |       1,255,915.01 |
          TestMarkdig | 6.3441 ms | 0.0963 ms |  70.75 | 46.02 | 33.66 |         713,802.94 |
     TestMarkdownDeep | 7.8991 ms | 0.0547 ms | 164.00 | 71.00 | 52.00 |       1,314,925.80 |    


               Method |    Median |    StdDev |  Gen 0 |  Gen 1 |  Gen 2 | Bytes Allocated/Op |
--------------------- |---------- |---------- |------- |------- |------- |------------------- |
    TestCommonMarkNet | 4.7855 ms | 0.0920 ms | 194.88 |  13.53 |  83.23 |         963,242.75 |
 TestCommonMarkNetNew | 5.7061 ms | 0.0456 ms | 249.01 | 123.83 | 106.91 |       1,522,547.54 |
          TestMarkdig | 6.3259 ms | 0.1434 ms | 115.00 |  75.00 |  55.00 |         785,682.84 |
     TestMarkdownDeep | 7.9409 ms | 0.0626 ms | 173.00 |  75.00 |  55.00 |         938,949.44 |

               Method |    Median |    StdDev |  Gen 0 |  Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------------- |---------- |---------- |------- |------- |------ |------------------- |
    TestCommonMarkNet | 4.7873 ms | 0.0881 ms | 188.43 |  13.16 | 80.36 |         992,620.58 |
 TestCommonMarkNetNew | 5.7098 ms | 0.0235 ms | 212.03 | 105.35 | 90.77 |       1,381,368.58 |
          TestMarkdig | 6.4275 ms | 0.0694 ms | 113.54 |  74.13 | 54.42 |         827,121.71 |
     TestMarkdownDeep | 7.9162 ms | 0.0472 ms | 173.00 |  75.00 | 55.00 |       1,000,627.30 |

As you can see, the figures sometimes jump around for Gen 0/Gen 1/Gen 2 (and it's worse for Bytes Allocated/Op), even changing the ranking between runs.

When I run, for example, a simple benchmark by hand (without BenchmarkDotNet), outputting GC.CollectionCount(0)/GC.CollectionCount(1)/GC.CollectionCount(2) and testing one solution (like TestCommonMarkNet), the GC counts are fixed and do not change between runs.
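The manual cross-check described above is a snapshot-and-diff of the collection counters. Below is a Python analog of that pattern (gc.get_stats() standing in for GC.CollectionCount, and churn a hypothetical placeholder workload, not code from this thread):

```python
import gc

def collection_counts():
    # Cumulative collections per generation -- the Python analog
    # of GC.CollectionCount(0..2) in .NET.
    return [s["collections"] for s in gc.get_stats()]

def measure(workload):
    """Snapshot the counters, run the workload once, and return the
    per-generation collection deltas."""
    before = collection_counts()
    workload()
    after = collection_counts()
    return [a - b for a, b in zip(after, before)]

def churn():
    # Hypothetical placeholder workload that keeps allocations alive
    # long enough to trigger generation-0 collections.
    keep = []
    for _ in range(50_000):
        keep.append([object()])

deltas = measure(churn)
print(deltas)  # three non-negative deltas; comparable across repeated runs
```

Run standalone like this, the deltas for a fixed workload are reproducible, which is exactly the behaviour the in-harness numbers above fail to show.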

Member

xoofx commented Apr 2, 2016

Looking at the code, one thing I'm not sure about: I would expect the GCDiagnoser to only process GC events for the IterationMode.MainTarget period. There are other periods that seem to be nondeterministic (like IdleTarget; I'm not sure what it does, but it looks like it adapts to the process), and we don't want them included in the measurement. Also, I'm not sure it is good to have a variable operation count for different methods when measuring GC events...

Member

xoofx commented Apr 2, 2016

OK, so after tweaking my usage of BenchmarkDotNet and switching to Mode.SingleRun, which disables the pilot phase, the numbers are completely stable (and much more comparable between the methods).

Though I'm not sure I used the API correctly: I had to patch the MethodInvoker.Invoke method to hardcode invokeCount = 64, because when I tried to set the attribute on each method ([Benchmark(OperationsPerInvoke = 64)]), the methods were not actually executed and gave completely invalid results... so it's difficult to understand what's going on.

I used the following config:

var config = ManualConfig.Create(DefaultConfig.Instance);
var gcDiagnoser = new GCDiagnoser();
config.Add(new Job { Mode = Mode.SingleRun, LaunchCount = 2, WarmupCount = 2, TargetCount = 10 });
config.Add(gcDiagnoser);
Contributor

mattwarren commented Apr 2, 2016

As you can see, the figures sometimes jump around for Gen 0/Gen 1/Gen 2 (and it's worse for Bytes Allocated/Op), even changing the ranking between runs

The Gen 0/1/2 values are scaled/normalised based on the min value across the benchmarks (see https://github.com/PerfDotNet/BenchmarkDotNet/blob/master/BenchmarkDotNet.Diagnostics/GCDiagnoser.cs#L174), so it's not surprising that they vary as the number of runs/ops changes.

However, the bytes figure is per op, so it should be more stable.
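A toy illustration of why min-based rescaling moves with the op counts (made-up numbers; this sketches the idea, not the actual GCDiagnoser code): the per-op collection rates below are identical in both runs, yet the rescaled figures differ because the adaptive op counts differ.

```python
def scale_collections(results):
    # Rescale each benchmark's raw collection count to the smallest
    # operation count in the run (illustrative sketch only).
    min_ops = min(ops for _count, ops in results.values())
    return {name: count / ops * min_ops
            for name, (count, ops) in results.items()}

# (collections, operations) per benchmark. The per-op rates are the
# same in both runs (Fast: 0.5, Slow: 0.75); only the adaptive op
# counts differ, yet the rescaled figures move.
run_a = {"Fast": (1000, 2000), "Slow": (375, 500)}
run_b = {"Fast": (1500, 3000), "Slow": (450, 600)}

print(scale_collections(run_a))  # {'Fast': 250.0, 'Slow': 375.0}
print(scale_collections(run_b))  # {'Fast': 300.0, 'Slow': 450.0}
```

So a 20% shift in the reported Gen figures can come entirely from the scaling denominator, not from the workload.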

Contributor

mattwarren commented Apr 2, 2016

Yeah, 'OperationsPerInvoke' isn't going to help; that's for something else. However, you should be able to use 'IterationCount' on the Job, see
https://github.com/PerfDotNet/BenchmarkDotNet/blob/master/BenchmarkDotNet/Jobs/Job.cs#L28

Looking at the code, one thing I'm not sure about: I would expect the GCDiagnoser to only process GC events for the IterationMode.MainTarget period. There are other periods that seem to be nondeterministic (like IdleTarget; I'm not sure what it does, but it looks like it adapts to the process), and we don't want them included in the measurement. Also, I'm not sure it is good to have a variable operation count for different methods when measuring GC events...

Good point, I need to have a think about how to best handle this.

Member

xoofx commented Apr 2, 2016

Setting IterationTime doesn't work for me:

var config = ManualConfig.Create(DefaultConfig.Instance);
var gcDiagnoser = new GCDiagnoser();
config.Add(new Job { Mode = Mode.SingleRun, LaunchCount = 2, WarmupCount = 2, IterationTime = 64, TargetCount = 10 });
config.Add(gcDiagnoser);

It results in only 1 op:

MainTarget 1: 1 op, 4785568.28 ns, 4.7856 ms/op
MainTarget 2: 1 op, 4660543.99 ns, 4.6605 ms/op

Member

xoofx commented Apr 2, 2016

Looking at MethodInvoker.cs:92, IterationTime is only used in Throughput mode, not SingleRun.


Member

xoofx commented Apr 2, 2016

Also, on a side note, I had to compile BenchmarkDotNet.Diagnostics myself as it is not published to NuGet. Is that an oversight?


Member

xoofx commented Apr 4, 2016

Hm, OK, maybe Diagnostics cannot be released because xproj+project.json doesn't yet support embedding native files... argh... It seems we will have to wait for dotnet/cli :(


Contributor

mattwarren commented Apr 5, 2016

Hm, OK, maybe Diagnostics cannot be released because xproj+project.json doesn't yet support embedding native files... argh... It seems we will have to wait for dotnet/cli :(

Hmm, I didn't realise that. @adamsitnik or @AndreyAkinshin, does this block us from releasing a BenchmarkDotNet.Diagnostics NuGet package?


Contributor

mattwarren commented Apr 5, 2016

@xoofx are you happy with the workaround you have, i.e. using a custom Config?

Or would you want BenchmarkDotNet to be more stable/consistent across runs?


Member

xoofx commented Apr 5, 2016

Hmm, I didn't realise that. @adamsitnik or @AndreyAkinshin, does this block us from releasing a BenchmarkDotNet.Diagnostics NuGet package?

Actually, I was able to ship native DLLs via packInclude in project.json, along with an MSBuild .targets file for net20-net4x+ platforms, so that is finally possible at least.

Or would you want BenchmarkDotNet to be more stable/consistent across runs?

Hm, actually, no: as I said above, IterationTime is not working, so I still had to patch BenchmarkDotNet.
When using the GCDiagnoser, it is only useful to run in a stable mode; otherwise the figures are completely useless.
I'm also wondering whether it is really good to have this adaptive iteration count computed for each method. I would expect the first run to return its adaptive result and the other measured methods to use the exact same iteration count. But I understand that in scenarios where figures change a lot between methods this would not be ideal (if a very fast method sets the baseline count, a much slower method would take much longer to run)... so yeah, I'm not sure how to deal with this.


Contributor

mattwarren commented Apr 5, 2016

Hm, actually, no: as I said above, IterationTime is not working, so I still had to patch BenchmarkDotNet.

IterationTime is in milliseconds, so you might need to increase it (depending on how long your benchmark lasts).


Member

xoofx commented Apr 5, 2016

Ah, OK 😅, but how does that translate to a fixed count then? I'm not sure I understand the meaning of this parameter...


Member

xoofx commented May 25, 2016

Back on the subject: is there any way to run the same loop count for all benchmarks (I guess SingleRun mode, with no setup/pilot phase) to get a predictable test with the MemoryDiagnoser?

I tried

var config = ManualConfig.Create(DefaultConfig.Instance);
var gcDiagnoser = new MemoryDiagnoser();
config.Add(new Job { Mode = Mode.SingleRun, LaunchCount = 2, WarmupCount = 2, IterationTime = 1024, TargetCount = 10 });
config.Add(gcDiagnoser);

But it doesn't work...

It's very annoying; so far, I can't get reliable GC results. If you have a workaround that doesn't require modifying the code, that would help, thanks!


Member

xoofx commented May 25, 2016

For reference, the patch I used to get stable results


Contributor

mattwarren commented May 31, 2016

@AndreyAkinshin, what are your thoughts on this? What's a safe way to make the number of ops more consistent across runs? I don't want to break the core timings just to make diagnostics work!

As a test I added the following line of code:

logger.WriteLineError($"// Run, Diagnostic - {allRuns.Sum(r => r.Operations):N0} Ops in total, across {allRuns.Count:N0} Runs");

And across 3 runs I got the following:

// Run, Diagnostic - 2,949,116 Ops in total, across 58 Runs
         Method | ArraySize |    Median |    StdDev |    Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------- |---------- |---------- |---------- |--------- |------ |------ |------------------- |
      GetSquare |      5000 | 3.5959 us | 0.0267 us | 7,085.00 |     - |     - |           8,872.17 |
 GetSquareStack |      5000 | 3.5182 us | 0.0235 us |        - |     - |     - |               0.44 |

// Run, Diagnostic - 2,818,044 Ops in total, across 56 Runs
         Method | ArraySize |    Median |    StdDev |    Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------- |---------- |---------- |---------- |--------- |------ |------ |------------------- |
      GetSquare |      5000 | 3.5873 us | 0.0274 us | 7,567.04 |     - |     - |           9,254.91 |
 GetSquareStack |      5000 | 3.5191 us | 0.0253 us |        - |     - |     - |               0.46 |

// Run, Diagnostic - 3,538,940 Ops in total, across 67 Runs
         Method | ArraySize |    Median |    StdDev |    Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
--------------- |---------- |---------- |---------- |--------- |------ |------ |------------------- |
      GetSquare |      5000 | 4.0673 us | 0.3156 us | 9,702.31 |     - |     - |           9,448.04 |
 GetSquareStack |      5000 | 3.4928 us | 0.0117 us |        - |     - |     - |               0.39 |

This is the problem: the total number of ops/runs changes, which in turn means that Bytes Allocated/Op also varies.

Currently @xoofx uses the config below, plus this patch

var config = ManualConfig.Create(DefaultConfig.Instance);
var gcDiagnoser = new MemoryDiagnoser();
config.Add(new Job { Mode = Mode.SingleRun, LaunchCount = 2, WarmupCount = 2, TargetCount = 10 });
config.Add(gcDiagnoser);

What's the best way to get this behaviour into BenchmarkDotNet in a reliable way? What about explicitly setting long invokeCount = 64, but only for diagnostic runs? We already discard the timings from those runs anyway.
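For intuition on why a varying op count moves a per-op figure at all: suppose each run carries some fixed allocation overhead (pilot/warmup/infrastructure; an assumed model for illustration, not something measured in this thread) on top of the true per-op allocations. Then the normalised figure drifts with the op count and only becomes reproducible once the count is pinned:

```python
def bytes_per_op(fixed_overhead, per_op, ops):
    # Total allocated bytes for the run (fixed overhead plus per-op
    # allocations), normalised by the number of operations.
    return (fixed_overhead + per_op * ops) / ops

# Same hypothetical workload, different adaptive op counts:
# the fixed overhead is amortised differently each time.
assert bytes_per_op(100, 10, 10) == 20.0
assert bytes_per_op(100, 10, 50) == 12.0

# Pinned op count (e.g. a hardcoded invokeCount = 64):
# the figure is identical on every run.
assert bytes_per_op(100, 10, 64) == bytes_per_op(100, 10, 64)
```

Under this model, a fixed invokeCount for diagnostic runs would make Bytes Allocated/Op reproducible, which matches what the patch above observed.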


Contributor

redknightlois commented May 31, 2016

What you can do there is normalise everything. The Bytes Allocated/Op should be similar across runs (there's no reason for it to change), so normalisation can give you a single value that is the average of all of them.

Contributor

mattwarren commented Jun 1, 2016

I'm not sure that just normalising them will help here because, unless I'm mistaken about what you're proposing, we already do that?

In MemoryDiagnoser.cs we have this code:

return (result.AllocatedBytes / (double)result.TotalOperations).ToString("N2");