Remove delegate based debugging assert #373

theolivenbaum · 2020-10-17T17:45:34Z

No description provided.

… FieldCacheImpl

eladmarg · 2020-10-17T17:50:32Z

looks good, this will save allocations.

NightOwl888 · 2020-10-17T18:07:17Z

This looks like a pretty good way to accomplish the best performance both in production and in tests.

…ndition before checking for AssertsEnabled

theolivenbaum · 2020-10-19T13:45:53Z

@eladmarg @NightOwl888 think I'm done with the changes, for reviewing, worth checking just the final diff against master as I did some cleanup to make the final diff smaller.
There should be no logic changes now, and this should work also in the case where generating the message would throw if the assert is false.

…ed-debugging-assert

…gate-based-debugging-assert" This reverts commit 187a3be, reversing changes made to 06eb432.

theolivenbaum · 2020-10-19T19:40:27Z

This closes #346 as well...

jeme · 2020-10-21T12:29:12Z

I am trying to understand the design choices here, it seems somewhat messy (No offense).

You have implemented a "ShouldAssert", which internally checks the flag AssertsEnabled and then the boolean passed, yet in many places we see:

if(Debugging.AssertsEnabled && Debugging.ShouldAssert(*** Some Check ***)){}

Which means we run the check against Debugging.AssertsEnabled twice... (while its cheap, it's still redundant)
Was the intention not to have the Debugging.AssertsEnabled checked outside?...
If not, then I fail to see the reasoning behind the ShouldAssert method alltogether and I think it would be cleaner to:

if(Debugging.AssertsEnabled && *** Some Check ***) Debugging.Throw("message")

Also note that if if the aim here is to increase performance and reduce allocations, something indicates that string interpolation may be a better candidate than string format for this particular design. Obviously string format would have it's advantages if we were just to rewrite the design by @NightOwl888 from taking a lambda to take a string format and then args.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4220.0), X86 LegacyJIT
  DefaultJob : .NET Framework 4.8 (4.8.4220.0), X86 LegacyJIT

Method	Data	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
Format	42	1,813.3 ns	31.65 ns	41.16 ns	0.3090	-	-	1298 B
FormatExplicit	42	1,214.2 ns	23.00 ns	25.56 ns	0.0858	-	-	361 B
Interpolate	42	1,777.4 ns	33.33 ns	57.49 ns	0.3090	-	-	1298 B
InterpolateExplicit	42	562.2 ns	6.39 ns	5.66 ns	0.0429	-	-	180 B

Format	1337	1,871.1 ns	31.90 ns	37.97 ns	0.3090	-	-	1298 B
FormatExplicit	1337	1,342.0 ns	20.94 ns	17.48 ns	0.0858	-	-	361 B
Interpolate	1337	1,894.2 ns	36.20 ns	43.09 ns	0.3090	-	-	1298 B
InterpolateExplicit	1337	596.2 ns	9.37 ns	8.76 ns	0.0429	-	-	180 B

theolivenbaum · 2020-10-21T14:16:18Z

Hi @jeme, no worries, I also found the final code a bit confusing.

The reason for the redundant check is that I didn't want to change the behavior from:

Check Flag ----> Check if Assert condition is true ----> Throw

To:

Check if Assert condition is true ----> Check Flag ----> Throw

This would happen if we only have the ShouldAssert call - even with the aggressive inlining option on the ShouldAssert method, the compiler will obviously not invert the checks as that could change the code behavior.

if(Debugging.ShouldAssert(*** Some Check ***)){}

Thus I added back the Debugging.AssertsEnabled check before the call to ShouldAssert to keep it consistent with the original code, and to avoid the extra checks due to the condition on every call even when Debugging.AssertsEnabled = false.

if(Debugging.AssertsEnabled && Debugging.ShouldAssert(*** Some Check ***)){}

Regarding the interpolated string, true, with the new design where the check for the condition is outside of the code, that would work fine - I could change it to use the interpolated string again instead of the new string.Format design...

NightOwl888 · 2020-10-21T15:13:21Z

This feature should be optimized for:

Asserts are disabled
Condition succeeds (equals true)

An exception will only occur in an application that both has asserts enabled and is misbehaving in some way.

This PR has changed a bit from the Debugging.Assert overloads I thought it might be a good solution:

Assert<T0>(bool condition, string messageFormat, T0 arg0);
Assert<T0, T1>(bool condition, string messageFormat, T0 arg0, T1 arg1);
Assert<T0, T1, T2>(bool condition, string messageFormat, T0 arg0, T1 arg1, T2 arg2);

The above signatures would prevent the string from being constructed until after the condition is checked, which is what we need to avoid unnecessary string concatenation/formatting. We can live with boxing when formatting strings that include value type parameters in this scenario (using string.Format), since this will only occur in an application that is misbehaving (provided the condition is checked before any boxing occurs). It is an extreme edge case.

For the same reason, which string concatenation method that is used is irrelevant, as string concatenation will only occur in a misbehaving application. Ideally, the string concatenation will also only occur if the condition fails and should never happen when asserts are disabled.

I have to agree that there doesn't seem to be much value in ShouldAssert. If the application is running correctly, ShouldAssert will always be false. If we are going to split it up, it would definitely be better to nix the Assert method and just add the condition to the if block as I pointed out in #346 and @jeme pointed out above.

// Before
if (Debugging.AssertsEnabled) Debugging.Assert(outputLen > 0, () => "output contains empty string: " + scratchChars);

// After
if (Debugging.AssertsEnabled && !(outputLen > 0)) throw new AssertionException($"output contains empty string: {scratchChars}");

This keeps the syntax relatable to both Java and .NET asserts:

// Java
assert outputLen > 0 : "output contains empty string: " + scratchChars;

// .NET
Debug.Assert(outputLen > 0, $"output contains empty string: {scratchChars}");

Sure, it is more verbose, but it is what is needed to make the feature as cheap as possible.

From #346 (comment):

Think the only problem with that is that the JIT usually won't inline methods with a throw expression.

I suspect not. It would mean that any guard clause would cause a method not to be inlined. Consider:

public void Foo(string p1)
{
    if (p1 is null)
        throw new ArgumentNullException(nameof(p1));

    // Implementation
}

This is really no different than:

public void Foo(string p1)
{
    if (Debugging.AssertsEnabled && !p1.Contains("0"))
        throw new AssertionException();

    // Implementation
}

Both of them check a condition, then throw an exception if the condition fails. In addition, the failure is expected to be an extreme edge case so any overhead of concatenating an error message doesn't occur normally.

In fact, about 10% of all of the asserts in Lucene could be converted to guard clauses in .NET. For end users, it would be more intuitive to make them guard clauses rather than asserts, but it would come at a runtime performance cost. The point is the asserts are designed to take the place of guard clauses when asserts are enabled, but unlike guard clauses asserts can be turned off in production to improve performance.

I am still debating whether to go with asserts or guard clauses, but I am leaning toward making them guard clauses because .NET users don't normally expect to have to "turn on" these checks when debugging.

Which means we run the check against Debugging.AssertsEnabled twice... (while its cheap, it's still redundant)

The original design was to have a few Debugging.Assert() overloads that are self-contained, so they check internally whether asserts are enabled before checking the condition. However, there was a high production performance cost in doing it that way (presumably because of the unnecessary allocations of method parameters when asserts were disabled). Moving the if block outside of the method improved performance considerably. The check was also left inside of the method primarily to future-proof it if someone were to just call Debugging.Assert() without checking to see whether it is enabled first (after all, it was designed to be a drop-in replacement for Debug.Assert()).

Having a second check is redundant, but the redundancy only occurs when asserts are enabled (during testing/debugging).

The redundancy was removed in places where there are blocks of 2 or more Debugging.Assert() calls - a single if (Debugging.AssertsEnabled) check is done for the entire block, which also optimizes for asserts being disabled.

Boolean Switch

One thing that was missed in the current implementation is the fact that .NET already has a built-in feature to turn something on/off from outside of the application for debugging purposes: BooleanSwitch.

internal static BooleanSwitch asserts = new BooleanSwitch("assertsEnabled", "Enable Assertions", "false");

public static void MyMethod(string location) {
   //Insert code here to handle processing.
   if(asserts.Enabled && !<condition>)
      throw new AssertionException("Error happened at " + location);
}

To utilize this, we need to ensure that it is disabled by default in the core library and enabled by default in the test framework and is correctly passed into the application in the Azure DevOps pipeline tests.

This doesn't have to be part of this PR - I am just pointing out that we might change AssertsEnabled later to integrate tighter with .NET.

Assert(false)

Note that there are many places in Lucene that where the line

throw new AssertionError();

was replaced with either:

Debugging.Assert(false);

// or

throw new InvalidOperationException();

Essentially, these were both done to compensate for the fact that System.Diagnostics.Debug.Assert() (which we originally used) gets stripped out from the Release build and does not always throw an exception that can be caught. Do note that some of the tests differentiate between catching an AssertionException or an InvalidOperationException and do something different in each case. But in cases where we have Debugging.Assert(false) now, we can just replace the line with throw new AssertionException() - these generally just equate to execution paths that should be unreachable unless the application is misbehaving.

This doesn't have to be part of this PR - I am just pointing it out so there is a clear picture the direction this may take.

jeme · 2020-10-22T09:11:23Z

Hi @jeme, no worries, I also found the final code a bit confusing.

The reason for the redundant check is that I didn't want to change the behavior from:

Check Flag ----> Check if Assert condition is true ----> Throw

To:

Check if Assert condition is true ----> Check Flag ----> Throw

This would happen if we only have the ShouldAssert call - even with the aggressive inlining option on the ShouldAssert method, the compiler will obviously not invert the checks as that could change the code behavior.
if(Debugging.ShouldAssert(*** Some Check ***)){}
Thus I added back the Debugging.AssertsEnabled check before the call to ShouldAssert to keep it consistent with the original code, and to avoid the extra checks due to the condition on every call even when Debugging.AssertsEnabled = false.

But we have to choose between Simplicity or Performance in this particular case... The attempt here looks like you went for simplicity first, but then in adding in the performance after that it completely nullified the simplicity attempt, and even made it much much worse.

So we have to choose...

// "Simplicity" (Well somewhat, the Debug.Assert(check, message) is probably the most simple design)...
if(Debug.ShouldThrow(*** Check ***)) Debug.Throw("Message");
ShouldThrow(bool check) => AssertsEnabled && check;

If the "Check" is costly, this will hurt performance regardless of Asserts being enabled or not.
We do save allocation and computation of the resulting message.
But I don't think we will go for that.

Or

// Performance
if(Debug.AssertsEnabled && Debug.ShouldThrow(*** Check ***)) Debug.Throw("Message");
ShouldThrow(bool check) => check;

In the later you will quickly realize that we are essentially passing a boolean to a method just to return it, so the method becomes irrelevant all together and only adds noise, hence we can eliminate that to:

// Performance
if(Debug.AssertsEnabled && *** Check ***) Debug.Throw("Message");

This saves the computation of the check (in normal mode, where asserts are disabled), if it's costly that matters, otherwise it is negligible.
We still save allocation and computation of the resulting message.

Obviously we can then as a final step simply inline the throw of the exception. I have no opinion on that here.

The example overloads like @NightOwl888 presented:

Assert<T0>(bool condition, string messageFormat, T0 arg0);
Assert<T0, T1>(bool condition, string messageFormat, T0 arg0, T1 arg1);
Assert<T0, T1, T2>(bool condition, string messageFormat, T0 arg0, T1 arg1, T2 arg2);

Should avoid the extra allocations as well and is the option most in line with what is currently in place. This is also what is most in line with the built-in "Debug.Assert".

I would as much as anyone like to avoid all this mess all-together, but the only solution that would be somewhat "Clean" to the code I can think of is some heavy reliance of AOP and that is complicated to add.

NightOwl888 · 2020-10-22T11:37:26Z

If the "Check" is costly, this will hurt performance

I neglected to mention that in the original implementation, the condition parameter was a Func<bool> to ensure that it wasn't run in production. However, this proved to be too costly in terms of performance when asserts are disabled. The if (Debugging.AssertsEnabled) added and first parameter changed to bool at the same time, as they go hand-in-hand. Together they brought performance close to the original Debug.Assert being compiled out. The condition should never be checked when asserts are disabled, as that would incur unnecessary production overhead.

Originally, all of the Assert overloads that had a message string also used a Func<string> as the message parameter. This was also somewhat costly, so the additional overloads with a string were added and utilized in all cases where the message string had no concatenation. ~80% of the asserts do not call the overload with Func<string>.

Both of the above changes brought the performance pretty close to what it was when we used Debug.Assert.

But this is just more proof that not all of these asserts are equal. For example:

~20% of the asserts use Func<string> as the second parameter, which could be converted as we have been discussing
~10% of the asserts could be converted to guard clauses (which means they will be more expensive at runtime, but more intuitive to .NET users)
~5% of the asserts cannot ever fail and could be changed back to Debug.Assert and compiled out of the release, provided we add tests for a Debug build to the nightly build (for example, places that are checking if the current object is locked where the outside world doesn't have any control over the lock)
~5% of the asserts had been converted from simple throw AssertionException() statements to Debug.Assert(false) and now that AssertionException is part of Lucene.Net.dll, they can be converted back

Unfortunately, going down this road is inevitably going to lead to many specializations of asserts, some of which diverge from the Lucene source, all in the name of performance.

I would as much as anyone like to avoid all this mess all-together, but the only solution that would be somewhat "Clean" to the code I can think of is some heavy reliance of AOP and that is complicated to add.

Well, I was hoping to avoid it, too. Several years ago we assumed that Debug.Assert would do everything we needed. But after @Shazwazza pointed out that some of the tests were failing in debug mode because we were skipping asserts that were meant as test conditions, the realization that some of the production overhead could be removed by putting in a switch to keep certain "test only" methods from being executed, and the realization that both the test framework and the "check index" feature require asserts to be enabled in Release mode to run all of the test conditions, there doesn't seem to be much point in trying to avoid it.

AOP could potentially save us from having to go down the road of specializing asserts in the name of performance. It might be costly, but when compared against trying to maintain a diverging set of asserts and Assert methods that require an if block outside of them in order to optimize performance, it might be worth it. Perhaps a small example is in order of AOP. Although, years ago when I checked there didn't appear to be any mainstream open source AOP libraries, which would kill that idea immediately if it is still the case.

jeme · 2020-10-26T08:45:02Z

AOP could potentially save us from having to go down the road of specializing asserts in the name of performance. It might be costly, but when compared against trying to maintain a diverging set of asserts and Assert methods that require an if block outside of them in order to optimize performance, it might be worth it. Perhaps a small example is in order of AOP. Although, years ago when I checked there didn't appear to be any mainstream open source AOP libraries, which would kill that idea immediately if it is still the case.

There is some fairly mature AOP frameworks out there, however not all will be suitable for what we need here. E.g. I still think PostSharp, which is one of the bigger ones, is focused on Compile time AOP which makes it unsuitable here as we are looking for a Runtime.

Harmony works at runtime so that could be a candidate, it's has apparently been mostly used to mod games, but that shouldn't matter. The bigger problem though is that the Debugging.Assert is not only used on Method entry in Lucene, which means we need to somehow get around this problem as we can't just inject code into the middle of a method (AFAIK, at least not with ease)...

I was thinking that one approach for this could be to pull out the actual assert into it's own method in each class (A .NET specific method)... That would introduce a call to a "NOOP" by default at runtime, and that could then be replace. The consequence here though is that then we may not even need AOP at all to replace the logic as we can perhaps inject it instead... (But I want to perform some experiments to look into the performance cost of a "NOOP" in various scenarios vs the boolean check we have now..

eladmarg · 2020-10-26T09:02:56Z

another option is to write custom Fody weaver but I'm not sure we should invest the efforts on this area to gain performance.
we have still many other areas that have bigger impact and we should focus there.

for instance, there are many places we can save allocations, better re-use of allocated objects using pools and managers.
there are also some places we can utilize span to avoid more allocations.
and of course, we can aggressive inline static some methods.

I would even say that in the cases the code isn't in use for testing/validation and only output, we can simply wrap it with conditional #IF TEST. (to remove it completely)

I'm not sure its that necessary in all the places its used.

NightOwl888 · 2020-10-26T11:15:28Z

we have still many other areas that have bigger impact and we should focus there.

for instance, there are many places we can save allocations, better re-use of allocated objects using pools and managers.
there are also some places we can utilize span to avoid more allocations.
and of course, we can aggressive inline static some methods.

Agreed.

But to resolve this PR, I would say the least we should do is eliminate the Func<string> since it clearly has a negative impact on performance.

There are 2 acceptable ways to eliminate the Func<string> overloads of Debugging.Assert to complete this PR:

Preferred: Revert the overloads to the format Assert<T0, T1, T2>(bool condition, string messageFormat, T0 arg0, T1 arg1, T2 arg2) and use string.Format() after the condition is checked
Check the condition, format the message and throw the exception inline instead of calling Debugging.Assert()

We can then evaluate whether additional measures to improve performance of this feature are worth the effort. My guess is there will not be any additional measures required. This feature exists in Lucene and they were willing to accept its performance cost in the design as a tradeoff to properly run all of the test conditions when they are needed, including when end users run tests on their own components using the test framework or run CheckIndex.

jeme · 2020-10-26T11:40:50Z

I would also go for option 1.
Being the one that is most in line with how the build in Debug.Asserts looks means it's what will be most familiar IMO.

…ters so the parameters are not resolved until a condition fails. There are still some calls that do light math and pick items from arrays, but this performance hit in the tests is something we can live with for better production performance. Closes apache#375, closes apache#373, closes apache#372.

…ters so the parameters are not resolved until a condition fails. There are still some calls that do light math and pick items from arrays, but this performance hit in the tests is something we can live with for better production performance. Closes apache#346, closes apache#373, closes apache#372.

theolivenbaum added 4 commits October 17, 2020 16:41

Add new interface to NumericDocValues to close apache#370

0106e11

Avoid also allocation of a new FieldCache.T on every call to GetTs on…

c402c88

… FieldCacheImpl

Change Debugging.Assert usage to not allocate due to lambda capture

8ec9c36

start removing delegate-based Debugging.AssertsEnabled

383d347

theolivenbaum mentioned this pull request Oct 17, 2020

Identify/Fix Bottlenecks #261

Closed

theolivenbaum added 21 commits October 19, 2020 10:50

replace Debugging.Assert with 2 parameters

ec928a3

more fixes

ee96c10

few more

7ef9dd9

Refactor Debugging class

6749513

replacing with new methods

ae149b7

a few more refactorings

08407d6

Fix missing )

dd8d9d5

more refactoring

1945ddb

yet more refactoring

3b9b875

few more

9b42e4a

and it compiles again

34fdcf8

put condition inline

fc49e9f

Need to test first for AssertsEnabled to avoid computing the inner co…

d98b1da

…ndition before checking for AssertsEnabled

remove duplicated checks

f024470

Simplify AssertsEnabled + ShouldAssert(condition) + ThrowAssert()

7ad21cb

Simplify AssertsEnabled + ShouldAssert(condition) + ThrowAssert()

36c0da7

Simplify AssertsEnabled + ShouldAssert(condition) + ThrowAssert()

36d10b4

change if(ShouldAssert) to if(AssertsEnabled) ThrowIf

f38d341

rename ThrowAssertIf(bool) back to Assert(bool) to simplify github diff

c649fd4

more simplifications

70c6823

more simplification

06eb432

theolivenbaum marked this pull request as ready for review October 19, 2020 13:47

theolivenbaum added 3 commits October 19, 2020 15:57

Merge branch 'reduce-allocations-fieldcache' into remove-delegate-bas…

187a3be

…ed-debugging-assert

Revert "Merge branch 'reduce-allocations-fieldcache' into remove-dele…

5692658

…gate-based-debugging-assert" This reverts commit 187a3be, reversing changes made to 06eb432.

Update BlockTreeTermsWriter.cs

9b29119

NightOwl888 mentioned this pull request Nov 3, 2020

Fix for Delegate-based Asserts #383

Merged

NightOwl888 closed this in ca2ecb2 Nov 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove delegate based debugging assert #373

Remove delegate based debugging assert #373

theolivenbaum commented Oct 17, 2020

eladmarg commented Oct 17, 2020

NightOwl888 commented Oct 17, 2020

theolivenbaum commented Oct 19, 2020

theolivenbaum commented Oct 19, 2020

jeme commented Oct 21, 2020 •

edited

theolivenbaum commented Oct 21, 2020

NightOwl888 commented Oct 21, 2020

jeme commented Oct 22, 2020 •

edited

NightOwl888 commented Oct 22, 2020

jeme commented Oct 26, 2020

eladmarg commented Oct 26, 2020

NightOwl888 commented Oct 26, 2020

jeme commented Oct 26, 2020

Remove delegate based debugging assert #373

Remove delegate based debugging assert #373

Conversation

theolivenbaum commented Oct 17, 2020

eladmarg commented Oct 17, 2020

NightOwl888 commented Oct 17, 2020

theolivenbaum commented Oct 19, 2020

theolivenbaum commented Oct 19, 2020

jeme commented Oct 21, 2020 • edited

theolivenbaum commented Oct 21, 2020

NightOwl888 commented Oct 21, 2020

Boolean Switch

Assert(false)

jeme commented Oct 22, 2020 • edited

NightOwl888 commented Oct 22, 2020

jeme commented Oct 26, 2020

eladmarg commented Oct 26, 2020

NightOwl888 commented Oct 26, 2020

jeme commented Oct 26, 2020

jeme commented Oct 21, 2020 •

edited

jeme commented Oct 22, 2020 •

edited