
Fix memory regression in DuplicatePropertyNameChecker #2834

Merged 4 commits into OData:master on Jan 5, 2024

Conversation

@habbes (Contributor) commented Jan 4, 2024

Issues

This pull request fixes #2813

Description

This PR fixes a couple of "low-hanging" sources of excessive memory allocations.

DuplicatePropertyNameChecker

The #if flags we used to conditionally compile object-pooling for the DuplicatePropertyNameChecker only applied to .NET Standard 2.0. I've extended them to also cover .NET Core 3.1. The regression was probably introduced when we added .netcoreapp3.1 as an explicit framework target.

I also made NullDuplicatePropertyChecker a "global" singleton instead of creating one per request/WriterValidator. The impact here is low, but it doesn't hurt, and I'd argue it makes the code cleaner anyway.
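
As an illustration of the singleton change, here is a minimal sketch of the pattern. The type and member names below are simplified stand-ins invented for the example, not the exact ODL source:

// Simplified stand-ins, just to show the shape of the change.
internal interface IDuplicateChecker
{
    void Validate(string propertyName);
    void Reset();
}

// The "null" checker is stateless, so a single shared instance can serve every
// writer instead of allocating a fresh instance per request/WriterValidator.
internal sealed class NullDuplicateChecker : IDuplicateChecker
{
    internal static readonly NullDuplicateChecker Instance = new NullDuplicateChecker();

    private NullDuplicateChecker() { }

    public void Validate(string propertyName)
    {
        // Intentionally a no-op: used when writer validation is disabled.
    }

    public void Reset() { }
}

A validator would then hand out NullDuplicateChecker.Instance when validation is disabled, rather than newing up a checker per call.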

LINQ Func<T,bool> predicate

Found a Func<T,bool> allocation lurking on a hot path, caused by calling resource.NonComputedProperties.SingleOrDefault(r => r.Name == propertyName). I reduced the allocation by implementing SingleOrDefault manually (a raw while loop over enumerator.MoveNext()).
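
Roughly, the replacement has the following shape (a sketch of the pattern rather than the exact PR code; the helper name and exception message are made up for the example, and the usual System and Microsoft.OData using directives are assumed). The comparison is inlined, so no Func<ODataProperty, bool> delegate or closure is allocated on each call:

private static ODataProperty GetPropertyByName(IEnumerable<ODataProperty> properties, string propertyName)
{
    ODataProperty match = null;
    using (IEnumerator<ODataProperty> e = properties.GetEnumerator())
    {
        while (e.MoveNext())
        {
            ODataProperty current = e.Current;
            if (current.Name == propertyName)
            {
                if (match != null)
                {
                    // SingleOrDefault semantics: more than one match is an error.
                    throw new InvalidOperationException("Multiple properties named '" + propertyName + "'.");
                }

                match = current;
            }
        }
    }

    return match;
}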

While this reduced the allocations, I question our use of SingleOrDefault. Personally, I think we should reduce or avoid the use of SingleOrDefault in our library. In cases where we need to validate uniqueness, it would be much better to pre-validate the collection before use, to ensure we validate uniqueness only once, or, in some cases, to simply treat duplicates as undefined behaviour and leave them to the customer to deal with. In this case, we potentially do a full scan of the property list each time we need to fetch the value of a single property, and yet we'll still validate property name uniqueness with the duplicate property name checker inside the writer.

Update

Following this comment thread: #2834 (comment), I made the following change:

I removed the duplicate property name check from the ODataResourceMetadataContext because:

  • it doesn't break existing tests
  • the duplicate check adds extra cost on a hot path

I opted not to move the duplicate-check to the ODataResource.Properties setter because:

  • we already have duplicate property name checking logic in the writer via DuplicatePropertyNameChecker
  • in WebApi and OData Client, we can guarantee in the serializer that the property names are unique. That's also the best place to guarantee uniqueness, since they have access to the original property collection
  • For customers using ODL directly, and who have disabled writer validation, we can assume they know what they're doing and already ensure that properties are unique.

However, if the last assumption doesn't hold, this could be a breaking change (though I think chances are extremely low). So, I'm open to restoring the original behaviour or moving the duplicate-check to the ODataResource.Properties setter if you or someone else believes it necessary to do so.

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Results

Before/after profiler screenshots for several scenarios were attached to the PR (images not reproduced here).

  using System.Collections.Generic;
- #if NETSTANDARD2_0_OR_GREATER
+ #if NETSTANDARD2_0_OR_GREATER || NETCOREAPP3_1_OR_GREATER

Why doesn't NETSTANDARD2_0_OR_GREATER also include NETCOREAPP_3_1+?

@habbes (Contributor, Author) replied on Jan 5, 2024:

In the past, #if NETSTANDARD2_0_OR_GREATER used to apply to .NET Core 3.1 and later. But when we added .netcoreapp3.1 as an explicit target framework in our csproj file, NETSTANDARD2_0_OR_GREATER stopped applying to it (I actually don't know why). Since then, we've had to use NETCOREAPP3_1_OR_GREATER manually.
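
For context, this is roughly how the SDK's implicit preprocessor symbols play into it: a netstandard2.0 compilation defines NETSTANDARD2_0_OR_GREATER, while a netcoreapp3.1 compilation defines NETCOREAPP3_1_OR_GREATER but none of the NETSTANDARD* symbols, so the old guard left the netcoreapp3.1 build on the unpooled path. The guarded code below is just a placeholder to illustrate the effect:

#if NETSTANDARD2_0_OR_GREATER || NETCOREAPP3_1_OR_GREATER
    // object-pooled DuplicatePropertyNameChecker path (both builds take this now)
#else
    // fallback path that allocates a new checker each time
#endif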

{
    // since this method is called frequently, we implement SingleOrDefault manually
    // to avoid allocating predicate closures.
    using (IEnumerator<ODataProperty> e = resource.NonComputedProperties.GetEnumerator())

Why not do the check at the time of adding to NonComputedProperties? And/or use a data structure that can do this validation at point of adding?

@habbes (Contributor, Author) replied:

I think validating at the time of adding to NonComputedProperties, or using a better-suited data structure, would indeed be better. But I didn't want to risk breaking existing behaviour, since I hadn't done a thorough analysis to ensure that changing the behaviour here doesn't break any assumptions made by its callers.

@habbes (Contributor, Author) added:

One challenge with validating property uniqueness at the time properties are added to NonComputedProperties is that we don't know when the properties are added to the collection. NonComputedProperties refers to the ODataResource.Properties property, which is an IEnumerable<ODataProperty>.

We can detect when the properties collection is set in the setter, and do the validation there, but we can't tell if new entries are added to the collection after the property has been set. For example, if we only validate the properties collection in the setter, we would not be able to catch the following violation:

var properties = new List<ODataProperty>
{
    new ODataProperty { Name = "Foo", Value = "Bar" },
    new ODataProperty { Name = "A", Value = "B" }
};

odataResource.Properties = properties; // duplicate name validation would occur here

properties.Add(new ODataProperty { Name = "Foo", Value = "Baz" }); // this duplicate property name violation would not be caught

It's worth noting that we already perform some other verification in the setter, and one could argue that that verification would likewise not catch violations that happen after the property has been set.
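
For illustration, setter-side validation might look roughly like the sketch below (hypothetical code, not the actual ODataResource implementation; the exception message is a placeholder and the usual using directives are assumed). As the example above shows, it would only catch duplicates present at assignment time:

private IEnumerable<ODataProperty> properties;

public IEnumerable<ODataProperty> Properties
{
    get { return this.properties; }
    set
    {
        if (value != null)
        {
            // Reject duplicates that exist when the collection is assigned.
            HashSet<string> seenNames = new HashSet<string>(StringComparer.Ordinal);
            foreach (ODataProperty property in value)
            {
                if (!seenNames.Add(property.Name))
                {
                    throw new ODataException("Duplicate property name '" + property.Name + "'.");
                }
            }
        }

        // Entries added to the underlying collection after this point go unchecked.
        this.properties = value;
    }
}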

My only concern is that taking this route for the uniqueness check effectively changes existing behaviour, where a property is guaranteed to be unique at the time we query its value, even if the collection has changed since ODataResource.Properties was set. This is behaviour I have not yet verified is safe to break.

That said, my opinion is that we should probably leave it to the user to ensure they give us unique properties, and if they violate that, treat it as unsafe/undefined behaviour with unpredictable results.

It's also worth noting that the duplicate name checker will still verify property uniqueness independently of the check we do here, unless validation is disabled (in which case we probably shouldn't be doing this verification here either). So this check is somewhat redundant.

@habbes (Contributor, Author) commented on Jan 5, 2024:

I've removed the duplicate check and no unit tests have failed. We also have other cases where we (questionably) use SingleOrDefault in our libraries; there's a chance the original author did not add this one to ensure strong uniqueness guarantees, but more as a sanity check.

@habbes (Contributor, Author) added:

I have revised the code and removed the duplicate check; the implementation now has FirstOrDefault() semantics instead of SingleOrDefault().

I removed the duplicate-check from this method because:

  • it doesn't break existing tests
  • the duplicate check adds extra cost on a hot path

I opted not to move the duplicate-check to the ODataResource.Properties setter because:

  • we already have duplicate property name checking logic in the writer via DuplicatePropertyNameChecker
  • in WebApi and OData Client, we can guarantee in the serializer that the property names are unique. That's also the best place to guarantee uniqueness, since they have access to the original property collection
  • For customers using ODL directly, and who have disabled writer validation, we can assume they know what they're doing and already ensure that properties are unique.

However, if the last assumption doesn't hold, this could be a breaking change (though I think chances are extremely low). So, I'm open to restoring the original behaviour or moving the duplicate-check to the ODataResource.Properties setter if you or someone else believes it necessary to do so.

@odero commented on Jan 5, 2024:

What do alloc results look like after removing the duplicate check?

@habbes (Contributor, Author) replied:

I had not collected alloc results after removing the duplicate check (I don't expect the result to be much different), but since I'm currently working on an area that calls this method, I've just taken CPU samples before and after applying this change:

You can see that TryGetPrimitiveOrEnumPropertyValue went down from 0.96% to 0.52% CPU. In the before graph, the lion's share came from TryGetSingle (which is called by SingleOrDefault()).

Before/after CPU sampling screenshots were attached (images not reproduced here).

This PR has 27 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +17 -10
Percentile : 10.8%

Total files changed: 4

Change summary by file extension:
.cs : +12 -10
.csproj : +5 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.


@habbes requested a review from odero on January 5, 2024, 05:06
@gathogojr (Contributor) left a comment:

:shipit:

@habbes merged commit 7ccb467 into OData:master on Jan 5, 2024
4 checks passed

Successfully merging this pull request may close these issues.

Perf regression: DuplicatePropertyNameChecker allocations