
Fix memory regression in DuplicatePropertyNameChecker #2834

Merged 4 commits into OData:master on Jan 5, 2024

Conversation

@habbes (Contributor) commented Jan 4, 2024

Issues

This pull request fixes #2813

Description

This PR fixes a couple of "low-hanging" sources of excessive memory allocations.

DuplicatePropertyNameChecker

The #if flags we used to conditionally compile object-pooling for the DuplicatePropertyNameChecker only applied to .NET Standard 2.0. I've extended them to also cover .NET Core 3.1. The regression was probably introduced when we added .netcoreapp3.1 as an explicit framework target.

I also made NullDuplicatePropertyChecker a "global" singleton instead of creating one per request/WriterValidator. The impact here is low, but it doesn't hurt, and I'd argue it makes the code cleaner anyway.
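
As an illustration of the singleton change, here is a minimal sketch of the pattern. The type and member names below are simplified stand-ins invented for the example, not the exact ODL source:

// Simplified stand-ins, just to show the shape of the change.
internal interface IDuplicateChecker
{
    void Validate(string propertyName);
    void Reset();
}

// The "null" checker is stateless, so a single shared instance can serve every
// writer instead of allocating a fresh instance per request/WriterValidator.
internal sealed class NullDuplicateChecker : IDuplicateChecker
{
    internal static readonly NullDuplicateChecker Instance = new NullDuplicateChecker();

    private NullDuplicateChecker() { }

    public void Validate(string propertyName)
    {
        // Intentionally a no-op: used when writer validation is disabled.
    }

    public void Reset() { }
}

A validator would then hand out NullDuplicateChecker.Instance when validation is disabled, rather than newing up a checker per call.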

LINQ Func<T,bool> predicate

Found a Func<T,bool> allocation lurking on a hot path, caused by calling resource.NonComputedProperties.SingleOrDefault(r => r.Name == propertyName). I reduced the allocation by implementing SingleOrDefault manually (a raw while loop over enumerator.MoveNext()).
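
Roughly, the replacement has the following shape (a sketch of the pattern rather than the exact PR code; the helper name and exception message are made up for the example, and the usual System and Microsoft.OData using directives are assumed). The comparison is inlined, so no Func<ODataProperty, bool> delegate or closure is allocated on each call:

private static ODataProperty GetPropertyByName(IEnumerable<ODataProperty> properties, string propertyName)
{
    ODataProperty match = null;
    using (IEnumerator<ODataProperty> e = properties.GetEnumerator())
    {
        while (e.MoveNext())
        {
            ODataProperty current = e.Current;
            if (current.Name == propertyName)
            {
                if (match != null)
                {
                    // SingleOrDefault semantics: more than one match is an error.
                    throw new InvalidOperationException("Multiple properties named '" + propertyName + "'.");
                }

                match = current;
            }
        }
    }

    return match;
}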

While this reduced the allocations, I question our use of SingleOrDefault. Personally, I think we should reduce or avoid the use of SingleOrDefault in our library. In cases where we need to validate uniqueness, it would be much better to pre-validate the collection before use, to ensure we validate uniqueness only once, or, in some cases, to simply treat duplicates as undefined behaviour and leave them to the customer to deal with. In this case, we potentially do a full scan of the property list each time we need to fetch the value of a single property, and yet we'll still validate property name uniqueness with the duplicate property name checker inside the writer.

Update

Following this comment thread: #2834 (comment), I made the following change:

I removed the duplicate property name check from the ODataResourceMetadataContext because:

  • it doesn't break existing tests
  • the duplicate check adds extra cost on a hot path

I opted not to move the duplicate-check to the ODataResource.Properties setter because:

  • we already have duplicate property name checking logic in the writer via DuplicatePropertyNameChecker
  • in WebApi and OData Client, we can guarantee in the serializer that the property names are unique. That's also the best place to guarantee uniqueness, since they have access to the original property collection
  • For customers using ODL directly, and who have disabled writer validation, we can assume they know what they're doing and already ensure that properties are unique.

However, if the last assumption doesn't hold, this could be a breaking change (though I think chances are extremely low). So, I'm open to restoring the original behaviour or moving the duplicate-check to the ODataResource.Properties setter if you or someone else believes it necessary to do so.

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Results

Before/after profiler screenshots for several scenarios were attached to the PR (images not reproduced here).

  using System.Collections.Generic;
- #if NETSTANDARD2_0_OR_GREATER
+ #if NETSTANDARD2_0_OR_GREATER || NETCOREAPP3_1_OR_GREATER

Why doesn't NETSTANDARD2_0_OR_GREATER also include NETCOREAPP_3_1+?

@habbes (Contributor, Author) replied on Jan 5, 2024:

In the past, #if NETSTANDARD2_0_OR_GREATER used to apply to .NET Core 3.1 and later. But when we added .netcoreapp3.1 as an explicit target framework in our csproj file, NETSTANDARD2_0_OR_GREATER stopped applying to it (I actually don't know why). Since then, we've had to use NETCOREAPP3_1_OR_GREATER manually.
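
For context, this is roughly how the SDK's implicit preprocessor symbols play into it: a netstandard2.0 compilation defines NETSTANDARD2_0_OR_GREATER, while a netcoreapp3.1 compilation defines NETCOREAPP3_1_OR_GREATER but none of the NETSTANDARD* symbols, so the old guard left the netcoreapp3.1 build on the unpooled path. The guarded code below is just a placeholder to illustrate the effect:

#if NETSTANDARD2_0_OR_GREATER || NETCOREAPP3_1_OR_GREATER
    // object-pooled DuplicatePropertyNameChecker path (both builds take this now)
#else
    // fallback path that allocates a new checker each time
#endif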

{
    // since this method is called frequently, we implement SingleOrDefault manually
    // to avoid allocating predicate closures.
    using (IEnumerator<ODataProperty> e = resource.NonComputedProperties.GetEnumerator())

Why not do the check at the time of adding to NonComputedProperties? And/or use a data structure that can do this validation at point of adding?

@habbes (Contributor, Author) replied:

I think validating at the time of adding to NonComputedProperties, or using a better-suited data structure, would indeed be better. But I didn't want to risk breaking existing behaviour, since I hadn't done a thorough analysis to ensure that changing the behaviour here doesn't break any assumptions made by its callers.

@habbes (Contributor, Author) added:

One challenge with validating property uniqueness at the time properties are added to NonComputedProperties is that we don't know when the properties are added to the collection. NonComputedProperties refers to the ODataResource.Properties property, which is an IEnumerable<ODataProperty>.

We can detect when the properties collection is set in the setter, and do the validation there, but we can't tell if new entries are added to the collection after the property has been set. For example, if we only validate the properties collection in the setter, we would not be able to catch the following violation:

var properties = new List<ODataProperty>
{
    new ODataProperty { Name = "Foo", Value = "Bar" },
    new ODataProperty { Name = "A", Value = "B" }
};

odataResource.Properties = properties; // duplicate name validation would occur here

properties.Add(new ODataProperty { Name = "Foo", Value = "Baz" }); // this duplicate property name violation would not be caught

It's worth noting that we already perform some other verification in the setter, and one could argue that that verification would likewise not catch violations that happen after the property has been set.
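
For illustration, setter-side validation might look roughly like the sketch below (hypothetical code, not the actual ODataResource implementation; the exception message is a placeholder and the usual using directives are assumed). As the example above shows, it would only catch duplicates present at assignment time:

private IEnumerable<ODataProperty> properties;

public IEnumerable<ODataProperty> Properties
{
    get { return this.properties; }
    set
    {
        if (value != null)
        {
            // Reject duplicates that exist when the collection is assigned.
            HashSet<string> seenNames = new HashSet<string>(StringComparer.Ordinal);
            foreach (ODataProperty property in value)
            {
                if (!seenNames.Add(property.Name))
                {
                    throw new ODataException("Duplicate property name '" + property.Name + "'.");
                }
            }
        }

        // Entries added to the underlying collection after this point go unchecked.
        this.properties = value;
    }
}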

My only concern is that taking this route for the uniqueness check effectively changes existing behaviour, where a property is guaranteed to be unique at the time we query its value, even if the collection has changed since ODataResource.Properties was set. This is behaviour I have not yet verified is safe to break.

That said, my opinion is that we should probably leave it to the user to ensure they give us unique properties, and if they violate that, treat it as unsafe/undefined behaviour with unpredictable results.

It's also worth noting that the duplicate name checker will still verify property uniqueness independently of the check we do here, unless validation is disabled (in which case we probably shouldn't be doing this verification here either). So this check is somewhat redundant.

@habbes (Contributor, Author) commented on Jan 5, 2024:

I've removed the duplicate check and no unit tests have failed. We also have other cases where we (questionably) use SingleOrDefault in our libraries; there's a chance the original author did not add this one to ensure strong uniqueness guarantees, but more as a sanity check.

@habbes (Contributor, Author) added:

I have revised the code and removed the duplicate check; the implementation now has FirstOrDefault() semantics instead of SingleOrDefault().

I removed the duplicate-check from this method because:

  • it doesn't break existing tests
  • the duplicate check adds extra cost on a hot path

I opted not to move the duplicate-check to the ODataResource.Properties setter because:

  • we already have duplicate property name checking logic in the writer via DuplicatePropertyNameChecker
  • in WebApi and OData Client, we can guarantee in the serializer that the property names are unique. That's also the best place to guarantee uniqueness, since they have access to the original property collection
  • For customers using ODL directly, and who have disabled writer validation, we can assume they know what they're doing and already ensure that properties are unique.

However, if the last assumption doesn't hold, this could be a breaking change (though I think chances are extremely low). So, I'm open to restoring the original behaviour or moving the duplicate-check to the ODataResource.Properties setter if you or someone else believes it necessary to do so.

@odero commented on Jan 5, 2024:

What do alloc results look like after removing the duplicate check?

@habbes (Contributor, Author) replied:

I had not collected alloc results after removing the duplicate check (I don't expect the result to be much different), but since I'm currently working on an area that calls this method, I've just taken CPU samples before and after applying this change:

You can see that TryGetPrimitiveOrEnumPropertyValue went down from 0.96% to 0.52% CPU. In the before graph, the lion's share came from TryGetSingle (which is called by SingleOrDefault()).

Before/after CPU sampling screenshots were attached (images not reproduced here).

This PR has 27 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +17 -10
Percentile : 10.8%

Total files changed: 4

Change summary by file extension:
.cs : +12 -10
.csproj : +5 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.


@habbes requested a review from odero on January 5, 2024, 05:06
@gathogojr (Contributor) left a comment:

:shipit:

@habbes merged commit 7ccb467 into OData:master on Jan 5, 2024
4 checks passed

Successfully merging this pull request may close these issues.

Perf regression: DuplicatePropertyNameChecker allocations