Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace CopyOnReadEnumerable with immutable collections #6176

Closed
Tracked by #6940
KirillOsenkov opened this issue Feb 21, 2021 · 9 comments · Fixed by #7117
Closed
Tracked by #6940

Replace CopyOnReadEnumerable with immutable collections #6176

KirillOsenkov opened this issue Feb 21, 2021 · 9 comments · Fixed by #7117
Assignees
Labels
backlog performance Performance-Scenario-Build This issue affects build performance. Priority:2 Work that is important, but not critical for the release size:1 triaged

Comments

@KirillOsenkov
Copy link
Member

The whole machinery around ProjectPropertyInstanceEnumeratorProxy, ProjectItemInstanceEnumeratorProxy, ItemDictionary and CopyOnReadEnumerable could really benefit from immutable collections.

A lot of unnecessary work and allocations are happening to enumerate properties and items.

@KirillOsenkov KirillOsenkov added performance needs-triage Have yet to determine what bucket this goes in. Performance-Scenario-Build This issue affects build performance. labels Feb 21, 2021
@KirillOsenkov
Copy link
Member Author

I pulled on the string some more and ended up in ProjectInstance, where _items is set in several places, including Translator. It might make sense to just use a lighterweight read-only collection in situations where items are deserialized such that we don't have to clone and use CopyOnWrite. Maybe seal the items themselves somehow (like use TaskItemData) to avoid wrapping each item in ProjectItemInstance.TaskItem.

@KirillOsenkov
Copy link
Member Author

Russian Dolls:
image

@ladipro ladipro added backlog performance and removed performance needs-triage Have yet to determine what bucket this goes in. labels Feb 25, 2021
@panopticoncentral panopticoncentral added the Priority:2 Work that is important, but not critical for the release label Mar 23, 2021
@ladipro ladipro added the size:1 label Oct 12, 2021
@ladipro
Copy link
Member

ladipro commented Oct 12, 2021

Labeling with size:1 as the cost of the initial investigation. Will open a follow-up issue if more work is identified.

@ladipro ladipro self-assigned this Oct 27, 2021
@ladipro ladipro moved this from To Do to In Progress in Perf sprint Nov 22nd - Dec 5th 2021 Dec 3, 2021
@ladipro ladipro moved this from In Progress to In Review in Perf sprint Nov 22nd - Dec 5th 2021 Dec 6, 2021
AR-May pushed a commit that referenced this issue Dec 10, 2021
Fixes #6176

Context
The props and items sent to loggers as part of ProjectStartedEventArgs are passed in special collections holding snapshots of props/items that are created lazily when a logger starts enumerating them. These props/items are guaranteed to be deep copies of the real props/items such that no modifications that a logger may make don't propagate back.

Changes Made
For props, the collection holds tuples of prop name and evaluated value. Both are strings so the concept of deep copying is implicit (strings are immutable). Similarly, for items we return TaskItem instances created over the original item, which guarantees the deep-copy semantics (TaskItems holds a copy of the metadata and a few strings). Consequentially, there is no need to explicitly deep-copy the props/items before enumeration and the IDeepCloneable<T> interface becomes unused.

As an additional code cleanup, ProjectPropertyInstanceEnumeratorProxy and ProjectItemInstanceEnumeratorProxy can easily be replaced with the Select LINQ operation and are thus not needed.

Here's a detailed analysis of the code that runs when enumerating props and items, without and with these changes:

Props
Before: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with the results of calling DeepClone on the props. During enumeration the iterator in ProjectPropertyInstanceEnumeratorProxy extracts the Name and EvaluatedValue and returns them to the logger.
After: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with [Name, EvaluatedValue] tuples, saving the intermediate step of cloning.

Items
Before: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with the results of calling DeepClone on the items. During enumeration the iterator in ProjectItemInstanceEnumeratorProxy creates new instances of TaskItem and returns them to the logger.
After: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with [ItemType, TaskItem] tuples, saving the intermediate step of cloning.

Semantically the old and new behaviors are equivalent. A snapshot of prop/items is made at the point where GetEnumerator is called.

Testing
Existing unit tests, some tests were modified.
@ladipro
Copy link
Member

ladipro commented Dec 10, 2021

@KirillOsenkov I have eliminated redundant deep cloning in the attached PR. I don't think it's worth optimizing further as this work/allocations are relatively minor. When building a medium sized solution with /v:diag we spend ~80 ms and allocate ~1.2 MB in BaseConsoleLogger.ExtractItemList and BaseConsoleLogger.ExtractPropertyList combined. For both metrics this is about 0.1% of total.

@ladipro ladipro moved this from In Review to Done in Perf sprint Nov 22nd - Dec 5th 2021 Dec 10, 2021
@KirillOsenkov
Copy link
Member Author

@ladipro nice, thanks!

I've also found https://source.dot.net/#Microsoft.Build/BackEnd/Components/Logging/TargetLoggingContext.cs,d99a8dbf081f1a4d,references

Should we do the same thing for target items too? Shall we file a separate issue?

@KirillOsenkov
Copy link
Member Author

Also see related:
#6704

Since items aren't truly immutable, we have a bug where we capture the live data structure for logging instead of capturing an immutable snapshot. If the core data structures are truly immutable, we can give an immutable snapshot to loggers without any copying whatsoever, this will also avoid allocations.

@KirillOsenkov
Copy link
Member Author

Specifically, see this comment:

// For target outputs bypass copying of all items to save on performance.
// The proxy creates a deep clone of each item to protect against writes,
// but since we're not writing we don't need the deep cloning.
// Additionally, it is safe to access the underlying List<ITaskItem> as it's allocated
// in a single location and noboby else mutates it after that:
// https://github.com/dotnet/msbuild/blob/f0eebf2872d76ab0cd43fdc4153ba636232b222f/src/Build/BackEnd/Components/RequestBuilder/TargetEntry.cs#L564
if (items is TargetLoggingContext.TargetOutputItemsInstanceEnumeratorProxy proxy)
{
items = proxy.BackingItems;
}

@ladipro
Copy link
Member

ladipro commented Dec 10, 2021

Should we do the same thing for target items too? Shall we file a separate issue?

Yes to both! If you file an issue, may I ask you to include a high-level description of the MSBuild logging infra? How log events flow today and the shortcomings / inefficiencies you are aware of.

@KirillOsenkov
Copy link
Member Author

OK here you go: #7142

mruxmohan4 pushed a commit to mruxmohan4/msbuild that referenced this issue Dec 13, 2021
Fixes dotnet#6176

Context
The props and items sent to loggers as part of ProjectStartedEventArgs are passed in special collections holding snapshots of props/items that are created lazily when a logger starts enumerating them. These props/items are guaranteed to be deep copies of the real props/items such that no modifications that a logger may make don't propagate back.

Changes Made
For props, the collection holds tuples of prop name and evaluated value. Both are strings so the concept of deep copying is implicit (strings are immutable). Similarly, for items we return TaskItem instances created over the original item, which guarantees the deep-copy semantics (TaskItems holds a copy of the metadata and a few strings). Consequentially, there is no need to explicitly deep-copy the props/items before enumeration and the IDeepCloneable<T> interface becomes unused.

As an additional code cleanup, ProjectPropertyInstanceEnumeratorProxy and ProjectItemInstanceEnumeratorProxy can easily be replaced with the Select LINQ operation and are thus not needed.

Here's a detailed analysis of the code that runs when enumerating props and items, without and with these changes:

Props
Before: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with the results of calling DeepClone on the props. During enumeration the iterator in ProjectPropertyInstanceEnumeratorProxy extracts the Name and EvaluatedValue and returns them to the logger.
After: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with [Name, EvaluatedValue] tuples, saving the intermediate step of cloning.

Items
Before: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with the results of calling DeepClone on the items. During enumeration the iterator in ProjectItemInstanceEnumeratorProxy creates new instances of TaskItem and returns them to the logger.
After: When a logger calls GetEnumerator CopyOnReadEnumerable creates a list and populates it with [ItemType, TaskItem] tuples, saving the intermediate step of cloning.

Semantically the old and new behaviors are equivalent. A snapshot of prop/items is made at the point where GetEnumerator is called.

Testing
Existing unit tests, some tests were modified.
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog performance Performance-Scenario-Build This issue affects build performance. Priority:2 Work that is important, but not critical for the release size:1 triaged
Development

Successfully merging a pull request may close this issue.

4 participants