Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize RAR's GetReferenceItems method #5929

Merged
merged 23 commits into from Feb 6, 2021
Merged

Conversation

@Forgind
Copy link
Member

@Forgind Forgind commented Dec 2, 2020

This change should not alter behavior other than changing casing on metadata (that should be case-insensitive), but it includes a series of changes (see individual commits) that simplify the code, correct typos, or prevent MSBuild from having to do unnecessary work. (Note that this means that most of the commits are expected to have no relevant impact on perf despite the title of this PR.) From building OrchardCore once with and once without this change, RAR's execution time declined by about 2.5% when there were no caches. This part of RAR does not rely on the caches, so this gain will not be wiped out by including caches.

Forgind added 10 commits Dec 2, 2020
Changes "true" to "True" and "false" to "False", but that's ok because it should be case-insensitive.
This means we don't have to repeatedly unset it.
…emoved once

nonForwardableMetadata?.Remove(ItemMetadataNames.*) is used to prevent the correct metadata value from being overwritten when metadata are re-added from nonForwardableMetadata.
CopyOnWriteDictionary<string> copiedMetadata = _metadata.Clone(); // Copy on write!
foreach (KeyValuePair<string, string> entry in destinationAsTaskItem.Metadata)
{
copiedMetadata[entry.Key] = entry.Value;

This comment has been minimized.

@danmoseley

danmoseley Dec 2, 2020
Member

Could you clarify the purpose of this change? Is this to optimistically optimize the case where you are copying metadata values that exactly match the destination (in value and count) - in this case the foreach here will not cause a "copy" of the dictionary because it will just be writing the same values? If that's the case, why not destinationAsTaskItem.Metadata?.Count == _metadata?.Count?

This comment has been minimized.

@Forgind

Forgind Dec 3, 2020
Author Member

There was previously an optimization for if destinationAsTaskItem.Metadata was null, i.e., its count was 0, but there are also cases in which its count is small compared to _metadata?.Count. This is trying to improve that case, where there are a few metadata defined for destinationAsTaskItem but not as many as for the source. In that case, it's better to modify a clone of the source rather than modifying the destination directly.

CopyOnWriteDictionary<string> copiedMetadata = _metadata.Clone(); // Copy on write!
foreach (KeyValuePair<string, string> entry in destinationAsTaskItem.Metadata)
{
copiedMetadata[entry.Key] = entry.Value;

This comment has been minimized.

@danmoseley

danmoseley Dec 2, 2020
Member

Should this be escaping, as on line 331? I'm not sure, but you'll want to make sure there's a test. IIRC, everything is always escaped when it enters the MSBuild engine, and unescaped on the way out.

This comment has been minimized.

@Forgind

Forgind Dec 3, 2020
Author Member

This is copying metadata from a TaskItem to a TaskItem, so its escaped status should be the same in both cases. Just copying it over without escaping or unescaping it makes sense to me.

Line 331 is for ITaskItems that are not ITaskItem2s, and this will always be an ITaskItem2, so if some escaping/unescaping is warranted, it should be replacing entry.Value with destinationAsTaskItem.GetMetadataValueEscaped(entry.Key), but that's less performant, and I don't understand why that would be warranted. Doesn't mean I'm right, though.

{
referenceItem.SetMetadata(ItemMetadataNames.copyLocal, "false");
}
referenceItem.SetMetadata(ItemMetadataNames.copyLocal, reference.IsCopyLocal.ToString());

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

Convert to ternary operator

Forgind added 2 commits Dec 4, 2020
Also, comparing to a null value apparently always returns false, so the check I was using was bogus. This fixes that.
{
destinationAsTaskItem.Metadata = _metadata.Clone(); // Copy on write!
CopyOnWriteDictionary<string> copiedMetadata;
if (destinationAsTaskItem.Metadata == null || destinationAsTaskItem.Metadata.Count < _metadata.Count)

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Comment, please. And I think I personally would find it clearer if there were three explicit cases:

  1. Destination has no metadata.
  2. Destination has less metadata than source.
  3. Source has more metadata than destination.
@@ -2777,7 +2765,6 @@ private ITaskItem SetItemMetadata(List<ITaskItem> relatedItems, List<ITaskItem>
// Clone metadata.
referenceItem.CopyMetadataTo(item);
// We don't have a fusion name for scatter files.

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Remove these comments please.

// Set the FusionName metadata properly.
referenceItem.SetMetadata(ItemMetadataNames.fusionName, fusionName);
Comment on lines 2788 to 2789

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

What does "properly" mean? In general, how do you know this is safe? Is the local variable fusionName guaranteed to be empty for all the paths where it was previously unset? How do you know?

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

If any of reference.GetSourceItems() has a fusionName, or reference.PrimarySourceItem has one, fusionName may have been set to an invalid value at this point. We remove it before forwarding referenceItem's metadata to other items, but we still need to set it properly at the end, i.e., give it the right fusionName.

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

Would you mind clarifying what you meant about fusionName being guaranteed to be empty? Previously, fusionName was set to a value near the top of this method, that value was passed on to a variety of "relatedItems," "satelliteItems," etc., and then it was unset from all of those other items. Here, instead, I'm unsetting it prior to using fusionName for any of those, and only giving it its value at the bottom, so CopyMetadataTo automatically skips fusionName.

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

The meta comment is: "properly" is basically useless in a code comment. Comments describe what you want the code to do/why it's done, and of course you want to do things "properly". Instead, describe details. Here, you might have a comment that fusionName should be set last, so that it doesn't get copied to the derived items but is still available on the primary item.

item.RemoveMetadata(ItemMetadataNames.winmdImplmentationFile);
item.RemoveMetadata(ItemMetadataNames.imageRuntime);
item.RemoveMetadata(ItemMetadataNames.winMDFile);
Dictionary<string, string> metadata = new Dictionary<string, string>();

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

is there a more descriptive name for this variable? removedMetadata?

if (!String.IsNullOrEmpty(meta))
{
metadata.Add(ItemMetadataNames.targetPath, meta);
}
item.RemoveMetadata(ItemMetadataNames.targetPath);

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Since this one has to be next to the add to the new list, can you move the others to be next to their equivalents too?

List<string> relatedFileExtensions = reference.GetRelatedFileExtensions();
List<string> satellites = reference.GetSatelliteFiles();
List<string> serializationAssemblyFiles = reference.GetSerializationAssemblyFiles();
string[] scatterFiles = reference.GetScatterFiles();
Comment on lines +2686 to +2689

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

My gut feeling is that these will all be small enough that keeping them live all at once instead of in sequence won't be a memory problem. But do you have data on that?

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

Nope. How would I get such data? Build OrchardCore and...somehow check whether something is being paged out?

// Set the FusionName metadata properly.
referenceItem.SetMetadata(ItemMetadataNames.fusionName, fusionName);

if (nonForwardableMetadata != null)

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Comment please

referenceItem.SetMetadata(ItemMetadataNames.winmdImplmentationFile, Path.GetFileName(reference.ImplementationAssembly));
nonForwardableMetadata?.Remove(ItemMetadataNames.winmdImplmentationFile);
Comment on lines 2772 to 2773

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

These inverted remove-from-the-removal-list things definitely need a comment please.

@@ -3455,7 +3455,7 @@ public void PrimaryFXAssemblyRefIsNotCopyLocal()

Assert.Single(t.ResolvedFiles);
Assert.Equal(Path.Combine(s_myVersion20Path, "System.Data.dll"), t.ResolvedFiles[0].ItemSpec);
Assert.Equal("false", t.ResolvedFiles[0].GetMetadata("CopyLocal"));

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Revert all these please since you went back to the ternary.

// This happens when the redist root is "null". This means there
// was no IsRedistRoot flag in the Redist XML (or there was no
// redist XML at all for this item).
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, (bool)reference.IsRedistRoot ? "true" : "false");

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

This cast stands out. Should it be

Suggested change
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, (bool)reference.IsRedistRoot ? "true" : "false");
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, reference.IsRedistRoot == true ? "true" : "false");

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

Both ways work. We did a null check just before this, so it's either true or false.

else
{
copiedMetadata = destinationAsTaskItem.Metadata.Clone();
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.ContainsKey(entry.Key)));

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

I was unsure between using Where or setting them one-by-one in this case. I did some benchmarking, and this is what I got:
image
Note that for each test case, I added NumStrings / 10 items to an ImmutableDictionary with NumStrings items in it previously.

You can see that using Where had fewer allocations for all three sizes. (Base is creating but not combining the dictionaries, so Where actually allocated very little.) It lost on time, though, unless the dataset was large, and I don't have data as to whether that's expected sometimes, always, or never. Thoughts?

This comment has been minimized.

@rainersigwald

rainersigwald Dec 4, 2020
Contributor

Can you instead do copiedMetadata = Metadata.Clone().SetItems(destinationAsTaskItem.Metadata)?

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

Yes, and honestly, there are probably a lot of cases in which that would be better than what I currently have. What I'm trying to optimize for is adding metadata from a small dictionary to a large dictionary, and in that case, it would be better to add a few items individually rather than using SetItems to set a lot of items. I can run a test for that, though, and see if it's optimized nicely, in which case I'll switch to that.

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

It looks like doing it that way was 50-75% slower if the adding dictionary had 10x as many items. Next, trying it with a smaller ratio...

This comment has been minimized.

@Forgind

Forgind Dec 4, 2020
Author Member

I tried comparing adding the larger dictionary to the smaller dictionary for ratios of 1, 2, 5, and 8, and it looks like setting with Where was faster for all but 1x size. I currently think that's the best option.
image

else
{
copiedMetadata = destinationAsTaskItem.Metadata.Clone();
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.ContainsKey(entry.Key)));

This comment has been minimized.

@ladipro

ladipro Dec 7, 2020
Contributor

The original non-optimized code down in the else branch uses string.IsNullOrEmpty(value) to decide if the metadatum should be overwritten. It doesn't seem to match your new logic here - you only check if the key exists in the metadata dictionary.

This comment has been minimized.

@Forgind

Forgind Dec 7, 2020
Author Member

Fixed, thanks!

string meta = item.GetMetadata(ItemMetadataNames.winmdImplmentationFile);
if (!String.IsNullOrEmpty(meta))
{
removedMetadata.Add(ItemMetadataNames.winmdImplmentationFile, meta);
}
item.RemoveMetadata(ItemMetadataNames.winmdImplmentationFile);
Comment on lines 2941 to 2946

This comment has been minimized.

@ladipro

ladipro Dec 7, 2020
Contributor

nit: Refactor these statements into a local function to be called 4 times?

/// </summary>
public void SetItems(IEnumerable<KeyValuePair<string, V>> items)
{
_backing = _backing.SetItems(items);

This comment has been minimized.

@ladipro

ladipro Dec 7, 2020
Contributor

SetItems and Where will also need to be implemented on the .NET 3.5 version of ImmutableDictionary in C:\src\msbuild\src\MSBuildTaskHost\Immutable\ImmutableDictionary.cs.

This comment has been minimized.

@Forgind

Forgind Dec 7, 2020
Author Member

Good catch! Why isn't Where implemented by LINQ as it is for most IEnumerables?

This comment has been minimized.

@ladipro

ladipro Dec 7, 2020
Contributor

Of course! I stand corrected, only SetItems is missing.

Forgind and others added 2 commits Dec 7, 2020
@@ -2818,6 +2747,61 @@ private ITaskItem SetItemMetadata(List<ITaskItem> relatedItems, List<ITaskItem>
}
}

if (reference.IsWinMDFile)
{
// The ImplementationAssembly is only set if the implementation file exits on disk

This comment has been minimized.

@ladipro

ladipro Dec 8, 2020
Contributor

nit: Typo exits.

copiedMetadata = _metadata.Clone(); // Copy on write!
copiedMetadata.SetItems(destinationAsTaskItem.Metadata);
Comment on lines 310 to 311

This comment has been minimized.

@ladipro

ladipro Dec 8, 2020
Contributor

I don't think this is doing the same thing as the else right below. I would expect the !IsNullOrEmpty check here as well.

The else below:

  1. Take all of destination item's metadata.
  2. Overwrite with this item's metadata that are missing or empty on the destination item.

This block:

  1. Take all of this item's metadata.
  2. Overwrite with all of destination item's metadata.

For a missing destination metadatum this behaves the same, but for an empty one the SetItem call in this block will erroneously overwrite it with the empty value.

This comment has been minimized.

@Forgind

Forgind Dec 8, 2020
Author Member

You're absolutely right. I'm wondering if we're going at this the wrong way, though, since that would mean we'd need another Where statement to filter out non-empty metadata, which would make this optimization noticeably less of an optimization. Empty metadata should be equivalent (in all cases) to missing metadata, I think. If that's true, it would probably be more efficient to maintain an invariant that TaskItems don't have empty metadata. The COW dictionary storing the metadata is private, so that isn't a problem, but it's theoretically a breaking change unless we keep a separate list for "metadata added as empty" for the unfortunately public MetadataNames (and MetadataCount). What do you think?

This comment has been minimized.

@ladipro

ladipro Dec 8, 2020
Contributor

I wouldn't go as far as to introduce a separate list for empty metadata unless it proves impactful. Can you measure how much this optimization is gaining us with and without handling empty?

This comment has been minimized.

@Forgind

Forgind Dec 8, 2020
Author Member

I measured 3 scenarios (building from a clean state (I appended /restore to the command line for this one), building with caches but not processes to connect to, and building with both) for 3 versions of TaskItem (as it is in master, with this change plus a Where statement on the else if branch, and with all the empty metadata stored in a HashSet once each for building OrchardCore. Note that I kept the other optimizations in this PR for all three versions of TaskItem. I used my computer instead of a VM, mostly because my VM seems to be slow...but I also tried to avoid doing anything too intense in the background. Note that I forgot to use -m for the first several tests, so I continued to not use it for the last few. If you want new results with having that enabled for all of them, I can do that.

I have full results saved away if you want to look. My first takeaway was that caches really matter. It took more than twice as long to build in every case if I deleted them and restored them. In contrast, presumably because OrchardCore is big, and I neglected to enable -m, having processes already running only saved about 170ms on average.

As far as differences between the versions of TaskItem, here are some times for the entire build in ms:

MSBuild execution times Clean build Process not running Process running
Master 108475 49657 49704
Where statements 108253 48140 47949
Empty HashSet 105464 49105 48708

Since it occurred to me that this has impacts outside RAR, I don't think just looking at RAR time makes sense. Are there other data you'd like to see? I'm not sure why the Where statements was so much worse than keeping an empty list for the clean state build—it's possible that was an outlier, since I only ran each of these once.

This comment has been minimized.

@ladipro

ladipro Dec 9, 2020
Contributor

Thank you. I was actually expecting you to measure this particular aspect with a microbenchmark. Basically, see if the SetItems call is helping (even with the Where) given the expected metadata count. I second Ben's question below about what the realistic number of metadata is. Maybe you could add a logging statement here while building OrchardCore and see where the breaking point is relative to the distribution of metadata count seen during the build. (Assuming there is a breaking point and the new code is actually a bit slower if the number of metadata is small.)

Measuring the entire build end-to-end would have to be repeated several times to get trustworthy results. As you wrote, it could have been an outlier and, honestly, this optimization is likely going to be well within noise when running the entire thing.

This comment has been minimized.

@Forgind

Forgind Jan 7, 2021
Author Member

(Merging with @BenVillalobos's comment) I'm currently a bit confused by the results I got when I tried to test this. I put in a simple check of how many custom metadata each taskitem has when its metadata is copied over, as well as the amount of metadata on the destination. The answer seemed to be that the destination never had any custom metadata for either MSBuild.sln or OrchardCore.sln, in which case we would never hit the case I optimized, and the time would be almost identical between them unless the other changes I made (which could have a minor effect, but I didn't expect to be really relevant) had much more of a positive effect than I'd expected. The same test suggested that the source TaskItems had 20-50 pieces of metadata, of which 15 were not custom metadata, so these wouldn't be huge numbers, but this operation also got called a lot, which might mean that a small optimization could have somewhat noticeable effects. What do you make of that?

This comment has been minimized.

@ladipro

ladipro Jan 8, 2021
Contributor

Looks like a good argument against pursuing this optimization further. If you really wanted to be sure, you could run the build under a profiler to see how much time we're spending in this function - which would be the upper limit of what any micro-optimization here could give us.

{
copiedMetadata = _metadata.Clone(); // Copy on write!
}
else if (destinationAsTaskItem.Metadata.Count < _metadata.Count)

This comment has been minimized.

@BenVillalobos

BenVillalobos Dec 9, 2020
Member

How likely are we to hit the case where destinationAsTaskItem.Metadata.Count < _metadata.Count and both metadata count are very large? I don't think I've seen items with a standout number of metadata, but MSBuild has surprised me before.

Do we happen to have any idea of a "realistic average" number of metadata per TaskItem that comes through here? Something more detailed than just assuming it's the number of well-known metadata?

@Forgind
Copy link
Member Author

@Forgind Forgind commented Jan 26, 2021

I added one more commit that should make it do exactly what the previous code path had done in all cases. Rather than fully reverting that part, I left it in, although I don't expect that to be a perf optimization anymore, just a potential improvement for some customers. If you want me to fully revert it and just leave the other part, I can, but it feels wrong to undo something that could theoretically improve perf if I don't think it could worsen perf.

@ladipro
ladipro approved these changes Feb 1, 2021
else
{
copiedMetadata = destinationAsTaskItem.Metadata.Clone();
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.TryGetValue(entry.Key, out string val) || String.IsNullOrEmpty(val)));

This comment has been minimized.

@ladipro

ladipro Feb 1, 2021
Contributor

super-nit: Directly read this._metadata instead of calling the this.Metadata getter here for consistency with the if branches?

Forgind added 2 commits Feb 2, 2021
A TaskItem in AppDomain A is identical to a TaskItem in AppDomain B, but the TaskItems cannot access each other's fields, only their properties. This changes accessing a possibly remote TaskItem's _metadata field into accessing its Metadata property.
@Forgind Forgind dismissed rainersigwald’s stale review Feb 2, 2021

won't have the opportunity to re-review

@Forgind Forgind merged commit 8fb627e into dotnet:master Feb 6, 2021
8 checks passed
8 checks passed
license/cla All CLA requirements met.
Details
@azure-pipelines
msbuild-pr Build #20210202.3 succeeded
Details
@azure-pipelines
msbuild-pr (Linux Core) Linux Core succeeded
Details
@azure-pipelines
msbuild-pr (Windows Core) Windows Core succeeded
Details
@azure-pipelines
msbuild-pr (Windows Full Release (no bootstrap)) Windows Full Release (no bootstrap) succeeded
Details
@azure-pipelines
msbuild-pr (Windows Full) Windows Full succeeded
Details
@azure-pipelines
msbuild-pr (macOS Core) macOS Core succeeded
Details
@azure-pipelines
msbuild-pr (macOS Mono) macOS Mono succeeded
Details
@Forgind Forgind deleted the Forgind:optimize-rar branch Feb 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants