Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Group-Object #7410

Merged
merged 18 commits into from Aug 9, 2018

Conversation

Projects
None yet
4 participants
@powercode
Copy link
Collaborator

commented Jul 30, 2018

Fixes #7409.

PR Summary

code cleanup
object[] -> IList in PSType.ArrayToTuple signature to reduce allocations.
Sort input before grouping. Use sorting to only consider last group when comparing.
adding DebuggerDisplay for GroupInfo.

PR Checklist

Gustafsson, Staffan
Initial cleanup.
Refactoring only - no functional changes.

Gustafsson, Staffan and others added some commits Jul 30, 2018

object[] -> IList<T> in PSType.ArrayToTuple signature
This reducing alloctions and object copying.
Sort input before grouping. Use sorting to only consider last group w…
…hen comparing.

Much faster when the number of unique values grows large.

@powercode powercode force-pushed the powercode:groupobj branch to bdd6bb8 Jul 30, 2018

@powercode

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 31, 2018

Appveyor error seem unrelated to my changes.

@@ -113,10 +116,10 @@ private static string BuildName(List<ObjectCommandPropertyValue> propValues)
StringBuilder sb = new StringBuilder();
foreach (ObjectCommandPropertyValue propValue in propValues)
{
if (propValue != null && propValue.PropertyValue != null)
var propValuePropertyValue = propValue?.PropertyValue;
if (propValuePropertyValue != null)

This comment has been minimized.

Copy link
@iSazonov

iSazonov Jul 31, 2018

Collaborator

Is the check really needed?

This comment has been minimized.

Copy link
@powercode

powercode Jul 31, 2018

Author Collaborator

yes, it is dereferenced in the else clause. #Closed.

@@ -246,53 +248,54 @@ public SwitchParameter NoElement
internal static void DoGrouping(OrderByPropertyEntry currentObjectEntry, bool noElement, List<GroupInfo> groups, Dictionary<object, GroupInfo> groupInfoDictionary,
OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;

This comment has been minimized.

Copy link
@iSazonov

iSazonov Jul 31, 2018

Collaborator

Should it be currentObjectOrderValues?

This comment has been minimized.

Copy link
@powercode

powercode Jul 31, 2018

Author Collaborator

yes

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 1, 2018

Collaborator

I don't see the fix.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 2, 2018

Collaborator

Please address the comment and fix case.

//add this inputObject to an existing group
currentGroupInfo.Add(currentObjectEntry.inputObject);
}
//add this inputObject to an existing group

This comment has been minimized.

Copy link
@iSazonov

iSazonov Jul 31, 2018

Collaborator

Please add space after //.

try
{
foreach (GroupInfo _grp in _groups)
foreach (GroupInfo grp in _groups)
{
if (AsString)

This comment has been minimized.

Copy link
@iSazonov

iSazonov Jul 31, 2018

Collaborator

Make sense to move the cycle into the if (AsString) ?

This comment has been minimized.

Copy link
@powercode

powercode Jul 31, 2018

Author Collaborator

Not sure i understand what you suggest here.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Jul 31, 2018

Collaborator

I meant

if (AsString)
{
    foreach (GroupInfo grp in _groups)
    {
        table.Add(grp.Name, grp.Group); 
    }
}
else
{
    foreach (GroupInfo grp in _groups)
    {
        if (grp.Values.Count == 1)
        {
             ...
        }
    }
}

This comment has been minimized.

Copy link
@powercode

powercode Jul 31, 2018

Author Collaborator

Yes, I think that is clearer.

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 1, 2018

Collaborator

I thought more about performancethough :-) From this point of view it seems for() better than foreach - less allocations. Is the _groups so much that we can get a win?

Addressing review comments
Also some additional CodeFactor complaints
@powercode

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 31, 2018

I would like to fix the parameter sets too.

[Parameter(ParameterSetName = "HashTable", Mandatory = true)]
public SwitchParameter AsHashTable { get; set; }

[Parameter(ParameterSetName = "HashTable")]
public SwitchParameter AsString { get; set; }
Refactoring Group-Object logic.
Moving parameter checking earlier.

Adding back the slow quadratic code path for the cases where we have more than one set of input types.

@powercode powercode changed the title Improve performance of Group-Object WIP: Improve performance of Group-Object Jul 31, 2018

@powercode powercode changed the title WIP: Improve performance of Group-Object Improve performance of Group-Object Jul 31, 2018

@iSazonov

This comment has been minimized.

Copy link
Collaborator

commented Aug 1, 2018

I believe we shouldn't mess performance changes with other and it is better fix parameter sets in new PR.

if (AsHashTable
&& !AsString
&& (Property != null && Property.Length > 1
|| _orderByProperty.MshParameterList.Count > 1))

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 1, 2018

Collaborator

We'd move this line on previous line.

powercode added some commits Aug 1, 2018

@iSazonov
Copy link
Collaborator

left a comment

Leave a comment

@@ -83,7 +113,7 @@ internal static object ArrayToTuple<T>(IList<T> inputObjects, int startIndex)
public sealed class GroupInfoNoElement : GroupInfo
{
internal GroupInfoNoElement(OrderByPropertyEntry groupValue)
: base(groupValue)
: base(groupValue)

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 2, 2018

Collaborator

Please move to previous line 115.

@@ -224,6 +250,7 @@ public class GroupObjectCommand : ObjectBase
/// <summary>
/// Gets or sets the AsString parameter.
/// </summary>
/// <value></value>

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 2, 2018

Collaborator

Please remove the empty XML comment.

@@ -246,53 +248,54 @@ public SwitchParameter NoElement
internal static void DoGrouping(OrderByPropertyEntry currentObjectEntry, bool noElement, List<GroupInfo> groups, Dictionary<object, GroupInfo> groupInfoDictionary,
OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 2, 2018

Collaborator

Please address the comment and fix case.

@powercode powercode force-pushed the powercode:groupobj branch to 152d1eb Aug 2, 2018

@daxian-dbw daxian-dbw self-assigned this Aug 3, 2018

@iSazonov
Copy link
Collaborator

left a comment

LGTM with two minor comments.

@@ -131,15 +165,16 @@ private static string BuildName(List<ObjectCommandPropertyValue> propValues)
}
else
{
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString()));
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString()));

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 7, 2018

Collaborator

Seems string.Format was optimized in .Net Core 2 but we could still use AppendFormat() here and in line 160 to get more readable code and exclude extra method calls.

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

This definitely should use AppendFormat()

OrderByPropertyComparer orderByPropertyComparer)
{
if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
var currentObjectorderValues = currentObjectEntry.orderValues;

This comment has been minimized.

Copy link
@iSazonov

iSazonov Aug 7, 2018

Collaborator

currentObjectorderValues should be currentObjectOrderValues.

{
Diagnostics.Assert(inputObjects != null, "inputObjects is null");
Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty");
Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty");

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

This assert is redundant. ArrayToTuple(IList, int> has the exact same code. Suggest removing both Assert statements.

{
Diagnostics.Assert(inputObjects != null, "inputObjects is null");
Diagnostics.Assert(inputObjects.Length > 0, "inputObjects is empty");
Diagnostics.Assert(inputObjects.Count > 0, "inputObjects is empty");

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

These asserts should be backed up with runtime checks. Either document it as returning NULL when constraints are not met for throw the appropriate exceptions.

This comment has been minimized.

Copy link
@powercode

powercode Aug 8, 2018

Author Collaborator

This is a fairly common pattern in the engine, that there are asserts, but no backing runtime checks. I have always wondered about it...

This comment has been minimized.

Copy link
@powercode

powercode Aug 8, 2018

Author Collaborator

I guess the thinking is that since it's an internal class, the usage can be demonstrated to be safe and the error checking skipped.

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 8, 2018

Contributor

Since we don't have an official policy around Assert, I'll not hold up the PR for it and you can ignore the rest. :)

I've been involved in three different threads recently where Dbg.Assert in internal code is hit in debug builds. In two cases, crashes were occurring the release build and the third the lack of the check was masking a bug in another component.

IMHO: Since we don't test debug builds, Dbg.Assert is a placebo at best without a release check.

@@ -131,15 +165,16 @@ private static string BuildName(List<ObjectCommandPropertyValue> propValues)
}
else
{
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValue.PropertyValue.ToString()));
sb.Append(string.Format(CultureInfo.InvariantCulture, "{0}, ", propValuePropertyValue.ToString()));

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

This definitely should use AppendFormat()

/// <param name="groups">List containing Groups.</param>
/// <param name="groupInfoDictionary">Dictionary used to keep track of the groups with hash of the property values being the key.</param>
/// <param name="orderByPropertyComparer">The Comparer to be used while comparing to check if new group has to be created.</param>
internal static void DoOrderedGrouping(

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

Shouldn't this be private

This comment has been minimized.

Copy link
@powercode

powercode Aug 8, 2018

Author Collaborator

It was internal, and I just kept it so, but I'm happy to make it private.


GroupInfo currentGroupInfo = null;
if (groupInfoDictionary.TryGetValue(currentTupleObject, out currentGroupInfo))
if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo))

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

What's the use case for currentGroupInfo being null in the groupInfoDictionary? If it is expected, a comment is needed; otherwise, it looks like currentObjectEntry.inputObject is dropped on the floor.

This comment has been minimized.

Copy link
@powercode

powercode Aug 8, 2018

Author Collaborator

I don't think that is a case that will ever occur. Just defensive programming.

Both cases are behaving as the original implementation.

One could argue for removing the null check as it is a condition that shouldn't occur, and if it does, it is a bug that should be fixed.

if (groupInfoDictionary.TryGetValue(currentTupleObject, out var currentGroupInfo))
{
// add this inputObject to an existing group
currentGroupInfo?.Add(currentObjectEntry.inputObject);

This comment has been minimized.

Copy link
@dantraMSFT

dantraMSFT Aug 7, 2018

Contributor

same as above, when is currentTupleObject present but currentGroupInfo is null.

This comment has been minimized.

Copy link
@powercode

powercode Aug 8, 2018

Author Collaborator

Yeah, I'm removing the null checks.

powercode added some commits Aug 8, 2018

Addressing review comments.
Removing unnecessary call to .ToArray().

Making internal methods private.
@daxian-dbw

This comment has been minimized.

Copy link
Member

commented Aug 9, 2018

@dantraMSFT Can you please take another look?

@dantraMSFT
Copy link
Contributor

left a comment

LGTM

@daxian-dbw daxian-dbw merged commit 037e12e into PowerShell:master Aug 9, 2018

5 checks passed

CodeFactor 50 issues fixed. 2 issues found.
Details
WIP ready for review
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla All CLA requirements met.
Details
@iSazonov

This comment has been minimized.

Copy link
Collaborator

commented Aug 10, 2018

@powercode Many thanks for the great perf improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.