Suggestion: Improve Compare-Object by adding set operations (union, intersection, symmetric difference, relative complement) #4316

Closed
mklement0 opened this issue Jul 21, 2017 · 9 comments
Labels
Issue-Enhancement the issue is more of a feature request than a bug Resolution-No Activity Issue has had no activity for 6 months or more WG-Cmdlets general cmdlet issues

Comments

@mklement0
Contributor

mklement0 commented Jul 21, 2017

This suggestion is the result of a conversation between @iSazonov and me - see #4293.

The idea is to introduce new parameter sets that:

  • frame the operations in the more established set-theory terms
  • while introducing the relative-complement operation to make it easy to determine the objects unique to one side.
  • make it easier to retrieve just the selected input objects (without the custom-object wrapper that contains the .SideIndicator property)
  • improve the performance of certain use cases

Below are examples of each desired new parameter (parameter set) that would be mutually incompatible and also incompatible with the current parameter sets.

Aside from referring to the desired set operation, their desired behavior is expressed as a command using Compare-Object's current capabilities:

  • Compare-Object -Union $A $B: union (A ∪ B)

    Compare-Object $A $B -IncludeEqual -PassThru

  • Compare-Object -Intersection $A $B: intersection (A ∩ B):

    Compare-Object $A $B -IncludeEqual -ExcludeDifferent -PassThru

  • Compare-Object -SymmetricDifference $A $B: symmetric difference (A ∆ B) - the same as the current default behavior, but without the wrapper objects

    Compare-Object $A $B -PassThru

  • Compare-Object -Complement $A $B: relative complement of $A in $B (B ∖ A) - getting objects unique to $B

    Compare-Object $A $B | ? SideIndicator -eq '=>' | % InputObject

Syntax note: @dragonwolf83 proposes using a single parameter such as -SetOperation <operation> (e.g., -SetOperation Intersection or -SetOperation Union) instead of distinct switches (e.g., -Intersection or -Union), which makes for easier implementation (no need for a distinct parameter set for each operation) and better discoverability, though it is slightly more cumbersome to type for experienced users who already know what they want.

Note that all commands above (effectively) suppress the custom-object wrapper with the .SideIndicator property and return the selected input objects directly (or, with -Property specified, the resulting [pscustomobject] instance would lack the .SideIndicator property).
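As a rough sanity check, the four equivalents can be run against two small sample arrays using only Compare-Object's current parameters. The variable names and sample values are illustrative assumptions, and the results are sorted because the output order isn't documented:

```powershell
# Sample inputs (illustrative only)
$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Union: objects present on either side
$union = Compare-Object $A $B -IncludeEqual -PassThru | Sort-Object

# Intersection: objects present on both sides
$intersection = Compare-Object $A $B -IncludeEqual -ExcludeDifferent -PassThru | Sort-Object

# Symmetric difference: objects present on exactly one side
$symDiff = Compare-Object $A $B -PassThru | Sort-Object

# Objects unique to $B (the right-hand, "difference" side)
$onlyInB = Compare-Object $A $B |
  Where-Object SideIndicator -eq '=>' |
  ForEach-Object InputObject

"union:                $union"         # 1 2 3 4 5
"intersection:         $intersection"  # 1 3 4
"symmetric difference: $symDiff"       # 2 5
"unique to B:          $onlyInB"       # 5
```

Note that with -PassThru the objects still carry a SideIndicator note property, which is why the suppression of the wrapper is only "effective".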

One thing to note is the order in which objects are output - this is not currently documented (from a set perspective, order doesn't matter, but for subsequent processing it may), and I haven't dug into the source to verify, but from what I can tell, it is:

  • == (identical) objects first
  • => right-side-only objects next
  • <= left-side-only objects last
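Since the order is undocumented, a quick way to observe it on a given PowerShell version is to print each indicator next to its input object. This is only a sketch with made-up sample values; it makes no claim about what the order will be:

```powershell
$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Emit one "<indicator> <value>" line per output object, in the order
# in which Compare-Object produces them.
$order = Compare-Object $A $B -IncludeEqual | ForEach-Object {
  '{0} {1}' -f $_.SideIndicator, $_.InputObject
}
$order
```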

On a related note, adding a new switch would make sense in order to return a hashtable of original objects grouped by what is currently the .SideIndicator value.

Two names have been proposed for this switch:

  • -Group
  • -AsHashtable

-AsHashtable has the advantage of being familiar from Group-Object, although there it doesn't indicate a fundamental change in output structure.

The following example uses -Group for now:

$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Wishful thinking
Compare-Object -Group -Union $A $B

The above would yield the equivalent of:

@{
  '==' = 1, 3, 4
  '<=' = , 2
  '=>' = , 5
}
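Until something like -Group exists, a similar hashtable can be approximated with the current cmdlet. The following is a minimal sketch, not a proposed implementation; the $groups variable is a hypothetical name:

```powershell
$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Pre-create every key so that a side with no members still yields an empty array.
$groups = @{ '==' = @(); '<=' = @(); '=>' = @() }

Compare-Object $A $B -IncludeEqual | ForEach-Object {
  # Store the original input object rather than the wrapper object.
  $groups[$_.SideIndicator] += $_.InputObject
}

$groups
```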

Environment data

PowerShell Core v6.0.0-beta.4
@vexx32
Collaborator

vexx32 commented Aug 17, 2018

I like this quite a bit. Compare-Object has long been... lacklustre... in implementation. I would personally also prefer if rather than ==, <=, and => symbols for the grouping (or indeed even for the current behaviour) the SideIndicators were changed to match the actual parameters the objects are passed to (i.e., ReferenceSet, DifferenceSet, and Both)

Currently the default display is actually surprisingly difficult to make sense of in my opinion with any appreciably large comparison sets. It may even make more sense for the default display to actually do a Format-Table -GroupBy SideIndicator similar to how Get-ChildItem will group the table display by folder in a recursive search. It would make comprehending the data you're getting significantly less befuddling. :)
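For illustration, a grouped display along these lines can already be produced by hand. This is a sketch with sample values, and it assumes the input is sorted by SideIndicator first so that each value forms a single group:

```powershell
$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Sort so that each SideIndicator value forms one contiguous group,
# then let Format-Table print a separate table per group.
$text = Compare-Object $A $B -IncludeEqual |
  Sort-Object SideIndicator |
  Format-Table -Property InputObject -GroupBy SideIndicator |
  Out-String
$text
```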

@mklement0
Contributor Author

Good point re default output format, @vexx32. Can I suggest you create a new issue to propose the grouped output?

As for the .SideIndicator property values: I fear that ship has sailed, as the value of that property is used programmatically. (We could transform the values just for display, but the resulting discrepancy between what is displayed and what you need to use programmatically may be confusing).

As an aside: It's unfortunate that the property wasn't defined as an [enum] type to begin with.

Personally it took me a while to remember the logic of <= vs. =>; here's how I remember it now: the arrow points to the side the object at hand is exclusive to.

@vexx32
Collaborator

vexx32 commented Aug 17, 2018

Oh, I get that, but it's not clear at a glance which side is actually which; you have to examine your objects' input data and see which is <= and which is =>

I'll type up an issue on it. Whether or not it's actually used (which, frankly, I very rarely see because of its obscure and unclear implementation) it needs to change.

@dragonwolf83

I don't think -Group parameter should be used. That starts overloading parameters when the original intent is to pipe to a standard set of cmdlets to do that, like Group-Object. It is not any more complex to use and keeps the cmdlet code clean.

Compare-Object -Group -Union $A $B
vs
Compare-Object -Union $A $B | Group-Object

Back to using Sets, it would be nicer to have one parameter, like -UsingSet, with the option of Union, Intersect, Except instead of separate parameters. Hopefully a better name than I came up with.
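To make the single-parameter idea concrete, here is a rough sketch of a wrapper function. Compare-ObjectSet and its -SetOperation parameter are hypothetical names that do not exist in PowerShell; the body only uses Compare-Object's current parameters:

```powershell
# Hypothetical wrapper; neither the function nor -SetOperation exists in PowerShell.
function Compare-ObjectSet {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)] [object[]] $ReferenceObject,
    [Parameter(Mandatory)] [object[]] $DifferenceObject,
    # One parameter with a fixed list of values, instead of one switch per operation.
    [Parameter(Mandatory)]
    [ValidateSet('Union', 'Intersection', 'SymmetricDifference', 'Complement')]
    [string] $SetOperation
  )

  switch ($SetOperation) {
    'Union' {
      Compare-Object $ReferenceObject $DifferenceObject -IncludeEqual -PassThru
    }
    'Intersection' {
      Compare-Object $ReferenceObject $DifferenceObject -IncludeEqual -ExcludeDifferent -PassThru
    }
    'SymmetricDifference' {
      Compare-Object $ReferenceObject $DifferenceObject -PassThru
    }
    'Complement' {
      # Objects unique to the difference (right-hand) side
      Compare-Object $ReferenceObject $DifferenceObject |
        Where-Object SideIndicator -eq '=>' |
        ForEach-Object InputObject
    }
  }
}

# Usage with sample values
Compare-ObjectSet 1, 2, 3, 4 1, 3, 4, 5 -SetOperation Intersection
```

With [ValidateSet], tab completion lists the available operations, which is the discoverability benefit of this design.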

@mklement0
Contributor Author

Thanks, @dragonwolf83.

Re -UsingSet <operation> vs. distinct parameters: That makes sense for ease of implementation and discoverability, though I like the direct expression of the intent with distinct switches for experienced users (read: less to type). I don't feel strongly about this, and I've updated the original post to mention -UsingSet - renamed to -SetOperation - as an alternative.

Re -Group, I do see your point: piping to Group-Object SideIndicator is conceptually cleaner.
However, that is at odds with the proposed implementation of returning the unwrapped elements from the set operations, in which case ... | Group-Object SideIndicator wouldn't work.
We would then need an opt-in to have the output objects wrapped (as currently happens by default) - which sounds clunky too.
Any thoughts?
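For reference, this is roughly what the pipeline approach looks like with today's wrapped output (a sketch with sample values; $bySide is a hypothetical variable name). The grouped members are the wrapper objects, so the original objects still have to be unwrapped via .InputObject:

```powershell
$A = 1, 2, 3, 4
$B = 1, 3, 4, 5

# Group the wrapper objects by their SideIndicator value.
$bySide = Compare-Object $A $B -IncludeEqual |
  Group-Object SideIndicator -AsHashTable -AsString

# Each hashtable value is a collection of wrappers; unwrap to get the inputs.
$equal     = $bySide['=='] | ForEach-Object InputObject
$rightOnly = $bySide['=>'] | ForEach-Object InputObject
$leftOnly  = $bySide['<='] | ForEach-Object InputObject
```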

@iSazonov
Collaborator

How many set operations exist beyond the proposed ones, such that we would want -SetOperation?

Contributor

This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you.


@microsoft-github-policy-service microsoft-github-policy-service bot added Resolution-No Activity Issue has had no activity for 6 months or more labels Nov 16, 2023
Contributor

This issue has been marked as "No Activity" as there has been no activity for 6 months. It has been closed for housekeeping purposes.
