Select-Object -unique is inconsistently case sensitive #12059

Closed
bobfrankly opened this issue Mar 6, 2020 · 17 comments · Fixed by #19683
Labels
In-PR: Indicates that a PR is out for the issue
Issue-Bug: Issue has been identified as a bug in the product
Up-for-Grabs: Up-for-grabs issues are not high priorities, and may be opportunities for external contributors
WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module

Comments

@bobfrankly

Steps to reproduce

"derp","Derp" | Select-Object -unique

Expected behavior

Single Return of derp

Actual behavior

[PSCustomObject]@{Name = "derp"},[PSCustomObject]@{Name = "Derp"} | Select-Object -unique
# Returns one 'derp'

"derp","Derp" | Select-Object -unique
# Returns two, "Derp", and "derp"

"derp","Derp" | Sort-Object -unique
# Returns One 'derp'

# Note: this is also the way it behaves in 5.1, if it helps

Environment data

Name                           Value
----                           -----
PSVersion                      7.0.0
PSEdition                      Core
GitCommitId                    7.0.0
OS                             Microsoft Windows 10.0.18362
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

@bobfrankly added the Issue-Question label Mar 6, 2020
@vexx32 added the WG-Cmdlets-Utility and Issue-Bug labels and removed the Issue-Question label Mar 7, 2020
@marshallwp

Odd, I'd assume case sensitivity would only be an issue on non-Windows implementations, per the "Case-Sensitivity in PowerShell" known issue.

How curious.

@vexx32 added the Up-for-Grabs label Mar 20, 2020
@marshallwp

marshallwp commented Mar 20, 2020

Well, I'm stumped. I looked through the portion of Select-Object code that gets invoked when only the -Unique flag is supplied. I went down every path I could find from the different comparison methods to cultural differences, to type conversion, but everything appears consistently implemented.

Aside from the questionable choice to hard-code case-sensitivity into Select-Object -Unique when declaring the ObjectCommandComparer, I couldn't find much.

I mean, that would explain why "derp","Derp" | Select-Object -unique returns both values, but it doesn't explain why [PSCustomObject]@{Name = "derp"},[PSCustomObject]@{Name = "Derp"} | Select-Object -unique returns only a single value. As far as I can tell, they should both be case sensitive under the current code.
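To make the contrast concrete, here is a minimal sketch comparing PowerShell's default (case-insensitive) string comparison with the case-sensitive behavior reported for Select-Object -Unique on plain strings. The expected count reflects what this thread reports for 5.1 and 7.0; the variable name is only illustrative.

```powershell
# PowerShell's comparison operators ignore case by default...
$defaultEquals = 'derp' -eq 'Derp'      # True

# ...while the explicit case-sensitive variant does not.
$caseSensitiveEquals = 'derp' -ceq 'Derp'   # False

# Select-Object -Unique on plain strings behaves like the case-sensitive
# comparison, so both spellings survive (as reported in this issue).
$result = @('derp', 'Derp' | Select-Object -Unique)
$result.Count   # 2 on the versions discussed in this thread
```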

@BrandynThornton

It looks like the reported issue is due to not specifying a Property for comparison. Without -Property, the comparison is between the base objects of each PSCustomObject, which are equal. The Sort-Object cmdlet has a separate switch parameter to set case sensitivity.

Here is an updated example showing the difference in behavior. Note that the two Name values in the first example are different words, not just different casings. I agree that it would be nice if this defaulted to comparing all properties rather than none, but that could be a breaking change.

[PSCustomObject]@{Name = "derp"},[PSCustomObject]@{Name = "lerp"} | Select-Object -unique
# Returns one 'derp'

[PSCustomObject]@{Name = "derp"},[PSCustomObject]@{Name = "Derp"} | Select-Object -unique -Property Name
# Returns two, 'derp' and 'Derp'

"derp","Derp" | Sort-Object -unique -CaseSensitive
# Returns two, 'derp' and 'Derp'

# Also: This is also the way it behaves in 5.1, if it helps

@marshallwp

marshallwp commented Mar 22, 2020

Good catch! Though that behavior is rather alarming as it is both clearly incorrect and an easy mistake for a programmer to make. The help for Select-Object explicitly states that "When you select properties, Select-Object returns new objects that have only the specified properties," so using the -Property parameter to get wholly unique objects without discarding properties is contraindicated.

If that change is judged to be too breaking, we need to update the documentation to make it clear that to select unique objects, the proper syntax is Select-Object -Unique -Property *. Probably add a warning message to the same effect as well.
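As a rough illustration of the Select-Object -Unique -Property * syntax mentioned above, the following sketch uses made-up sample objects (the Id property and variable names are assumptions for the example). It is not a statement about how the cmdlet should behave, only about what the thread describes.

```powershell
# Hypothetical sample data: two distinct objects plus one exact duplicate.
$objects = [PSCustomObject]@{ Name = 'derp'; Id = 1 },
           [PSCustomObject]@{ Name = 'lerp'; Id = 2 },
           [PSCustomObject]@{ Name = 'derp'; Id = 1 }

# Without -Property, only the base objects are compared, which the thread
# reports as collapsing all PSCustomObjects into a single result.
$withoutProperty = @($objects | Select-Object -Unique)

# With -Property *, every property value takes part in the comparison,
# so only the true duplicate is dropped.
$withAllProperties = @($objects | Select-Object -Unique -Property *)
$withAllProperties.Count   # 2
```

Note that the property comparison is still case sensitive, so 'derp' and 'Derp' would count as different values here.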

@bobfrankly
Author

So I ran across this issue while dealing with a REST-style API that doesn't return JSON or XML objects, just an array of strings. It's similar to what you'd expect from Get-Content on a raw text file. If a property is required for Select-Object to behave as expected, what is one supposed to use to pull uniques out of an array of strings?

@vexx32
Collaborator

vexx32 commented Mar 22, 2020

@bobfrankly Sort-Object -Unique works, but (naturally) is a sorted list.
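A small sketch of that workaround on an array of strings, with sample values invented for the example. It shows only the counts, since which casing of a duplicate is kept is not something this thread establishes.

```powershell
# Sample input with case-only duplicates (illustrative values).
$strings = 'derp', 'Derp', 'abc', 'ABC', 'zeta'

# Default: case-insensitive de-duplication, output in sorted order.
$unique = @($strings | Sort-Object -Unique)
$unique.Count            # 3

# -CaseSensitive treats differently cased strings as distinct.
$uniqueCaseSensitive = @($strings | Sort-Object -Unique -CaseSensitive)
$uniqueCaseSensitive.Count   # 5
```

The trade-off is that the original input order is lost.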

@bobfrankly
Author

@vexx32 I get that it works; it's listed in the original bug report. But what's the design supposed to be? That Select-Object isn't for arrays of strings? Example 4 in the Select-Object help appears to state that it's supposed to be used exactly that way. Am I misunderstanding?

@vexx32
Collaborator

vexx32 commented Mar 23, 2020

@bobfrankly I would be inclined to assume that, given PowerShell is typically by default case-insensitive, this case is more likely to be a bug than a design choice.

@dillardd

Just ran into this today. Sort-Object was the workaround, but it is definitely still broken. Pwsh 7.1.3 on Windows 10.

@Szeraax

Szeraax commented Sep 7, 2021

Wow, I had no idea that this sort of a bug was present in Select-Object.

In my case, I needed to preserve order and preserve casing from the original addition.

My solution uses a case insensitive hashset that will only add to the list IF the input is not present in the list. Good enough for my small lists. (Thanks SeeminglyScience):

$List = [Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
"a","A","B","b" | Foreach-Object {if (-not $List.Contains($_)) {$null = $List.Add($_)}}
$List

@mklement0
Contributor

@Szeraax, you can simplify and improve the performance of your hash-set approach by passing the input collection as an argument to the constructor:

# -> 'a', 'b'
[Collections.Generic.HashSet[string]]::new([string[]] ("a","A","B","b"), [StringComparer]::OrdinalIgnoreCase)

Note, however, that hash sets are documented as unordered, i.e. the output order is not guaranteed to reflect the input order.
The same applies to the Distinct() LINQ method, which is an alternative that provides lazy enumeration on output.
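For reference, a minimal sketch of how the Distinct() LINQ method could be called from PowerShell. The [string[]] cast is there so that PowerShell can infer the generic type argument; the variable names are illustrative.

```powershell
# Cast to a typed array so the generic Distinct<string>() overload can be inferred.
[string[]] $inputStrings = 'a', 'A', 'B', 'b'

# Distinct() returns a lazy enumerable; @() forces enumeration into an array.
$distinct = @([System.Linq.Enumerable]::Distinct(
    $inputStrings,
    [System.StringComparer]::OrdinalIgnoreCase))

$distinct.Count   # 2
```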

Unfortunately, there is no built-in .NET type that implements an ordered hash set, as of .NET 6.

Thus, if you wanted to preserve the input order, you'd have to do something like this (which is considerably slower and more memory-intensive than the solution above):

$auxHashSet = [Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
[array] $distinctInInputOrder = foreach ($str in "a","A","B","b") { if ($auxHashSet.Add($str)) { $str } }
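If this pattern is needed in several places, it could be wrapped in a pipeline-aware function. The sketch below is one possible shape; the name Select-UniqueString is hypothetical and not a built-in cmdlet.

```powershell
# Hypothetical helper (not part of PowerShell): emits each string the first
# time it is seen, ignoring case, and preserves input order and casing.
function Select-UniqueString {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [string[]] $InputObject
    )
    begin {
        # Auxiliary set used only to remember what has already been emitted.
        $seen = [System.Collections.Generic.HashSet[string]]::new(
            [System.StringComparer]::OrdinalIgnoreCase)
    }
    process {
        foreach ($str in $InputObject) {
            # Add() returns $true only when the value was not yet in the set.
            if ($seen.Add($str)) { $str }
        }
    }
}

$distinctInOrder = @('a', 'A', 'B', 'b' | Select-UniqueString)
$distinctInOrder   # a, B
```

Because output is emitted from the process block, results stream as input arrives rather than being collected first.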

@Szeraax

Szeraax commented Oct 15, 2021

Wow, that's a really good workaround. Thank you for the additional clarification. 10/10!

Now... If we only can get a breaking change approved to fix the bug with select object being case sensitive 😂

@mklement0
Contributor

mklement0 commented Oct 15, 2021

Glad to hear it was helpful, @Szeraax. If I were to guess, a breaking change is not in the cards (even though conceptually it is undoubtedly called for, along with introducing a -CaseSensitive switch).
The next best thing would be to introduce a -CaseInsensitive switch.

@Szeraax

Szeraax commented Oct 15, 2021

Ya, that's the frustrating part of working with powershell. The board is so risk averse/change averse that the only hope I have of not dealing with issues like this and other much wanted community fixes (like 1st class support for classes, inline splatting, etc.) is to use something else besides powershell. :(

@ArmaanMcleod
Contributor

ArmaanMcleod commented May 6, 2023

> Glad to hear it was helpful, @Szeraax. If I were to guess, a breaking change is not in the cards (even though conceptually it is undoubtedly called for, along with introducing a -CaseSensitive switch). The next best thing would be to introduce a -CaseInsensitive switch.

@mklement0 I agree including a -CaseInsensitive switch is probably our best bet to make this easier without introducing a breaking change, although I am curious who would be relying on this behaviour in the first place.

In the past I have just forced lowercase before piping to Select-Object -Unique:

C:\> @('A', 'a', 'c').ToLower() | Select-Object -Unique
a
c

Which is to work around the unintuitive behaviour we have today.
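One drawback of lowercasing first is that the original casing is lost. As an alternative sketch (a different technique from the one above, not something proposed in this thread), Group-Object compares strings case-insensitively by default, so taking the first member of each group keeps the first-seen casing:

```powershell
# Group case-insensitively, then keep the first original value of each group.
$unique = @('A', 'a', 'c' | Group-Object | ForEach-Object { $_.Group[0] })
$unique   # A, c
```

The order in which Group-Object emits groups may vary between PowerShell versions, so this is best suited to cases where output order does not matter.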

@ArmaanMcleod
Contributor

ArmaanMcleod commented May 7, 2023

@iSazonov What do you think about adding a -CaseInsensitive parameter to make using Select-Object -Unique easier for strings?

Curious to see if PowerShell team would be ok with this.

@iSazonov
Collaborator

iSazonov commented May 7, 2023

I personally see no other way but to add the new switch; Get-Unique should be considered too.
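For readers landing here later: a sketch of how the proposed switch is used, assuming a PowerShell version that includes the -CaseInsensitive parameter on Select-Object (added in 7.4, as far as I know; older versions will reject the parameter).

```powershell
# Assumes PowerShell 7.4 or later, where -CaseInsensitive is available.
$unique = @('derp', 'Derp' | Select-Object -Unique -CaseInsensitive)
$unique.Count   # 1

# Without the switch, the default remains case sensitive.
$default = @('derp', 'Derp' | Select-Object -Unique)
$default.Count  # 2
```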

9 participants