Select-Object -Unique is unnecessarily slow and exhaustive #11221

@heaths

Description

Steps to reproduce

# Contrived example; imagine a pipelined collection of a lot of large objects.
Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Select-Object Name -Unique

Expected behavior

Items with a unique name are output immediately to the host.

Actual behavior

Items are buffered and require a large amount of memory and a ridiculous amount of time.

Reason

Passing -Unique creates a List<T> to store all items, and performance is O(n^2) based on this source. Instead, you could create a key based on the selected properties or some other heuristic. In fact, using that same ObjectCommandComparer, you could probably even use its GetHashCode (if implemented properly). I have been using my own Select-Unique for years quite successfully. It has O(n) performance and uses very little memory. Its key algorithm is roughly copied from what Group-Object does.
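To make the keyed approach concrete, here is a minimal sketch of a streaming, hash-based filter. It is not the Select-Unique mentioned above and not how Select-Object or Group-Object are implemented; the function name `Select-UniqueByKey` and its `-Property` parameter are hypothetical, and it assumes the string form of each selected property value is good enough to decide equality.

```powershell
# Hypothetical helper for illustration only; not the built-in cmdlet.
function Select-UniqueByKey {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject,

        # Property names that together form the uniqueness key.
        [Parameter(Mandatory, Position = 0)]
        [string[]] $Property
    )
    begin {
        # Keys seen so far. HashSet lookups and inserts are O(1) on average,
        # so the whole pipeline is O(n) instead of O(n^2).
        $seen = [System.Collections.Generic.HashSet[string]]::new()
        # Unit separator character, unlikely to appear in property values.
        $separator = [string][char]0x1F
    }
    process {
        # Assumption: the string form of each value is adequate for equality.
        $parts = foreach ($name in $Property) { [string] $InputObject.$name }
        $key = $parts -join $separator

        # Add returns $true only the first time a key is seen, so the
        # object is emitted immediately rather than buffered.
        if ($seen.Add($key)) {
            $InputObject
        }
    }
}
```

With this sketch, `Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Select-UniqueByKey Name` would emit each item as soon as its name is first seen. Only the keys are retained, not the objects, and unlike `Select-Object Name -Unique` it passes the original objects through instead of projecting them. Key comparison here is case-sensitive.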

I had to write this when I was trying to filter unique items in a huge graph of objects. In that particular case, Select-Object -Unique ended up throwing an OutOfMemoryException. My version didn't and was much faster to use even on smaller data sets.

Environment data


Name                           Value
----                           -----
PSVersion                      6.2.3
PSEdition                      Core
GitCommitId                    6.2.3
OS                             Microsoft Windows 10.0.18362 
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Metadata

Assignees

No one assigned

    Labels

    Issue-Enhancement: the issue is more of a feature request than a bug
    Resolution-Duplicate: The issue is a duplicate.
    Up-for-Grabs: Up-for-grabs issues are not high priorities, and may be opportunities for external contributors
    WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module
    WG-Engine-Performance: core PowerShell engine, interpreter, and runtime performance