Description
Steps to reproduce
# Contrived example; imagine a pipelined collection of a lot of large objects.
Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Select-Object Name -Unique
Expected behavior
Items with a unique name are output immediately to the host.
Actual behavior
Items are buffered, which requires a large amount of memory and a ridiculous amount of time.
Reason
Passing -Unique creates a List<T> to store all items, and performance is O(n^2) based on this source. Instead, you could create a key based on the selected properties or some other heuristic. In fact, using that same ObjectCommandComparer, you could probably even use its GetHashCode (if implemented properly). I have been using my own Select-Unique for years quite successfully. It has O(n) performance and uses very little memory. Its key algorithm is roughly copied from what Group-Object does.
I had to write this when I was trying to filter unique items in a huge graph of objects. In that particular case, Select-Object -Unique ended up throwing an OutOfMemoryException. My version didn't and was much faster to use even on smaller data sets.
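To illustrate the keyed approach, here is a minimal sketch. It is not the actual Select-Unique mentioned above, and it is not how Group-Object or ObjectCommandComparer build their keys. The function name Select-UniqueSketch, the -Property parameter, and the string-based key format are all assumptions made for this example. It keeps one string key per distinct item in a HashSet and emits each first occurrence as soon as it arrives.

```powershell
# Hypothetical streaming filter: emits each object the first time its key is seen.
function Select-UniqueSketch {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [object] $InputObject,

        # Property names that make up the uniqueness key (assumed parameter).
        [Parameter(Mandatory, Position = 0)]
        [string[]] $Property
    )
    begin {
        # Stores keys only, not the objects; lookups are O(1) on average.
        $seen = [System.Collections.Generic.HashSet[string]]::new()
    }
    process {
        # Build a key from the string form of each selected property.
        # The length prefix keeps ('a|b', 'c') distinct from ('a', 'b|c').
        $parts = foreach ($name in $Property) {
            $text = [string] $InputObject.$name
            '{0}:{1}' -f $text.Length, $text
        }
        $key = $parts -join '|'

        # HashSet.Add returns $true only for a key not seen before,
        # so the object is written to the pipeline immediately.
        if ($seen.Add($key)) {
            $InputObject
        }
    }
}
```

With the example from the steps to reproduce, this could be used as `Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Select-UniqueSketch Name`. Unlike `Select-Object Name -Unique`, the sketch passes the whole input object through rather than projecting it to Name, and it compares string forms of the property values, which may not match the comparison that Select-Object performs for every type.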
Environment data
Name                           Value
----                           -----
PSVersion                      6.2.3
PSEdition                      Core
GitCommitId                    6.2.3
OS                             Microsoft Windows 10.0.18362
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0