OutVariable Transparency #120
## Specification

This table summarises the changes proposed in this RFC:

| Input | Non-OutVar Result Type | Old Result Type | New Result Type |
Table formatting needs to be redone to make it visible.
Hmmmm, VS Code's preview seems to like it. I'll see what I can do
## Motivation
As a PowerShell user, I can use OutVariable just like pipeline output, so that my language experience is simpler and more consistent.
I disagree with this motivation.
I specifically use this feature when I do NOT want the default pipeline behavior.
A multitude of cmdlets and functions return objects in an inconsistent way. Because `-OutVariable` always returns an ArrayList, I can ensure the result of the cmdlet/function has a consistent format.
Take the following example function as a stand-in for `Get-ADUser` and quite a few of the Exchange and Exchange Online cmdlets:
    function Get-Result {
        [CmdletBinding()]
        param ()
        end {
            :outer switch (1..9 | Get-Random) {
                1 { 'String'; break }
                2 { 1; break }
                3 { $null; break }
                4 { 1,2,3,4; break }
                5 { ,@(1,2,3,4) }
                6 { throw; break }
                7 { Write-Error 'SomeError'; break }
                8 { break outer }
                9 { break }
            }
        }
    }
Now, run the following and see all the various issues you encounter:
    1..20 | %{
        $_
        $Result = 'Stale'
        try {
            $Result = Get-Result
        }
        catch {
            'Caught'
        }
        finally {
            'Stale? {0}' -f ('Stale' -eq $Result)
            $Result.GetType().Name
            $Result.Count
            $Result[0]
        }
    }
A string result means that indexing with `$Result[0]` returns the character `S` rather than the whole string, and a `throw` leaves `$Result` stale.
Compare that with
    1..20 | %{
        $_
        $Result = 'Stale'
        try {
            $null = Get-Result -OutVariable Result
        }
        catch {
            'Caught'
        }
        finally {
            'Stale? {0}' -f ('Stale' -eq $Result)
            $Result.GetType().Name
            $Result.Count
            $Result[0]
        }
    }
The ArrayList is always created fresh, so it is never stale even when the cmdlet errors or throws. It is always an ArrayList, so I don't get weird indexing issues, and I can always rely on the methods and properties of an ArrayList being there.
If this is taken away and changed, it will break what I feel is the most stable and reliable method for retrieving output from commands. I will be forced to write ridiculous logic around Microsoft's cmdlets in many cases to try to deal with the insanity of the results they provide.
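To make that concrete, here is a minimal sketch of the behavior being described (variable names are just for illustration; the expected results in the comments reflect the current behavior discussed in this thread):

```powershell
# Even a single output object lands in a freshly created ArrayList today.
$null = Write-Output 'single item' -OutVariable Captured
$Captured.GetType().FullName   # expected: System.Collections.ArrayList
$Captured.Count                # expected: 1

# Plain assignment, by contrast, does not wrap a single item.
$Assigned = Write-Output 'single item'
$Assigned.GetType().FullName   # expected: System.String
```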
Another thing this RFC needs to consider is how the change will reconcile with the following usage:

    Get-Widget -Option1 -OutVariable '+Results'
    Get-Widget -Option2 -OutVariable '+Results'
    Get-Widget -Option3 -OutVariable '+Results'
    foreach ($Result in $Results) {
        <# do stuff #>
    }

It also supports "bring your own collections":

    $Results = [System.Collections.ArrayList]::new()
    $Results.Add('before')
    Write-Output 'a' -OutVariable '+Results'
    $Results.Add('After')

How would this change handle a situation where it is now returning a string instead of an ArrayList?

    $null = Write-Output 'a' -OutVariable 'Results'
    $null = Write-Output 'b' -OutVariable '+Results'
Although it seems better at first sight to make the behaviour more consistent, I think we might be steering in the wrong direction. One of the design mistakes in PowerShell (to me) was to use fixed-size array types like System.Array by default, which leads to bad performance. To me, using ArrayList would be much better, and if you also fixed the `+=` operator to call the `Add` method on it, then the average user would not run into issues that easily.
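As a rough illustration of the `+=` concern raised here (a sketch, not a benchmark; `$n` and the variable names are arbitrary), compare growing a plain array with growing a list:

```powershell
$n = 10000

# += on an array copies the fixed-size array on every iteration (roughly O(n^2) overall).
Measure-Command {
    $arr = @()
    foreach ($i in 1..$n) { $arr += $i }
} | Select-Object -ExpandProperty TotalMilliseconds

# A resizable list grows in place (amortized O(n)).
Measure-Command {
    $list = [System.Collections.Generic.List[object]]::new()
    foreach ($i in 1..$n) { $list.Add($i) }
} | Select-Object -ExpandProperty TotalMilliseconds
```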
@markekraus: The `+` (append) form also works with a preexisting scalar value:

    > $Results = 'hi' <# scalar #>; $null = Write-Output 'there' -ov +Results; $Results
    hi
    there

It generally supports "bring any preexisting value, irrespective of how it was obtained".
Generally, I don't think that efficiency concerns regarding the proposed change carry much weight. Specifically, the reallocation of the array to store in the output variable would only need to happen once per pipeline, so it's less problematic than building up a large collection item by item.
Except it is not. In the PR that was submitted, that functionality was broken when a string was returned. That's why I want it firmly spelled out in the RFC how that will be handled in all the same situations where a change is being made.
Yes. As I have repeatedly said in this thread, the output is always an ArrayList. It is kind enough to take the elements of your existing collection and add them to an ArrayList. I'm not asking to change that, just pointing out that this would be broken by what this RFC suggests:

    $Results = 1,2
    Write-Output 'a' -OutVariable '+Results'

Because instead of being a collection, `$Results` would now be a single string per the spec. It's also confusing... Does this:

    # Results does not yet exist in the session
    Write-Output 'a' -OutVariable 'Results'
    Write-Output 'b' -OutVariable '+Results'

behave the same as this?

    $Results = 'a'
    Write-Output 'b' -OutVariable '+Results'

Does it take the existing value and add it to an ArrayList? What about in the case of mismatching types:

    $Results = 1
    Write-Output 'b' -OutVariable '+Results'

I still say this doesn't need any changing, and that there is significant enough documentation of, and reliance on, the existing behavior that breaking it just to make it more like the pipeline isn't justified... especially when it is most commonly used precisely when the default pipeline behavior is not desirable. But if this is going to be broken, then I want all the pieces spelled out.
The answers:

- In the case of a preexisting scalar value, it should become the first element of the output collection.
- Yes, in-place extending would be broken, which is unavoidable if the output type is changed to an array.
- No, it would no longer be an ArrayList.
- It would "append" to the collection - by (in effect) copying the existing collection elements, followed by the single string, into the new output array.
- There is no need to distinguish these two scenarios; how the preexisting value was obtained should be irrelevant.

Thanks for helping to clarify these issues. @rjmholt: Do you agree with the behavior outlined above?
| Input | Non-OutVar Result Type | Old Result Type | New Result Type |
| :---: | :---: | :---: | :---: |
| `'Hello'` | System.String | System.Collections.ArrayList | System.String |
| `@(1, 2)` | object[] | System.Collections.ArrayList | object[] |
Why would we use `Object[]`? We should be moving away from `Object[]` in favor of `List<Object>`. I don't see why taking a step back from `ArrayList` to `Object[]` is beneficial to anyone.
On a meta note:
So, unencumbered by the history of the implementation, let's take a look: the sparse existing documentation tells us merely "Stores output objects from the command in the specified variable [...]" - and little else. Thus, in the absence of information that would suggest deviating from PowerShell's normal behavior, it's reasonable and consistent to infer the following behavior:

    <cmdlet> ... -OutVariable Result

is short for (leaving aside the ability to use the variable while the pipeline is still running):

    $Result = <cmdlet> ... # capture
    $Result                # ... and output

Similarly,

    <cmdlet> ... -OutVariable +Result

is short for:

    $tmpResult = <cmdlet> ... # capture
    if ($null -ne $Result) {
        # ... append to existing $Result value with array concatenation
        $Result = @($Result) + $tmpResult
    } else {
        $Result = $tmpResult
    }
    $Result # ... and output

In a nutshell: `-OutVariable` should behave like capturing output via assignment. Assuming that @rjmholt agrees with my description of the intended behavior, that is what this RFC proposes.

Now let's look at the current implementation: returning an instance of a specific collection type, `System.Collections.ArrayList` - even for a single output object - is an implementation leak. (Even if the implementation was by design, which seems unlikely, it is an awkward deviation from regular PowerShell behavior that would have deserved conspicuous documentation.)

Now let's look at the rationale for keeping the current behavior given by @markekraus. Again, if the consensus turns out to be that too many people have relied on the implementation leak, the best we can do is to document the existing behavior and move on - but it's important that moving on means avoiding the creation of such "opportunities" again.
It's important to be clear that having made productive use of an implementation leak for a purpose other than the one originally intended does not retroactively make that purpose a meaningful feature, and it therefore shouldn't be framed as such.
This suggests that you have an issue with a behavior at the very heart of PowerShell's collection handling: the default pipeline behavior of unwrapping a single-item collection to a scalar.
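For readers following along, a quick sketch of the unwrapping behavior being referred to (assuming ordinary pipeline capture via assignment):

```powershell
# One item collected via assignment stays a scalar; several become an [object[]] array.
$one  = 1..1 | ForEach-Object { $_ }
$many = 1..3 | ForEach-Object { $_ }
$one.GetType().FullName    # System.Int32
$many.GetType().FullName   # System.Object[]
```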
If it takes reliance on an accidental implementation leak to get the most stable and reliable method for retrieving output from commands - using a construct that suggests nothing of the sort - then the real problem lies elsewhere.
I don't see how this makes any sense to use ...
Is this an implementation leak, though? Or is it just that the documentation doesn't explicitly call out that it is always an ArrayList?

I'm not really interested in the history of its original intent either. Use of, and reliance on, the current behavior is prolific. It is a recommended workaround for commands like Get-ADUser that return inconsistent output.

What I am interested in is why this needs to change at all. What problem is this change addressing? As far as I know, there are no issues with the current behavior other than a lack of official documentation of the always-ArrayList result. Those justifications need to be present in this RFC. The change needs to be justified, because the existing behavior is relied upon regardless of the original intent. I believe this to be a Bucket 1 breaking change and, as such, the level of detail in the RFC should be very high and the justifications well documented. It also seems to me that some of the design considerations for this are a step in the wrong direction.
The issue is that cmdlets and functions written by Microsoft, vendors, and the community often return data in an inconsistent manner - a single cmdlet can return all of the kinds of results shown in the Get-Result example above.
The real problem is with how developers/authors/scripters (not sure of a good catch-all term here) sometimes choose to design their functions and cmdlets. I don't think PowerShell is at fault for being an expressive shell and scripting language. Getting people to make best-practice choices in their code that do not abuse (intentionally or not) the expressive nature of PowerShell is not something I think we can or should fix in the language itself. This is the crux of my opposition to this change: the current behavior, intended or not, provides a best-practice solution for dealing with non-best-practice code. I, myself, learned about it in the context of dealing with such problems in the AD cmdlets. The "ridiculous" logic I was referring to before was in reference to my solutions before I learned about `-OutVariable`.
Another thing to consider in this RFC, from @PetSerAl's comment here. PowerShell currently supports this usage:

    1..10 | echo -OutVariable a | % { "$a" }

The PR that was previously submitted broke that usage. I would like for this RFC to spell out how it will either maintain, change, or break this usage.
I know it's a bit pedantic, but I'd also like to see it explicitly spelled out what happens when there is no output. (It's conceptually the same question as what you get when you capture a command that produces no output via assignment.)
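One way to probe the no-output case today - a sketch with illustrative names; the exact results are precisely what the RFC should pin down:

```powershell
# Capture a command that matches nothing.
$null = Get-ChildItem -Filter 'no-such-file-*' -OutVariable NoOutput
$NoOutput -is [System.Collections.ArrayList]   # expected with current behavior: True (empty list)
$NoOutput.Count                                # expected: 0

# Assignment of no output, by contrast, compares equal to $null.
$assigned = Get-ChildItem -Filter 'no-such-file-*'
$null -eq $assigned                            # expected: True
```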
I think it would be an extremely confusing mistake to change OutVariable unless we are on a campaign to change all uses of ArrayList to a generic List or something. All of the common parameter capture variables, except the implicitly singular PipelineVariable, are and always have been implemented the same way: as ArrayLists. We have more than ten years' worth of code, articles, and blog posts documenting the fact that these variables are ArrayLists.
And it's not just those variables; it's also ... In addition, it is completely misleading to suggest that the fact that OutVariable is an ArrayList is just "an implementation leak" -- it has always been a deliberate choice for every variable like this, since PowerShell was called Monad -- and I believe it should continue to be until such time as PowerShell is ready to embrace modern generic collections. Do a quick search on Google and you will find tens of thousands of articles documenting these variables and the fact that they are ArrayLists.

Finally, it is completely incorrect to suggest that the perf hit of using arrays would only happen once. All of the common parameter capture variables are available within the pipeline for use as they are being filled. They are updated after each object:

    Get-ChildItem -File -PipelineVariable file -OutVariable out | ForEach-Object -WarningVariable warn {
        Write-Warning $file.Fullname
        $PSItem
    } | ForEach-Object {
        Write-Host $Warn.Count $Out[-1].FullName
    }

That is, one can do something like this, to get a list of directories in the current folder and the files within them:

    gci -ov items -Directory | gci -ov +items -File | out-null
    $items

But one can also modify the list by hand while collecting it:

    gci -ov items -Directory | % -Process { $items.AddRange( @(dir $_ -File) ) } -End { $items }
I'm digging through my usages of OutVariable and I discovered another instance where this would break:

    function Get-Widget {
        [cmdletbinding()]
        param()
        end{
            $random = 1..3 | Get-Random
            Write-Verbose ('Returning {0} Objects' -f $random)
            1..$random | %{
                [PSCustomObject]@{
                    Count = 4
                }
            }
        }
    }
    1..15 | %{
        '----------------'
        'Run {0}' -f $_
        $null = Get-Widget -OutVariable Results -Verbose
        $Results.Count
    }

In instances where an object (which may or may not be a collection) has a `Count` property of its own, the meaning of `$Results.Count` becomes ambiguous. It's the same with the following:

    function Get-Widget {
        [cmdletbinding()]
        param()
        end{
            $random = 1..3 | Get-Random
            Write-Verbose ('Returning {0} Objects' -f $random)
            1..$random | %{
                ,@(1,2,3,4)
            }
        }
    }

It's also further complicated if it happens to return one or more ArrayLists not unrolled, or an ...
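A condensed sketch of the `Count` ambiguity described above (names are illustrative; the second half assumes single results are unwrapped as this RFC suggests):

```powershell
# A single object that carries its own Count property.
$widget = [PSCustomObject]@{ Count = 4 }

# Current behavior: -OutVariable wraps even one object, so .Count is the number of captured objects.
$null = Write-Output $widget -OutVariable Results
$Results.Count     # 1

# If a single result were unwrapped (as with assignment), .Count would surface the
# object's own property instead.
$Unwrapped = Write-Output $widget
$Unwrapped.Count   # 4
```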
I agree with @markekraus that this needs more details on what benefits come out of this change and how those offset the breaking of so many existing scripts. It seems the root issue is really that it's not thoroughly documented that the variable is always an ArrayList. We also need to consider how many newer scripters are using `-OutVariable` and relying on that behavior. I'd also support moving to a generic `List<Object>` over `ArrayList` if the collection type were ever to change.
On a meta note: there are two, not necessarily mutually exclusive, considerations: (a) maintaining backward compatibility, and (b) working out what behavior makes the most sense in the abstract.
If (a) is all you care about, you have your answer, and there's no need to read on. If you (also) want to participate in (b), be sure to avoid two pitfalls:
With that out of the way, let's look at @Jaykul's arguments:
See (a) and (3).
Among these,
There is no evidence in the documentation or the v3 language spec (the latest version available) that the use of ArrayList was a deliberate, contractual choice. The Windows PowerShell Language Specification Version 3.0 - the most recent version available - even goes out of its way to indicate that the specific collection type chosen for such variables is an implementation detail. In the absence of specific behavior prescribed for `-OutVariable`, it's reasonable to expect behavior consistent with how PowerShell normally collects output. The spec just says, re `-OutVariable`: ...
That assumes that the proposal is to (re)create the array after every object, which it needn't be. Just like captured pipeline output is converted to a single `[object[]]` array only when the pipeline completes, the `-OutVariable` value would only need to be converted once, on pipeline exit. You do pay a memory and performance penalty for that, just as when using assignment to capture a pipeline's output in a variable - but see the first aside below.

Two asides:
The primary value in these common parameters is that they are populated as the pipeline runs, not at the end. If all we wanted was to collect the output at the end, there would be no OutVariable -- that's what assignment is for. In fact, ...
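A condensed version of that streaming behavior, along the lines of the earlier `1..10 | echo -OutVariable a | % { "$a" }` example (the variable name is arbitrary):

```powershell
# The -OutVariable collection is visible - and still growing - from a downstream
# pipeline stage while the pipeline is running.
1..3 | Write-Output -OutVariable captured | ForEach-Object {
    'current item: {0}; captured so far: {1}' -f $_, $captured.Count
}
```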
As a separate point -- I don't appreciate the attempt to control and redefine the terms of the discussion. First: it's never an either/or discussion when considering fixing problems by breaking compatibility with a previous release. Second: it's rude to come to a product with 12 years of history and say things like:
You shouldn't need to coerce people into your point of view by trying to make it sound like a vote for the status quo doesn't make sense.

Third: don't make the mistake of insulting people who don't want you to break their tools by suggesting that they are championing "existing de-facto behavior unquestioningly, just because 'it's always been this way'" -- some of us have been using this tool every day for a long time, and although we always want to fix things that are broken, we're sometimes going to look at suggested changes as fixing things that aren't broken. You won't convince veterans that your way is better by suggesting they're stuck in a rut and trying to put the onus on them of justifying their experience -- you need to show them how your suggestion would improve things.

I haven't actually seen anything in the RFC or in this conversation that convinced me it would be worth changing, even if it didn't break anything. Honestly, I feel like you jumped in here in an effort to make something easier to use, but did so without fully understanding the feature and its many benefits (i.e. the fact that these are for use within the pipeline, not just at the end). I don't think that any change is called for, but an improvement to the common parameter documentation would be great, specifically documenting the fact that these variables are ArrayLists.
That's not the only reason. It's the desire to maintain backwards compatibility because the current behavior provides a series of demonstrably valuable features that would be completely lost if changed, some without a reasonably simple way to re-implement in another form. Yes, the fact that a breaking change would result in work to update and change scripts is a part of it. As myself and others have pointed out, we use this feature in our code in ways that would be broken by this change. That's not just bellyaching at the fact that we would need to do work. Just do a search of GitHub for PowerShell code that uses `-OutVariable` and you will see how widespread the reliance is.

But I'm all for breaking changes that advance the language and provide value. I made quite a few breaking changes in the web cmdlets, and I have been championing breaking changes in other areas of PowerShell. So personally, this is not a fight to hold on to the way it's always been for the sake of holding on to the way it's always been. As such, I personally find your tone insulting, and it does not further this discussion.

The point is, this feature has been around, has prolific use, and has demonstrably valuable features that would be lost with the changes proposed in this RFC. At this point, the onus is on those who wish to make this breaking change to provide a significant body of evidence and justification as to how this breaking change benefits the language, divorced of any (perceived or real) initial intent of the feature. Even taking all of that out of consideration, this RFC and the resulting changes are rife with issues: the behavior is confusing and unexpected even with knowledge of the pipeline, there are several performance concerns which have been called into question, and the use of `Object[]` strikes many here as a step in the wrong direction.
Previously you called me out for my use of language that seemed only to antagonize. This is exactly what you are doing here. Please heed some of your own advice. The current behavior, if ...
Why? This is one of the benefits of the current behavior (again, divorced from intent). Why should it be avoided? The ability to retrieve, insert, sort, and remove items in these variables within the pipeline provides flexibility and value. What value do we get out of this change? Why should it be avoided? Is it that it should be avoided due to implementation concerns? If so, we should not make this change, as existing value will be lost if only due to complexity concerns.
Yes, but what about the next pipeline, and the one after that, where the same variable is appended to with `+` and the array has to be rebuilt each time?
I would like for this RFC to list a compelling body of reasons as to why that behavior is even desirable. One of the most common complaints I get from novice users is how hard it is to comprehend just how the pipeline produces values and why you sometimes get back a single object and sometimes a collection. I have seen (and personally used) the current behavior as a way to avoid exactly that confusion. What benefit do we gain by changing the behavior of `-OutVariable`?
More meta notes :)
(Below I'm referring to my previous post's meta note at the beginning; @markekraus, you may have misconstrued my note about "sensible" behavior: all I wanted to say there was that, divorced from its history, the name and description of `-OutVariable` suggest pipeline-like capture behavior.)

It's a valid point, and I apologize - I reacted to what I perceived as haughty dismissals. I will say that I am no stranger to the pitfalls I've described myself, and as such they are a reminder to me as much as anyone else. In the particular context of PowerShell, it just so happens that I may need a reminder of the opposite - that, given my relative inexperience, I should be more mindful of history and real-world impact. I will do that in the future and defer to others for assessing backward-compatibility risks. I absolutely do believe that a vote for the status quo makes sense, but it's important to have clarity as to why.

In short: my (regrettable) tone aside, I wanted to have a separate discussion about the merits of the proposed change in the abstract, and, this particular discussion aside, I think the conversation actually has moved in that direction, thankfully - let me try to continue in that spirit below.
Yes, I was coming from my personal use of the feature, which was to capture output for later inspection, much like assignment.
They currently can be used that way, but it isn't obvious at all (except with the non-collecting, always-single-item `-PipelineVariable`). If the consensus is that the existing behavior should be retained as-is (but see my suggestion below), the solution is therefore to improve the documentation (as @ChrisLGardner suggested):
That said, even a documented surprise will remain a surprise - both to newcomers making intuitive assumptions and to seasoned users who forget about the difference.
I think what you describe was true until v2, but went away in v3 with the ability to index scalars and their having a `.Count` property. Understanding pipeline output may be nontrivial, but it is crucial, given PowerShell's pervasive use of pipelines. Once you have that understanding, it applies wherever pipelines are involved.
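For reference, a quick illustration of the v3+ behavior being alluded to (results shown as comments):

```powershell
# Since v3, scalars respond to .Count and indexing as if they were one-element collections.
(42).Count        # 1
(42)[0]           # 42
('hello').Count   # 1
('hello')[0]      # h  -- strings remain the notable exception: indexing returns a character
```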
I think the symmetry of `-OutVariable` with capturing output via assignment is what matters most. The simplest use case is to output to the screen while also capturing the output in a variable, for later use. That use case is currently broken. (Similarly, the simplest use case for the other common `-*Variable` parameters is to inspect the captured values after the fact.)

That you can currently also modify this collection as it is being built is to me a secondary consideration (and quite possibly an accidental feature) that shouldn't violate the fundamental assumption of the capture behavior. So it sounds like the primary philosophical difference is over how to fundamentally frame the primary purpose of the `-OutVariable` parameter.

Note that even advanced uses such as inspecting the variable from a later pipeline stage would continue to work; only modifying the collection in place while it is being built would not. And I do wonder if the latter is worth supporting, given that it gets in the way of the simpler capture-and-inspect-later use case. If the consensus is to grandfather in this usage, perhaps the following compromise is an option - though it ain't too pretty:
Except in cases where a single item has a `Count` property of its own. I should have included this in my list of things to look for on GitHub: any instance where the variable defined in `-OutVariable` is later used with `.Count` or indexing. Without inspecting the type of the value returned from the command, the user can't know what those expressions will actually refer to.

In the case of the last one, how can the user know whether the `Count` refers to the number of results or to a property on a single result? Also, indexing on a string returns individual characters rather than the whole result.
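To make the indexing concern concrete, a minimal sketch (illustrative names; the second half assumes the single-result unwrapping this RFC proposes):

```powershell
# Code written against the current ArrayList behavior indexes the captured
# variable and expects a whole object back.
$null = Write-Output 'OnlyResult' -OutVariable Results
$Results[0]    # current behavior: the whole string 'OnlyResult'

# If a single result were the bare string instead, the same expression would
# return a single character.
$Unwrapped = Write-Output 'OnlyResult'
$Unwrapped[0]  # O
```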
Yes, these are known (and unfortunate) edge cases that affect pipeline behavior fundamentally, but I don't think their avoidance in a particular scenario should be the driving factor.
Yes, you wouldn't be able to distinguish the following two cases:

    # output an explicitly constructed array list as a single item
    % { , ([System.Collections.ArrayList] (1,2,3)) } -ov var
    # let -ov collect the pipeline-enumerated elements in an array list
    % { 1,2,3 } -ov var

However, again I think that handling such an obscure edge case should not be the driving factor and, again, the behavior is fundamental to the pipeline:

    $var = % { , (1, 2, 3) } # output explicitly constructed [object[]] array as single item
    $var = % { 1, 2, 3 }     # let the assignment collect the enumerated elements in an [object[]] array
A very high percentage of occurrences of `-OutVariable` ...

You are proposing a breaking change. You need to provide justification for breaking these. Making edge cases more confusing is not a compelling reason to change. The edge cases now are clear and work the same as non-edge cases. That is a reason to maintain the current behavior: this proposed change creates edge cases where none existed before.
I understand your desire to make this true, but I believe the symmetry to the other common parameters is more important: people don't learn of just one common parameter; they're all documented together, and basically all behave the same way. These common parameters (including the error and warning variants) all collect into ArrayLists. I agree that the documentation is lacking.
I don't consider this an "advanced" use. It's pretty basic usage -- if you're doing anything simpler than that, you don't need this feature at all 😉

And of course, you are accessing the collection within the pipeline: you're modifying it in both commands. If you examine the variable part-way through, that's exactly what you'll see. I think it would be a more confusing experience if you were writing code where the variable is sometimes a collection and sometimes a single object.
To me, having a consistent experience whenever PowerShell collects output for you - i.e., a pipeline-like experience - outweighs the risk of breaking code, given that that risk is low in my estimation:

Re the potential `.Count` ambiguity: a scalar type having a `Count` property of its own is rare.
Re the potential indexing ambiguity: a cmdlet outputting directly indexable scalars - notably strings - is uncommon. Arguably, even ...

In short: the edge cases are rare and arise from scalars posing as collections.

Let's look at how `-OutVariable` is actually used in the wild: I sampled about 240 uses, both for `-ov` and `-OutVariable`.

The vast majority of cases:
The exotic cases:
In short:
Given the above, I'm inclined to say that even the proposed on-pipeline-exit conversion to `[object[]]` would be unlikely to break much real-world code.
The By contrast,
You've discovered that they're implemented as streaming variables, and perhaps you've made use of that in practice. This RFC wouldn't take that ability away from you, except that a single-item result would be unwrapped to a scalar and - possibly - multi-item results converted to `[object[]]` on pipeline exit.

The unifying vision behind this RFC, at least in my mind, is to provide a consistent experience whenever PowerShell collects output for you. Here's another example of how pervasive the pipeline logic is in PowerShell: ...

Being able to rely on this behavior - without having to worry about exceptions here and there - is important.
The pipeline experience is painful and not ideal for all situations, and the current implementation provides a way to work around the issues I have already outlined here. The proposed changes introduce breaking changes for the sake of parity with something the feature currently serves as an alternative to. That in itself is a strong reason this change is not a direction we should take. The proposed code changes produce edge cases where none currently exist. This too is a very strong reason not to make the change.

I have code that will definitely break with this change. I did some analysis of this as well, and I was able to find examples on GitHub where code was for sure broken by the change, and areas where it may be subject to the edge cases. In any case, we should not introduce changes that bring instability, a lack of predictability, or new edge cases.

Examples where code would be broken due to indexing into a string, because the command can return a single string:
Please focus your efforts on fixing the pipeline, if you believe it to be painful.
Relying on an implementation detail that happens to bypass what you dislike about the pipeline is not a fix.
Again, to me the benefit of resolving the inconsistency outweighs such edge cases.
There is nothing wrong with it. It's just painful in some situations.
I still believe this is not an accident. Everything about the way it was implemented suggests a deliberate design choice. The fact is, the current implementation works, is convenient, and offers an alternative to the pipeline. It is often a go-to for when a user cannot use the pipeline or needs to standardize output. All of that holds regardless of the initial intent: this is how it is used, and the changes proposed here will break the current behavior. While my individual preference to keep this behavior unchanged may seem insignificant, I am also not alone in that sentiment. If users like a current feature, that is a good reason to keep it unchanged.

Also, please re-read those examples I provided. They are working on string transformations emitted by ForEach-Object, not the objects emitted by the commands at the front of the pipeline. So yes, they are broken in instances where only a single result is returned.
@markekraus: Thanks for the correction re the samples you linked to - please see my updated comment above.
No, the issue is not what specific flavor of extensible collection is returned, but (a) that it's not an instance of PowerShell's default collection data type, `[object[]]`, and (b) that even a single output object gets wrapped in a collection at all.
That makes sense to established users, familiar with it warts and all, but won't somebody, please, think of the children?
I want to share helpful perspectives from @BrucePay in PowerShell/PowerShell#6512 with respect to when pipeline-like unwrapping is not appropriate:
That cleared things up for me in my quest for unified behavior: these are simple rules to remember, assuming they're implemented consistently. In the case at hand, because ...
I'm confused about where the RFC process for this is. It's been six months....
The reality is that we debated this for about a week and there basically haven't been any other comments since then. It seems like the committee should make a call to either reject or progress to draft and experimental implementation, right?
In the interests of clearing out my history, I'm going to close this. It looks too controversial to consider properly and, based on some of the discussion, it would need to be worded better to convey the original intent. Thank you for the discussion, everyone.
After some feedback on my pull request, I thought I should open an RFC on breaking PowerShell's OutVariable implementation.
Also see the original issue where this behaviour was discussed and reviewed by the PowerShell Committee.