Restrict use of [ref] to variables #6807

Closed
mklement0 opened this issue May 3, 2018 · 22 comments
Labels
Issue-Discussion the issue may not have a clear classification yet. The issue may generate an RFC or may be reclassif Resolution-External The issue is caused by external component(s).

Comments

@mklement0
Contributor

mklement0 commented May 3, 2018

For the ultimate resolution, see MicrosoftDocs/PowerShell-Docs#2402

From what I understand, use of [ref] only makes sense when applied to a variable or parameter [variable].

If this assumption holds, perhaps nonsensical uses such as [ref] 'foo' or [ref] $hashtable.key1 could be flagged as syntax errors.

The confusion that not preventing such pointless uses can create is exemplified by this SO question, in which the OP thought they could create a persistent reference to a specific hashtable entry as follows (simplified):

$Tree = @{ TextValue = "main"; Children = @() }
# Mistaken attempt to create a "pointer" to a specific hashtable entry
$Pointer = [ref] $Tree.Children  # This should be prevented.
# Mistaken attempt to indirectly append to $Tree.Children
$Pointer.Value += $Item

Environment data

Written as of:

PowerShell Core v6.0.2
@BrucePay
Collaborator

BrucePay commented May 3, 2018

[ref] works just fine with data structures:

PS[1] (584) > $x = [pscustomobject] @{a=@{b=2; c=3}}
PS[1] (585) > $r = [ref] $x
PS[1] (586) > $r.Value.a.abc = 123
PS[1] (587) > $x
a
-
{c, abc, b}

It does exactly what you would expect from other languages: it creates a durable reference to a specific instance. The author of the SO question used [ref] when they didn't need to, not understanding that they already had a reference to the parent object and also not understanding how array concatenation is done in PowerShell. If they had assigned to an element of the array, it would have worked fine. But by appending an element, they created a new object which, of course, did not update the reference.
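A minimal sketch of the effect described above, using hypothetical variable names ($tree, $pointer) rather than the code from the SO question: += on an array builds a new array, so only the [ref] wrapper sees the result and the hashtable entry keeps pointing at the original array.

```powershell
# Hypothetical names; this only illustrates the += behavior discussed above.
$tree = @{ Children = @() }

# [ref] wraps the array that is currently stored in the entry, not the entry itself.
$pointer = [ref] $tree.Children

# += creates a new array and stores it in the wrapper's .Value only.
$pointer.Value += 'item'

$pointer.Value.Count   # 1
$tree.Children.Count   # 0 - the hashtable entry still holds the original, empty array
```

Assigning the new array back explicitly ($tree.Children = $pointer.Value) would update the entry, but at that point the [ref] wrapper adds nothing.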

@BrucePay BrucePay added the Issue-Discussion the issue may not have a clear classification yet. The issue may generate an RFC or may be reclassif label May 3, 2018
@mklement0
Contributor Author

Thanks, @BrucePay, but this issue is not about how [ref] functions when it is used as intended, it is about the syntax not preventing nonsensical uses.

[ref] works just fine with data structures:

What you're demonstrating is not per se about data structures (that aspect is incidental), you're demonstrating use with a variable, i.e., effectively creating a variable alias (this is also covered in my answer to the SO question).

If they had assigned to an element of the array

Yes, it would have worked - but it also would have been pointless. Pointing to a piece of data (a) only makes sense with instances of reference types and (b) you can use a regular variable to do that - using [ref] in this scenario adds nothing and only complicates matters.

Thus, my point was that using it with a [parameter] variable is the only use that makes sense
and that users can be spared confusion if the language itself prevents other uses.

Or am I missing other legitimate uses of [ref]?

@the-CPU1

the-CPU1 commented May 3, 2018

Since I'm the OP of the item in question, I thought I'd provide a little more insight into what I was trying to do and how I got to the bad [ref] use. First, here's a simplified version of the code I was trying to use:

 $List = @()
 while (!($Result.EOF)) {
     $Pointer = [ref] $List
     foreach ($Field in $Result.Fields) {
         $Pointer.Value += @{ DataValue = $Field.Value; Children = @() }
         $Pointer = [ref] ($Pointer.Value[$Pointer.Value.Count - 1].Children)
     }
     $Result.MoveNext()
 }

I hope this code explains why I was trying to point to an array element instead of just using the parent variable containing the array.

Second, I did understand that there might be an issue with array concatenation, which is why I first attempted to test with a simple [ref] to an array variable. Seeing that it worked (but not knowing how the aliases worked), I made an incorrect assumption that I could create a persistent pointer to an array member. When that didn't work, I looked for an explanation at SO, which I received.

So, now I know that creating references to a piece of data has no practical uses, and I should be fine using [ref] going forward as long as I know what is just data and what is a reference to data (and also keeping in mind aliases and how += works on arrays).

@mklement0
Contributor Author

@the-CPU1: Thanks for the explanation - what you did is an understandable thing to try, especially given that the language doesn't prevent you from doing so.

So, if there's consensus that [ref] only ever makes sense when applied to a variable, perhaps trying anything else can be flagged as a syntax error.
That would spare future users the confusion over why their code doesn't work as expected.

@BrucePay
Collaborator

BrucePay commented May 3, 2018

@mklement0

you're demonstrating use with a variable,

Bad example - how about this :-)

PS[2] (611) > $r = [ref] ([system.collections.generic.list[object]]::new())
PS[2] (612) > $r.Value.Add(1)
PS[2] (613) > $r.Value.Add(2)
PS[2] (614) > $r.Value
1
2

or this

PS[2] (626) > $rerr =  [ref] ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())
PS[2] (627) > $null = [system.management.automation.psparser]::Tokenize("2 2 2", $rerr)
PS[2] (628) > $rerr.value

Token                                Message
-----                                -------
System.Management.Automation.PSToken Unexpected token '2' in expression or statement.
System.Management.Automation.PSToken Unexpected token '2' in expression or statement.

@the-CPU1

I could create a persistent pointer to an array member

You get a persistent pointer to the data stored in the array member. Getting a pointer to a specific location in memory is not supported in PowerShell.

I should be fine using [ref] going forward as long as I know what is just data and what is a reference to data

Everything in PowerShell is already a pointer (object reference) so the set of circumstances where [ref] is needed is very small - basically with APIs that have In/Out/Ref parameters. COM APIs in particular tend to have out parameters, but, as the example above shows, it can be necessary even with PowerShell APIs.
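As a small illustration of the "API with an out parameter" case, here is a sketch using the .NET method [int]::TryParse, whose second parameter is an out parameter. The variable names ($number, $ok) are made up for this example.

```powershell
# A regular variable that will receive the out value.
$number = 0

# The [ref] cast passes the variable itself, so the method can assign to it.
$ok = [int]::TryParse('42', [ref] $number)

$ok       # True
$number   # 42
```

The same pattern applies to the Tokenize call shown above: the method writes its result through the [ref] argument.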

@mklement0
Contributor Author

@BrucePay:

$r = [ref] ([system.collections.generic.list[object]]::new())

That's an example of pointless use of [ref], because just using the reference directly gives you the same functionality, and does so more simply:

$r = [system.collections.generic.list[object]]::new() 
$r.Add(1)
$r.Add(2)
$r  # prints the list

Unless I'm missing something, there is no good reason to use [ref] in this scenario.


$rerr = [ref] ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

The [ref]'s purpose is to type the variable - $rerr - and it is therefore better written as follows:

[ref] $rerr = ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

Or, to localize the by-reference passing:

# Declare as regular variable.
$err = ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

# Pass with ad-hoc [ref] cast
$null = [system.management.automation.psparser]::Tokenize("2 2 2", [ref] $err)

# $err - still a regular variable - was assigned a value in the method call.
$err

Note that this idiom is also the form found in the v3.0 language spec.

@the-CPU1

the-CPU1 commented May 4, 2018

@BrucePay:

Everything in PowerShell is already a pointer (object reference) so the set of circumstances where [ref] is needed is very small - basically with APIs that have In/Out/Ref parameters.

This is what got me confused initially. Suppose we have a function:

function foo([ref] $a) { $a.value += 1 }

I was trying to see if something like this would work:

$b = @(0); foo ([ref] $b); $b

And it did. But then this call didn't work:

$c = @(@(0)); foo ([ref] $c[0]); $c[0]

And this doesn't work either:

$c = @(@(0)); $d = $c[0]; foo ([ref] $d); $c[0]

I thought that if the first call worked, so would the second one, and vice versa. I do understand why it didn't work in the second call - that's the way arrays and += work.

If I were to make a guess, I'd think that the first call was specifically coded for by the developers (aka aliases), since there might be a need for a user to pass an array to a function and then modify the size of that array. I'd also guess that the second call behaves "normally", as one would expect if there were no aliases.

@rkeithhill
Collaborator

rkeithhill commented May 4, 2018

I think you misunderstand the way the @() operator works. It does not always wrap the content in a new array. What it does is create an array if the content is a scalar value (or $null). If the content is already an array, @() is a no-op.

@mklement0
Contributor Author

mklement0 commented May 4, 2018

@the-CPU1:

@rkeithhill correctly points out that you have a misconception about @(), which, in short, is not an array constructor, but an array guarantor.

However, even if we construct the nested array the way you intended - i.e., using , , 0 instead of @(@(0)), your commands cannot work:

$c = , , 0; foo ([ref] $c[0]); $c[0]

The problem here is again that [ref] is being used with a value, not a variable.
You're passing a reference to the inner array, and not a reference to the location of that array within the value of $c - the latter cannot be done, because it would require an additional level of indirection (and, as @BrucePay states, getting a pointer to a specific location in memory is not supported).

$c = , , 0; $d = $c[0]; foo ([ref] $d); $c[0]

This is basically the same scenario as above, except that the by-reference passing works for the intermediate variable $d, because there [ref] is used on a variable - but, again, it cannot refer to the location of the nested array inside $c.

I'd think that the first call was specifically coded for by the developers (aka aliases), since there might be a need for a user to pass an array to a function and then modify the size of that array.

Note that [ref] is not about arrays specifically.
It's about passing any variable by reference to a method or function, typically so that the callee can modify it.

As @BrucePay states, you need [ref] to call .NET methods that have ref or out or in parameters - see the docs - or to call PowerShell functions that declare [ref] parameters, but that is rare.

And while you can use [ref] to create an effective alias of another variable outside the context of parameter passing (e.g., $v = 1; $vAlias = [ref] $v; $vAlias.Value++; $v), that is even rarer.


@rkeithhill:

Just to be clear: @() with an array operand is conceptually a no-op, but not technically: It actually clones something that already is an array (the only exception being array literals (explicitly enumerated elements such as 1, 2, 3), an optimization introduced in v5.1 - see #4280)

As an aside: While maintaining reference equality is rarely a concern in PowerShell, this cloning is problematic from a performance perspective.

@mklement0
Contributor Author

mklement0 commented May 5, 2018

An attempt to summarize and clear (at least my) conceptual fog (arrived at without source-code analysis; do let me know if and where I'm wrong):

  • The purpose of [ref] ([System.Management.Automation.PSReference]) is to enable passing PowerShell variables by reference to .NET method parameters marked as ref / out / in or, rarely, to PowerShell function parameters typed as [ref].

    • When a regular PowerShell variable is directly cast to [ref] for this purpose, the variable is wrapped so that modifying the [ref] instance's .Value property is equivalent to assigning to the variable directly (the docs suggest that [ref] essentially wraps a [psvariable] instance in this case).

      • This indirect access to a variable only works with a direct cast:

        • [ref] $var # OK
        • [ref] ($var) # !! Does NOT work
        • [ref] $ref = $var # !! Does NOT work
      • The conceptually cleanest idiom is:

        • Define a regular PowerShell variable.
        • Cast it to [ref] as part of the invocation only; e.g.:
          [System.Management.Automation.PSParser]::Tokenize('foo', [ref] $err)
        • That way, the by-reference passing is a localized aspect of the given invocation; this mirrors C# usage, where you must use the ref / out / in keywords on invocation.
    • Outside of this use, there's no good reason to use [ref]:

      • There is no point in using [ref] with a value rather than a variable (something other than ultimately a [psvariable]) - see below.

      • If you really want an alias variable [wrapper], use Get-Variable:
        $v = 666; $vObj = (Get-Variable v); $vObj.Value++; $v # -> 667
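The distinctions in the list above can be sketched as follows. This is an illustration under the behavior described in this comment (not verified against the source code), and the variable names are hypothetical.

```powershell
$var = 1

# Direct cast: the [ref] instance wraps the variable itself.
$direct = [ref] $var
$direct.Value = 2
$var                         # 2

# Parenthesized operand: only the current value is wrapped.
$wrapped = [ref] ($var)
$wrapped.Value = 3
$var                         # still 2

# Get-Variable returns the variable object, which can serve as an alias.
$varObj = Get-Variable var
$varObj.Value++
$var                         # 3
```

In the parenthesized case, $wrapped.Value is 3 afterwards, but nothing outside the wrapper changes.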


Why [ref] should not be used with values (non-variables)

Note: There is one edge case: [ref] $null is useful for cases where you don't care about what the target method/function returns via the by-reference parameter; that said, you can conceive of $null as a variable too (it certainly is that syntactically).

When you use [ref] with a value:

  • It obviously doesn't work with the cast-to-[ref]-on-invocation idiom, whose purpose is to pass a variable.
  • If you save a [ref] <non-variable> expression in a variable (e.g., $ref = [ref] (1, 2, 3)), you're effectively creating a more cumbersome analog to a regular PowerShell variable in that you must then use .Value to access the enclosed value.
    • While you can then pass $ref to a ref / out / in parameter directly - in which case you mustn't use [ref] on invocation - it leaves you with having to access the value returned via $ref.Value.

    • Again: the cast-to-[ref]-on-invocation idiom is superior in every respect: $var = 1, 2, 3 on initialization, then [ref] $var on invocation.

    • Outside the context of by-reference parameter passing, use of [ref] is pointless:

      • $ref = [ref] (1, 2, 3) # pointless; just use the expression result directly
      • It can lead you to mistakenly think that it's possible to create a reference to properties inside other objects (the confusion that prompted creation of this issue).

Therefore, my preference is to disallow [ref] with a non-variable operand, but given that it technically works, it would be a breaking change.


Get-Help about_Ref is currently a mixed bag:

  • It commendably shows only the cast-to-[ref]-on-invocation idiom.

  • The type's primary purpose - use with .NET ref / out / in parameters is not mentioned.

  • The description is confusing.


@mklement0
Contributor Author

mklement0 commented May 6, 2018

Re improving the documentation: please see MicrosoftDocs/PowerShell-Docs#2402.

@the-CPU1

the-CPU1 commented May 7, 2018

@mklement0

Note that [ref] is not about arrays specifically.
It's about passing any variable by reference to a method or function, typically so that the callee can modify it.

That is what I remember from my C days. In my first example above, inside the callee foo any manipulation of variable a is affecting the caller's variable b, including moving it from one memory location to another. I think I have a better picture now.

@mklement0

It can lead you to mistakenly think that it's possible to create a reference to properties inside other objects

One more question on this:

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = [ref] $a.Children
$b.Value.Add(1) 

I understand that I can simply reference $a directly here, but for my purposes I was trying to create a function to create (and another one to traverse) a series of nested array lists (a tree-like structure). Even though this sort of scenario would be rare, and can probably be done without nesting, it was a quick and easy solution for me.

@mklement0
Contributor Author

@the-CPU1:

In my first example above, inside the callee foo any manipulation of variable a is affecting caller's variable b

Indeed: [ref] $var is special in that it truly creates a reference to the variable object behind the scenes, not its present value.


I understand the intent behind

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = [ref] $a.Children
$b.Value.Add(1)

but the point is that the use of [ref] here creates a pointless wrapper. You can simply assign $a.Children directly to a regular variable and get the same effect:

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = $a.Children  # No need for [ref] - obtain a reference to the array list
$b.Add(1)         # Operate on the array list directly.

Again, note that this only works because the value of the Children entry is an instance of a .NET reference type.
If it were an instance of a value type (e.g., 666), this approach fundamentally wouldn't work - whether or not you use the [ref] wrapper.
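A short sketch of that difference, with hypothetical entry names (List, Number): a regular variable shares a reference-type instance with the hashtable entry, whereas a value-type entry is copied, with or without [ref].

```powershell
# Hypothetical hashtable with one reference-type and one value-type entry.
$table = @{ List = [System.Collections.ArrayList]::new(); Number = 666 }

# Reference type: the variable and the entry point at the same ArrayList.
$list = $table.List
$null = $list.Add(1)
$table.List.Count      # 1

# Value type: the variable receives a copy.
$number = $table.Number
$number++
$table.Number          # 666

# Wrapping the value in [ref] does not change that.
$numberRef = [ref] $table.Number
$numberRef.Value++
$table.Number          # 666
```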

@the-CPU1

the-CPU1 commented May 7, 2018

That last example explains a lot.

There really isn't a good reason to use [ref] outside of referenced parameters.

@SeeminglyScience
Collaborator

I sometimes use ref to force a value type to be a reference type. For example capturing a value from a child scope.

$innerVar = [ref] 0
& { $innerVar.Value = 10 }
$innerVar.Value
# 10

You could use a bunch of other things here like Nullable<> or even just wrapping it in a PSObject. But ref is nice and short, can be cast from anything, can hold anything, and the name fits.

@mklement0
Contributor Author

mklement0 commented May 10, 2018

@SeeminglyScience:

That's a good example in principle, but note that it isn't about value types - it's about (conveniently) modifying a variable in a parent scope.

Without the scoping issue involved, again a simple variable will do - note the use of . rather than &, which creates no child scope:

$var = 0
. { $var = 10 }
$var # 10

Because & creates a child scope, you do need an indirect reference, as demonstrated in your example.

Implementing the same thing without [ref] would indeed be clunky (though it has the advantage of not needing .Value afterwards):

$var = 0
& { (Get-Variable var -Scope 1).Value = 10 }
$var # 10

So, as long as about_Ref is updated to properly frame the two - disparate - cases in which [ref] makes sense - use with APIs, and use as a convenient Get-Variable alternative - perhaps that's all we need.

@SeeminglyScience
Collaborator

That's a good example in principle, but note that it isn't about value types - it's about (conveniently) modifying a variable in a parent scope.

Well, yes and no. This is just semantics but you aren't modifying the variable. The variable in the child scope is a different variable but it holds a reference to the same object or the value of a value type. I mentioned value types because if the variable instead held a reference type you could adjust it as you would in the parent scope (with the exception of replacing it entirely)

But with a value type you need to either change the value of the variable from the previous scope (like your example) or place it into a reference type.

More specifically I'd say it's useful for creating an explicit reference to an object.

@mklement0
Contributor Author

mklement0 commented May 10, 2018

@SeeminglyScience:

I see what you're saying and "it's useful for creating an explicit reference to an object" is a good summary.

As the whole discussion here shows, users need guidance with respect to the primary purpose of [ref] and the secondary one that you describe.

This guidance is missing from the documentation, so let's try to summarize in preparation for updating it:

  • The primary purpose of [ref] ([System.Management.Automation.PSReference]) is to enable passing PowerShell variables by reference to .NET method parameters marked as ref / out / in or, rarely, to PowerShell function parameters typed as [ref].

    • In this usage, [ref] is applied to a variable, and the resulting [ref] instance can be used to indirectly modify that variable's value.
  • Secondarily, you may also use [ref] as a general-purpose object holder.

    • In this usage, [ref] is applied to a value (data) - typically an instance of a value type.
    • In many scenarios you can use a regular variable or parameter instead, but this approach is useful as a concise way to modify a (value-type) value in a descendant scope - without having to explicitly pass a value holder (such as via a [ref] parameter).
      This technique is useful in scenarios where passing an explicit value holder is undesired (for brevity) or not possible (e.g., in script-block parameter values - see below).

Does that sound correct and comprehensive to you?


Your example inspired me to rethink a scenario in which I did use Get-Variable (clunkily) in the past:

If you use script-block parameter values, such as for calculating the value of Rename-Item's -NewName parameter from each pipeline input object, such script blocks run in a child scope, so modifying a variable in the caller's scope directly is not an option (and neither is passing arguments to the script block in this context).

I solved that problem with Get-Variable as follows (in this case, an index (sequence number) needed to be maintained in the caller's context):

$i = 0; $iVar = Get-Variable -Name i
Get-ChildItem -File $setPath | Rename-Item -NewName { ... $iVar.Value++ ...  }

But your technique enables a more elegant solution:

$iRef = [ref] 0
Get-ChildItem -File $setPath | Rename-Item -NewName { ... $iRef.Value++ ...  }

@SeeminglyScience
Collaborator

SeeminglyScience commented May 10, 2018

@mklement0

Does that sound correct and comprehensive to you?

Yes that is an excellent summary 👍

@mklement0
Contributor Author

Thanks, @SeeminglyScience.

I've transferred the relevant information to MicrosoftDocs/PowerShell-Docs#2402, so we can close this.

@iSazonov iSazonov added the Resolution-External The issue is caused by external component(s). label May 26, 2018
@yecril71pl
Contributor

yecril71pl commented Jul 28, 2020

SORT returns bogus results with [REF]:

, (
([REF] 0, [REF] 1), ([REF] 1, [REF] 0) |
 % { , ($_, { $_ | SORT -T:1 | % VALUE }) } |
 % { , ($_[0]) | % $_[1] }
) |
 % { $_[0] | SHOULD -BE $_[1] }

Explanation for hoomans (in case any come around):

  1. Create two equal but differently ordered sequences of integers, wrapping each element into a reference!
  2. Pair each sequence with an instruction to extract the value of each of the smallest elements of the result!
  3. Execute said instruction on each pair!
  4. Verify that both values are equal!

Is this a problem with SORT or a problem with [REF]?
Workaround:

, (
([REF] 0, [REF] 1), ([REF] 1, [REF] 0) | % { , ($_, { $_ | SORT VALUE -T:1 | % VALUE }) } | % { , ($_[0]) | % $_[1] }
) |
 % { $_[0] | SHOULD -BE $_[1] }

The workaround means to explicitly sort by value.
OTOH, if I replace SORT -T:1 with MEASURE -MIN, MEASURE correctly fails.

@SeeminglyScience
Copy link
Collaborator

@yecril71pl ref just isn't sortable. It's probably doing ToString which will result in the same string for all ref's of the same type.
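A quick check that is consistent with this guess (it does not confirm what Sort-Object does internally): two [ref] instances that wrap different integers produce the same string representation, while their .Value properties compare as expected. The variable names are hypothetical.

```powershell
$a = [ref] 0
$b = [ref] 1

# Both wrappers stringify to their type name, so the strings are identical.
$a.ToString() -eq $b.ToString()   # True

# The wrapped values themselves are comparable.
$a.Value -lt $b.Value             # True
```

This is why sorting explicitly by the Value property, as in the workaround above, gives a meaningful order.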
