[S.M.A.Internal.AutomationNull]::Value is treated like a collection when used with -match, -notmatch, -like, -notlike #3866

Open
KirkMunro opened this Issue May 25, 2017 · 21 comments

@KirkMunro
Contributor

KirkMunro commented May 25, 2017

Steps to reproduce

Set-Content -Value $null -Path .\zero.txt -NoNewline -Encoding Ascii
(get-item .\zero.txt).Length -eq 0 # returns $true
$content = gc .\zero.txt -Raw -Encoding Ascii
$content -eq $null # returns $true
$null -eq $content # returns $true
$content -match 'anything' # returns nothing, but should return $false
[System.Management.Automation.Internal.AutomationNull]::Value -match 'anything' # ditto
$null -match 'anything' # returns $false

Expected behavior

$true
$true
$true
$false
$false
$false

Actual behavior

$true
$true
$true
# returns nothing at all
# returns nothing at all
$false

Impact

This makes it more difficult to write scripts that process file content, because tests that should fail return nothing instead. If you check for a failure and then jump to the next iteration of a loop with continue, the continue is never reached, and unpredictable things can happen as a result.
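
To make the failure mode concrete, here is a minimal sketch. It assumes the behaviour reported in this issue for PowerShell 5.1 and 6.0, and it uses AutomationNull.Value directly as a stand-in for the content of a zero-byte file. The variable names are illustrative only.

```powershell
# Stand-in for the value that Get-Content -Raw is reported to return for a zero-byte file.
$content = [System.Management.Automation.Internal.AutomationNull]::Value

# As reported in this issue, this produces an empty result rather than $false.
$result = $content -match 'anything'

# Comparing an empty result with $false filters nothing and stays empty,
# which is falsy, so the "skipped" branch is not taken.
$branch = if ($result -eq $false) { 'skipped' } else { 'fell through' }
$branch
```

A script that relies on the 'skipped' branch to bail out of a loop iteration would therefore keep going with empty content.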

Environment data

Reproduced in PowerShell 5.1 and 6.0.

@iSazonov

iSazonov May 26, 2017

Collaborator

@KirkMunro Thanks for your report! Do you plan to make the fix?

@lzybkr

lzybkr May 26, 2017

Member

Interesting issue.

AutomationNull.Value is intended to convey "no results" which is different than $null. So I think this is by design.

And indeed, here is the code that explicitly treats AutomationNull.Value as an empty collection when we are checking if an object is a collection.

Maybe AutomationNull.Value could have been an empty collection in the first place (and hence not equal to $null), but that decision was made before I started.

@fmichaleczek

fmichaleczek May 26, 2017

I hope someone will resolve this annoying bug.

@KirkMunro

KirkMunro May 26, 2017

Contributor

If the AutomationNull.Value behaviour is by design, maybe this is an issue with Get-Content in the FileSystem provider.

My expectation is that when I invoke Get-Content filename on a text file, I will get back either a string or an array of strings, depending on the content and on whether or not I used -Raw. I have this expectation especially when I use the -Raw switch, but also when I don't. Certainly for a zero-byte text file, this should give me back an empty string. This expectation is not surprising given that the command metadata reports the OutputType as System.Byte or System.String. Further evidence that supports my expectation is the following:

# Create a zero-byte, empty ASCII file
Set-Content -LiteralPath .\empty.txt -Value '' -NoNewLine -Encoding Ascii
# I created the content using an empty string, so when I Get-Content -Raw,
# shouldn't I get back an empty string?
$content = Get-Content -LiteralPath .\empty.txt -Raw
$content -is [string] # returns $false

That script shows that you cannot round-trip empty content into a text file and back out again, because the command is returning AutomationNull.Value instead.

For this specific issue, given the questions about whether or not AutomationNull.Value should be treated as a collection, I think fixing Get-Content would be helpful; however, would that be a breaking change?

Maybe this is going to force me into using strong typing for my variables, because forcing the results of Get-Content into a string makes this problem go away. I feel that force shouldn't be necessary though, especially because I asked for the -Raw string output from the file.
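
As a minimal sketch of that strong-typing workaround: the snippet below uses AutomationNull.Value as a stand-in for the result of Get-Content -Raw on a zero-byte file (an assumption based on the behaviour reported in this issue), and the variable names are illustrative only.

```powershell
# Stand-in for what Get-Content -Raw reportedly returns for a zero-byte file.
$raw = [System.Management.Automation.Internal.AutomationNull]::Value

# Constraining the variable to [string] converts the "no results" value to an empty string.
[string]$text = $raw

$isString = $text -is [string]
$matched  = $text -match 'anything'
"IsString: $isString; Matched: $matched"
```

With the type constraint in place, -match sees a scalar string on its left-hand side and returns a Boolean.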

@iSazonov

iSazonov May 26, 2017

Collaborator

From the docs:

"This cmdlet returns strings or bytes. The output type depends upon the content that it gets."

So I expect:

  • Get-Content - return empty string
  • Get-Content -Raw - return empty string
  • Get-Content -Encoding Byte - return $null
@KirkMunro

KirkMunro May 26, 2017

Contributor

Why would you expect Get-Content -Raw to return $null? It always returns a string. Even for binary files. Unless the file is empty (in which case it makes sense for it to return an empty string, no?).

From the FileSystem provider documentation:

-Raw
Ignores newline characters. Returns contents as a single item.

@iSazonov

iSazonov May 26, 2017

Collaborator

Yes, Get-Content always returns a string, with -Raw and without. I agree that the cmdlet should return an empty string for an empty file.

@oising

oising May 26, 2017

Contributor

"Certainly in a zero-byte text file, this should give me back an empty string."

How does one know what the format of a zero byte file is? If PowerShell was required to maintain a mapping of all known file extensions/mime types based on extensions, and an associated "empty" result, that would be unmanageable. It would also be impossible on Linux, which has no hoots to give about file TLEs. :)

Update: This isn't directed at you, Kirk. Just a general statement. I realize that the standard seems to be byte or string. I guess strings are seen as just more manageable than void or $null - even empty ones.

@oising

oising May 26, 2017

Contributor

Also, isn't $null coerced to an empty string if required? I think the AutomationNull.Value idea was sound but it seems difficult to be consistent with. Argh...

@KirkMunro

KirkMunro May 26, 2017

Contributor

@oising But with Get-Content -Raw, in my testing it always returns string, regardless of the format of the file. PDF, BMP, ZIP, etc. So the file format has nothing to do with it. I may have seen this at one point, but right now I'm not sure when it returns bytes instead of strings, and since it returns strings for all files, that's why I think it makes sense to return an empty string for a zero-byte file.

I think AutomationNull.Value is sound when you invoke a command to get object data, like services or processes, and nothing comes back. I'm not sold on AutomationNull.Value as a way to represent an empty file though, when all other file content comes back as string.

I can work around this all sorts of ways (strong typing a variable as string and assigning the results of Get-Content to that variable, for example), but beyond the inconsistency, I think the potential to cause scripts to do unexpected (or maybe undesirable) things if a script encounters a zero-byte file warrants re-thinking the original design decision (while evaluating whether or not it's a breaking change that could break someone's code). The current behaviour is not intuitive enough to be considered in scripts, which is why I brought it here as a bug to discuss.

@rkeithhill

rkeithhill May 27, 2017

Contributor

Get-Content will return bytes if you specify -Encoding Byte. If the file is empty, then you get a $null. So the user can determine whether the file is read as text (encoded as ascii, unicode, utf8) or binary by specifying the appropriate encoding.

@KirkMunro

KirkMunro May 27, 2017

Contributor

Thanks @rkeithhill. I knew I had seen it before, but it wasn't something I have used frequently.

@mklement0

mklement0 May 30, 2017

Contributor

There are two distinct issues here:

  • (a) Get-Content's behavior

  • (b) [System.Management.Automation.Internal.AutomationNull]::Value behavior as the LHS of array-aware operators.

(a) Get-Content's behavior

I agree that Get-Content -Raw when given an empty input file should return a scalar rather than [System.Management.Automation.Internal.AutomationNull]::Value, the latter signaling an empty collection.

By contrast, it is appropriate - and consistent with current behavior - for Get-Content without -Raw to return [System.Management.Automation.Internal.AutomationNull]::Value, because a collection is expected - be that one of lines or bytes (with -Encoding Byte).

Arguably, with -Raw that scalar should be '' (the empty string, which unambiguously implies an empty file), but even a bona fide $null is preferable to the current behavior.

@PetSerAl has done great sleuthing on SO to come up with a way to inspect whether a given value is actually $null or [System.Management.Automation.Internal.AutomationNull]::Value:

New-Item -Type File zero.txt # create 0-byte file

$refEquals=[Object].GetMethod('ReferenceEquals')

# Should be and is $True
$refEquals.Invoke($null, @((Get-Content zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))

# Should be and is $True
$refEquals.Invoke($null, @((Get-Content -Encoding Byte zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))

# !! Should be $False, but is $True
$refEquals.Invoke($null, @((Get-Content -Raw zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))

(b) [System.Management.Automation.Internal.AutomationNull]::Value behavior as the LHS of array-aware operators.

In short: The treatment of [System.Management.Automation.Internal.AutomationNull]::Value is inconsistent:

  • -match interprets [System.Management.Automation.Internal.AutomationNull]::Value as an array (collection)
  • while -eq, -le, -ge and their variations treat it as (scalar) $null (haven't looked at others)

[System.Management.Automation.Internal.AutomationNull]::Value -match 'anything'

returning "nothing" (an empty [System.Object[]] instance) is defensible: an empty collection as the LHS to which a filtering operator is applied can only ever return that empty collection, albeit converted to an empty array by PowerShell.
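
The scalar versus collection behaviour of -match that this argument rests on can be sketched with ordinary values. The sample strings and variable names are illustrative only.

```powershell
# Scalar LHS: -match answers a yes/no question.
$scalarResult = 'abc' -match 'b'

# Collection LHS: -match filters and returns the matching elements as an array.
$filtered = @('abc', 'xyz', 'bcd') -match 'b'

# Empty collection LHS: there is nothing to filter, so the result is empty.
$empty = @() -match 'b'

"Scalar: $scalarResult"
"Filtered count: $($filtered.Count)"
"Empty count: $(@($empty).Count)"
```

An empty result prints nothing at the prompt, which is why the AutomationNull.Value case in the original report appears to "return nothing at all".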

By contrast, here are some sample commands that demonstrate (scalar) $null treatment with -eq, -le, and -ge:

> [System.Management.Automation.Internal.AutomationNull]::Value -eq  $null; $null -eq $null
True
True
> [System.Management.Automation.Internal.AutomationNull]::Value -eq 0; $null -eq 0
False
False

# With a negative RHS instead of 0, this yields $False.
> [System.Management.Automation.Internal.AutomationNull]::Value -le 0; $null -le 0
True
True

# With a negative RHS instead of 0, this yields $True.
> [System.Management.Automation.Internal.AutomationNull]::Value -ge 0; $null -ge 0
False
False

On a side note, I find it baffling that comparing $null to anything other than $null can return $true: for instance, why are $null -lt 0 and $null -gt -1 $true?

@KirkMunro

KirkMunro May 30, 2017

Contributor

You actually don't need to use reflection to identify automation null. For example:

$x = $null
$y = [System.Management.Automation.Internal.AutomationNull]::Value
foreach ($item in 'x','y') {
    $value = Get-Variable -Name $item -ValueOnly
    if ($value -eq $null) {
        if (@($value).Count -eq 0) {
            "`$${item} is [System.Management.Automation.Internal.AutomationNull]::Value"
        } else {
            "`$${item} is `$null"
        }
    }
}

I just added a comment to @PetSerAl's post sharing the same information.

For Get-Content's behaviour, I still expect an empty string when invoking Get-Content with encoding set to anything other than Byte. If you invoke Get-Content against a file containing a single line of text, you get back a string, not an array. An empty string is a much better representation of an empty file than $null. Consider non-ASCII files (e.g. UTF-8). They have a byte order mark included in them, so would $null be a good representation of their content when retrieved using the proper encoding?

All of these details aside, before I spend more time on this and before I could consider looking at the code to apply a fix for this, my concern is that these changes, regardless of what form they would take, would be breaking changes and rejected accordingly, resulting in wasted time and effort. The more I think about it, the more I feel that is what would happen, because someone may very well have scripts written that look something like this:

foreach ($filePath in Get-ChildItem -Recurse -Filter *.txt) {
    $content = @(Get-Content $filePath)
    # If the file is empty, skip it
    if ($content.Count -eq 0) {
        continue
    }
    # Other file processing goes here...
}

Or, considering the use of -Raw, someone may have scripts that do this:

foreach ($filePath in Get-ChildItem -Recurse -Filter *.txt) {
    $content = Get-Content $filePath -Raw
    # If the file is empty, skip it
    if ($content -eq $null) {
        continue
    }
    # Other file processing goes here...
}

With those possibilities in mind, the proposed changes to Get-Content should be rejected as breaking changes, regardless of whether or not we change the result when it is not invoked with -Raw, shouldn't they?
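
For calling scripts, one way to stay independent of that decision is to test truthiness rather than compare with $null. The following is only a sketch: the scratch folder and file names are hypothetical, and it relies on $null, AutomationNull.Value and the empty string all evaluating to false in a Boolean context.

```powershell
# Hypothetical scratch folder holding one non-empty and one zero-byte file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'empty-file-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'full.txt') -Value 'some text'
New-Item -ItemType File -Path (Join-Path $dir 'empty.txt') -Force | Out-Null

$processed = foreach ($file in Get-ChildItem -LiteralPath $dir -Filter *.txt) {
    $content = Get-Content -LiteralPath $file.FullName -Raw
    # Skip the file whether the empty content arrives as "nothing", $null or ''.
    if (-not $content) { continue }
    $file.Name
}
$processed

Remove-Item -LiteralPath $dir -Recurse -Force
```

Only the non-empty file reaches the processing step, regardless of which of the three values an empty file produces.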

That brings me back to how AutomationNull.Value is treated like a collection when used with -match/-notmatch or -like/-notlike, but not -eq/-ne. If it looks like $null but doesn't act like $null, it must be AutomationNull.Value. Try explaining how AutomationNull.Value works, coupled with considerations you should take into account when you are scripting around AutomationNull.Value, to a classroom and see how well they understand it afterwards.

@mklement0

mklement0 May 30, 2017

Contributor

@KirkMunro:

Thanks for that handy alternative for detecting [System.Management.Automation.Internal.AutomationNull]::Value; to summarize:

> $scalarNull = $null; $collectionNull = [System.Management.Automation.Internal.AutomationNull]::Value
> @($scalarNull).Count
1
> @($collectionNull).Count
0
@mklement0

mklement0 May 30, 2017

Contributor

@KirkMunro:

"For Get-Content's behaviour, I still expect an empty string when invoking Get-Content with encoding set to anything other than Byte"

I would expect that with -Raw only, but not otherwise. Without -Raw, Get-Content inherently retrieves a collection of lines, and returning a value that signals "no items in this collection" seems appropriate.

"Consider non-ASCII files (e.g. UTF-8). They have a byte order mark included in them, so would $null be a good representation of their content when retrieved using the proper encoding?"

With -Raw, distinguishing between a true zero-byte file and one solely comprising a BOM (Unicode signature) would be the only argument for using $null for a zero-byte file and '' for a Unicode-signature-only file.
But my sense is that this distinction is not worth making.

"would be breaking changes and rejected accordingly"

Note that a proposed change being breaking may be, but doesn't have to be, a reason for rejection.

"because someone may very well have scripts written that look something like this"

    $content = @(Get-Content $filePath)
    # If the file is empty, skip it
    if ($content.Count -eq 0) {

That should continue to work fine, if a change is restricted to -Raw's behavior.

"Or, considering the use of -Raw, someone may have scripts that do this:"

    $content = Get-Content $filePath -Raw
    if ($content -eq $null) {
        continue
    }

That would indeed be a breaking change (unless we make -Raw return $null with zero-byte and Unicode-signature-only files - which is still worth considering, given that $null behaves like '' in most contexts).

"That brings me back to how AutomationNull.Value is treated like a collection when used with -match/-notmatch or -like/-notlike, but not -eq/-ne. If it looks like $null but doesn't act like $null, it must be AutomationNull.Value. Try explaining how AutomationNull.Value works, coupled with considerations you should take into account when you are scripting around AutomationNull.Value, to a classroom and see how well they understand it afterwards."

I agree that the current behavior is inconsistent and confusing.

No PowerShell user should ever have to learn about [System.Management.Automation.Internal.AutomationNull]::Value (unless they like that sorta thing), and if the fundamental scalar / collection distinction worked consistently, they wouldn't need to.


To summarize: If backward compatibility weren't an issue, the following should be fixed:

  • What Get-Content -Raw returns.

  • Ensuring consistent behavior of array-aware operators with [System.Management.Automation.Internal.AutomationNull]::Value as the LHS.

Assuming we're in agreement there: What do the powers that be think?

Contributor

mklement0 commented May 30, 2017

@KirkMunro:

For Get-Content's behaviour, I still expect an empty string when invoking Get-Content with encoding set to anything other than Byte

I would expect that with -Raw only, but not otherwise. Without -Raw, Get-Content inherently retrieves a collection of lines, and returning a value that signals "no items in this collection" seems appropriate.

Consider non-ASCII files (e.g. UTF-8). They have a byte order mark included in them, so would $null be a good representation of their content when retrieved using the proper encoding?

With -Raw, distinguishing between a true zero-byte file and one solely comprising a BOM (Unicode signature) would be the only argument for using $null for a zero-byte file and '' for an Unicode-signature-only file.
But my sense is that this distinction is not worth making.

would be breaking changes and rejected accordingly

Note that a proposed change being breaking may be, but doesn't have to be a reason for rejection.

because someone may very well have scripts written that look something like this

    $content = @(Get-Content $filePath)
    # If the file is empty, skip it
    if ($content.Count -eq 0) {

That should continue to work fine, if a change is restricted to -Raw's behavior.

Or, considering the use of -Raw, someone may have scripts that do this:

    $content = Get-Content $filePath -Raw
    if ($content -eq $null) {
        continue
    }

That would indeed be a breaking change (unless we make -Raw return $null with zero-byte and Unicode-signature-only files - which is still worth considering, given that $null behaves like '' in most contexts).

That brings me back to how AutomationNull.Value is treated like a collection when used with -match/-notmatch or -like/-notlike, but not -eq/-ne. If it looks like $null but doesn't act like $null, it must be AutomationNull.Value. Try explaining how AutomationNull.Value works, coupled with considerations you should take into account when you are scripting around AutomationNull.Value, to a classroom and see how well they understand it afterwards.

I agree that the current behavior is inconsistent and confusing.

No PowerShell user should ever have to learn about [System.Management.Automation.Internal.AutomationNull]::Value (unless they like that sorta thing), but if the fundamental scalar / collection distinction worked consistently, they wouldn't need to.


To summarize: If backward compatibility weren't an issue, the following should be fixed:

  • What Get-Content -Raw returns.

  • The inconsistent behavior of array-aware operators with [System.Management.Automation.Internal.AutomationNull]::Value as the LHS.

Assuming we're in agreement there: What do the powers that be think?

@KirkMunro

KirkMunro May 31, 2017

Contributor

There's one point in your comment that I get stuck on.

Without -Raw, Get-Content inherently retrieves a collection of lines, and returning a value that signals "no items in this collection" seems appropriate.

That's actually not true. If a file contains one line, Get-Content does not return a collection of one item. It just returns the only line that is in the file (i.e. it returns a string). That's why I was leaning towards the behaviour of both Get-Content and Get-Content -Raw being consistent when the file is either empty or when there is one line.

Regardless, I also want to hear from the PowerShell team because the point may be moot otherwise.
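For reference, the zero/one/many behavior under discussion can be checked with a short sketch like the following. The temp files and variable names are only for illustration.

```powershell
# Create three temp files: empty, one line, two lines.
$empty = (New-TemporaryFile).FullName      # zero-byte file
$one   = (New-TemporaryFile).FullName
$two   = (New-TemporaryFile).FullName
Set-Content -Path $one -Value 'only line'
Set-Content -Path $two -Value 'first', 'second'

$fromOne = Get-Content -Path $one
$fromTwo = Get-Content -Path $two

# One line comes back as a scalar string, two lines as an array.
$typeOne = $fromOne.GetType().FullName     # System.String
$typeTwo = $fromTwo.GetType().FullName     # System.Object[]

# Wrapping in @() gives a uniform array view of all three cases.
$countEmpty = @(Get-Content -Path $empty).Count   # 0
$countOne   = @(Get-Content -Path $one).Count     # 1
$countTwo   = @(Get-Content -Path $two).Count     # 2

"$typeOne / $typeTwo / $countEmpty, $countOne, $countTwo"

Remove-Item -Path $empty, $one, $two
```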

@rkeithhill

rkeithhill May 31, 2017

Contributor

It just returns the only line that is in the file (i.e. it returns a string).

And that is the PowerShell way, no? I mean that is one reason we have @() to force an array when we get a scalar. It is also why foreach will iterate a scalar (exactly once).
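A small sketch of both points, with illustrative variable names:

```powershell
$single = 'only line'

# @() wraps a scalar in a one-element array.
$wrapped = @($single)
$wrappedType  = $wrapped.GetType().FullName   # System.Object[]
$wrappedCount = $wrapped.Count                # 1

# foreach treats a scalar as a one-item sequence.
$iterations = 0
foreach ($item in $single) { $iterations++ }

# @() around a command with no output gives an empty array.
$emptyWrapped = @(& { })
$emptyCount = $emptyWrapped.Count             # 0

"$wrappedType, $wrappedCount, $iterations, $emptyCount"
```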

@mklement0

mklement0 May 31, 2017

Contributor

If you'll indulge me:

The arc of PowerShell history is long, but it bends toward collections.

The special-casing of one-element collections has always been a pain point - until PSv3, the Great Unifier, came along and allowed us to treat even scalars as if they're collections.

that is one reason we have @()

In that vein: ... and which we now mostly don't need anymore (except if there's a chance that an element of the collection has .Length / .Count properties or itself supports indexing).

In short: The PowerShell Way, methinks, is: Everything's a collection, unless told otherwise (such as with Get-Content -Raw).
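As a rough illustration of the PSv3+ unification and of the exception just mentioned (run without Set-StrictMode; the variable names are illustrative):

```powershell
# A scalar can be treated like a one-element collection.
$scalar = 42
$scalarCount = $scalar.Count      # 1
$scalarFirst = $scalar[0]         # 42

# The exception: a string has its own .Length and its own indexer,
# so these refer to characters, not to collection elements.
$text = 'abc'
$textLength = $text.Length        # 3 (number of characters)
$textFirst  = $text[0]            # the [char] 'a'

# With @(), the string is reliably the element of a collection again.
$wrappedFirst = @($text)[0]       # 'abc'
$wrappedCount = @($text).Count    # 1

"$scalarCount, $scalarFirst, $textLength, $textFirst, $wrappedFirst, $wrappedCount"
```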

@iSazonov

iSazonov May 31, 2017

Collaborator

Based on the discussion above, and considering that Get-Content's behavior can be reproduced by other (third-party) cmdlets, we should exclude Get-Content from this issue and consider only the "LHS of array-aware operators" option.

@mklement0

mklement0 Jun 1, 2017

Contributor

Good idea to separate the two discussions:

  • The Get-Content issue, irrespective of its specific significance, is worth considering separately, because I think getting clarity on the underlying scalar-vs.-collection debate is important for the future (as an aside: I don't think deciding whether or not to change Get-Content should be based solely on whether the issue can be worked around):

    • I've therefore created #3911, based on the following assertion:
      None is a special case of many, not one.

Now that the focus of this issue is on the behavior of array-aware operators with [System.Management.Automation.Internal.AutomationNull]::Value as the LHS, let me summarize:

Note: For brevity, and to give the construct a more memorable name, I'll refer to [System.Management.Automation.Internal.AutomationNull]::Value as a null collection from now on.

  • All array-aware operators should treat a null collection the same, which is currently not the case: among the operators discussed so far, only -match treats a null collection as an array, whereas the others (-eq, -ge, ...) treat it like (scalar) $null.

  • If null collections should categorically be treated like arrays (which makes sense to me), the secondary question is whether they should - invariably, by definition - return:

    • either: an empty [System.Object[]] instance, as -match currently does.
    • or: a null collection ([System.Management.Automation.Internal.AutomationNull]::Value) too.
  • The alternative approach is to categorically treat null collections as $null in the context of expressions, because, according to the documentation:

When received in an evaluation where a value is required, it should be replaced with null.


To help with experimenting, I thought I'd provide a convenience function, Test-Null, that makes it easier to distinguish between $null and [System.Management.Automation.Internal.AutomationNull]::Value:

A few sample calls:

> New-Item -Type File zero.txt

> (Get-Content zero.txt) | Test-Null
(null collection)

> (Get-Content -Raw zero.txt) | Test-Null
(null collection)

> (Get-Content -Encoding Byte zero.txt) | Test-Null
(null collection)

> $null | Test-Null
$null

> Test-Null ((Get-Content -Raw zero.txt) -match 'anything')
[System.Object[]]   # an empty array - null collection was treated like empty array

> Test-Null ((Get-Content -Raw zero.txt) -gt 0)
[System.Boolean]   # Boolean - null collection was treated like scalar $null

> Test-Null ($null -gt 0)
[System.Boolean]

Important:

  • To distinguish a null collection from an empty collection object, use the (implied) -InputObject parameter.

  • To distinguish $null from a null collection ([System.Management.Automation.Internal.AutomationNull]::Value), use the pipeline.

<#
.SYNOPSIS
Tests if the (first) input object is non-$null, an explicit (scalar) $null, or 
a null collection.

.DESCRIPTION

IMPORTANT: Choose between pipeline and parameter input depending on what cases
           you need to distinguish:
 
 * To distinguish between $null and a null collection, use *pipeline* input.

 * To distinguish between a null collection and an empty collection object,
   use the (implied) -InputObject *parameter*.
     * Note: Any collection you specify is treated as a *single* input object.

Output is a string that indicates one of 3 conditions; if there is more than
1 (non-null-collection) value in the pipeline, ' ...' is appended.

* '$null' ... an explicit, scalar $null value 

* '(null collection)' ... the [System.Management.Automation.Internal.AutomationNull]::Value
  singleton that is returned behind the scenes by cmdlet or function calls 
  that produce no output.

* '[<type>]' ... the full type name of the (first) input object, which implies
  an object that is neither $null nor the null collection.

.NOTES
Caveat re multiple pipeline input objects:
The type of the 1st object OTHER THAN 
[System.Management.Automation.Internal.AutomationNull]::Value is reported.
Hypothetically, you could send something like
  [System.Management.Automation.Internal.AutomationNull]::Value, 'foo' |
    Test-Null
in which case it is "foo"'s type - [System.String] - that is reported.

.EXAMPLE
> $noSuchVar | Test-Null
$null
.EXAMPLE
> Get-ChildItem noSuchFiles* | Test-Null 
(null collection)
.EXAMPLE
> Get-ChildItem / | Test-Null
[System.IO.DirectoryInfo] ...
.EXAMPLE
> & { return } | Test-Null
(null collection)
.EXAMPLE
> & { return $null } | Test-Null
$null
#>
function Test-Null {
  param(
    [AllowEmptyCollection()]
    [AllowEmptyString()]
    [AllowNull()]
    [Parameter(ValueFromPipeline)]
    $InputObject
  )

  begin {
    $havePipelineInput = $MyInvocation.ExpectingInput
    $didEnumerate = $false
    $multiplePipelineObjects = $false
    $firstInputObj = $InputObject
  }

  process {
    if ($didEnumerate) { $multiplePipelineObjects = $true; return }
    $firstInputObj = $InputObject
    $didEnumerate = $True
  }

  end {
    if ($havePipelineInput -and -not $didEnumerate) {
      '(null collection)'
      # Issue a courtesy hint re inability to detect an *empty collection* object via the pipeline.
      Write-Verbose -Verbose 'Hint: To distinguish a null collection from an empty collection object, use -InputObject.'
    } elseif (-not $havePipelineInput -and -not $PSBoundParameters.ContainsKey('InputObject')) {
      Throw "Please provide input either via the pipeline or via the (implied) -InputObject parameter."
    } else {
      if ($null -eq $firstInputObj) {  # $null
        '$null' + ' ...' * $multiplePipelineObjects
        if (-not $havePipelineInput) { 
          # Issue a courtesy hint re inability to detect a null collection as a *parameter* value.
          Write-Verbose -Verbose 'Hint: To distinguish $null from a null collection, use the pipeline.'
        }
      } else { # (at least 1) non-$null object
        "[$($firstInputObj.GetType().FullName)]" + ' ...' * $multiplePipelineObjects
      }
    }
  }
}