about_parsing additional expression mode start characters #3440

msftrncs · 2018-12-16T07:04:59Z

Issue Details

I have left the template blank at this time, but I believe this might affect all versions.

In argument mode, each value is treated as an expandable string unless it begins with one of the following special characters: dollar sign ($), at sign (@), single quotation mark ('), double quotation mark ("), or an opening parenthesis (().

If preceded by one of these characters, the value is treated as a value expression.

On about_parsing, there are additional characters that trigger expression mode, including:

'-' or '+' or '!', as long as it is in turn followed by a numeric value or one of the other mentioned characters, i.e., a valid expression.

-$a, -2, and !$a are parsed in expression mode
-hello, +hello, and !test are parsed in argument mode

Also, I think the last line should actually be part of the bulleted paragraph.

I think this whole section also fails to explain that the mode determination is based on the first token of each command statement and then applies to the remainder of the statement, noting that parentheses and braces begin a new sub-statement. Instead, this section implies that the mode is determined for each token (it mentions the tokens are interpreted independently, but they are not, as interpretation of all tokens after the first one depends on the first one's evaluation). Assuming a function named 'hello', in hello $a-3 the token $a-3 is still treated as an expandable string, not an expression. An example in the doc also shows this: Write-Output $a/H.
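
A minimal sketch of that point, assuming a throwaway function named hello (hypothetical, it only returns its arguments) and a variable $a set to 5. It is meant as an illustration of the first-token rule described above, not as a statement of how the parser is implemented:

```powershell
# Hypothetical function that simply returns the arguments it receives.
function hello { $args }

$a = 5

# The right-hand side starts with '-' followed by a variable reference,
# so it is evaluated as an arithmetic expression.
$negated = -$a

# The right-hand side starts with a command name, so the rest is parsed in
# argument mode: $a-3 is expanded like the string "$a-3", not evaluated as 5 - 3.
$argument = hello $a-3

$negated.GetType().Name    # Int32
$argument.GetType().Name   # String
$argument                  # 5-3
```

Wrapping the argument in parentheses, as in hello ($a-3), starts a nested context whose first token is a variable, so the subtraction is evaluated there.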

Also, I think the word 'value' may have gotten used where 'token' should have been used, for consistency. The doc started off talking about 'tokens' and then switches to 'values' in the bulleted items.

Document Details

Do not edit this section. It is required for docs.microsoft.com GitHub issue linking.

ID: efd8c446-9ff3-8331-a996-4fe6226f2c6d
Version Independent ID: b9957906-f58b-86de-7907-cc2f94174af3
Content: about_Parsing - PowerShell
Content Source: reference/7.0/Microsoft.PowerShell.Core/About/about_Parsing.md
Product: powershell
Technology: powershell-conceptual
GitHub Login: @SteveL-MSFT
Microsoft Alias: slee


msftrncs · 2018-12-18T04:58:33Z

I noticed today that the opening bracket [ may also be one of those characters that define expression mode.

mklement0 · 2019-03-30T02:59:31Z

@msftrncs, some good points, but I don't think -, +, or [ force expression mode, as the following examples demonstrate:

(write-output -10).GetType().Name
(write-output +10).GetType().Name
(write-output !0).GetType().Name
(write-output [string]).GetType().Name

All these commands output String, indicating that the arguments were parsed as strings (following these chars. with a variable reference wouldn't make a difference).

Unquoted "number-looking" literals without a sign - e.g., 10, 0xa, 2.0 - are, however, half-parsed in expression mode: they are parsed as (suitably typed) numbers that, however, retain their original string representation via their [psobject] wrapper:

PS> (Write-Output 0xa).GetType().Name
Int32   # !! Parsed as number

# However, on output the original string representation is retained:
PS> Write-Output 0xa
0xa  # !! Not 10, as you would get with Write-Output (0xa)

This awkward hybrid behavior must be retained for backward compatibility, however; it is implemented inconsistently in PowerShell code, unfortunately: see PowerShell/PowerShell#9157 for background information (note what said issue proposes as a resolution is misguided - I'm planning to revise it soon).

However, a char. that is missing from the list of expression-initiating chars. is {, because an unquoted {...} token is parsed as a script block:

PS> Write-Output {ha}
ha   # !! stringified script-block == literal contents between { and }

As for compound tokens such as $a/H, see #3038, whose basis is PowerShell/PowerShell#6467.

msftrncs · 2019-03-30T03:08:04Z

@mklement0, in the examples of (write-output xxx), you have already switched to argument mode. My issue concerns the point at which the decision between expression mode and argument mode is made. Once in a given mode, that mode remains until certain delimiting characters come along, such as the closing ')' in your examples. The opening '(' does two things: it switches the mode to expression mode, but then enters a subexpression, which starts with a new decision of argument mode or expression mode. The command name sends the sub-expression to argument mode. If instead of a command name, a + character appeared, it would have switched to expression mode.
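
As a small illustration of the nested decision described here (a sketch only; the variable names $sum and $values are made up for the example), the same text inside parentheses is handled differently depending on what the first token after '(' is:

```powershell
# '(' opens a nested context whose first token is a number: expression mode,
# so '+' is the addition operator.
$sum = (10 + 5)

# '(' opens a nested context whose first token is a command name: argument mode,
# so '+' is just another argument and three values come back.
$values = (Write-Output 10 + 5)

$sum            # 15
$values.Count   # 3
```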

msftrncs · 2019-03-30T03:10:57Z

I should note, regarding my previous comment about '[', that it results in expression mode only if the content after the '[' is a type name, and not an attribute, so the '[' by itself does not mark the switch to expression mode.

mklement0 · 2019-03-30T03:23:58Z

@msftrncs: I thought this issue is about what characters at the start of a token in argument mode decide whether argument mode or expression mode is chosen for that argument - that's what the passage from the docs you're quoting in the initial post is about - and that passage is missing {.

My point was that +, -, and [ at the start of a token in argument mode do not switch to expression mode - are we in agreement there?

[ starting something that looks like a type literal doesn't change anything:

PS> Write-Output [int]
[int]  # string literal
PS> (Write-Output [int]).GetType().Name
String

Now, intra-token use of special characters seems to follow the same rules: inside an unquoted token, encountering one of the special chars.:

- starts a new parsing context
- implicitly ends the previous argument

PS> Write-Output a(2)  # parses as *2* arguments
a  # string literal 'a'
2 # [int] 2 due to (2) being parsed as an expression

Again, a + does not do that:

PS> Write-Output foo+10
foo+10  # single string literal

mklement0 · 2019-03-30T03:34:39Z

@msftrncs: I think I now understand where the confusion lies:

You are talking about what characters determine whether to enter argument or expression mode, either at the start of a statement (start of a line or after ; or |) or after (, $( , @(, or { have forced a new parsing context in argument mode.

And, yes, I agree that it's worth spelling out how that decision is made in the docs.

msftrncs · 2019-03-30T03:53:00Z

@mklement0,

You are talking about what characters determine whether to enter argument or expression mode, either at the start of a statement (start of a line or after ; or |) or after (, $( , or @( have forced a new parsing context in argument mode.

Correct, as that is what I take the 'about_parsing' document to be referring to.

mklement0 · 2019-03-30T04:34:00Z

@msftrncs:

about_Parsing currently discusses the two parsing modes separately.
The passage you quote in your initial post relates to already being in argument mode ("In argument mode, ..."), and what rules apply to command arguments - it is that behavior that I've tried to clarify in my previous comments.

What's missing is a description of what determines which parsing mode is chosen when:

The rules, from what I understand, are (I have not looked at the source code):

A new parsing context (in which the decision between argument and expression mode must be made) is entered:
- at the start of a new statement (e.g., the start of a line or after a statement-separating ;)
- at the start of a new pipeline segment (after |), though semantically only commands (argument mode), not expressions are allowed there.
- inside $(...), @(...), (...), and {...} in expression mode
- inside $(...) in double-quoted strings
- inside $(...), @(...), (...), and {...} in argument mode, where recognized as such
Argument mode is entered:
- if the first token is syntactically an unquoted command name (e.g., Get-Date or git)
- or it is one of the command-invocation operators, & or .
  - & is by itself unequivocally the call operator.
  - ., by contrast, is only recognized as the dot-sourcing operator if followed by a space, $, (, $(, ', ", or {
    - otherwise:
      - if followed by a decimal integer, it is interpreted as a decimal fraction and therefore starts an expression (e.g., .7)
      - otherwise: it is interpreted as the start of a command name (e.g., .foo)
Expression mode is entered with any of the following:
- The characters that are also special when already in argument mode (which includes {, as mentioned)
  - To recap, these are: $ @ ' " ( {
  - $(, @(, and ( start subexpressions ($(...), @(...), (...)) that constitute a new, nested parsing context, inside of which the parsing mode is determined anew.
  - Similarly, {, which starts a script block ({...}), constitutes a new, nested parsing context.
- [ - it is invariably interpreted as the start of a type literal such as [int].
- +
- -, but only if followed by +, -, a space, a number literal, or any of the special characters from argument mode.
  - Counterexample: -foo is interpreted as a command name (argument mode).
- ., but only if followed by a decimal integer without sign; e.g., .123, which is the same as 0.123 and therefore a [double].
- Number literals (e.g., 10, 0xa, 2.0, 1e2) - optionally preceded by + or -
  - A digit by itself does not necessarily start an expression; counterexample: 7z
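
One way to check rules like these without running the commands is to ask the parser which AST node it builds for the first pipeline element. The following is a minimal sketch: Get-ParseMode is a hypothetical helper name, and mapping CommandAst to "argument" and CommandExpressionAst to "expression" is an assumption made for illustration, not terminology taken from the docs. It uses System.Management.Automation.Language.Parser, so it needs PowerShell 3.0 or later.

```powershell
# Hypothetical helper: reports which kind of AST node the parser builds for
# the first pipeline element of a source string.
function Get-ParseMode {
    param([Parameter(Mandatory)][string] $Source)

    $tokens = $null
    $errors = $null
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($Source, [ref] $tokens, [ref] $errors)

    # CommandAst and CommandExpressionAst both derive from CommandBaseAst.
    $first = $ast.Find({ param($node) $node -is [System.Management.Automation.Language.CommandBaseAst] }, $true)

    if ($first -is [System.Management.Automation.Language.CommandAst]) { 'argument' }
    elseif ($first -is [System.Management.Automation.Language.CommandExpressionAst]) { 'expression' }
    else { 'unknown' }
}

# Print the classification for a few of the first tokens discussed above.
foreach ($source in 'Get-Date', '& foo', '-foo', '-2', '.7', '.foo', '[int]', '7z', '10') {
    '{0,-10} {1}' -f $source, (Get-ParseMode $source)
}
```

Because only parsing takes place, names such as foo or 7z do not have to exist as commands for the check to work.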

msftrncs · 2019-03-30T05:22:49Z

I understand what you are getting at, @mklement0. It's dawned on me now that this particular document is trying to describe something different from what I am thinking. However, I still have a problem with it. I think it's describing the wrong subject, and it's phrasing everything poorly. I think it was understood well from other areas that '(' starts a subexpression. However, '$' does not start an expression in the same sense. Yes, I can reference members of the object, but I cannot go beyond that, so I do not feel that is an expression, and definitely not expression mode. (I consider it a reference.)

I think we both agree that people need a better explanation of the mode switch that occurs when you start a statement with a function name, versus a variable reference (as one example). A function name expects to be followed by parameter arguments, which may be expressions if that's what they are called, but expression mode allows arithmetic and the other operators, which cannot be used directly in an argument without a subexpression, and this is what I always thought the 'about_parsing' document was trying to describe.

BTW, Here is my list of what is allowed with the dot-source operator, in REGEX form, as is in my PR #156 in EditorSyntax:

\.(?=\*?[\s,;&|{}]|\$[\p{L}$?^:_{])

The . may be followed directly by a *, or the characters you stated, or a &, or a |. I am missing the quotes, and the sub-statement. I don't know how I determined a '*', but I cannot confirm that works now. I see I need to add more, and clean up the '*'. (Yes, the & or | represent invalid arguments, but from a syntax highlighting point of view, they have to be allowed.)

Everything else you list I have handled.

Ultimately I think I will close this issue. I think the document needs to be clarified, but that can be a new issue that starts off in the right context.

mklement0 · 2019-03-30T05:45:24Z

Thanks, @msftrncs.

Fully agreed that there's a lot of room for improvement of the help topic at hand.

Good point re a variable reference (e.g. $var) at the start of an argument in argument mode not being a full expression, though you can access a property or call a method; the help topic currently calls it a value expression, but that term isn't defined.
Inside an argument, string expansion rules then apply, where you can only reference a variable as a whole (Write-Output $PSVersionTable.PSEdition -> 'Core' vs. Write-Output foo$PSVersionTable.PSEdition -> 'fooSystem.Management.Automation.PSVersionHashTable.PSEdition')
I just remembered that I tried to write a comprehensive overview of how unquoted arguments are parsed in this Stack Overflow answer.
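
The difference between a variable reference at the start of an argument and one inside it can be reproduced with any object. A minimal sketch, using a hashtable named $info as a hypothetical stand-in for $PSVersionTable so that the result does not depend on the PowerShell edition:

```powershell
# Hypothetical stand-in for an object with a property.
$info = @{ Edition = 'Core' }

# Variable reference at the start of the argument: member access is evaluated.
$atStart = Write-Output $info.Edition

# Variable reference inside the argument: only $info is expanded, and
# '.Edition' is appended as literal text.
$inside = Write-Output foo$info.Edition

# Wrapping the member access in $(...) evaluates it inside the argument.
$wrapped = Write-Output foo$($info.Edition)

$atStart   # Core
$inside    # fooSystem.Collections.Hashtable.Edition
$wrapped   # fooCore
```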

msftrncs · 2019-03-30T06:10:47Z

On the unquoted expansion, the only thing I am having problems parsing in TextMate is a reference such as

echo $a$b$c.length

You get .length (if all variables are unassigned), but

echo $a.length$b.length$c.length yields '0 0 0' (each on separate lines)

mklement0 · 2019-04-01T03:19:52Z

Yeah, that's surprising:

$a$b$c.length is treated like "$a$b$c.length", i.e., like an expandable string.

$a.length$b.length$c.length is treated like $a.length $b.length $c.length, i.e., 3 distinct arguments.
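
With assigned values the two cases are easier to tell apart. A short sketch, assuming the behavior described in this thread; the variable values are arbitrary and @(...) is used only to count the resulting arguments:

```powershell
$a = 'x'; $b = 'y'; $c = 'zz'

# Starts with a simple variable reference: the whole token is treated like
# the expandable string "$a$b$c.length", giving a single argument.
$joined = @(Write-Output $a$b$c.length)

# Starts with a variable reference plus member access: each member access
# becomes its own argument.
$split = @(Write-Output $a.length$b.length$c.length)

$joined.Count
$joined[0]
$split.Count
$split -join ','
```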

Such compound tokens - by that I mean the direct concatenation of two or more distinct syntax constructs that may or may not be parsed as a single argument - show surprising behavior:

- If the first token is an expression (something that starts with one of the special argument-mode chars.), whatever comes after starts a new argument.
  - The exception - which you've observed - is if the expression is a simple variable reference ($a), as opposed to a variable reference plus member access ($a.length) - see next point.
- If the first token is either an unquoted literal or a simple variable reference, it is combined with what comes after as an implicit expandable string, with the added feature of recognizing not only $(...) as part of the same compound token, but also quoted tokens, which have their quotes stripped.
  - Note that an additional token starting with just ( rather than $( as well as with { again starts a new argument (with @( and @{ not getting recognized as such, so that the @ is appended to the previous token).

This makes for a surprising asymmetry, which is the subject of PowerShell/PowerShell#6467.

Notable examples:

##  $(...) asymmetry

# $(...) after unquoted literal: 1 string argument
PS> Write-Output 3$(1+2)
33

# $(...) before unquoted literal: 2 arguments (both [int])
PS> Write-Output $(1+2)3
3
3

## Quoted-string asymmetry:

# Quoted string after unquoted literal: 1 string argument (with quotes stripped)
PS> Write-Output 3'3'
33

# Quoted string before unquoted literal: 2 arguments (1 string, 1 [int])
PS> Write-Output '3'3
3
3

sdwheeler · 2020-04-10T17:28:17Z

Adding link to PowerShell/PowerShell#6467

sdwheeler added Reference area-engine Area - PowerShell engine labels Dec 17, 2018

sdwheeler removed the Reference label Apr 17, 2019

rjmholt self-assigned this Mar 12, 2020

sdwheeler mentioned this issue Apr 10, 2020

Docs needed for 'Parsing of compound command-line tokens into arguments is surprising' #3038

Closed

This was referenced Jul 6, 2020

Argument-mode parsing: Why is a subexpression at the start of an unquoted token treated differently than a variable reference? PowerShell/PowerShell#3217

Closed

Unquoted strings PoshCode/PowerShellPracticeAndStyle#146

Open

sdwheeler added the area-native-cmds Area - native command support label Jul 22, 2020

mklement0 mentioned this issue Apr 20, 2021

Incorrect arg splitting when using .. syntax for git log PowerShell/PowerShell#15279

Open

sdwheeler assigned sdwheeler and unassigned rjmholt Sep 7, 2021

sdwheeler mentioned this issue Sep 7, 2021

Update Parsing and Quoting articles #8053

Merged

24 tasks

chasewilson closed this as completed in #8053 Sep 7, 2021

mklement0 mentioned this issue Mar 4, 2024

The splatting operator '@' cannot be used to reference variables in an expression. '@trigger' can be used only as an argument to a command. To reference variables in an expression use '$trigger' PowerShell/PowerShell#21291

Closed

5 tasks
