Support array (collection) slicing with all-remaining and except-last-N elements logic #7940

Open
mklement0 opened this Issue Oct 3, 2018 · 6 comments

mklement0 commented Oct 3, 2018

Note:

  • What I'm proposing requires adding special handling to array indices, which currently doesn't exist: PowerShell has no special array-subscript syntax; it allows the use of any expression, as long as it results in an array of (valid) indices.

  • While this is a very powerful concept, the lack of awareness of the array context precludes some useful features, hence this suggestion.

Note that a related array-slicing feature suggestion, #7928, does not require array awareness and could be implemented on the range operator (..) itself.


  • Provide all-remaining-elements logic:
# Sample array
$a = 'one', 'two', 'three', 'four', 'five'

# CURRENTLY required syntax for returning everything starting with the 3rd element:
$a[2..($a.Count-1)]
three
four
five

# WISHFUL THINKING: having $a.Count-1 be *implied* as the end of the range.
$a[2..]
three
four
five

Aside from being more concise, this has the added advantage of not needing the input array to be stored in a variable beforehand.

  • Provide an except-last-N-elements idiom:
# Sample array
$a = 'one', 'two', 'three', 'four', 'five'

# CURRENTLY required syntax for returning everything except the last 3 elements:
$a[0..($a.Count-1 - 3)]
one
two

# WISHFUL THINKING: allow specifying just N, without explicitly needing to refer
# to the end of the array.
# Note that $a[0..-3] does NOT work, because it creates array 0, -1, -2, -3, which does something different.
$a[0..@-3]
one
two
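
To see why `$a[0..-3]` is not an except-last-N idiom today: the range operator simply counts down from 0 to -3, and negative indices address elements from the end of the array. A quick check, reusing the sample array from above:

```powershell
$a = 'one', 'two', 'three', 'four', 'five'

# The range operator counts down through every integer from 0 to -3.
$range = (0..-3) -join ', '
$range                      # 0, -1, -2, -3

# Negative indices count from the end, so the slice wraps around.
$wrapped = $a[0..-3] -join ', '
$wrapped                    # one, five, four, three
```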

As stated, this would require introducing special syntax specific to collection indexing, and such modified range expressions wouldn't make sense outside that context.

I am not wedded to the specific syntax forms proposed above - [<n>..] and [<m>..@-<n>], but what makes them appealing is not having to explicitly refer to the array being sliced in the expression.

The less desirable alternative (from an end-user perspective) would be to introduce a new automatic variable representing the array's highest index, such as $#.
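
Until any such syntax exists, both idioms can be approximated with a small function. The sketch below is only an illustration: `Get-ArraySlice` and its parameters `-Start` and `-SkipLast` are hypothetical names, not part of PowerShell, and the function merely wraps the currently required index arithmetic.

```powershell
# Hypothetical helper (not part of PowerShell): wraps today's index arithmetic
# to approximate the proposed $a[<m>..] and $a[<m>..@-<n>] forms.
function Get-ArraySlice {
    param(
        [object[]] $Array,
        [int] $Start = 0,       # first index to include
        [int] $SkipLast = 0     # number of trailing elements to omit
    )
    $last = $Array.Count - 1 - $SkipLast
    # Guard: without it, a range such as 3..1 would count DOWN and
    # return elements in reverse instead of returning nothing.
    if ($Start -gt $last) { return }
    $Array[$Start..$last]
}

$a = 'one', 'two', 'three', 'four', 'five'

$tail = @(Get-ArraySlice $a -Start 2)      # proposed: $a[2..]
$tail -join ','                            # three,four,five

$head = @(Get-ArraySlice $a -SkipLast 3)   # proposed: $a[0..@-3]
$head -join ','                            # one,two
```

Because the array is passed as an argument, it does not have to be stored in a variable first, which is the same convenience the proposed syntax aims for.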

Environment data

Written as of:

PowerShell Core 6.1.0

@mklement0 mklement0 changed the title from Support array (collection) slicing with implied endpoint and except-last-N logic to Support array (collection) slicing with all-remaining and except-last-N elements logic Oct 3, 2018

Member

BrucePay commented Oct 3, 2018

History notes: When I first implemented ranges, I'd planned to support a unary range operator, e.g. $a[1..], but never got around to it (obviously). For the upper bound, I'd been thinking about having a magic variable $end which would be equivalent to $a.Length-1, so you could write $a[1..$end] to get everything but the first element. Unfortunately, we got the precedence "wrong" for computed endpoints (because we wanted to allow ranges to be concatenatable, as in $low .. $middle + $high .. $veryhigh), so I don't really see how we can avoid parens in things like $a[1..($end-1)]. Still, that's much nicer than $a[1..($a.length-2)].
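
A small sketch of the precedence point, with an ordinary user variable `$end` standing in for the hypothetical automatic variable (which does not exist in PowerShell today):

```powershell
$a = 'one', 'two', 'three', 'four', 'five'

# .. binds more tightly than +, so two ranges concatenate without parentheses.
$indices = 1..2 + 4..5
$indices -join ','              # 1,2,4,5

# Stand-in for the hypothetical automatic variable: the highest valid index.
$end = $a.Length - 1

# Because .. binds first, a computed endpoint has to be parenthesized;
# without the parentheses, 1..$end-1 would parse as (1..$end) - 1.
$middle = $a[1..($end - 1)]
$middle -join ','               # two,three,four
```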

Contributor

vexx32 commented Oct 3, 2018

Is there any particular reason why $end itself cannot map cleanly to the $_.Length - 1 value, circumventing that issue?

Contributor

mklement0 commented Oct 3, 2018

@vexx32:

I think @BrucePay indeed meant $end to be $_.Length - 1, which requires no arithmetic in the all-remaining scenario, but obviously still does in the except-last-N scenario.

Note that in the all-remaining scenario you can even currently get away without arithmetic, because it's benign (though potentially confusing) to exceed the array bounds by 1:

$a=1,2,3; $a[1..$a.Count]  # works, though strictly speaking it should be `($a.Count - 1 )`
2
3

Yes, the need for parentheses is unfortunate, but providing the automatic highest-index variable would indeed help (I suggested $#, because, unlike $end, it is currently a syntax error and therefore cannot clash with existing user variables).

That said, using the variable-less special syntax I proposed would make both problems go away (parentheses, need for new variable).

Are we open to special syntax in the context of indexing ($a[1..] and $a[0..@-1])?

Introducing $# ($end) wouldn't require special syntax, but it still amounts to special-casing the indexing context.

HumanEquivalentUnit commented Oct 3, 2018

I think a lot of this is handled by Select -Skip and -SkipLast, and even works out shorter and clearer to do that:

$a = 'one', 'two', 'three', 'four', 'five'

# CURRENTLY required syntax for returning everything starting with the 3rd element:
$a[2..($a.Count-1)]
$a|select -skip 2

# CURRENTLY required syntax for returning everything except the last 3 elements:
$a[0..($a.Count-1 - 3)]
$a|select -SkipLast 3

Being able to select arbitrary elements such as $a[4,7,1,3] is nice, but the way ranges include both endpoints, the precedence rules that mean 0..($x-1) needs parens, the way you can't add/subtract to a whole array like 1,2,3 -1 to make it 0,1,2..

What about leaving range .. alone, and introducing an entirely new slice operator, which is basically "Python's slice operator"?

# PS ranges stay the same
$a[0..3]   # items 0,1,2,3

# Pythonic slicing 
# start:end
# start:end:step
# missing values for "and the rest"

$a[2:]    # items from index 2 through end

$a[:-2]    # from the start, stopping 2 before the end

$a[::3]    # from the start to the end, in steps of 3

$a[1:-2]   # items from index 1.. stopping 2 before the end

$a[2::3]   # items  from index 2 through end, step 3 at a time

?

> it's benign (though potentially confusing) to exceed the array bounds by 1

Except in Strict-Mode, then it's System.IndexOutOfRangeException
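
A minimal sketch contrasting the two behaviors. It assumes `Set-StrictMode -Version 3` (or higher) is the level at which out-of-bounds indices are rejected; the strict part runs in a child scope so the setting does not leak, and the error type is printed rather than assumed.

```powershell
$a = 1, 2, 3

# Default (non-strict) behavior: index 3 is out of bounds and silently ignored.
$lenient = @($a[1..$a.Count])
$lenient -join ','              # 2,3

# In a child scope with strict mode enabled (version 3 assumed), the same
# expression is expected to raise an out-of-bounds error instead.
& {
    Set-StrictMode -Version 3
    try   { $a[1..$a.Count] }
    catch { "Error: $($_.Exception.GetType().Name)" }
}
```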

Contributor

mklement0 commented Oct 4, 2018

@HumanEquivalentUnit:

As a general rule, expression-mode solutions and pipeline solutions aren't interchangeable, for performance reasons.

Yes, Select -Skip and Select -SkipLast are the functional equivalent of what we're looking for in the realm of pipelines, but they're not an option for performant code in the realm of expressions.
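
For anyone who wants to check this on their own machine, here is a rough sketch. The array size is arbitrary and no timings are claimed; they depend on hardware and PowerShell version.

```powershell
$big = 1..100000

# Expression-mode slice: indexes straight into the array.
$byIndex = $big[2..($big.Count - 1)]

# Pipeline equivalent: every element streams through Select-Object.
$byPipeline = $big | Select-Object -Skip 2

# Both forms return the same number of elements ...
$byIndex.Count -eq $byPipeline.Count

# ... so only the cost differs. Compare the two timings (in milliseconds).
(Measure-Command { $big[2..($big.Count - 1)] }).TotalMilliseconds
(Measure-Command { $big | Select-Object -Skip 2 }).TotalMilliseconds
```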

> What about leaving range .. alone, and introducing an entirely new slice operator

My preference is to make do with minor tweaks to the existing range-operator syntax, to avoid confusion and reduce complexity (another thing to learn).


As an aside:

> Except in Strict-Mode, then it's System.IndexOutOfRangeException

Good point, though, to be precise, it is Set-StrictMode -Version 3 or higher, and given the limitations of Set-StrictMode, I tend to stay away from it:

Contributor

mklement0 commented Oct 5, 2018

@rkeithhill points out that the upcoming C# 8 will gain the features discussed here, as covered here and here.

If I read the linked articles correctly (and keep in mind that things might change before release):

Some of what's coming to C# 8 has been a part of PowerShell since the beginning (kudos, PowerShell):

| C# 8 | PS |
| --- | --- |
| `^1` | `-1` |
| `1..2` | (same) |

Here's what's new, which covers what this issue proposes with some extra syntactic sugar:

| C# 8 | PS now | This proposal (so far, written without knowledge of C# 8) |
| --- | --- | --- |
| `1..` | `1..($arr.Length-1)` | `1..` |
| `..9` | `0..9` | |
| `^9..` | `-9..-1` | `-9..` |
| `..` | `0..($arr.Length-1)` | |
| `1..^1` | `1..($arr.Length-1 - 1)` | `1..@-1` |

Important: In C# 8, array slices will actually refer to the returned elements in place, via Span objects, so that modifying elements of a slice means modifying the original array.
We cannot do that in PowerShell, as this behavior would be a breaking change, given that historically slices have been new arrays.
This is an important distinction users will have to be aware of, despite the syntax similarities.
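
The current PowerShell behavior referred to here (a slice is a new array) is easy to verify:

```powershell
$a = 1, 2, 3, 4, 5

# Slicing copies the selected elements into a new array.
$slice = $a[1..3]
$slice[0] = 99

$slice -join ','    # 99,3,4
$a -join ','        # 1,2,3,4,5 (the original array is untouched)
```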

Syntax-wise, an option is therefore to go with ^ instead of @- for the index-from-end syntax to align with future C#.
(I'm not sure why ^ was chosen for C#).

..9 for 0..9, though less compelling than 9.., might be nice for symmetry.

As for ..: I'm unclear on what its purpose is in C# 8, given that you'd essentially get the original array back; if implemented in PowerShell, it would be a concise way to create a (shallow) clone of an array.
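
For comparison, this is roughly what such a shallow clone looks like with today's syntax (a bare `..` is not valid PowerShell at this point). Hashtables are used as elements only to make the "shallow" part visible:

```powershell
$a = @{ n = 1 }, @{ n = 2 }

# Today's spelling of a whole-array slice.
$copy = $a[0..($a.Count - 1)]

# The slice is a distinct array object ...
[object]::ReferenceEquals($a, $copy)    # False

# ... but its elements are the same objects (shallow copy).
$copy[0].n = 42
$a[0].n                                 # 42
```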
