New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add char range overload to DotDot operator #5026

Merged
merged 7 commits into from Oct 28, 2017

Conversation

@IISResetMe
Contributor

IISResetMe commented Oct 5, 2017

This PR attempts to allow the DotDot operator (..) to generate
character ranges, in turn allowing shorthand syntax like:

$AtoZ = -join('A'..'Z')

Since PowerShell has no shorthand syntax for char literals, we
test whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char (in accordance with
PowerShell operator overload behavior in general).

NOTE:

  1. This breaks a number of currently supported range operations, such
    as implicit numerical conversion of strings:
    '1'..'100'

  2. This implementation is fairly naive, making no attempts to assert whether the input characters (or any codepoints in between) fall outside UCS-2 (ie. surrogates). Attempts to expand ranges bound by surrogate pairs are expected to fail and not covered by this implementation (I will leave this an exercise for someone more experienced with unicode hackery than myself)

I was unable to locate existing tests for the integer range function, please advise on where to add those.

@msftclas

This comment has been minimized.

Show comment
Hide comment
@msftclas

msftclas Oct 5, 2017

CLA assistant check
All CLA requirements met.

msftclas commented Oct 5, 2017

CLA assistant check
All CLA requirements met.

@lzybkr lzybkr self-assigned this Oct 5, 2017

@lzybkr

The code changes look fine - but we need tests.

@@ -863,5 +863,34 @@ internal static object CompareIne(char lhs, char rhs)
char secondAsUpper = char.ToUpperInvariant(rhs);
return firstAsUpper != secondAsUpper ? Boxed.True : Boxed.False;
}
internal static object[] Range(char start, char end)

This comment has been minimized.

@lzybkr

lzybkr Oct 6, 2017

Member

What do you think about returning char[] instead?

@lzybkr

lzybkr Oct 6, 2017

Member

What do you think about returning char[] instead?

This comment has been minimized.

@iSazonov

iSazonov Oct 25, 2017

Collaborator

@IISResetMe Could you please address the comment?

@iSazonov

iSazonov Oct 25, 2017

Collaborator

@IISResetMe Could you please address the comment?

This comment has been minimized.

@IISResetMe

IISResetMe Oct 25, 2017

Contributor

I like the idea, but for now I think we should keep operator output consistent (ie. return object[] with individual items of the expected type).

There are are number of builtin operators that return object[] in this fashion (@(), -split, any comparison operator in array/filter mode etc.), changing the behavior should be a concerted effort across all of them IMO

@IISResetMe

IISResetMe Oct 25, 2017

Contributor

I like the idea, but for now I think we should keep operator output consistent (ie. return object[] with individual items of the expected type).

There are are number of builtin operators that return object[] in this fashion (@(), -split, any comparison operator in array/filter mode etc.), changing the behavior should be a concerted effort across all of them IMO

Show outdated Hide outdated src/System.Management.Automation/engine/runtime/Operations/NumericOps.cs Outdated
@@ -4901,6 +4903,11 @@ public object VisitBinaryExpression(BinaryExpressionAst binaryExpressionAst)
return Expression.Call(CachedReflectionInfo.TypeOps_AsOperator, lhs.Cast(typeof(object)), rhs.Convert(typeof(Type)));
case TokenKind.DotDot:
if(lhs.Type == typeof(string)){

This comment has been minimized.

@iSazonov

iSazonov Oct 10, 2017

Collaborator

Will this work:

'A'..'Z' | % { ... }
@iSazonov

iSazonov Oct 10, 2017

Collaborator

Will this work:

'A'..'Z' | % { ... }

This comment has been minimized.

@IISResetMe

IISResetMe Oct 10, 2017

Contributor

Yes, this will produce the sequence A (0x41) through Z (0x5A)

@IISResetMe

IISResetMe Oct 10, 2017

Contributor

Yes, this will produce the sequence A (0x41) through Z (0x5A)

This comment has been minimized.

@kborowinski

kborowinski Jan 21, 2018

Apparently it does not work on the Powershell Core 6.0.0 GA

> 'A'..'Z' | % {$_}
Cannot convert value "A" to type "System.Int32". Error: "Input string was not in a correct format."
At line:1 char:1
+ 'A'..'Z' | % {$_}
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger
@kborowinski

kborowinski Jan 21, 2018

Apparently it does not work on the Powershell Core 6.0.0 GA

> 'A'..'Z' | % {$_}
Cannot convert value "A" to type "System.Int32". Error: "Input string was not in a correct format."
At line:1 char:1
+ 'A'..'Z' | % {$_}
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToInteger

This comment has been minimized.

@markekraus

markekraus Jan 21, 2018

Collaborator

@kborowinski there is an open issue for that #5519

@markekraus

markekraus Jan 21, 2018

Collaborator

@kborowinski there is an open issue for that #5519

This comment has been minimized.

@kborowinski

kborowinski Jan 21, 2018

@markekraus: sorry, I've missed that issue

@kborowinski

kborowinski Jan 21, 2018

@markekraus: sorry, I've missed that issue

internal static object[] Range(char start, char end)
{
int lower = (int)start;
int upper = (int)end;

This comment has been minimized.

@iSazonov

iSazonov Oct 10, 2017

Collaborator

I wonder - is this ASCII Arithmetic work correctly for unicode?

@iSazonov

iSazonov Oct 10, 2017

Collaborator

I wonder - is this ASCII Arithmetic work correctly for unicode?

This comment has been minimized.

@lzybkr

lzybkr Oct 10, 2017

Member

It won't work with surrogate pairs because those can't be represented in a char, but I don't think that is an interesting scenario.

@lzybkr

lzybkr Oct 10, 2017

Member

It won't work with surrogate pairs because those can't be represented in a char, but I don't think that is an interesting scenario.

This comment has been minimized.

@iSazonov

iSazonov Oct 10, 2017

Collaborator

Since we added `u{} should we detect surrogate pairs and write an error?

@iSazonov

iSazonov Oct 10, 2017

Collaborator

Since we added `u{} should we detect surrogate pairs and write an error?

This comment has been minimized.

@lzybkr

lzybkr Oct 10, 2017

Member

I guess we should error on invalid characters (half of the pair). We could also support surrogate pair ranges if we didn't convert the string to char immediately, instead analyzing the string more thoroughly. But I'm not sure I'd block this PR on either of those.

@lzybkr

lzybkr Oct 10, 2017

Member

I guess we should error on invalid characters (half of the pair). We could also support surrogate pair ranges if we didn't convert the string to char immediately, instead analyzing the string more thoroughly. But I'm not sure I'd block this PR on either of those.

This comment has been minimized.

@iSazonov

iSazonov Oct 11, 2017

Collaborator

I guess it will work with traditional languages in common scenarios but before I saw issues from China and Japan mans - if the new operator does not matter to them I agree with you.

@iSazonov

iSazonov Oct 11, 2017

Collaborator

I guess it will work with traditional languages in common scenarios but before I saw issues from China and Japan mans - if the new operator does not matter to them I agree with you.

This comment has been minimized.

@iSazonov

iSazonov Oct 24, 2017

Collaborator

@IISResetMe Could you please address the comment in code or add relevant comments in the PR description for future documentation?

@iSazonov

iSazonov Oct 24, 2017

Collaborator

@IISResetMe Could you please address the comment in code or add relevant comments in the PR description for future documentation?

This comment has been minimized.

@IISResetMe

IISResetMe Oct 24, 2017

Contributor

@iSazonov I don't feel confident enough in my unicode-fu to attempt to implement the appropriate handling in code for this at the moment, I've updated the PR description to reflect this deficiency instead.

@IISResetMe

IISResetMe Oct 24, 2017

Contributor

@iSazonov I don't feel confident enough in my unicode-fu to attempt to implement the appropriate handling in code for this at the moment, I've updated the PR description to reflect this deficiency instead.

This comment has been minimized.

@iSazonov

iSazonov Oct 25, 2017

Collaborator

Thanks.

Closed.

@iSazonov

iSazonov Oct 25, 2017

Collaborator

Thanks.

Closed.

@IISResetMe

This comment has been minimized.

Show comment
Hide comment
@IISResetMe

IISResetMe Oct 15, 2017

Contributor

Oops, rebased powershell:master into the branch, looks like all parent commits are in the PR now, how do I clean it up? 😞

Contributor

IISResetMe commented Oct 15, 2017

Oops, rebased powershell:master into the branch, looks like all parent commits are in the PR now, how do I clean it up? 😞

@markekraus

This comment has been minimized.

Show comment
Hide comment
@markekraus

markekraus Oct 15, 2017

Collaborator

@IISResetMe did you only have the 2 commits? if so you could do this (assuming PowerShell is the name of your upstream remote for the actual PowerShell repo). Make sure you stash or commit any workspace changes before doing this as they will be lost. I'm not a git expert, and maybe there is a better way. So maybe wait until someone more seasoned either "thumbs up" this or tells you just how wrong I am :)

git fetch
git checkout -b 20171015fix PowerShell/master
git cherry-pick 6e6dfffc2d0150b504585597ad4df997a174867a
git cherry-pick 6d6f20d28157ee49071f018051ac7b35be57bbb0
git checkout -b 20171015backup overload-DotDot-CharRange-1
git checkout overload-DotDot-CharRange-1
git reset --hard 20171015fix
git push --force

This will create a new 20171015fix branch at the current master on this repo, then cherry-picks your 2 commits, then backs up your existing branch to 20171015backup, and then hard reset your branch to the 20171015fix branch. This will both "rebase" your branch on to master and add your commits with out the merge mess. This will likely mess up any comment history here, but I guess it's already some what messed up.

Collaborator

markekraus commented Oct 15, 2017

@IISResetMe did you only have the 2 commits? if so you could do this (assuming PowerShell is the name of your upstream remote for the actual PowerShell repo). Make sure you stash or commit any workspace changes before doing this as they will be lost. I'm not a git expert, and maybe there is a better way. So maybe wait until someone more seasoned either "thumbs up" this or tells you just how wrong I am :)

git fetch
git checkout -b 20171015fix PowerShell/master
git cherry-pick 6e6dfffc2d0150b504585597ad4df997a174867a
git cherry-pick 6d6f20d28157ee49071f018051ac7b35be57bbb0
git checkout -b 20171015backup overload-DotDot-CharRange-1
git checkout overload-DotDot-CharRange-1
git reset --hard 20171015fix
git push --force

This will create a new 20171015fix branch at the current master on this repo, then cherry-picks your 2 commits, then backs up your existing branch to 20171015backup, and then hard reset your branch to the 20171015fix branch. This will both "rebase" your branch on to master and add your commits with out the merge mess. This will likely mess up any comment history here, but I guess it's already some what messed up.

IISResetMe added some commits Oct 5, 2017

Add char range overload to DotDot operator
This commit attempts to allow the DotDot operator (..) to generate
character ranges.

Since PowerShell has no shorthand syntax for char literals, this change
tests whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char.

NOTE: This breaks a number of currently supported range operations, such
as implicit numerical conversion of strings:
'1'..'100'
Add RangeOperator.Tests.ps1
This commit adds basic tests for the current integer range (..) operator
@IISResetMe

These were the most basic tests I could think of for the existing Range operator. All examples I could find in the about_* help files pass these.

@IISResetMe

This comment has been minimized.

Show comment
Hide comment
@IISResetMe

IISResetMe Oct 15, 2017

Contributor

Thanks @markekraus, that seems to have done the trick.

Contributor

IISResetMe commented Oct 15, 2017

Thanks @markekraus, that seems to have done the trick.

Change RangeOperator.Tests.ps1 encoding, add tests
Add tests for the [char] overload feature. RangeOperator.Tests.ps1 encoding changed to UTF16LE (Windows "Unicode") to allows inline unicode chars in test case
@TravisEz13

This comment has been minimized.

Show comment
Hide comment
@TravisEz13

TravisEz13 Oct 16, 2017

Member

Can you make sure RangeOperator.Tests.ps1 is UTF8 without BOM? It shows up as binary.

Member

TravisEz13 commented Oct 16, 2017

Can you make sure RangeOperator.Tests.ps1 is UTF8 without BOM? It shows up as binary.

Change RangeOperator.Tests.ps1 encoding to UTF-8
Changed RangeOperator.Tests.ps1 encoding to UTF-8 (no BOM), substituted UTF-16 char literals with [char] casts
It "Range operator generates arrays of integers" {
$Range = 5..8
$Range.count | Should Be 4
$Range[0] | Should BeOfType [int]

This comment has been minimized.

@iSazonov

iSazonov Oct 21, 2017

Collaborator

Please add tests for values - $Range[0] | Should Be 5

@iSazonov

iSazonov Oct 21, 2017

Collaborator

Please add tests for values - $Range[0] | Should Be 5

This comment has been minimized.

@iSazonov

iSazonov Oct 24, 2017

Collaborator

Closed.

@iSazonov

iSazonov Oct 24, 2017

Collaborator

Closed.

Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Show outdated Hide outdated test/powershell/Language/Operators/RangeOperator.Tests.ps1 Outdated
Update RangeOperator.Tests.ps1
Fixed formatting, removed unnecessary checks
@TravisEz13

This comment has been minimized.

Show comment
Hide comment
@TravisEz13

TravisEz13 Oct 23, 2017

Member

@lzybkr Can you update your review?

Member

TravisEz13 commented Oct 23, 2017

@lzybkr Can you update your review?

@TravisEz13

This comment has been minimized.

Show comment
Hide comment
@TravisEz13

TravisEz13 Oct 23, 2017

Member

@iSazonov Can you indicate if you think this is ready or not (feel free to say if someone else signs off)?

Member

TravisEz13 commented Oct 23, 2017

@iSazonov Can you indicate if you think this is ready or not (feel free to say if someone else signs off)?

@iSazonov

This comment has been minimized.

Show comment
Hide comment
@iSazonov

iSazonov Oct 24, 2017

Collaborator

@TravisEz13 I updated three comment above - after they have been resolved we'll ready to merge.

Collaborator

iSazonov commented Oct 24, 2017

@TravisEz13 I updated three comment above - after they have been resolved we'll ready to merge.

Remove redundant code from Range()
Both Range() overloads have unnecessary branches for the special case of lhs == rhs, removed for simplicity.
Remove confusing/unnecessary RangeOp tests
Removed tests for .. using [long] and [bigint] input operands. I believe these will make more sense once we add support for ranges with values outside Int32, but for now might be confusing
@IISResetMe

This comment has been minimized.

Show comment
Hide comment
@IISResetMe

IISResetMe Oct 24, 2017

Contributor

@iSazonov I believe I've addressed all outstanding comments :-)

Contributor

IISResetMe commented Oct 24, 2017

@iSazonov I believe I've addressed all outstanding comments :-)

@iSazonov

This comment has been minimized.

Show comment
Hide comment
@iSazonov

iSazonov Oct 25, 2017

Collaborator

@IISResetMe Many thanks! We are near to merge. Please address @lzybkr comment above.

Collaborator

iSazonov commented Oct 25, 2017

@IISResetMe Many thanks! We are near to merge. Please address @lzybkr comment above.

@vors vors removed their request for review Oct 25, 2017

@IISResetMe

This comment has been minimized.

Show comment
Hide comment
@IISResetMe

IISResetMe Oct 25, 2017

Contributor

@iSazonov Done

Contributor

IISResetMe commented Oct 25, 2017

@iSazonov Done

@iSazonov

This comment has been minimized.

Show comment
Hide comment
@iSazonov

iSazonov Oct 25, 2017

Collaborator

@lzybkr Could you please take another look if you have a bit free time?

Collaborator

iSazonov commented Oct 25, 2017

@lzybkr Could you please take another look if you have a bit free time?

@lzybkr

lzybkr approved these changes Oct 26, 2017

@TravisEz13 TravisEz13 merged commit 9b32c1d into PowerShell:master Oct 28, 2017

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla All CLA requirements met.
Details

@IISResetMe IISResetMe deleted the IISResetMe:overload-DotDot-CharRange-1 branch Nov 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment