Proposal: List, Indexer (Dictionary) and String Patterns #5811

Closed
alrz opened this issue Oct 9, 2015 · 17 comments

@alrz
Contributor

alrz commented Oct 9, 2015

The following patterns are based on the syntax proposed in #206.

List Patterns

List patterns can be used on arrays and on all types that implement the IList interface. A list pattern iterates through the list and matches the list-subpatterns.

Syntax

list-pattern:
    { list-subpatterns }
    { }

list-subpatterns:
    list-subpattern
    list-subpatterns , list-subpattern

list-subpattern:
    pattern
    sequence-wildcard-pattern

sequence-wildcard-pattern:
    **
    ***

Remarks

  • { } matches an empty list, as determined by Enumerable.Any or ICollection.Count depending on the type.
  • ** matches any sequence of one or more items.
  • *** matches any sequence of zero or more items.
  • To keep the rules simple, a sequence wildcard can be used only once in the whole pattern.
  • At runtime, taking sequence wildcards into account, the match first calculates the minimum, maximum, or exact length the list must have; if the length matches, it starts to iterate.
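The length pre-check described in the last remark can be sketched in today's C#. This is only an illustration under assumptions: the `SequenceWildcard` enum, the `ListPatternLength` class and the `LengthMatches` method are hypothetical names, not part of the proposal, and the single-item wildcard `*` is counted as an ordinary fixed subpattern.

```csharp
using System;

// Hypothetical model of the sequence wildcard kinds in the proposal.
public enum SequenceWildcard
{
    None,       // no sequence wildcard: the length must be exact
    OneOrMore,  // ** : at least one extra item
    ZeroOrMore  // ***: any number of extra items
}

public static class ListPatternLength
{
    // count:      the length of the list being matched.
    // fixedCount: the number of subpatterns that each consume exactly one item.
    // wildcard:   the single sequence wildcard allowed in the pattern, if any.
    public static bool LengthMatches(int count, int fixedCount, SequenceWildcard wildcard)
    {
        switch (wildcard)
        {
            case SequenceWildcard.None:
                return count == fixedCount;      // exact length
            case SequenceWildcard.OneOrMore:
                return count >= fixedCount + 1;  // minimum length, no maximum
            case SequenceWildcard.ZeroOrMore:
                return count >= fixedCount;      // minimum length, no maximum
            default:
                throw new ArgumentOutOfRangeException(nameof(wildcard));
        }
    }
}
```

Under this model, a pattern such as { var i1, ** , var i2 } has two fixed subpatterns and one ** wildcard, so it would only start iterating for lists of length three or more.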

EDIT 1: Sequence wildcards can also be used in tuple patterns.

switch(tuple) {
  case (true, **): ...
}

This would match a tuple of any length equal to or greater than two.

EDIT 2: It would be nice to be able to use slices (#120) to catch the "rest" of the list at once, e.g.

arr is { int first, .. int[:] rest }

// equivalent to
if(arr is { int first, *** }) {
  rest[:] = arr[1:]; // skips the first item
}

Examples

switch(array) {
    // matches an empty array
    case { }:
    // matches an array with an exact length of two
    case { var i1, var i2 }:
    // matches an array of tuples with a minimum length of two and binds the first two items,
    // with the first item deconstructed into t1 and t2
    case { (var t1, var t2), var i2, *** }:
    // matches an array with a minimum length of three,
    // takes the last three items and skips the one before the last
    case { ***, var i1, *, var i2 }:
    // matches an array with a minimum length of three and binds the first and last items
    case { var i1, **, var i2 }:
}

Indexer Patterns

Indexer patterns can be used on all types with an indexer, just like dictionary initializers.

Syntax

indexer-pattern:
    { indexer-subpatterns }

indexer-subpatterns:
    indexer-subpattern
    indexer-subpatterns , indexer-subpattern

indexer-subpattern:
    [ constant-expression-list ] is pattern

An indexer pattern matches the values extracted by calling the indexer with the specified arguments. Following the property pattern syntax, we use the is keyword instead of the = sign. The whole match fails if any one of the indexer subpatterns fails. The order in which subpatterns are matched is not specified, and a failed match may not attempt every subpattern. It is a compile-time error if the type of the expression e does not define an indexer whose parameters match the types in the constant-expression-list; in that case the expression is pattern-incompatible.
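As a rough illustration, here is how a hypothetical pattern such as map is { ["kind"] is "point", ["x"] is int x } might be written by hand in current C#. The class and method names are made up for this sketch. The ContainsKey guard is also an assumption: the proposal does not say what happens when the indexer throws for a missing key.

```csharp
using System.Collections.Generic;

public static class IndexerPatternSketch
{
    // Hand-written equivalent of the hypothetical pattern
    //   map is { ["kind"] is "point", ["x"] is int x }
    public static bool TryMatchPoint(IReadOnlyDictionary<string, object> map, out int x)
    {
        x = 0;

        // Assumption: treat a missing key as a failed match instead of
        // letting the indexer throw KeyNotFoundException.
        if (!map.ContainsKey("kind") || !map.ContainsKey("x"))
            return false;

        // ["kind"] is "point": a constant subpattern on the indexed value.
        if (!object.Equals(map["kind"], "point"))
            return false;

        // ["x"] is int x: a type subpattern that also binds a variable.
        if (!(map["x"] is int value))
            return false;

        x = value;
        return true;
    }
}
```

The subpatterns are checked one after another here, but since the proposal leaves the order unspecified, callers should not rely on which indexer call runs first.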

EDIT: The indexer pattern syntax can be mixed with property-pattern, so one can check an indexer and a property in the same pattern (analogous to object initializers).

String Patterns

String patterns can be considered as inverted string interpolation, and of course they can be combined with regular expression literals (#5806):

var str = "str123";
if(str is ~$"[a-z]+{int number:\d+}") {
    Debug.Assert(number == 123);
}

Behind the scenes, this creates a regular expression group for the interpolated section ({ }) and applies the formatting part (\d+) to it. When the regex succeeds, it tries to convert the captured group to the variable's type via TypeConverter. If the regular expression doesn't match or the conversion fails, the pattern fails.
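A minimal sketch of that lowering for the example above, written against the existing Regex and TypeConverter APIs. The class and method names are hypothetical. Anchoring the expression to the whole string and using the invariant culture for the conversion are both assumptions, since the proposal does not specify either.

```csharp
using System;
using System.ComponentModel;
using System.Text.RegularExpressions;

public static class StringPatternSketch
{
    // Hand-written equivalent of the hypothetical pattern
    //   str is ~$"[a-z]+{int number:\d+}"
    public static bool TryMatch(string str, out int number)
    {
        number = 0;

        // The interpolated section becomes a named group whose body is the
        // "formatting part" (\d+). Assumption: the whole string must match.
        Match match = Regex.Match(str, @"^[a-z]+(?<number>\d+)$");
        if (!match.Success)
            return false;

        // Convert the captured text to the declared variable type.
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(int));
        try
        {
            number = (int)converter.ConvertFromInvariantString(match.Groups["number"].Value);
            return true;
        }
        catch (Exception)
        {
            // A failed conversion (for example, an overflow) fails the pattern.
            number = 0;
            return false;
        }
    }
}
```

The try/catch is there because a capture can satisfy \d+ and still be out of range for int, which is the "conversion wasn't successful" case.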

The default type for variables in string patterns is System.String. The syntax for string patterns is pretty much the same as string interpolation, but the formatting part after the colon has a different meaning. This can be extended to bind repeating captures to an array or IEnumerable.

@gafter
Member

gafter commented Oct 9, 2015

I have a bit of a problem with a general expression appearing as part of the pattern, as you have in the indexer pattern.

@alrz
Contributor Author

alrz commented Oct 9, 2015

@gafter to avoid assignments in the pattern I think conditional-or-expression would be sufficient, right?

@gafter
Member

gafter commented Oct 9, 2015

@alrz To avoid side-effects I suggest constant-expression.

@orthoxerox
Contributor

What if I want to capture that *** into a variable so I can match recursively/repeatedly? Like...

Value Cond(IReadOnlyList<Expr> list)
{
    while (list.Count != 0) {
        switch (list) {
            case { defaultCase }:
                return defaultCase.Evaluate();
            case { condition, result, *** tail }:
                if (condition.Evaluate() != Value.Nil) {
                    return result.Evaluate();
                } else {
                    list = tail;
                }
                break;
        }
    }
    return Value.Nil;
}

@svick
Contributor

svick commented Oct 9, 2015

@orthoxerox How exactly would that work? How would the compiler create an IReadOnlyList<Expr> from the tail? Is that supposed to work on any type that implements IEnumerable or only some list of collection types known to the compiler?

@svick
Contributor

svick commented Oct 9, 2015

@alrz

At runtime, considering sequence wildcards, it first calculates the minimum and/or maximum length of the list, if it matches then it starts iterate.

How do you figure out whether the count matches before iterating if the type is just IEnumerable<T> (i.e. it does not implement ICollection, or any other type that has Count)?

Or is that just an optimization that does not apply to general IEnumerable<T>?

@alrz
Contributor Author

alrz commented Oct 9, 2015

@orthoxerox Wildcards are used solely for ignoring items. Actually, you're asking for array slices (#120) in patterns, which is not necessary: you can still use the slicing syntax in the case body.

@alrz
Contributor Author

alrz commented Oct 9, 2015

@svick It will match the length while iterating, so if the IEnumerable's length exceeds or doesn't reach the pattern length, the pattern fails.
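For a plain IEnumerable<T>, that could look roughly like the sketch below for the pattern { var i1, var i2 }. The class and method names are hypothetical, and the code is hand-written current C#, not what a compiler would necessarily emit.

```csharp
using System.Collections.Generic;

public static class EnumerablePatternSketch
{
    // Hand-written equivalent of the hypothetical pattern
    //   source is { var i1, var i2 }
    // for a sequence that exposes no Count.
    public static bool TryMatchExactlyTwo<T>(IEnumerable<T> source, out T i1, out T i2)
    {
        i1 = default(T);
        i2 = default(T);

        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (!e.MoveNext()) return false; // sequence ended too early
            i1 = e.Current;

            if (!e.MoveNext()) return false; // sequence ended too early
            i2 = e.Current;

            // One more successful MoveNext means the sequence is too long.
            return !e.MoveNext();
        }
    }
}
```

Note that the out parameters may already hold items from the sequence when the match fails, so a caller should only read them after a successful match.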

@AdamSpeight2008
Contributor

Isn't the syntax going to clash with collection initialisers and array literals (VB)?
Indexer syntax could be an issue in VB, since it doesn't use square brackets for array indexing.
VB has a pre-existing feature that is similar to regex literals, the Like operator:

If thisString Like "[A-Z][A-Z]#" Then

@AdamSpeight2008
Contributor

[ 1 : 10] should produce a Range

Structure Range
  Public ReadOnly Property XB As Int32 'Starting Index
  Public ReadOnly Property XE As Int32 'Finishing Index
End Structure

Which would allow the user to specify a range as an indexer parameter:
foo[ [2 : 4] ] or mystring[ [1 : 4] ]
The Range could also be IEnumerable<Int>, which generates the sequence (inclusive, inclusive),
e.g. [2 : 4] produces { 2, 3, 4 }

@alrz
Contributor Author

alrz commented Oct 9, 2015

@AdamSpeight2008 The syntax proposed here is not intended to be used in VB. The pattern that the Like operator uses is not a regular expression, nor does it support extracting values from strings the way string patterns would. IIRC there is no such thing as Range, and your examples refer to slicing (#120), which is unrelated to the subject of this topic.

@AdamSpeight2008
Contributor

@alrz
You should consider the VB implementation as well, cough co-evolution cough
I said Like is similar, not identical.
IIRC there is no such thing as Slice cough
Range is not the same as Slice

  • Slice is Index, Count of an underlying source collection.
  • Range is Indexbegin, Indexend is just begin and end, No collection involved.
    • you could specify Indexend, Indexbegin and get the reverse sequence.

(see Nemerle List Comprehensions)

@AdamSpeight2008
Contributor

... would be a good alternative to ***
What if we allowed unary postfix operators * and + so that pattern objects can implement
__+ (one or more wildcard items) and __* (zero or more wildcard items)?
If regex literals are included, the extra operators could be handy.

@vladd

vladd commented Oct 10, 2015

@alrz I've got mixed feelings about the string pattern part. It is a very powerful tool in the form you propose, but of course it has the problem of silently introducing regular expression machinery into a simple syntactic construction. There was enough outcry against the fact that the innocent-looking $"{x}" brings in a rather heavy string.Format behind the scenes, so I doubt that bringing in an even heavier Regex would enjoy much support from the community. Such a feature looks better in scripting languages, where performance considerations are usually neglected.

My preference would be to (somehow) bring the "back interpolation" to the Regex class, so that instead of analysing Matches one could get them auto-parsed into the outer variables. Unfortunately I don't see any good way to do it without heavy compiler support (the library would somehow have to get references to the local variables from the input string or from the object the string is going to be converted to), so I cannot really propose anything better.

@alrz
Contributor Author

alrz commented Oct 14, 2015

@vladd not silently, you have to use regular expression literal syntax: ~"regex".

@bondsbw

bondsbw commented Nov 5, 2016

Recent proposals on wildcards (#14794, #14862) use the underscore character _ as a single wildcard. As noted in those discussions, **, ***, .., and ... for sequence wildcards are now inconsistent.

Double-underscore __ was proposed but is difficult to distinguish visually from a single underscore _ because most fonts do not provide any whitespace between them.

I support something like @AdamSpeight2008's proposed syntax for sequence wildcards. Though I prefer a single (not double) underscore character followed by the repetition character. And I added _? for zero-or-one:

tuple is (1, _*) // tuple is oneple (1), or (1, _), or (1, _, _), or (1, _, _, _), etc.
tuple is (1, _+) // tuple is (1, _, _*)
tuple is (1, _?) // tuple is oneple (1) or (1, _) only

This syntax could also apply to sequence wildcards for use in lambda argument lists (#15027):

Loaded += _* => { /* not using the arguments */ };
Func<bool> f1 = _* => true;
Func<int, bool> f2 = _* => true;
Func<int, int, bool> f3 = _* => true;

@alrz
Contributor Author

alrz commented Mar 21, 2017

Out of date.
