Discussion: Active Patterns #277

Closed
alrz opened this issue Mar 16, 2017 · 26 comments

Comments

@alrz
Contributor

alrz commented Mar 16, 2017

Active Patterns

Original proposal: dotnet/roslyn#9005

Summary

Lets you define custom patterns to be used in pattern-matching constructs.

Proposal

An active pattern will be declared as a bool-returning extension or instance method,

public static bool Integer(this string str) => int.TryParse(str, out _);

if (str is Integer()) {
  Console.WriteLine($"{str} is an integer.");
}

An active pattern can be parameterized as well,

public static bool Regex(this string str, string pattern) { .. }

switch (str) {
  case Regex("\\d+"):
    Console.WriteLine($"{str} is an integer.");
    break;
  case Regex("\\w+"):
    Console.WriteLine($"{str} is a word.");
    break;
}

You can define output parameters to be matched against other patterns.

public static bool Regex(this string str, string pattern, out string[] captures) { .. }

if (str is Regex("(\\d)([A-Za-z])", [var number, var letter])) {
    Console.WriteLine($"number: {number}, letter: {letter}");
}
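For comparison, here is a minimal sketch of how the same checks can be written in C# as it exists today, using bool-returning extension methods inside `when` clauses. The names StringPatterns, IsInteger, TryRegex and Describe are hypothetical and not part of the proposal; the point is only to show what the proposed `is Integer()` and `is Regex(..)` forms would condense.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; all names here are illustrative assumptions.
public static class StringPatterns
{
    // Plays the role of the proposed Integer() pattern.
    public static bool IsInteger(this string str) => int.TryParse(str, out _);

    // Plays the role of the proposed Regex(pattern, out captures) pattern.
    // captures holds the explicit groups only (group 0, the whole match, is skipped).
    public static bool TryRegex(this string str, string pattern, out string[] captures)
    {
        Match match = str is null ? Match.Empty : Regex.Match(str, pattern);
        captures = match.Success
            ? match.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray()
            : Array.Empty<string>();
        return match.Success;
    }

    // A switch over "patterns" expressed as case guards.
    public static string Describe(string str)
    {
        switch (str)
        {
            case var s when s.IsInteger():
                return $"{s} is an integer.";
            case var s when s.TryRegex("^(\\d)([A-Za-z])$", out var c):
                return $"number: {c[0]}, letter: {c[1]}";
            default:
                return $"{str} is something else.";
        }
    }
}
```

The guards work, but the captured values cannot themselves be matched against nested patterns, which is the gap the proposal aims to close.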

In order to be able to use active patterns, we will need to unify pattern and expression syntax. As long as we define the pattern syntax as a subset of expression syntax, there wouldn't be a special parsing rule for patterns and all the other requirements can be checked after this phase.

Note: These are not the exact production rules as they do not represent the precedence.

Syntax                    Pattern                            Expression                Proposal
type identifier           type pattern                       declaration expression
var identifier            var pattern                        declaration expression
var! identifier           var pattern (non-null)             declaration expression    #306
var ( ... )               deconstruction pattern             declaration expression
( argument-list )         tuple pattern                      tuple literal
[ expression-list ]       list pattern                       list literal              dotnet/roslyn#6949
{ key-value-list }        indexer pattern, property pattern  dictionary literal        dotnet/roslyn#6949
expression as identifier  as pattern                         as expression
expression | expression   or pattern                         logical or                #118
expression & expression   and pattern                        logical and               #118
( expression )            parenthesized pattern              parenthesized expression
expression .. expression  range pattern                      range literal             #198
out identifier            out pattern                        out argument              dotnet/roslyn#13400

This whole construct can be represented as an "expression pattern" in the syntax tree.

list pattern

It's tempting to use collection initializer syntax for list patterns, but it would be ambiguous for an empty list, e.g. () => {}. However, using brackets as suggested in dotnet/roslyn#6949 can resolve that issue.

indexer and property patterns

A key-value is defined as a pair of expressions separated by a colon: expression : expression. In a property pattern we require the LHS to be an identifier, and in an indexer pattern we require the LHS to be a constant. We decide which pattern is intended by checking whether the target expression is pattern compatible with either of the patterns. It'd be nice if we could use indexer initializer syntax for indexer patterns: { [c]: p }, but there is no corresponding expression form for that syntax since we're using colons.

as pattern

As currently proposed, the as pattern variable does not need any token, which in particular makes it hard to keep the syntax of patterns and expressions in sync. A disadvantage of using as is that in the expression form the RHS is a type, and reusing it like this might be confusing. Other languages use sigils like @, which might be preferable to as.

out pattern

Originally proposed in dotnet/roslyn#13400, this lets you bind the target value to an existing variable. It's useful for defining active patterns that are themselves defined as another pattern match.

public static bool Member(this Expression @this, out Expression expr, out string memberName) {
  return @this is MemberExpression { Expression: out expr, Member: { Name: out memberName } };
}


switch (expr) {
  case Member(_, var memberName):
    break;
}
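Without an out pattern, the same helper can be approximated in current C# (8.0 or later) by binding with `var` inside a property pattern and then assigning the out parameters by hand. This is only a sketch under that assumption; TryMember, ExpressionPatterns and LengthMemberName are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper class; names are illustrative assumptions.
public static class ExpressionPatterns
{
    // Approximates the proposed Member(out expr, out memberName) active pattern.
    public static bool TryMember(this Expression @this, out Expression expr, out string memberName)
    {
        // 'var' bindings stand in for the proposed 'out expr' / 'out memberName' patterns.
        if (@this is MemberExpression { Expression: var e, Member: { Name: var n } })
        {
            expr = e;          // may be null for static members
            memberName = n;
            return true;
        }
        expr = null;
        memberName = null;
        return false;
    }

    // Usage: extract the member name from the body of a simple lambda.
    public static string LengthMemberName()
    {
        Expression<Func<string, int>> f = s => s.Length;
        return f.Body.TryMember(out _, out var name) ? name : null;
    }
}
```

The two manual assignments are exactly the boilerplate that binding directly to existing variables would remove.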

As for parsing, out arguments should be promoted as a general expression just like ref expressions, but unlike ref expressions, it would be illegal to use an out expression outside patterns or argument lists.

composite patterns

Note that type, property, indexer and tuple patterns can appear in a single pattern,

if (e is Point(X: 3, Y: 4) {Length: 5})

We parse that as an object-creation-expression without the new keyword. Said construct is proposed as a type invocation in the records proposal. The property pattern part depends on dotnet/roslyn#6949 as it aims to extend object initializers to accept json-like dictionaries.

Unresolved questions

What syntax should we use at the declaration site? As @HaloFour and @svick pointed out, treating any regular method as a pattern would permit a lot of meaningless patterns, for instance:

HashSet<int> set =;
if (set is Add(42))

Suggested alternatives

  • An operator: static bool operator is Integer(..)
  • A modifier on method like match or pattern
  • An attribute on method like [Pattern] (similar to Extension in VB)
  • A name suffix: static bool IntegerPattern(..)
  • A name prefix: static bool IsInteger(..)

Naming conventions would still permit regular use of the method. The suffix/prefix would be required for the method to be usable as a pattern, but it would be trimmed at the use site, e.g. is Integer(..).

@DavidArno

This is - by an order of magnitude - my favourite proposal on this repo. 🍾

@YaakovDavis

YaakovDavis commented Mar 16, 2017

Typically, methods returning bool are prefixed with Is, Are, etc., for obvious reasons.

Here, odd names (e.g. Integer) are used, just to be able to use the method as a pattern. Thus, the method becomes unusable on its own (you wouldn't expose it in an API).

The above examples could be as easily, and more naturally expressed as extension methods, with proper names:

public static bool IsInteger(this string str) => int.TryParse(str, out _);

...

if("34".IsInteger()) {...}

Also, x is Foo() where Foo is a method, looks like you're deconstructing x to the result of Foo().

@HaloFour
Contributor

@YaakovDavis

Here, odd names (e.g. Integer) are used, just to be able to use the method as a pattern. Thus, the method becomes unusable on its own (you wouldn't expose it in an API).

The is comes from the keyword. x is IsInteger would be an awful API also.

The above examples could be as easily, and more naturally expressed as extension methods, with proper names

By doing so you'd immediately lose all of the benefits of recursive patterns, both in matching the output of the active pattern and in using the active pattern within another pattern. That's the entire point of this proposal.

@alrz
Contributor Author

alrz commented Mar 16, 2017

@YaakovDavis

I think the naming concern is the least important one here, since we are talking about a general mechanism for defining custom patterns. You can group them in a separate class so they are not confused with general-purpose extensions.

As for your "34".IsInteger() example, active patterns would allow you to define a "pattern", so when you want to match a value against multiple patterns you can use switch instead of an if-else chain.

Another advantage is that if leaks variable scope while case does not.

@HaloFour
Contributor

I have reservations regarding basing active patterns on any bool-returning resolvable instance/extension method. Trying to match against an arbitrary type would likely produce a larger list of potential "patterns" which are not very appropriate as patterns. My preference is still based on custom extension is operators:

public static class StringPatterns {
    public static bool operator is Integer(this string input) {
        return int.TryParse(input, out var _);
    }

    public static bool operator is Regex(this string input, string pattern, out string[] captures) {
        Match match = Regex.Match(input, pattern);
        if (match.Success) {
            captures = new string[match.Groups.Count];
            for (int i = 0; i < match.Groups.Count; i++) {
                captures[i] = match.Groups[i].Value;
            }
            return true;
        }
        captures = null;
        return false;
    }
}

That would give the methods a specific shape that would help the compiler/Intellisense narrow down the curated patterns, e.g. StringPatterns.is_Integer and StringPatterns.is_Regex.

With the Regex example specifically there'd have to be specific rules regarding resolution since Regex is a type and is Regex would likely have to attempt to match against that type first.

@YaakovDavis

YaakovDavis commented Mar 16, 2017

@HaloFour @alrz

Let's break this apart:
Matching against invocations is indeed desired. We can achieve this by either:

  1. Treating any bool-returning method as a pattern, or
  2. Having a dedicated syntax for invocation-matching.

Option 1 has the following disadvantages:

  1. The method is anyway unusable on its own, as a method, due to the odd name.
  2. The syntax suggests the matching is done against the method's result value, while this is not the case.
  3. As @HaloFour noted, it will introduce a large, possibly-unwanted list of potential matches.

A dedicated syntax instead might better communicate the feature. It could look something like the following:

public static (string str) is Regex(string pattern) => ...

The above syntax opens the door for other cases as well:

public static (A a, B b) is Foo(C c, D d) => ...

@alrz
Contributor Author

alrz commented Mar 16, 2017

@HaloFour

With the Regex example specifically there'd have to be specific rules regarding resolution since Regex is a type and is Regex would likely have to attempt to match against that type first.

That's the responsibility of binding in the use-site, so even with operator is syntax it'll be resolved as the type. As long as there is no Regex type in the scope with a Deconstruct method with two parameters, it will be resolved as the Regex method.

Trying to match against an arbitrary type would likely produce a larger list of potential "patterns" which are not very appropriate as patterns.

I don't think introducing a new syntax to address tooling concerns is a sensible approach. An attribute on StringPatterns class would just work there.

@HaloFour
Contributor

@YaakovDavis

Instead, a dedicated syntax for bool deconstruction might better communicate the feature

Deconstruction has already taken the shape of the Deconstruct method, and per LDM decisions during C# 7.0 the door was left open for allowing those methods to be conditional by requiring that the current unconditional methods return void. As of now I'm assuming that such patterns would be based on the syntax:

public static class Polar {
    public static bool Deconstruct(this Coordinates coordinates, out double r, out double theta) {
        r = Math.Sqrt(coordinates.X * coordinates.X + coordinates.Y * coordinates.Y);
        theta = Math.Atan2(coordinates.Y, coordinates.X);
        return coordinates.X != 0 || coordinates.Y != 0;
    }
}

But that comes with the following limitations:

  1. Must define a new type per pattern.
  2. Can't define patterns to translate to an existing type given that extension methods must be defined on static classes. If you already had Coordinates and Polar classes you'd need a separate type to act as the pattern just to contain the extension method.
  3. There's no mechanism for providing any input to the pattern.

That was the impetus behind the referenced proposal on the Roslyn repo.
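As a point of reference, the conditional conversion above can be written in current C# as an ordinary bool-returning extension method with out parameters, which avoids the need for a Polar type but cannot take part in pattern matching. This is a sketch only: the Coordinates struct is an assumed shape (the thread never defines it), and TryPolar, PolarPatterns and Describe are hypothetical names.

```csharp
using System;

// Assumed shape for Coordinates; the discussion does not define this type.
public readonly struct Coordinates
{
    public Coordinates(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
}

// Hypothetical helper class; names are illustrative assumptions.
public static class PolarPatterns
{
    // Same body as the conditional Deconstruct above, under a regular method name.
    // Returns false for the origin, where the angle is not meaningful.
    public static bool TryPolar(this Coordinates c, out double r, out double theta)
    {
        r = Math.Sqrt(c.X * c.X + c.Y * c.Y);
        theta = Math.Atan2(c.Y, c.X);
        return c.X != 0 || c.Y != 0;
    }

    // Usage: the out variables are only meaningful when TryPolar returns true.
    public static string Describe(Coordinates c) =>
        c.TryPolar(out var r, out var theta)
            ? $"r = {r}, theta = {theta}"
            : "origin";
}
```

Because the results come back through out variables rather than subpatterns, something like a nested match on r still has to be written as a separate condition.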

Also, the LDM already considered and dismissed using tuples for deconstruction as that would prevent the ability to overload based on the number of elements. That same problem would apply here.

@alrz

I don't think introducing a new syntax to address tooling concerns is a sensible approach. An attribute on StringPatterns class would just work there.

It's to address tooling and compiler concerns, just as == needs a more specific method than Equals. Extension methods also would have required no syntax changes, but having them makes them a more "official" part of the language. My opinion is that active patterns should be specifically and explicitly delineated as such.

Ultimately those are minor concerns. The bigger questions I think revolve around disambiguating between an expression as input to a pattern vs. recursive constant patterns themselves. See dotnet/roslyn#9005 (comment)

@alrz
Contributor Author

alrz commented Mar 16, 2017

@HaloFour

Regarding dotnet/roslyn#9005 (comment),

if (expr is Name(M(), M()))

After Name is resolved as an active pattern (there is no Name type with a Deconstruct method with two parameters), and depending on the Name method, out parameters will be resolved as patterns and the others will be resolved as expressions. The parser sees the whole thing as an expression beforehand; the binding, pattern compatibility and constness requirements can then be applied to the resultant expression syntax.

@YaakovDavis

YaakovDavis commented Mar 16, 2017

@HaloFour

Deconstruct and "invocation-matching" (or "active patterns") serve different use cases. The syntax I proposed is for the latter, not the former. It's merely an alternative to treating bool methods as patterns.

@svick
Contributor

svick commented Mar 16, 2017

To make the complaint about allowing any bool-returning instance method as a pattern more concrete, consider this code:

HashSet<int> set =;
if (set is Add(42))

This code does not make any sense, Add is not a pattern. But such code would be allowed under this proposal, which I think is a really bad idea.

@alrz
Contributor Author

alrz commented Mar 16, 2017

@svick

OK, I see your point. This proposal mainly aims to address expression-pattern unification. I'm open to syntax suggestions for the declaration site. I just think it shouldn't be an operator, as @HaloFour suggested.

@alrz
Contributor Author

alrz commented Mar 16, 2017

I've added @svick's example as an instance of misinterpreted patterns under unresolved questions.

My suggestion is to use a Pattern suffix (just like attributes) to declare extension patterns,

public static bool IntegerPattern(this string str) { .. }

if (str is Integer()) { .. }

/cc @HaloFour

@DavidArno

@alrz,

I'd prefer something like:

public static bool operator is Integer(this string str) { .. }

rather than a Pattern suffix, but there's probably good reasons not to do it that way.

@alrz
Contributor Author

alrz commented Mar 16, 2017

@DavidArno My objection to an operator is that this is not an operator and it has a name.

@DavidArno

@alrz,

Difficult to argue with that! 😀

@alrz
Contributor Author

alrz commented Mar 19, 2017

Since an iterator modifier is being considered for iterators, I think a pattern (or whatever) modifier could be another option here. Just like iterator, pattern would restrict the permitted return types, i.e. to bool.

@YaakovDavis

YaakovDavis commented Mar 19, 2017

Or, if we're going the naming-convention route, why not the Is prefix?

This will make "Is" methods usable both as extensions, and as patterns.

@DavidArno

DavidArno commented Mar 22, 2017

@YaakovDavis,

The problem with Is as a prefix is that sometimes it doesn't work semantically. For example, this Stack Overflow question would require an IsStartsWith method to be defined, which is grammatical nonsense.

@YaakovDavis

@DavidArno

The preferred method IMO is a specialized syntax; see my proposal above.
The Is prefix convention is only proposed as an improvement upon the Pattern suffix suggested above.

@DavidArno

@YaakovDavis,

I agree ... however new syntax is costly to implement. For example, some fancy new syntax around deconstructs would have been nice, but (I'm assuming) it was easier to implement in the messy void Deconstruct(out, out, out...) way the team did it. So <ActivePatternName>Pattern seems a good one to back if it increases the chances of someone from the team championing it and getting it added to the language.

@YaakovDavis

YaakovDavis commented Mar 22, 2017

@DavidArno

I'm not sure I follow.
You say that IsStartsWith is nonsense. StartsWithPattern is equally nonsensical (as well as being misleading).

Arguably IsX covers a larger percentage of real "is" cases than XPattern.

The first decision to be made is whether we go with a syntax or a convention.
If convention is chosen, then we need to decide between the Is prefix and the Pattern suffix, based on relevance to actual codebases.

@HaloFour
Contributor

@YaakovDavis

It's a question of conjugation. Since patterns are already contained within an is or case clause, requiring an Is prefix on the method name would make it sound weird. I agree with that position.

@alrz
Contributor Author

alrz commented Mar 22, 2017

@HaloFour

The Is prefix (or Pattern suffix) would be trimmed at the use site, e.g. e is Integer(); it would only be a requirement for the method to be taken into account as an active pattern. Nevertheless, I think a modifier is less opinionated about how users might eventually use these patterns. Existence of the is operator is a successor to the ability of using extension patterns as regular methods i.e. using a naming-convention.

@HaloFour
Contributor

@alrz

Yes, I realize that. But if the compiler is going to perform such trickery I'd prefer that the shape be more explicit, e.g. a Pattern suffix rather than an Is prefix.

@alrz
Contributor Author

alrz commented Dec 4, 2017

With recent updates to recursive patterns this proposal is obsolete. I think we should do it the other way around, i.e. permit a set of expressions (constants) to be parsed as patterns and convert them to expressions after an active pattern is resolved.

This is the relevant paragraph from F# spec:

When an active pattern function takes arguments, the pat-params are interpreted as expressions that are passed as arguments to the active pattern function. The pat-params are converted to the syntactically identical corresponding expression forms and are passed as arguments to the active pattern function f.

No consideration in the current spec is required to enable this in the future, because constant patterns already cover all the expressions that we need to accept as pattern arguments.
