Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Champion: Name lookup for simple name when target type known #2926

Open
gafter opened this issue Nov 2, 2019 · 61 comments
Open

Champion: Name lookup for simple name when target type known #2926

gafter opened this issue Nov 2, 2019 · 61 comments

Comments

@gafter
Copy link
Member

gafter commented Nov 2, 2019

There are three contexts in which lookup of a simple (not qualified) name would benefit from the fallback of looking up the identifier in a target type.

Simple ID expression

When an expression is an unqualified identifier, and lookup fails to find any accessible binding for that identifier, it is treated as target-typed. When the target type becomes available, the identifier is looked up in the context of the target type. If the identifier in that type designates an accessible constant, static field, or static property whose type is that target type, the identifier binds to that member.

Example:

enum Color { Red, Greed }

Color M1() => Red; // OK, binds to Color.Red

int IsRed(Color c) => c switch {
    Red => true, // OK, binds to Color.Red
    _ => false,
};

Object creation expression

When an object creation expression's type is given as an unqualified identifier (possibly with type arguments), and lookup fails to find any accessible binding for that identifier, it is treated as target-typed. When the target type becomes available, the identifier is looked up in the context of the target type (and the rest of the creation expression is bound, like the target-typed new expression). If the identifier in that type designates an accessible type that has an implicit reference conversion to that target type, the identifier binds to that type.

Example:

class Outer
{
    public class Inner1: Outer { }
    public class Inner2<T>: Outer { }
}

Outer M1() => new Inner1(); // binds to Outer.Inner1
Outer M2() => new Inner2<int>(); // binds to Outer.Inner2<T>

Type in a pattern

When the type appearing in a pattern is given as an unqualified identifier (possibly with type arguments), and lookup fails to find any accessible binding for that identifier, the identifier is looked up in the context of the pattern's input type. If the identifier in that type designates an accessible type, the identifier binds to that accessible type.

Example:

class Outer
{
    public class Inner1: Outer { }
    public class Inner2<T>: Outer { }
}

int M1(Outer o) => o switch
{
    Inner1 => 1,          // ok, binds to Outer.Inner1
    Inner2<int> => 2, // ok, binds to Outer.Inner2<T>
    _ => 3,
};
bool M2(Outer o) => o is Inner1;  // ok, binds to Outer.Inner1
bool M3(Outer o) => o is Inner2<int>;  // ok, binds to Outer.Inner2<T>

Design Meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions

@gafter gafter self-assigned this Nov 2, 2019
@gafter
Copy link
Member Author

gafter commented Nov 2, 2019

/cc @agocke @MadsTorgersen

@gafter gafter added this to TRIAGE NEEDED in Language Version Planning Nov 2, 2019
@DaZombieKiller
Copy link
Contributor

So with this, would I be able to do something akin to GetMethod("Name", Public | Instance) without using static System.Reflection.BindingFlags;?

@huoyaoyuan
Copy link
Member

Quite useful, but what about the compiler overhead of overload resolution?

enum E1 { EA, EB, EC }
enum E2 { EA, EB }

void M(E1 e, int x);
void M(E2 e, string x);

M(EA, someParameter);

Does this meet the similar overload resolution algorithm when using number literals?

@mattwar
Copy link

mattwar commented Nov 11, 2019

Greed is the color of money.

@gafter gafter moved this from TRIAGE NEEDED to 9.0 Candidate in Language Version Planning Nov 11, 2019
@gafter gafter added this to the 9.0 candidate milestone Nov 11, 2019
@alrz
Copy link
Contributor

alrz commented Nov 12, 2019

Public | Instance

As per current proposal, I think you would need to qualify at least one operand,

GetMethod("Name", BindingFlags.Public | Instance)

In order to omit type for all of them, the operator | itself needs to get target-typed.

@333fred
Copy link
Member

333fred commented Nov 12, 2019

That's exactly correct, and we actually brought up the | operator in the meeting when talking about this.

@reflectronic
Copy link

@alrz
Copy link
Contributor

alrz commented Dec 5, 2019

I've encountered with an issue with using static,

case IBinaryOperation { OperatorKind: Equals } op:

Equals binds to the class member and produces an error.

For target-typed lookup, would it make sense to alter the spec as follow?

When an expression is an unqualified identifier, and lookup fails to find any valid accessible binding for that identifier, it is treated as target-typed.

The lookup itself would have no information about the context and we'll need to add a new bottleneck to check the validity which is defined as being a constant of a particular type in this case.

@gafter
Copy link
Member Author

gafter commented Dec 5, 2019

@alrz We would need to define a concept of validity for every context in which an expression could appear. I don't think it is worth it.

@alrz
Copy link
Contributor

alrz commented Dec 6, 2019

We would need to define a concept of validity for every context in which an expression could appear.

To avoid that, as a simpler rule, we could say if an identifier is already found but the conversion fails, we treat as target-typed. In other words, it's defined as the fallback conversion iif the source expression is a simple identifier. Then it wouldn't depend on the context.

In the example above, method group fails to convert to enum, so it will be treated as target-typed.

edit: I realized the above might be a breaking change in some scenarios involving overload resolution, candidates that were not applicable becomes applicable and could change program's behaviour.

@gafter
Copy link
Member Author

gafter commented Feb 27, 2020

To avoid that, as a simpler rule, we could say if an identifier is already found but the conversion fails, we treat as target-typed. In other words, it's defined as the fallback conversion iif the source expression is a simple identifier. Then it wouldn't depend on the context.

That would be a breaking change. In an expression such as M(Name, e2, e3), where M is overloaded, there might be some M in the method group whose first argument would not accept the "normal" meaning of Name. But with this new rule that method would become a candidate, and that could change the result of overload resolution. I don't think we should do that.

@alrz
Copy link
Contributor

alrz commented Feb 27, 2020

In a design note there was a consideration to separate out overload resolution in target-typing scenarios to be able to tweak conversions without breaking change. I think this one would be another case where that could help.

https://github.com/dotnet/csharplang/blob/master/meetings/2019/LDM-2019-10-28.md

I think the situation is almost the same as short x = b ? 1 : 2; for target-typed ?: expressions.

@jcouv jcouv moved this from 9.0 Candidate to 10.0 Candidate in Language Version Planning Apr 22, 2020
@jcouv jcouv modified the milestones: 9.0 candidate, 10.0 candidate Apr 22, 2020
@gafter
Copy link
Member Author

gafter commented Jul 27, 2020

I think the situation is almost the same as short x = b ? 1 : 2; for target-typed ?: expressions.

No, this is give a meaning to code that was definitely an error before. There is no breaking change in that case.

@alrz
Copy link
Contributor

alrz commented Jul 28, 2020

No, this is give a meaning to code that was definitely an error before. There is no breaking change in that case.

I was responding to:

That would be a breaking change. In an expression such as M(Name, e2, e3)

Target-typing Name would be breaking as much as target-typing the ternary in M(b?1:2, e2, e3). I think whatever semantic is being used there, can be also applied to names so that the example above with Equals gets target-typed instead of an error.

@gafter
Copy link
Member Author

gafter commented Jul 28, 2020

That would be a breaking change. In an expression such as M(Name, e2, e3)

If Name bound to something before, then it would not be target-typed. If it did not bind to something before, then this was illegal code. Either way, it isn't a breaking change.

@mgravell
Copy link
Member

mgravell commented Aug 11, 2020

The idea of something being treated as target-typed only when the identifier is unqualified is IMO very dangerous, as the simple fact of adding a using directive (for something unrelated), or a library update adding a new type into a namespace you are already have a using directive for, can suddenly and silently change the meaning of the code. The above things already happen today, but the compiler flags them as ambiguous and makes you qualify/disambiguate - but that wouldn't be the case here. Sometimes it will fail, but at least some of the examples above could work, meaning: the code is unexpectedly and unobviously changed in ways that are almost impossible to detect during PR review.

Any "fallback" usage has this problem.

Some of the examples look fairly resilient to this, agreed. But not all. Consider the following, inspired by the examples in the top post:

class Outer
{
    public interface IFoo {}
}
bool M4(Outer o) => o is IFoo; 

The intent here is to to test for Outer.IFoo via target typing. Then someone adds IFoo in a <PackageReference> update that brings IFoo into scope (via a pre-existing using). So IFoo is no longer unqualified, and it is no longer target typed. The meaning is changed, and you won't be told.

I wonder whether something as simple as saying "leading period means target typed" is a pragmatic solution here:

bool M5(Outer o) => o is .IFoo; 

@gafter
Copy link
Member Author

gafter commented Aug 11, 2020

@mgravell This is something that we had discussed. I apologize that the discussion did not take place here. The expression form (as opposed to the type form) is less susceptible to this issue, so we might consider target-typing only the expression form.

Unfortunately, a straightforward application of the language tactic you suggest doesn't work, as it introduces an ambiguity. There are two different syntactically valid parses of the following:

_ = A ? . B ? . C : D;

(one of the ? is part of a ?: and one is part of a ?. but there is no way to tell which is which)

We could probably make a grammar that recognizes that . identifier can never appear in a receiver position (i.e. on the left of a ?. or . operator) but that might be pretty complex and the parser might require some lookahead. I have not tried to work out the details.

@alrz
Copy link
Contributor

alrz commented Aug 12, 2020

The expression form (as opposed to the type form) is less susceptible to this issue, so we might consider target-typing only the expression form.

Because we're target-typing only as the last resort after any other binding rule has failed, the issue may be still pretty much present. Especially if we don't do anything about #2926 (comment). You may introduce any identical name in the scope in any possible way and all the target-typed identifiers will break. I think having a prefix could help to ensure that an identifier is target-typed regardless of anything else. I think ::ident would be a good alternative since it is being used for a similar purpose in C++ (skips to global scope rather than the target-type, but we already have a syntax for that global::.)

@gafter
Copy link
Member Author

gafter commented Aug 13, 2020

In my opinion a problem of this sort arising from this feature would be rare and if it happens would probably bring in a value of the wrong type. You would not give a variable of type Color the name Red when there is a constant Color.Red with a different value.

@ghost
Copy link

ghost commented Sep 2, 2020

I guess Flags enums could have this implicit conversion:

E e = { Flag1, Flag2 };

@333fred 333fred removed this from the 10.0 candidate milestone Sep 9, 2020
@TahirAhmadov
Copy link

TahirAhmadov commented May 6, 2022

The obvious and simple improvement around enums and potentially static readonly fields and constants (like Color.Blue etc.) would be greatly appreciate, even if we don't tackle the complex scenarios at the same time.

PS. Somewhat esoteric - this wouldn't work with OpCodes, probably. Here you have static readonly fields, but their type is OpCode; C# cannot tie OpCodes fields to OpCode - unless we come up with an attribute of some sort?

@CyrusNajmabadi
Copy link
Member

even if we don't tackle the complex scenarios at the same time.

The scenarios you mentioned are complex :)

@jnm2
Copy link
Contributor

jnm2 commented May 7, 2022

PS. Somewhat esoteric - this wouldn't work with OpCodes, probably. Here you have static readonly fields, but their type is OpCode; C# cannot tie OpCodes fields to OpCode - unless we come up with an attribute of some sort?

Static extension methods might provide this for free? (Once you define a ton of extension methods on OpCode)

@TahirAhmadov
Copy link

Static extension methods might provide this for free? (Once you define a ton of extension methods on OpCode)

If I understand you correctly, wouldn't we want to use static extension properties?

@TahirAhmadov
Copy link

even if we don't tackle the complex scenarios at the same time.

The scenarios you mentioned are complex :)

I know, I know, everything is complex :) I was wondering if we could limit the scope to just enums to begin with - surely that's not as complicated as static fields and such?

@333fred
Copy link
Member

333fred commented May 7, 2022

I know, I know, everything is complex :) I was wondering if we could limit the scope to just enums to begin with - surely that's not as complicated as static fields and such?

Enums are static fields.

@floatms
Copy link

floatms commented Jun 8, 2022

@gafter @CyrusNajmabadi I just wanted to follow up on the ambiguity in this example:
_ = A ? . B ? . C : D;
So the problem here is that both of these parses are syntactically valid, correct? (parenthesis added for clarity)
_ = (A?.B) ? .C : D;
_ = A ? (.B?.C) : D;
But couldn't the ambiguity be eliminated by simply not lexing ? . (notice the whitespace) as ?. ?
In that case only one parse would be valid:

_ = A ? (.B?.C) : D;
And having whitespace in this expression: .B?.C would simply result in a syntax error.

Note that I'm not necessarily saying that this is the right thing to do.
Thinking about the implementation, this change would probably only affect the lexer and not event touch the parser.
Of course, if the dotless syntax is accepted then this is not important anyway.

Edit: I think I might have gotten this completely wrong. Making a seemingly small and simple syntax rule change like above might not be enough to solve this.
The point bellow is still worth reading in my opinion.

Another interesting case to consider is if the name lookup could be extended to also include static methods / properties of
the target type.
Consider this example:

public record struct Rect(int X, int Y, int Width, int Height) {
    public static Rect FromFoo(FrameworkFoo.Rect rect)
        => new(X: rect.Left, Y: rect.Top, Width: rect.Right - rect.Left, Height: rect.Bottom - rect.Top);
}

// Assume that this method exists: void DrawRect(Rect rect)

FrameworkFoo.Rect rect = new(Left: 50, Right: 100, Top: 25, Bottom: 125);
// Here, the method is looked up based on the target type. I used dot syntax, but dotless could also work
// (but dotless would probably cause more ambiguities in this case).
DrawRect(.FromFoo(rect));

This example was just of the top of my head, but I encounter this kind of code a lot when dealing
with types with lots of static functionality. I'm not saying that this is the primary motivating example, but it would probably
result in less friction when using static methods / extensions (and basically all the same benefits that this proposal would bring to the original three cases).
This would expand the scope of this proposal quite a bit, so I'm not saying that this is a feature that is absolutely necessary.

@gafter
Copy link
Member Author

gafter commented Jun 22, 2022

CyrusNajmabadi assigned CyrusNajmabadi and unassigned gafter on Apr 26

😞

Championing this myself. Specifically though, through a syntactic form that is unambiguous. E.g. (with a dot)

It is the version with a dot that has a syntactic ambiguity. The version without a dot (as in the OP) does not.

@CyrusNajmabadi
Copy link
Member

It is the version with a dot that has a syntactic ambiguity.

Yes, i mentioned that below:

Note: while this form has an ambiguity with a?.b i think that's acceptable and we can say in that case you need to write a ? (.b) : (.c)

I'm not saying it's exactly this form. I'm saying i'm willing to champion a form that has no syntactic ambiguity but has some syntactic difference beyond just a simple name. Thanks!

@cytoph
Copy link

cytoph commented Jun 22, 2022

I have to say, that I don't really get why syntactic ambiguity is so much of a problem here. So meticulously as this topic is approached here, one could argue that properties or fields should also always have a dot in front, when used, to show that they are properties/fields and not class or variable names. But for that kind of ambiguity there is the this or base keyword. So why is that approach not applicable to enum members, too? If the enum member is used only by its name and there is a member, variable or whatever that has the same name, the compiler throws an error and asks to explicitly put the enums name in front of the member name, to clarify what exactly is used in that case (like with CS0229).

Or am I totally missing a point here? 🤔

@Logerfo
Copy link
Contributor

Logerfo commented Jun 22, 2022

@cytoph I'm afraid that would be a breaking change, since code that already has an enum member identified exactly as some other accessible identifier at the same context doesn't issue a warning today, but will issue with the new compiler.

@cytoph
Copy link

cytoph commented Jun 22, 2022

@Logerfo Ah, I think I get what you mean.

In the following case (which is perfectly fine and clear with the current compiler, but with the changes I proposed) the compiler wouldn't be able to tell exactly what I want to assign to the variable test (the value of the property Value1 or the enum value Value1). That is what you meant (or at least an example for what you meant), right?

enum MyTestEnum
{
    Value1, 
    Value2
} 

class MyTestClass
{
    public MyTestEnum Value1 { get; set; }

    public void MyTestMethod() 
    {
        MyTestEnum test = Value1; // here would be the problem
    } 
} 

@sab39
Copy link

sab39 commented Jun 22, 2022

Surely if we want this to be a non-breaking change then it's a no-brainer to pick the non-breaking interpretation, that this new interpretation only applies if the name isn't found any other way. Also the new interpretation should only apply if the enum value name is the entire expression; MyTestEnum test = Value2.Something() would not be valid if the only Value2 in scope is the one in MyTestEnum.

@cytoph
Copy link

cytoph commented Jun 22, 2022

@sab39 That would also fit the way as it works with properties/fields and variables: If both are named exactly the same, it's always assumed the variable is used, unless of course you specifically use the this keyword. And I think that would be a perfectly fine solution to this topic, too, and it would ensure that there are no breaking changes in existing code. If there's an ambiguity, everything else is picked above the enum member. Only if the enum members name is distinctly identified in the current context, it is used as is.

@gafter
Copy link
Member Author

gafter commented Jun 23, 2022

I think you folks are missing a crucial point. This is not a semantic ambiguity, where we know how to parse the code and just need to do name lookup to decide between one interpretation or another. This is a syntactic ambiguity while trying to parse the code (figure out which punctuation means what). Parsing occurs long before name lookup, and cannot depend on name lookup.

@tuscen
Copy link

tuscen commented Jul 26, 2022

Hi. Would this work for cases when the outer type is used as a return type?

Consider

public abstract record Result<TR, TE>
{
    public sealed record Ok(TR Value) : Result<TR, TE>;
    public sealed record Error(TE Value) : Result<TR, TE>;
}

Will this proposal allow me to write some thing like that?

Result<int, string> Validate(int value) => value switch
{
    > 100 => Ok(value),
    _ => Error("Some error text")
};

var result = Validate(101);

int val = result switch
{
    Ok(var value) => value,
    Error(var value) => throw new InvalidOperationException(value)
};

@Richiban
Copy link

Richiban commented Jul 27, 2022

@tuscen I would assume so, since the feature is likely to be built upon the existing target typing rules and return type is considered there. For example, this compiles fine:

public class C 
{
    public object M() 
    {
        return new();
    }
}

@AlexeiScherbakov
Copy link

AlexeiScherbakov commented Aug 21, 2022

Today "using static" solves this problem. If you have to many "using static" - may be it is a time to split code to different classes? :-D

Also in this thread I see also problem "simplification of generic names when taget type known". This is other than initial problem!

@edward-a
Copy link

edward-a commented Jan 1, 2023

Today "using static" solves this problem.

No, it doesn't. Consider you initialize GUI tree in code and have HorizontalAlignment and VerticalAlignment enums both having Center entry.

@edward-a
Copy link

edward-a commented Jan 1, 2023

I think you folks are missing a crucial point. This is not a semantic ambiguity, where we know how to parse the code and just need to do name lookup to decide between one interpretation or another. This is a syntactic ambiguity while trying to parse the code (figure out which punctuation means what). Parsing occurs long before name lookup, and cannot depend on name lookup.

Interpretation-wise, do you care if it's a local variable or a class member (assuming there is no this. prefix)? If not, then i don't see why you would care if it's an enum value. Hope you elaborate.

@agocke
Copy link
Member

agocke commented Mar 31, 2023

If we end up making _ a reserved keyword in the language, theoretically I think the _.Name syntax would be viable for this feature.

@aradalvand
Copy link
Contributor

aradalvand commented Nov 28, 2023

I would very much prefer explicitness in this case — i.e. use a leading dot (a la Swift) to denote "target-typed" for enums.

Makes the rules more straightforward. It also makes things less ambiguous and makes code easier to scan in PRs, etc.

@gafter
Copy link
Member Author

gafter commented Jun 24, 2024

I would very much prefer explicitness in this case — i.e. use a leading dot (a la Swift) to denote "target-typed" for enums.

Makes the rules more straightforward. It also makes things less ambiguous and makes code easier to scan in PRs, etc.

See #2926 (comment) for a discussion explaining why that proposal leads to a syntactic ambiguity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests