Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deconstruction assignment: basic implementation #11384

Merged
merged 7 commits into from
May 19, 2016
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
180 changes: 180 additions & 0 deletions docs/features/deconstruction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@

Deconstruction
--------------

This design doc will cover two kinds of deconstruction: deconstruction into existing variables (deconstruction-assignment) and deconstruction into new variables (deconstruction-declaration).
It is still very much work-in-progress.

Here is an example of deconstruction-assignment:
```C#
class C
{
static void Main()
{
long x;
string y;

(x, y) = new C();
System.Console.WriteLine(x + "" "" + y);
}

public void Deconstruct(out int a, out string b)
{
a = 1;
b = ""hello"";
}
}
```

Treat deconstruction of a tuple into existing variables as a kind of assignment, using the existing AssignmentExpression.


###Deconstruction-assignment (deconstruction into into existing variables):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"into into"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

This doesn't introduce any changes to the language grammar. We have an `assignment-expression` (also simply called `assignment` in the C# grammar) where the `unary-expression` (the left-hand-side) is a `tuple-literal`.
In short, what this does is find a `Deconstruct` method on the expression on the right-hand-side of the assignment, invoke it, collect its `out` parameters and assign them to the variables on the left-hand-side.

The existing assignment binding currently checks if the variable on its left-hand-side can be assigned to and if the two sides are compatible.
It will be updated to support deconstruction-assignment, ie. when the left-hand-side is a tuple-literal/tuple-expression:

- Each item on the left needs to be assignable and needs to be compatible with corresponding position on the right
- Needs to handle nesting case such as `(x, (y, z)) = M();`, but note that the second item in the top-level group has no discernable type.

The lowering for deconstruction-assignment would translate: `(expressionX, expressionY, expressionZ) = (expressionA, expressionB, expressionC)` into:

```
tempX = &evaluate expressionX
tempY = &evaluate expressionY
tempZ = &evaluate expressionZ

tempRight = evaluate right and evaluate Deconstruct

tempX = tempRight.A (including conversions)
tempY = tempRight.B (including conversions)
tempZ = tempRight.C (including conversions)

“return/continue” with newTupleIncludingNames tempRight (so you can do get Item1 from the assignment)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for an ordinary assignment, result includes conversions. It is not clear whether this is true for this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean what is the return type of the expression, that is an open issue for Friday.
Some options: returning the result of M(), returning a new tuple, returning void, or making the whole thing a statement instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.


In reply to: 63822866 [](ancestors = 63822866)

```

The evaluation order for nesting `(x, (y, z))` is:
```
tempX = &evaluate expressionX

tempRight = evaluate right and evaluate Deconstruct

tempX = tempRight.A (including conversions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unexpected, should we evaluate all side-effects for expressions on the left before evaluating the expression on the right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I think one reason I thought this order made sense was it's strange to evaluate the side-effects of y and z before M() or the first Deconstruct is called. I'm not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, the intuition behind the above was the following:

  1. (x, temp) = M(); (including Deconstruct call and conversions)
  2. (y, z) = temp; (including Deconstruct call and conversions).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the temp is an implementation detail.


In reply to: 63910391 [](ancestors = 63910391)

Copy link
Member

@VSadov VSadov May 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusing part is whether nested parens on the left create or do not create a variable.

At one extreme, parens in (a, (b, c)) are meaningless and that means the same as (a,b,c). In such case it is not clear why nesting even allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, nesting is useful as a short-hand for two deconstruct-assignments. But Aleksey makes a good point that in general assignments evaluate everything on the left first, then everything on the right, etc.
This topic is on the list for tomorrow.

tempLNested = tempRight.B (no conversions)

tempY = &evaluate expressionY
tempZ = &evaluate expressionZ

tempRNest = evaluate Deconstruct on tempRight

tempY = tempRNest.B (including conversions)
tempZ = tempRNest.C (including conversions)

```

The evaluation order for the simplest cases (locals, fields, array indexers, or anything returning ref) without needing conversion:
```
evaluate side-effect on the left-hand-side variables
evaluate Deconstruct passing the references directly in
```

Note that the feature is built around the `Deconstruct` mechanism for deconstructing types.
`ValueTuple` and `System.Tuple` will rely on that same mechanism, except that the compiler may need to synthesize the proper `Deconstruct` methods.

Copy link
Contributor

@AlekseyTs AlekseyTs May 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent to allow explicit calls to the Deconstruct method in source? If not then, perhaps it would be simpler to just specially recognize the types (without relying on the presence of the synthesized method) and let the lowering to transfer element values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, between synthesizing Deconstruct or having special handling for tuples here, we need to special-case tuples anyways. I don't think anyone needs to manually call Deconstruct in source (in fact that would cause more trouble, if you wanted to assign it into a delegate for example).

I'm thinking to maybe even go further and change the Deconstruct method to return a tuple instead of out parameters. This would make sense in the case we want the expression to return a tuple too. It's on the agenda for LDM.


**Work items, open issues and assumptions to confirm with LDM:**

- I assume this should work even if `System.ValueTuple` is not present.
- How is the Deconstruct method resolved?
- I assumed there can be no ambiguity. Only one `Deconstruct` is allowed (in nesting cases we have no type to guide the resolution process).
- But we may allow a little bit of ambiguity and preferring an instance over extension method.
- Do the names matter? `int x, y; (a: x, b: y) = M();`
- Can we deconstruct into a single out variable? I assume no.
- I assume no compound assignment `(x, y) += M();`
- [ ] Provide more details on the semantic of deconstruction-assignment, both static (The LHS of the an assignment-expression used be a L-value, but now it can be L-value -- which uses existing rules -- or tuple_literal. The new rules for tuple_literal on the LHS...) and dynamic.
- [ ] Discuss with Aleksey about "Deconstruct and flow analysis for nullable ref type"
- [ ] Validate any target typing or type inference scenarios.
- The deconstruction-assignment is treated separately from deconstruction-declaration, which means it doesn't allow combinations such as `int x; (x, int y) = M();`.

###Deconstruction-declaration (deconstruction into new variables):

```ANTLR
declaration_statement
: local_variable_declaration ';'
| local_constant_declaration ';'
| local_variable_combo_declaration ';' // new
;

local_variable_combo_declaration
: local_variable_combo_declaration_lhs '=' expression

local_variable_combo_declaration_lhs
: 'var' '(' identifier_list ')'
| '(' local_variable_list ')'
;

identifier_list
: identifier ',' identifier
| identifier_list ',' identifier
;

local_variable_list
: local_variable_type identifier ',' local_variable_type identifier
| local_variable_list ',' local_variable_type identifier
;

foreach_statement
: 'foreach' '(' local_variable_type identifier 'in' expression ')' embedded_statement
| 'foreach' '(' local_variable_combo_declaration_lhs 'in' expression ')' embedded_statement // new
;

for_initializer
: local_variable_declaration
| local_variable_combo_declaration // new
| statement_expression_list
;

let_clause
: 'let' identifier '=' expression
| 'let' '(' identifier_list ')' '=' expression // new
;

from_clause // not sure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are doing this for foreach, probably should do this from, join, select

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Thanks for confirming.

: 'from' type? identifier 'in' expression
;

join_clause // not sure
: 'join' type? identifier 'in' expression 'on' expression 'equals' expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an existing production, no? Are you intending to add an alternative with local_variable_combo_declaration_lhs? Here and below two productions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list above resulted from my fishing for syntax nodes that look like they declare variables and so would be candidates for deconstruction-declaration.
For those marked "not sure", I wasn't sure whether they are good candidates and what the updated syntax would be.

;

join_into_clause // not sure
: 'join' type? identifier 'in' expression 'on' expression 'equals' expression 'into' identifier
;

constant_declarator // not sure
: identifier '=' constant_expression
;
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two contexts were also identified as being candidates for possible special treatment (but no LDM decision made):

  • Implicit anonymous function parameter list, e.g. (x, (y, z)) => ...
  • out var, e.g. M(out var (x, y))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll add those to the list I'm tracking.


Treat deconstruction of a tuple into new variables as a new kind of node (AssignmentExpression).
It would pick up the behavior of each contexts where new variables can be declared (TODO: need to list). For instance, in LINQ, new variables go into a transparent identifiers.
It is seen as deconstructing into separate variables (we don't introduce transparent identifiers in contexts where they didn't exist previously).

Should we allow this?
`var t = (x: 1, y: 2); (x: var a, y: var b) = t;`
or `var (x: a, y: b) = t;`
(if not, tuple names aren't very useful?)

- [ ] Add example: var (x, y) =
- [ ] Semantic (cardinality should match, ordering including conversion,
- [ ] What are the type rules? `(string s, int x) = (null, 3);`

- Deconstruction for `System.ValueTuple`, `System.Tuple` and any other type involves a call to `Deconstruct`.

**References**

[C# Design Notes for Apr 12-22, 2016](https://github.com/dotnet/roslyn/issues/11031)


113 changes: 111 additions & 2 deletions src/Compilers/CSharp/Portable/Binder/Binder_Statements.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1698,18 +1698,127 @@ private static bool CheckNotNamespaceOrType(BoundExpression expr, DiagnosticBag
}
}

private BoundAssignmentOperator BindAssignment(AssignmentExpressionSyntax node, DiagnosticBag diagnostics)
private BoundExpression BindAssignment(AssignmentExpressionSyntax node, DiagnosticBag diagnostics)
{
Debug.Assert(node != null);
Debug.Assert(node.Left != null);
Debug.Assert(node.Right != null);

if (node.Left.Kind() == SyntaxKind.TupleExpression)
{
return BindDeconstructionAssignment(node, diagnostics);
}

var op1 = BindValue(node.Left, diagnostics, BindValueKind.Assignment); // , BIND_MEMBERSET);
var op2 = BindValue(node.Right, diagnostics, BindValueKind.RValue); // , BIND_RVALUEREQUIRED);

return BindAssignment(node, op1, op2, diagnostics);
}

private BoundExpression BindDeconstructionAssignment(AssignmentExpressionSyntax node, DiagnosticBag diagnostics)
{
SeparatedSyntaxList<ArgumentSyntax> arguments = ((TupleExpressionSyntax)node.Left).Arguments;

// receiver for Deconstruct
var boundRHS = BindValue(node.Right, diagnostics, BindValueKind.RValue);

int numElements = arguments.Count;
Debug.Assert(numElements >= 2); // this should not have parsed as a tuple.

// bind the variables and check they can be assigned to
var checkedVariablesBuilder = ArrayBuilder<BoundExpression>.GetInstance(numElements);
for (int i = 0; i < numElements; i++)
{
var boundVariable = BindExpression(arguments[i].Expression, diagnostics, invoked: false, indexed: false);
var checkedVariable = CheckValue(boundVariable, BindValueKind.Assignment, diagnostics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this won't handle a nested case properly. Is this scenario not supported yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The nested case is on the list in #11299.


checkedVariablesBuilder.Add(checkedVariable);
}

var checkedVariables = checkedVariablesBuilder.ToImmutableAndFree();

// symbol and parameters for Deconstruct
MethodSymbol deconstructMethod = FindDeconstruct(checkedVariables, boundRHS, node, diagnostics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to suppress any additional diagnostics if variables or boundRHS HasErrors, but continue binding for SemanticModel benefits. I usually do this by overwriting diagnostics with new bag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if ((object)deconstructMethod == null)
{
return new BoundDeconstructionAssignmentOperator(node, checkedVariables, boundRHS, ErrorMethodSymbol.UnknownMethod,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to produce a BoundDeconstructionAssignmentOperator for an error case? Perhaps BoundBadExpression would be better, I think candidate symbols can be attached to it and SemanticModel would just do the right thing for GetSymbolInfo API. That said, we might want to keep the candidates that were considered for the Deconstruct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't have to be addressed in this PR.


In reply to: 63825210 [](ancestors = 63825210)

ImmutableArray<BoundDeconstructionAssignmentOperator.AssignmentInfo>.Empty,
ErrorTypeSymbol.UnknownResultType,
hasErrors: true);
}

var parameterTypes = deconstructMethod.Parameters.SelectAsArray(p => p.Type);
Copy link
Contributor

@AlekseyTs AlekseyTs May 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An allocation of this ImmutableArray feels unnecessary. We could index into the Parameters array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll remove.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Thanks


// figure out the pairwise conversions
var assignmentsBuilder = ArrayBuilder<BoundDeconstructionAssignmentOperator.AssignmentInfo>.GetInstance(numElements);
for (int i = 0; i < checkedVariables.Length; i++)
{
var leftPlaceholder = new BoundLValuePlaceholder(checkedVariables[i].Syntax, checkedVariables[i].Type) { WasCompilerGenerated = true };
var rightPlaceholder = new BoundRValuePlaceholder(node.Right, parameterTypes[i]) { WasCompilerGenerated = true };

// each assignment has a placeholder for a receiver and another for the source
BoundAssignmentOperator op = BindAssignment(node, leftPlaceholder, rightPlaceholder, diagnostics);
assignmentsBuilder.Add(new BoundDeconstructionAssignmentOperator.AssignmentInfo() { Assignment = op, LValuePlaceholder = leftPlaceholder, RValuePlaceholder = rightPlaceholder });
}

var assignments = assignmentsBuilder.ToImmutableAndFree();

TypeSymbol lastType = parameterTypes[parameterTypes.Length - 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there Last method or property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is. I was worried it would just be the IEnumerable implementation. But ImmutableArray does have its own (via extension). Fixed.
Also, this will change once we figure out the proper return type.

return new BoundDeconstructionAssignmentOperator(node, checkedVariables, boundRHS, deconstructMethod, assignments, lastType);
}

/// <summary>
/// Find the Deconstruct method for the expression on the right, that will fit the assignable bound expressions on the left.
/// Returns true if the Deconstruct method is found.
/// If so, it outputs the method.
/// </summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the caller could unwrap the parameter types. Then this method could be changed to simply return the Deconstruct method or null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FindDeconstruct method already already scans the parameter symbols (to check they are out) so.
I don't feel strongly either way (I don't think scanning the parameters twice would cost much).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for out parameters in FindDeconstruct could be simplified if decoupled from copying types:

if (deconstructMethod.Parameters.Any(p => p.RefKind != Out)) { ... }

And if FindDeconstruct doesn't return the parameter types, then there is no need to allocate a ImmutableArray<TypeSymbol>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll make the change. I think it will streamline the code.

private static MethodSymbol FindDeconstruct(ImmutableArray<BoundExpression> checkedVariables, BoundExpression boundRHS, SyntaxNode node, DiagnosticBag diagnostics)
{
// find symbol for Deconstruct member
ImmutableArray<Symbol> candidates = boundRHS.Type.GetMembers("Deconstruct");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a guarantee the Type is not null. Do we have a test case with a null literal or a lambda on the right side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably broken. I'm already tracking this in #11299 (to verify with a tuple literal without natural type).

switch (candidates.Length)
{
case 0:
Error(diagnostics, ErrorCode.ERR_MissingDeconstruct, boundRHS.Syntax, boundRHS.Type);
return null;
case 1:
break;
default:
Error(diagnostics, ErrorCode.ERR_AmbiguousDeconstruct, boundRHS.Syntax, boundRHS.Type);
return null;
}

Symbol deconstructMember = candidates[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking the method is an instance method and accessible? Consider using existing Lookup methods instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is the plan for a subsequent PR.


// check that the deconstruct fits
if (deconstructMember.Kind != SymbolKind.Method)
{
Error(diagnostics, ErrorCode.ERR_MissingDeconstruct, boundRHS.Syntax, boundRHS.Type);
return null;
}

MethodSymbol deconstructMethod = (MethodSymbol)deconstructMember;
if (deconstructMethod.MethodKind != MethodKind.Ordinary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we checking that the method is an instance method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at this point. I'll add to the TODO list.

{
Error(diagnostics, ErrorCode.ERR_MissingDeconstruct, boundRHS.Syntax, boundRHS.Type);
return null;
}

if (deconstructMethod.ParameterCount != checkedVariables.Length)
{
Error(diagnostics, ErrorCode.ERR_DeconstructWrongParams, boundRHS.Syntax, boundRHS.Type, checkedVariables.Length);
return null;
}

if (deconstructMethod.Parameters.Any(p => p.RefKind != RefKind.Out))
{
Error(diagnostics, ErrorCode.ERR_DeconstructRequiresOutParams, boundRHS.Syntax, boundRHS.Type);
return null;
}

return deconstructMethod;
}

private BoundAssignmentOperator BindAssignment(AssignmentExpressionSyntax node, BoundExpression op1, BoundExpression op2, DiagnosticBag diagnostics)
{
Debug.Assert(op1 != null);
Expand Down Expand Up @@ -3422,7 +3531,7 @@ private BoundCatchBlock BindCatchBlock(CatchClauseSyntax node, ArrayBuilder<Boun
var binder = GetBinder(node);
Debug.Assert(binder != null);

ImmutableArray<LocalSymbol> locals = binder.GetDeclaredLocalsForScope(node);
ImmutableArray<LocalSymbol> locals = binder.GetDeclaredLocalsForScope(node);
BoundExpression exceptionSource = null;
LocalSymbol local = locals.FirstOrDefault();

Expand Down
26 changes: 26 additions & 0 deletions src/Compilers/CSharp/Portable/BoundTree/BoundNodes.xml
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,19 @@
<Field Name="Type" Type="TypeSymbol" Null="allow"/>
</AbstractNode>

<AbstractNode Name="BoundValuePlaceholderBase" Base="BoundExpression">
<Field Name="Type" Type="TypeSymbol" Override="true" Null="disallow"/>
</AbstractNode>

<!--
This node is used to represent an expression returning value of a certain type.
It is used to perform intermediate binding, and will not survive the local rewriting.
-->
<Node Name="BoundLValuePlaceholder" Base="BoundValuePlaceholderBase" HasValidate="true">
</Node>
<Node Name="BoundRValuePlaceholder" Base="BoundValuePlaceholderBase" HasValidate="true">
</Node>

<!-- only used by codegen -->
<Node Name="BoundDup" Base="BoundExpression">
<!-- when duplicating a local or parameter, must remember original ref kind -->
Expand Down Expand Up @@ -377,6 +390,19 @@
<Field Name="RefKind" Type="RefKind" Null="NotApplicable"/>
</Node>

<Node Name="BoundDeconstructionAssignmentOperator" Base="BoundExpression">
<!-- Non-null type is required for this node kind -->
<Field Name="Type" Type="TypeSymbol" Override="true" Null="disallow"/>

<Field Name="LeftVariables" Type="ImmutableArray&lt;BoundExpression&gt;"/>

<Field Name="Right" Type="BoundExpression"/>
<Field Name="DeconstructMember" Type="MethodSymbol" Null="disallow"/>

<!-- The assignments have placeholders for the left and right part of the assignment -->
<Field Name="Assignments" Type="ImmutableArray&lt;BoundDeconstructionAssignmentOperator.AssignmentInfo&gt;" Null="NotApplicable" SkipInVisitor="true"/>
</Node>

<Node Name="BoundNullCoalescingOperator" Base="BoundExpression">
<Field Name="LeftOperand" Type="BoundExpression"/>
<Field Name="RightOperand" Type="BoundExpression"/>
Expand Down
Loading