-
Notifications
You must be signed in to change notification settings - Fork 79
[LValue Generalization] Canonicalize MemberExprs [6/n] #1030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ode * This will support additional Node subclasses that also have child Nodes, since a Node will no longer necessarily be a child of an OperatorNode.
…into canonicalize-member-exprs
mgrang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general observation, I see that now we need to have switch-cases in all the functions which act on the PreorderAST like ConstantFold, Coalesce, etc. Maybe we should move the class-specific implementations of these functions into their respective classes. Then we could simply invoke Object->ConstantFold(), Object->Coalesce(), etc. This would avoid switch-cases and thus result in simpler code.
This can be done in a follow-up PR.
| // Nodes with two different kinds are sorted according to the order in | ||
| // which their kinds appear in this enum. | ||
| enum class NodeKind { | ||
| OperatorNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should rename OperatorNode to BinaryOperatorNode since we now have a separate UnaryOperatorNode.
| Expr *E; | ||
|
|
||
| LeafExprNode(Expr *E, OperatorNode *Parent) : | ||
| LeafExprNode(Expr *E, Node *Parent) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the parent is now of type Node it means even a LeafExprNode can be a parent. But we should not allow parent to be a LeafExprNode. Maybe an assert and a check should be added.
clang/lib/AST/PreorderAST.cpp
Outdated
| Coalesce(I->Child, Changed); | ||
| break; | ||
| } | ||
| default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having default as the first case in a switch allows better code readability as the switch may become longer when new cases are added to it in future.
Also do we even need a default case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default case avoids a warning that would be emitted if the switch did not handle the LeafExprNode case.
clang/lib/AST/PreorderAST.cpp
Outdated
| Sort(I->Child); | ||
| break; | ||
| } | ||
| default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments on the default case here.
clang/lib/AST/PreorderAST.cpp
Outdated
| } | ||
| } | ||
|
|
||
| void PreorderAST::ConstantFoldOperator(OperatorNode *O, bool &Changed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to change the name of this function. It can be called as ConstantFold and C++ function overloading would figure out that this function needs to be called based on the type of the first argument (which would be Node * in one function and OperatorNode * in the other).
Of course, when we move the implementation of ConstantFold into respective classes this can be invoked as Object->ConstantFold().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried renaming ConstantFoldOperator to ConstantFold and unfortunately constant folding no longer worked. It seems that ConstantFold(BinaryOperatorNode *B) will only be called on a Node * N if N is declared as a BinaryOperatorNode *. For example, I tried changing the type of PreorderAST.Root from Node * to BinaryOperatorNode * and then ConstantFold(Root) called the correct function; however, this does not work for any descendants of a BinaryOperatorNode. For example, the expression *(*(p + 1 + 2) + 3 + 4) should be constant folded to *(*(p + 3) + 7), but it is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline - the need to have two different ConstantFold and ConstantFoldOperator methods should no longer be a consideration after the follow-up work in issue #1032 to move methods such as ConstantFold to their respective Node classes. For now, the methods ConstantFold and ConstantFoldOperator can keep their respective names.
| namespace clang { | ||
| using Result = Lexicographic::Result; | ||
| class OperatorNode; | ||
| class LeafExprNode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this class declaration still needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for the assert in the Node constructor if Parent is a LeafExprNode.
| void ConstantFold(Node *N, bool &Changed); | ||
|
|
||
| // Constant fold integer expressions within an operator node. | ||
| // Constant fold integer expressions within a binary operator node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binary operator node --> BinaryOperatorNode
clang/lib/AST/PreorderAST.cpp
Outdated
| E = OParent->Children.end(); I != E; ++I) { | ||
| if (*I == O) { | ||
| OParent->Children.erase(I); | ||
| for (auto I = BParent->Children.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because BParent is mutated in the loop we need to compute its end each time through the loop. Currently we are not doing so because we compute E = BParent->Children.end() in the loop initializer. So the correct initializer should be as follows:
for (auto I = BParent->Children.begin(); I != BParent->Children.end(); ++I)
Could you also add a comment here indicating that the reason we need to evaluate loop end each time is because the container is mutated in the loop?
For more details see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop
mgrang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
sulekhark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the inlined review comments, I have a few general comments:
- I noticed one issue that seems to have been present before the work on this PR started: the use of
CanCoalesceNodeis sub-optimal. Every call toCoalesceNodeis preceded by a call toCanCoalesceNodebutCanCoalesceNodeis called again as soon asCoalesceNodeis entered. The logic ofCanCoalesceNodeis also a bit clunky - I can explain offline in more detail. - Most importantly, we need to discuss the following design/implementation issue: It is difficult to go from the canonical form of a subexpression (of a "top-level" expression) to the
AbstractSetthat may record useful information (like aliasing) for the subexpression. While creating a canonicalPreorderASTfor each "top-level" expression, a fresh set ofNodesare allocated to construct the canonical forms for subexpressions. We cannot compare the pointer to the subtree (in the preorder AST) corresponding to the subexpression, with the root nodes of the canonical forms of otherAbstractSets. For example, letpbe a pointer to an array of structures (i.e.pis of type_Array_ptr<struct S>). Consider the expressions*(p+i)and(*(p+i)).fwherefis a field instruct S. Further suppose thatp+iis aliased toq. Therefore,*(p+i)and*qwill belong to the sameAbstractSet, sayA. We may need this information while processing expression(*(p+i)).f- to recognize that(*(p+i)).fand(*q).fmust belong to the sameAbstractSet. But, under the present design/implementation, it may be very expensive to figure out that the subexpression*(p+i)of(*(p+i)).fis actually the same as the canonical form ofAbstractSet A. We need to discuss this point.
| // Note: An OperatorNode has a list of children because the preorder AST is | ||
| // an n-ary tree. | ||
| // Note: A BinaryOperatorNode has a list of children because the preorder | ||
| // AST is an n-ary tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing. It should communicate that:
A BinaryOperatorNode representing a commutative and associative binary operation may have more than two children because of coalescing. Ex: a +(b + c) will be represented by one BinaryOperatorNode for + with three children nodes for a, b and c after coalescing.
| void Create(Expr *E, Node *Parent = nullptr); | ||
|
|
||
| // Create a BinaryOperatorNode with an addition operator and two children | ||
| // (E and 0), and add the created BinaryOperatorNode to the Parent node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add -> attach
| void Coalesce(Node *N, bool &Changed); | ||
|
|
||
| // Sort the children expressions in a OperatorNode of the AST. | ||
| // Sort the children expressions in a non-leaf Node of the AST. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than BinaryOperatorNode, which other non-leaf Node has its children sorted? I wonder if you mean this:
Recursively descend the PreorderAST to sort the children of all BinaryOperatorNodes if the binary operator is commutative.
|
|
||
| // If the root is null, make the current node the root. | ||
| if (!Root) | ||
| Root = N; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Root of a PreorderAST can only be a BinaryOperatorNode, right? Also, may be an assert that parent is null is required.
| // BinaryOperatorNode to be constant folded, it must have at least two | ||
| // LeafExprNode children whose expressions are integer constants. For | ||
| // example, i + -(1 + 2) + 0 will be constant folded to i + -3, but | ||
| // i + -(1 + 2) will not be constant folded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a 0 here? Now that we are supporting a UnaryOperatorNode, will this situation still arise? As per the design doc, the unary minus in the subexpression -(1 + 2) should have a corresponding UnaryOperatorNode whose child is a BinaryOperatorNode, and whose children are the LeafExprNodescorresponding to1and2`. Could you please clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer constant expressions of the form +e and -e are LeafExprNodes rather than UnaryOperatorNodes. The reason for this is so that they can be correctly constant folded. The only children of a BinaryOperatorNode that are constant folded are LeafExprNodes whose Exprs are integer constant expressions. So, for example, the expression + p -(1 + 2) 4 will be constant folded to + p 1 only if -(1 + 2) is a LeafExprNode with the Expr -(1 + 2). If -(1 + 2) is a UnaryOperatorNode, this expression will not be constant folded. I don't think this is currently reflected in the design doc - I will update the doc so this is more clear.
The children within a BinaryOperatorNode will only be constant folded if the BinaryOperatorNode has at least two LeafExprNode children whose Exprs are integer constant expressions. For example, the BinaryOperatorNode + p -(1 + 2) will not be constant folded since it only has one integer constant child node, but + p -(1 + 2) 0 will be constant folded to + p -3.
| } | ||
|
|
||
| auto *N = new OperatorNode(BinOp, Parent); | ||
| auto *N = new BinaryOperatorNode(BinOp, Parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the design document, there is an example that adds a "+ 0" to a BinaryOperatorNode even though it is not the root node of the PreorderAST. I don't see the corresponding code here. Could you please clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only places where 0 is added to a BinaryOperatorNode are where the AddZero method is called, currently in the following places:
- If the
Rootis currently null, so+ e 0is the root of thePreorderAST - To create the
Basenode of an arrowMemberNode, soe->fis canonicalized to(e + 0)->f - To create the
Childnode of a dereferenceUnaryOperatorNode, so*eis canonicalized to*(e + 0) - To create the
Childnode of a dereferenceUnaryOperatorNodethat is created from anArraySubscriptExpr, soe1[e2]is canonicalized to*(e1 + e2 + 0)
The intent was for the examples in the design doc to only add 0 to a BinaryOperatorNode in these cases. I'll double check the examples in the doc.
| } | ||
| } | ||
|
|
||
| void PreorderAST::AddZero(Expr *E, Node *Parent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name AddZero is confusing: the meaning of "Add" in AddZero and AddNode is very different. AddNode is just "attaching" a node to the parent node while AddZero is creating a BinaryOperatorNode that performs the "+" operation with zero and also calls create to continue the recursive creation of the preorder AST.
sulekhark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
mgrang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR modifies PreorderAST so that it can create canonical forms for member expressions such as
a->f,(*a).f,a[0].f, etc.Three new
Nodekinds are introduced:MemberNodefor canonicalizing member expressions,UnaryOperatorNodefor canonicalizing unary operators and array subscripts such as*p,*(p + i),p[i], etc., andImplicitCastNodefor canonicalizing implicit cast expressions such asLValueToRValue(e).This is part of the ongoing work to generalize bounds checking to lvalue expressions. In order to create an
AbstractSetfor an lvalue expressioneto use as the key inObservedBoundsto track the inferred bounds of the value produced bye, we need to usePreorderASTto create a canonical form fore.