Skip to content

Conversation

@kkjeer
Copy link
Contributor

@kkjeer kkjeer commented Apr 14, 2021

This PR modifies PreorderAST so that it can create canonical forms for member expressions such as a->f, (*a).f, a[0].f, etc.

Three new Node kinds are introduced: MemberNode for canonicalizing member expressions, UnaryOperatorNode for canonicalizing unary operators and array subscripts such as *p, *(p + i), p[i], etc., and ImplicitCastNode for canonicalizing implicit cast expressions such as LValueToRValue(e).

This is part of the ongoing work to generalize bounds checking to lvalue expressions. In order to create an AbstractSet for an lvalue expression e to use as the key in ObservedBounds to track the inferred bounds of the value produced by e, we need to use PreorderAST to create a canonical form for e.

@kkjeer kkjeer requested review from dtarditi, mgrang and sulekhark April 14, 2021 07:11
Copy link

@mgrang mgrang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general observation, I see that now we need to have switch-cases in all the functions which act on the PreorderAST like ConstantFold, Coalesce, etc. Maybe we should move the class-specific implementations of these functions into their respective classes. Then we could simply invoke Object->ConstantFold(), Object->Coalesce(), etc. This would avoid switch-cases and thus result in simpler code.

This can be done in a follow-up PR.

// Nodes with two different kinds are sorted according to the order in
// which their kinds appear in this enum.
enum class NodeKind {
OperatorNode,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rename OperatorNode to BinaryOperatorNode since we now have a separate UnaryOperatorNode.

Expr *E;

LeafExprNode(Expr *E, OperatorNode *Parent) :
LeafExprNode(Expr *E, Node *Parent) :
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the parent is now of type Node it means even a LeafExprNode can be a parent. But we should not allow parent to be a LeafExprNode. Maybe an assert and a check should be added.

Coalesce(I->Child, Changed);
break;
}
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having default as the first case in a switch allows better code readability as the switch may become longer when new cases are added to it in future.

Also do we even need a default case here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default case avoids a warning that would be emitted if the switch did not handle the LeafExprNode case.

Sort(I->Child);
break;
}
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments on the default case here.

}
}

void PreorderAST::ConstantFoldOperator(OperatorNode *O, bool &Changed) {
Copy link

@mgrang mgrang Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to change the name of this function. It can be called as ConstantFold and C++ function overloading would figure out that this function needs to be called based on the type of the first argument (which would be Node * in one function and OperatorNode * in the other).

Of course, when we move the implementation of ConstantFold into respective classes this can be invoked as Object->ConstantFold().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried renaming ConstantFoldOperator to ConstantFold and unfortunately constant folding no longer worked. It seems that ConstantFold(BinaryOperatorNode *B) will only be called on a Node * N if N is declared as a BinaryOperatorNode *. For example, I tried changing the type of PreorderAST.Root from Node * to BinaryOperatorNode * and then ConstantFold(Root) called the correct function; however, this does not work for any descendants of a BinaryOperatorNode. For example, the expression *(*(p + 1 + 2) + 3 + 4) should be constant folded to *(*(p + 3) + 7), but it is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - the need to have two different ConstantFold and ConstantFoldOperator methods should no longer be a consideration after the follow-up work in issue #1032 to move methods such as ConstantFold to their respective Node classes. For now, the methods ConstantFold and ConstantFoldOperator can keep their respective names.

namespace clang {
using Result = Lexicographic::Result;
class OperatorNode;
class LeafExprNode;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class declaration still needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used for the assert in the Node constructor if Parent is a LeafExprNode.

void ConstantFold(Node *N, bool &Changed);

// Constant fold integer expressions within an operator node.
// Constant fold integer expressions within a binary operator node.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary operator node --> BinaryOperatorNode

E = OParent->Children.end(); I != E; ++I) {
if (*I == O) {
OParent->Children.erase(I);
for (auto I = BParent->Children.begin(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because BParent is mutated in the loop we need to compute its end each time through the loop. Currently we are not doing so because we compute E = BParent->Children.end() in the loop initializer. So the correct initializer should be as follows:
for (auto I = BParent->Children.begin(); I != BParent->Children.end(); ++I)

Could you also add a comment here indicating that the reason we need to evaluate loop end each time is because the container is mutated in the loop?

For more details see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

Copy link

@mgrang mgrang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@sulekhark sulekhark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the inlined review comments, I have a few general comments:

  1. I noticed one issue that seems to have been present before the work on this PR started: the use of CanCoalesceNode is sub-optimal. Every call to CoalesceNode is preceded by a call to CanCoalesceNode but CanCoalesceNode is called again as soon as CoalesceNode is entered. The logic of CanCoalesceNode is also a bit clunky - I can explain offline in more detail.
  2. Most importantly, we need to discuss the following design/implementation issue: It is difficult to go from the canonical form of a subexpression (of a "top-level" expression) to the AbstractSet that may record useful information (like aliasing) for the subexpression. While creating a canonical PreorderAST for each "top-level" expression, a fresh set of Nodes are allocated to construct the canonical forms for subexpressions. We cannot compare the pointer to the subtree (in the preorder AST) corresponding to the subexpression, with the root nodes of the canonical forms of other AbstractSets. For example, let p be a pointer to an array of structures (i.e. p is of type _Array_ptr<struct S>). Consider the expressions *(p+i) and (*(p+i)).f where f is a field in struct S. Further suppose that p+i is aliased to q. Therefore, *(p+i) and *q will belong to the same AbstractSet, say A. We may need this information while processing expression (*(p+i)).f - to recognize that (*(p+i)).f and (*q).f must belong to the same AbstractSet. But, under the present design/implementation, it may be very expensive to figure out that the subexpression *(p+i) of (*(p+i)).f is actually the same as the canonical form of AbstractSet A. We need to discuss this point.

// Note: An OperatorNode has a list of children because the preorder AST is
// an n-ary tree.
// Note: A BinaryOperatorNode has a list of children because the preorder
// AST is an n-ary tree.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing. It should communicate that:
A BinaryOperatorNode representing a commutative and associative binary operation may have more than two children because of coalescing. Ex: a +(b + c) will be represented by one BinaryOperatorNode for + with three children nodes for a, b and c after coalescing.

void Create(Expr *E, Node *Parent = nullptr);

// Create a BinaryOperatorNode with an addition operator and two children
// (E and 0), and add the created BinaryOperatorNode to the Parent node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add -> attach

void Coalesce(Node *N, bool &Changed);

// Sort the children expressions in a OperatorNode of the AST.
// Sort the children expressions in a non-leaf Node of the AST.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than BinaryOperatorNode, which other non-leaf Node has its children sorted? I wonder if you mean this:
Recursively descend the PreorderAST to sort the children of all BinaryOperatorNodes if the binary operator is commutative.


// If the root is null, make the current node the root.
if (!Root)
Root = N;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Root of a PreorderAST can only be a BinaryOperatorNode, right? Also, may be an assert that parent is null is required.

// BinaryOperatorNode to be constant folded, it must have at least two
// LeafExprNode children whose expressions are integer constants. For
// example, i + -(1 + 2) + 0 will be constant folded to i + -3, but
// i + -(1 + 2) will not be constant folded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a 0 here? Now that we are supporting a UnaryOperatorNode, will this situation still arise? As per the design doc, the unary minus in the subexpression -(1 + 2) should have a corresponding UnaryOperatorNode whose child is a BinaryOperatorNode, and whose children are the LeafExprNodescorresponding to1and2`. Could you please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer constant expressions of the form +e and -e are LeafExprNodes rather than UnaryOperatorNodes. The reason for this is so that they can be correctly constant folded. The only children of a BinaryOperatorNode that are constant folded are LeafExprNodes whose Exprs are integer constant expressions. So, for example, the expression + p -(1 + 2) 4 will be constant folded to + p 1 only if -(1 + 2) is a LeafExprNode with the Expr -(1 + 2). If -(1 + 2) is a UnaryOperatorNode, this expression will not be constant folded. I don't think this is currently reflected in the design doc - I will update the doc so this is more clear.

The children within a BinaryOperatorNode will only be constant folded if the BinaryOperatorNode has at least two LeafExprNode children whose Exprs are integer constant expressions. For example, the BinaryOperatorNode + p -(1 + 2) will not be constant folded since it only has one integer constant child node, but + p -(1 + 2) 0 will be constant folded to + p -3.

}

auto *N = new OperatorNode(BinOp, Parent);
auto *N = new BinaryOperatorNode(BinOp, Parent);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the design document, there is an example that adds a "+ 0" to a BinaryOperatorNode even though it is not the root node of the PreorderAST. I don't see the corresponding code here. Could you please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only places where 0 is added to a BinaryOperatorNode are where the AddZero method is called, currently in the following places:

  1. If the Root is currently null, so + e 0 is the root of the PreorderAST
  2. To create the Base node of an arrow MemberNode, so e->f is canonicalized to (e + 0)->f
  3. To create the Child node of a dereference UnaryOperatorNode, so *e is canonicalized to *(e + 0)
  4. To create the Child node of a dereference UnaryOperatorNode that is created from an ArraySubscriptExpr, so e1[e2] is canonicalized to *(e1 + e2 + 0)

The intent was for the examples in the design doc to only add 0 to a BinaryOperatorNode in these cases. I'll double check the examples in the doc.

}
}

void PreorderAST::AddZero(Expr *E, Node *Parent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name AddZero is confusing: the meaning of "Add" in AddZero and AddNode is very different. AddNode is just "attaching" a node to the parent node while AddZero is creating a BinaryOperatorNode that performs the "+" operation with zero and also calls create to continue the recursive creation of the preorder AST.

Copy link
Contributor

@sulekhark sulekhark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link

@mgrang mgrang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants