New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace flex/lemon parser with libcypher-parser #488
Conversation
1bb7acb
to
cd85936
Compare
8a8d835
to
d62858e
Compare
dd40d61
to
573a7f0
Compare
src/graph/entities/qg_node.c
Outdated
n->label = orig->label; | ||
n->labelID = orig->labelID; | ||
n->alias = orig->alias; | ||
n->incoming_edges = array_new(QGEdge*, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember, I can only guess we don't need / shouldn't copy edges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't by design, this is only called from the QueryGraph_Clone
routine. We clone all nodes without edges first, then introduce edges, which automatically updates the QGNode objects.
src/graph/entities/qg_node.c
Outdated
int offset = 0; | ||
offset += snprintf(buff + offset, buff_len - offset, "("); | ||
if(n->alias) offset += snprintf(buff + offset, buff_len - offset, "%s", n->alias); | ||
if(n->label) offset += snprintf(buff + offset, buff_len - offset, ":%s", n->label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the node ID be included in the string representation as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! This ID is scoped to the query, but could be useful in EXPLAIN/PROFILE introspection (before we'd have aliases like "anon_0", which is a bit of insight that's lost in the current representation).
src/graph/entities/qg_edge.c
Outdated
offset += snprintf(buff + offset, buff_len - offset, "%s", e->alias); | ||
uint reltype_count = array_len(e->reltypes); | ||
for (uint i = 0; i < reltype_count; i ++) { | ||
offset += snprintf(buff + offset, buff_len - offset, ":%s", e->reltypes[i]); // TODO how should this print? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Neo's edge string representation
to_transpose[i] = !to_transpose[i]; | ||
if (to_transpose[i]) QGEdge_Reverse(path[i]); | ||
} | ||
// Reverse the path array as well as the transposition array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why are we doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we'd call Edge_Reverse
while iterating over a path, and at the end we'd transpose the AlgebraicExpression if the transpose count was too high.
With the switch to QG
entities, this became inadequate, because AlgebraicExpression_Transpose
would only fix the src node, dest node, and edge (if referred). The inconsistencies in QG
entities caused some breaks in the execution plan logic immediately following AlgebraicExpression construction.
// TODO I feel like this is really over-elaborate - handle it in parser? | ||
// TODO this can obviously be improved greatly | ||
AR_ExpNode *ar_exp = AR_EXP_FromExpression(record_map, arg); | ||
SIValue exp = AR_EXP_Evaluate(ar_exp, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for arg to be a complex expression, e.g. N.x ? in which case we can't evaluate at this point in time
Maybe for a minus unary op we should introduce: SIValue(-1) * Exp ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! You're right, we need to make an expression tree here to handle all cases. Which does feel really excessive if the value was something like -5
...
SIValue exp = AR_EXP_Evaluate(ar_exp, NULL); | ||
exp = SIValue_Subtract(SI_LongVal(0), exp); | ||
return AR_EXP_NewConstOperandNode(exp); | ||
} else if (operator == CYPHER_OP_UNARY_PLUS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+5 instead of just 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, yeah. It's easy enough to handle, I just haven't added it as a matter of prioritization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added handling for this.
if (root->operand.variadic.entity_prop) rm_free(root->operand.variadic.entity_prop); | ||
} | ||
} else if (root->operand.type == AR_EXP_CONSTANT) { | ||
SIValue_Free(&root->operand.constant); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good practice, but are we really expecting the SIValue to free anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it! String manipulation functions like toLower(a.name)
allocate SIValues that must be freed, but off the top of my head, I can't think of a way for those SIValues to get here.
src/ast/ast.h
Outdated
bool AST_ContainsErrors(const cypher_parse_result_t *result); | ||
|
||
// Report encountered errors. | ||
char* AST_ReportErrors(const cypher_parse_result_t *result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
src/ast/ast.h
Outdated
void AST_ReferredFunctions(const cypher_astnode_t *root, TrieMap *referred_funcs); | ||
|
||
// Returns specified clause or NULL. | ||
const cypher_astnode_t* AST_GetClause(const AST *ast, cypher_astnode_type_t clause_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen in the case where there are multiple clauses of the same type?
MATCH (a) WITH a MATCH (b)-[]->(a) RETURN a,b
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first clause of a matching type (in a segment, which is delimited by WITH Clauses), is returned. When we're concerned with multiple clauses, we use AST_GetTopLevelClauses
or a similar call (there's some redundancy in this file).
src/ast/ast.c
Outdated
long AST_ParseIntegerNode(const cypher_astnode_t *int_node) { | ||
assert(int_node); | ||
|
||
const char *value_str = cypher_ast_integer_get_valuestr(int_node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numeric values are represented by the parser internally as strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. My guess is that Leishman didn't want the parser to be responsible for numerics that are smaller or larger than the range of standard integer types, but it's definitely a nuisance for us.
src/ast/ast.c
Outdated
} | ||
|
||
void AST_Free(AST *ast) { | ||
if (ast->entity_map) TrieMap_Free(ast->entity_map, TrieMap_NOP_CB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we releasing ast->root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments/Questions mostly about Execution-Plan
src/execution_plan/execution_plan.h
Outdated
} ExecutionPlan; | ||
QueryGraph *query_graph; // QueryGraph representing all graph entities in this segment. // TODO del? | ||
FT_FilterNode *filter_tree; // FilterTree containing filters to be applied to this segment. | ||
AR_ExpNode **projections; // Expressions to be constructed for a WITH or RETURN clause. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need both projections and order_expressions as part of the ExecutionPlanSegment struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think we need either of them - I'll try to rework this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swilly22 If there's a WITH clause, the projections
of the pre-WITH segment need to be accessible while building the next segment. We can remove them from the struct and instead add a to-be-populated argument to _NewExecutionPlanSegment
, but I don't know if I prefer that.
src/execution_plan/execution_plan.h
Outdated
); | ||
typedef struct { | ||
OpBase *root; // Root operation of overall ExecutionPlan. | ||
ExecutionPlanSegment **segments; // TODO might be able to refactor to remove this element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible let's drop segments, I like to think about an execution-plan as a tree of operations,
I believe it is preferable not to revel the fact that it's composed by segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe it's possible right now; but it's only used to properly free things after execution. The actual execution is the same root
traversal as we've always used.
c9d8215
to
2cfc1cf
Compare
src/ast/ast_validations.c
Outdated
@@ -411,6 +562,13 @@ static AST_Validation _AST_Validate(const AST *ast, char **reason) { | |||
return AST_VALID; | |||
} | |||
|
|||
AST_Validation AST_WhitelistQuery(const cypher_astnode_t *root, char **reason) { | |||
rax *whitelist = _BuildWhitelist(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're constructing whitelist
for each query, as whitelist
not going to change from one query to the other (same white-list) construct it once, and reuse it. (declare it in global scope as static)
3b46a01
to
1fe8f2f
Compare
fc38494
to
36e9411
Compare
36e9411
to
597a178
Compare
4b6aaa8
to
5d37581
Compare
for(uint i = 0; i < prev_projection_count; i ++) { | ||
op_apply->modifies = array_append(op_apply->modifies, i); | ||
op_cp->modifies = array_append(op_cp->modifies, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you shad some light on the modifies array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! It's unchanged from master except that it now stores Record IDs rather than aliases (and is an array instead of a Vector).
When choosing the optimal position to put Filter ops, we traverse the operation chain until every Record ID specified in a filter tree (FilterTree_CollectModified
) has been seen in a modifies
array (ExecutionPlan_LocateReferences
).
Record IDs are scoped to an ExecutionPlanSegment, and WITH projections are the first n Record IDs in the subsequent segment. (Which is the reason for this logic.)
Since Cypher allows aliases to be reused in different WITH-separated segments, we would need a fix like this even if we were using aliases instead of IDs.
@@ -498,7 +503,7 @@ AR_ExpNode *AR_EXP_Clone(AR_ExpNode *exp) { | |||
assert(false); | |||
break; | |||
} | |||
|
|||
clone->resolved_name = exp->resolved_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to duplicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; although resolved_name
looks like the result of AR_EXP_ToString
, it is a const
parser artifact that gets freed with the parse_result
after execution.
@@ -205,7 +211,7 @@ Record CondTraverseConsume(OpBase *opBase) { | |||
&op->edges); | |||
} | |||
|
|||
_CondTraverse_SetEdge(op, op->r); | |||
if(!_CondTraverse_SetEdge(op, op->r)) return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's it possible for _CondTraverse_SetEdge to fail at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I think this is unnecessary.
src/ast/ast_validations.c
Outdated
uint path_len = cypher_ast_pattern_path_nelements(path); | ||
|
||
// Check all entities on the path | ||
for (uint i = 0; i < path_len; i ++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only care about odd indices
for (uint i = 1; i < path_len; i +=2) {
now the if(1 % 2)
can be omitted.
src/ast/ast_validations.c
Outdated
static AST_Validation _ValidateInlinedPropertiesOnPath(const cypher_astnode_t *path, char **reason) { | ||
uint path_len = cypher_ast_pattern_path_nelements(path); | ||
// Check all entities on the path | ||
for (uint i = 0; i < path_len; i ++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be better to divide this into two for loops, one which will go over nodes and another which will scan edges
this way, we won't be messing with the branch predictor and save computing of modulo
src/ast/ast_validations.c
Outdated
goto cleanup; | ||
} | ||
// Validate that inlined properties do not use parameters | ||
res = _Validate_CREATE_Clause_Properties(create_clauses[i], reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we pass reason
here but not for _Validate_CREATE_Clause_TypedRelations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When _Validate_CREATE_Clause_TypedRelations
fails, we always emit the error:
"Exactly one relationship type must be specified for CREATE"
Which is similar to Neo's response, as there's not much to provide the user with in the case of a query like CREATE (:a)-[]->(:b)
.
For property validation, we can at least differentiate between whether the user is trying to use a parameter or a non-constant value, and as such can give a slightly more specific erro.
src/ast/ast_validations.c
Outdated
@@ -135,18 +135,18 @@ static AST_Validation _AST_ValidateReferredFunctions(TrieMap *referred_functions | |||
return res; | |||
} | |||
|
|||
static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *node, char **reason) { | |||
static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *range, char **reason) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So many validations...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ick, I know. We'll be able to remove some of them over time, at least, but to some degree there's no way around it unless we want to write them into the parser.
if (reltype_id == GRAPH_UNKNOWN_RELATION) { | ||
// No matrix to add | ||
continue; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this else
in case reltype_id == GRAPH_UNKNOWN_RELATION we'll jump to the beginning of the loop
@jeffreylovitz HERO |
* Fix steps for ordered and unordered results * Merge TCK updates, remove modified lines * Single-commit parser updates * Post-rebase fixes * Enable additional code paths; delete unused * WIP unit test updates * Edge reference fixes * Use array_del macros * Improve QG interfaces and linking * Contd * Fix complex transpositions and RETURN * * Update libcypher-parser version * Adjust query buffer size to 1mb * Update CircleCI cache * Force parser header inclusion * PR improvements * Fix Clang compiler warnings * include sys/types for uint * build deps before source * Fix memory leaks * AST interface improvements * Clean up execution plan * Fix memory leaks * PR fixes * Remove extraneous interfaces * Remove extraneous interfaces, part 2 * Post-rebase fixes * Remove repeated unlock * Various improvements * Continued cleanup * Add whitelist to block unsupported queries * Only build Cypher whitelist once * Check for empty query in PROFILE commands * Build Cypher whitelist on first query * Only emit first AST error * Fix 0-initialization in Record_Extend * Improve documentation * Various improvements * Standardize AST validation across different commands * Add whitelist for AST operators * Build modifies arrays for create, update, delete ops * Fix invalid multi-relation specification * Properly scope WITH-introduced aliases * Add TCK steps for ordered and unordered result sets * Improved validations * Format files with astyle * skip ORDER BY projection TCK test * Fixes * Full parity with TCK features * Add skips to TCK * Update astyle exclusions, ignore exclude errors * Fix filter operation placement on WITH-projected entities * Combine data-producing WITH streams using Cartesian Product instead of Apply * PR fixes * Improve formatting of header file
No description provided.