Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace flex/lemon parser with libcypher-parser #488

Merged
merged 59 commits into from Jul 29, 2019

Conversation

jeffreylovitz
Copy link
Contributor

No description provided.

@jeffreylovitz jeffreylovitz changed the title Replace yacc/lemon parser with libcypher-parser Replace flex/lemon parser with libcypher-parser May 15, 2019
@jeffreylovitz jeffreylovitz self-assigned this May 15, 2019
@jeffreylovitz jeffreylovitz added this to In progress in RedisGraph 2.0 via automation May 15, 2019
@jeffreylovitz jeffreylovitz changed the base branch from libcypher-parser-dependency to master May 19, 2019 06:38
@jeffreylovitz jeffreylovitz force-pushed the libcypher-parser-contd branch 2 times, most recently from 8a8d835 to d62858e Compare June 3, 2019 16:00
@jeffreylovitz jeffreylovitz force-pushed the libcypher-parser-contd branch 4 times, most recently from dd40d61 to 573a7f0 Compare July 8, 2019 17:28
src/graph/entities/qg_node.c Outdated Show resolved Hide resolved
src/graph/entities/qg_node.c Outdated Show resolved Hide resolved
src/graph/entities/qg_node.c Outdated Show resolved Hide resolved
n->label = orig->label;
n->labelID = orig->labelID;
n->alias = orig->alias;
n->incoming_edges = array_new(QGEdge*, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, I can only guess we don't need / shouldn't copy edges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't by design, this is only called from the QueryGraph_Clone routine. We clone all nodes without edges first, then introduce edges, which automatically updates the QGNode objects.

int offset = 0;
offset += snprintf(buff + offset, buff_len - offset, "(");
if(n->alias) offset += snprintf(buff + offset, buff_len - offset, "%s", n->alias);
if(n->label) offset += snprintf(buff + offset, buff_len - offset, ":%s", n->label);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the node ID be included in the string representation as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! This ID is scoped to the query, but could be useful in EXPLAIN/PROFILE introspection (before we'd have aliases like "anon_0", which is a bit of insight that's lost in the current representation).

src/graph/entities/qg_edge.c Outdated Show resolved Hide resolved
src/graph/entities/qg_edge.c Outdated Show resolved Hide resolved
src/graph/entities/qg_edge.c Outdated Show resolved Hide resolved
offset += snprintf(buff + offset, buff_len - offset, "%s", e->alias);
uint reltype_count = array_len(e->reltypes);
for (uint i = 0; i < reltype_count; i ++) {
offset += snprintf(buff + offset, buff_len - offset, ":%s", e->reltypes[i]); // TODO how should this print?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Neo's edge string representation

src/graph/entities/qg_edge.c Outdated Show resolved Hide resolved
src/arithmetic/algebraic_expression.c Outdated Show resolved Hide resolved
src/arithmetic/algebraic_expression.c Outdated Show resolved Hide resolved
src/arithmetic/algebraic_expression.c Outdated Show resolved Hide resolved
src/arithmetic/algebraic_expression.c Outdated Show resolved Hide resolved
to_transpose[i] = !to_transpose[i];
if (to_transpose[i]) QGEdge_Reverse(path[i]);
}
// Reverse the path array as well as the transposition array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why are we doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we'd call Edge_Reverse while iterating over a path, and at the end we'd transpose the AlgebraicExpression if the transpose count was too high.

With the switch to QG entities, this became inadequate, because AlgebraicExpression_Transpose would only fix the src node, dest node, and edge (if referred). The inconsistencies in QG entities caused some breaks in the execution plan logic immediately following AlgebraicExpression construction.

// TODO I feel like this is really over-elaborate - handle it in parser?
// TODO this can obviously be improved greatly
AR_ExpNode *ar_exp = AR_EXP_FromExpression(record_map, arg);
SIValue exp = AR_EXP_Evaluate(ar_exp, NULL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for arg to be a complex expression, e.g. N.x ? in which case we can't evaluate at this point in time

Maybe for a minus unary op we should introduce: SIValue(-1) * Exp ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You're right, we need to make an expression tree here to handle all cases. Which does feel really excessive if the value was something like -5...

SIValue exp = AR_EXP_Evaluate(ar_exp, NULL);
exp = SIValue_Subtract(SI_LongVal(0), exp);
return AR_EXP_NewConstOperandNode(exp);
} else if (operator == CYPHER_OP_UNARY_PLUS) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+5 instead of just 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, yeah. It's easy enough to handle, I just haven't added it as a matter of prioritization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added handling for this.

src/arithmetic/arithmetic_expression.c Outdated Show resolved Hide resolved
src/arithmetic/arithmetic_expression.c Outdated Show resolved Hide resolved
if (root->operand.variadic.entity_prop) rm_free(root->operand.variadic.entity_prop);
}
} else if (root->operand.type == AR_EXP_CONSTANT) {
SIValue_Free(&root->operand.constant);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good practice, but are we really expecting the SIValue to free anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it! String manipulation functions like toLower(a.name) allocate SIValues that must be freed, but off the top of my head, I can't think of a way for those SIValues to get here.

src/ast/ast.h Outdated Show resolved Hide resolved
src/ast/ast.h Outdated Show resolved Hide resolved
src/ast/ast.h Outdated
bool AST_ContainsErrors(const cypher_parse_result_t *result);

// Report encountered errors.
char* AST_ReportErrors(const cypher_parse_result_t *result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/ast/ast.h Outdated
void AST_ReferredFunctions(const cypher_astnode_t *root, TrieMap *referred_funcs);

// Returns specified clause or NULL.
const cypher_astnode_t* AST_GetClause(const AST *ast, cypher_astnode_type_t clause_type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen in the case where there are multiple clauses of the same type?
MATCH (a) WITH a MATCH (b)-[]->(a) RETURN a,b
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first clause of a matching type (in a segment, which is delimited by WITH Clauses), is returned. When we're concerned with multiple clauses, we use AST_GetTopLevelClauses or a similar call (there's some redundancy in this file).

src/ast/ast.h Outdated Show resolved Hide resolved
src/ast/ast.c Outdated
long AST_ParseIntegerNode(const cypher_astnode_t *int_node) {
assert(int_node);

const char *value_str = cypher_ast_integer_get_valuestr(int_node);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numeric values are represented by the parser internally as strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. My guess is that Leishman didn't want the parser to be responsible for numerics that are smaller or larger than the range of standard integer types, but it's definitely a nuisance for us.

src/ast/ast.c Outdated Show resolved Hide resolved
src/ast/ast.c Outdated Show resolved Hide resolved
src/ast/ast.c Outdated Show resolved Hide resolved
src/ast/ast.c Outdated
}

void AST_Free(AST *ast) {
if (ast->entity_map) TrieMap_Free(ast->entity_map, TrieMap_NOP_CB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we releasing ast->root?

Copy link
Collaborator

@swilly22 swilly22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments/Questions mostly about Execution-Plan

} ExecutionPlan;
QueryGraph *query_graph; // QueryGraph representing all graph entities in this segment. // TODO del?
FT_FilterNode *filter_tree; // FilterTree containing filters to be applied to this segment.
AR_ExpNode **projections; // Expressions to be constructed for a WITH or RETURN clause.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both projections and order_expressions as part of the ExecutionPlanSegment struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think we need either of them - I'll try to rework this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swilly22 If there's a WITH clause, the projections of the pre-WITH segment need to be accessible while building the next segment. We can remove them from the struct and instead add a to-be-populated argument to _NewExecutionPlanSegment, but I don't know if I prefer that.

);
typedef struct {
OpBase *root; // Root operation of overall ExecutionPlan.
ExecutionPlanSegment **segments; // TODO might be able to refactor to remove this element
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible let's drop segments, I like to think about an execution-plan as a tree of operations,
I believe it is preferable not to revel the fact that it's composed by segments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's possible right now; but it's only used to properly free things after execution. The actual execution is the same root traversal as we've always used.

src/execution_plan/execution_plan.h Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.h Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.h Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.c Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.c Outdated Show resolved Hide resolved
src/ast/ast_build_op_contexts.c Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.c Outdated Show resolved Hide resolved
src/execution_plan/execution_plan.c Outdated Show resolved Hide resolved
@jeffreylovitz jeffreylovitz force-pushed the libcypher-parser-contd branch 3 times, most recently from c9d8215 to 2cfc1cf Compare July 16, 2019 14:30
@@ -411,6 +562,13 @@ static AST_Validation _AST_Validate(const AST *ast, char **reason) {
return AST_VALID;
}

AST_Validation AST_WhitelistQuery(const cypher_astnode_t *root, char **reason) {
rax *whitelist = _BuildWhitelist();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're constructing whitelist for each query, as whitelist not going to change from one query to the other (same white-list) construct it once, and reuse it. (declare it in global scope as static)

@jeffreylovitz jeffreylovitz force-pushed the libcypher-parser-contd branch 3 times, most recently from 3b46a01 to 1fe8f2f Compare July 18, 2019 20:08
for(uint i = 0; i < prev_projection_count; i ++) {
op_apply->modifies = array_append(op_apply->modifies, i);
op_cp->modifies = array_append(op_cp->modifies, i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you shad some light on the modifies array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! It's unchanged from master except that it now stores Record IDs rather than aliases (and is an array instead of a Vector).

When choosing the optimal position to put Filter ops, we traverse the operation chain until every Record ID specified in a filter tree (FilterTree_CollectModified) has been seen in a modifies array (ExecutionPlan_LocateReferences).

Record IDs are scoped to an ExecutionPlanSegment, and WITH projections are the first n Record IDs in the subsequent segment. (Which is the reason for this logic.)

Since Cypher allows aliases to be reused in different WITH-separated segments, we would need a fix like this even if we were using aliases instead of IDs.

@@ -498,7 +503,7 @@ AR_ExpNode *AR_EXP_Clone(AR_ExpNode *exp) {
assert(false);
break;
}

clone->resolved_name = exp->resolved_name;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; although resolved_name looks like the result of AR_EXP_ToString, it is a const parser artifact that gets freed with the parse_result after execution.

@@ -205,7 +211,7 @@ Record CondTraverseConsume(OpBase *opBase) {
&op->edges);
}

_CondTraverse_SetEdge(op, op->r);
if(!_CondTraverse_SetEdge(op, op->r)) return NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's it possible for _CondTraverse_SetEdge to fail at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I think this is unnecessary.

uint path_len = cypher_ast_pattern_path_nelements(path);

// Check all entities on the path
for (uint i = 0; i < path_len; i ++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only care about odd indices
for (uint i = 1; i < path_len; i +=2) {
now the if(1 % 2) can be omitted.

static AST_Validation _ValidateInlinedPropertiesOnPath(const cypher_astnode_t *path, char **reason) {
uint path_len = cypher_ast_pattern_path_nelements(path);
// Check all entities on the path
for (uint i = 0; i < path_len; i ++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be better to divide this into two for loops, one which will go over nodes and another which will scan edges
this way, we won't be messing with the branch predictor and save computing of modulo

goto cleanup;
}
// Validate that inlined properties do not use parameters
res = _Validate_CREATE_Clause_Properties(create_clauses[i], reason);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we pass reason here but not for _Validate_CREATE_Clause_TypedRelations ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When _Validate_CREATE_Clause_TypedRelations fails, we always emit the error:
"Exactly one relationship type must be specified for CREATE"
Which is similar to Neo's response, as there's not much to provide the user with in the case of a query like CREATE (:a)-[]->(:b).

For property validation, we can at least differentiate between whether the user is trying to use a parameter or a non-constant value, and as such can give a slightly more specific erro.

@@ -135,18 +135,18 @@ static AST_Validation _AST_ValidateReferredFunctions(TrieMap *referred_functions
return res;
}

static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *node, char **reason) {
static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *range, char **reason) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many validations...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ick, I know. We'll be able to remove some of them over time, at least, but to some degree there's no way around it unless we want to write them into the parser.

if (reltype_id == GRAPH_UNKNOWN_RELATION) {
// No matrix to add
continue;
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this else in case reltype_id == GRAPH_UNKNOWN_RELATION we'll jump to the beginning of the loop

@swilly22 swilly22 merged commit 6ab1f5f into master Jul 29, 2019
RedisGraph 2.0 automation moved this from In progress to Done Jul 29, 2019
@K-Jo
Copy link
Collaborator

K-Jo commented Jul 29, 2019

@jeffreylovitz HERO

@jeffreylovitz jeffreylovitz deleted the libcypher-parser-contd branch August 2, 2019 13:40
pnxguide pushed a commit to CMU-SPEED/RedisGraph that referenced this pull request Mar 22, 2023
* Fix steps for ordered and unordered results

* Merge TCK updates, remove modified lines

* Single-commit parser updates

* Post-rebase fixes

* Enable additional code paths; delete unused

* WIP unit test updates

* Edge reference fixes

* Use array_del macros

* Improve QG interfaces and linking

* Contd

* Fix complex transpositions and RETURN *

* Update libcypher-parser version

* Adjust query buffer size to 1mb

* Update CircleCI cache

* Force parser header inclusion

* PR improvements

* Fix Clang compiler warnings

* include sys/types for uint

* build deps before source

* Fix memory leaks

* AST interface improvements

* Clean up execution plan

* Fix memory leaks

* PR fixes

* Remove extraneous interfaces

* Remove extraneous interfaces, part 2

* Post-rebase fixes

* Remove repeated unlock

* Various improvements

* Continued cleanup

* Add whitelist to block unsupported queries

* Only build Cypher whitelist once

* Check for empty query in PROFILE commands

* Build Cypher whitelist on first query

* Only emit first AST error

* Fix 0-initialization in Record_Extend

* Improve documentation

* Various improvements

* Standardize AST validation across different commands

* Add whitelist for AST operators

* Build modifies arrays for create, update, delete ops

* Fix invalid multi-relation specification

* Properly scope WITH-introduced aliases

* Add TCK steps for ordered and unordered result sets

* Improved validations

* Format files with astyle

* skip ORDER BY projection TCK test

* Fixes

* Full parity with TCK features

* Add skips to TCK

* Update astyle exclusions, ignore exclude errors

* Fix filter operation placement on WITH-projected entities

* Combine data-producing WITH streams using Cartesian Product instead of Apply

* PR fixes

* Improve formatting of header file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants