Replace flex/lemon parser with libcypher-parser #488

jeffreylovitz · 2019-05-15T17:50:42Z

No description provided.

src/graph/entities/qg_node.c

swilly22 · 2019-07-09T14:03:36Z

src/graph/entities/qg_node.c

+    n->label = orig->label;
+    n->labelID = orig->labelID;
+    n->alias = orig->alias;
+    n->incoming_edges = array_new(QGEdge*, 0);


I don't remember, I can only guess we don't need / shouldn't copy edges.

We don't, by design: this is only called from the QueryGraph_Clone routine. We clone all nodes without edges first, then introduce the edges, which automatically updates the QGNode objects.
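
The two-phase clone described above can be sketched roughly as follows. This is a minimal illustration, not the RedisGraph implementation: `DemoGraph`, `DemoNode`, `DemoEdge`, and every `demo_*` function are hypothetical names, and fixed-size arrays stand in for the real dynamic arrays.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ITEMS 8 /* Fixed capacity keeps the sketch short. */

typedef struct DemoNode DemoNode;

typedef struct {
    DemoNode *src;
    DemoNode *dest;
} DemoEdge;

struct DemoNode {
    int id;
    DemoEdge *incoming[MAX_ITEMS];
    int incoming_count;
    DemoEdge *outgoing[MAX_ITEMS];
    int outgoing_count;
};

typedef struct {
    DemoNode *nodes[MAX_ITEMS];
    int node_count;
    DemoEdge *edges[MAX_ITEMS];
    int edge_count;
} DemoGraph;

/* Add a node that starts with no edges attached. */
static DemoNode *demo_add_node(DemoGraph *g, int id) {
    assert(g->node_count < MAX_ITEMS);
    DemoNode *n = calloc(1, sizeof(*n));
    assert(n);
    n->id = id;
    g->nodes[g->node_count++] = n;
    return n;
}

/* Connecting an edge is the only place where node edge lists are updated. */
static DemoEdge *demo_connect(DemoGraph *g, DemoNode *src, DemoNode *dest) {
    assert(g->edge_count < MAX_ITEMS);
    assert(src->outgoing_count < MAX_ITEMS && dest->incoming_count < MAX_ITEMS);
    DemoEdge *e = calloc(1, sizeof(*e));
    assert(e);
    e->src = src;
    e->dest = dest;
    src->outgoing[src->outgoing_count++] = e;
    dest->incoming[dest->incoming_count++] = e;
    g->edges[g->edge_count++] = e;
    return e;
}

/* Position of a node within its graph, used to map originals to clones. */
static int demo_index_of(const DemoGraph *g, const DemoNode *n) {
    for (int i = 0; i < g->node_count; i++) {
        if (g->nodes[i] == n) return i;
    }
    return -1;
}

static DemoGraph *demo_clone(const DemoGraph *orig) {
    DemoGraph *clone = calloc(1, sizeof(*clone));
    assert(clone);
    /* Phase 1: clone every node without its edges. */
    for (int i = 0; i < orig->node_count; i++) {
        demo_add_node(clone, orig->nodes[i]->id);
    }
    /* Phase 2: re-create each edge between the cloned endpoints. */
    for (int i = 0; i < orig->edge_count; i++) {
        int src = demo_index_of(orig, orig->edges[i]->src);
        int dest = demo_index_of(orig, orig->edges[i]->dest);
        assert(src >= 0 && dest >= 0);
        demo_connect(clone, clone->nodes[src], clone->nodes[dest]);
    }
    return clone;
}
```

Because the node copy never touches edge lists, the clone cannot end up holding pointers into the original graph's edges.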

swilly22 · 2019-07-09T14:04:15Z

src/graph/entities/qg_node.c

+    int offset = 0;
+    offset += snprintf(buff + offset, buff_len - offset, "(");
+    if(n->alias) offset += snprintf(buff + offset, buff_len - offset, "%s", n->alias);
+    if(n->label) offset += snprintf(buff + offset, buff_len - offset, ":%s", n->label);


Should the node ID be included in the string representation as well?

Good question! This ID is scoped to the query, but it could be useful in EXPLAIN/PROFILE introspection (before, we'd have aliases like "anon_0"; that bit of insight is lost in the current representation).
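
If the ID were included, the string builder might look something like the sketch below. The `DemoNode` type, the `append_fmt` helper, and the ` id=N` output format are all assumptions made for illustration; nothing in this PR settles on a format. The helper also clamps the running offset, because `snprintf` returns the length it would have written, which can exceed the space remaining.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int id;            /* Query-scoped ID. */
    const char *alias; /* May be NULL for anonymous nodes. */
    const char *label; /* May be NULL for unlabeled nodes. */
} DemoNode;

/* Append formatted text at `offset` and return the new offset.
 * The result is clamped to buff_len so later calls become no-ops
 * once the buffer is full. */
static size_t append_fmt(char *buff, size_t buff_len, size_t offset, const char *fmt, ...) {
    if (offset >= buff_len) return buff_len;
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buff + offset, buff_len - offset, fmt, args);
    va_end(args);
    if (written < 0) return offset; /* Encoding error: keep the old offset. */
    offset += (size_t)written;
    return offset < buff_len ? offset : buff_len;
}

/* Hypothetical format: "(alias:label id=N)". */
static void demo_node_to_string(const DemoNode *n, char *buff, size_t buff_len) {
    assert(n && buff && buff_len > 0);
    size_t offset = 0;
    offset = append_fmt(buff, buff_len, offset, "(");
    if (n->alias) offset = append_fmt(buff, buff_len, offset, "%s", n->alias);
    if (n->label) offset = append_fmt(buff, buff_len, offset, ":%s", n->label);
    offset = append_fmt(buff, buff_len, offset, " id=%d", n->id);
    append_fmt(buff, buff_len, offset, ")");
}
```

With a large enough buffer, a node with ID 3, alias `a`, and label `Person` is rendered as `(a:Person id=3)`; with a 6-byte buffer the output is truncated to `(a:Pe`.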

src/graph/entities/qg_edge.c

swilly22 · 2019-07-09T14:54:39Z

src/graph/entities/qg_edge.c

+        offset += snprintf(buff + offset, buff_len - offset, "%s", e->alias);
+    uint reltype_count = array_len(e->reltypes);
+    for (uint i = 0; i < reltype_count; i ++) {
+        offset += snprintf(buff + offset, buff_len - offset, ":%s", e->reltypes[i]); // TODO how should this print?


See Neo's edge string representation

src/arithmetic/algebraic_expression.c

swilly22 · 2019-07-10T11:25:48Z

src/arithmetic/algebraic_expression.c

+        to_transpose[i] = !to_transpose[i];
+        if (to_transpose[i]) QGEdge_Reverse(path[i]);
+    }
+    // Reverse the path array as well as the transposition array


Please explain why we are doing this.

Previously, we'd call Edge_Reverse while iterating over a path, and at the end we'd transpose the AlgebraicExpression if the transpose count was too high.

With the switch to QG entities, this became inadequate, because AlgebraicExpression_Transpose would only fix the src node, dest node, and edge (if referred). The inconsistencies in QG entities caused some breaks in the execution plan logic immediately following AlgebraicExpression construction.

swilly22 · 2019-07-10T11:53:19Z

src/arithmetic/arithmetic_expression.c

+            // TODO I feel like this is really over-elaborate - handle it in parser?
+            // TODO this can obviously be improved greatly
+            AR_ExpNode *ar_exp = AR_EXP_FromExpression(record_map, arg);
+            SIValue exp = AR_EXP_Evaluate(ar_exp, NULL);


Is it possible for arg to be a complex expression, e.g. N.x? In that case we can't evaluate it at this point in time.

Maybe for a unary minus op we should introduce SIValue(-1) * Exp?

Good catch! You're right, we need to make an expression tree here to handle all cases. Which does feel really excessive if the value was something like -5...
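
One way to reconcile the two cases is sketched below: fold the negation immediately when the operand is a constant, and otherwise wrap it in a `-1 * operand` node that is evaluated later against a record. The `Exp` type and `exp_*` functions are hypothetical stand-ins, not the `AR_ExpNode` API.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EXP_CONST, EXP_VARIABLE, EXP_MUL } ExpType;

typedef struct Exp {
    ExpType type;
    long value;            /* EXP_CONST: the literal value. */
    int record_idx;        /* EXP_VARIABLE: record slot read at evaluation time. */
    struct Exp *lhs, *rhs; /* EXP_MUL: operands. */
} Exp;

static Exp *exp_new(ExpType type) {
    Exp *e = calloc(1, sizeof(*e));
    assert(e);
    e->type = type;
    return e;
}

static Exp *exp_const(long value) {
    Exp *e = exp_new(EXP_CONST);
    e->value = value;
    return e;
}

static Exp *exp_variable(int record_idx) {
    Exp *e = exp_new(EXP_VARIABLE);
    e->record_idx = record_idx;
    return e;
}

/* Unary minus: fold constants now, defer everything else to evaluation. */
static Exp *exp_negate(Exp *operand) {
    if (operand->type == EXP_CONST) {
        operand->value = -operand->value; /* -5 stays a single constant node. */
        return operand;
    }
    Exp *mul = exp_new(EXP_MUL);          /* -n.x becomes (-1 * n.x). */
    mul->lhs = exp_const(-1);
    mul->rhs = operand;
    return mul;
}

static long exp_eval(const Exp *e, const long *record) {
    switch (e->type) {
    case EXP_CONST:    return e->value;
    case EXP_VARIABLE: return record[e->record_idx];
    case EXP_MUL:      return exp_eval(e->lhs, record) * exp_eval(e->rhs, record);
    }
    return 0;
}

static void exp_free(Exp *e) {
    if (!e) return;
    exp_free(e->lhs);
    exp_free(e->rhs);
    free(e);
}
```

This keeps a literal such as `-5` cheap while still handling operands whose value is only known once a record is available.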

swilly22 · 2019-07-10T11:54:03Z

src/arithmetic/arithmetic_expression.c

+            SIValue exp = AR_EXP_Evaluate(ar_exp, NULL);
+            exp = SIValue_Subtract(SI_LongVal(0), exp);
+            return AR_EXP_NewConstOperandNode(exp);
+        } else if (operator == CYPHER_OP_UNARY_PLUS) {


+5 instead of just 5?

Exactly, yeah. It's easy enough to handle, I just haven't added it as a matter of prioritization.

Added handling for this.

swilly22 · 2019-07-10T12:00:38Z

src/arithmetic/arithmetic_expression.c

-            if (root->operand.variadic.entity_prop) rm_free(root->operand.variadic.entity_prop);
-        }
+    } else if (root->operand.type == AR_EXP_CONSTANT) {
+        SIValue_Free(&root->operand.constant);


Good practice, but are we really expecting the SIValue to free anything?

I doubt it! String manipulation functions like toLower(a.name) allocate SIValues that must be freed, but off the top of my head, I can't think of a way for those SIValues to get here.

src/ast/ast.h

swilly22 · 2019-07-11T09:20:36Z

src/ast/ast.h

+bool AST_ContainsErrors(const cypher_parse_result_t *result);
+
+// Report encountered errors.
+char* AST_ReportErrors(const cypher_parse_result_t *result);


swilly22 · 2019-07-11T09:21:40Z

src/ast/ast.h

+void AST_ReferredFunctions(const cypher_astnode_t *root, TrieMap *referred_funcs);
+
+// Returns specified clause or NULL.
+const cypher_astnode_t* AST_GetClause(const AST *ast, cypher_astnode_type_t clause_type);


What will happen in the case where there are multiple clauses of the same type? For example:
MATCH (a) WITH a MATCH (b)-[]->(a) RETURN a,b

The first clause of a matching type (within a segment, where segments are delimited by WITH clauses) is returned. When we're concerned with multiple clauses, we use AST_GetTopLevelClauses or a similar call (there's some redundancy in this file).
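
The first-match behavior can be illustrated with a small sketch. It assumes a segment can be modeled as a half-open index range over the query's clause list; the `ClauseType` enum and `first_clause_in_segment` are hypothetical and do not mirror the actual AST interface.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CLAUSE_MATCH, CLAUSE_WITH, CLAUSE_RETURN } ClauseType;

/* Return the index of the first clause of type `wanted` within the
 * segment [start, end), or -1 if the segment has no such clause. */
static int first_clause_in_segment(const ClauseType *clauses, size_t start, size_t end,
                                   ClauseType wanted) {
    assert(clauses && start <= end);
    for (size_t i = start; i < end; i++) {
        if (clauses[i] == wanted) return (int)i;
    }
    return -1;
}
```

For the query in the question, the clause list is MATCH, WITH, MATCH, RETURN. Treating the WITH clause as the end of the first segment (an assumption), a lookup for MATCH returns index 0 in the first segment and index 2 in the second, so neither MATCH clause hides the other.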

swilly22 · 2019-07-11T11:10:52Z

src/ast/ast.c

+long AST_ParseIntegerNode(const cypher_astnode_t *int_node) {
+    assert(int_node);
+
+    const char *value_str = cypher_ast_integer_get_valuestr(int_node);


Numeric values are represented by the parser internally as strings?

Yeah. My guess is that Leishman didn't want the parser to be responsible for numerics that are smaller or larger than the range of standard integer types, but it's definitely a nuisance for us.
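
Since the parser hands back the literal as a string, the caller has to convert it and decide what to do with out-of-range input. A minimal sketch using the standard `strtol` is shown below; `parse_long` is a hypothetical helper, and base 10 is an assumption (base 0 would also accept C-style hex and octal prefixes).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Convert a decimal string to long.
 * Returns false for empty input, trailing garbage, or out-of-range values. */
static bool parse_long(const char *value_str, long *out) {
    assert(value_str && out);
    char *end = NULL;
    errno = 0;
    long value = strtol(value_str, &end, 10);
    if (end == value_str || *end != '\0') return false; /* Nothing parsed, or junk after digits. */
    if (errno == ERANGE) return false;                  /* Value does not fit in a long. */
    *out = value;
    return true;
}
```

Checking `errno` is what distinguishes a genuine `LONG_MAX` from an overflowing literal, which `strtol` also reports as `LONG_MAX`.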

swilly22 · 2019-07-11T11:15:31Z

src/ast/ast.c

+}
+
+void AST_Free(AST *ast) {
+    if (ast->entity_map) TrieMap_Free(ast->entity_map, TrieMap_NOP_CB);


Where are we releasing ast->root?

swilly22

Comments/Questions mostly about Execution-Plan

swilly22 · 2019-07-15T05:45:26Z

src/execution_plan/execution_plan.h

-} ExecutionPlan;
+    QueryGraph *query_graph;         // QueryGraph representing all graph entities in this segment. // TODO del?
+    FT_FilterNode *filter_tree;      // FilterTree containing filters to be applied to this segment.
+    AR_ExpNode **projections;        // Expressions to be constructed for a WITH or RETURN clause.


Why do we need both projections and order_expressions as part of the ExecutionPlanSegment struct?

Hmm, I don't think we need either of them - I'll try to rework this.

@swilly22 If there's a WITH clause, the projections of the pre-WITH segment need to be accessible while building the next segment. We can remove them from the struct and instead add a to-be-populated argument to _NewExecutionPlanSegment, but I don't know if I prefer that.

swilly22 · 2019-07-15T05:47:59Z

src/execution_plan/execution_plan.h

-);
+typedef struct {
+    OpBase *root;                    // Root operation of overall ExecutionPlan.
+    ExecutionPlanSegment **segments; // TODO might be able to refactor to remove this element


If possible, let's drop segments. I like to think about an execution plan as a tree of operations,
and I believe it is preferable not to reveal the fact that it's composed of segments.

I don't believe it's possible right now; but it's only used to properly free things after execution. The actual execution is the same root traversal as we've always used.

swilly22 · 2019-07-18T05:43:01Z

src/ast/ast_validations.c

@@ -411,6 +562,13 @@ static AST_Validation _AST_Validate(const AST *ast, char **reason) {
    return AST_VALID;
 }

+AST_Validation AST_WhitelistQuery(const cypher_astnode_t *root, char **reason) {
+    rax *whitelist = _BuildWhitelist();


We're constructing the whitelist for each query. Since the whitelist is not going to change from one query to the next, construct it once and reuse it (declare it as a static in global scope).
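
A minimal sketch of the build-once pattern follows. It uses a plain boolean table rather than the rax the PR actually uses, and the node type numbers, `build_whitelist`, and `whitelist_allows` are made up for illustration. The lazy initialization shown here is not thread-safe on its own; it assumes the first call happens while no other thread is validating queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NODE_TYPE_COUNT 16 /* Hypothetical number of AST node types. */

static bool *whitelist = NULL; /* Built on first use, then reused by every query. */
static int build_count = 0;    /* Only here to show how often the build runs. */

static bool *build_whitelist(void) {
    bool *w = calloc(NODE_TYPE_COUNT, sizeof(*w));
    assert(w);
    /* Hypothetical supported node types. */
    w[1] = true;
    w[2] = true;
    w[5] = true;
    build_count++;
    return w;
}

/* Returns true if the given node type is supported. */
static bool whitelist_allows(int node_type) {
    if (whitelist == NULL) whitelist = build_whitelist();
    if (node_type < 0 || node_type >= NODE_TYPE_COUNT) return false;
    return whitelist[node_type];
}
```

However many queries are validated, `build_whitelist` runs once, on the first lookup.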

swilly22 · 2019-07-29T11:05:52Z

src/execution_plan/execution_plan.c

 			for(uint i = 0; i < prev_projection_count; i ++) {
-				op_apply->modifies = array_append(op_apply->modifies, i);
+				op_cp->modifies = array_append(op_cp->modifies, i);


Can you shed some light on the modifies array?

Sure! It's unchanged from master except that it now stores Record IDs rather than aliases (and is an array instead of a Vector).

When choosing the optimal position to put Filter ops, we traverse the operation chain until every Record ID specified in a filter tree (FilterTree_CollectModified) has been seen in a modifies array (ExecutionPlan_LocateReferences).

Record IDs are scoped to an ExecutionPlanSegment, and WITH projections are the first n Record IDs in the subsequent segment. (Which is the reason for this logic.)

Since Cypher allows aliases to be reused in different WITH-separated segments, we would need a fix like this even if we were using aliases instead of IDs.
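
The placement rule described above can be sketched as follows: walk the operations in the order they produce data, and stop at the first one by which every Record ID referenced by the filter has appeared in some modifies array. `DemoOp` and `locate_filter_position` are hypothetical, and the flat array stands in for the real operation tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 16 /* Upper bound on IDs referenced by one filter in this sketch. */

typedef struct {
    const char *name;
    const int *modifies;   /* Record IDs this operation introduces. */
    size_t modifies_count;
} DemoOp;

static bool contains(const int *ids, size_t count, int id) {
    for (size_t i = 0; i < count; i++) {
        if (ids[i] == id) return true;
    }
    return false;
}

/* Returns the index of the earliest op after which all referenced IDs are
 * available, or -1 if some ID is never produced by the chain. */
static int locate_filter_position(const DemoOp *ops, size_t op_count,
                                  const int *referenced, size_t ref_count) {
    assert(ref_count <= MAX_REFS);
    bool seen[MAX_REFS] = { false };
    for (size_t i = 0; i < op_count; i++) {
        bool all_seen = true;
        for (size_t j = 0; j < ref_count; j++) {
            if (contains(ops[i].modifies, ops[i].modifies_count, referenced[j])) seen[j] = true;
            if (!seen[j]) all_seen = false;
        }
        if (all_seen) return (int)i;
    }
    return -1;
}
```

A filter that references IDs 0 and 2 is placed after whichever operation is the later of the two producers, which is the earliest point at which it can be evaluated.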

swilly22 · 2019-07-29T11:10:06Z

src/arithmetic/arithmetic_expression.c

@@ -498,7 +503,7 @@ AR_ExpNode *AR_EXP_Clone(AR_ExpNode *exp) {
 		assert(false);
 		break;
 	}
-
+	clone->resolved_name = exp->resolved_name;


No need to duplicate?

No; although resolved_name looks like the result of AR_EXP_ToString, it is a const parser artifact that gets freed with the parse_result after execution.

swilly22 · 2019-07-29T11:17:06Z

src/execution_plan/ops/op_conditional_traverse.c

@@ -205,7 +211,7 @@ Record CondTraverseConsume(OpBase *opBase) {
 										  &op->edges);
 		}

-		_CondTraverse_SetEdge(op, op->r);
+		if(!_CondTraverse_SetEdge(op, op->r)) return NULL;


How's it possible for _CondTraverse_SetEdge to fail at this point?

You're right, I think this is unnecessary.

swilly22 · 2019-07-29T11:20:35Z

src/ast/ast_validations.c

+	uint path_len = cypher_ast_pattern_path_nelements(path);
+
+	// Check all entities on the path
+	for (uint i = 0; i < path_len; i ++) {


If we only care about odd indices, the loop can be written as
for (uint i = 1; i < path_len; i += 2) {
and the if(i % 2) check can be omitted.

swilly22 · 2019-07-29T11:24:03Z

src/ast/ast_validations.c

+static AST_Validation _ValidateInlinedPropertiesOnPath(const cypher_astnode_t *path, char **reason) {
+	uint path_len = cypher_ast_pattern_path_nelements(path);
+	// Check all entities on the path
+	for (uint i = 0; i < path_len; i ++) {


It might be better to divide this into two for loops: one that goes over nodes and another that scans edges.
This way we won't be messing with the branch predictor, and we save computing the modulo.
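
A sketch of the two-loop form is shown below, relying on the fact that path elements alternate node, edge, node, so nodes sit at even indices and edges at odd ones. `PathElement`, `PropCounts`, and `count_inlined_props` are hypothetical; the real code would call into the parser for each element.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool is_edge;           /* Only used to sanity-check the index parity. */
    bool has_inlined_props; /* True if the entity carries inline properties. */
} PathElement;

typedef struct {
    unsigned int nodes_with_props;
    unsigned int edges_with_props;
} PropCounts;

static PropCounts count_inlined_props(const PathElement *path, unsigned int path_len) {
    PropCounts counts = { 0, 0 };
    /* Nodes occupy the even indices. */
    for (unsigned int i = 0; i < path_len; i += 2) {
        assert(!path[i].is_edge);
        if (path[i].has_inlined_props) counts.nodes_with_props++;
    }
    /* Edges occupy the odd indices. */
    for (unsigned int i = 1; i < path_len; i += 2) {
        assert(path[i].is_edge);
        if (path[i].has_inlined_props) counts.edges_with_props++;
    }
    return counts;
}
```

Each loop body handles a single entity kind, so there is no per-iteration parity test.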

swilly22 · 2019-07-29T11:27:23Z

src/ast/ast_validations.c

+			goto cleanup;
+		}
+		// Validate that inlined properties do not use parameters
+        res = _Validate_CREATE_Clause_Properties(create_clauses[i], reason);


Why do we pass reason here but not for _Validate_CREATE_Clause_TypedRelations?

When _Validate_CREATE_Clause_TypedRelations fails, we always emit the error:
"Exactly one relationship type must be specified for CREATE"
Which is similar to Neo's response, as there's not much to provide the user with in the case of a query like CREATE (:a)-[]->(:b).

For property validation, we can at least differentiate between whether the user is trying to use a parameter or a non-constant value, and as such can give a slightly more specific error.

swilly22 · 2019-07-29T11:28:11Z

src/ast/ast_validations.c

@@ -135,18 +135,18 @@ static AST_Validation _AST_ValidateReferredFunctions(TrieMap *referred_functions
    return res;
 }

-static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *node, char **reason) {
+static AST_Validation _MATCH_Clause_Validate_Range(const cypher_astnode_t *range, char **reason) {


So many validations...

Ick, I know. We'll be able to remove some of them over time, at least, but to some degree there's no way around it unless we want to write them into the parser.

swilly22 · 2019-07-29T11:34:58Z

src/arithmetic/algebraic_expression.c

+            if (reltype_id == GRAPH_UNKNOWN_RELATION) {
+                // No matrix to add
+                continue;
+            } else {


No need for this else; in case reltype_id == GRAPH_UNKNOWN_RELATION we'll jump to the beginning of the loop.

K-Jo · 2019-07-29T23:25:27Z

@jeffreylovitz HERO

* Fix steps for ordered and unordered results
* Merge TCK updates, remove modified lines
* Single-commit parser updates
* Post-rebase fixes
* Enable additional code paths; delete unused
* WIP unit test updates
* Edge reference fixes
* Use array_del macros
* Improve QG interfaces and linking
* Contd
* Fix complex transpositions and RETURN
* Update libcypher-parser version
* Adjust query buffer size to 1mb
* Update CircleCI cache
* Force parser header inclusion
* PR improvements
* Fix Clang compiler warnings
* include sys/types for uint
* build deps before source
* Fix memory leaks
* AST interface improvements
* Clean up execution plan
* Fix memory leaks
* PR fixes
* Remove extraneous interfaces
* Remove extraneous interfaces, part 2
* Post-rebase fixes
* Remove repeated unlock
* Various improvements
* Continued cleanup
* Add whitelist to block unsupported queries
* Only build Cypher whitelist once
* Check for empty query in PROFILE commands
* Build Cypher whitelist on first query
* Only emit first AST error
* Fix 0-initialization in Record_Extend
* Improve documentation
* Various improvements
* Standardize AST validation across different commands
* Add whitelist for AST operators
* Build modifies arrays for create, update, delete ops
* Fix invalid multi-relation specification
* Properly scope WITH-introduced aliases
* Add TCK steps for ordered and unordered result sets
* Improved validations
* Format files with astyle
* skip ORDER BY projection TCK test
* Fixes
* Full parity with TCK features
* Add skips to TCK
* Update astyle exclusions, ignore exclude errors
* Fix filter operation placement on WITH-projected entities
* Combine data-producing WITH streams using Cartesian Product instead of Apply
* PR fixes
* Improve formatting of header file

jeffreylovitz changed the title ~~Replace yacc/lemon parser with libcypher-parser~~ Replace flex/lemon parser with libcypher-parser May 15, 2019

jeffreylovitz requested a review from swilly22 May 15, 2019 18:02

jeffreylovitz self-assigned this May 15, 2019

jeffreylovitz added the enhancement label May 15, 2019

jeffreylovitz added this to In progress in RedisGraph 2.0 via automation May 15, 2019

jeffreylovitz mentioned this pull request May 15, 2019

Replace current parser with libcypher parser #454

Closed

jeffreylovitz changed the base branch from libcypher-parser-dependency to master May 19, 2019 06:38

jeffreylovitz force-pushed the libcypher-parser-contd branch from 1bb7acb to cd85936 May 22, 2019 06:48

jeffreylovitz force-pushed the libcypher-parser-contd branch 2 times, most recently from 8a8d835 to d62858e June 3, 2019 16:00

jeffreylovitz force-pushed the libcypher-parser-contd branch 4 times, most recently from dd40d61 to 573a7f0 July 8, 2019 17:28

swilly22 reviewed Jul 9, 2019

swilly22 reviewed Jul 10, 2019

swilly22 requested changes Jul 11, 2019

swilly22 requested changes Jul 15, 2019

jeffreylovitz force-pushed the libcypher-parser-contd branch 3 times, most recently from c9d8215 to 2cfc1cf July 16, 2019 14:30

swilly22 requested changes Jul 18, 2019

jeffreylovitz force-pushed the libcypher-parser-contd branch 3 times, most recently from 3b46a01 to 1fe8f2f July 18, 2019 20:08

jeffreylovitz added 5 commits July 24, 2019 09:01

Fix steps for ordered and unordered results

c921e11

Merge TCK updates, remove modified lines

84e38e8

Single-commit parser updates

8f0e573

Post-rebase fixes

828bbbc

Enable additional code paths; delete unused

6c05c87

jeffreylovitz added 9 commits July 24, 2019 10:03

Add whitelist for AST operators

270cbc0

Build modifies arrays for create, update, delete ops

f6173d5

Fix invalid multi-relation specification

6789eeb

Properly scope WITH-introduced aliases

ca446e5

Add TCK steps for ordered and unordered result sets

4cba7b1

Improved validations

ffc03fb

Format files with astyle

6869046

Merge remote-tracking branch 'origin/master' into libcypher-parser-contd

3fda5ae

Merge branch 'tck-updates' into libcypher-parser-contd

528bb54

jeffreylovitz force-pushed the libcypher-parser-contd branch from fc38494 to 36e9411 July 24, 2019 14:20

jeffreylovitz added 4 commits July 25, 2019 08:52

skip ORDER BY projection TCK test

d2f4802

Fixes

9e1d9b4

Full parity with TCK features

1a01a0f

Add skips to TCK

597a178

jeffreylovitz force-pushed the libcypher-parser-contd branch from 36e9411 to 597a178 July 26, 2019 17:45

jeffreylovitz added 2 commits July 26, 2019 13:53

Merge remote-tracking branch 'origin/master' into libcypher-parser-contd

46218d5

Update astyle exclusions, ignore exclude errors

5d37581

jeffreylovitz force-pushed the libcypher-parser-contd branch from 4b6aaa8 to 5d37581 July 26, 2019 17:56

jeffreylovitz added 2 commits July 26, 2019 14:33

Fix filter operation placement on WITH-projected entities

eb3da2a

Combine data-producing WITH streams using Cartesian Product instead o…

6d90ef3

…f Apply

swilly22 reviewed Jul 29, 2019

jeffreylovitz added 3 commits July 29, 2019 09:52

PR fixes

f47e1aa

Merge remote-tracking branch 'origin/master' into libcypher-parser-contd

09a751b

Improve formatting of header file

557af51

swilly22 approved these changes Jul 29, 2019

swilly22 merged commit 6ab1f5f into master Jul 29, 2019

RedisGraph 2.0 automation moved this from In progress to Done Jul 29, 2019

jeffreylovitz deleted the libcypher-parser-contd branch August 2, 2019 13:40

jeffreylovitz commented May 15, 2019

swilly22 left a comment

K-Jo commented Jul 29, 2019