-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variable length path #127
Variable length path #127
Conversation
src/parser/grammar.y
Outdated
A = New_AST_LinkEntity(B.strval, C.strval, E, N_DIR_UNKNOWN, D); | ||
} | ||
|
||
// Edge with length [alias:*1..3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cypher doesn't support edges that have aliases but no labels in general (a change we implemented about a month ago, I think), which remains true with length specifiers:
MATCH (a)-[alias:*]->(b) RETURN b
> Invalid input '*': expected whitespace or a rel type name
We should, however, support edges with labels and length specifications, like:
MATCH (a)-[:KNOWS*..2]->(b) RETURN b
The Cypher Query Language Reference (version 9, pages 11-12) doesn't explicitly mention edges with labels, aliases, and length specifications, but they are supported in Neo.
MATCH (a)-[alias:KNOWS*]->(b) RETURN b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, although I don't see a reason why we shouldn't support this
I think its quite powerful to ask (A)-[:*1..20]->(B)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swilly22 As a minor clarification, (A)-[*1..20]->(B)
(returning all matching paths without providing an alias to refer to those paths in other parts of the query) is totally canonical syntax, and we should support it!
Cypher does not explicitly support (A)-[alias:*1..20]->(B)
, and - conjecturing - I suspect that is because it's proven computationally challenging to merge the properties of all matching paths into a single edge alias that can be filtered, returned, and so forth. I can't think of any syntactical reason for this constraint other than that it taxes the system, especially given that Cypher provides support for multiple explicit edge labels with syntax like (a)-[:KNOWS|HAS_MET)->(b)
.
I have no objection to going beyond the language specification, but providing support for edge aliases that don't correspond to specific labels will introduce a fair few cases we'll need to add tooling for. As an example, if one wants to filter on a property found along a path, we'll need to find and track the properties of all paths matching the query, as opposed to the masking approach suggested in this PR, which at each hop only considers edges that can modify the paths being built. I'm specifically referring to the _VarLenTraverseOpHop
logic that builds and employs a mask to ignore traversals that, on a boolean level, have been made redundant, but I expect that is not the only edge case we'd need to compensate for.
This is just food for thought - I'm on board with providing more functionality whenever someone might find it useful! Let me know what you think.
src/parser/grammar.y
Outdated
} | ||
|
||
// Edge with length [alias:*1..3] | ||
edge(A) ::= LEFT_BRACKET UQSTRING(B) COLON edgeLength(C) properties(D) RIGHT_BRACKET . { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edgeLength
should not be preceded by a colon.
src/parser/grammar.y
Outdated
} | ||
|
||
// Edge with length [:*1..3] | ||
edge(A) ::= LEFT_BRACKET COLON edgeLength(B) properties(C) RIGHT_BRACKET . { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edgeLength
should not be preceded by a colon.
src/parser/grammar.y
Outdated
} | ||
|
||
// *minHops..maxHops | ||
edgeLength(A) ::= MUL INTEGER(B) DOT DOT INTEGER(C). { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DOT DOT
logic isn't working for me:
127.0.0.1:6379> GRAPH.QUERY social "MATCH (a)-[:*1..2]->(b) RETURN a, b"
(error) Syntax error at offset 18 near '.2'
As a guess, it may be trying to parse a double instead of an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
introducing DOTDOT
src/parser/grammar.y
Outdated
edgeLength(A) ::= MUL INTEGER(B) DOT DOT INTEGER(C). { | ||
A = New_AST_LinkLength(B.intval, C.intval); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be legal to omit a bound, as in:
(a)-[*..3]->(b)
or
(a)-[*3..]->(b)
src/parser/ast_common.h
Outdated
@@ -35,13 +35,20 @@ typedef struct { | |||
char *property; | |||
} AST_Variable; | |||
|
|||
typedef struct { | |||
unsigned int minHops; | |||
float maxHops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing we're using a float for maxHops
to provide simpler support for *
(infinite) length paths? That makes sense, and is probably fine, but I wouldn't mind defining UINT_MAX as an infinity value either.
The smallest integer value that can't be represented as a float is 2^24 + 1, which is not a path length we reasonably need to be concerned about, so it's not a big deal - your call!
NULL, // Accumulator | ||
GxB_LOR_LAND_BOOL, // Semiring | ||
op->A, // First matrix | ||
op->A, // Second matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this does! We use the same matrix for both inputs?
GrB_free(&op->mask); | ||
GrB_Matrix_new(&op->mask, GrB_BOOL, Graph_NodeCount(op->g), Graph_NodeCount(op->g)); | ||
GrB_Matrix_dup(&op->mask, op->A); | ||
TuplesIter_clear(op->it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this accomplish? The iterator will return DEPLETED on its next invocation, but it does not actually get invoked before the TuplesIter_reuse
call on 107.
tests/unit/test_all_paths.cpp
Outdated
#endif | ||
|
||
#include "../../src/algorithms/algorithms.h" | ||
#include "../../src/util/arr.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to move the function definitions in arr.h
into a separate .c file before these tests will compile correctly for me/Circle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already solved, simple casting.
src/parser/grammar.y
Outdated
edge(A) ::= LEFT_BRACKET UQSTRING(B) COLON UQSTRING(C) edgeLength(D) properties(E) RIGHT_BRACKET . { | ||
A = New_AST_LinkEntity(B.strval, C.strval, E, N_DIR_UNKNOWN, D); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These rules look good! We still need to add one for edges that just have length specifiers, and no alias or label.
@@ -37,6 +38,13 @@ AST_LinkEntity* New_AST_LinkEntity(char *alias, char *label, Vector *properties, | |||
return le; | |||
} | |||
|
|||
AST_LinkLength* New_AST_LinkLength(unsigned int minHops, unsigned int maxHops) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good here or elsewhere to verify that minHops <= maxHops
src/graph/graph.h
Outdated
@@ -101,11 +101,19 @@ void Graph_GetEdgesConnectingNodes ( | |||
Vector *edges | |||
); | |||
|
|||
// Get both incoming and outgoing node edges. | |||
// Get node edges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment belongs above Graph_GetNodeEdges
@@ -617,9 +623,15 @@ GrB_Matrix Graph_GetLabel(const Graph *g, int label_idx) { | |||
} | |||
|
|||
GrB_Matrix Graph_GetRelation(const Graph *g, int relation_idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t relation_idx
while(array_len(op->paths) == 0) { | ||
res = child->consume(child, graph); | ||
if(res != OP_OK) return res; | ||
printf("op->maxHops: %d\n", op->maxHops); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is of course just a debug line, but the print specifier should be %u
src/algorithms/path.c
Outdated
if(pathLen == 0) return NULL; | ||
|
||
Edge *e = p[pathLen-1]; | ||
array_pop(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are equivalent to:
Edge *e = array_pop(p);
Or can even return array_pop(p)
src/algorithms/path.c
Outdated
|
||
Path Path_append(Path p, Edge *e) { | ||
p = array_append(p, e); | ||
return p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simply return array_append
src/algorithms/path.c
Outdated
|
||
Path Path_new(size_t len) { | ||
Path p = array_new(Edge*, len); | ||
return p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simply return array_new
Path_pop(*path); | ||
|
||
// Mark edge as unvisited. | ||
GrB_Matrix_setElement_BOOL(visited, false, neighborID, frontierID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably necessitates a zero vector or GxB_NONZERO
reduction later on, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, never mind - I suppose it does not so long as we never try to iterate over visited with a TuplesIter or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visited, won't be iterated, it only serves as a marker for checking if an edge been traversed.
src/graph/graph.c
Outdated
@@ -392,43 +392,49 @@ void Graph_GetEdgesConnectingNodes(const Graph *g, NodeID src, NodeID dest, int | |||
|
|||
/* Retrieves all either incoming or outgoing edges | |||
* to/from given node N, depending on given direction. */ | |||
void Graph_GetNodeEdges(const Graph *g, const Node *n, Vector *edges) { | |||
void Graph_GetNodeEdges(const Graph *g, const Node *n, Vector *edges, GRAPH_EDGE_DIR dir, int edgeType) { | |||
assert(g && n && edges); | |||
NodeID srcNodeID; | |||
NodeID destNodeID; | |||
TuplesIter *tupleIter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TuplesIter *tupleIter = NULL
will suppress a compiler warning for me.
return array_len(p); | ||
} | ||
|
||
Path Path_clone(const Path p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like we could just perform a memcpy on the array object to accomplish this in one instruction (arr.h provides an array_sizeof
macro already). That's not particularly important, though, we can just keep it in mind for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, although this involves tinkering with arr.h internals, if we'll find out that this is a bottleneck then we could revisit this.
src/algorithms/path.c
Outdated
|
||
void Path_print(Path p) { | ||
assert(p); | ||
Edge *e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge *e = NULL
will suppress a compiler warning for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! There are two spots where I get "x may be used uninitialized in this function" compiler warnings from GCC - if you could NULL-initialize those, that would resolve the problem.
After that, 👍
009af9e
to
8cbc464
Compare
* variable length must be preceded by a label * implement all paths * isolate variable length algebraic expression
No description provided.