Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable length path #127

Merged
merged 4 commits into from
Sep 20, 2018
Merged

Variable length path #127

merged 4 commits into from
Sep 20, 2018

Conversation

swilly22
Copy link
Contributor

No description provided.

@swilly22 swilly22 added this to the GA milestone Sep 13, 2018
@swilly22 swilly22 self-assigned this Sep 13, 2018
A = New_AST_LinkEntity(B.strval, C.strval, E, N_DIR_UNKNOWN, D);
}

// Edge with length [alias:*1..3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cypher doesn't support edges that have aliases but no labels in general (a change we implemented about a month ago, I think), which remains true with length specifiers:

MATCH (a)-[alias:*]->(b) RETURN b
> Invalid input '*': expected whitespace or a rel type name

We should, however, support edges with labels and length specifications, like:
MATCH (a)-[:KNOWS*..2]->(b) RETURN b

The Cypher Query Language Reference (version 9, pages 11-12) doesn't explicitly mention edges with labels, aliases, and length specifications, but they are supported in Neo.
MATCH (a)-[alias:KNOWS*]->(b) RETURN b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, although I don't see a reason why we shouldn't support this
I think its quite powerful to ask (A)-[:*1..20]->(B)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swilly22 As a minor clarification, (A)-[*1..20]->(B) (returning all matching paths without providing an alias to refer to those paths in other parts of the query) is totally canonical syntax, and we should support it!

Cypher does not explicitly support (A)-[alias:*1..20]->(B), and - conjecturing - I suspect that is because it's proven computationally challenging to merge the properties of all matching paths into a single edge alias that can be filtered, returned, and so forth. I can't think of any syntactical reason for this constraint other than that it taxes the system, especially given that Cypher provides support for multiple explicit edge labels with syntax like (a)-[:KNOWS|HAS_MET)->(b).

I have no objection to going beyond the language specification, but providing support for edge aliases that don't correspond to specific labels will introduce a fair few cases we'll need to add tooling for. As an example, if one wants to filter on a property found along a path, we'll need to find and track the properties of all paths matching the query, as opposed to the masking approach suggested in this PR, which at each hop only considers edges that can modify the paths being built. I'm specifically referring to the _VarLenTraverseOpHop logic that builds and employs a mask to ignore traversals that, on a boolean level, have been made redundant, but I expect that is not the only edge case we'd need to compensate for.

This is just food for thought - I'm on board with providing more functionality whenever someone might find it useful! Let me know what you think.

}

// Edge with length [alias:*1..3]
edge(A) ::= LEFT_BRACKET UQSTRING(B) COLON edgeLength(C) properties(D) RIGHT_BRACKET . {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edgeLength should not be preceded by a colon.

}

// Edge with length [:*1..3]
edge(A) ::= LEFT_BRACKET COLON edgeLength(B) properties(C) RIGHT_BRACKET . {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edgeLength should not be preceded by a colon.

}

// *minHops..maxHops
edgeLength(A) ::= MUL INTEGER(B) DOT DOT INTEGER(C). {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DOT DOT logic isn't working for me:

127.0.0.1:6379> GRAPH.QUERY social "MATCH (a)-[:*1..2]->(b) RETURN a, b"
(error) Syntax error at offset 18 near '.2'

As a guess, it may be trying to parse a double instead of an int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing DOTDOT

edgeLength(A) ::= MUL INTEGER(B) DOT DOT INTEGER(C). {
A = New_AST_LinkLength(B.intval, C.intval);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also be legal to omit a bound, as in:
(a)-[*..3]->(b)
or
(a)-[*3..]->(b)

@@ -35,13 +35,20 @@ typedef struct {
char *property;
} AST_Variable;

typedef struct {
unsigned int minHops;
float maxHops;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we're using a float for maxHops to provide simpler support for * (infinite) length paths? That makes sense, and is probably fine, but I wouldn't mind defining UINT_MAX as an infinity value either.

The smallest integer value that can't be represented as a float is 2^24 + 1, which is not a path length we reasonably need to be concerned about, so it's not a big deal - your call!

NULL, // Accumulator
GxB_LOR_LAND_BOOL, // Semiring
op->A, // First matrix
op->A, // Second matrix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this does! We use the same matrix for both inputs?

GrB_free(&op->mask);
GrB_Matrix_new(&op->mask, GrB_BOOL, Graph_NodeCount(op->g), Graph_NodeCount(op->g));
GrB_Matrix_dup(&op->mask, op->A);
TuplesIter_clear(op->it);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this accomplish? The iterator will return DEPLETED on its next invocation, but it does not actually get invoked before the TuplesIter_reuse call on 107.

#endif

#include "../../src/algorithms/algorithms.h"
#include "../../src/util/arr.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to move the function definitions in arr.h into a separate .c file before these tests will compile correctly for me/Circle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already solved, simple casting.

edge(A) ::= LEFT_BRACKET UQSTRING(B) COLON UQSTRING(C) edgeLength(D) properties(E) RIGHT_BRACKET . {
A = New_AST_LinkEntity(B.strval, C.strval, E, N_DIR_UNKNOWN, D);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rules look good! We still need to add one for edges that just have length specifiers, and no alias or label.

@@ -37,6 +38,13 @@ AST_LinkEntity* New_AST_LinkEntity(char *alias, char *label, Vector *properties,
return le;
}

AST_LinkLength* New_AST_LinkLength(unsigned int minHops, unsigned int maxHops) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good here or elsewhere to verify that minHops <= maxHops

@@ -101,11 +101,19 @@ void Graph_GetEdgesConnectingNodes (
Vector *edges
);

// Get both incoming and outgoing node edges.
// Get node edges.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment belongs above Graph_GetNodeEdges

@@ -617,9 +623,15 @@ GrB_Matrix Graph_GetLabel(const Graph *g, int label_idx) {
}

GrB_Matrix Graph_GetRelation(const Graph *g, int relation_idx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t relation_idx

while(array_len(op->paths) == 0) {
res = child->consume(child, graph);
if(res != OP_OK) return res;
printf("op->maxHops: %d\n", op->maxHops);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course just a debug line, but the print specifier should be %u

if(pathLen == 0) return NULL;

Edge *e = p[pathLen-1];
array_pop(p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are equivalent to:
Edge *e = array_pop(p);
Or can even return array_pop(p)


Path Path_append(Path p, Edge *e) {
p = array_append(p, e);
return p;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simply return array_append


Path Path_new(size_t len) {
Path p = array_new(Edge*, len);
return p;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simply return array_new

Path_pop(*path);

// Mark edge as unvisited.
GrB_Matrix_setElement_BOOL(visited, false, neighborID, frontierID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably necessitates a zero vector or GxB_NONZERO reduction later on, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, never mind - I suppose it does not so long as we never try to iterate over visited with a TuplesIter or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visited, won't be iterated, it only serves as a marker for checking if an edge been traversed.

@swilly22 swilly22 changed the title WIP Variable length path Sep 20, 2018
@@ -392,43 +392,49 @@ void Graph_GetEdgesConnectingNodes(const Graph *g, NodeID src, NodeID dest, int

/* Retrieves all either incoming or outgoing edges
* to/from given node N, depending on given direction. */
void Graph_GetNodeEdges(const Graph *g, const Node *n, Vector *edges) {
void Graph_GetNodeEdges(const Graph *g, const Node *n, Vector *edges, GRAPH_EDGE_DIR dir, int edgeType) {
assert(g && n && edges);
NodeID srcNodeID;
NodeID destNodeID;
TuplesIter *tupleIter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TuplesIter *tupleIter = NULL will suppress a compiler warning for me.

return array_len(p);
}

Path Path_clone(const Path p) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like we could just perform a memcpy on the array object to accomplish this in one instruction (arr.h provides an array_sizeof macro already). That's not particularly important, though, we can just keep it in mind for the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, although this involves tinkering with arr.h internals, if we'll find out that this is a bottleneck then we could revisit this.


void Path_print(Path p) {
assert(p);
Edge *e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge *e = NULL will suppress a compiler warning for me.

jeffreylovitz
jeffreylovitz previously approved these changes Sep 20, 2018
Copy link
Contributor

@jeffreylovitz jeffreylovitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! There are two spots where I get "x may be used uninitialized in this function" compiler warnings from GCC - if you could NULL-initialize those, that would resolve the problem.

After that, 👍

@swilly22 swilly22 merged commit 0654705 into master Sep 20, 2018
@swilly22 swilly22 deleted the var_length_expand branch September 20, 2018 14:31
pnxguide pushed a commit to CMU-SPEED/RedisGraph that referenced this pull request Mar 22, 2023
* variable length must be preceded by a label

* implement all paths

* isolate variable length algebraic expression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants