Optimizing Cypher statements using Schema information #235
src/BuildCypherLib.cmake
Outdated
@@ -61,6 +61,8 @@ set(LGRAPH_CYPHER_SRC # find cypher/ -name "*.cpp" | sort
cypher/procedure/procedure.cpp
cypher/resultset/record.cpp
cypher/monitor/monitor_manager.cpp
cypher/execution_plan/rewrite/schema_rewrite.cpp
Overall, this work should be implemented as an optimization pass; see src/cypher/execution_plan/optimization/reduce_count.h for reference.
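To illustrate what packaging the rewrite as an optimization pass could look like, here is a minimal sketch. OptPass, PassManager, SchemaRewritePass and the simplified ExecutionPlan are hypothetical stand-ins modeled on the Gate()/Execute() signatures that appear later in this review; they are not TuGraph's actual declarations from reduce_count.h or pass_manager.h.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the plan the passes operate on.
struct ExecutionPlan {
    std::vector<std::string> applied;  // records which passes rewrote the plan
};

// Hypothetical pass interface: Gate() decides whether the pass runs,
// Execute() rewrites the plan in place and returns 0 on success.
class OptPass {
 public:
    virtual ~OptPass() = default;
    virtual std::string Name() const = 0;
    virtual bool Gate() = 0;
    virtual int Execute(ExecutionPlan *plan) = 0;
};

// Hypothetical schema-based rewrite, packaged as an independent pass.
class SchemaRewritePass : public OptPass {
 public:
    std::string Name() const override { return "SchemaRewrite"; }
    bool Gate() override { return true; }
    int Execute(ExecutionPlan *plan) override {
        plan->applied.push_back(Name());
        return 0;
    }
};

// Runs every registered pass whose Gate() returns true, in registration order,
// and stops at the first pass that reports an error.
class PassManager {
 public:
    void Add(std::unique_ptr<OptPass> pass) { passes_.push_back(std::move(pass)); }

    int ExecutePasses(ExecutionPlan *plan) {
        for (auto &pass : passes_) {
            if (!pass->Gate()) continue;
            int ret = pass->Execute(plan);
            if (ret != 0) return ret;
        }
        return 0;
    }

 private:
    std::vector<std::unique_ptr<OptPass>> passes_;
};
```

Keeping the rewrite in its own pass means the existing plan-building code does not need to know about it; the pass manager decides whether and when it runs.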
@@ -29,6 +29,9 @@
#include "optimization/pass_manager.h"
#include "procedure/procedure.h"
#include "validation/check_graph.h"
#include "rewrite/schema_rewrite.h"

#define IsSchemaRewrite true
Use the Gate() function as the on/off switch for this optimization.
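A sketch of the Gate()-based switch, assuming a runtime flag replaces the `#define IsSchemaRewrite true` macro. The class and the environment-variable name are hypothetical, chosen only to show that the optimization can be toggled without recompiling.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: the compile-time macro becomes a runtime flag that the
// pass reports through Gate().
class SchemaRewritePass {
 public:
    explicit SchemaRewritePass(bool enabled) : enabled_(enabled) {}

    // Hypothetical factory: enabled unless the given variable is set to "0".
    static SchemaRewritePass FromEnv(const char *var) {
        const char *v = std::getenv(var);
        return SchemaRewritePass(v == nullptr || std::strcmp(v, "0") != 0);
    }

    // The pass manager calls Gate() first and skips Execute() when it is false.
    bool Gate() const { return enabled_; }

 private:
    bool enabled_;
};
```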

class Edge;

class Node {
Can src/cypher/graph/node.h be reused?
@@ -40,6 +47,7 @@ class ExecutionPlan {
ResultInfo _result_info;
// query parts local member
std::vector<PatternGraph> _pattern_graphs;
lgraph::SchemaInfo *_schema_info = nullptr;
Don't make _schema_info a member of ExecutionPlan; build it on the fly during optimization.
plan = std::make_shared<ExecutionPlan>();
plan->Build(visitor.GetQuery(), visitor.CommandType());
// Obtain schema information while generating the execution plan
Don't build schemaInfo here; build it during optimization.
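We had considered this too, but TuGraph does not pass ctx when it builds the execution plan: plan->Build(visitor.GetQuery(), visitor.CommandType());
As a result, schema information is not available while the plan is being generated, which is why we previously passed it in before plan generation. To keep changes to the existing code to a minimum, perhaps we could pass ctx when building the plan, plan->Build(visitor.GetQuery(), visitor.CommandType(), ctx);, and then pass it on during optimization: pass_manager.ExecutePasses(ctx);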
SchemaInfo is now built during optimization instead. Because ctx is needed to provide the schema information, we changed ExecutionPlan's Build method to take an additional cypher::RTContext parameter.
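A minimal sketch of building the schema on demand inside the pass, assuming a context object is handed to the pass as described above. SchemaInfo, RTContext, ExecutionPlan and BuildSchemaInfo here are simplified hypothetical stand-ins, not the lgraph::SchemaInfo or cypher::RTContext declarations.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified schema: edge label -> allowed (source, destination) labels.
struct SchemaInfo {
    std::map<std::string, std::vector<std::pair<std::string, std::string>>> edge_constraints;
};

// Hypothetical stand-in for cypher::RTContext; here it simply exposes the current schema.
struct RTContext {
    SchemaInfo current_schema;
};

// Simplified plan: it has no schema member, only a counter the pass updates.
struct ExecutionPlan {
    int schema_rewrites = 0;
};

class SchemaRewritePass {
 public:
    explicit SchemaRewritePass(RTContext *ctx) : ctx_(ctx) {}

    int Execute(ExecutionPlan *plan) {
        // Built on demand and destroyed when Execute returns, so the plan
        // never stores a schema pointer.
        SchemaInfo schema = BuildSchemaInfo(*ctx_);
        if (!schema.edge_constraints.empty()) ++plan->schema_rewrites;
        return 0;
    }

 private:
    // Hypothetical: in TuGraph this would read labels and constraints through
    // the graph handle reachable from ctx.
    static SchemaInfo BuildSchemaInfo(const RTContext &ctx) { return ctx.current_schema; }

    RTContext *ctx_;
};
```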
@@ -109,6 +117,10 @@ class ExecutionPlan {

const ResultInfo &GetResultInfo() const;

void SetSchemaInfo(lgraph::SchemaInfo *schema_info) { _schema_info = schema_info; }
The optimization should be as non-intrusive as possible: rather than modifying existing code, add new code.
…ng/tugraph-db into opt_rewrite_with_schema_inference
src/cypher/graph/node.h
Outdated
@@ -57,6 +57,8 @@ class Node {

const std::string &Label() const;

void SetLabel(std::string schema_label) { label_ = schema_label; }
void SetLabel(const std::string& label)
Got it, changed.
src/cypher/graph/relationship.h
Outdated
@@ -60,6 +60,8 @@ class Relationship {

const std::set<std::string> &Types() const;

void SetTypes(std::set<std::string> types) { types_ = types; }
void SetTypes(const std::set<std::string>& types)
Got it, changed.
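For reference, the two setters with the const-reference signatures suggested above, on simplified stand-in classes. Only the members discussed here are reproduced; the real cypher::Node and cypher::Relationship have many more.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-in for cypher::Node.
class Node {
 public:
    const std::string &Label() const { return label_; }
    // Taking a const reference avoids the extra copy a by-value parameter
    // makes before the member assignment copies it again.
    void SetLabel(const std::string &label) { label_ = label; }

 private:
    std::string label_;
};

// Simplified stand-in for cypher::Relationship.
class Relationship {
 public:
    const std::set<std::string> &Types() const { return types_; }
    void SetTypes(const std::set<std::string> &types) { types_ = types; }

 private:
    std::set<std::string> types_;
};
```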
@spasserby Could you take a look and see if you have any comments?

void Graph::PrintGraph() {
    for (Node node : m_nodes) {
        std::cout << "Node id:" << node.m_id << std::endl;
Don't use std::cout; use FMA_LOG() or FMA_DBG() instead, e.g.
FMA_DBG() << "Node id:" << node.m_id;
Got it, changed.
@@ -1,4 +1,4 @@
/**
/**
Add unit tests: add all test cases to test_cypher.
The yago graph used by test_cypher has no edge_constraint, so we added a new yago graph with constraints and modified graph_factory.h, cypher_plan_validate.json, test_cypher_plan.cpp, and test_cypher.cpp.
@@ -0,0 +1,24 @@
#pragma once
Can src/cypher/graph/node.h be reused?
This part may require substantial changes. Could we leave it as-is for now?
@@ -31,6 +31,13 @@ class StateMachine;

namespace cypher {

// key: node id in the pattern graph; value: label
typedef std::map<NodeID, std::string> SchemaNodeMap;
SchemaNodeMap, SchemaRelpMap, and SchemaGraphMap can all be moved to schema_rewrite.h.
Got it, changed.
#include "node.h"
#include <vector>
#include "parser/data_typedef.h"
namespace rewrite_cypher {
Could namespace rewrite_cypher be changed to cypher::rewrite?
Got it, changed.
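A sketch of how the aliases might sit in schema_rewrite.h under the suggested cypher::rewrite namespace (a C++17 nested namespace definition). The NodeID alias is an assumption standing in for TuGraph's own type; SchemaRelpMap and SchemaGraphMap are left out because their definitions do not appear in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

namespace cypher::rewrite {

// Assumption: stand-in for the pattern-graph node id type used by TuGraph.
using NodeID = std::int64_t;

// Key: node id in the pattern graph; value: the label assigned to that node.
using SchemaNodeMap = std::map<NodeID, std::string>;

// SchemaRelpMap and SchemaGraphMap would be declared here as well.

}  // namespace cypher::rewrite
```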

bool Gate() override { return true; }

int Execute(ExecutionPlan *plan) override {
Fetching the schema in the Execute phase raises a transaction issue; I need to think further about how to change this.
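@seijiang @wangtao9
Here we must guarantee that the schema does not change during execution.
Consider:
Schema consistency relies on std::make_unique<lgraph::AccessControlledDB>(ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_)).
AccessControlledDB holds LightningGraph's HoldReadLock(meta_lock) for its entire lifetime, while any schema modification takes HoldWriteLock(meta_lock); therefore the schema is safe for as long as the AccessControlledDB is alive.
So:
OpenGraph is currently called in both the optimization Execute phase and the execution_plan Execute phase.
We should keep the AccessControlledDB alive throughout instead of opening it again.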
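- 1. Do not reset the AccessControlledDB in opt_rewrite_with_schema_inference.h.
- 2. Change the Check in runtime_context.h so that it no longer rejects a non-null ac_db_ (not sure whether this causes problems).
- 3. In the execution_plan Execute phase, check whether ctx->ac_db_ is null: create it if it is, and do not create it again otherwise.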
@@ -102,6 +102,8 @@ class AllNodeScan : public OpBase {

Node *GetNode() const { return node_; }

const SymbolTable *GetSymbolTable() { return sym_tab_; }
Suggested change:
- const SymbolTable *GetSymbolTable() { return sym_tab_; }
+ const SymbolTable *SymTab() const { return sym_tab_; }
@@ -110,6 +110,8 @@ class AllNodeScanDynamic : public OpBase {

Node *GetNode() const { return node_; }

const SymbolTable *GetSymbolTable() const { return sym_tab_; }
Suggested change:
- const SymbolTable *GetSymbolTable() const { return sym_tab_; }
+ const SymbolTable *SymTab() const { return sym_tab_; }
@@ -1323,8 +1324,10 @@ int ExecutionPlan::Execute(RTContext *ctx) {
if (ctx->graph_.empty()) {
    ctx->ac_db_.reset(nullptr);
} else {
    ctx->ac_db_ = std::make_unique<lgraph::AccessControlledDB>(
        ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_));
    if (!ctx->ac_db_) {
Add a comment explaining under what circumstances, and where, ac_db_ has already been set.
msg = "Access controlled db not empty";
return false;
}
// if (ac_db_) {
Delete the dead code and add a note explaining why ac_db is not checked.
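One way the requested comment and the reuse logic could read, as a hedged sketch with hypothetical stand-ins: Db for lgraph::AccessControlledDB, Context for cypher::RTContext, and two free functions for the optimization and execution phases.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for lgraph::AccessControlledDB; counts how often a
// graph handle is opened so that the reuse is observable.
struct Db {
    inline static int opened = 0;
    Db() { ++opened; }
};

// Hypothetical stand-in for cypher::RTContext.
struct Context {
    std::string graph;
    std::unique_ptr<Db> ac_db;
};

// Optimization phase (cf. opt_rewrite_with_schema_inference.h): opens the
// graph to read the schema and deliberately keeps ac_db in the context, so
// the schema read lock stays held until execution.
void OptimizeWithSchema(Context *ctx) {
    if (!ctx->graph.empty() && !ctx->ac_db) ctx->ac_db = std::make_unique<Db>();
}

// Execution phase. ac_db may already have been set by OptimizeWithSchema();
// in that case it is reused rather than reopened, so the schema the optimizer
// saw cannot change before the plan runs. This is also why the runtime
// context check no longer requires ac_db to be empty.
void ExecutePlan(Context *ctx) {
    if (ctx->graph.empty()) {
        ctx->ac_db.reset();
    } else if (!ctx->ac_db) {
        ctx->ac_db = std::make_unique<Db>();
    }
}
```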
test/graph_factory.h
Outdated
CreateCsvFiles(data);
}

static void WriteYagoFilesWithConstraints() {
The constraints can be added directly in WriteYagoFiles.
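This is mainly because test_cypher.cpp contains many Cypher statements that do not follow the schema constraints of the yago graph, for example "MATCH (a:Film),(b:City) CREATE (a)-[:BORN_IN]->(b)". To avoid breaking the existing tests, I had to add a separate graph with constraints. (I originally wanted to add the edge constraints in test_cypher.cpp with Cypher statements, but TuGraph prohibits that operation because it could cause a large amount of data to be deleted.)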
Move static const std::map<std::string, std::string> yago_data out of WriteYagoFiles, then let yago_data_with_constraints = yago_data,
and finally modify yago_data_with_constraints["yago.conf"].
OK, I've changed it.
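A sketch of the refactor suggested above, with placeholder file contents rather than the real yago data; MakeYagoDataWithConstraints is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <string>

// The base data lives outside WriteYagoFiles so both variants can share it.
// Contents are placeholders, not the real yago files.
static const std::map<std::string, std::string> yago_data = {
    {"person.csv", "name,birthyear\n"},
    {"yago.conf", R"({"schema": []})"},
};

// Copies the base data and overrides only "yago.conf"; at() throws if the
// key is missing, which catches typos in the file name.
static std::map<std::string, std::string> MakeYagoDataWithConstraints(
    const std::string &conf_with_constraints) {
    auto yago_data_with_constraints = yago_data;  // copy; yago_data is untouched
    yago_data_with_constraints.at("yago.conf") = conf_with_constraints;
    return yago_data_with_constraints;
}
```

Because only the configuration differs, the CSV payloads are defined once, so a change to the test data does not have to be made in two places.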
@@ -362,4 +591,22 @@ Liam Neeson,Batman Begins,Henri Ducard
import_v3::Importer importer(config);
importer.DoImportOffline();
}

// add edge constraints for yago
static void create_yago_with_constraints(const std::string& dir = "./lgraph_db") {
I suggest deleting this.
test/graph_factory.h
Outdated
yago_conf.insert(n + original_strings[i].length(), constraints_strings[i]);
}
auto yago_data_with_constraints = yago_data;
yago_data_with_constraints.at("yago.conf") = yago_conf;
You can simply assign the new value directly here:
yago_data_with_constraints.at("yago.conf") =
R"(
{
"schema": [
{
"label" : "Person",
"type" : "VERTEX",
"primary" : "name",
"properties" : [
{"name" : "name", "type":"STRING"},
{"name" : "birthyear", "type":"INT16", "optional":true}
]
},
{
"label" : "City",
"type" : "VERTEX",
"primary" : "name",
"properties" : [
{"name": "name", "type":"STRING"}
]
},
{
"label" : "Film",
"primary": "title",
"type" : "VERTEX",
"properties" : [
{"name": "title", "type":"STRING"}
]
},
{
"label" : "HAS_CHILD",
"type" : "EDGE",
"constraints": [["Person", "Person"]]
},
{
"label" : "MARRIED",
"type" : "EDGE",
"constraints": [["Person", "Person"]]
},
{
"label" : "BORN_IN",
"type" : "EDGE",
"properties" : [
{"name" : "weight", "type":"FLOAT", "optional":true}
],
"constraints": [["Person", "City"]]
},
{
"label" : "DIRECTED",
"type" : "EDGE",
"constraints": [["Person", "Film"]]
},
{
"label" : "WROTE_MUSIC_FOR",
"type" : "EDGE",
"constraints": [["Person", "Film"]]
},
{
"label" : "ACTED_IN",
"type" : "EDGE",
"properties" : [
{"name" : "charactername", "type":"STRING"}
],
"constraints": [["Person", "Film"]]
}
],
"files" : [
{
"path" : "person.csv",
"format" : "CSV",
"label" : "Person",
"columns" : ["name","birthyear"]
},
{
"path" : "city.csv",
"format" : "CSV",
"label" : "City",
"columns" : ["name"]
},
{
"path" : "film.csv",
"format" : "CSV",
"label" : "Film",
"columns" : ["title"]
},
{
"path" : "has_child.csv",
"format" : "CSV",
"label" : "HAS_CHILD",
"SRC_ID" : "Person",
"DST_ID" : "Person",
"columns" : ["SRC_ID","DST_ID"]
},
{
"path" : "married.csv",
"format" : "CSV",
"label" : "MARRIED",
"SRC_ID" : "Person",
"DST_ID" : "Person",
"columns" : ["SRC_ID","DST_ID"]
},
{
"path" : "born_in.csv",
"format" : "CSV",
"label" : "BORN_IN",
"SRC_ID" : "Person",
"DST_ID" : "City",
"columns" : ["SRC_ID","DST_ID","weight"]
},
{
"path" : "directed.csv",
"format" : "CSV",
"label" : "DIRECTED",
"SRC_ID" : "Person",
"DST_ID" : "Film",
"columns" : ["SRC_ID","DST_ID"]
},
{
"path" : "wrote.csv",
"format" : "CSV",
"label" : "WROTE_MUSIC_FOR",
"SRC_ID" : "Person",
"DST_ID" : "Film",
"columns" : ["SRC_ID","DST_ID"]
},
{
"path" : "acted_in.csv",
"format" : "CSV",
"label" : "ACTED_IN",
"SRC_ID" : "Person",
"DST_ID" : "Film",
"columns" : ["SRC_ID","DST_ID","charactername"]
}
]
}
)";
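I originally wanted to assign the new value directly, but it seemed too long, and any change to yago_data would then have to be made in both places, so I rewrote it. I'll change it back.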
Done.
@spasserby Any more comments?
Thank you for your contribution! @seijiang @spasserby
Contribution
General Introduction
Schema-Guided optimization in TuGraph
m..n',
*..n, *m.. => path pattern with fixed-length edges
Schema-Guided optimization for other databases
optimize
at the beginning):
Specific code introduction
Next, we will only introduce the portion implemented in this pull request, i.e., 'element variables with no labels => element variables with labels' in Schema-Guided optimization in TuGraph.
We divided our code into two parts:
- Optimizations added to the original TuGraph
- Finding all feasible graphs using Schema information
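To make the second part concrete, here is a simplified sketch of enumerating feasible label assignments for unlabeled nodes from edge constraints, using the yago constraints above (BORN_IN: Person to City, ACTED_IN: Person to Film). It is a backtracking illustration under stated assumptions (every pattern edge has exactly one known type, an empty constraint list means "unconstrained", nodes not touched by any edge stay unlabeled), not the implementation in this pull request; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using NodeID = int;                            // hypothetical pattern-graph node id
using Labels = std::map<NodeID, std::string>;  // node id -> label
// edge type -> allowed (source label, destination label) pairs
using Constraints =
    std::map<std::string, std::vector<std::pair<std::string, std::string>>>;

struct PatternEdge {
    NodeID src;
    NodeID dst;
    std::string type;  // assumed known; untyped edges are not handled here
};

// Depth-first search over the pattern edges. For edge i, try every allowed
// (src, dst) label pair; a pair is kept only if it agrees with labels already
// fixed by the query or chosen for earlier edges.
static void Search(const std::vector<PatternEdge> &edges, std::size_t i,
                   const Constraints &cons, const Labels &cur,
                   std::vector<Labels> &out) {
    if (i == edges.size()) {
        out.push_back(cur);
        return;
    }
    const PatternEdge &e = edges[i];
    auto it = cons.find(e.type);
    if (it == cons.end()) return;  // unknown edge type: nothing can match
    if (it->second.empty()) {      // assumed: no constraints, endpoints stay free
        Search(edges, i + 1, cons, cur, out);
        return;
    }
    for (const auto &[src_label, dst_label] : it->second) {
        Labels next = cur;
        // Binds node n to label l; fails if n already carries another label.
        auto bind = [&next](NodeID n, const std::string &l) {
            auto [pos, inserted] = next.emplace(n, l);
            return inserted || pos->second == l;
        };
        if (bind(e.src, src_label) && bind(e.dst, dst_label)) {
            Search(edges, i + 1, cons, next, out);
        }
    }
}

// Returns every labeling that satisfies all edge constraints; `fixed` holds
// labels written explicitly in the query.
std::vector<Labels> FeasibleLabelings(const std::vector<PatternEdge> &edges,
                                      const Constraints &cons,
                                      const Labels &fixed) {
    std::vector<Labels> out;
    Search(edges, 0, cons, fixed, out);
    return out;
}
```

With these constraints, `(a)-[:BORN_IN]->(b)` yields exactly one labeling (a:Person, b:City), so both variables can be rewritten with labels, while `(a:Film)-[:BORN_IN]->(b)` yields none.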
Correctness verification
Performance comparison