Optimizing Cypher statements using Schema information #235

seijiang · 2023-06-21T09:44:44Z

Contribution

Contributions come from the s4plus-GraphDB group at USTC and are described as follows:
- Contributors: Chaijun Xu, Yunlong Liang, Yu Zhang, Hairong Hu
- Instructor: Prof. Zhang Yu

General Introduction

Schema-Guided optimization in TuGraph

Schema information is used to give Cypher execution plan generation additional information with which to optimize the execution plan. With the help of Schema information, Cypher query patterns with the following characteristics can currently be specialized:
1. element variables without labels => element variables with labels
2. path patterns with quantifiers such as *m..n, *..n, *m.. => path patterns with fixed-length edges
However, item 2 has not yet been submitted as a pull request (it may be submitted later); it is implemented in our own recent GitHub commits (the core code is in src/cypher/execution_plan/optimization/rewrite/ and src/cypher/execution_plan/ops/op_var_len_expand.h). The current pull request only optimizes the case where there is exactly one feasible graph. For example, consider the following Cypher:
```
match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p);
```
Running under TuGraph's MovieDemo, it can be optimized as follows:
```
match p=(n0:user)-[e0:is_friend]->(n1:user)-[e1:rate]->(n2:movie)-[e2:has_keyword]->(m:keyword) return COUNT(p);
```

But for the following Cypher:

```
match p=(n0)-[e0]->(n1:movie) return COUNT(p);
```

there are two feasible cases:

```
match p=(n0:person)-[e0:acted_in|directed|produce|write]->(n1:movie) return COUNT(p);
match p=(n0:user)-[e0:rate]->(n1:movie) return COUNT(p);
```

This situation is not optimized in this commit for the time being.
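To make the distinction between one and several feasible graphs concrete, here is a minimal C++ sketch (not the code in this pull request) that enumerates node labellings of a directed chain pattern against schema edge constraints and collects the feasible edge types for each hop. The names EdgeConstraint, Labeling, Connectable and EnumerateChainLabelings are hypothetical, and the toy schema used with it is an assumption containing only the labels that appear in the queries above, not MovieDemo's full schema.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical schema edge constraint: an edge of type `type` may go from `src` to `dst`.
struct EdgeConstraint {
    std::string src, type, dst;
};

// One feasible labelling of a chain pattern (n0)-[e0]->(n1)-...->(nk).
struct Labeling {
    std::vector<std::string> node_labels;           // labels of n0..nk
    std::vector<std::set<std::string>> edge_types;  // feasible types of e0..e(k-1)
};

// True if the schema allows at least one edge type from `src` to `dst`.
static bool Connectable(const std::vector<EdgeConstraint>& schema,
                        const std::string& src, const std::string& dst) {
    for (const auto& c : schema)
        if (c.src == src && c.dst == dst) return true;
    return false;
}

// Enumerate all node labellings of a directed chain. `fixed[i]` is the label written
// in the query for node i, or "" if the node is unlabelled.
std::vector<Labeling> EnumerateChainLabelings(const std::vector<EdgeConstraint>& schema,
                                              const std::set<std::string>& vertex_labels,
                                              const std::vector<std::string>& fixed) {
    std::vector<Labeling> results;
    std::vector<std::string> current;
    auto dfs = [&](auto&& self, std::size_t i) -> void {
        if (i == fixed.size()) {  // every node labelled: collect edge types per hop
            Labeling l;
            l.node_labels = current;
            for (std::size_t k = 0; k + 1 < current.size(); ++k) {
                std::set<std::string> types;
                for (const auto& c : schema)
                    if (c.src == current[k] && c.dst == current[k + 1]) types.insert(c.type);
                l.edge_types.push_back(types);
            }
            results.push_back(l);
            return;
        }
        for (const auto& label : vertex_labels) {
            if (!fixed[i].empty() && fixed[i] != label) continue;  // respect the query's label
            if (i > 0 && !Connectable(schema, current[i - 1], label)) continue;  // prune
            current.push_back(label);
            self(self, i + 1);
            current.pop_back();
        }
    };
    dfs(dfs, 0);
    return results;
}
```

With the toy schema user-is_friend->user, user-rate->movie, person-{acted_in, directed, produce, write}->movie and movie-has_keyword->keyword, fixing only the last label to keyword in a four-node chain yields a single labelling (user, user, movie, keyword), while (n0)-[e0]->(n1:movie) yields two labellings, matching the person and user cases listed above.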

Schema-Guided optimization for other databases

The optimizations above are implemented at the execution plan level inside TuGraph. To test them on other graph databases, however, we need an end-to-end optimization whose input is a Cypher query and whose output is an optimized Cypher query. This has not been submitted as a pull request either, but it is implemented in our own recent GitHub commits (the core code is in src/cypher/parser/parse_tree_to_cypher_visitor.h). For example, given the following input Cypher (note the optimize keyword we add at the beginning):
```
optimize match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p);
```
the output (running under TuGraph's MovieDemo) is:
```
match p=(n0:user)-[e0:is_friend]->(n1:user)-[e1:rate]->(n2:movie)-[e2:has_keyword]->(m:keyword) return COUNT(p);
```
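The last step of such an end-to-end rewrite is printing the labelled pattern back as Cypher text. The sketch below illustrates that step for a single path only; it is not the code in parse_tree_to_cypher_visitor.h, and PatternNode, PatternEdge and ToCypherPath are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct PatternNode {
    std::string var;    // variable name, e.g. "n0"
    std::string label;  // "" if unlabelled
};

struct PatternEdge {
    std::string var;              // variable name, e.g. "e0"
    std::set<std::string> types;  // empty if untyped; printed as a|b|c
    bool forward = true;          // true: -[...]->, false: <-[...]-
};

// Render a path such as (n0:user)-[e0:is_friend]->(n1:user).
// Assumes nodes.size() == edges.size() + 1.
std::string ToCypherPath(const std::vector<PatternNode>& nodes,
                         const std::vector<PatternEdge>& edges) {
    auto node_str = [](const PatternNode& n) {
        return "(" + n.var + (n.label.empty() ? "" : ":" + n.label) + ")";
    };
    std::string out = node_str(nodes[0]);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        std::string body = edges[i].var;
        bool first = true;
        for (const auto& t : edges[i].types) {  // std::set keeps types sorted
            body += (first ? ":" : "|") + t;
            first = false;
        }
        out += edges[i].forward ? "-[" + body + "]->" : "<-[" + body + "]-";
        out += node_str(nodes[i + 1]);
    }
    return out;
}
```

Prefixing the result with "match p=" and appending the RETURN clause gives a query of the same shape as the output above.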

Specific code introduction

Next, we only introduce the part implemented in this pull request, i.e., 'element variables without labels => element variables with labels' in Schema-Guided optimization in TuGraph.
Our code is divided into two parts:

1. New optimizations added to the original TuGraph to change the execution plan
2. Finding all feasible graphs using Schema information

Optimizations added to the original TuGraph

To guide execution plan generation, we changed the Build method in execution_plan.cpp and execution_plan.h to take an additional cypher::RTContext parameter, and we pass this parameter when Build is called in scheduler.cpp.
We then added a new optimization file, opt_rewrite_with_schema_inference.h, in /tugraph-db/src/cypher/execution_plan/optimization. Its overall logic is to find all connected components of the pattern graph and optimize each connected component.
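Splitting a pattern graph into connected components can be done with a small union-find over pattern node ids. The following is a sketch under the assumption that node ids are plain integers 0..n-1 and edges are (src, dst) pairs; DisjointSet and ConnectedComponents are hypothetical names, not the code in opt_rewrite_with_schema_inference.h.

```cpp
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Union-find with path halving.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int Find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void Unite(int a, int b) { parent[Find(a)] = Find(b); }
};

// Group pattern nodes 0..num_nodes-1 into connected components, treating every
// pattern edge as undirected. Members of each component are in ascending order.
std::vector<std::vector<int>> ConnectedComponents(int num_nodes,
                                                  const std::vector<std::pair<int, int>>& edges) {
    DisjointSet ds(num_nodes);
    for (const auto& e : edges) ds.Unite(e.first, e.second);
    std::map<int, std::vector<int>> groups;  // component root -> members
    for (int v = 0; v < num_nodes; ++v) groups[ds.Find(v)].push_back(v);
    std::vector<std::vector<int>> result;
    for (auto& g : groups) result.push_back(std::move(g.second));
    return result;
}
```

Each component can then be optimized independently, since no pattern edge links two different components.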
- First, obtain lgraph::SchemaInfo from cypher::RTContext as follows:
```
const lgraph::SchemaInfo *schema_info;
if (_ctx->graph_.empty()) {
    // No graph is selected, so there is no schema to consult.
    _ctx->ac_db_.reset(nullptr);
    schema_info = nullptr;
} else {
    // Open the current graph with the user's access rights.
    _ctx->ac_db_ = std::make_unique<lgraph::AccessControlledDB>(
        _ctx->galaxy_->OpenGraph(_ctx->user_, _ctx->graph_));
    // Read the schema through a read transaction.
    lgraph_api::GraphDB db(_ctx->ac_db_.get(), true);
    _ctx->txn_ = std::make_unique<lgraph_api::Transaction>(db.CreateReadTxn());
    schema_info = &_ctx->txn_->GetTxn()->GetSchemaInfo();
}
// Release the read transaction after the schema pointer has been taken.
_ctx->txn_.reset(nullptr);
```
- An example of the execution plan tree built for a Match clause, taking "match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword)" as the example:
```
Expand(All) [n2 --> m ] 
    Expand(All) [n1 --> n2 ] 
        Expand(All) [n0 --> n1 ] 
            All Node Scan [n0]
```
- Such a tree always has a node scan (or a lookup on a node property index) as its leaf, with a chain of Expand operators above it. We currently do not handle variable-length operators (VarLenExpand), and a pattern consisting of a single node needs no optimization. The _RewriteWithSchemaInference function therefore traverses the whole execution plan tree depth-first; when it reaches an ExpandAll operator, it calls the _ExtractStreamAndAddLabels function on the subtree rooted at that operator and records the nodes and edges appearing in it in SchemaNodeMap and SchemaRelpMap.
- The structures of SchemaNodeMap and SchemaRelpMap are as follows:
```
// Map key: id of the node in the pattern graph; map value: the node label.
typedef std::map<NodeID, std::string> SchemaNodeMap;
// Map key: id of the edge in the pattern graph; map value: a 4-tuple of
// (source node id, destination node id, set of edge labels, edge direction).
typedef std::map<RelpID, std::tuple<NodeID, NodeID, std::set<std::string>, parser::LinkDirection>> SchemaRelpMap;
```
- SchemaNodeMap and SchemaRelpMap are then passed to the GetEffectivePath method of the cypher::rewrite::SchemaRewrite class to obtain the feasible paths. If there is exactly one feasible graph, we add labels to the nodes and edges in the subtree rooted at the ExpandAll operator. If the leaf is an AllNodeScan or AllNodeScanDynamic operator, we rebuild it as a NodeByLabelScan or NodeByLabelScanDynamic operator and put it in place of the original operator, giving the following execution plan tree (All Node Scan [n0] -> Node By Label Scan [n0:user]):
```
Expand(All) [n2 --> m ] 
    Expand(All) [n1 --> n2 ] 
        Expand(All) [n0 --> n1 ] 
            Node By Label Scan [n0:user]
```
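The traversal and scan replacement described in the list above can be sketched on a toy operator tree. Everything here is hypothetical: Op and its kind strings stand in for TuGraph's OpBase subclasses, the inferred-label map stands in for SchemaNodeMap, and only the scan replacement (not labelling of the Expand operators) is shown.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an execution-plan operator.
struct Op {
    std::string kind;      // "ExpandAll", "AllNodeScan" or "NodeByLabelScan"
    std::string src, dst;  // ExpandAll: the pattern nodes it connects
    std::string node;      // scans: the scanned pattern node
    std::string label;     // NodeByLabelScan: the label to scan
    std::vector<std::unique_ptr<Op>> children;
};

std::unique_ptr<Op> MakeScan(const std::string& node) {
    auto op = std::make_unique<Op>();
    op->kind = "AllNodeScan";
    op->node = node;
    return op;
}

std::unique_ptr<Op> MakeExpand(const std::string& src, const std::string& dst,
                               std::unique_ptr<Op> child) {
    auto op = std::make_unique<Op>();
    op->kind = "ExpandAll";
    op->src = src;
    op->dst = dst;
    op->children.push_back(std::move(child));
    return op;
}

// Depth-first walk: an AllNodeScan whose node has an inferred label is rebuilt as a
// NodeByLabelScan and swapped into its parent's child slot. A root that is itself a
// scan (single-node pattern) is left alone.
void RewriteScans(Op* op, const std::map<std::string, std::string>& inferred) {
    for (auto& child : op->children) {
        if (child->kind == "AllNodeScan") {
            auto it = inferred.find(child->node);
            if (it != inferred.end()) {
                auto scan = std::make_unique<Op>();
                scan->kind = "NodeByLabelScan";
                scan->node = child->node;
                scan->label = it->second;
                child = std::move(scan);  // replace the operator in place
                continue;
            }
        }
        RewriteScans(child.get(), inferred);
    }
}

// Print the tree in the same style as the plans shown above.
std::string Describe(const Op* op, int depth = 0) {
    std::string line(static_cast<std::size_t>(depth) * 4, ' ');
    if (op->kind == "ExpandAll") {
        line += "Expand(All) [" + op->src + " --> " + op->dst + " ]";
    } else if (op->kind == "AllNodeScan") {
        line += "All Node Scan [" + op->node + "]";
    } else if (op->kind == "NodeByLabelScan") {
        line += "Node By Label Scan [" + op->node + ":" + op->label + "]";
    } else {
        line += op->kind;
    }
    line += "\n";
    for (const auto& child : op->children) line += Describe(child.get(), depth + 1);
    return line;
}
```

Building the chain n0 -> n1 -> n2 -> m and rewriting it with n0 inferred as user turns the leaf "All Node Scan [n0]" into "Node By Label Scan [n0:user]", as in the plan above.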

Finding all feasible graphs using Schema information

The SchemaRewrite code is based on recursive backtracking and finds all feasible graph patterns. The Node class represents a node in a graph and holds its label and edge information. The Edge class represents an edge and holds its source, destination, label, and so on. The Graph class represents a graph and holds all of its nodes and edges.
The Schema information is converted into a Graph object, target_graph, and the incoming SchemaNodeMap and SchemaRelpMap are converted into a Graph object, query_graph; all feasible graph patterns are then found by recursively matching query_graph against target_graph.
The SchemaRewrite class implements this recursive backtracking to obtain, for every feasible query graph, its matching onto the target graph. The main idea is:
1. Match a vertex q_v1 of the query graph against each vertex t_v1 of the target graph in turn; if the vertex labels match, start the recursive matching method MatchRecursive.
2. If the depth of the current match equals the number of vertices in the query graph, the match is complete.
3. Generate all candidate states from the current matching state via the GenCandidateStateInfo method:
  - a. Iterate over all matched vertices of the query graph; if the other endpoint of an adjacent edge is not yet matched, find the set of matching edges edge_ids in the target graph for that adjacent edge.
  - b. From edge_ids, get all candidate vertex ids (vids) in the target graph and save them in a StateInfo object, which stores a collection of node pairs.
4. Check every node pair in the StateInfo; if the labels of the pair match, record the mapping and continue the recursive matching from step 2.
Finally, all matching results are saved into SchemaNodeMap and SchemaRelpMap.
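The steps above can be condensed into a short backtracking sketch. This is a simplified illustration, not the SchemaRewrite implementation: SchemaGraph, QueryGraph, EdgeFeasible, MatchRecursive and MatchAll are hypothetical names, the query graph is assumed to be connected and non-empty (one connected component at a time), and several query vertices may map to the same schema label (for example two user nodes).

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Target (schema) graph: vertices are labels, edges are allowed (src, dst, type) triples.
struct SchemaEdge { int src, dst; std::string type; };
struct SchemaGraph {
    std::vector<std::string> labels;  // target vertex id -> label
    std::vector<SchemaEdge> edges;
};

// Query graph: an empty label or empty type set means "unconstrained".
struct QueryEdge { int src, dst; std::set<std::string> types; };
struct QueryGraph {
    std::vector<std::string> labels;  // query vertex id -> label ("" if none)
    std::vector<QueryEdge> edges;
};

// True if some schema edge realises query edge `qe` under the mapping `m`.
static bool EdgeFeasible(const SchemaGraph& t, const QueryEdge& qe, const std::vector<int>& m) {
    for (const auto& se : t.edges)
        if (se.src == m[qe.src] && se.dst == m[qe.dst] &&
            (qe.types.empty() || qe.types.count(se.type)))
            return true;
    return false;
}

static void MatchRecursive(const QueryGraph& q, const SchemaGraph& t, std::vector<int>& m,
                           std::size_t depth, std::vector<std::vector<int>>& out) {
    if (depth == q.labels.size()) { out.push_back(m); return; }  // step 2: complete
    for (const auto& qe : q.edges) {
        // Step 3a: an edge with exactly one matched endpoint gives the next vertex.
        int from, to;
        bool forward;
        if (m[qe.src] >= 0 && m[qe.dst] < 0) { from = qe.src; to = qe.dst; forward = true; }
        else if (m[qe.dst] >= 0 && m[qe.src] < 0) { from = qe.dst; to = qe.src; forward = false; }
        else continue;
        // Step 3b: candidates are the far endpoints of schema edges matching this edge.
        std::set<int> candidates;
        for (const auto& se : t.edges) {
            if (!qe.types.empty() && !qe.types.count(se.type)) continue;
            if (forward && se.src == m[from]) candidates.insert(se.dst);
            if (!forward && se.dst == m[from]) candidates.insert(se.src);
        }
        for (int c : candidates) {
            if (!q.labels[to].empty() && q.labels[to] != t.labels[c]) continue;  // step 4
            m[to] = c;
            bool ok = true;  // every edge between matched vertices must stay feasible
            for (const auto& e : q.edges)
                if (m[e.src] >= 0 && m[e.dst] >= 0 && !EdgeFeasible(t, e, m)) { ok = false; break; }
            if (ok) MatchRecursive(q, t, m, depth + 1, out);
            m[to] = -1;  // backtrack
        }
        return;  // extend one vertex per level so each mapping is produced once
    }
}

// Step 1: try every target vertex for query vertex 0, then recurse.
std::vector<std::vector<int>> MatchAll(const QueryGraph& q, const SchemaGraph& t) {
    std::vector<std::vector<int>> out;
    std::vector<int> m(q.labels.size(), -1);
    for (std::size_t tv = 0; tv < t.labels.size(); ++tv) {
        if (!q.labels[0].empty() && q.labels[0] != t.labels[tv]) continue;
        m[0] = static_cast<int>(tv);
        MatchRecursive(q, t, m, 1, out);
        m[0] = -1;
    }
    return out;
}
```

Each result maps query vertex ids to schema vertex ids; the edge types for a query edge can then be read off by collecting every schema edge that passes EdgeFeasible. The rewrite only applies when exactly one mapping is returned.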

Correctness verification

Our submitted code passes the tests provided by TuGraph's workflow.

Performance comparison

See https://github.com/seijiang/opt_performance_test for the automated test scripts.
The configuration is as follows:
- OS: Ubuntu 22.04.2 LTS
- Memory: 1007GB
- CPU: AMD EPYC 7763 64-Core Processor
We selected some cases that have only a single feasible graph for comparison; the results are as follows:

| Test case | Execution time before optimization (s, mean of 10 runs) | Execution time after optimization (s, mean of 10 runs) | Speed-up (before / after) | Number of records | Subgraph name |
|---|---|---|---|---|---|
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p); | 0.2454206 | 0.2139376 | 114.72% | 67662 | MovieDemo |
| match p=(n0)-[e0:produce]->(n1)-[e1:has_keyword]->(m) return COUNT(p); | 0.02513217 | 0.01820702 | 138.04% | 5937 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:genre) return COUNT(p); | 0.12154151 | 0.02842814 | 427.54% | 2362 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2:movie)-[e2]-(m:keyword) return COUNT(p); | 0.2289552 | 0.2056315 | 111.34% | 67662 | MovieDemo |
| match p=(n0)<-[e0:produce\|write\|directed]-(m) return COUNT(p); | 0.005585791 | 0.001786231 | 312.71% | 232 | MovieDemo |
| match p=(n0)-[e0:is_friend]->(n1)-[e1:is_friend]->(m) return COUNT(p); | 0.012829403 | 0.009870283 | 129.98% | 2089 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(m:street) return COUNT(p); | 4.891758 | 2.894261 | 169.02% | 660405 | CovidDemo |
| match p=(n0)-[e0:town_to_street]->(n)-[e1:street_to_address]->(m) return COUNT(p); | 0.9901912 | 0.8067309 | 122.74% | 1000 | CovidDemo |
| match p=(n0)-[e0]->(n)-[e1:street_to_address]->(m:address) return COUNT(p); | 1.069902 | 0.8068914 | 132.60% | 1000 | CovidDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2:street)-[e2]-(m:address) return COUNT(p); | 1.917448 | 1.160086 | 165.28% | 999 | CovidDemo |
| match p=(n0)-[e0:person_live_with_person]->(n1)-[e1:person_live_with_person]->(m) return COUNT(p); | 0.6432601 | 0.1404375 | 458.04% | 39969 | CovidDemo |

The geometric mean of the execution time before optimization is 0.226016685s, and the geometric mean of the execution time after optimization is 0.125647334s, with an average speedup ratio of 179.88%.
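As a check on the arithmetic, the geometric means can be recomputed from the table. The sketch below (GeometricMean, kBefore and kAfter are our own names) averages logarithms, which is equivalent to taking the n-th root of the product but avoids overflow and underflow; with the table values it should reproduce the figures above to about four significant digits.

```cpp
#include <cmath>
#include <vector>

// Geometric mean computed in log space: exp(mean(log(x))).
double GeometricMean(const std::vector<double>& xs) {
    double log_sum = 0.0;
    for (double x : xs) log_sum += std::log(x);
    return std::exp(log_sum / static_cast<double>(xs.size()));
}

// Mean execution times (s) from the table above, in row order.
const std::vector<double> kBefore = {0.2454206, 0.02513217, 0.12154151, 0.2289552,
                                     0.005585791, 0.012829403, 4.891758, 0.9901912,
                                     1.069902, 1.917448, 0.6432601};
const std::vector<double> kAfter = {0.2139376, 0.01820702, 0.02842814, 0.2056315,
                                    0.001786231, 0.009870283, 2.894261, 0.8067309,
                                    0.8068914, 1.160086, 0.1404375};
```

Because the geometric mean is multiplicative, the ratio GeometricMean(kBefore) / GeometricMean(kAfter) is also the geometric mean of the per-query speed-ups, so the 179.88% figure is not dominated by the slowest CovidDemo query the way an arithmetic mean of times would be.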


wangtao9 · 2023-06-26T07:24:13Z

src/BuildCypherLib.cmake

```
@@ -61,6 +61,8 @@ set(LGRAPH_CYPHER_SRC   # find cypher/ -name "*.cpp" | sort
        cypher/procedure/procedure.cpp
        cypher/resultset/record.cpp
        cypher/monitor/monitor_manager.cpp
+        cypher/execution_plan/rewrite/schema_rewrite.cpp
```

Overall, this work should be implemented as an optimization pass; see src/cypher/execution_plan/optimization/reduce_count.h for reference.

wangtao9 · 2023-06-26T07:24:55Z

src/cypher/execution_plan/execution_plan.cpp

```
@@ -29,6 +29,9 @@
 #include "optimization/pass_manager.h"
 #include "procedure/procedure.h"
 #include "validation/check_graph.h"
+#include "rewrite/schema_rewrite.h"
+
+#define IsSchemaRewrite true
```

Use the Gate() function as the on/off switch for this optimization.
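To illustrate the suggested structure, here is a minimal sketch of the pass pattern: each pass exposes a Gate() predicate that acts as its switch, and the pass manager only calls Execute() when the gate is open. OptPassSketch, SchemaInferencePass, PassManagerSketch, Plan and the enabled flag are hypothetical and do not reproduce TuGraph's pass_manager.h or reduce_count.h.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Plan { std::vector<std::string> applied; };  // toy execution plan

class OptPassSketch {
 public:
    virtual ~OptPassSketch() = default;
    virtual std::string Name() const = 0;
    virtual bool Gate() = 0;              // switch: should this pass run?
    virtual int Execute(Plan* plan) = 0;  // rewrite the plan in place
};

class SchemaInferencePass : public OptPassSketch {
 public:
    explicit SchemaInferencePass(bool enabled) : enabled_(enabled) {}
    std::string Name() const override { return "opt_rewrite_with_schema_inference"; }
    bool Gate() override { return enabled_; }
    int Execute(Plan* plan) override {
        plan->applied.push_back(Name());  // stand-in for the real rewrite
        return 0;
    }

 private:
    bool enabled_;
};

class PassManagerSketch {
 public:
    void Add(std::unique_ptr<OptPassSketch> pass) { passes_.push_back(std::move(pass)); }
    void ExecutePasses(Plan* plan) {
        for (auto& p : passes_)
            if (p->Gate()) p->Execute(plan);  // a closed gate skips the pass entirely
    }

 private:
    std::vector<std::unique_ptr<OptPassSketch>> passes_;
};
```

Compared with a global #define, the switch stays local to the pass and can be decided at run time.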

wangtao9 · 2023-06-26T07:26:58Z

src/cypher/execution_plan/rewrite/node.h

```
+
+class Edge;
+
+class Node {
```

Could src/cypher/graph/node.h be reused?

wangtao9 · 2023-07-03T07:52:04Z

src/cypher/execution_plan/execution_plan.h

```
@@ -40,6 +47,7 @@ class ExecutionPlan {
    ResultInfo _result_info;
    // query parts local member
    std::vector<PatternGraph> _pattern_graphs;
+    lgraph::SchemaInfo *_schema_info = nullptr;
```

Don't make _schema_info a member of ExecutionPlan; build it on the fly during optimization.

wangtao9 · 2023-07-03T07:52:23Z

src/cypher/execution_plan/scheduler.cpp

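```
        plan = std::make_shared<ExecutionPlan>();
-        plan->Build(visitor.GetQuery(), visitor.CommandType());
+        // Obtain the Schema information when generating the execution plan
```

Don't build schemaInfo here; build it during optimization.

We had considered this before, but TuGraph does not pass ctx when it generates the execution plan: plan->Build(visitor.GetQuery(), visitor.CommandType()); so there is no way to obtain Schema information during plan generation, which is why we passed the Schema information in before generating the plan. To modify the existing code as little as possible, we could perhaps pass ctx when generating the plan, plan->Build(visitor.GetQuery(), visitor.CommandType(), ctx);, and then pass it on during optimization, pass_manager.ExecutePasses(ctx);

We have now changed the SchemaInfo construction so that it is built during optimization. However, since ctx is needed to provide the Schema information, we changed the Build method of ExecutionPlan to take an additional cypher::RTContext parameter.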

wangtao9 · 2023-07-03T07:58:05Z

src/cypher/execution_plan/execution_plan.h

```
@@ -109,6 +117,10 @@ class ExecutionPlan {

    const ResultInfo &GetResultInfo() const;

+    void SetSchemaInfo(lgraph::SchemaInfo *schema_info) { _schema_info = schema_info; }
```

Optimizations should be as non-intrusive as possible, i.e., avoid modifying existing code and add new code instead.


wangtao9 · 2023-07-04T05:54:50Z

src/cypher/graph/node.h

```
@@ -57,6 +57,8 @@ class Node {

    const std::string &Label() const;

+    void SetLabel(std::string schema_label) { label_ = schema_label; }
```

```
void SetLabel(const std::string& label)
```

Got it, changed.

wangtao9 · 2023-07-04T05:55:15Z

src/cypher/graph/relationship.h

```
@@ -60,6 +60,8 @@ class Relationship {

    const std::set<std::string> &Types() const;

+    void SetTypes(std::set<std::string> types) { types_ = types; }
```

```
void SetTypes(const std::set<std::string>& types)
```

Got it, changed.

wangtao9 · 2023-07-04T06:02:05Z

@spasserby Could you take a look and see if you have any comments?

spasserby · 2023-07-11T08:40:25Z

src/cypher/execution_plan/optimization/rewrite/graph.cpp

```
+
+void Graph::PrintGraph() {
+    for (Node node : m_nodes) {
+        std::cout << "Node id:" << node.m_id << std::endl;
```

Don't use std::cout; use FMA_LOG() or FMA_DBG() instead:
```
FMA_DBG() << "Node id:" << node.m_id;
```

Got it, changed.

spasserby · 2023-07-11T08:41:18Z

test/test_cypher.cpp

```
@@ -1,4 +1,4 @@
-/**
+/**
```

Add unit tests: put all the test cases into test_cypher.

The yago graph used by test_cypher has no edge constraints, so we added a new yago graph with constraints and modified graph_factory.h, cypher_plan_validate.json, test_cypher_plan.cpp and test_cypher.cpp.

spasserby · 2023-07-11T08:41:56Z

src/cypher/execution_plan/optimization/rewrite/node.h

```
@@ -0,0 +1,24 @@
+#pragma once
```

Could src/cypher/graph/node.h be reused?

This part might require substantial changes; would it be acceptable to leave it unchanged for now?

spasserby · 2023-07-11T08:42:13Z

src/cypher/execution_plan/execution_plan.h

```
@@ -31,6 +31,13 @@ class StateMachine;

 namespace cypher {

+// key: node id in the pattern graph; value: label
+typedef std::map<NodeID, std::string> SchemaNodeMap;
```

SchemaNodeMap, SchemaRelpMap and SchemaGraphMap could all be moved into schema_rewrite.h.

Got it, changed.

spasserby · 2023-07-11T08:42:39Z

src/cypher/execution_plan/optimization/rewrite/edge.h

```
+#include "node.h"
+#include <vector>
+#include "parser/data_typedef.h"
+namespace rewrite_cypher {
```

Could namespace rewrite_cypher be changed to cypher::rewrite?

Got it, changed.

spasserby · 2023-07-11T08:46:43Z

src/cypher/execution_plan/optimization/opt_rewrite_with_schema_inference.h

```
+
+    bool Gate() override { return true; }
+
+    int Execute(ExecutionPlan *plan) override {
```

Fetching the schema in the Execute stage has a transaction problem; I need to think more about how to change this.

@seijiang @wangtao9 We need to guarantee here that the schema does not change during execution.

Considerations:

Schema consistency relies on std::make_unique<lgraph::AccessControlledDB>(ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_)).

Throughout its lifetime, AccessControlledDB holds LightningGraph's HoldReadLock(meta_lock), whereas modifying the schema takes HoldWriteLock(meta_lock); the schema is therefore safe for as long as the AccessControlledDB exists.

Therefore:

OpenGraph is currently called both in the optimization Execute stage and in the execution_plan Execute stage.
We should make sure the AccessControlledDB stays alive throughout, rather than opening it again.

1. Do not reset AccessControlledDB in opt_rewrite_with_schema_inference.h.

2. Change the Check in runtime_context.h so that it no longer checks whether ac_db_ is non-null (not sure whether this could cause problems).

3. In the execution_plan Execute stage, check whether ctx->ac_db_ is null: if it is, create ctx->ac_db_; otherwise, do not create it again.
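The reasoning above can be illustrated with a toy C++ model. All names here (ToyGraph, ToyAccessControlledDB, ToyContext, EnsureAcDb) are hypothetical, and std::shared_timed_mutex merely stands in for LightningGraph's meta_lock: the handle holds a shared lock for its whole lifetime, and step 3 reuses an existing handle instead of opening a second one.

```cpp
#include <memory>
#include <shared_mutex>

// Toy graph: a schema change would need meta_lock exclusively.
struct ToyGraph {
    std::shared_timed_mutex meta_lock;
    int open_count = 0;  // how many handles have been opened
};

// Stand-in for AccessControlledDB: holds a shared (read) lock for its whole lifetime.
class ToyAccessControlledDB {
 public:
    explicit ToyAccessControlledDB(ToyGraph& g) : lock_(g.meta_lock) { ++g.open_count; }

 private:
    std::shared_lock<std::shared_timed_mutex> lock_;
};

struct ToyContext {
    ToyGraph* graph = nullptr;
    std::unique_ptr<ToyAccessControlledDB> ac_db_;
};

// Step 3: reuse the handle opened during optimization; open one only if none exists.
ToyAccessControlledDB* EnsureAcDb(ToyContext& ctx) {
    if (!ctx.ac_db_) ctx.ac_db_ = std::make_unique<ToyAccessControlledDB>(*ctx.graph);
    return ctx.ac_db_.get();
}
```

Because the optimization stage and the execution stage share one handle, the shared lock is never released in between, so a writer that needs the exclusive lock to change the schema has to wait until ac_db_ is reset.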


wangtao9 · 2023-07-25T02:19:05Z

src/cypher/execution_plan/ops/op_all_node_scan.h

```
@@ -102,6 +102,8 @@ class AllNodeScan : public OpBase {

    Node *GetNode() const { return node_; }

+    const SymbolTable *GetSymbolTable() { return sym_tab_; }
```

Suggested change:
```
-    const SymbolTable *GetSymbolTable() { return sym_tab_; }
+    const SymbolTable *SymTab() const { return sym_tab_; }
```

wangtao9 · 2023-07-25T02:20:14Z

src/cypher/execution_plan/ops/op_all_node_scan_dynamic.h

```
@@ -110,6 +110,8 @@ class AllNodeScanDynamic : public OpBase {

    Node *GetNode() const { return node_; }

+    const SymbolTable *GetSymbolTable() const { return sym_tab_; }
```

Suggested change:
```
-    const SymbolTable *GetSymbolTable() const { return sym_tab_; }
+    const SymbolTable *SymTab() const { return sym_tab_; }
```

wangtao9 · 2023-07-25T02:23:00Z

src/cypher/execution_plan/execution_plan.cpp

```
@@ -1323,8 +1324,10 @@ int ExecutionPlan::Execute(RTContext *ctx) {
    if (ctx->graph_.empty()) {
        ctx->ac_db_.reset(nullptr);
    } else {
-        ctx->ac_db_ = std::make_unique<lgraph::AccessControlledDB>(
-            ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_));
+        if (!ctx->ac_db_) {
```

Add a comment explaining in which situations, and where, ac_db_ has already been set.


wangtao9 · 2023-07-25T02:45:44Z

src/cypher/execution_plan/runtime_context.h

```
-            msg = "Access controlled db not empty";
-            return false;
-        }
+        // if (ac_db_) {
```

Delete the dead code and add a note explaining why ac_db_ is not checked.

wangtao9 · 2023-07-25T08:26:06Z

test/graph_factory.h

```
+        CreateCsvFiles(data);
+    }
+
+static void WriteYagoFilesWithConstraints() {
```

The constraints could be added directly in WriteYagoFiles.

This is mainly because test_cypher.cpp contains many Cypher statements that do not follow the schema constraints of the yago graph, e.g. "MATCH (a:Film),(b:City) CREATE (a)-[:BORN_IN]->(b)". So, to avoid breaking the existing tests, I had to add a new graph with constraints. (I originally wanted to add the edge constraints in test_cypher.cpp with some Cypher statements, but because that could delete a large amount of data, TuGraph forbids such operations.)

Move static const std::map<std::string, std::string> yago_data out of WriteYagoFiles, then set yago_data_with_constraints = yago_data.

Finally, modify yago_data_with_constraints["yago.conf"].

OK, done.
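The suggested refactoring amounts to copying one map and overriding a single key. A minimal sketch, with toy file contents and a hypothetical helper name, MakeYagoDataWithConstraints (the real yago_data holds the full CSV files and yago.conf):

```cpp
#include <map>
#include <string>

// Shared fixture data (toy contents for illustration only).
static const std::map<std::string, std::string> yago_data = {
    {"person.csv", "name,birthyear\n"},
    {"yago.conf", "{ \"schema\": [] }"},
};

// Copy every entry, then override only the import configuration.
std::map<std::string, std::string> MakeYagoDataWithConstraints(const std::string& conf) {
    auto yago_data_with_constraints = yago_data;       // copies all keys and values
    yago_data_with_constraints.at("yago.conf") = conf;  // at() throws if the key is missing
    return yago_data_with_constraints;
}
```

Using at() rather than operator[] means a typo in the key fails loudly instead of silently adding a new entry, and the CSV data is defined once for both graphs.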

wangtao9 · 2023-07-25T08:26:43Z

test/graph_factory.h

```
@@ -362,4 +591,22 @@ Liam Neeson,Batman Begins,Henri Ducard
        import_v3::Importer importer(config);
        importer.DoImportOffline();
    }
+
+    // add edge constraints for yago
+    static void create_yago_with_constraints(const std::string& dir = "./lgraph_db") {
```

Suggest deleting this.

wangtao9 · 2023-07-26T03:03:33Z

test/graph_factory.h

```
+            yago_conf.insert(n + original_strings[i].length(), constraints_strings[i]);
+        }
+        auto yago_data_with_constraints = yago_data;
+        yago_data_with_constraints.at("yago.conf") = yago_conf;
```

Just assign the new value directly here:

```
yago_data_with_constraints.at("yago.conf") = R"(
{
  "schema": [
    { "label" : "Person", "type" : "VERTEX", "primary" : "name",
      "properties" : [ {"name" : "name", "type":"STRING"}, {"name" : "birthyear", "type":"INT16", "optional":true} ] },
    { "label" : "City", "type" : "VERTEX", "primary" : "name",
      "properties" : [ {"name": "name", "type":"STRING"} ] },
    { "label" : "Film", "primary": "title", "type" : "VERTEX",
      "properties" : [ {"name": "title", "type":"STRING"} ] },
    { "label" : "HAS_CHILD", "type" : "EDGE", "constraints": [["Person", "Person"]] },
    { "label" : "MARRIED", "type" : "EDGE", "constraints": [["Person", "Person"]] },
    { "label" : "BORN_IN", "type" : "EDGE",
      "properties" : [ {"name" : "weight", "type":"FLOAT", "optional":true} ], "constraints": [["Person", "City"]] },
    { "label" : "DIRECTED", "type" : "EDGE", "constraints": [["Person", "Film"]] },
    { "label" : "WROTE_MUSIC_FOR", "type" : "EDGE", "constraints": [["Person", "Film"]] },
    { "label" : "ACTED_IN", "type" : "EDGE",
      "properties" : [ {"name" : "charactername", "type":"STRING"} ], "constraints": [["Person", "Film"]] }
  ],
  "files" : [
    { "path" : "person.csv", "format" : "CSV", "label" : "Person", "columns" : ["name","birthyear"] },
    { "path" : "city.csv", "format" : "CSV", "label" : "City", "columns" : ["name"] },
    { "path" : "film.csv", "format" : "CSV", "label" : "Film", "columns" : ["title"] },
    { "path" : "has_child.csv", "format" : "CSV", "label" : "HAS_CHILD", "SRC_ID" : "Person", "DST_ID" : "Person", "columns" : ["SRC_ID","DST_ID"] },
    { "path" : "married.csv", "format" : "CSV", "label" : "MARRIED", "SRC_ID" : "Person", "DST_ID" : "Person", "columns" : ["SRC_ID","DST_ID"] },
    { "path" : "born_in.csv", "format" : "CSV", "label" : "BORN_IN", "SRC_ID" : "Person", "DST_ID" : "City", "columns" : ["SRC_ID","DST_ID","weight"] },
    { "path" : "directed.csv", "format" : "CSV", "label" : "DIRECTED", "SRC_ID" : "Person", "DST_ID" : "Film", "columns" : ["SRC_ID","DST_ID"] },
    { "path" : "wrote.csv", "format" : "CSV", "label" : "WROTE_MUSIC_FOR", "SRC_ID" : "Person", "DST_ID" : "Film", "columns" : ["SRC_ID","DST_ID"] },
    { "path" : "acted_in.csv", "format" : "CSV", "label" : "ACTED_IN", "SRC_ID" : "Person", "DST_ID" : "Film", "columns" : ["SRC_ID","DST_ID","charactername"] }
  ]
}
)";
```

I originally wanted to assign the new value directly, but it seemed too long, and whenever yago_data changed both copies would have to be updated, so I rewrote it. I'll change it back.

Done.

wangtao9 · 2023-07-26T07:13:13Z

@spasserby Any more comments?

wangtao9 · 2023-07-27T02:54:38Z

Thank you for your contribution! @seijiang @spasserby

seijiang and others added 6 commits June 19, 2023 14:43

Merge pull request #1 from TuGraph-family/master

e2cb5de

merge with TuGraph

Merge branch 'TuGraph-family:master' into master

e44076b

schema rewrite

a238678

schema rewrite

b41174e

schema rewrite

cbd4eb7

Merge branch 'TuGraph-family:master' into opt_rewrite_with_schema_inf…

1cbd8c8

…erence

wangtao9 requested review from spasserby and wangtao9 June 22, 2023 07:36

seijiang added 2 commits June 25, 2023 11:02

format

56dc670

update schema_rewrite.cpp

e2f86df

wangtao9 reviewed Jun 26, 2023


seijiang and others added 5 commits June 28, 2023 09:27

opt_rewrite_with_schema_inference

9ee3efb

delete previous rewrite

b59af82

modify test_cypher

7263420

comment of optimization

64f6d92

Merge branch 'master' into opt_rewrite_with_schema_inference

8772d28

seijiang requested a review from wangtao9 June 29, 2023 06:09

wangtao9 reviewed Jul 3, 2023


seijiang added 3 commits July 4, 2023 11:15

add ctx

31fb864

Merge branch 'opt_rewrite_with_schema_inference' of github.com:seijia…

7852364

…ng/tugraph-db into opt_rewrite_with_schema_inference

fix add ctx

c541b6d

wangtao9 reviewed Jul 4, 2023


seijiang added 5 commits July 4, 2023 14:09

fix add ctx

e26cd5e

node and relationship

efd4f54

Build

91ddefa

node and relationship

8a1d7e3

node

69d6395

spasserby reviewed Jul 11, 2023


seijiang added 3 commits July 13, 2023 16:02

std::cout -> FMA_LOG or FMA_DBG

b28ec4e

test_cypher

39de24d

namespace change

08c2444

seijiang requested review from spasserby and wangtao9 July 13, 2023 09:33

seijiang added 7 commits July 19, 2023 09:56

change ac_db

2e71607

merge

ef678ec

format

439676a

format

0064cdf

format

91bcdd4

format

ca4479a

created person

81863c2

wangtao9 reviewed Jul 25, 2023


wangtao9 and others added 4 commits July 25, 2023 10:46

Update src/cypher/execution_plan/execution_plan.cpp

8744189

Update src/BuildCypherLib.cmake

fc21c23

ac_db_ explanation

97eec5c

SymTab

d53cdc8

wangtao9 reviewed Jul 25, 2023


seijiang added 4 commits July 25, 2023 20:04

yago_with_constraints

76c603b

yago with constraints

f457206

yago with constraints

b00e193

yago with constraints

4192bda

wangtao9 reviewed Jul 26, 2023


seijiang and others added 3 commits July 26, 2023 11:37

yago_data_with_constraints

c6fdc22

yago_data_with_constraints

923fdd9

Merge branch 'master' into opt_rewrite_with_schema_inference

7912a45

wangtao9 approved these changes Jul 27, 2023


wangtao9 merged commit 4c415cf into TuGraph-family:master Jul 27, 2023
2 checks passed
