Optimizing Cypher statements using Schema information #235

Merged

@seijiang seijiang commented Jun 21, 2023

Contribution

  • Contributions come from the s4plus-GraphDB group at USTC and are described as follows:
    • Contributors: Chaijun Xu, Yunlong Liang, Yu Zhang, Hairong Hu
    • Instructor: Prof. Zhang Yu

General Introduction

Schema-Guided optimization in TuGraph

  • Schema information gives Cypher execution plan generation additional knowledge that can be used to optimize the plan. Using Schema information, Cypher query patterns with the following characteristics can currently be specialized:
    1. element variables with no labels => element variables with labels
    2. path patterns with quantifiers such as *m..n, *..n, *m.. => path patterns with fixed-length edges
  • Item 2 is not part of this pull request and may be submitted later; it is already implemented in our own recent GitHub submission (the core code is in src/cypher/execution_plan/optimization/rewrite/ and src/cypher/execution_plan/ops/op_var_len_expand.h). The current pull request only optimizes the case where there is exactly one feasible graph. For example, the following Cypher:
    match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p);
    Running under TuGraph's MovieDemo, it can be optimized as follows:
    match p=(n0:user)-[e0:is_friend]->(n1:user)-[e1:rate]->(n2:movie)-[e2:has_keyword]->(m:keyword) return COUNT(p);
    
  • But for the following Cypher:
    match p=(n0)-[e0]->(n1:movie) return COUNT(p);
    
    Two feasible graphs satisfy the pattern:
    match p=(n0:person)-[e0:acted_in|directed|produce|write]->(n1:movie) return COUNT(p);
    match p=(n0:user)-[e0:rate]->(n1:movie) return COUNT(p);
    
    This situation is not yet optimized in this pull request.
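
To make the "one feasible graph" condition concrete, here is a minimal sketch, not the code in this pull request. It groups the admissible edge labels of a one-hop pattern by the source label they imply; each group corresponds to one feasible graph. EdgeConstraint, FeasibleSources, and MovieDemoExcerpt are hypothetical names, and the schema excerpt contains only the edges named in the examples above, not the full MovieDemo schema.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One schema edge constraint: edges labeled `edge` may go from `src` to `dst`.
struct EdgeConstraint {
    std::string src, edge, dst;
};

// For the one-hop pattern (n0)-[e0]->(n1:dst_label), group the admissible
// edge labels by the source label they imply. One map entry = one feasible graph.
std::map<std::string, std::set<std::string>> FeasibleSources(
    const std::vector<EdgeConstraint>& schema, const std::string& dst_label) {
    std::map<std::string, std::set<std::string>> by_src;
    for (const auto& c : schema) {
        if (c.dst == dst_label) by_src[c.src].insert(c.edge);
    }
    return by_src;
}

// Hypothetical excerpt of the MovieDemo schema (only edges named in the text).
std::vector<EdgeConstraint> MovieDemoExcerpt() {
    return {{"person", "acted_in", "movie"}, {"person", "directed", "movie"},
            {"person", "produce", "movie"},  {"person", "write", "movie"},
            {"user", "rate", "movie"},       {"user", "is_friend", "user"},
            {"movie", "has_keyword", "keyword"}};
}
```

With this excerpt, a destination label of movie yields two groups (person and user), so the pattern would be left untouched, whereas a destination label of keyword yields a single group and could be specialized.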

Schema-Guided optimization for other databases

  • The optimizations above are implemented at the execution plan level inside TuGraph. To test them on other graph databases, we need an end-to-end optimization whose input is a Cypher query and whose output is an optimized Cypher query. This is also not part of this pull request, but it is implemented in our own recent GitHub submission (the core code is in src/cypher/parser/parse_tree_to_cypher_visitor.h). For example, given the input Cypher (with optimize added at the beginning):
    optimize match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p);
    the output (running under TuGraph's MovieDemo) is:
    match p=(n0:user)-[e0:is_friend]->(n1:user)-[e1:rate]->(n2:movie)-[e2:has_keyword]->(m:keyword) return COUNT(p);

Specific code introduction

Below we only introduce the part implemented in this pull request, i.e., 'element variables with no labels => element variables with labels' in Schema-Guided optimization in TuGraph.
We divide our code into two parts:

    1. Add new optimizations to the original TuGraph to change the execution plan
    2. Find all feasible graphs using Schema information

Optimizations added to the original TuGraph

  • To guide execution plan generation, the Build method in execution_plan.cpp and execution_plan.h now takes a cypher::RTContext parameter, and scheduler.cpp passes the cypher::RTContext when it calls Build.
  • A new optimization file, opt_rewrite_with_schema_inference.h, was added under /tugraph-db/src/cypher/execution_plan/optimization. Its general logic is to find all connected components of the pattern graph and optimize each connected component.
    • First, obtain lgraph::SchemaInfo from cypher::RTContext as follows:
      const lgraph::SchemaInfo *schema_info;
      if (_ctx->graph_.empty()) {
          _ctx->ac_db_.reset(nullptr);
          schema_info = nullptr;
      } else {
          _ctx->ac_db_ = std::make_unique<lgraph::AccessControlledDB>(
              _ctx->galaxy_->OpenGraph(_ctx->user_, _ctx->graph_));
          lgraph_api::GraphDB db(_ctx->ac_db_.get(), true);
          _ctx->txn_ = std::make_unique<lgraph_api::Transaction>(db.CreateReadTxn());
          schema_info = &_ctx->txn_->GetTxn()->GetSchemaInfo();
      }
      _ctx->txn_.reset(nullptr);
    • The execution plan tree built for the Match clause looks as follows (using "match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword)" as the example):
      Expand(All) [n2 --> m ] 
          Expand(All) [n1 --> n2 ] 
              Expand(All) [n0 --> n1 ] 
                  All Node Scan [n0]
      
    • The whole structure is obtained by taking a node scan or an attribute index lookup as the leaf and then repeatedly applying Expand operations. We currently do not handle variable-length operators (VarLenExpand), and a pattern with only a single vertex needs no optimization. We therefore use the _RewriteWithSchemaInference function to traverse the whole execution plan tree depth-first. When an ExpandAll operator is visited, we call the _ExtractStreamAndAddLabels function to visit the subtree rooted at it and add the nodes and edges that appear there to the SchemaNodeMap and SchemaRelpMap.
    • The structure of SchemaNodeMap and SchemaRelpMap is as follows:
      // Map key is the id of the node in the pattern graph; map value is the node label.
      typedef std::map<NodeID, std::string> SchemaNodeMap;
      // Map key is the id of the edge in the pattern graph; map value is a 4-tuple of (source id, destination id, edge labels, edge direction).
      typedef std::map<RelpID, std::tuple<NodeID, NodeID, std::set<std::string>, parser::LinkDirection>> SchemaRelpMap;
    • Then pass the SchemaNodeMap and SchemaRelpMap to the GetEffectivePath method of the cypher::rewrite::SchemaRewrite class to get the feasible paths. If there is only one feasible graph, we add labels to the nodes and edges in the subtree rooted at the ExpandAll operator. If a leaf is an AllNodeScan or AllNodeScanDynamic operator, we reconstruct it as a NodeByLabelScan or NodeByLabelScanDynamic operator and replace the original, which gives the following execution plan tree (All Node Scan [n0] -> Node By Label Scan [n0:user]):
      Expand(All) [n2 --> m ] 
          Expand(All) [n1 --> n2 ] 
              Expand(All) [n0 --> n1 ] 
                  Node By Label Scan [n0:user]
      

Finding all feasible graphs using Schema information

  • The SchemaRewrite code is based on recursive backtracking and matches all feasible graph patterns. The Node class represents a node in the graph and holds its label and edge information. The Edge class represents an edge and holds its source, destination, label, and so on. The Graph class represents the graph and holds all of its nodes and edges.
  • The Schema information is converted into a Graph object, target_graph; the incoming SchemaNodeMap and SchemaRelpMap are converted into a Graph object, query_graph; all feasible graph patterns are then found by recursive matching on query_graph.
  • The SchemaRewrite class implements the recursive backtracking that collects every feasible match of the query graph against the target graph. The main idea is:
    1. Match a vertex q_v1 in the query graph against each vertex t_v1 in the target graph in turn; if the vertex labels match, start the recursive matching method MatchRecursive.
    2. If the depth of the current match equals the number of vertices in the query graph, the match is complete.
    3. Generate all candidate state information from the current matching state via the GenCandidateStateInfo method:
      • a. Iterate over all matched vertices in the query graph; if the vertex at the other end of an adjacent edge has not been matched yet, use that edge to find the set of matching edges, edge_ids, in the target graph.
      • b. From edge_ids, get all candidate vids of vertices in the target graph and save them in a StateInfo object, which stores a collection of node pairs.
    4. Check every node pair in the StateInfo; if the labels of a node pair match, record the mapping and continue the recursion from step 2.
  • Finally, all matching results are saved into SchemaNodeMap and SchemaRelpMap.
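
The backtracking idea can be sketched as follows. This is a simplified illustration, not the SchemaRewrite implementation: it assigns query vertices in id order and checks each edge once both endpoints are assigned, instead of expanding from matched neighbours through GenCandidateStateInfo, and it handles directed edges only. Matcher, SchemaEdge, QueryEdge, and MovieDemoMatcher are hypothetical names, and the schema is the excerpt implied by the examples above.

```cpp
#include <set>
#include <string>
#include <vector>

struct SchemaEdge { std::string src, label, dst; };
struct QueryEdge { int src; int dst; std::set<std::string> labels; };  // empty set = any label

struct Matcher {
    std::vector<std::string> vertex_labels;  // all vertex labels in the schema
    std::vector<SchemaEdge> schema;          // edge constraints of the schema
    std::vector<std::string> query_labels;   // per query vertex; "" = unlabeled
    std::vector<QueryEdge> query_edges;
    std::vector<std::vector<std::string>> results;

    // True if some schema edge connects the labels currently assigned to q's endpoints.
    bool EdgeOk(const QueryEdge& q, const std::vector<std::string>& cur) const {
        for (const auto& s : schema) {
            if (s.src == cur[q.src] && s.dst == cur[q.dst] &&
                (q.labels.empty() || q.labels.count(s.label) > 0)) return true;
        }
        return false;
    }

    void MatchRecursive(std::vector<std::string>& cur) {
        const int depth = static_cast<int>(cur.size());
        if (depth == static_cast<int>(query_labels.size())) {
            results.push_back(cur);  // every query vertex has a label
            return;
        }
        for (const auto& cand : vertex_labels) {
            if (!query_labels[depth].empty() && query_labels[depth] != cand) continue;
            cur.push_back(cand);
            bool ok = true;
            for (const auto& q : query_edges) {
                // Check edges that touch the new vertex and whose endpoints are both assigned.
                const bool touches = (q.src == depth || q.dst == depth);
                if (touches && q.src <= depth && q.dst <= depth && !EdgeOk(q, cur)) {
                    ok = false;
                    break;
                }
            }
            if (ok) MatchRecursive(cur);
            cur.pop_back();  // backtrack
        }
    }

    std::vector<std::vector<std::string>> Run() {
        results.clear();
        std::vector<std::string> cur;
        MatchRecursive(cur);
        return results;
    }
};

// Hypothetical excerpt of the MovieDemo schema (only labels named in the text).
Matcher MovieDemoMatcher() {
    Matcher m;
    m.vertex_labels = {"person", "user", "movie", "keyword"};
    m.schema = {{"person", "acted_in", "movie"}, {"person", "directed", "movie"},
                {"person", "produce", "movie"},  {"person", "write", "movie"},
                {"user", "rate", "movie"},       {"user", "is_friend", "user"},
                {"movie", "has_keyword", "keyword"}};
    return m;
}
```

On this excerpt, the three-hop pattern ending in m:keyword has a single labeling (user, user, movie, keyword), while the one-hop pattern ending in n1:movie has two, which is the case the pull request leaves unoptimized.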

Correctness verification

  • Our submitted code passes the tests provided by TuGraph's workflow.

Performance comparison

  • See https://github.com/seijiang/opt_performance_test for automated test scripts
  • The configuration is as follows:
    • OS: Ubuntu 22.04.2 LTS
    • Memory: 1007GB
    • CPU: AMD EPYC 7763 64-Core Processor
  • We have selected some cases with only a single feasible graph for comparison and the results are as follows:
| Test case | Before-opt execution time (s), average of 10 runs | After-opt execution time (s), average of 10 runs | Speed-up ratio (original time / current time) | Number of records | Subgraph name |
| --- | --- | --- | --- | --- | --- |
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:keyword) return COUNT(p); | 0.2454206 | 0.2139376 | 114.72% | 67662 | MovieDemo |
| match p=(n0)-[e0:produce]->(n1)-[e1:has_keyword]->(m) return COUNT(p); | 0.02513217 | 0.01820702 | 138.04% | 5937 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(m:genre) return COUNT(p); | 0.12154151 | 0.02842814 | 427.54% | 2362 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2:movie)-[e2]-(m:keyword) return COUNT(p); | 0.2289552 | 0.2056315 | 111.34% | 67662 | MovieDemo |
| match p=(n0)<-[e0:produce\|write\|directed]-(m) return COUNT(p); | 0.005585791 | 0.001786231 | 312.71% | 232 | MovieDemo |
| match p=(n0)-[e0:is_friend]->(n1)-[e1:is_friend]->(m) return COUNT(p); | 0.012829403 | 0.009870283 | 129.98% | 2089 | MovieDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(m:street) return COUNT(p); | 4.891758 | 2.894261 | 169.02% | 660405 | CovidDemo |
| match p=(n0)-[e0:town_to_street]->(n)-[e1:street_to_address]->(m) return COUNT(p); | 0.9901912 | 0.8067309 | 122.74% | 1000 | CovidDemo |
| match p=(n0)-[e0]->(n)-[e1:street_to_address]->(m:address) return COUNT(p); | 1.069902 | 0.8068914 | 132.60% | 1000 | CovidDemo |
| match p=(n0)-[e0]->(n1)-[e1]->(n2:street)-[e2]-(m:address) return COUNT(p); | 1.917448 | 1.160086 | 165.28% | 999 | CovidDemo |
| match p=(n0)-[e0:person_live_with_person]->(n1)-[e1:person_live_with_person]->(m) return COUNT(p); | 0.6432601 | 0.1404375 | 458.04% | 39969 | CovidDemo |
  • The geometric mean of the execution time is 0.226016685 s before optimization and 0.125647334 s after optimization. The ratio of the two geometric means gives an overall speed-up ratio of 179.88%.
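
For readers who want to recompute these figures, the following is a small sketch of the arithmetic. GeometricMean and SpeedupPercent are hypothetical helper names and are not part of the linked test scripts.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive values, computed through logarithms
// so that long products neither overflow nor underflow.
double GeometricMean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double log_sum = 0.0;
    for (double x : xs) log_sum += std::log(x);
    return std::exp(log_sum / static_cast<double>(xs.size()));
}

// Speed-up ratio in percent, defined as original time / current time * 100.
double SpeedupPercent(double before, double after) {
    return before / after * 100.0;
}
```

SpeedupPercent(0.226016685, 0.125647334) reproduces the 179.88% figure to two decimal places. Because the geometric mean is multiplicative, the ratio of the two geometric means equals the geometric mean of the per-query speed-up ratios.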

@@ -61,6 +61,8 @@ set(LGRAPH_CYPHER_SRC # find cypher/ -name "*.cpp" | sort
cypher/procedure/procedure.cpp
cypher/resultset/record.cpp
cypher/monitor/monitor_manager.cpp
cypher/execution_plan/rewrite/schema_rewrite.cpp
Contributor

Overall, this work should be implemented as an optimization pass; see src/cypher/execution_plan/optimization/reduce_count.h for reference.

@@ -29,6 +29,9 @@
#include "optimization/pass_manager.h"
#include "procedure/procedure.h"
#include "validation/check_graph.h"
#include "rewrite/schema_rewrite.h"

#define IsSchemaRewrite true
Contributor

Use the Gate() function as the switch for this optimization.


class Edge;

class Node {
Contributor

Can src/cypher/graph/node.h be reused?

@seijiang seijiang requested a review from wangtao9 June 29, 2023 06:09
@@ -40,6 +47,7 @@ class ExecutionPlan {
ResultInfo _result_info;
// query parts local member
std::vector<PatternGraph> _pattern_graphs;
lgraph::SchemaInfo *_schema_info = nullptr;
Contributor

Do not make _schema_info a member of ExecutionPlan; build it on the fly during optimization.

plan = std::make_shared<ExecutionPlan>();
plan->Build(visitor.GetQuery(), visitor.CommandType());
// Obtain the Schema information when generating the execution plan
Contributor

Do not build schemaInfo here; build it during optimization.

Contributor Author

We had considered this, but TuGraph does not pass ctx when generating the execution plan: plan->Build(visitor.GetQuery(), visitor.CommandType());
As a result, we cannot obtain the Schema information while the execution plan is being generated, which is why we previously passed the Schema information in before plan generation. We would like to modify the existing source code as little as possible. Perhaps ctx could be passed in when generating the execution plan: plan->Build(visitor.GetQuery(), visitor.CommandType(), ctx); and then passed in during optimization: pass_manager.ExecutePasses(ctx);

Contributor Author

> Do not build schemaInfo here; build it during optimization.

We have changed the SchemaInfo construction code so that it is now built during optimization. However, because ctx is needed to provide the Schema information, we changed the Build method of ExecutionPlan and added a cypher::RTContext parameter.

@@ -109,6 +117,10 @@ class ExecutionPlan {

const ResultInfo &GetResultInfo() const;

void SetSchemaInfo(lgraph::SchemaInfo *schema_info) { _schema_info = schema_info; }
Contributor

Optimizations should be as non-intrusive as possible: avoid changing existing code and add new code instead.

@@ -57,6 +57,8 @@ class Node {

const std::string &Label() const;

void SetLabel(std::string schema_label) { label_ = schema_label; }
Contributor

void SetLabel(const std::string& label)

Contributor Author

Got it, changed.

@@ -60,6 +60,8 @@ class Relationship {

const std::set<std::string> &Types() const;

void SetTypes(std::set<std::string> types) { types_ = types; }
Contributor

void SetTypes(const std::set<std::string>& types)

Contributor Author

Got it, changed.

Contributor

wangtao9 commented Jul 4, 2023

@spasserby, could you take a look and see whether you have any comments?


void Graph::PrintGraph() {
for (Node node : m_nodes) {
std::cout << "Node id:" << node.m_id << std::endl;
Collaborator

Do not use std::cout; use FMA_LOG() or FMA_DBG() instead:
FMA_DBG() << "Node id:" << node.m_id;

Contributor Author

> Do not use std::cout; use FMA_LOG() or FMA_DBG() instead: FMA_DBG() << "Node id:" << node.m_id;

Got it, changed.

@@ -1,4 +1,4 @@
/**
/**
Collaborator

Add unit tests: add all of the test cases to test_cypher.

Contributor Author

> Add unit tests: add all of the test cases to test_cypher.

The yago graph used by test_cypher has no edge_constraint, so we added a new yago graph with constraints and changed graph_factory.h, cypher_plan_validate.json, test_cypher_plan.cpp, and test_cypher.cpp.

@@ -0,0 +1,24 @@
#pragma once
Collaborator

Can src/cypher/graph/node.h be reused?

Contributor Author

> Can src/cypher/graph/node.h be reused?

This would probably require fairly large changes. Would it be acceptable to leave it unchanged for now?

@@ -31,6 +31,13 @@ class StateMachine;

namespace cypher {

// key is the vertex id in the pattern graph, value is the label
typedef std::map<NodeID, std::string> SchemaNodeMap;
Collaborator

SchemaNodeMap, SchemaRelpMap, and SchemaGraphMap can all be moved into schema_rewrite.h.

Contributor Author

> SchemaNodeMap, SchemaRelpMap, and SchemaGraphMap can all be moved into schema_rewrite.h.

Got it, changed.

#include "node.h"
#include <vector>
#include "parser/data_typedef.h"
namespace rewrite_cypher {
Collaborator

Could namespace rewrite_cypher be changed to cypher::rewrite?

Contributor Author

> Could namespace rewrite_cypher be changed to cypher::rewrite?

Got it, changed.


bool Gate() override { return true; }

int Execute(ExecutionPlan *plan) override {
Collaborator

Fetching the schema in the Execute stage has a transaction problem; I need to think further about how to fix this.

Collaborator

@seijiang @wangtao9
We need to guarantee that the schema does not change during the execution stage.

Considerations:

Schema consistency relies on std::make_unique<lgraph::AccessControlledDB>(ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_)).

For its whole lifetime, AccessControlledDB holds LightningGraph's HoldReadLock(meta_lock), while any schema modification takes HoldWriteLock(meta_lock). The schema is therefore safe for as long as the AccessControlledDB is alive.

Therefore:

Both the optimization Execute stage and the execution_plan Execute stage call OpenGraph.
The AccessControlledDB should be kept alive throughout rather than opened again.

Contributor Author

@seijiang seijiang Jul 19, 2023

> Both the optimization Execute stage and the execution_plan Execute stage call OpenGraph. The AccessControlledDB should be kept alive throughout rather than opened again.

  • 1. Do not reset AccessControlledDB in opt_rewrite_with_schema_inference.h.
  • 2. Change Check in runtime_context.h so that it no longer checks whether ac_db_ is non-null (we are not sure whether this will cause problems).
  • 3. In the execution_plan Execute stage, check whether ctx->ac_db_ is null: create it if it is null, and do not create it again otherwise.
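
Point 3 amounts to an open-once guard. The sketch below illustrates it with toy types only: ToyAccessControlledDB, ToyContext, and EnsureGraphOpen are hypothetical stand-ins, not TuGraph's RTContext or AccessControlledDB, and the open_count field exists only so the behaviour can be observed.

```cpp
#include <memory>
#include <string>

// Toy stand-ins; the real handle would hold the meta read lock for its lifetime.
struct ToyAccessControlledDB {
    std::string graph;
};

struct ToyContext {
    std::string graph_;
    std::unique_ptr<ToyAccessControlledDB> ac_db_;
    int open_count = 0;  // how many times the graph was opened
};

// Open the graph only if no handle exists yet, so a handle created during
// optimization survives into execution instead of being dropped and reopened.
void EnsureGraphOpen(ToyContext& ctx) {
    if (ctx.graph_.empty()) {
        ctx.ac_db_.reset();
        return;
    }
    if (!ctx.ac_db_) {
        ctx.ac_db_ = std::make_unique<ToyAccessControlledDB>(
            ToyAccessControlledDB{ctx.graph_});
        ++ctx.open_count;
    }
}
```

Calling EnsureGraphOpen twice on the same context opens the graph once and leaves the same handle in place, which is the property the review asks for.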

@@ -102,6 +102,8 @@ class AllNodeScan : public OpBase {

Node *GetNode() const { return node_; }

const SymbolTable *GetSymbolTable() { return sym_tab_; }
Contributor

Suggested change
const SymbolTable *GetSymbolTable() { return sym_tab_; }
const SymbolTable *SymTab() const { return sym_tab_; }

@@ -110,6 +110,8 @@ class AllNodeScanDynamic : public OpBase {

Node *GetNode() const { return node_; }

const SymbolTable *GetSymbolTable() const { return sym_tab_; }
Contributor

Suggested change
const SymbolTable *GetSymbolTable() const { return sym_tab_; }
const SymbolTable *SymTab() const { return sym_tab_; }

@@ -1323,8 +1324,10 @@ int ExecutionPlan::Execute(RTContext *ctx) {
if (ctx->graph_.empty()) {
ctx->ac_db_.reset(nullptr);
} else {
ctx->ac_db_ = std::make_unique<lgraph::AccessControlledDB>(
ctx->galaxy_->OpenGraph(ctx->user_, ctx->graph_));
if (!ctx->ac_db_) {
Contributor

Add a comment explaining under what circumstances, and where, ac_db_ has already been set.

msg = "Access controlled db not empty";
return false;
}
// if (ac_db_) {
Contributor

@wangtao9 wangtao9 Jul 25, 2023

Delete the dead code and add a note explaining why ac_db is not checked.

CreateCsvFiles(data);
}

static void WriteYagoFilesWithConstraints() {
Contributor

The constraints can be added directly in WriteYagoFiles.

Contributor Author

> The constraints can be added directly in WriteYagoFiles.

This is mainly because test_cypher.cpp contains many Cypher statements that do not follow the Schema constraints of the yago graph, for example "MATCH (a:Film),(b:City) CREATE (a)-[:BORN_IN]->(b)". To avoid breaking the existing tests, I had to add a new graph with constraints. (I originally wanted to add the edge constraints in test_cypher.cpp with Cypher statements, but because that could cause a large amount of data to be deleted, TuGraph forbids the operation.)

Contributor

Move static const std::map<std::string, std::string> yago_data outside WriteYagoFiles, then let yago_data_with_constraints = yago_data.

Finally, modify yago_data_with_constraints["yago.conf"].

Contributor Author

> Move static const std::map<std::string, std::string> yago_data outside WriteYagoFiles, then let yago_data_with_constraints = yago_data.
>
> Finally, modify yago_data_with_constraints["yago.conf"].

OK, modified.
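
The copy-then-override pattern suggested here can be sketched as follows. The map contents are placeholders rather than the real yago fixture, and MakeYagoDataWithConstraints is a hypothetical helper name.

```cpp
#include <map>
#include <string>

// Shared fixture data hoisted out of the writer function.
// The contents are placeholders, not the real yago files.
static const std::map<std::string, std::string> yago_data = {
    {"yago.conf", "{ \"schema\": [] }"},
    {"person.csv", "name,birthyear\n"},
};

// Copy the base fixture and replace only the config entry.
std::map<std::string, std::string> MakeYagoDataWithConstraints(
    const std::string& conf_with_constraints) {
    auto yago_data_with_constraints = yago_data;  // copies the const map
    // at() throws std::out_of_range if the key is missing, so a typo in the
    // key fails loudly instead of silently inserting a new entry.
    yago_data_with_constraints.at("yago.conf") = conf_with_constraints;
    return yago_data_with_constraints;
}
```

The copy leaves yago_data itself untouched, so the existing tests keep reading the unconstrained configuration while the new tests use the constrained one.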

@@ -362,4 +591,22 @@ Liam Neeson,Batman Begins,Henri Ducard
import_v3::Importer importer(config);
importer.DoImportOffline();
}

// add edge constraints for yago
static void create_yago_with_constraints(const std::string& dir = "./lgraph_db") {
Contributor

I suggest deleting this.

yago_conf.insert(n + original_strings[i].length(), constraints_strings[i]);
}
auto yago_data_with_constraints = yago_data;
yago_data_with_constraints.at("yago.conf") = yago_conf;
Contributor

Just assign the new value directly here:

yago_data_with_constraints.at("yago.conf") =
R"(
{
    "schema": [
        {
            "label" : "Person",
            "type" : "VERTEX",
            "primary" : "name",
            "properties" : [
                {"name" : "name", "type":"STRING"},
                {"name" : "birthyear", "type":"INT16", "optional":true}
            ]
        },
        {
            "label" : "City",
            "type" : "VERTEX",
            "primary" : "name",
            "properties" : [
                {"name": "name", "type":"STRING"}
            ]
        },
        {
            "label" : "Film",
            "primary": "title",
            "type" : "VERTEX",
            "properties" : [
                {"name": "title", "type":"STRING"}
            ]
        },
        {
            "label" : "HAS_CHILD", 
            "type" : "EDGE",
            "constraints": [["Person", "Person"]]
        },
        {
            "label" : "MARRIED",
            "type" : "EDGE",
            "constraints": [["Person", "Person"]]
        },
        {
            "label" : "BORN_IN", 
            "type" : "EDGE",
            "properties" : [
                {"name" : "weight", "type":"FLOAT", "optional":true}
            ],
            "constraints": [["Person", "City"]]
        },
        {
            "label" : "DIRECTED",
            "type" : "EDGE",
            "constraints": [["Person", "Film"]]
        },
        {
            "label" : "WROTE_MUSIC_FOR",
            "type" : "EDGE",
            "constraints": [["Person", "Film"]]
        },
        {
            "label" : "ACTED_IN",
            "type" : "EDGE",
            "properties" : [
                {"name" : "charactername", "type":"STRING"}
            ],
            "constraints": [["Person", "Film"]]
        }
    ],
    "files" : [
        {
            "path" : "person.csv",
            "format" : "CSV",
            "label" : "Person",
            "columns" : ["name","birthyear"]
        },
        {
            "path" : "city.csv",
            "format" : "CSV",
            "label" : "City",
            "columns" : ["name"]
        },
        {
            "path" : "film.csv",
            "format" : "CSV",
            "label" : "Film",
            "columns" : ["title"]
        },
        {
            "path" : "has_child.csv",
            "format" : "CSV",
            "label" : "HAS_CHILD",
            "SRC_ID" : "Person",
            "DST_ID" : "Person",
            "columns" : ["SRC_ID","DST_ID"]
        },
        {
            "path" : "married.csv",
            "format" : "CSV",
            "label" : "MARRIED",
            "SRC_ID" : "Person",
            "DST_ID" : "Person",
            "columns" : ["SRC_ID","DST_ID"]
        },
        {
            "path" : "born_in.csv",
            "format" : "CSV",
            "label" : "BORN_IN",
            "SRC_ID" : "Person",
            "DST_ID" : "City",
            "columns" : ["SRC_ID","DST_ID","weight"]
        },
        {
            "path" : "directed.csv",
            "format" : "CSV",
            "label" : "DIRECTED",
            "SRC_ID" : "Person",
            "DST_ID" : "Film",
            "columns" : ["SRC_ID","DST_ID"]
        },
        {
            "path" : "wrote.csv",
            "format" : "CSV",
            "label" : "WROTE_MUSIC_FOR",
            "SRC_ID" : "Person",
            "DST_ID" : "Film",
            "columns" : ["SRC_ID","DST_ID"]
        },
        {
            "path" : "acted_in.csv",
            "format" : "CSV",
            "label" : "ACTED_IN",
            "SRC_ID" : "Person",
            "DST_ID" : "Film",
            "columns" : ["SRC_ID","DST_ID","charactername"]
        }
    ]
}
)";

Contributor Author

> Just assign the new value directly here.

I originally intended to assign the new value directly, but I thought it would be too long, and any change to yago_data would then have to be made in both places, so I rewrote it. I will change it back.

Contributor Author

> Just assign the new value directly here.

Modified.

@wangtao9
Contributor

@spasserby, do you have any more comments?

@wangtao9 wangtao9 merged commit 4c415cf into TuGraph-family:master Jul 27, 2023
2 checks passed
@wangtao9
Contributor

Thank you for your contribution! @seijiang @spasserby
