Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GIE/IR] Introduce a new strategy for pattern matching #2159

Merged
merged 30 commits into from Nov 18, 2022

Conversation

BingqingLyu
Copy link
Collaborator

What do these changes do?

As titled.

Related issue number

Fixes #2033

@BingqingLyu BingqingLyu marked this pull request as draft October 25, 2022 05:53
@codecov-commenter
Copy link

codecov-commenter commented Oct 25, 2022

Codecov Report

Merging #2159 (43f3425) into main (92f7da3) will decrease coverage by 31.75%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2159       +/-   ##
===========================================
- Coverage   71.95%   40.19%   -31.76%     
===========================================
  Files          89       87        -2     
  Lines        9884     9847       -37     
===========================================
- Hits         7112     3958     -3154     
- Misses       2772     5889     +3117     
Impacted Files Coverage Δ
...ython/graphscope/tests/unittest/test_cython_ast.py 0.00% <0.00%> (-100.00%) ⬇️
...ython/graphscope/tests/unittest/test_serailaize.py 0.00% <0.00%> (-100.00%) ⬇️
python/graphscope/analytical/udf/patch.py 3.47% <0.00%> (-96.53%) ⬇️
python/graphscope/tests/unittest/test_lazy.py 0.00% <0.00%> (-96.19%) ⬇️
python/graphscope/tests/unittest/test_app.py 0.00% <0.00%> (-96.11%) ⬇️
...thon/graphscope/tests/unittest/test_scalability.py 0.00% <0.00%> (-96.00%) ⬇️
...hon/graphscope/tests/unittest/test_create_graph.py 0.00% <0.00%> (-92.73%) ⬇️
python/graphscope/tests/unittest/test_graph.py 0.00% <0.00%> (-85.47%) ⬇️
python/graphscope/tests/unittest/test_context.py 0.00% <0.00%> (-81.04%) ⬇️
python/graphscope/analytical/udf/wrapper.py 27.27% <0.00%> (-72.73%) ⬇️
... and 38 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92f7da3...43f3425. Read the comment docs.


/// Given a DefiniteExtendEdge, we can uniquely locate an edge with dir in the pattern
#[derive(Debug, Clone)]
pub struct DefiniteExtendEdge {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what DefiniteExtendEdge is? A comment may help. In addition, ExactExtendEdge could be a better name.


/// Given a DefiniteExtendStep, we can uniquely find which part of the pattern to extend
#[derive(Debug, Clone)]
pub struct DefiniteExtendStep {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the comments of DefiniteExtendEdge

pub type DynIter<'a, T> = Box<dyn Iterator<Item = T> + 'a>;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
pub enum PatternDirection {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like PatternEdgeDirection. How about Both direction?

}
}

pub mod canonical_label;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move mod declaration to the top, after use.

}

#[derive(Debug, Clone)]
pub enum EdgeData {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not EdgeData. PbEdgeOrPath?

match self.cmp_vertices(pattern, current_v_id, vertex_group[j])? {
Ordering::Greater => vertex_group_tmp_vec[i] += 1,
Ordering::Less => vertex_group_tmp_vec[j] += 1,
Ordering::Equal => (),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering::Equal => /* add comment for why this needs no processing */

updated_vertex_groups
.entry((v_label, v_group))
.and_modify(|vertex_group| vertex_group.push(current_v_id))
.or_insert(vec![current_v_id]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or_insert_with(Vec::new)
.push(current_v_vid);

.get(&current_v_id)
.unwrap_or(&None)
.is_some();
if !is_vertex_visited {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if veretx_rank_map.containsKey(&current_v_id) { ... }
do not need is_vertex_visited above

// Since vertices in the same pattern will never be given the same rank, two adjacencies cannot be equal.
let adj1_v_rank = self.get_vertex_rank(adj1_v_id);
let adj2_v_rank = self.get_vertex_rank(adj2_v_id);
if adj1_v_rank.is_none() && adj2_v_rank.is_none() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if adj1_v_rank.is_none() && adj2_v_rank.is_none() {
            Ordering::Equal
        } else if adj1_v_rank.is_none() {
            Ordering::Less
        } else if adj2_v_rank.is_none() {
             Ordering::Greater
        } else {
            adj1_v_rank.unwrap().cmp(&adj2_v_rank.unwrap())
        }

// Since vertices in the same pattern will never be given the same rank, two adjacencies cannot be equal.
let adj1_v_rank = vertex_rank_map.get(&adj1_v_id);
let adj2_v_rank = vertex_rank_map.get(&adj2_v_id);
if adj1_v_rank.is_none() && adj2_v_rank.is_none() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above codes

…table(label) value, and end with `Project` op with user-given alias
… only when any system-given alias exists.
… simplicity, instead, maintain max_tag_id
@BingqingLyu BingqingLyu force-pushed the ir_catalog_naive_path_expand branch 2 times, most recently from f535fda to 446bd46 Compare November 2, 2022 03:04
@BingqingLyu BingqingLyu marked this pull request as ready for review November 2, 2022 09:11
@@ -75,7 +74,7 @@ pub struct LogicalPlan {
pub(crate) nodes: VecMap<NodeType>,
/// To record the nodes' maximum id in the logical plan. Note that the nodes
/// **include the removed ones**
pub(crate) max_node_id: NodeId,
pub max_node_id: NodeId,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果想要在crate外面可见,那就用一个函数而非直接改这个可见范围

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if !source.is_queryable() {
if let Ok(store_meta) = STORE_META.read() {
if let Some(pattern_meta) = store_meta.pattern_meta.as_ref() {
if let Ok(extend_strategy) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里两种不同的strategy的应用应该更清晰一点,类似于
if 满足xx条件:
Apply ExtendStrategy
else:
Apply Naive Strategy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -37,6 +38,16 @@ lazy_static! {
pub static ref STORE_META: RwLock<StoreMeta> = RwLock::new(StoreMeta::default());
}

pub fn set_meta_from_json<R: io::Read>(read: R) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个用set schema的方式设置pattern meta,是不是可以仔细考虑一下。set schema一般只会设置一次,而set_pattern_meta,每个user frontend都可能会去设置。是不是现在的方式不太合理?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为目前schema中信息够了,暂时先把pattern_meta去掉了

/// Key: vertex label name, Value: vertex labal id
///
/// Usage: get a vertex label id from its label name
vertex_label_map: BiBTreeMap<String, PatternLabelId>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些在schema里面不是有吗?为啥需要在PatternMeta里存一遍?

Copy link
Collaborator

@longbinlai longbinlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setting of PatternMeta requires extra consideration.

/// 3. vertex adjacent to more path_expand should be extended later;
/// 4. vertex with larger degree will be extended later
fn compare_vertices(v1_id: PatternId, v2_id: PatternId, trace_pattern: &Pattern) -> Ordering {
let v1_weight = estimate_vertex_weight(v1_id, trace_pattern);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些estimate_xx_weight() 的函数可不可以给个抽象,然后现在这个estimation的方法只是一个naive实现?

@@ -42,3 +43,12 @@ pub(crate) fn query_params(
extra: HashMap::new(),
}
}

pub trait PatternOrderTrait<D> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do not just implement std::cmp::Ord?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个其实是给了一个sort_by()里的udf,因为可能是输入pattern_id来做order,而并不是直接compare pattern本身

}

pub trait PatternWeightTrait<W: PartialOrd> {
fn estimate_vertex_weight(&self, vid: PatternId) -> W;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_vertex_weight() and get_adjacencies_weight() are better names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@longbinlai longbinlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay for this PR. We may consider abstract the Entry as a trait and clarify the layer between a logical and a physical plan.

@BingqingLyu BingqingLyu merged commit aed1f13 into alibaba:main Nov 18, 2022
@BingqingLyu BingqingLyu deleted the ir_catalog_naive_path_expand branch November 29, 2022 07:18
@BingqingLyu BingqingLyu restored the ir_catalog_naive_path_expand branch November 29, 2022 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GIE/IR] Introduce a new strategy for pattern matching
3 participants