[Feature][RFC] Heterogeneous graph interface #553

Merged
merged 16 commits into from May 26, 2019

3 participants
@BarclayII
Collaborator

commented May 23, 2019

Description

This PR proposes the Python interface for heterogeneous graphs ("heterograph" for short, although the word heterograph itself means something unrelated).

  • A heterograph in DGL additionally requires that all edges of the same type have the same source node type and the same destination node type.
  • Users can only assign features to one node/edge type at a time.
  • Users can induce subgraphs from a heterograph by node/edge types.
  • Different node/edge types may have different node/edge feature schemes.
  • Message functions and edge apply functions can be dictionaries mapping edge types to edge UDFs.
  • Reduce functions can be dictionaries mapping edge types to node UDFs.
  • Node apply functions can be dictionaries mapping node types to node UDFs.
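
To make the per-type UDF dictionaries concrete, here is a minimal pure-Python sketch of one round of message passing in which the message and reduce functions are looked up by edge type. It does not use DGL; `relations`, `node_feat`, and `update_all_toy` are hypothetical names invented for this illustration, and the features are plain scalars.

```python
from typing import Callable, Dict, List, Tuple

# Toy heterograph (hypothetical layout, not DGL's storage):
# edge type -> (source node type, destination node type, list of (src, dst) IDs).
# All edges of one type share the same endpoint types, as the proposal requires.
relations: Dict[str, Tuple[str, str, List[Tuple[int, int]]]] = {
    "follows": ("user", "user", [(0, 1), (1, 2)]),
    "plays": ("user", "game", [(0, 0), (2, 0)]),
}

# One scalar feature per node, stored separately for each node type.
node_feat: Dict[str, List[float]] = {"user": [1.0, 2.0, 3.0], "game": [10.0]}


def update_all_toy(
    message_funcs: Dict[str, Callable[[float], float]],
    reduce_funcs: Dict[str, Callable[[List[float]], float]],
) -> Dict[str, Dict[int, float]]:
    """Run one message-passing round, dispatching UDFs by edge type."""
    result: Dict[str, Dict[int, float]] = {}
    for etype, (srctype, _dsttype, edges) in relations.items():
        mailbox: Dict[int, List[float]] = {}
        for u, v in edges:
            # The message UDF for this edge type sees the source node feature.
            msg = message_funcs[etype](node_feat[srctype][u])
            mailbox.setdefault(v, []).append(msg)
        # The reduce UDF for this edge type aggregates each destination's mailbox.
        result[etype] = {v: reduce_funcs[etype](msgs) for v, msgs in mailbox.items()}
    return result


result = update_all_toy(
    message_funcs={"follows": lambda h: h, "plays": lambda h: 2 * h},
    reduce_funcs={"follows": sum, "plays": max},
)
print(result)
```

Keying the dictionaries by edge type lets each relation use its own feature scheme, which matches the bullet about different types having different schemes.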

Please feel free to add other features you wish to have or make comments.

I accidentally merged from master and messed up the diff, so I'm opening a new PR (and have closed the old one).

@BarclayII BarclayII requested review from jermainewang and zheng-da May 23, 2019

queries the subgraph structure (e.g. calling ``in_edges``, but not
``update_all``).
"""
pass

@zheng-da

zheng-da May 23, 2019

Collaborator

what are the methods in this class?

@BarclayII

BarclayII May 24, 2019

Author Collaborator

The methods in DGLBaseGraph would materialize the subgraph since they explicitly ask for the structure of the subgraph.
For message passing APIs we clearly don't have to materialize it.

@zheng-da

zheng-da May 25, 2019

Collaborator

If the class inherits from DGLBaseHeteroGraph, it won't have the computation APIs. Should it inherit from DGLHeteroGraph instead?

@zheng-da

zheng-da May 25, 2019

Collaborator

I don't think we should materialize the subgraph for query APIs. Take g['user'].number_of_nodes() as an example: if we had to materialize the subgraph before getting the number of nodes, it would be too expensive.
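
The point about cheap queries can be sketched with a lazy view object: indexing the parent graph returns a lightweight wrapper, and `number_of_nodes()` is answered from per-type counts the parent already stores. This is only an illustration under assumed names (`ToyHeteroGraph`, `ToyNodeTypeView`, `materialize`), not the implementation in this PR.

```python
class ToyHeteroGraph:
    """Hypothetical parent graph that stores per-type node counts and edges."""

    def __init__(self, num_nodes_by_type, edges_by_type):
        self._num_nodes = dict(num_nodes_by_type)
        # (srctype, dsttype, etype) -> list of (src_id, dst_id)
        self._edges = dict(edges_by_type)

    def __getitem__(self, ntype):
        # Creating a view copies nothing; it only remembers the node type.
        return ToyNodeTypeView(self, ntype)


class ToyNodeTypeView:
    """Lazy view over the nodes of one type (assumed design, for illustration)."""

    def __init__(self, parent, ntype):
        self._parent = parent
        self._ntype = ntype
        self.materialized = False

    def number_of_nodes(self):
        # Answered from the parent's bookkeeping, without building a subgraph.
        return self._parent._num_nodes[self._ntype]

    def materialize(self):
        # Only an explicit request copies the edges among nodes of this type.
        self.materialized = True
        return {
            key: list(edges)
            for key, edges in self._parent._edges.items()
            if key[0] == self._ntype and key[1] == self._ntype
        }


g = ToyHeteroGraph(
    {"user": 3, "game": 1},
    {("user", "user", "follows"): [(0, 1)], ("user", "game", "plays"): [(0, 0)]},
)
view = g["user"]
print(view.number_of_nodes(), view.materialized)
```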

python/dgl/heterograph.py (resolved)
python/dgl/heterograph.py (resolved, outdated)
python/dgl/heterograph.py (resolved, outdated)
@zheng-da

Collaborator

commented May 24, 2019

BTW, the interface doesn't include the methods in DGLBaseGraph.

@zheng-da

Collaborator

commented May 24, 2019

I think the class also needs the following functions.

def register_message_func(self, func):
def register_reduce_func(self, func):
def register_apply_node_func(self, func):
def register_apply_edge_func(self, func):
"""
pass

def predecessors(self, v):

@zheng-da

zheng-da May 25, 2019

Collaborator

This method needs a node type; otherwise, it doesn't work with a bipartite graph.

@BarclayII

BarclayII May 25, 2019

Author Collaborator

If we require the subgraph to have only one edge type then it does. v would always be of the node type of destination.

@zheng-da

zheng-da May 25, 2019

Collaborator

So you assume that all edges in a bipartite graph go from one type of node to another type? Is that always true?

@BarclayII

BarclayII May 25, 2019

Author Collaborator

We assume that all edges belonging to the same edge type always have the same source node type and destination node type (which is the assumption of the metagraph)
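
A small sketch may help show why `predecessors(v)` needs no node-type argument once a subgraph is restricted to a single edge type: the edge type fixes the source and destination node types, so `v` is always read as a destination-type ID. `ToyRelation` below is a hypothetical stand-in, not a class from this PR.

```python
class ToyRelation:
    """One edge type whose edges all go from `srctype` to `dsttype`."""

    def __init__(self, srctype, dsttype, edges):
        self.srctype = srctype
        self.dsttype = dsttype
        self.edges = list(edges)  # (src_id, dst_id); IDs are local to each node type

    def predecessors(self, v):
        # `v` is an ID of a `dsttype` node; the result holds `srctype` IDs.
        return sorted({u for u, d in self.edges if d == v})

    def successors(self, u):
        # `u` is an ID of a `srctype` node; the result holds `dsttype` IDs.
        return sorted({d for s, d in self.edges if s == u})


# Bipartite relation user -> game: user 0 and user 2 play game 0, user 1 plays game 1.
plays = ToyRelation("user", "game", [(0, 0), (2, 0), (1, 1)])
print(plays.predecessors(0), plays.successors(1))
```

Note that ID 0 means "game 0" in `predecessors` and "user 0" in `successors`, which is exactly the ambiguity the reviewers discuss.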

"""
pass

def in_edges(self, v, form='uv'):

@zheng-da

zheng-da May 25, 2019

Collaborator

this also needs node type.

@BarclayII

BarclayII May 25, 2019

Author Collaborator

Ditto.

"""
pass

def out_edges(self, v, form='uv'):

@zheng-da

zheng-da May 25, 2019

Collaborator

the same here.

@BarclayII

BarclayII May 25, 2019

Author Collaborator

Ditto but of source type.

"""
pass

def in_degree(self, v):

@zheng-da

zheng-da May 25, 2019

Collaborator

This also requires a node type.

@zheng-da

zheng-da May 25, 2019

Collaborator

How do we get the degree in the original heterogeneous graph?

@BarclayII

BarclayII May 25, 2019

Author Collaborator

> how to get degree in the original heterogeneous graph.

If you want to you can loop over the in-edges of the metagraph, get all the edge types connecting to the node type you're considering, and call this method for each edge type.

And I don't know if the degree of all types is meaningful enough; usually different entities like "User" and "Developer" should not be counted together anyway.
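
The loop described above can be sketched in plain Python: walk the metagraph edges, keep those whose destination node type matches, and add up the per-edge-type in-degrees. The data layout and the helper names (`in_degree`, `total_in_degree`) are assumptions made for this example only.

```python
def in_degree(edges, v):
    """In-degree of destination node `v` within one edge type."""
    return sum(1 for _, dst in edges if dst == v)


def total_in_degree(relations, ntype, v):
    """Sum in-degrees of node `v` of type `ntype` over all incoming edge types."""
    total = 0
    for (srctype, dsttype, etype), edges in relations.items():
        if dsttype == ntype:  # an in-edge of `ntype` in the metagraph
            total += in_degree(edges, v)
    return total


# Hypothetical toy data: (srctype, dsttype, etype) -> list of (src_id, dst_id).
relations = {
    ("user", "game", "plays"): [(0, 0), (2, 0)],
    ("developer", "game", "develops"): [(0, 0), (1, 1)],
    ("user", "user", "follows"): [(0, 1)],
}
print(total_in_degree(relations, "game", 0))
```

The sum mixes "user" and "developer" predecessors, which illustrates the concern that an untyped degree may not be meaningful.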

@zheng-da

zheng-da May 25, 2019

Collaborator

I don't know what people want to do. Since this is a query API, I feel it should be reasonably flexible. I guess your solution will be something like this:

in_deg = zeros(...)
for ntype1, ntype2, etype in metagraph:
    in_deg = in_deg + g[ntype1, ntype2, etype].in_degree(v)

something like this. It might be a little slow. Maybe it's fine.

For a bipartite graph, what are v? Do they have to be left nodes? right nodes? It has to be one type of nodes, right? It's a little confusing which side it should be.

@BarclayII

BarclayII May 25, 2019

Author Collaborator

> I don't know what people want to do. Since this is query API, I feel it should be reasonably flexible. I guess your solution will be something like this:
>
> in_deg = zeros(...)
> for ntype1, ntype2, etype in metagraph:
>     in_deg = in_deg + g[ntype1, ntype2, etype].in_degree(v)
>
> something like this. It might be a little slow. Maybe it's fine.

Let's keep the interface simple first, as the current query interface is consistent with DGLBaseGraph. I doubt if people would only care about the total number of predecessors without caring about the types. If users do want more flexible interfaces we can later add new ones such as typed_successors.

> For a bipartite graph, what are v? Do they have to be left nodes? right nodes? It has to be one type of nodes, right? It's pretty confusing which side it should be.

They will always be "destination" nodes, since the bipartite graph would be directed. An undirected bipartite graph would be "directionized", and the result would have one relationship (edge type) pointing left to right and another pointing right to left. So one can still do

g['left', 'right', 'ltor'].in_degrees(...)  # in degrees of right nodes
g['right', 'left', 'rtol'].in_degrees(...)  # in degrees of left nodes
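
As a rough illustration of the "directionized" bipartite graph, the sketch below turns undirected left/right pairs into two edge types and computes in-degrees on each. The function names and the list-based representation are hypothetical; only the `('left', 'right', 'ltor')` and `('right', 'left', 'rtol')` keys come from the example above.

```python
def directionize(undirected_pairs):
    """Split undirected (left_id, right_id) pairs into two directed edge types."""
    ltor = [(left, right) for left, right in undirected_pairs]
    rtol = [(right, left) for left, right in undirected_pairs]
    return {("left", "right", "ltor"): ltor, ("right", "left", "rtol"): rtol}


def in_degrees(edges, num_dst_nodes):
    """In-degree of every destination node of a single edge type."""
    degrees = [0] * num_dst_nodes
    for _, dst in edges:
        degrees[dst] += 1
    return degrees


# Two left nodes and two right nodes; left 0 touches both right nodes.
g = directionize([(0, 0), (0, 1), (1, 1)])
right_in = in_degrees(g["left", "right", "ltor"], num_dst_nodes=2)  # right nodes
left_in = in_degrees(g["right", "left", "rtol"], num_dst_nodes=2)   # left nodes
print(right_in, left_in)
```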

@zheng-da

zheng-da May 25, 2019

Collaborator

I see. This feels a little counter-intuitive to me, but it works. We can ask users for feedback later.

BarclayII added some commits May 25, 2019

self,
metagraph,
number_of_nodes_by_type,
edge_connections_by_type):

@jermainewang

jermainewang May 25, 2019

Member

How do we support converting from other formats such as networkx and scipy?

@BarclayII

BarclayII May 25, 2019

Author Collaborator

Added from_networkx. I'm not sure how to handle scipy matrices since it's likely that we'll specify a list of scipy matrices for each edge type (which is handled by __init__)
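
One possible shape for such a conversion is sketched below. It does not show the `from_networkx` added in this PR; it only illustrates grouping typed nodes and edges into per-type counts and per-edge-type connections, similar in spirit to the `number_of_nodes_by_type` and `edge_connections_by_type` constructor arguments. The inputs are plain lists standing in for what networkx's `G.nodes(data=True)` and `G.edges(data=True)` yield, and the `"type"` attribute name is an assumption.

```python
def split_by_type(nodes, edges):
    """Group typed nodes/edges into per-type counts and per-edge-type connections.

    nodes: iterable of (node, {"type": ntype})
    edges: iterable of (u, v, {"type": etype})
    """
    local_id = {}   # node -> ID within its own node type
    ntype_of = {}   # node -> node type
    counts = {}     # node type -> number of nodes
    for node, attrs in nodes:
        ntype = attrs["type"]
        local_id[node] = counts.get(ntype, 0)
        counts[ntype] = local_id[node] + 1
        ntype_of[node] = ntype

    connections = {}  # (srctype, dsttype, etype) -> list of (src_id, dst_id)
    for u, v, attrs in edges:
        key = (ntype_of[u], ntype_of[v], attrs["type"])
        connections.setdefault(key, []).append((local_id[u], local_id[v]))
    return counts, connections


nodes = [
    ("alice", {"type": "user"}),
    ("bob", {"type": "user"}),
    ("chess", {"type": "game"}),
]
edges = [
    ("alice", "bob", {"type": "follows"}),
    ("alice", "chess", {"type": "plays"}),
    ("bob", "chess", {"type": "plays"}),
]
counts, connections = split_by_type(nodes, edges)
print(counts, connections)
```

Keying connections by the full (source type, destination type, edge type) triple keeps each group consistent with the metagraph assumption discussed earlier.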

BarclayII added some commits May 25, 2019

@zheng-da zheng-da merged commit 538127a into dmlc:heterograph May 26, 2019

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good

BarclayII added a commit that referenced this pull request Jun 17, 2019

[API] Heterograph (#657)
* [Feature][RFC] Heterogeneous graph interface (#553)

* heterogeneous graph interface

* lint

* disable lints

* disable lint checks

* change node_types to dict

* update

* update

* heterograph view

* message passing with types

* clarifications

* graph queries

* clarifications

* moving add_XXX to Base

* from_networkx

* register functions

* Update heterograph.py

* lint.