Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GAE Docs] Add intro to Pregel and PIE model #2469

Merged
merged 9 commits into from
Mar 1, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
72 changes: 70 additions & 2 deletions docs/analytical_engine/programming_model_pie.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,74 @@
# Programming Model: PIE

- Workflow, PEval and IncEval: intro
- Under the hook
Although the [vertex-centric programming model](https://graphscope.io/docs/latest/analytical_engine/vertex_centric_models.html) can express various graph analytics algorithms, existing sequential (single-machine) graph algorithms have to be modified to comply with the “think like a vertex” principle, making parallel graph computation a privilege for experienced users only. In addition, the performance of graph algorithms with vertex-centric model is sub-optimal in many cases: each vertex only has information about its 1-hop neighbors, and thus information is propagated through the graph slowly, one hop at a time. As a result, it may take many computation iterations to propagate a piece of information from a source to a destination.

## What is the PIE Model?

To address the abovementioned problems, we proposed a new programming model PIE (PEval-IncEval-Assemble) in a [SIGMOD paper](https://dl.acm.org/doi/10.1145/3035918.3035942) published in 2017. Different from the vertex-centric, the PIE model can automatically parallelize existing sequential graph algorithms, with only some small changes. This makes parallel graph computations accessible to users who know conventional graph algorithms covered in college textbooks, and there is no need to recast existing graph algorithms into a new model.

Specifically, in the PIE model, users only need to provide three functions,

- (1) *PEval*, a sequential (single-machine) function for given a query, computes the answer on a local partition;
- (2) *IncEval*, a sequential incremental function, computes changes to the old output by treating incoming messages as updates; and
- (3) *Assemble*, which collects partial answers, and combines them into a complete answer.

:::{figure-md}

<img src="../images/pie.png"
alt="The PIE model"
width="80%">

The PIE model.
:::

## Workflow of PIE

The PIE model works on a graph *G* and each worker maintains a partition of *G*. Given a query, each worker first executes *PEval* against its local partition, to compute partial answers in parallel. Then each worker may exchange partial results with other workers via synchronous message passing. Upon receiving messages, each worker incrementally computes *IncEval*. The incremental step iterates until no further messages can be generated. At this point, *Assemble* pulls partial answers and assembles the final result. In this way, the PIE model parallelizes existing sequential graph algorithms, without revising their logic and workflow.

In this model, users do not need to know the details of the distributed setting while processing big graphs in a cluster, and the PIE model auto-parallelizes the graph analytics tasks across
a cluster of workers, based on a fixpoint computation. Under a monotonic condition, it guarantees to
converge with correct answers as long as the three sequential algorithms provided are correct.

The following pseudo-code shows how SSSP is expressed in the PIE model, where the Dijkstra’s algorithm is directly used for the computation of parallel SSSP.

```python
def dijkstra(g, vals, updates):
heap = VertexHeap()
for i in updates:
vals[i] = updates[i]
heap.push(i, vals[i])

updates.clear()

while not heap.empty():
u = heap.top().vid
distu = heap.top().val
heap.pop()
for e in g.get_outgoing_edges(u):
v = e.get_neighbor()
distv = distu + e.data()
if vals[v] > distv:
vals[v] = distv
if g.is_inner_vertex(v):
heap.push(v, distv)
updates[v] = distv
return updates

def PEval(source, g, vals, updates):
for v in vertices:
updates[v] = MAX_INT
updates[source] = 0
dijkstra(g, vals, updates)

def IncEval(source, g, vals, updates):
dijkstra(g, vals, updates)
```






TODO(wanglei): Under the hook
- MessageBuffer,
- ParallelExecutor,
69 changes: 60 additions & 9 deletions docs/analytical_engine/vertex_centric_models.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,71 @@
# Vertex Centric Models
# Vertex-Centric Model

biref intro to vertex centric models, history;
In a single machine environment, developers can easily implement graph analytics algorithms as they have a global view of the graph and can freely iterate through all vertices and edges. When the size of graph data grows beyond the memory capacity of a single machine, graph data must be partitioned to distributed memory, leading to the indivisibility of the whole graph structure.

## GAS Model
To allow developers to succinctly express graph analytics algorithms under such environment, the *vertex-centric programming model* have been developed, with the philosophy of "think like a vertex". Specifically, a graph analytics algorithm iteratively executes a user-defined program over vertices of a graph. The user-defined vertex program
typically takes data from other vertices as input, and the
resultant output of a vertex is sent to other vertices. Vertex programs are
executed iteratively for a certain number of rounds, or until a convergence condition is
satisfied. As opposed to *global* perspective of the graph, vertex-centric models
employ a local, vertex-oriented perspective.

intro and figure
The philosophy of vertex-centric model encourages many programming models, including the [Pregel model](https://research.google/pubs/pub37252/) proposed by Google and the [GAS model](https://www.usenix.org/conference/osdi12/technical-sessions/presentation/gonzalez). These programming models
have widely applied in various graph processing systems, such as Giraph, GraphX and PowerGraph.
doudoubobo marked this conversation as resolved.
Show resolved Hide resolved

## Pregel Model

intro and figure
Pregel was first introduced in a [SIGMOD paper](https://dl.acm.org/doi/10.1145/1807167.1807184) published by Google in 2010. A graph analytics algorithm with the Pregel model consists of a sequence of iterations(called *supersteps*).

> “During a superstep the framework invokes a user-defined function for each vertex, conceptually in parallel. The function specifies behavior at a single vertex *V* and a single superstep *S*. It can read messages sent to *V* in superstep *S − 1*, send messages to other vertices that will be received at superstep *S + 1*, and modify the state of *V* and its outgoing edges. Messages are typically sent along outgoing edges, but a message may be sent to any vertex whose identifier is known.”

:::{figure-md}

<img src="../images/pregel.png"
alt="The Pregel model"
width="50%">

The Pregel model.
:::

The iterations terminate until no messages are sent from any vertex, indicating a halt.

The vertex function can be invoked at each vertex in parallel, since individual vertices communicate via message-passing.

With the Pregel model, the vertex program of single source shortest paths (SSSP) is expressed as follows.

```c++
void Compute(MessageIterator* msgs) {
int mindist = IsSource(vertex_id()) ? 0 : INF;
for (; !msgs->Done(); msgs->Next())
mindist = min(mindist, msgs->Value());
if (mindist < GetValue()) {
*MutableValue() = mindist;
OutEdgeIterator iter = GetOutEdgeIterator();
for (; !iter.Done(); iter.Next())
SendMessageTo(iter.Target(), mindist + iter.GetValue());
}
VoteToHalt();
}
```

## GAS Model

However, the performance of Pregel drops dramatically when facing natural graphs which follow a power-law distribution. To solve this problem, [PowerGraph](https://www.usenix.org/conference/osdi12/technical-sessions/presentation/gonzalez) proposed the GAS (Gather-Apply-Scatter) programming model for the vertex-cut graph partitioning strategy. The *Gather* function runs locally on each partition and then one accumulator is sent from each mirror to the master. The master runs the *Apply* function and then sends the updated vertex data to all mirrors. Finally, the *Scatter* phase is run in parallel on mirrors to update the data on adjacent edges.

:::{figure-md}

<img src="../images/gas.png"
alt="The GAS model"
width="50%">

The GAS model.
:::

## Simulation of Pregel Model in Analytical Engine

As we proved in the paper, the analytical engine is able to simulate the vertex centric models. We have implemented the support for Pregel model in the analytical engine, and you can use the Pregel API to write your own algorithms. In addition, if you already have your graph applications implemented in Giraph or GraphX, you can run them on GraphScope directly. Better still, the analytical engine can achieve better performance than Giraph and GraphX.
As we proved in this [paper](https://dl.acm.org/doi/pdf/10.1145/3035918.3035942), the analytical engine of GraphScope is able to simulate the vertex centric models. We have implemented the support for Pregel model in the analytical engine, and you can use the Pregel APIs to write your own algorithms. In addition, if you already have your graph applications implemented in Giraph or GraphX, you can run them on GraphScope directly. Better still, the analytical engine can achieve better performance than Giraph and GraphX.

Please refer to related tutorials.
- t1
- t2
- t3

- [Tutorial: Run Giraph Applications on GraphScope](https://graphscope.io/docs/latest/analytical_engine/tutorial_run_giraph_apps.html)
- [Tutorial: Run GraphX Applications on GraphScope](https://graphscope.io/docs/latest/analytical_engine/tutorial_run_graphx_apps.html)
Binary file added docs/images/gas.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/pregel.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.