References: https://graphframes.github.io/graphframes/docs/_site/index.html

GraphFrames is a Spark package for representing and querying graphs. It plays the same role for DataFrames that GraphX plays for RDDs: a GraphFrame is built on Spark DataFrames rather than on RDDs.


# Install

1. Download the graphframes package from https://spark-packages.org/package/graphframes/graphframes

2. Copy the jar into the `jars` subfolder of your Spark directory.

3. Go to the jar directory and run the following:

```
pyspark --packages graphframes:graphframes:0.8.0-spark3.0-s_2.12 --jars graphframes-0.8.0-spark3.0-s_2.12.jar
```

4. The startup output shows the location of the Ivy Default Cache. After exiting Spark, copy all the jar files from the Ivy Default Cache directory into your Spark `jars` directory.

5. Run the command from step 3 again. GraphFrame can now be used in the session.

# GraphFrame

```
GraphFrame(v, e)
```

* v is a DataFrame holding vertex information. It must contain a column named "id" that stores unique vertex IDs.
* e is a DataFrame holding edge information. It must contain two columns "src" and "dst" storing source vertex IDs and destination vertex IDs of edges, respectively.

```python
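from graphframes import GraphFrame

# `spark` is the SparkSession that the pyspark shell creates on startup.
v = spark.createDataFrame([(1,"A"), (2,"B"), (3, "C")], ["id", "name"])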
e = spark.createDataFrame([(1,2,3.2,"love"), (2,1,1.5,"hate"), (2,3,0.8,"follow")], ["src", "dst", "amount", "action"])

v.show()
+---+----+
| id|name|
+---+----+
|  1|   A|
|  2|   B|
|  3|   C|
+---+----+

e.show()
+---+---+------+------+
|src|dst|amount|action|
+---+---+------+------+
|  1|  2|   3.2|  love|
|  2|  1|   1.5|  hate|
|  2|  3|   0.8|follow|
+---+---+------+------+

g = GraphFrame(v, e)
```
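
The column requirements for `v` and `e` can be checked before building the graph. The following is a minimal sketch in plain Python; `missing_graphframe_columns` is a hypothetical helper, not part of GraphFrames, and it only inspects lists of column names such as those returned by `DataFrame.columns`.

```python
def missing_graphframe_columns(vertex_columns, edge_columns):
    """Return the required column names that are absent (hypothetical helper)."""
    missing = []
    # The vertex DataFrame needs an "id" column.
    if "id" not in vertex_columns:
        missing.append("v.id")
    # The edge DataFrame needs both "src" and "dst".
    for name in ("src", "dst"):
        if name not in edge_columns:
            missing.append("e." + name)
    return missing


# Column names of the example DataFrames above.
print(missing_graphframe_columns(["id", "name"], ["src", "dst", "amount", "action"]))
# An edge list that uses different names fails the check.
print(missing_graphframe_columns(["id", "name"], ["from", "to"]))
```

In a Spark session the same check could be called as `missing_graphframe_columns(v.columns, e.columns)`.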

## degrees, inDegrees, outDegrees

```python
g.degrees.show()
+---+------+
| id|degree|
+---+------+
|  1|     2|
|  3|     1|
|  2|     3|
+---+------+

g.inDegrees.show()
+---+--------+
| id|inDegree|
+---+--------+
|  1|       1|
|  3|       1|
|  2|       1|
+---+--------+

g.outDegrees.show()
+---+---------+
| id|outDegree|
+---+---------+
|  1|        1|
|  2|        2|
+---+---------+
```
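
The three tables can be reproduced by hand from the edge list. The sketch below uses plain Python instead of Spark, purely to illustrate how the counts arise; it is not how GraphFrames computes them internally. It also shows why vertex 3 is absent from `outDegrees`: it has no outgoing edge, so it never appears as a `src`.

```python
from collections import Counter

# Edges of the example graph, reduced to (src, dst) pairs.
edges = [(1, 2), (2, 1), (2, 3)]

# Count how often each vertex appears as a source and as a destination.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)

# The total degree is the sum of both counts.
degree = out_degree + in_degree

print(sorted(out_degree.items()))
print(sorted(in_degree.items()))
print(sorted(degree.items()))
```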

## find()

```python
motifs = g.find("(a)-[ab]->(b); (b)-[bc]->(c)")
motifs.show()
+------+-----------------+------+-------------------+------+
|     a|               ab|     b|                 bc|     c|
+------+-----------------+------+-------------------+------+
|[1, A]|[1, 2, 3.2, love]|[2, B]|  [2, 1, 1.5, hate]|[1, A]|
|[1, A]|[1, 2, 3.2, love]|[2, B]|[2, 3, 0.8, follow]|[3, C]|
|[2, B]|[2, 1, 1.5, hate]|[1, A]|  [1, 2, 3.2, love]|[2, B]|
+------+-----------------+------+-------------------+------+

motifs.filter("b.id = 2 and bc.amount > 1").show()
+------+-----------------+------+-----------------+------+
|     a|               ab|     b|               bc|     c|
+------+-----------------+------+-----------------+------+
|[1, A]|[1, 2, 3.2, love]|[2, B]|[2, 1, 1.5, hate]|[1, A]|
+------+-----------------+------+-----------------+------+
```
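
The motif `(a)-[ab]->(b); (b)-[bc]->(c)` describes two consecutive edges that share the middle vertex `b`. As a rough illustration, the same matches can be obtained in plain Python by pairing every edge with every edge that starts where it ends. `two_hop_paths` is a hypothetical helper written for this sketch, not a GraphFrames function.

```python
# Edge rows of the example graph: (src, dst, amount, action).
edges = [
    (1, 2, 3.2, "love"),
    (2, 1, 1.5, "hate"),
    (2, 3, 0.8, "follow"),
]


def two_hop_paths(edge_rows):
    """Pair each edge ab with each edge bc whose src equals ab's dst."""
    return [(ab, bc) for ab in edge_rows for bc in edge_rows if ab[1] == bc[0]]


paths = two_hop_paths(edges)

# Same condition as motifs.filter("b.id = 2 and bc.amount > 1"):
# b is the dst of ab, and amount is the third field of bc.
filtered = [(ab, bc) for ab, bc in paths if ab[1] == 2 and bc[2] > 1]

print(len(paths))
print(filtered)
```

Note that `a` and `c` may be the same vertex, as in the first and third rows of the `motifs` output above.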

## pageRank()

```python
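# The original call was not shown; these parameters are assumed,
# so the exact scores below may differ.
ranks = g.pageRank(resetProbability=0.15, maxIter=10)
ranks.vertices.show()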
+---+----+------------------+
| id|name|          pagerank|
+---+----+------------------+
|  1|   A|0.8853120342811294|
|  3|   C|0.8853120342811294|
|  2|   B|1.2293759314377415|
+---+----+------------------+
```

## bfs()

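`bfs()` runs a breadth-first search: it finds the shortest path(s) from vertices matching `fromExpr` to vertices matching `toExpr`. Its signature is:

`bfs(fromExpr, toExpr, edgeFilter=None, maxPathLength=10)`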

```python
g.bfs(fromExpr="id=1", toExpr="id=2", maxPathLength=2).show()
+------+-----------------+------+
|  from|               e0|    to|
+------+-----------------+------+
|[1, A]|[1, 2, 3.2, love]|[2, B]|
+------+-----------------+------+
```
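
To make the search itself concrete, here is a minimal breadth-first search in plain Python over the same edges. It is a simplified sketch, not the GraphFrames implementation: `shortest_path` is a hypothetical helper that takes single start and goal vertex IDs instead of filter expressions and returns only one shortest path.

```python
from collections import deque

# Edges of the example graph, reduced to (src, dst) pairs.
edges = [(1, 2), (2, 1), (2, 3)]


def shortest_path(edge_pairs, start, goal, max_path_length=10):
    """Return the vertices of one shortest path, or None if no path
    with at most max_path_length edges exists."""
    # Build an adjacency list that follows edge direction.
    adjacency = {}
    for src, dst in edge_pairs:
        adjacency.setdefault(src, []).append(dst)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Do not extend paths that already use the maximum number of edges.
        if len(path) - 1 >= max_path_length:
            continue
        for nxt in adjacency.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(edges, 1, 2, max_path_length=2))
print(shortest_path(edges, 1, 3, max_path_length=2))
print(shortest_path(edges, 1, 3, max_path_length=1))
```

The first call corresponds to the `g.bfs(...)` query above and finds the single-edge path from vertex 1 to vertex 2. The last call returns `None` because vertex 3 is two edges away from vertex 1.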