# [Graph Algorithms: Practical Examples in Apache Spark and Neo4j](https://neo4j.com/neo4j-graph-analytics)

- https://github.com/neo4j-graph-analytics/book
- https://neo4j.com/neo4j-graph-analytics
- https://resources.oreilly.com/examples/0636920233145

In [None]:
%%bash

set -euo pipefail  # stop at the first failed download instead of saving an error page

mkdir -p data vendor

# Book example code and datasets, unpacked into data/
curl -fsSL 'https://resources.oreilly.com/examples/0636920233145/-/archive/master/0636920233145-master.tar.bz2' |
  tar --strip-components=1 -xjC data
# APOC procedures plugin
curl -fsSL 'https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.5.0.9/apoc-3.5.0.9-all.jar' \
  >vendor/apoc-3.5.0.9-all.jar
# Neo4j Graph Algorithms plugin
curl -fsSL 'https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/neo4j-graph-algorithms-3.5.9.0-standalone.jar' \
  >vendor/neo4j-graph-algorithms-3.5.9.0-standalone.jar

## Chapter 3. Graph Platforms and Processing

### PySpark

```bash
pyspark --packages=graphframes:graphframes:0.8.0-spark2.4-s_2.11
```

### Neo4j

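The plugin JARs are installed manually by copying them into the Neo4j `plugins` directory; see https://github.com/neo4j-contrib/neo4j-apoc-procedures/#manual-installation-download-latest-release for the APOC instructions.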

```bash
mkdir -p /tmp/neo4j-graph-analytics/neo4j/{data,import,plugins}
cp -t /tmp/neo4j-graph-analytics/neo4j/plugins \
  vendor/apoc-3.5.0.9-all.jar \
  vendor/neo4j-graph-algorithms-3.5.9.0-standalone.jar
docker run --rm -e NEO4J_AUTH=none -p 7474:7474 -p 7687:7687 \
  -e NEO4J_dbms_security_procedures_unrestricted='algo.*,apoc.*' \
  -v /tmp/neo4j-graph-analytics/neo4j/data:/data \
  -v /tmp/neo4j-graph-analytics/neo4j/import:/import \
  -v /tmp/neo4j-graph-analytics/neo4j/plugins:/plugins \
  neo4j:3.5.9
```

## Chapter 4. Pathfinding and Graph Search Algorithms

In [None]:
from graphframes import GraphFrame
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    FloatType,
    IntegerType,
    StringType,
    StructField,
    StructType,
)

spark = (
    SparkSession.builder.appName("neo4j-graph-analytics")
    .config("spark.jars.packages", "graphframes:graphframes:0.8.0-spark2.4-s_2.11")
    .getOrCreate()
)

# Nodes: one row per location, with an explicit schema so numeric columns are typed.
nodes = spark.read.csv(
    "data/data/transport-nodes.csv",
    header=True,
    schema=StructType(
        [
            StructField("id", StringType(), True),
            StructField("latitude", FloatType(), True),
            StructField("longitude", FloatType(), True),
            StructField("population", IntegerType(), True),
        ]
    ),
)

# Relationships: read without a schema, so every column (including cost) is a string.
rels = spark.read.csv(
    "data/data/transport-relationships.csv", header=True
).select("src", "dst", "relationship", "cost")

# Mirror each relationship so the directed GraphFrame can be traversed both ways.
reversed_rels = rels.selectExpr("dst as src", "src as dst", "relationship", "cost")

# distinct() keeps the deduplicating behavior of SQL UNION.
g = GraphFrame(nodes, rels.union(reversed_rels).distinct())
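
GraphFrames edges are directed, so the cell above adds a reversed copy of every relationship row to make the transport graph traversable in both directions. The following is a minimal pure-Python sketch of that same mirroring step, runnable without Spark. The `Edge` type, the `mirror_edges` helper, and the sample rows are hypothetical and for illustration only; they are not read from `transport-relationships.csv`.

```python
from typing import Iterable, NamedTuple, Set


class Edge(NamedTuple):
    """Hypothetical stand-in for one row of the relationships table."""

    src: str
    dst: str
    relationship: str
    cost: int


def mirror_edges(edges: Iterable[Edge]) -> Set[Edge]:
    """Return the edges plus a reversed copy of each, without duplicates."""
    original = set(edges)
    mirrored = {Edge(e.dst, e.src, e.relationship, e.cost) for e in original}
    return original | mirrored


# Illustrative rows only (assumed values, not loaded from the dataset).
sample = [
    Edge("Amsterdam", "Utrecht", "EROAD", 46),
    Edge("Utrecht", "Gouda", "EROAD", 35),
]

undirected = mirror_edges(sample)
for edge in sorted(undirected):
    print(f"{edge.src} -> {edge.dst} ({edge.relationship}, cost {edge.cost})")
```

Because the result is a set, a relationship that already appears in both directions is kept once. That mirrors the deduplication SQL `union` performs, as opposed to `union all`.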