Gradoop: Distributed Graph Analytics on Hadoop

Gradoop is an open source (ALv2) research framework for scalable graph analytics built on top of Apache Flink. It offers a graph data model that extends the widespread property graph model with the concept of logical graphs, and it provides operators that can be applied to single logical graphs and to collections of logical graphs. Combining these operators allows the flexible, declarative definition of graph analytical workflows. Gradoop can be easily integrated into a workflow that already uses Flink® operators and Flink® libraries (e.g. Gelly, ML and Table).

Gradoop is a work in progress, which means APIs may change. It is currently used as a proof-of-concept implementation and is far from production-ready.

The project's documentation can be found in our Wiki. The Wiki also contains a tutorial to help you get started with Gradoop.

Further Information (articles and talks)

Data Model

In the extended property graph model (EPGM), a database consists of multiple property graphs which are called logical graphs. These graphs describe application-specific subsets of vertices and edges, i.e. a vertex or an edge can be contained in multiple logical graphs. Additionally, not only vertices and edges but also logical graphs have a type label and can have different properties.

Data model elements (logical graphs, vertices and edges) have a unique identifier, a single label (e.g. User) and a number of key-value properties (e.g. name = Alice). No schema is involved: each element can have an arbitrary number of properties, and elements with the same label may carry different properties.
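The element structure described above can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical and written for this example only; they are not Gradoop's own classes, and the property `topic = Databases` is made up. The sketch only shows the shape of the model: every element has an id, one label and free-form properties, and a vertex or edge can belong to several logical graphs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified EPGM elements (not Gradoop's actual classes). */
final class EpgmSketch {

  /** Every element has a unique id, a single label and schema-free properties. */
  static class Element {
    final String id;
    final String label;
    final Map<String, Object> properties = new HashMap<>();

    Element(String id, String label) {
      this.id = id;
      this.label = label;
    }
  }

  /** A logical graph is itself an element, so it has a label and properties. */
  static final class LogicalGraph extends Element {
    LogicalGraph(String id, String label) {
      super(id, label);
    }
  }

  /** A vertex remembers the id of every logical graph that contains it. */
  static final class Vertex extends Element {
    final Set<String> graphIds = new HashSet<>();

    Vertex(String id, String label) {
      super(id, label);
    }
  }

  /** An edge connects two vertices and can also belong to several graphs. */
  static final class Edge extends Element {
    final String sourceId;
    final String targetId;
    final Set<String> graphIds = new HashSet<>();

    Edge(String id, String label, String sourceId, String targetId) {
      super(id, label);
      this.sourceId = sourceId;
      this.targetId = targetId;
    }
  }

  /** Builds one vertex that is contained in two logical graphs. */
  static Vertex sharedVertex() {
    LogicalGraph community = new LogicalGraph("g1", "Community");
    community.properties.put("topic", "Databases"); // graphs carry properties too
    LogicalGraph forum = new LogicalGraph("g2", "Forum");

    Vertex alice = new Vertex("v1", "User");
    alice.properties.put("name", "Alice");
    alice.graphIds.add(community.id);
    alice.graphIds.add(forum.id);
    return alice;
  }

  public static void main(String[] args) {
    Vertex alice = sharedVertex();
    // Same label as alice but a different property key: no schema is enforced.
    Vertex bob = new Vertex("v2", "User");
    bob.properties.put("age", 42);
    System.out.println(alice.label + " " + alice.properties + " in " + alice.graphIds.size() + " graphs");
    System.out.println(bob.label + " " + bob.properties);
  }
}
```

Storing graph membership on the element keeps a shared vertex as a single object, which is one straightforward way to let logical graphs overlap.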

Graph operators

The EPGM provides operators for single logical graphs as well as for collections of logical graphs; an operator may return either a single graph or a graph collection. An overview and detailed descriptions of the implemented operators can be found in the Gradoop Wiki.
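To make the four input/output combinations concrete, here is a toy sketch in plain Java. The type `Graph` and the methods `combine`, `overlap`, `select` and `reduce` are assumptions made for illustration; they mimic the idea of graph and collection operators but are not the signatures of the Gradoop API, which is documented in the Wiki.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

/** Hypothetical operator shapes over a toy graph type (not the Gradoop API). */
final class OperatorSketch {

  /** Toy logical graph: only the ids of its vertices and edges. */
  static final class Graph {
    final Set<String> vertexIds;
    final Set<String> edgeIds;

    Graph(Set<String> vertexIds, Set<String> edgeIds) {
      this.vertexIds = vertexIds;
      this.edgeIds = edgeIds;
    }
  }

  /** Convenience helper to build an id set. */
  static Set<String> ids(String... values) {
    return new HashSet<>(Arrays.asList(values));
  }

  /** (graph, graph) -> graph: union of the vertex and edge sets. */
  static Graph combine(Graph a, Graph b) {
    Set<String> vertices = new HashSet<>(a.vertexIds);
    vertices.addAll(b.vertexIds);
    Set<String> edges = new HashSet<>(a.edgeIds);
    edges.addAll(b.edgeIds);
    return new Graph(vertices, edges);
  }

  /** (graph, graph) -> graph: elements contained in both graphs. */
  static Graph overlap(Graph a, Graph b) {
    Set<String> vertices = new HashSet<>(a.vertexIds);
    vertices.retainAll(b.vertexIds);
    Set<String> edges = new HashSet<>(a.edgeIds);
    edges.retainAll(b.edgeIds);
    return new Graph(vertices, edges);
  }

  /** collection -> collection: keep the graphs that satisfy a predicate. */
  static List<Graph> select(List<Graph> collection, Predicate<Graph> predicate) {
    List<Graph> result = new ArrayList<>();
    for (Graph graph : collection) {
      if (predicate.test(graph)) {
        result.add(graph);
      }
    }
    return result;
  }

  /** collection -> graph: fold a collection with a binary graph operator. */
  static Graph reduce(List<Graph> collection, BinaryOperator<Graph> operator) {
    return collection.stream()
        .reduce(operator)
        .orElseThrow(() -> new IllegalArgumentException("empty collection"));
  }

  public static void main(String[] args) {
    Graph g1 = new Graph(ids("a", "b"), ids("e1"));
    Graph g2 = new Graph(ids("b", "c"), ids("e2"));
    List<Graph> collection = Arrays.asList(g1, g2);

    // Operators compose: select a sub-collection, then reduce it to one graph.
    List<Graph> withB = select(collection, g -> g.vertexIds.contains("b"));
    Graph merged = reduce(withB, OperatorSketch::combine);
    System.out.println(merged.vertexIds + " " + merged.edgeIds);
  }
}
```

Because every operator returns a graph or a collection, the result of one call can be passed directly to the next, which is what makes workflow composition possible.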

Setup

Use Gradoop via Maven

Add one of the following dependencies to your Maven project.

Stable:

<dependency>
    <groupId>org.gradoop</groupId>
    <artifactId>gradoop-flink</artifactId>
    <version>0.6.0</version>
</dependency>

Latest weekly build (additional repository is required):

<repositories>
    <repository>
        <id>oss.sonatype.org-snapshot</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases><enabled>false</enabled></releases>
        <snapshots><enabled>true</enabled></snapshots>
    </repository>
</repositories>

<dependency>
    <groupId>org.gradoop</groupId>
    <artifactId>gradoop-flink</artifactId>
    <version>0.7.0-SNAPSHOT</version>
</dependency>

In either case, you also need Apache Flink (version 1.9.3):

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.9.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.9.3</version>
    </dependency>
</dependencies>

Build Gradoop from source

Gradoop requires Java 8.

Clone Gradoop into your local file system:

git clone https://github.com/dbs-leipzig/gradoop.git

Build and execute tests:

cd gradoop
mvn clean install

You might want to skip tests for faster builds. Also, some tests fail on Windows due to missing test dependencies:

mvn clean install -DskipTests

Windows

Some operators require the Hadoop winutils.

Gradoop modules

gradoop-common

The main contents of this module are the EPGM data model and a corresponding POJO implementation which is used in Flink®. The persistent representation of the EPGM is also contained in gradoop-common, together with its mapping to HBase™.

gradoop-data-integration

Provides functionality to support graph data integration. This includes minimal CSV and JSON importers as well as graph transformation operators (e.g. connect neighbors or conversion of edges to vertices and vice versa).

gradoop-accumulo

Input and output formats for reading graph collections from and writing them to Apache Accumulo®.

gradoop-hbase

Input and output formats for reading graph collections from and writing them to Apache HBase™.

gradoop-flink

This module contains reference implementations of the EPGM operators. The EPGM is mapped to Flink® DataSets while the operators are implemented using DataSet transformations. The module also contains implementations of general graph algorithms (e.g. Label Propagation, Frequent Subgraph Mining) adapted for use with the EPGM.

gradoop-temporal

This module contains a reference implementation of the Temporal Property Graph Model (TPGM) and its operators, which are used to perform graph analysis with respect to the additional time dimension in real-world graphs.
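As a rough illustration of what analysis along a time dimension can look like, the sketch below attaches a validity interval to each edge and extracts the state of the graph at a given timestamp. Everything here is an assumption for illustration: the class `TemporalEdge`, the fields `validFrom` and `validTo`, the half-open interval convention and the method `asOf` are not taken from the gradoop-temporal API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical temporal edges with a validity interval (not the TPGM classes). */
final class TemporalSketch {

  static final class TemporalEdge {
    final String id;
    final String sourceId;
    final String targetId;
    final long validFrom; // inclusive, e.g. milliseconds since the epoch
    final long validTo;   // exclusive; Long.MAX_VALUE means "still valid"

    TemporalEdge(String id, String sourceId, String targetId, long validFrom, long validTo) {
      this.id = id;
      this.sourceId = sourceId;
      this.targetId = targetId;
      this.validFrom = validFrom;
      this.validTo = validTo;
    }

    /** True if the timestamp lies in the half-open interval [validFrom, validTo). */
    boolean validAt(long timestamp) {
      return validFrom <= timestamp && timestamp < validTo;
    }
  }

  /** Returns the edges that exist at the given point in time. */
  static List<TemporalEdge> asOf(List<TemporalEdge> edges, long timestamp) {
    List<TemporalEdge> snapshot = new ArrayList<>();
    for (TemporalEdge edge : edges) {
      if (edge.validAt(timestamp)) {
        snapshot.add(edge);
      }
    }
    return snapshot;
  }

  public static void main(String[] args) {
    List<TemporalEdge> edges = Arrays.asList(
        new TemporalEdge("e1", "alice", "bob", 0L, 10L),
        new TemporalEdge("e2", "bob", "carol", 5L, Long.MAX_VALUE));
    for (long t : new long[] {3L, 7L, 10L}) {
      System.out.println("t=" + t + ": " + asOf(edges, t).size() + " edge(s)");
    }
  }
}
```

A half-open interval avoids counting an edge twice when one relationship ends at exactly the instant another begins.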

gradoop-examples

Contains example pipelines showing use cases for Gradoop.

Graph grouping example (building structural aggregates of property graphs)
Social network examples (composition of multiple operators to analyze social network graphs)
Input/Output examples (usage of DataSource and DataSink implementations)
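The idea behind the graph grouping example, building a structural aggregate, can be sketched without any framework. The code below is a simplified illustration under stated assumptions: vertices are grouped by label only, each group is summarized by a count, and the class and method names are hypothetical rather than part of gradoop-examples.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical label-based grouping with count aggregation (not the Gradoop operator). */
final class GroupingSketch {

  /** Counts vertices per label; each entry stands for one summary vertex. */
  static Map<String, Long> groupVertices(Map<String, String> labelByVertexId) {
    Map<String, Long> counts = new TreeMap<>();
    for (String label : labelByVertexId.values()) {
      counts.merge(label, 1L, Long::sum);
    }
    return counts;
  }

  /**
   * Counts edges per (source label, edge label, target label).
   * Each edge is given as {sourceId, edgeLabel, targetId}.
   */
  static Map<String, Long> groupEdges(Map<String, String> labelByVertexId, List<String[]> edges) {
    Map<String, Long> counts = new TreeMap<>();
    for (String[] edge : edges) {
      String key = labelByVertexId.get(edge[0]) + " -[" + edge[1] + "]-> "
          + labelByVertexId.get(edge[2]);
      counts.merge(key, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, String> labels = new HashMap<>();
    labels.put("v1", "User");
    labels.put("v2", "User");
    labels.put("v3", "Forum");
    List<String[]> edges = Arrays.asList(
        new String[] {"v1", "knows", "v2"},
        new String[] {"v2", "knows", "v1"},
        new String[] {"v1", "memberOf", "v3"});

    System.out.println(groupVertices(labels));
    System.out.println(groupEdges(labels, edges));
  }
}
```

The summary graph has one vertex per label and one edge per label triple, so its size depends on the number of distinct labels rather than on the size of the input graph.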

gradoop-checkstyle

Used to maintain the code style for the whole project.

Related Repositories

Gradoop Tutorial

The Gradoop tutorial presented at the BOSS'20 workshop of the VLDB 2020 international conference.

Gradoop Benchmarks

This repository contains sets of Gradoop operator benchmarks designed to run on a cluster to measure scalability and speedup of the operators.

Gradoop Demo

Demo application to show the functionalities of the grouping and query operator in an interactive web UI.

Temporal Graph Explorer

Gradoop Temporal Graph Explorer Demo which showcases some operators of the Temporal Property Graph Model.

Gradoop GDL

This repository contains the definition of our Temporal Graph Definition Language (Temporal-GDL).

Version History

See the changelog on the Wiki pages.

Disclaimer

Apache®, Apache Accumulo®, Apache Flink, Flink®, Apache HBase™ and HBase™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
