GraphFlash

GraphFlash is a graph processing framework running on serverless architectures. This is the AWS Lambda version of GraphFlash.

Install dependencies

GraphFlash uses Python for code generation. Use the instructions below to install the dependencies; the scripts can be run in an Alpine Docker image.

apk add --no-cache \
    build-base \
    cmake \
    git \
    curl-dev \
    openssl-dev \
    zstd-dev \
    boost-dev \
    nlohmann-json \
    hiredis-dev \
    libcurl \
    ninja \
    asio-dev

cd /root && git clone https://github.com/gflags/gflags.git && cd gflags && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON && \
    make -j$(nproc) && make install

cd /root && git clone https://github.com/google/glog.git && cd glog \
    && git checkout v0.6.0 \
    && mkdir build && cd build && \
    cmake .. && \
    make -j$(nproc) && make install

cd /root && git clone https://github.com/google/googletest.git && cd googletest && mkdir build && cd build && \
    cmake .. && make -j$(nproc) && make install

cd /root && git clone https://github.com/redis/hiredis.git && cd hiredis && make -j$(nproc) USE_SSL=1 && make USE_SSL=1 install
cd /root && git clone https://github.com/sewenew/redis-plus-plus.git && cd redis-plus-plus && mkdir build && cd build && \
    cmake -DREDIS_PLUS_PLUS_USE_TLS=ON .. && make -j$(nproc) && make install


cd /root && git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp.git && cd aws-sdk-cpp && \
    mkdir -p build && cd build && cmake .. \
                                    -DCMAKE_BUILD_TYPE=Release \
                                    -DBUILD_ONLY="core;s3" \
                                    -DENABLE_UNITY_BUILD=ON \
                                    -DBUILD_SHARED_LIBS=ON \
                                    -DENABLE_TESTING=OFF \
                                    -DCMAKE_INSTALL_PREFIX=$HOME/aws-sdk-cpp-install && make -j$(nproc) && \
     make install

cd /root && git clone https://github.com/libcpr/cpr.git && \
                cd cpr && mkdir build && cd build && \
                cmake .. -DCPR_USE_SYSTEM_CURL=ON -DBUILD_SHARED_LIBS=OFF && \
                cmake --build . --parallel &&\
                cmake --install .

cd /root && git clone https://github.com/awslabs/aws-lambda-cpp.git && cd aws-lambda-cpp && mkdir build && cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/lambda-install && \
    make -j$(nproc) && make install

Deploy AWS Lambda Functions

Build the Docker images and push them to AWS ECR.

docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=coordinator \
-t {REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .

docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=worker \
-t {REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .
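
The pushes above only succeed if Docker is already authenticated against ECR and the target repositories exist. The following is a minimal sketch of those two steps, not part of GraphFlash itself. It assumes that {REPLACE_WITH_ECR_URL} is a private registry host of the form ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com and that the repositories are named g0-coordinator and g0-worker, as the image tags suggest. The function names and the AWS_CLI / DOCKER_CLI override variables are hypothetical helpers introduced here.

```shell
# Sketch only: account ID, region, and repository names are placeholders.
# AWS_CLI and DOCKER_CLI can be overridden (for example with `echo`) for a dry run.
AWS_CLI="${AWS_CLI:-aws}"
DOCKER_CLI="${DOCKER_CLI:-docker}"

# Authenticate the local Docker client against a private ECR registry.
ecr_login() {
  account_id="$1"
  region="$2"
  registry="${account_id}.dkr.ecr.${region}.amazonaws.com"
  "$AWS_CLI" ecr get-login-password --region "$region" |
    "$DOCKER_CLI" login --username AWS --password-stdin "$registry"
}

# Create one repository per image name passed after the region.
# create-repository reports an error if the repository already exists.
ecr_create_repos() {
  region="$1"
  shift
  for repo in "$@"; do
    "$AWS_CLI" ecr create-repository --repository-name "$repo" --region "$region"
  done
}

# Example (not executed here; 123456789012 and us-east-1 are placeholders):
#   ecr_login 123456789012 us-east-1
#   ecr_create_repos us-east-1 g0-coordinator g0-worker
```

Run both helpers once before the docker buildx build commands; the login token issued by ECR expires, so ecr_login may need to be repeated in later sessions.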

Deploy functions.

# example script
aws lambda create-function \
  --function-name g0-coordinator-func \
  --package-type Image \
  --code ImageUri={REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
  --role XXX \
  --architectures arm64 \
  --region XXX

aws lambda create-function \
  --function-name g0-worker-func \
  --package-type Image \
  --code ImageUri={REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
  --role XXX \
  --architectures arm64 \
  --region XXX

Then create a function URL for each function and add a permission that allows access to it. You may also change the function configuration, such as the timeout and memory limit.
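
The URL, permission, and configuration steps can be sketched with the AWS CLI as shown here. This is one possible setup, not necessarily the one the authors use: it assumes a public URL (auth type NONE), and the helper names, the statement ID, and the AWS_CLI override variable are illustrative. AWS may require additional permissions for public function URLs, so check the Lambda function URL documentation for your account.

```shell
# Sketch only. AWS_CLI can be overridden (for example with `echo`) for a dry run.
AWS_CLI="${AWS_CLI:-aws}"

# Create a function URL and allow unauthenticated callers to invoke it.
# --auth-type NONE makes the endpoint public; AWS_IAM would require signed requests.
expose_function_url() {
  fn="$1"
  region="$2"
  "$AWS_CLI" lambda create-function-url-config \
    --function-name "$fn" --auth-type NONE --region "$region"
  "$AWS_CLI" lambda add-permission \
    --function-name "$fn" \
    --statement-id FunctionURLAllowPublicAccess \
    --action lambda:InvokeFunctionUrl \
    --principal "*" \
    --function-url-auth-type NONE \
    --region "$region"
}

# Set the timeout (seconds) and memory size (MB) chosen by the caller.
configure_function() {
  fn="$1"
  region="$2"
  timeout_s="$3"
  memory_mb="$4"
  "$AWS_CLI" lambda update-function-configuration \
    --function-name "$fn" --timeout "$timeout_s" --memory-size "$memory_mb" \
    --region "$region"
}

# Example (not executed here; region and limits are placeholders):
#   expose_function_url g0-coordinator-func us-east-1
#   expose_function_url g0-worker-func us-east-1
#   configure_function g0-worker-func us-east-1 300 2048
```

The FunctionUrl field in the output of create-function-url-config is the address to use for the coordinator in the curl request and for worker_url in the request body.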

Run

Download the code.

git clone --recurse-submodules -b lambda git@github.com:disnetlab/GraphFlash.git
cd GraphFlash

# Configure
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_PREFIX_PATH=$HOME/aws-sdk-cpp-install \
      -Daws-lambda-runtime_DIR=$HOME/lambda-install/lib/aws-lambda-runtime/cmake
# Build CLI tools, worker binary, and all detected plugins
cmake --build build --target partition_tiny upload download combine

The plugin shared libraries (lib*.so) are produced inside the build/ folder next to the worker binary; ship them alongside the worker executable so dlopen can locate them at runtime.
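
Shipping the plugins alongside the worker can be as simple as copying both into one directory. The helper below is a hypothetical sketch: it assumes the executable is named worker (matching TARGET_EXECUTABLE=worker) and sits in the build directory next to the lib*.so files. The function name and the output directory are not part of GraphFlash.

```shell
# Copy the worker executable and every plugin library into one directory.
# Assumes the binary is called `worker` and lives in build_dir next to lib*.so.
stage_worker_bundle() {
  build_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  cp "$build_dir/worker" "$out_dir/" || return 1
  found=0
  for lib in "$build_dir"/lib*.so; do
    [ -e "$lib" ] || continue   # the glob matched nothing
    cp "$lib" "$out_dir/" || return 1
    found=$((found + 1))
  done
  echo "staged worker and $found plugin(s) into $out_dir"
}

# Example (not executed here; `dist` is a placeholder directory):
#   stage_worker_bundle build dist
```

A plugin count of zero is a quick hint that the plugin targets were not built.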

extra_args is a flat JSON object passed verbatim to the plugin. Common parameters such as activate_superstep are sent at the top level of the request and automatically wired into every algorithm.

Demo

This example processes the kgs dataset from LDBC.

First, download the dataset and decompress it into the current directory.

ls kgs
# output should be: 
# kgs-BFS  kgs-CDLP  kgs-LCC  kgs-PR  kgs-SSSP  kgs-WCC  kgs.e  kgs.properties  kgs.v

# partition the file into two partitions
mkdir kgs-p2
./partition_tiny \
  --dataset_name=kgs-p2 \
  --directed=false \
  --edge_file=./kgs/kgs.e \
  --vertex_file=./kgs/kgs.v \
  --vertex_num=832247 \
  --edge_num=17891698 \
  --weighted=true \
  --partition_num=2 \
  --output_directory=./kgs-p2


# run a Redis server on 127.0.0.1:6379 and upload the partitions into MaaS
./upload \
  --input_directory=./kgs-p2 \
  --dataset_name=kgs-p2 \
  --partition_num=2 \
  --bucket=XXX \
  --access_key=XXX \
  --secret_key=XXX
  
# trigger the function
curl -X POST XXX \
  -H "Content-Type: application/json" \
  -d '{
    "algorithm": "BFS",
    "max_workers": 2,
    "partition_num": 2,
    "vertex_num": 832247,
    "run_id": "kgs-p2-BFS",
    "dataset": "kgs-p2",
    "directed": false,
    "weighted": true,
    "maas_addr": "XXX",
    "worker_url": "XXX",
    "bucket": "XXX",
    "access_key": "XXX",
    "secret_key": "XXX",
    "activate_superstep": 1,
    "thread_num": 1,
    "extra_args": {
      "source-vertex": 239044
    }
  }'
 
# `activate_superstep` is optional; omit it when the algorithm should begin activating outer vertices from the first
# incremental round (the default value is 1). Algorithm-specific settings stay in `extra_args` as a flat object.

# download results
./download --output_directory=./kgs-p2/ --partition_num=2 --run_id=kgs-p2-BFS --type=int --access_key=XXX --secret_key=XXX --bucket=XXX

# generate the final result
./combine --dataset_name=kgs-p2 --directory=./kgs-p2 --partition_num=2 --run_id=kgs-p2-BFS --type=int --vertex_num=832247

# check the result
diff kgs-p2/kgs-p2-BFS.g0r kgs/kgs-BFS

Note that ./partition_tiny remaps the vertex IDs to consecutive integers, so the vertex IDs used in the request body must be the remapped IDs. The mapping is stored in {DATASET_NAME}.g0m; in the above example, that is ./kgs-p2/kgs-p2.g0m. To look up vertex 239044, run grep -n '^239044$' kgs-p2/kgs-p2.g0m | cut -d: -f1 | awk '{print $1-1}'
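
For scripting, the same lookup can be wrapped in a small function that fails when the vertex is missing. This is a sketch built on what the grep one-liner implies: the .g0m file holds one original ID per line, and the remapped ID is the zero-based line index. The function name remapped_id is hypothetical.

```shell
# Print the remapped ID of an original vertex ID; exit non-zero if it is absent.
# Assumes one original ID per line, with remapped ID = zero-based line index.
remapped_id() {
  original_id="$1"
  map_file="$2"
  # Appending "" forces string comparison, so very large IDs are matched exactly.
  awk -v id="$original_id" '
    ($0 "") == (id "") { print NR - 1; found = 1; exit }
    END { exit !found }
  ' "$map_file"
}

# Example (not executed here):
#   src=$(remapped_id 239044 kgs-p2/kgs-p2.g0m) || echo "vertex not found" >&2
```

The printed value is what goes into the source-vertex field of extra_args.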
