GraphFlash is a graph processing framework running on serverless architectures. This is the AWS Lambda version of GraphFlash.
GraphFlash uses Python for code generation. Use the instructions below to install the dependencies; the scripts can be run in an Alpine Docker image.
apk add --no-cache \
build-base \
cmake \
git \
curl-dev \
openssl-dev \
zstd-dev \
boost-dev \
nlohmann-json \
hiredis-dev \
libcurl \
ninja \
asio-dev
cd /root && git clone https://github.com/gflags/gflags.git && cd gflags && \
mkdir build && cd build && \
cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON && \
make -j$(nproc) && make install
cd /root && git clone https://github.com/google/glog.git && cd glog \
&& git checkout v0.6.0 \
&& mkdir build && cd build && \
cmake .. && \
make -j$(nproc) && make install
cd /root && git clone https://github.com/google/googletest.git && cd googletest && mkdir build && cd build && \
cmake .. && make -j$(nproc) && make install
cd /root && git clone https://github.com/redis/hiredis.git && cd hiredis && make -j$(nproc) USE_SSL=1 && make USE_SSL=1 install
cd /root && git clone https://github.com/sewenew/redis-plus-plus.git && cd redis-plus-plus && mkdir build && cd build && \
cmake -DREDIS_PLUS_PLUS_USE_TLS=ON .. && make -j$(nproc) && make install
cd /root && git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp.git && cd aws-sdk-cpp && \
mkdir -p build && cd build && cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_ONLY="core;s3" \
-DENABLE_UNITY_BUILD=ON \
-DBUILD_SHARED_LIBS=ON \
-DENABLE_TESTING=OFF \
-DCMAKE_INSTALL_PREFIX=$HOME/aws-sdk-cpp-install && make -j$(nproc) && \
make install
cd /root && git clone https://github.com/libcpr/cpr.git && \
cd cpr && mkdir build && cd build && \
cmake .. -DCPR_USE_SYSTEM_CURL=ON -DBUILD_SHARED_LIBS=OFF && \
cmake --build . --parallel &&\
cmake --install .
cd /root && git clone https://github.com/awslabs/aws-lambda-cpp.git && cd aws-lambda-cpp && mkdir build && cd build && \
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/lambda-install && \
make -j$(nproc) && make install
- Build the Docker images and push them to AWS ECR.
docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=coordinator \
-t {REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .
docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=worker \
-t {REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .
- Deploy functions.
# example script
aws lambda create-function \
--function-name g0-coordinator-func \
--package-type Image \
--code ImageUri={REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
--role XXX \
--architectures arm64 \
--region XXX
aws lambda create-function \
--function-name g0-worker-func \
--package-type Image \
--code ImageUri={REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
--role XXX \
--architectures arm64 \
--region XXX
- Then create a function URL for each of the two functions and add a permission that allows access to it. You may also change the function configuration, such as the timeout and memory limit.
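The URL, permission, and configuration step above can be sketched with the AWS CLI as follows. This is a minimal sketch, not the project's official deployment script: the helper names `run` and `expose_function` are hypothetical, auth type `NONE` (a publicly reachable URL) is an assumption chosen for brevity, and the timeout and memory values are placeholders. The region is taken from your default AWS CLI configuration. Permission requirements for function URLs may differ by account and change over time, so check the Lambda documentation before relying on this.

```shell
#!/bin/sh
# Sketch: expose both Lambda functions through function URLs.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create a URL, allow access, and adjust limits for one function.
expose_function() {
    fn="$1"
    # Create the HTTPS endpoint. Auth type NONE is an assumption; AWS_IAM is the restricted alternative.
    run aws lambda create-function-url-config \
        --function-name "$fn" --auth-type NONE
    # With auth type NONE, a resource-based policy must allow invocation through the URL.
    run aws lambda add-permission \
        --function-name "$fn" \
        --statement-id FunctionURLAllowPublicAccess \
        --action lambda:InvokeFunctionUrl \
        --principal "*" \
        --function-url-auth-type NONE
    # Timeout (seconds) and memory (MB) are placeholder values, not recommendations.
    run aws lambda update-function-configuration \
        --function-name "$fn" --timeout 900 --memory-size 2048
}

expose_function g0-coordinator-func
expose_function g0-worker-func
```

Review the printed commands first, then rerun with `DRY_RUN=0` once the values suit your account.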
Download code.
git clone --recurse-submodules -b lambda git@github.com:disnetlab/GraphFlash.git
cd GraphFlash
# Configure
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_PREFIX_PATH=$HOME/aws-sdk-cpp-install \
-Daws-lambda-runtime_DIR=$HOME/lambda-install/lib/aws-lambda-runtime/cmake
# Build CLI tools, worker binary, and all detected plugins
cmake --build build --target partition_tiny upload download combine

The plugin shared libraries (lib*.so) are produced inside the build/ folder next to the worker binary; ship them alongside the worker executable so dlopen can locate them at runtime.
extra_args is a flat JSON object passed verbatim to the plugin. Common parameters such as activate_superstep are sent at the top level of the request and automatically wired into every algorithm.
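To illustrate the split between top-level parameters and `extra_args`, here is a minimal sketch that assembles a partial request body. The helper name `request_body` is hypothetical, and the output deliberately contains only three fields; a real request also carries the dataset, storage, and worker fields shown in the walkthrough further down.

```shell
#!/bin/sh
# Hypothetical helper: request_body ALGORITHM ACTIVATE_SUPERSTEP EXTRA_ARGS_JSON
# Common parameters go at the top level; algorithm-specific ones stay inside the
# flat extra_args object, which is inserted as given.
request_body() {
    printf '{"algorithm": "%s", "activate_superstep": %s, "extra_args": %s}\n' \
        "$1" "$2" "$3"
}

# BFS takes its source vertex through extra_args.
request_body BFS 1 '{"source-vertex": 239044}'
```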
The following example processes the kgs dataset from LDBC.
First, download the dataset and decompress it into the current directory.
ls kgs
# output should be:
# kgs-BFS kgs-CDLP kgs-LCC kgs-PR kgs-SSSP kgs-WCC kgs.e kgs.properties kgs.v
# partition the file into two partitions
mkdir kgs-p2
./partition_tiny \
--dataset_name=kgs-p2 \
--directed=false \
--edge_file=./kgs/kgs.e \
--vertex_file=./kgs/kgs.v \
--vertex_num=832247 \
--edge_num=17891698 \
--weighted=true \
--partition_num=2 \
--output_directory=./kgs-p2
# run a redis server on 127.0.0.1:6379 and upload partition into MaaS
./upload \
--input_directory=./kgs-p2 \
--dataset_name=kgs-p2 \
--partition_num=2 \
--bucket=XXX \
--access_key=XXX \
--secret_key=XXX
# trigger the function
curl -X POST XXX \
-H "Content-Type: application/json" \
-d '{
"algorithm": "BFS",
"max_workers": 2,
"partition_num": 2,
"vertex_num": 832247,
"run_id": "kgs-p2-BFS",
"dataset": "kgs-p2",
"directed": false,
"weighted": true,
"maas_addr": "XXX",
"worker_url": "XXX",
"bucket": "XXX",
"access_key": "XXX",
"secret_key": "XXX",
"activate_superstep": 1,
"thread_num": 1,
"extra_args": {
"source-vertex": 239044
}
}'
# `activate_superstep` is optional; omit it when the algorithm should begin activating outer vertices from the first
# incremental round (the default value is 1). Algorithm-specific settings stay in `extra_args` as a flat object.
# download results
./download --output_directory=./kgs-p2/ --partition_num=2 --run_id=kgs-p2-BFS --type=int --access_key=XXX --secret_key=XXX --bucket=XXX
# generate the final result
./combine --dataset_name=kgs-p2 --directory=./kgs-p2 --partition_num=2 --run_id=kgs-p2-BFS --type=int --vertex_num=832247
# check the result
diff kgs-p2/kgs-p2-BFS.g0r kgs/kgs-BFS

Note that ./partition remaps the vertex IDs to consecutive integers, so the vertex IDs used in the request body should be the
remapped IDs. The mapping can be found in {DATASET_NAME}.g0m; in the above example, it is in ./kgs-p2/kgs-p2.g0m. To look up
239044, run grep -n '^239044$' kgs-p2/kgs-p2.g0m | cut -d: -f1 | awk '{print $1-1}'
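
The same lookup can be wrapped in a small reusable function. This is a sketch under an assumption inferred from the grep pipeline above: the .g0m file holds one original vertex ID per line, and the 0-based line index is the remapped ID. The function name `remapped_id` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remapped_id ORIGINAL_ID MAPPING_FILE
# Prints the 0-based line index of the first line that equals ORIGINAL_ID exactly.
# Exits with status 1 if the ID does not appear in the file.
remapped_id() {
    awk -v id="$1" '
        ($0 "") == (id "") { print NR - 1; found = 1; exit }  # compare as strings
        END { exit !found }
    ' "$2"
}
```

For the example dataset, `remapped_id 239044 kgs-p2/kgs-p2.g0m` prints the value to use as the source vertex in the request body.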