Data Lineage for Data Lakes Example

This repository contains an example project that builds data lineage for data lakes using AWS Glue, Amazon Neptune, and the Spline agent.

Note: This setup works with the AWS Glue data permissions model and does not support the Lake Formation permissions model.


  • API: HTTP API backed by Amazon API Gateway and AWS Lambda functions
  • Frontend: Vue.js application

Build & Deployment

To deploy the solution to the AWS Cloud with Terraform, first make your AWS credentials available to Terraform through an AWS profile or environment variables.
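
A minimal sketch of the credential export, assuming a named profile already exists in your AWS CLI configuration. The profile name my-profile is a placeholder and is not defined by this repository. The commented-out variables are the standard alternative, read by both the AWS CLI and the Terraform AWS provider.

```shell
# Option 1: point Terraform and the AWS CLI at a named profile.
# "my-profile" is a placeholder; use a profile from your own ~/.aws/config.
export AWS_PROFILE="my-profile"

# Option 2 (alternative): export access keys directly instead of a profile.
# The values are placeholders; uncomment and fill in your own.
# export AWS_ACCESS_KEY_ID="<your-access-key-id>"
# export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
# export AWS_SESSION_TOKEN="<your-session-token>"   # only for temporary credentials
```

Running aws sts get-caller-identity afterwards is a quick way to confirm which account and identity the exported credentials resolve to.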

brew install terraform
git clone
cd data-lineage-for-data-lake-example

# download spline agent jar
wget -O ./asset/lib/spark-3.1-spline-agent-bundle_2.12-0.6.1.jar

terraform init

terraform apply

To build and test the lineage visual application locally:

  • Update the lineage backend address in src/lineage-visual/src/main.js:
axios.defaults.baseURL = "https://xxx.execute-api.<aws-region>";
  • Install the dependencies and start the development server:
cd src/lineage-visual
npm install
npm run serve

Getting started

  • Run the first Glue job:
aws glue start-job-run --job-name "RawToCurated_employee_optimize"
  • Open the lineage visual application
  • Run the second Glue job:
aws glue start-job-run --job-name "CuratedToAggregated_employee"
  • Refresh the lineage visual application
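
Glue job runs are asynchronous: start-job-run returns as soon as the run is queued, so it can help to wait for a run to finish before opening or refreshing the application. The following is a sketch only. run_glue_job_and_wait is a hypothetical helper, not part of this repository, and the 30-second polling interval is an arbitrary default. It relies on the aws glue start-job-run and aws glue get-job-run commands.

```shell
# Hypothetical helper (not part of this repository): start a Glue job run
# and poll its state until it reaches a terminal state.
# Usage: run_glue_job_and_wait <job-name> [poll-seconds]
run_glue_job_and_wait() {
  local job_name="$1"
  local poll_seconds="${2:-30}"   # assumed default; adjust as needed
  local run_id state

  # start-job-run returns the ID of the new run.
  run_id="$(aws glue start-job-run --job-name "$job_name" \
    --query 'JobRunId' --output text)" || return 1
  echo "Started $job_name, run $run_id" >&2

  while true; do
    state="$(aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
      --query 'JobRun.JobRunState' --output text)" || return 1
    case "$state" in
      SUCCEEDED)
        echo "$state"
        return 0
        ;;
      FAILED|STOPPED|TIMEOUT|ERROR|EXPIRED)
        echo "$state"
        return 1
        ;;
      *)
        # Still starting, waiting, running, or stopping: check again later.
        sleep "$poll_seconds"
        ;;
    esac
  done
}
```

For example, run_glue_job_and_wait "RawToCurated_employee_optimize" prints the final state and returns a non-zero exit code if the run did not succeed, so the second job can be started only after the first one completes.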


See CONTRIBUTING for more information.


This library is licensed under the MIT-0 License. See the LICENSE file.