Embucket is a Snowflake-compatible data engine built on Apache DataFusion and iceberg-rust. This repo shows how to run the Snowflake flavor of the dbt-snowplow-web pipeline on AWS Lambda, with AWS S3 Tables as the Iceberg data store.
- Access to an AWS account with permissions to create CloudFormation stacks: Lambda, IAM, DynamoDB, S3 Tables
- Access to an AWS S3 Table Bucket
- AWS CLI configured locally (`aws configure`)
- uv (or Python 3.10+ with pip)
- Git
```shell
git clone https://github.com/Embucket/embucket-snowplow.git
cd embucket-snowplow
```

Set your variables:
```shell
STACK_NAME="embucket-demo-$(whoami)-$(date +%s)"
BUCKET_ARN="arn:aws:s3tables:us-east-2:YOUR_ACCOUNT:bucket/YOUR_BUCKET"
```

Note: `BUCKET_ARN` is an S3 Table Bucket ARN (not a regular S3 bucket ARN). The format is `arn:aws:s3tables:REGION:ACCOUNT:bucket/NAME`. Make sure your AWS CLI is configured for the same region as your S3 Table Bucket.
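A malformed `BUCKET_ARN` is the most common failure at this point, so a quick local sanity check can save a deploy cycle. This is a hypothetical standalone helper (not part of this repo) that validates the ARN shape described above:

```python
import re

# Matches the S3 Tables bucket ARN format: arn:aws:s3tables:REGION:ACCOUNT:bucket/NAME.
# The allowed bucket-name characters here are an assumption (lowercase letters,
# digits, hyphens); check the S3 Tables naming rules for the authoritative set.
S3TABLES_ARN = re.compile(
    r"^arn:aws:s3tables:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):bucket/(?P<name>[a-z0-9-]+)$"
)

def check_table_bucket_arn(arn: str) -> dict:
    """Return the region/account/name parts, or raise if the ARN is malformed."""
    m = S3TABLES_ARN.match(arn)
    if not m:
        raise ValueError(f"not an S3 Tables bucket ARN: {arn!r}")
    return m.groupdict()

print(check_table_bucket_arn(
    "arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket"
))
# → {'region': 'us-east-2', 'account': '123456789012', 'name': 'my-table-bucket'}
```

Note that a regular S3 bucket ARN (`arn:aws:s3:::name`) fails this check, which is exactly the mistake the note above warns about.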
```shell
aws cloudformation deploy \
  --template-file deploy/embucket-lambda.cfn.yaml \
  --stack-name $STACK_NAME \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides S3TableBucketArn=$BUCKET_ARN
```

This creates a Lambda function, an IAM role, and a DynamoDB state table. It takes 2-3 minutes.
Grab the Lambda ARN:

```shell
LAMBDA_ARN=$(aws cloudformation describe-stacks --stack-name $STACK_NAME \
  --query 'Stacks[0].Outputs[?OutputKey==`LambdaFunctionArn`].OutputValue' \
  --output text)
echo $LAMBDA_ARN
```

Install dependencies and point dbt at your Lambda:

```shell
uv sync
cp profiles.yml.example profiles.yml
sed -i '' "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|" profiles.yml
```

(The `sed -i ''` form is BSD/macOS syntax; on Linux, use `sed -i` without the empty string.)

Install the dbt packages:

```shell
uv run dbt deps --profiles-dir .
```
```shell
./scripts/patch_snowplow.sh
```

The dbt-snowplow-web package doesn't natively recognize the `embucket` adapter type. The patch script adds `embucket` alongside `snowflake` in the package's target-type checks. You need to re-run this script after every `dbt deps`.
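In spirit, the patch performs a text substitution like the one sketched below. The sample macro text is illustrative only; the actual strings the script rewrites in the package source may differ:

```python
# Illustrative sketch of what scripts/patch_snowplow.sh does: rewrite the
# package's target-type checks so 'embucket' is accepted wherever 'snowflake'
# is. The macro below is NOT the package's actual source.
def patch_target_checks(macro_source: str) -> str:
    return macro_source.replace(
        "target.type in ['snowflake']",
        "target.type in ['snowflake', 'embucket']",
    )

macro = "{% if target.type in ['snowflake'] %} ... {% endif %}"
print(patch_target_checks(macro))
# → {% if target.type in ['snowflake', 'embucket'] %} ... {% endif %}
```

Because `dbt deps` re-downloads the package source, any such in-place edit is wiped out, which is why the script must be re-run after every `dbt deps`.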
```shell
uv run python scripts/load_data.py $LAMBDA_ARN
```

This creates the required schemas and the `atomic.events` table, then loads ~28 MB of synthetic Snowplow web event data from S3.
```shell
uv run dbt seed --profiles-dir .
uv run dbt run --profiles-dir .
```

The seed step loads reference tables (GA4 source categories, geo/language mappings), and the run builds the full Snowplow web analytics pipeline: 18 models in about 45 seconds, producing page views, sessions, and users tables.
Inspect the results:

```shell
uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_page_views" --limit 10
uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_sessions" --limit 10
uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_users" --limit 10
```

When you're done, delete the CloudFormation stack (Lambda, IAM role, DynamoDB table):

```shell
aws cloudformation delete-stack --stack-name $STACK_NAME
```

Note: This does not delete data in your S3 Table Bucket. Iceberg tables created by Embucket persist there until you remove them manually.
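Before cleaning up by hand, something like the following could enumerate what's left in the bucket. The boto3 `s3tables` client, its parameter casing (`tableBucketARN`), and the response shapes are assumptions based on the S3 Tables API; verify them against your boto3 version:

```python
def qualified_name(namespace: str, table: str) -> str:
    """Render a namespace-qualified table name, e.g. 'atomic_derived.snowplow_web_sessions'."""
    return f"{namespace}.{table}"

def list_leftover_tables(bucket_arn: str) -> list[str]:
    # Sketch only: requires AWS credentials and a recent boto3 with the
    # "s3tables" client. Response field names are assumptions.
    import boto3
    client = boto3.client("s3tables")
    names = []
    for ns in client.list_namespaces(tableBucketARN=bucket_arn)["namespaces"]:
        namespace = ns["namespace"][0]  # the API returns the namespace as a list
        tables = client.list_tables(tableBucketARN=bucket_arn, namespace=namespace)
        names.extend(qualified_name(namespace, t["name"]) for t in tables["tables"])
    return names

print(qualified_name("atomic_derived", "snowplow_web_sessions"))
# → atomic_derived.snowplow_web_sessions
```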
If you lost the `$STACK_NAME` variable, find it with:

```shell
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE \
  --query 'StackSummaries[?starts_with(StackName,`embucket-demo`)].StackName' --output table
```

Architecture:

```
dbt (your machine or dbt orchestrator)
        │
        │ boto3.invoke()
        ▼
AWS Lambda (embucket-lambda)
        │
        │ Apache Iceberg
        ▼
AWS S3 Table Bucket
```
- dbt-embucket adapter calls Lambda directly via AWS IAM — no public endpoints
- Embucket is a Snowflake-compatible query engine built on Apache DataFusion + Apache Iceberg
- Data is stored as Iceberg tables in your S3 Table Bucket
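The IAM-authenticated hop in the diagram boils down to a plain Lambda invoke. A minimal sketch, assuming the function accepts a JSON payload with a `query` field; the payload shape and response format are assumptions, so check the adapter code in this repo for the real protocol:

```python
import json

def build_payload(sql: str) -> bytes:
    # Hypothetical payload shape; the real adapter protocol may differ.
    return json.dumps({"query": sql}).encode("utf-8")

def run_query(lambda_arn: str, sql: str) -> dict:
    import boto3  # lazy import; actually calling this needs AWS credentials
    client = boto3.client("lambda")
    resp = client.invoke(FunctionName=lambda_arn, Payload=build_payload(sql))
    return json.loads(resp["Payload"].read())

print(build_payload("SELECT 1").decode())  # → {"query": "SELECT 1"}
```

Because authentication is plain SigV4 on the `lambda:InvokeFunction` call, no network endpoint, API key, or password ever needs to be exposed.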