Snowplow Web Analytics with dbt

Embucket is a Snowflake-compatible data engine based on Apache DataFusion and iceberg-rust. This repo shows how to run the Snowflake flavour of the dbt-snowplow-web pipeline on AWS Lambda, with AWS S3 Tables as the Iceberg data store.

Prerequisites

Access to an AWS account with permissions to create CloudFormation stacks and the resources they contain (Lambda, IAM, DynamoDB, S3 Tables)
Access to an AWS S3 Table Bucket
AWS CLI configured locally (aws configure)
uv (or Python 3.10+ with pip)
Git

Quick Start

git clone https://github.com/Embucket/embucket-snowplow.git
cd embucket-snowplow

Set your variables:

STACK_NAME="embucket-demo-$(whoami)-$(date +%s)"
BUCKET_ARN="arn:aws:s3tables:us-east-2:YOUR_ACCOUNT:bucket/YOUR_BUCKET"

Note: BUCKET_ARN is an S3 Table Bucket ARN (not a regular S3 bucket). The format is arn:aws:s3tables:REGION:ACCOUNT:bucket/NAME. Make sure your AWS CLI is configured for the same region as your S3 Table Bucket.
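
Because a regular S3 bucket ARN is easy to paste here by mistake, it can help to check the value before deploying. The following is a minimal Python sketch, not part of this repo: the helper names parse_table_bucket_arn and check_region are hypothetical, the pattern only encodes the format described in the note above (assuming a 12-digit account ID and a loose bucket-name check), and the CLI region is assumed to be supplied by the caller (for example from aws configure get region).

```python
import re

# Encodes the format from the note above:
#   arn:aws:s3tables:REGION:ACCOUNT:bucket/NAME
# The bucket-name part is a loose check (lowercase letters, digits, hyphens),
# not a full validation of AWS naming rules.
ARN_PATTERN = re.compile(
    r"^arn:aws:s3tables:"
    r"(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):"
    r"bucket/(?P<name>[a-z0-9-]+)$"
)


def parse_table_bucket_arn(arn: str) -> dict:
    """Split an S3 Table Bucket ARN into region, account, and bucket name."""
    match = ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"not an S3 Table Bucket ARN: {arn!r}")
    return match.groupdict()


def check_region(arn: str, cli_region: str) -> None:
    """Raise if the ARN's region differs from the region the AWS CLI uses."""
    bucket_region = parse_table_bucket_arn(arn)["region"]
    if bucket_region != cli_region:
        raise ValueError(
            f"bucket is in {bucket_region}, but the CLI is configured for {cli_region}"
        )
```

A regular bucket ARN such as arn:aws:s3:::my-bucket, or the unedited YOUR_ACCOUNT placeholder, is rejected by parse_table_bucket_arn with a ValueError.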

1. Deploy Embucket Lambda

aws cloudformation deploy \
  --template-file deploy/embucket-lambda.cfn.yaml \
  --stack-name $STACK_NAME \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides S3TableBucketArn=$BUCKET_ARN

This creates a Lambda function, an IAM role, and a DynamoDB state table. Deployment takes 2-3 minutes.

Grab the Lambda ARN:

LAMBDA_ARN=$(aws cloudformation describe-stacks --stack-name $STACK_NAME \
  --query 'Stacks[0].Outputs[?OutputKey==`LambdaFunctionArn`].OutputValue' \
  --output text)
echo $LAMBDA_ARN
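
The same lookup can be done from Python when scripting the deployment. This is a minimal sketch, assuming the response dict has the shape returned by boto3's CloudFormation describe_stacks call; get_stack_output is a hypothetical helper, not part of this repo. It raises an error when the requested output key is missing, so a failed lookup does not go unnoticed.

```python
def get_stack_output(response: dict, key: str) -> str:
    """Return the value of one stack output from a describe_stacks response."""
    # A stack that is still being created may not have an "Outputs" entry yet.
    outputs = response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(f"stack has no output named {key!r}")


# Usage (requires boto3 and AWS credentials; stack_name is your $STACK_NAME):
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(StackName=stack_name)
#   lambda_arn = get_stack_output(response, "LambdaFunctionArn")
```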

2. Install dependencies

uv sync

3. Configure profile

cp profiles.yml.example profiles.yml
sed -i.bak "s|YOUR_LAMBDA_ARN_HERE|$LAMBDA_ARN|" profiles.yml && rm profiles.yml.bak

4. Install dbt packages

uv run dbt deps --profiles-dir .
./scripts/patch_snowplow.sh

The dbt-snowplow-web package doesn't natively recognize the embucket adapter type. The patch script adds embucket alongside snowflake in the package's target-type checks. You need to re-run this script after every dbt deps.
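
To make the idea concrete, here is a purely illustrative Python sketch of the kind of rewrite such a patch performs. It is not the contents of scripts/patch_snowplow.sh: the pattern (an equality check of the form target.type == 'snowflake'), the replacement, and the function name add_embucket are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: a Jinja equality check against the snowflake target,
# written with either single or double quotes.
SNOWFLAKE_CHECK = re.compile(r"target\.type\s*==\s*(['\"])snowflake\1")


def add_embucket(template_source: str) -> str:
    """Rewrite snowflake-only target checks so they also accept embucket."""
    return SNOWFLAKE_CHECK.sub(
        "target.type in ['snowflake', 'embucket']", template_source
    )


print(add_embucket("{% if target.type == 'snowflake' %}"))
```

Under these assumptions the rewrite is idempotent: already-patched text no longer contains the equality check, so applying it again changes nothing.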

5. Load source data

uv run python scripts/load_data.py $LAMBDA_ARN

This creates the required schemas and the atomic.events table, then loads ~28 MB of synthetic Snowplow web event data from S3.

6. Load seeds and run the models

uv run dbt seed --profiles-dir .
uv run dbt run --profiles-dir .

This loads reference tables (GA4 source categories, geo/language mappings) and builds the full Snowplow web analytics pipeline. The run builds 18 models in about 45 seconds, producing page views, sessions, and users tables.

7. Query the results

uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_page_views" --limit 10
uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_sessions" --limit 10
uv run dbt show --profiles-dir . --inline "SELECT * FROM demo.atomic_derived.snowplow_web_users" --limit 10

Cleanup

Delete the CloudFormation stack (Lambda, IAM role, DynamoDB table):

aws cloudformation delete-stack --stack-name $STACK_NAME

Note: This does not delete data in your S3 Table Bucket. Iceberg tables created by Embucket persist there until you remove them manually.

If you have lost the $STACK_NAME variable, find the stack name with:

aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE \
  --query 'StackSummaries[?starts_with(StackName,`embucket-demo`)].StackName' --output table

How it works

dbt (your machine or dbt orchestrator)
  │
  │  boto3.invoke()
  ▼
AWS Lambda (embucket-lambda)
  │
  │  Apache Iceberg
  ▼
AWS S3 Table Bucket

The dbt-embucket adapter calls Lambda directly via AWS IAM, with no public endpoints
Embucket is a Snowflake-compatible query engine built on Apache DataFusion + Apache Iceberg
Data is stored as Iceberg tables in your S3 Table Bucket
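
The boto3.invoke() step in the diagram can be sketched roughly as follows. This is a minimal illustration rather than the adapter's actual code: the function name run_query and the {"query": ...} payload shape are assumptions, since the request format used by dbt-embucket is not described here. Only the boto3 Lambda invoke call itself (FunctionName, InvocationType, Payload, and the FunctionError and Payload response fields) is standard.

```python
import json


def run_query(lambda_client, function_arn: str, sql: str) -> dict:
    """Invoke the Lambda synchronously and decode its JSON reply.

    lambda_client is expected to behave like boto3.client("lambda").
    The {"query": ...} payload is a hypothetical request shape.
    """
    response = lambda_client.invoke(
        FunctionName=function_arn,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps({"query": sql}).encode("utf-8"),
    )
    # Lambda sets FunctionError when the function itself raised an error.
    if "FunctionError" in response:
        raise RuntimeError(response["Payload"].read().decode("utf-8"))
    return json.loads(response["Payload"].read())


# Usage (requires boto3 and AWS credentials; the caller's IAM identity needs
# lambda:InvokeFunction on the function):
#   import boto3
#   result = run_query(boto3.client("lambda"), lambda_arn, "SELECT 1")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without AWS access.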
