This repository has been archived by the owner on Sep 23, 2023. It is now read-only.

Feat: Add Extract-Load pipeline using Meltano (#9)
* install meltano feature in codespace

* `meltano init`

* feat: define meltano plugins, jobs, environments

* chore: limit to 1 year for faster dev builds

* chore: simplify config

* docs: how to invoke with meltano

* bump evidence version, update readme

* dynamic schema and db names

* chore: fix tabs v spaces

* feat: working bi dev environment via `meltano invoke evidence:dev`

* chore: bump dbt versions

* docs: fix evidence header

* ci: add e2e ci test

* chore: drop extra package.json file

* feat: prefer extracted datasets

includes code comment with instructions on how to use seed files

* feat: meltano 'full-build' job

* chore: add env var JAFFLE_RAW_SCHEMA

* fix: profile cannot access dbt vars (fixed conflicts)

* chore: workaround var issue

* fix: sources don't parse vars

* fix: missing 'run' in CI cli call

* run on push

* ci: add missing 'meltano install'

* ci: remove extra check, improve output readability

* ci: use dbt:build

* ci: fix plugin name

* chore: bump tap-jaffle-shop version

* fix: evidence duckdb filename

* fix: add explicit datetime casts

* ci: add step descriptions

* chore: revert extra datetime casts

* chore: Bump evidence to v14 (#3)

* feat: dynamic raw data

* fix: remove extra dbt var

* docs: delete dupe section

* fix: use env var for schema name

* ci: add github pages publish (#5)

* hotfix: relative path for gh pages

* ci: upload duckdb file artifact (#7)

* Use `meltano install` in Codespaces setup (#8)

* Chore: move project-id to client-id (#9)

* Bump-evidence-to-v15 (#10)

* chore: delete lock files

* chore: simplify devcontainer

* chore: remove comment

* chore: remove non-EL utilities and jobs

* chore: tidy up comments

* chore: revert and simplify ci pipeline

* docs: improve readme for the EL use case

* Update reports/pages/index.md

* chore: remove extra gitignore line

* ci: streamline install

* chore: further simplify meltano.yml

* ci: add pip caching to speed up builds

* ci: keep isolated venvs

* fix: relative path to db

* chore: simplify header comment

* chore: simplify raw sources.yml, get data from extracts

* fix: explicit load schema

* chore: human-readable project id

* codespaces: add `meltano run el` to post-create script

* chore: bump tap-jaffle-shop (soon-to-be released)

* chore: dummy commit to force ci build

* revert: dummy change

---------

Co-authored-by: winnie <91998347+gwenwindflower@users.noreply.github.com>
aaronsteers and gwenwindflower committed Apr 7, 2023
1 parent 8e75458 commit d63be46
Showing 13 changed files with 88 additions and 156,306 deletions.
3 changes: 2 additions & 1 deletion .devcontainer/devcontainer.json
@@ -3,11 +3,12 @@
   "image": "mcr.microsoft.com/devcontainers/python:3.10",
   "features": {
     "ghcr.io/devcontainers-contrib/features/black:1": {},
+    "ghcr.io/devcontainers-contrib/features/meltano": {},
     "ghcr.io/devcontainers/features/node:1": {},
     "ghcr.io/eitsupi/devcontainer-features/duckdb-cli:0": {}
   },
   // Use 'postCreateCommand' to run commands after the container is created.
-  "postCreateCommand": "python -m pip install -r requirements.txt && dbt deps && npm --prefix ./reports install",
+  "postCreateCommand": "python -m pip install -r requirements.txt && dbt deps && npm --prefix ./reports install && meltano install && meltano run el",
   "remoteEnv": {
     "DATABASE": "duckdb",
     "FILENAME": "jaffle_shop.duckdb"
14 changes: 11 additions & 3 deletions .github/workflows/ci.yml
@@ -10,6 +10,14 @@ jobs:
       - uses: actions/setup-python@v4
         with:
           python-version: "3.10"
-      - run: python -m pip install -r requirements.txt
-      - run: dbt deps
-      - run: dbt build
+          cache: 'pip'
+      - name: Install dependencies
+        run: |
+          python -m pip install -r requirements.txt
+          pipx install meltano && meltano install
+      - name: Run Extract and Load
+        run: meltano run tap-jaffle-shop target-duckdb
+      - name: Install dbt Dependencies
+        run: dbt deps
+      - name: dbt Build
+        run: dbt build
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ logs/
 *.duckdb
 *.duckdb.wal
 reports/sources/*.csv
+.meltano
29 changes: 27 additions & 2 deletions README.md
@@ -27,7 +27,7 @@ After the container is built and connected to, VSCode will run a few clean up co
This template includes two additional tools for the other parts of the stack to create a more realistic experience:

- BI reporting built with [Evidence](https://evidence.dev) - an open source, code-based BI tool to write reports with markdown and SQL.
-- EL with Meltano
+- EL with [Meltano](https://meltano.com/) - an open source tool that provides a CLI & version control for ELT pipelines.

### Evidence

@@ -49,13 +49,38 @@ See the [Evidence CLI docs](https://docs.evidence.dev/cli) for more details.

You can make changes to the markdown pages in the `reports/pages` folder and see the reports update in the browser preview.

-#### Learning More
+#### Learning More about Evidence

- [Getting Started Walkthrough](https://docs.evidence.dev/getting-started/install-evidence)
- [Project Home Page](https://www.evidence.dev)
- [Github](https://github.com/evidence-dev/evidence)
- [Evidence.dev Releases](https://github.com/evidence-dev/evidence/releases)


+### Meltano
+
+This project is preconfigured with Meltano, which can be used to extract and load raw data into DuckDB.
+
+#### Run EL (Extract and Load) using Meltano
+
+```console
+meltano run tap-jaffle-shop target-duckdb
+```
+
+Optionally, you can modify extract parameters using environment variables. For instance, this modified version extracts five years of data instead of the default of one year.
+
+```console
+export TAP_JAFFLE_SHOP_YEARS=5
+meltano run tap-jaffle-shop target-duckdb
+```
+
+You can also modify any tap or target config with the interactive `config` command:
+
+```console
+meltano config tap-jaffle-shop set --interactive
+meltano config target-duckdb set --interactive
+```
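
The `TAP_JAFFLE_SHOP_YEARS` override described in this README section only takes effect if the variable actually reaches the environment of the `meltano` process. The following is a minimal sketch of that distinction in a POSIX shell, with `sh -c` standing in for `meltano`. The `probe` string and the `*_result` names are illustrative only and are not part of the project.

```shell
# Stand-in for a child process such as `meltano`: prints the value it sees.
probe='printf "%s" "${TAP_JAFFLE_SHOP_YEARS:-unset}"'

unset TAP_JAFFLE_SHOP_YEARS

# 1. Bare assignment on its own line: a shell variable, not passed to children.
TAP_JAFFLE_SHOP_YEARS=5
bare_result=$(sh -c "$probe")

unset TAP_JAFFLE_SHOP_YEARS

# 2. Assignment prefixed to the command: visible to that one command only.
inline_result=$(TAP_JAFFLE_SHOP_YEARS=5 sh -c "$probe")

# 3. Exported variable: visible to every later command in this session.
export TAP_JAFFLE_SHOP_YEARS=5
exported_result=$(sh -c "$probe")
unset TAP_JAFFLE_SHOP_YEARS

echo "bare=${bare_result} inline=${inline_result} exported=${exported_result}"
# prints "bare=unset inline=5 exported=5"
```

Forms 2 and 3 are the ones that pass the setting through; form 2 is convenient for a one-off run because it leaves the rest of the session untouched.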

## Contributing

We welcome issues and PRs requesting or adding new features. The package that generates the synthetic data, [`jafgen`](https://pypi.org/project/jafgen/), is also under active development, and will add more types of source data to model as we go along. If you have tests, descriptions, new models, metrics, materializations types, or techniques you use this repo to demonstrate, which you feel would make for a more expansive baseline experience, we encourage you to consider contributing them back in so that this project becomes an even better collective tool for exploring and learning dbt over time.
943 changes: 0 additions & 943 deletions jaffle-data/raw_customers.csv

This file was deleted.

96,068 changes: 0 additions & 96,068 deletions jaffle-data/raw_items.csv

This file was deleted.

59,200 changes: 0 additions & 59,200 deletions jaffle-data/raw_orders.csv

This file was deleted.

11 changes: 0 additions & 11 deletions jaffle-data/raw_products.csv

This file was deleted.

6 changes: 0 additions & 6 deletions jaffle-data/raw_stores.csv

This file was deleted.

66 changes: 0 additions & 66 deletions jaffle-data/raw_supplies.csv

This file was deleted.

46 changes: 46 additions & 0 deletions meltano.yml
@@ -0,0 +1,46 @@
+# Meltano Configuration File
+#
+# Sample usage:
+# > meltano run tap-jaffle-shop target-duckdb
+#
+# Or equivalently:
+# > meltano run el # Run the job named 'el' to extract and load data
+
+version: 1
+project_id: Jaffle Shop Template Project
+
+env:
+  JAFFLE_DB_PATH: ./reports/jaffle_shop.duckdb
+  JAFFLE_DB_NAME: jaffle_shop
+  JAFFLE_RAW_SCHEMA: jaffle_raw
+
+default_environment: dev
+environments:
+- name: dev
+
+plugins:
+  extractors:
+  - name: tap-jaffle-shop
+    namespace: tap_jaffle_shop
+    variant: meltanolabs
+    pip_url: git+https://github.com/MeltanoLabs/tap-jaffle-shop.git@v0.3.0
+    capabilities:
+    - catalog
+    - discover
+    config:
+      years: 1
+      stream_name_prefix: ${JAFFLE_RAW_SCHEMA}-raw_
+  loaders:
+  - name: target-duckdb
+    variant: jwills
+    pip_url: target-duckdb~=0.4
+    config:
+      filepath: ${JAFFLE_DB_PATH}
+      default_target_schema: $JAFFLE_RAW_SCHEMA
+
+jobs:
+# Sample usage: `meltano run el`
+# Equivalent to: `meltano run tap-jaffle-shop target-duckdb`
+- name: el # Extract and load the raw data
+  tasks:
+  - tap-jaffle-shop target-duckdb
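
To illustrate how the `stream_name_prefix` setting in this file plays out, here is a small shell sketch. It rests on two assumptions that the commit itself does not confirm: that Meltano expands `${JAFFLE_RAW_SCHEMA}` much as a shell would, and that the loader treats a stream named `<schema>-<table>` as a schema and table pair. The stream names `customers` and `orders` are inferred from the deleted `raw_customers.csv` and `raw_orders.csv` files, not taken from the tap.

```shell
# Value taken from the `env:` block of meltano.yml.
JAFFLE_RAW_SCHEMA=jaffle_raw

# Same expression as the tap's `stream_name_prefix` setting.
stream_name_prefix="${JAFFLE_RAW_SCHEMA}-raw_"

resolved=""
for stream in customers orders; do   # assumed stream names
  full_name="${stream_name_prefix}${stream}"
  schema="${full_name%%-*}"   # text before the first hyphen (assumed split rule)
  table="${full_name#*-}"     # text after the first hyphen
  resolved="${resolved}${schema}.${table} "
  echo "${full_name} -> ${schema}.${table}"
done
```

Under those assumptions the resulting names match the `raw_customers` table and the `jaffle_raw` default schema that `models/staging/__sources.yml` expects.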
5 changes: 0 additions & 5 deletions models/staging/__sources.yml
@@ -4,11 +4,6 @@ sources:
   - name: ecom
     schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
     description: E-commerce data
-    meta:
-      # If `$JAFFLE_RAW_SCHEMA` is specified, use the provided raw data. Otherwise, use the csv seed data from the repo.
-      external_location: >-
-        {{ '' if env_var('JAFFLE_RAW_SCHEMA', '') else 'read_csv_auto("./jaffle-data/{name}.csv", header=1)' }}
     tables:
       - name: raw_customers
         description: One record per person who has purchased one or more items
2 changes: 1 addition & 1 deletion reports/.gitignore
@@ -3,4 +3,4 @@
 build
 node_modules
 .DS_Store
-*.duckdb
+*.*duckdb
