
Egor/dev 1212 jaffle shop to duckdb ingestion first notebook #12

Merged
ZmeiGorynych merged 19 commits into main from
egor/dev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
Apr 8, 2026

Conversation

@ZmeiGorynych
Member

@ZmeiGorynych ZmeiGorynych commented Apr 6, 2026

Summary by CodeRabbit

  • New Features

    • Added a Jaffle Shop example set to generate sample data, build a DuckDB database, load a schema, and auto-ingest into the semantic layer; includes a CLI-style example that verifies integrity and prints reports.
  • Documentation

    • Added a notebook walkthrough demonstrating generation, loading, ingestion, and semantic-layer queries; updated docs to reflect new example and test locations.
  • Tests

    • Added integration tests for the example pipeline and notebook execution.
  • Chores

    • Added optional example dependencies and tooling.

@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a complete Jaffle Shop example set: CSV data generation, DuckDB schema and ingestion scripts, a notebook demonstrating YAML-backed SLayer ingestion, integration tests to validate data and notebook execution, project dependency updates, and docs updated to use tests/integration/ paths.

Changes

  • Database Schema (docs/examples/jaffle_shop_schema.sql): New DuckDB schema creating eight tables with primary keys and FK relationships used by examples and tests.
  • Data generation & DuckDB ingestion (docs/examples/jaffle_shop_duckdb.py): New CLI-style script that generates Jaffle Shop CSVs via jafgen, creates the schema, loads CSVs into DuckDB (converting cent columns to dollars), and runs verification queries; exposes helper functions used by tests.
  • SLayer ingestion example (docs/examples/ingest.py): New script that ensures a DuckDB DB exists, configures YAMLStorage, persists a DatasourceConfig, runs ingest_datasource, sets default time dimensions for specific models, saves models, and prints discovery reports.
  • Notebook example (docs/examples/jaffle_shop_ingest.ipynb): New Jupyter notebook demonstrating the full pipeline: data generation, DuckDB loading, SLayer ingestion into YAML storage, model inspection, and example semantic-layer queries.
  • Integration tests (tests/integration/test_jaffle_shop_duckdb.py, tests/integration/test_jaffle_shop_notebook.py, tests/integration/test_integration.py): New integration tests for the DuckDB example and notebook execution; a fixture to build the test DB; assertions for row counts, FK integrity, and join/aggregate sanity checks; docstring path update in test_integration.py.
  • Docs & developer guidance (CLAUDE.md, docs/development.md): Updated docs to reference the new tests/integration/ layout and revised integration test commands; removed the old package-structure section.
  • Project config / deps (pyproject.toml): Added optional jafgen (git dependency) and a new examples extra (jafgen, duckdb, duckdb-engine); added jafgen to the all extra; added nbclient, nbformat, and ipykernel to dev deps.

Sequence Diagram(s)

sequenceDiagram
    participant Gen as Data Generator (jafgen)
    participant Temp as Temp CSV Dir
    participant DB as DuckDB
    participant YAML as YAMLStorage
    participant SLayer as SLayer Ingest
    participant User as User/Console

    Gen->>Temp: generate CSVs
    Temp->>DB: load schema & ingest CSVs (convert cents→dollars)
    DB->>DB: run verification queries
    DB->>YAML: provide connection / DatasourceConfig
    YAML->>SLayer: persist datasource config
    SLayer->>DB: ingest datasource, discover models
    SLayer->>YAML: persist discovered models (apply time dims)
    SLayer->>User: print discovery report

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • AivanF

Poem

🐰 I hopped through CSV rows, crunchy and neat,
DuckDB warmed the data into a treat.
YAML shelves hold models tidy and bright,
I nudged time dims under the moonlit byte.
Jaffle Shop sings — carrots, queries, delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 38.89%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title references the main objective (Jaffle Shop to DuckDB ingestion notebook) but is somewhat verbose and includes an internal reference (Egor/dev 1212) that lacks clarity for external audiences.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@AivanF
Collaborator

AivanF commented Apr 6, 2026

Let's move all the integration tests into a separate folder

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
docs/examples/jaffle_shop_schema.sql (1)

13-13: Prefer fixed-point types for currency columns.

Using DOUBLE for monetary values can introduce rounding drift in sums and comparisons.

💡 Proposed fix
-    tax_rate DOUBLE NOT NULL
+    tax_rate DECIMAL(6,4) NOT NULL

-    price DOUBLE NOT NULL,
+    price DECIMAL(12,2) NOT NULL,

-    subtotal DOUBLE NOT NULL,
-    tax_paid DOUBLE NOT NULL,
-    order_total DOUBLE NOT NULL
+    subtotal DECIMAL(12,2) NOT NULL,
+    tax_paid DECIMAL(12,2) NOT NULL,
+    order_total DECIMAL(12,2) NOT NULL

-    cost DOUBLE NOT NULL,
+    cost DECIMAL(12,2) NOT NULL,

Also applies to: 20-20, 29-31, 44-44

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/examples/jaffle_shop_schema.sql` at line 13, Replace floating-point
DOUBLE types used for monetary values with a fixed-point type: change the
tax_rate column definition (and any other currency-related columns flagged
elsewhere) from DOUBLE to a DECIMAL/NUMERIC with appropriate precision and scale
(for example DECIMAL(10,4) or NUMERIC(10,4)) to avoid rounding drift; update the
CREATE TABLE column definitions (e.g., the tax_rate declaration and the other
flagged DOUBLE columns) to use the chosen DECIMAL/NUMERIC type and ensure any
INSERTs/SELECTs or schema docs reflect the new type.
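The rounding drift the reviewer warns about is easy to demonstrate outside DuckDB. A minimal Python sketch (stdlib only; this is an illustration of the general float-vs-fixed-point issue, not code from the PR):

```python
from decimal import Decimal

# Summing a repeated $0.10 charge 1000 times: binary floats accumulate
# representation error, while fixed-point Decimal (the analogue of
# DuckDB's DECIMAL(12,2)) stays exact.
float_total = sum(0.10 for _ in range(1000))
decimal_total = sum(Decimal("0.10") for _ in range(1000))

print(float_total)                         # close to, but not exactly, 100.0
print(decimal_total)                       # exactly 100.00
print(float_total == 100.0)                # False with IEEE-754 doubles
print(decimal_total == Decimal("100.00"))  # True
```

The same drift appears in SQL SUM() over DOUBLE columns, which is why DECIMAL with explicit precision and scale is the safer choice for order_total-style fields.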
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/examples/ingest.py`:
- Around line 41-46: The calls to generate_data, create_schema, and load_data
use positional arguments for functions that accept multiple parameters; update
these invocations to use keyword arguments so parameters are explicit and follow
the guideline — e.g., call generate_data(work_dir=WORK_DIR, years=years),
create_schema(conn=conn, schema_file=SCHEMA_FILE), and load_data(conn=conn,
data_dir=data_dir) (referencing the generate_data, create_schema, and load_data
symbols to locate the calls).
- Around line 44-47: The DuckDB connection opened via duckdb.connect (assigned
to conn) can leak if create_schema or load_data throws; wrap the calls that use
conn in a try/finally (or use a context manager) so that conn.close() is always
executed: ensure conn is closed in the finally block after calling
create_schema(conn, SCHEMA_FILE) and load_data(conn, data_dir), or use with
duckdb.connect(DB_PATH) as conn to guarantee cleanup.

In `@docs/examples/jaffle_shop_duckdb.py`:
- Around line 41-44: Calls to functions with multiple parameters should use
keyword arguments: replace the positional call subprocess.run(cmd,
cwd=output_dir, check=True) with subprocess.run(args=cmd, cwd=output_dir,
check=True) and update the generate_data(...) and load_data(...) invocations to
pass named parameters (e.g., years=..., days=..., output_dir=...) instead of
relying on positional ordering so each argument is explicit (refer to
subprocess.run, generate_data, and load_data in the diff).
- Around line 139-171: Wrap the sequence that uses the DuckDB connection (conn)
— the calls to create_schema(conn), load_data(conn, data_dir), verify(conn) and
the subsequent printing — in a try/finally so conn.close() is always executed;
specifically open conn = duckdb.connect(db_path) then try: perform the prints,
create_schema, load_data, verify and printing of results, and finally: call
conn.close(); if you want to preserve original error behavior re-raise the
exception after cleaning up.

In `@pyproject.toml`:
- Around line 54-55: The dev dependency lists in pyproject.toml are missing the
notebook runtime packages required by tests; update the extras so that nbclient
and nbformat are included (add "nbclient" and "nbformat" to the appropriate dev
dependencies and to the "all" extra) so tests/test_jaffle_shop_notebook.py
(which calls pytest.importorskip) will run under poetry install -E all; locate
and modify the entries that define examples and all (the "examples = [...]" and
"all = [...]" lists) to include these two packages.

In `@tests/test_jaffle_shop_duckdb.py`:
- Line 16: The test import includes an unused symbol LOAD_ORDER which triggers
Ruff F401; remove LOAD_ORDER from the import list so the statement only imports
TABLE_NAMES, create_schema, generate_data, load_data, and verify (i.e., update
the import in tests/test_jaffle_shop_duckdb.py to drop LOAD_ORDER).
- Line 23: The fixture calls use positional arguments for multi-parameter
helpers (notably generate_data) — update those calls to pass arguments by name
instead of position (e.g., pass the tmpdir path as the parameter name for the
path argument and use years= and days= for generate_data), and do the same for
the other multi-parameter helper calls in this fixture (the two other helper
invocations referenced in the comment) so they follow the project rule "use
keyword arguments for functions with more than 1 parameter."

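The connection-cleanup and keyword-argument points from the prompts above combine into one pattern. A hedged sketch, using stdlib sqlite3 as a stand-in (duckdb.connect supports the same try/finally usage); create_schema and load_data here are simplified placeholders for the example's helpers, whose real signatures differ:

```python
import sqlite3  # stand-in for duckdb; duckdb connections close the same way


def create_schema(conn: sqlite3.Connection, schema_sql: str) -> None:
    """Apply schema DDL (placeholder for the example's create_schema)."""
    conn.executescript(schema_sql)


def load_data(conn: sqlite3.Connection, rows: list) -> None:
    """Load rows (placeholder for the example's CSV loader)."""
    conn.executemany("INSERT INTO orders (id) VALUES (?)", rows)


def build_database(db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    try:
        # Keyword arguments make multi-parameter calls explicit,
        # per the project rule flagged in the review.
        create_schema(conn=conn, schema_sql="CREATE TABLE orders (id INTEGER)")
        load_data(conn=conn, rows=[(1,), (2,), (3,)])
        (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        return count
    finally:
        conn.close()  # runs even if schema creation or loading raises


print(build_database(":memory:"))  # 3
```

The try/finally guarantees the connection is released on any exception; with duckdb, `with duckdb.connect(db_path) as conn:` achieves the same cleanup more concisely.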

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a945f3f-9c33-4650-a637-6b63919ff061

📥 Commits

Reviewing files that changed from the base of the PR and between 96eecb8 and 2df8612.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • docs/examples/ingest.py
  • docs/examples/jaffle_shop_duckdb.py
  • docs/examples/jaffle_shop_ingest.ipynb
  • docs/examples/jaffle_shop_schema.sql
  • pyproject.toml
  • tests/test_jaffle_shop_duckdb.py
  • tests/test_jaffle_shop_notebook.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_jaffle_shop_notebook.py`:
- Around line 23-29: The call to nbformat.read and the NotebookClient
constructor use positional args; change nbformat.read(f, as_version=4) to use
the fp keyword (nbformat.read(fp=f, as_version=4)) and change
nbclient.NotebookClient(nb, timeout=600, kernel_name="python3",
resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}}) to use the nb
keyword for the notebook argument (nbclient.NotebookClient(nb=nb, timeout=600,
kernel_name="python3", resources={"metadata": {"path":
os.path.dirname(NOTEBOOK_PATH)}})) so both multi-parameter function calls use
explicit keyword arguments.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e52a02eb-1860-4ae9-aecd-cc80fd99fd35

📥 Commits

Reviewing files that changed from the base of the PR and between 2df8612 and af7418c.

📒 Files selected for processing (8)
  • CLAUDE.md
  • docs/development.md
  • tests/integration/__init__.py
  • tests/integration/test_integration.py
  • tests/integration/test_integration_duckdb.py
  • tests/integration/test_integration_postgres.py
  • tests/integration/test_jaffle_shop_duckdb.py
  • tests/integration/test_jaffle_shop_notebook.py
✅ Files skipped from review due to trivial changes (3)
  • tests/integration/test_integration.py
  • docs/development.md
  • CLAUDE.md

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/integration/test_jaffle_shop_notebook.py (1)

22-31: Consider using context managers for proper resource cleanup.

Two observations:

  1. Missing encoding (line 22): open() without explicit encoding may behave inconsistently across platforms. Use encoding="utf-8" for Jupyter notebooks.

  2. Kernel cleanup (lines 25-31): NotebookClient supports context manager protocol. If execute() raises an exception or the test fails, the kernel process may be left orphaned. The related test fixture in test_jaffle_shop_duckdb.py (context snippet 2) demonstrates proper cleanup with explicit teardown.

♻️ Proposed refactor using context managers
-    with open(NOTEBOOK_PATH) as f:
+    with open(NOTEBOOK_PATH, encoding="utf-8") as f:
         nb = nbformat.read(fp=f, as_version=4)
 
-    client = nbclient.NotebookClient(
+    with nbclient.NotebookClient(
         nb=nb,
         timeout=600,
         kernel_name="python3",
         resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}},
-    )
-    client.execute()
+    ) as client:
+        client.execute()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_jaffle_shop_notebook.py` around lines 22 - 31, Open
the notebook file using open(NOTEBOOK_PATH, encoding="utf-8") and pass the file
handle to nbformat.read to avoid platform-dependent decoding, then instantiate
nbclient.NotebookClient using its context manager (e.g., with
nbclient.NotebookClient(nb=nb, timeout=600, kernel_name="python3",
resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}}) as client:)
and call client.execute() inside that with-block so the kernel is always cleaned
up on success or exception; reference NOTEBOOK_PATH, nbformat.read,
nbclient.NotebookClient, and client.execute to locate where to apply these
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2b47999-07de-4ca9-bc78-0bca7c9ac0fc

📥 Commits

Reviewing files that changed from the base of the PR and between 0571d73 and 7a19cb2.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • docs/examples/jaffle_shop_duckdb.py
  • pyproject.toml
  • tests/integration/test_jaffle_shop_notebook.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/examples/jaffle_shop_duckdb.py
  • pyproject.toml

…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
…SQL and query-level dimension/measure references:

  1. Auto-convert multi-dot references in model SQL snippets
    Model filters, dimension sql, and measure sql fields are SQL snippets that must use __ alias syntax for multi-hop join paths. Multi-dot references (e.g., customers.regions.name) are now auto-converted to the correct form (customers__regions.name) with a warning, instead of silently producing invalid SQL.
  2. Reject __ in entity names
    Model, dimension, measure, and query names now reject double underscores, which are reserved for join path aliases in generated SQL.
  3. Resolve query filter columns through model dimensions
    Query-level filters now resolve column references through model dimensions and measures rather than passing them as raw SQL. Bare names are looked up on the source model; dotted names walk the join graph to find the target model and use the dimension's sql expression. This fixes multi-hop filters like "customers.regions.name == 'US'" which previously produced invalid SQL.
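The behavior described in items 1 and 2 can be illustrated with a small sketch. This is not the project's actual implementation — the regex and the names rewrite_multi_dot and validate_name are hypothetical — but it shows the documented rules: a multi-dot reference collapses its join path into a __ alias, and entity names containing __ are rejected:

```python
import re

# A multi-dot reference has at least three dot-separated segments:
# everything before the final column name is a join path.
MULTI_DOT = re.compile(r"\b([a-z_]+(?:\.[a-z_]+){2,})\b")


def rewrite_multi_dot(sql: str) -> str:
    """Collapse join paths: customers.regions.name -> customers__regions.name."""
    def repl(match: re.Match) -> str:
        *path, column = match.group(1).split(".")
        return "__".join(path) + "." + column
    return MULTI_DOT.sub(repl, sql)


def validate_name(name: str) -> str:
    """Reject names containing '__', which is reserved for join path aliases."""
    if "__" in name:
        raise ValueError(f"'__' is reserved for join path aliases: {name!r}")
    return name


print(rewrite_multi_dot("customers.regions.name = 'US'"))
# customers__regions.name = 'US'
print(rewrite_multi_dot("orders.total > 10"))  # two-part refs are untouched
```

Single-hop references like orders.total stay as-is; only paths of three or more segments are rewritten, matching the multi-hop filter fix described above.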
…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook

# Conflicts:
#	poetry.lock
#	pyproject.toml
@AivanF AivanF self-requested a review April 8, 2026 11:20
@ZmeiGorynych ZmeiGorynych merged commit dcc9eb0 into main Apr 8, 2026
3 checks passed


2 participants