Egor/dev 1212 jaffle shop to duckdb ingestion first notebook #12
Conversation
…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough
Adds a complete Jaffle Shop example set: CSV data generation, DuckDB schema and ingestion scripts, a notebook demonstrating YAML-backed SLayer ingestion, integration tests to validate data and notebook execution, project dependency updates, and docs updated to use
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Gen as Data Generator (jafgen)
    participant Temp as Temp CSV Dir
    participant DB as DuckDB
    participant YAML as YAMLStorage
    participant SLayer as SLayer Ingest
    participant User as User/Console
    Gen->>Temp: generate CSVs
    Temp->>DB: load schema & ingest CSVs (convert cents→dollars)
    DB->>DB: run verification queries
    DB->>YAML: provide connection / DatasourceConfig
    YAML->>SLayer: persist datasource config
    SLayer->>DB: ingest datasource, discover models
    SLayer->>YAML: persist discovered models (apply time dims)
    SLayer->>User: print discovery report
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Let's move all the integration tests into a separate folder
Actionable comments posted: 7
🧹 Nitpick comments (1)
docs/examples/jaffle_shop_schema.sql (1)
13-13: Prefer fixed-point types for currency columns. Using DOUBLE for monetary values can introduce rounding drift in sums and comparisons.
💡 Proposed fix
```diff
- tax_rate DOUBLE NOT NULL
+ tax_rate DECIMAL(6,4) NOT NULL

- price DOUBLE NOT NULL,
+ price DECIMAL(12,2) NOT NULL,

- subtotal DOUBLE NOT NULL,
- tax_paid DOUBLE NOT NULL,
- order_total DOUBLE NOT NULL
+ subtotal DECIMAL(12,2) NOT NULL,
+ tax_paid DECIMAL(12,2) NOT NULL,
+ order_total DECIMAL(12,2) NOT NULL

- cost DOUBLE NOT NULL,
+ cost DECIMAL(12,2) NOT NULL,
```
Also applies to: 20-20, 29-31, 44-44
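The drift the reviewer warns about is easy to reproduce with the standard library alone; this snippet is illustrative and not part of the PR:

```python
from decimal import Decimal

# Summing 1000 * $0.10: binary floats cannot represent 0.10 exactly,
# so the accumulated total drifts away from 100.00.
float_total = sum([0.10] * 1000)
decimal_total = sum([Decimal("0.10")] * 1000, start=Decimal("0"))

print(float_total == 100.0)                # False: the float sum has drifted
print(decimal_total == Decimal("100.00"))  # True: Decimal keeps exact cents
```

This is the same trade-off DECIMAL vs DOUBLE makes inside the database: fixed-point keeps cent-level sums exact.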
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/examples/jaffle_shop_schema.sql` at line 13, Replace floating-point DOUBLE types used for monetary values with a fixed-point type: change the tax_rate column definition (and any other currency-related columns flagged elsewhere) from DOUBLE to a DECIMAL/NUMERIC with appropriate precision and scale (for example DECIMAL(10,4) or NUMERIC(10,4)) to avoid rounding drift; update the CREATE TABLE column definitions (e.g., the tax_rate declaration and the other flagged DOUBLE columns) to use the chosen DECIMAL/NUMERIC type and ensure any INSERTs/SELECTs or schema docs reflect the new type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/examples/ingest.py`:
- Around line 41-46: The calls to generate_data, create_schema, and load_data
use positional arguments for functions that accept multiple parameters; update
these invocations to use keyword arguments so parameters are explicit and follow
the guideline — e.g., call generate_data(work_dir=WORK_DIR, years=years),
create_schema(conn=conn, schema_file=SCHEMA_FILE), and load_data(conn=conn,
data_dir=data_dir) (referencing the generate_data, create_schema, and load_data
symbols to locate the calls).
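The keyword-argument guideline can be sketched like this; the generate_data signature below is an assumed stand-in, not the PR's actual function:

```python
from pathlib import Path

def generate_data(work_dir: Path, years: int, days: int = 0) -> Path:
    # Stand-in: pretend to emit CSVs and return the output directory.
    return work_dir / f"jaffle-{years}y-{days}d"

# Positional: the reader must remember the parameter order.
out_positional = generate_data(Path("/tmp/work"), 2, 30)

# Keyword: every argument is explicit and safe against reordering.
out_keyword = generate_data(work_dir=Path("/tmp/work"), years=2, days=30)

assert out_positional == out_keyword
```

The two calls are equivalent today, but only the keyword form stays correct if the parameter order ever changes.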
- Around line 44-47: The DuckDB connection opened via duckdb.connect (assigned
to conn) can leak if create_schema or load_data throws; wrap the calls that use
conn in a try/finally (or use a context manager) so that conn.close() is always
executed: ensure conn is closed in the finally block after calling
create_schema(conn, SCHEMA_FILE) and load_data(conn, data_dir), or use with
duckdb.connect(DB_PATH) as conn to guarantee cleanup.
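The try/finally pattern the comment asks for looks roughly like this; stdlib sqlite3 stands in for duckdb so the sketch is self-contained, and the helper bodies are invented for illustration:

```python
import sqlite3  # stand-in for duckdb in this sketch

def create_schema(conn):
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

def load_data(conn):
    conn.execute("INSERT INTO customers VALUES (1, 'Alice')")

conn = sqlite3.connect(":memory:")
try:
    create_schema(conn)
    load_data(conn)
    row_count = conn.execute("SELECT count(*) FROM customers").fetchone()[0]
finally:
    conn.close()  # runs whether the calls above succeed or raise

print(row_count)
```

If create_schema or load_data raises, the finally block still closes the connection, which is exactly the leak the review flags.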
In `@docs/examples/jaffle_shop_duckdb.py`:
- Around line 41-44: Calls to functions with multiple parameters should use
keyword arguments: replace the positional call subprocess.run(cmd,
cwd=output_dir, check=True) with subprocess.run(args=cmd, cwd=output_dir,
check=True) and update the generate_data(...) and load_data(...) invocations to
pass named parameters (e.g., years=..., days=..., output_dir=...) instead of
relying on positional ordering so each argument is explicit (refer to
subprocess.run, generate_data, and load_data in the diff).
- Around line 139-171: Wrap the sequence that uses the DuckDB connection (conn)
— the calls to create_schema(conn), load_data(conn, data_dir), verify(conn) and
the subsequent printing — in a try/finally so conn.close() is always executed;
specifically open conn = duckdb.connect(db_path) then try: perform the prints,
create_schema, load_data, verify and printing of results, and finally: call
conn.close(); if you want to preserve original error behavior re-raise the
exception after cleaning up.
In `@pyproject.toml`:
- Around line 54-55: The dev dependency lists in pyproject.toml are missing the
notebook runtime packages required by tests; update the extras so that nbclient
and nbformat are included (add "nbclient" and "nbformat" to the appropriate dev
dependencies and to the "all" extra) so tests/test_jaffle_shop_notebook.py
(which calls pytest.importorskip) will run under poetry install -E all; locate
and modify the entries that define examples and all (the "examples = [...]" and
"all = [...]" lists) to include these two packages.
In `@tests/test_jaffle_shop_duckdb.py`:
- Line 16: The test import includes an unused symbol LOAD_ORDER which triggers
Ruff F401; remove LOAD_ORDER from the import list so the statement only imports
TABLE_NAMES, create_schema, generate_data, load_data, and verify (i.e., update
the import in tests/test_jaffle_shop_duckdb.py to drop LOAD_ORDER).
- Line 23: The fixture calls use positional arguments for multi-parameter
helpers (notably generate_data) — update those calls to pass arguments by name
instead of position (e.g., pass the tmpdir path as the parameter name for the
path argument and use years= and days= for generate_data), and do the same for
the other multi-parameter helper calls in this fixture (the two other helper
invocations referenced in the comment) so they follow the project rule "use
keyword arguments for functions with more than 1 parameter."
---
Nitpick comments:
In `@docs/examples/jaffle_shop_schema.sql`:
- Line 13: Replace floating-point DOUBLE types used for monetary values with a
fixed-point type: change the tax_rate column definition (and any other
currency-related columns flagged elsewhere) from DOUBLE to a DECIMAL/NUMERIC
with appropriate precision and scale (for example DECIMAL(10,4) or
NUMERIC(10,4)) to avoid rounding drift; update the CREATE TABLE column
definitions (e.g., the tax_rate declaration and the other flagged DOUBLE
columns) to use the chosen DECIMAL/NUMERIC type and ensure any INSERTs/SELECTs
or schema docs reflect the new type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1a945f3f-9c33-4650-a637-6b63919ff061
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- docs/examples/ingest.py
- docs/examples/jaffle_shop_duckdb.py
- docs/examples/jaffle_shop_ingest.ipynb
- docs/examples/jaffle_shop_schema.sql
- pyproject.toml
- tests/test_jaffle_shop_duckdb.py
- tests/test_jaffle_shop_notebook.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/test_jaffle_shop_notebook.py`:
- Around line 23-29: The call to nbformat.read and the NotebookClient
constructor use positional args; change nbformat.read(f, as_version=4) to use
the fp keyword (nbformat.read(fp=f, as_version=4)) and change
nbclient.NotebookClient(nb, timeout=600, kernel_name="python3",
resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}}) to use the nb
keyword for the notebook argument (nbclient.NotebookClient(nb=nb, timeout=600,
kernel_name="python3", resources={"metadata": {"path":
os.path.dirname(NOTEBOOK_PATH)}})) so both multi-parameter function calls use
explicit keyword arguments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e52a02eb-1860-4ae9-aecd-cc80fd99fd35
📒 Files selected for processing (8)
- CLAUDE.md
- docs/development.md
- tests/integration/__init__.py
- tests/integration/test_integration.py
- tests/integration/test_integration_duckdb.py
- tests/integration/test_integration_postgres.py
- tests/integration/test_jaffle_shop_duckdb.py
- tests/integration/test_jaffle_shop_notebook.py
✅ Files skipped from review due to trivial changes (3)
- tests/integration/test_integration.py
- docs/development.md
- CLAUDE.md
🧹 Nitpick comments (1)
tests/integration/test_jaffle_shop_notebook.py (1)
22-31: Consider using context managers for proper resource cleanup. Two observations:
Missing encoding (line 22): open() without explicit encoding may behave inconsistently across platforms. Use encoding="utf-8" for Jupyter notebooks.
Kernel cleanup (lines 25-31): NotebookClient supports the context manager protocol. If execute() raises an exception or the test fails, the kernel process may be left orphaned. The related test fixture in test_jaffle_shop_duckdb.py (context snippet 2) demonstrates proper cleanup with explicit teardown.
♻️ Proposed refactor using context managers
```diff
- with open(NOTEBOOK_PATH) as f:
+ with open(NOTEBOOK_PATH, encoding="utf-8") as f:
      nb = nbformat.read(fp=f, as_version=4)
- client = nbclient.NotebookClient(
+ with nbclient.NotebookClient(
      nb=nb,
      timeout=600,
      kernel_name="python3",
      resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}},
- )
- client.execute()
+ ) as client:
+     client.execute()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_jaffle_shop_notebook.py` around lines 22 - 31, Open the notebook file using open(NOTEBOOK_PATH, encoding="utf-8") and pass the file handle to nbformat.read to avoid platform-dependent decoding, then instantiate nbclient.NotebookClient using its context manager (e.g., with nbclient.NotebookClient(nb=nb, timeout=600, kernel_name="python3", resources={"metadata": {"path": os.path.dirname(NOTEBOOK_PATH)}}) as client:) and call client.execute() inside that with-block so the kernel is always cleaned up on success or exception; reference NOTEBOOK_PATH, nbformat.read, nbclient.NotebookClient, and client.execute to locate where to apply these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/integration/test_jaffle_shop_notebook.py`:
- Around line 22-31: Open the notebook file using open(NOTEBOOK_PATH,
encoding="utf-8") and pass the file handle to nbformat.read to avoid
platform-dependent decoding, then instantiate nbclient.NotebookClient using its
context manager (e.g., with nbclient.NotebookClient(nb=nb, timeout=600,
kernel_name="python3", resources={"metadata": {"path":
os.path.dirname(NOTEBOOK_PATH)}}) as client:) and call client.execute() inside
that with-block so the kernel is always cleaned up on success or exception;
reference NOTEBOOK_PATH, nbformat.read, nbclient.NotebookClient, and
client.execute to locate where to apply these changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d2b47999-07de-4ca9-bc78-0bca7c9ac0fc
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- docs/examples/jaffle_shop_duckdb.py
- pyproject.toml
- tests/integration/test_jaffle_shop_notebook.py
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/examples/jaffle_shop_duckdb.py
- pyproject.toml
…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
…SQL and query-level dimension/measure references:
1. Auto-convert multi-dot references in model SQL snippets
Model filters, dimension sql, and measure sql fields are SQL snippets that must use __ alias syntax for multi-hop join paths. Multi-dot references (e.g., customers.regions.name) are now auto-converted to the correct form (customers__regions.name) with a warning, instead of silently producing invalid SQL.
2. Reject __ in entity names
Model, dimension, measure, and query names now reject double underscores, which are reserved for join path aliases in generated SQL.
3. Resolve query filter columns through model dimensions
Query-level filters now resolve column references through model dimensions and measures rather than passing them as raw SQL. Bare names are looked up on the source model; dotted names walk the join graph to find the target model and use the dimension's sql expression. This fixes multi-hop filters like "customers.regions.name == 'US'" which previously produced invalid SQL.
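The first two rules above can be sketched in a few lines; the regex and function names here are illustrative, not the project's actual implementation:

```python
import re
import warnings

# Matches a reference with two or more dots, e.g. customers.regions.name,
# splitting it into the join path (customers.regions) and the column (name).
MULTI_DOT = re.compile(
    r"\b([a-z_][a-z0-9_]*(?:\.[a-z_][a-z0-9_]*)+)\.([a-z_][a-z0-9_]*)\b"
)

def convert_multi_dot(sql: str) -> str:
    """Rewrite multi-hop refs (a.b.col) to join-alias form (a__b.col), with a warning."""
    def repl(match: re.Match) -> str:
        path, column = match.group(1), match.group(2)
        warnings.warn(f"converted multi-dot reference {match.group(0)!r}")
        return f"{path.replace('.', '__')}.{column}"
    return MULTI_DOT.sub(repl, sql)

def validate_name(name: str) -> None:
    """Reject '__' in entity names: it is reserved for join-path aliases."""
    if "__" in name:
        raise ValueError(f"invalid name {name!r}: '__' is reserved")

# Multi-hop filter is rewritten; single-dot references are left alone.
assert convert_multi_dot("customers.regions.name = 'US'") == "customers__regions.name = 'US'"
assert convert_multi_dot("orders.total > 5") == "orders.total > 5"
```

The name check is what makes the rewrite safe: because user-supplied names can never contain "__", a generated alias like customers__regions cannot collide with a real model or dimension name.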
… example notebooks.
…ev-1212-jaffle-shop-to-duckdb-ingestion-first-notebook
# Conflicts:
#   poetry.lock
#   pyproject.toml
Summary by CodeRabbit
New Features
Documentation
Tests
Chores