fix(wren-ai-service): make DDLChunker synchronous to fix asyncio Task passed to embedding node by octo-patch · Pull Request #2194 · Canner/WrenAI

octo-patch · 2026-04-19T04:44:33Z

Problem

When Hamilton's AsyncDriver executes the DB schema indexing DAG, it wraps async nodes in asyncio Tasks. Under complex MDL schemas (many tables with many relationships), the async chunk node's Task was sometimes passed unawaited to the downstream embedding node.

As a result, the embedding node received an asyncio Task object (e.g. "<Task finished name='Task-1479' coro=<AsyncGraphAd...") instead of the actual {"documents": [...]} dict. embedder.run() then sent invalid content to the embedding endpoint, and Ollama returned a 400 Bad Request.

The bug was consistently reproducible with complex MDL schemas (20+ tables, 30+ relationships) but did not appear with simpler schemas, matching the issue description.
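The symptom can be illustrated with plain asyncio. The sketch below does not reproduce the Hamilton scheduling issue itself; it only shows what a downstream consumer sees when it is handed a Task rather than the Task's result. The `embed_input` helper and the document shape are hypothetical stand-ins, not code from the repository.

```python
import asyncio
from typing import Any, Dict, Tuple


async def chunk() -> Dict[str, Any]:
    # Stand-in for the async chunk node; the real node builds DDL documents.
    return {"documents": [{"content": "CREATE TABLE t (id INT)"}]}


def embed_input(upstream: Any) -> str:
    # Hypothetical downstream step: uses the dict if it gets one,
    # otherwise stringifies whatever object it was handed.
    if isinstance(upstream, dict):
        return upstream["documents"][0]["content"]
    return str(upstream)


async def demo() -> Tuple[str, str]:
    task = asyncio.create_task(chunk())
    await asyncio.sleep(0)              # give the task a chance to finish
    leaked = embed_input(task)          # Task object passed along as-is
    resolved = embed_input(await task)  # result passed along after awaiting
    return leaked, resolved


leaked, resolved = asyncio.run(demo())
print(leaked)    # a Task repr string, not document content
print(resolved)  # the actual document content
```

Even though the Task has already finished, passing the Task object itself still yields its repr rather than the dict it wraps, which matches the kind of payload described above.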

Solution

Make DDLChunker.run() and its internal helpers synchronous, eliminating the async Task from the Hamilton DAG. This matches the pattern used by all other indexing pipelines (HistoricalQuestion, TableDescription, ProjectMeta, SqlPairs) where the chunk node is already synchronous.

The async machinery in _model_preprocessor (using asyncio.gather) was unnecessary because:

- MODEL_PREPROCESSORS is empty by default, so the await in the dict comprehension never executes
- All actual work (string manipulation, dict building) is CPU-bound and has no I/O to parallelize

Changes:

- DDLChunker.run(): async def → def, removes await
- DDLChunker._get_ddl_commands(): async def → def, removes await
- DDLChunker._model_preprocessor(): async def → def, removes asyncio.gather
- DDLChunker._preprocessor() (inner): async def → def, removes await in dict comprehension
- chunk() Hamilton node: async def → def, removes await
- Removes now-unused import asyncio
- Updates tests to call chunker.run() synchronously
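A minimal sketch of what the synchronous shape might look like, assuming a preprocessor registry keyed by name. Only the `MODEL_PREPROCESSORS` name comes from this PR; the function names, value type, and returned dict layout are illustrative assumptions and not the actual code in db_schema.py.

```python
from typing import Any, Callable, Dict, List

# Empty by default, per the PR description. The callable signature is assumed.
MODEL_PREPROCESSORS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def preprocess_model(model: Dict[str, Any]) -> Dict[str, Any]:
    # Plain calls in a dict comprehension replace the awaited version.
    # With an empty registry the comprehension body never runs.
    addition = {key: fn(model) for key, fn in MODEL_PREPROCESSORS.items()}
    return {"name": model.get("name", ""), "addition": addition}


def preprocess_models(models: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # A list comprehension replaces asyncio.gather over per-model coroutines.
    return [preprocess_model(m) for m in models]


result = preprocess_models([{"name": "orders"}, {"name": "customers"}])
print(result)
```

Because every step is CPU-bound, running the models one after another loses no concurrency compared with gathering coroutines that never yield to the event loop.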

Testing

- All existing unit tests in test_db_schema.py updated and pass with the sync interface
- The test_pipeline_run integration test (which uses the full DBSchema Hamilton pipeline) continues to work since the downstream embedding, clean, and write nodes remain async

Summary by CodeRabbit

Refactor
- Converted the database schema indexing pipeline from asynchronous to synchronous execution, simplifying the DDL chunking and preprocessing flow while preserving indexing behavior.
Tests
- Updated test suite to match the synchronous pipeline changes, converting async tests to synchronous equivalents.

…to embedding node

When Hamilton's AsyncDriver executes the indexing DAG, it wraps async nodes in asyncio Tasks. Under complex MDL schemas with many relationships, the async chunk node's Task was being passed unawaited to the downstream embedding node instead of the actual dict result, causing the embedder to receive an asyncio Task repr string rather than the document chunks.

This makes DDLChunker.run() and its helpers synchronous, matching the pattern used by all other indexing pipelines (historical_question, table_description, project_meta). The async machinery in _model_preprocessor was unnecessary since MODEL_PREPROCESSORS is empty by default and all helper operations are CPU-bound string manipulations.

Update tests to call chunker.run() synchronously accordingly.

Fixes Canner#2138

coderabbitai · 2026-04-19T04:44:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info


📥 Commits

Reviewing files that changed from the base of the PR and between 166e4af and 33999d1.

📒 Files selected for processing (1)

wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py

Walkthrough

Convert the DB schema DDL chunking pipeline from asynchronous to synchronous execution by removing async/await from DDLChunker methods and the chunk pipeline stage, and update tests to call the synchronous APIs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| DB Schema Pipeline Synchronization `wren-ai-service/src/pipelines/indexing/db_schema.py` | Changed `DDLChunker.run`, `DDLChunker._model_preprocessor`, `DDLChunker._get_ddl_commands`, and the `chunk` stage from `async def` to `def`; removed `asyncio.gather` and replaced awaited helper calls with direct synchronous calls. |
| Test Updates (sync conversion) `wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py` | Converted asyncio tests to synchronous: removed `@pytest.mark.asyncio`/`async def` where applicable and replaced `await chunker.run(...)` with `chunker.run(...)` (one test still awaits `pipe.run(...)`). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled at async threads all night,

Now syncy carrots gleam in morning light.
No Task-shaped shadows left to fear,
Chunks are plain text — the path is clear. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(wren-ai-service): make DDLChunker synchronous to fix asyncio Task passed to embedding node' accurately describes the main change: converting DDLChunker from async to synchronous to resolve the issue where asyncio Tasks were incorrectly passed to the embedding node. |
| Linked Issues check | ✅ Passed | The PR addresses all coding requirements from issue `#2138`: converting DDLChunker.run(), _get_ddl_commands(), _model_preprocessor(), and chunk() from async to sync functions, removing asyncio.gather and await statements, updating call sites, and updating tests accordingly. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: the PR only modifies DDLChunker and related test functions to convert from async to sync, directly addressing issue `#2138` without introducing unrelated modifications. |


coderabbitai

🧹 Nitpick comments (1)

wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py (1)
11-547: Optional: drop async/@pytest.mark.asyncio from chunker-only tests.

After this change, every test_* from test_empty_mdl through test_metric contains no await and only exercises the now-synchronous chunker.run(...). The async def signature and @pytest.mark.asyncio marker are redundant for these and can be converted to plain def to make the sync contract of DDLChunker.run explicit in the tests. test_pipeline_run (Line 550) must stay async since it awaits pipe.run(...) at Line 619.
♻️ Example diff for one test (apply the same pattern to the other chunker-only tests)
```diff
-@pytest.mark.asyncio
-async def test_empty_mdl():
+def test_empty_mdl():
     chunker = DDLChunker()
     mdl = {"models": [], "views": [], "relationships": [], "metrics": []}

     document = chunker.run(mdl, column_batch_size=1)
     assert document == {"documents": []}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py` around
lines 11 - 547, The tests that only call the synchronous DDLChunker.run
(test_empty_mdl, test_single_model, test_multiple_models,
test_column_is_primary_key, test_column_with_properties,
test_column_with_nested_columns, test_column_with_calculated_property,
test_column_with_relationship, test_column_batch_size, test_view,
test_view_with_properties, test_metric) should be converted from async def +
`@pytest.mark.asyncio` to plain def without the marker; remove the
`@pytest.mark.asyncio` decorator and change each async def test_* to def so they
clearly reflect that DDLChunker.run is synchronous, but leave test_pipeline_run
(which awaits pipe.run(...)) as async with its marker intact.


ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between b6d5932 and 166e4af.

📒 Files selected for processing (2)

wren-ai-service/src/pipelines/indexing/db_schema.py
wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py

DDLChunker.run() is now synchronous, so the chunker test cases no longer need pytest.mark.asyncio or async def. Only test_pipeline_run keeps async because it still awaits DBSchema.run.

octo-patch · 2026-04-19T05:27:39Z

Thanks for the careful review @coderabbitai!

Pushed 33999d1a, which drops @pytest.mark.asyncio and async def from the chunker-only tests now that DDLChunker.run() is synchronous. test_pipeline_run keeps both the marker and async def because it still awaits DBSchema.run(...).

Please re-review when you have a chance.

coderabbitai · 2026-04-19T05:28:27Z


🧩 Analysis chain

Let me pull up the current state of the files to verify everything looks good.

🏁 Script executed:

```shell
#!/bin/bash
cat wren-ai-service/tests/pytest/pipelines/indexing/test_db_schema.py
```

Length of output: 18048

🏁 Script executed:

```shell
#!/bin/bash
cat wren-ai-service/src/pipelines/indexing/db_schema.py
```

Length of output: 12861

@octo-patch: The changes in commit 33999d1 look great! Here's a summary of what I verified:

Test file (test_db_schema.py) ✅

- All 12 chunker-only tests (test_empty_mdl through test_metric) are now plain synchronous def with no @pytest.mark.asyncio decorator, which is correct since DDLChunker.run() is now synchronous.
- test_pipeline_run correctly retains @pytest.mark.asyncio + async def + await pipe.run(...) because it exercises the full DBSchema async pipeline.

Source file (db_schema.py) ✅

- DDLChunker.run(), _model_preprocessor(), _get_ddl_commands(), and the chunk() Hamilton node are all plain synchronous def, with no await and no asyncio.
- The downstream Hamilton nodes (embedding, clean, write) and DBSchema.run() correctly remain async def, so the AsyncDriver graph still works end-to-end.
- No import asyncio remains in the file.

The sync/async boundary is clean and the fix directly addresses the root cause: Hamilton's AsyncDriver no longer wraps chunk() output in an unawaited asyncio.Task before it reaches the embedding node. LGTM 🐇
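The boundary described above can be sketched with plain asyncio. This is an illustrative stand-in and does not use Hamilton's API: the function bodies, the `run` wrapper, and the returned values are assumptions chosen only to show a synchronous stage feeding an asynchronous one.

```python
import asyncio
from typing import Any, Dict, List


def chunk(mdl: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    # Synchronous stage: returns a plain dict, never a coroutine or Task.
    return {"documents": [{"content": m["name"]} for m in mdl.get("models", [])]}


async def embedding(chunk_result: Dict[str, List[Dict[str, str]]]) -> List[int]:
    # Async stage: the sleep stands in for a network call to an embedding endpoint.
    await asyncio.sleep(0)
    return [len(doc["content"]) for doc in chunk_result["documents"]]


async def run(mdl: Dict[str, Any]) -> List[int]:
    # Only the async stage is awaited; the sync stage is called directly,
    # so there is no intermediate awaitable that could leak downstream.
    return await embedding(chunk(mdl))


lengths = asyncio.run(run({"models": [{"name": "orders"}, {"name": "users"}]}))
print(lengths)
```

Since `chunk` returns its value directly, the only awaitable in the chain is the one that actually performs I/O.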


test(wren-ai-service): drop async/asyncio from chunker-only tests

33999d1


octo-patch wants to merge 2 commits into Canner:main from octo-patch:fix/issue-2138-sync-ddl-chunker
