Conversation
…n SQL migrations for the builder - use the sql_builder instead of hardcoding db connectivity on the search
First round of advanced search on ducpy
Jorgedanisc
📋 Release Preview
Pull request overview
This PR updates the DUC SQLite schema to enhance FTS5-based search (diacritic-insensitive tokenization + prefix indexing), introduces a migration to rebuild existing FTS tables, and adds a Python (ducpy) search API with tests while ensuring Python-side schema migrations are applied when opening .duc databases.
Changes:
- Rebuild/upgrade FTS5 virtual tables to use `unicode61 remove_diacritics 2` and `prefix=...`, plus add explicit backfill (rebuild) steps.
- Add schema migration `3000002 → 3000003` and bump `PRAGMA user_version` to `3000003`.
- Add `ducpy.search` module (hybrid FTS + Python reranking) with new pytest coverage, and update `DucSQL` to apply SQL migrations on open.
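The tokenizer and prefix options adopted here can be seen end to end in a small sketch. This is an illustrative table, not the PR's schema, using Python's bundled SQLite (assumes an FTS5-enabled build):

```python
# Demonstrates the two FTS5 options the PR adopts:
#   tokenize = 'unicode61 remove_diacritics 2'  -> diacritic-insensitive matching
#   prefix = '2 3'                              -> prefix indexes for short prefixes
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE demo USING fts5("
    "label, tokenize = 'unicode61 remove_diacritics 2', prefix = '2 3')"
)
conn.execute("INSERT INTO demo(label) VALUES ('Façade élévation')")

# remove_diacritics 2 strips accents from both the indexed text and the
# query, so a plain-ASCII query matches the accented row.
exact_hit = conn.execute(
    "SELECT label FROM demo WHERE demo MATCH 'facade'").fetchone()

# 'élévation' is indexed as 'elevation', so the 3-char prefix query 'ele*'
# is served from the prefix index.
prefix_hit = conn.execute(
    "SELECT label FROM demo WHERE demo MATCH 'ele*'").fetchone()

print(exact_hit, prefix_hit)
```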
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| schema/search.sql | Adds tokenizer/prefix settings and rebuild statements for FTS tables; switches to IF NOT EXISTS for FTS tables/triggers. |
| schema/migrations/3000002_to_3000003.sql | New migration that drops/recreates FTS tables and rebuilds the indexes; bumps user_version. |
| schema/duc.sql | Bumps PRAGMA user_version to 3000003. |
| packages/ducrs/src/api/version_control.rs | Removes outdated module doc bullet. |
| packages/ducpy/src/ducpy/builders/sql_builder.py | Adds migration discovery/application to keep opened DBs current. |
| packages/ducpy/src/ducpy/search/search_elements.py | New Python search implementation and result types. |
| packages/ducpy/src/ducpy/search/__init__.py | Exposes the search API from the new package. |
| packages/ducpy/src/ducpy/__init__.py | Re-exports search helpers at top-level ducpy. |
| packages/ducpy/src/tests/src/test_search_module.py | New tests covering search behavior (exact, prefix, grouping, limit, empty query). |
```sql
-- 1. Drop old FTS tables (triggers are automatically dropped)
DROP TABLE IF EXISTS search_elements;
DROP TABLE IF EXISTS search_element_text;
DROP TABLE IF EXISTS search_element_doc;
DROP TABLE IF EXISTS search_element_model;
DROP TABLE IF EXISTS search_blocks;
```
Dropping the FTS tables will NOT drop the existing sync triggers, because the triggers are defined on elements / element_text / etc (not on the FTS tables). On an upgrade, the old trg_* triggers will still exist, so the subsequent CREATE TRIGGER trg_* statements will fail with "trigger already exists" (and/or keep stale trigger definitions if you switch to IF NOT EXISTS). Explicitly DROP TRIGGER IF EXISTS ... for each trg_* before recreating the tables/triggers, and consider fixing the misleading comment here.
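The failure mode is easy to reproduce. A minimal sketch with hypothetical table and trigger names (not the actual DUC schema): the trigger is defined on the source table, so dropping the FTS-style target table leaves it behind.

```python
# Shows that DROP TABLE only drops triggers defined ON that table; a trigger
# defined on another table survives and must be dropped explicitly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elements(id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE search_elements(label TEXT);
    CREATE TRIGGER trg_elements_ai AFTER INSERT ON elements BEGIN
        INSERT INTO search_elements(label) VALUES (new.label);
    END;
""")

conn.execute("DROP TABLE search_elements")
triggers = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'trigger'")]
print(triggers)  # the trigger survived the DROP TABLE

# The fix suggested above: drop triggers explicitly before recreating them.
conn.execute("DROP TRIGGER IF EXISTS trg_elements_ai")
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'trigger'")]
print(remaining)
```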
```python
conn.executescript(sql)
current = conn.execute("PRAGMA user_version").fetchone()[0]
```
_apply_migrations can get stuck in an infinite loop if a migration script fails to advance PRAGMA user_version (e.g., script error, missing final PRAGMA, or PRAGMA guarded by a conditional). Add a progress check after executescript (e.g., ensure the new user_version is > current and ideally equals the expected to_v) and raise a clear error if not.
Suggested change:

```python
prev_version = current
conn.executescript(sql)
new_version = conn.execute("PRAGMA user_version").fetchone()[0]
if new_version <= prev_version:
    raise RuntimeError(
        "Migration did not advance PRAGMA user_version: "
        f"attempted to migrate from {prev_version} to {to_v}, "
        f"but user_version is still {new_version}."
    )
if new_version != to_v:
    raise RuntimeError(
        "Migration set unexpected PRAGMA user_version: "
        f"attempted to migrate from {prev_version} to {to_v}, "
        f"but user_version is {new_version}."
    )
current = new_version
```
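Putting the guard in context, here is a runnable sketch of a migration loop with the suggested progress check. The migration registry is a hypothetical in-memory dict; ducpy actually discovers migration scripts on disk.

```python
# Apply numbered migrations until the target version is reached, failing
# fast if a script does not advance PRAGMA user_version.
import sqlite3

MIGRATIONS = {  # (from_version, to_version) -> SQL script (hypothetical)
    (0, 1): "CREATE TABLE a(x); PRAGMA user_version = 1;",
    (1, 2): "CREATE TABLE b(x); PRAGMA user_version = 2;",
}

def apply_migrations(conn: sqlite3.Connection, target: int) -> int:
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    while current < target:
        step = next(((f, t) for (f, t) in MIGRATIONS if f == current), None)
        if step is None:
            raise RuntimeError(f"No migration found from version {current}")
        to_v = step[1]
        conn.executescript(MIGRATIONS[step])
        new_version = conn.execute("PRAGMA user_version").fetchone()[0]
        if new_version <= current:
            # Without this check, a script that forgets its final PRAGMA
            # would leave `current` unchanged and loop forever.
            raise RuntimeError(
                f"Migration {current} -> {to_v} did not advance "
                f"user_version (still {new_version})")
        current = new_version
    return current

conn = sqlite3.connect(":memory:")
final = apply_migrations(conn, 2)
print(final)  # 2
```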
```python
@dataclass(slots=True)
class DucSearchResult:
    """Compatibility placeholder for exported result types."""


@dataclass(slots=True)
class DucElementSearchResult:
```
DucSearchResult is defined twice: first as a (empty) dataclass, then later reassigned to a DucElementSearchResult | DucFileSearchResult union type alias. The second assignment overwrites the class at runtime, which is confusing and can break consumers expecting DucSearchResult to be a class. Either remove the placeholder dataclass, or rename it (e.g., DucSearchResultBase) and keep DucSearchResult exclusively as the type alias (or vice versa).
```python
for _variant_name, _expression, variant_boost in _build_query_variants(query):
    for element in elements:
        if element.get("is_deleted"):
            continue

        element_id = element.get("id")
        element_type = element.get("type")
        if not element_id or not element_type:
            continue

        aggregate = aggregates.get(element_id)
        if aggregate is None:
            aggregate = _ElementAggregate(
                element_id=element_id,
                raw_element_type=element_type,
                label=element.get("label") or "",
                description=element.get("description"),
            )
            aggregates[element_id] = aggregate

        for field_name, source_weight in field_weights.items():
            raw_text = element.get(field_name)
            score, _similarity = _evaluate_match_text(
                query,
                raw_text,
                fts_rank=None,
                source_weight=source_weight,
                variant_boost=variant_boost,
            )
            if score > 0.0 and raw_text:
                aggregate.add_match(raw_text, score)
```
In _collect_candidates_from_parsed_duc, the loop over _build_query_variants(query) ignores the generated FTS expressions and only uses variant_boost. Because the boosts are all <= 1.0 and _evaluate_match_text doesn't depend on the expression, this repeats the same work up to 3× without affecting the final best score (the 1.0-boost pass dominates). Consider removing the variants loop for the non-SQLite path (or refactor so variants actually change the matching logic) to avoid unnecessary CPU work.
Suggested change:

```python
# For non-SQLite search, query variants do not affect matching logic,
# and the highest boost is 1.0, so we only need a single pass here.
variant_boost = 1.0
for element in elements:
    if element.get("is_deleted"):
        continue
    element_id = element.get("id")
    element_type = element.get("type")
    if not element_id or not element_type:
        continue
    aggregate = aggregates.get(element_id)
    if aggregate is None:
        aggregate = _ElementAggregate(
            element_id=element_id,
            raw_element_type=element_type,
            label=element.get("label") or "",
            description=element.get("description"),
        )
        aggregates[element_id] = aggregate
    for field_name, source_weight in field_weights.items():
        raw_text = element.get(field_name)
        score, _similarity = _evaluate_match_text(
            query,
            raw_text,
            fts_rank=None,
            source_weight=source_weight,
            variant_boost=variant_boost,
        )
        if score > 0.0 and raw_text:
            aggregate.add_match(raw_text, score)
```
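The dominance argument behind dropping the extra passes is simple arithmetic; a sketch with illustrative numbers (not values from the PR):

```python
# With per-variant boosts capped at 1.0 and matching logic independent of
# the FTS expression, the best score over all passes equals the score of
# the boost-1.0 pass alone.
base_score = 0.8                  # hypothetical _evaluate_match_text score
variant_boosts = [1.0, 0.9, 0.75]  # illustrative boosts, all <= 1.0

best_multi_pass = max(base_score * b for b in variant_boosts)
best_single_pass = base_score * 1.0
print(best_multi_pass == best_single_pass)  # True
```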
```python
asset_path = _asset_input_path(asset_name)
assert asset_path.exists(), f"Missing asset file: {asset_path}"

json_path = Path(test_output_dir+"/search_results") / f"{test_name}.json"
```
Path construction here uses string concatenation with a hard-coded / (test_output_dir + "/search_results"), which is not portable on Windows and bypasses pathlib's normalization. Prefer Path(test_output_dir) / "search_results" / f"{test_name}.json" (and avoid the double-space before the f-string).
Suggested change:

```python
json_path = Path(test_output_dir) / "search_results" / f"{test_name}.json"
```
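For reference, a self-contained sketch of the portable construction (placeholder directory and test names), showing that pathlib joins segments with the correct separator for the platform:

```python
# Join path segments with pathlib instead of concatenating strings with a
# hard-coded '/', so the result is portable and normalized.
from pathlib import Path

test_output_dir = "out"     # placeholder; supplied by the test fixture
test_name = "exact_match"   # placeholder test name

json_path = Path(test_output_dir) / "search_results" / f"{test_name}.json"
print(json_path.parts)  # ('out', 'search_results', 'exact_match.json')
```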
```python
from .search_elements import (DucElementSearchResult, DucFileSearchResult,
                              DucSearchResponse, DucSearchResult,
                              search_duc_elements)
```
The import list formatting/indentation is inconsistent here, which makes the module harder to scan and is likely to fight auto-formatters. Reformat this into a standard multi-line import (one name per line or a single line if it fits) to keep style consistent.
Suggested change:

```python
from .search_elements import (
    DucElementSearchResult,
    DucFileSearchResult,
    DucSearchResponse,
    DucSearchResult,
    search_duc_elements,
)
```
🎉 This PR is included in version 3.2.1 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.7.1 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.6.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
No description provided.