First round of advanced search on ducpy#227
Conversation
Jorgedanisc
ci: update semantic-release CI environment variables
Jorgedanisc
ci: update PR comment handling to use POST method
Jorgedanisc
ci: update PR comment API endpoint and ducjs package
fix: new release references wasm as internal
fix: remove pkg from files and update wasm paths
fix: improve delta encoding on the version graph
feat: restructure external files schema with revisions
Pull request overview
Adds an “advanced search” capability to ducpy by introducing an FTS5-backed element search pipeline (with Python-side scoring/grouping) and corresponding tests, plus updates to the SQLite search schema to improve prefix/diacritic-insensitive matching.
Changes:
- Enhanced `schema/search.sql` FTS5 configuration (tokenizer, prefix indexing, idempotent creation) and added index rebuild statements.
- Introduced the `ducpy.search` module (`search_duc_elements`, schema ensure/rebuild, result types) with hybrid SQLite+Python ranking.
- Added a dedicated pytest suite validating exact, prefix, grouping, empty-query, and limit behaviors.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| schema/search.sql | Improves FTS5 configuration (unicode61 + prefix) and adds rebuild backfill statements. |
| packages/ducpy/src/ducpy/search/search_elements.py | Implements the search API, schema bootstrap, candidate scoring, and grouped results logic. |
| packages/ducpy/src/ducpy/search/__init__.py | Exposes the search API/types from the ducpy.search package. |
| packages/ducpy/src/ducpy/__init__.py | Re-exports search helpers at the top-level ducpy namespace. |
| packages/ducpy/src/tests/src/test_search_module.py | Adds integration-style tests exercising search behaviors against real .duc assets. |
```python
def _find_schema_dir() -> Path:
    current = Path(__file__).resolve()
    for parent in current.parents:
        candidate = parent / "schema"
        if (candidate / "search.sql").exists():
            return candidate
    raise FileNotFoundError(
        "Could not locate schema/search.sql. Ensure the search module is running inside the DUC repository."
    )


def _read_search_schema_sql() -> str:
    return (_find_schema_dir() / "search.sql").read_text(encoding="utf-8")
```
_find_schema_dir() searches for schema/search.sql by walking up the filesystem and raises if not running inside the monorepo. Since ducpy is distributed via PyPI, this will break in normal installations where the workspace-level schema/ directory is not present. Consider shipping search.sql as package data (e.g., under ducpy/search/schema/) and loading it via importlib.resources, or embedding the SQL at build time, so ensure_search_schema() works outside the repo.
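A minimal sketch of the reviewer's suggestion: ship the SQL as package data and load it with `importlib.resources`, so no filesystem walk is needed. The `ducpy/search/schema/` layout is the reviewer's hypothetical example, not the PR's actual packaging.

```python
# Sketch: load search.sql as bundled package data instead of walking up
# the filesystem. Assumes search.sql is packaged under ducpy/search/schema/
# (hypothetical layout; the real package data path may differ).
from importlib import resources


def _read_search_schema_sql() -> str:
    """Read the bundled search.sql so it also works outside the monorepo."""
    return (
        resources.files("ducpy.search")      # package that ships the data file
        .joinpath("schema/search.sql")
        .read_text(encoding="utf-8")
    )
```

This requires declaring `search.sql` as package data (e.g. via `package-data` in `pyproject.toml`) so it is included in the wheel.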
```python
@dataclass(slots=True)
class DucSearchResult:
    """Compatibility placeholder for exported result types."""


@dataclass(slots=True)
class DucElementSearchResult:
    """One result row for a single element."""

    element_id: str
    element_type: str
    matches: list[str]
    score: float

    def to_dict(self) -> dict[str, Any]:
        return {
            "element_id": self.element_id,
            "element_type": self.element_type,
            "matches": self.matches,
            "score": round(self.score, 6),
        }


@dataclass(slots=True)
class DucFileSearchResult:
    """One grouped result row for repeated file-backed elements."""

    file_id: str
    element_type: str
    matches: list[str]
    score: float
    hits: int
    element_ids: list[str]

    def to_dict(self) -> dict[str, Any]:
        return {
            "file_id": self.file_id,
            "element_type": self.element_type,
            "matches": self.matches,
            "score": round(self.score, 6),
            "hits": self.hits,
            "element_ids": self.element_ids,
        }


DucSearchResult = DucElementSearchResult | DucFileSearchResult
```
DucSearchResult is first declared as an (empty) dataclass and then immediately reassigned to a union type alias. This makes the placeholder class unreachable and can confuse runtime users (e.g., isinstance(x, DucSearchResult) won’t work) and type checkers. Either remove the placeholder dataclass and declare the union using TypeAlias, or keep DucSearchResult as a real base class and have the concrete result types inherit from it.
```python
def ensure_search_schema(conn: sqlite3.Connection, rebuild: bool = False) -> None:
    """Create missing FTS search tables and triggers for a DUC database.

    Args:
        conn: Open SQLite connection for a ``.duc`` database.
        rebuild: When ``True``, reruns the FTS rebuild statements even if the
            search schema already exists.
    """
    existing_tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")
    }
    if rebuild or not _SEARCH_TABLES.issubset(existing_tables):
        conn.executescript(_read_search_schema_sql())
        conn.commit()
```
ensure_search_schema() only checks for the presence of the FTS tables, but the schema’s correctness also depends on the associated triggers existing (to keep the external-content FTS indexes in sync). If tables exist but triggers are missing (or were created under different names), this function will skip initialization and the index will silently stop updating. Consider also checking sqlite_master for the expected trigger names (or a schema version marker) before deciding to skip schema application.
```diff
 CREATE VIRTUAL TABLE IF NOT EXISTS search_elements USING fts5(
     label,
     description,
     content='elements',
-    content_rowid='rowid'
+    content_rowid='rowid',
+    tokenize='unicode61 remove_diacritics 2',
+    prefix='2 3 4 5 6 7 8 9 10'
 );
```
Using IF NOT EXISTS for the FTS virtual tables/triggers means that running this script against a database that already has an older version of these objects will not apply changes (e.g., tokenizer/prefix options or trigger definitions). If you expect this script to serve as an upgrade path (not just first-time creation), consider explicitly dropping/recreating the FTS tables and triggers (or provide a versioned migration path) so schema changes take effect.
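One way to provide that versioned migration path is to track a schema version in `PRAGMA user_version` and drop/recreate the FTS objects when it is stale, so tokenizer and prefix changes actually take effect on existing databases. The object names and version number below are illustrative, not the PR's real schema.

```python
# Sketch of a versioned upgrade path. Assumes the search objects are named
# search_elements / search_elements_{ai,ad,au}; adjust to the real schema.
import sqlite3

SEARCH_SCHEMA_VERSION = 2


def migrate_search_schema(conn: sqlite3.Connection, schema_sql: str) -> None:
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    if current >= SEARCH_SCHEMA_VERSION:
        return  # already on the latest search schema
    # IF NOT EXISTS would silently keep stale definitions, so drop first.
    conn.executescript(
        "DROP TRIGGER IF EXISTS search_elements_ai;"
        "DROP TRIGGER IF EXISTS search_elements_ad;"
        "DROP TRIGGER IF EXISTS search_elements_au;"
        "DROP TABLE IF EXISTS search_elements;"
    )
    conn.executescript(schema_sql)
    conn.execute(f"PRAGMA user_version = {SEARCH_SCHEMA_VERSION}")
    conn.commit()
```

Dropping an external-content FTS table is cheap since the source rows live in `elements`; the rebuild statements in `search.sql` repopulate the index afterwards.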
…n SQL migrations for the builder - use the sql_builder instead of hardcoding db connectivity on the search
🎉 This PR is included in version 3.2.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 3.7.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 3.6.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
No description provided.