Skip to content

First round of advanced search on ducpy#227

Merged
jorgedanisc merged 14 commits intojorgedaniscfrom
fil
Mar 20, 2026
Merged

First round of advanced search on ducpy#227
jorgedanisc merged 14 commits intojorgedaniscfrom
fil

Conversation

@jorgedanisc
Copy link
Copy Markdown
Contributor

No description provided.

jorgedanisc and others added 12 commits February 28, 2026 18:06
ci: update semantic-release CI environment variables
ci: update PR comment handling to use POST method
ci: update PR comment API endpoint and ducjs package
fix: new release references wasm as internal
fix: remove pkg from files and update wasm paths
fix: improve delta encoding on the version graph
feat: restructure external files schema with revisions
@jorgedanisc jorgedanisc self-assigned this Mar 20, 2026
Copilot AI review requested due to automatic review settings March 20, 2026 02:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an “advanced search” capability to ducpy by introducing an FTS5-backed element search pipeline (with Python-side scoring/grouping) and corresponding tests, plus updates to the SQLite search schema to improve prefix/diacritic-insensitive matching.

Changes:

  • Enhanced schema/search.sql FTS5 configuration (tokenizer, prefix indexing, idempotent creation) and added index rebuild statements.
  • Introduced ducpy.search module (search_duc_elements, schema ensure/rebuild, result types) with hybrid SQLite+Python ranking.
  • Added a dedicated pytest suite validating exact, prefix, grouping, empty-query, and limit behaviors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
schema/search.sql Improves FTS5 configuration (unicode61 + prefix) and adds rebuild backfill statements.
packages/ducpy/src/ducpy/search/search_elements.py Implements the search API, schema bootstrap, candidate scoring, and grouped results logic.
packages/ducpy/src/ducpy/search/init.py Exposes the search API/types from the ducpy.search package.
packages/ducpy/src/ducpy/init.py Re-exports search helpers at the top-level ducpy namespace.
packages/ducpy/src/tests/src/test_search_module.py Adds integration-style tests exercising search behaviors against real .duc assets.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +249 to +261
def _find_schema_dir() -> Path:
current = Path(__file__).resolve()
for parent in current.parents:
candidate = parent / "schema"
if (candidate / "search.sql").exists():
return candidate
raise FileNotFoundError(
"Could not locate schema/search.sql. Ensure the search module is running inside the DUC repository."
)


def _read_search_schema_sql() -> str:
return (_find_schema_dir() / "search.sql").read_text(encoding="utf-8")
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_find_schema_dir() searches for schema/search.sql by walking up the filesystem and raises if not running inside the monorepo. Since ducpy is distributed via PyPI, this will break in normal installations where the workspace-level schema/ directory is not present. Consider shipping search.sql as package data (e.g., under ducpy/search/schema/) and loading it via importlib.resources, or embedding the SQL at build time, so ensure_search_schema() works outside the repo.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +98
@dataclass(slots=True)
class DucSearchResult:
"""Compatibility placeholder for exported result types."""


@dataclass(slots=True)
class DucElementSearchResult:
"""One result row for a single element."""

element_id: str
element_type: str
matches: list[str]
score: float

def to_dict(self) -> dict[str, Any]:
return {
"element_id": self.element_id,
"element_type": self.element_type,
"matches": self.matches,
"score": round(self.score, 6),
}


@dataclass(slots=True)
class DucFileSearchResult:
"""One grouped result row for repeated file-backed elements."""

file_id: str
element_type: str
matches: list[str]
score: float
hits: int
element_ids: list[str]

def to_dict(self) -> dict[str, Any]:
return {
"file_id": self.file_id,
"element_type": self.element_type,
"matches": self.matches,
"score": round(self.score, 6),
"hits": self.hits,
"element_ids": self.element_ids,
}


DucSearchResult = DucElementSearchResult | DucFileSearchResult


Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DucSearchResult is first declared as an (empty) dataclass and then immediately reassigned to a union type alias. This makes the placeholder class unreachable and can confuse runtime users (e.g., isinstance(x, DucSearchResult) won’t work) and type checkers. Either remove the placeholder dataclass and declare the union using TypeAlias, or keep DucSearchResult as a real base class and have the concrete result types inherit from it.

Copilot uses AI. Check for mistakes.
Comment on lines +264 to +279
def ensure_search_schema(conn: sqlite3.Connection, rebuild: bool = False) -> None:
"""Create missing FTS search tables and triggers for a DUC database.

Args:
conn: Open SQLite connection for a ``.duc`` database.
rebuild: When ``True``, reruns the FTS rebuild statements even if the
search schema already exists.
"""

existing_tables = {
row[0]
for row in conn.execute("SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")
}
if rebuild or not _SEARCH_TABLES.issubset(existing_tables):
conn.executescript(_read_search_schema_sql())
conn.commit()
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensure_search_schema() only checks for the presence of the FTS tables, but the schema’s correctness also depends on the associated triggers existing (to keep the external-content FTS indexes in sync). If tables exist but triggers are missing (or were created under different names), this function will skip initialization and the index will silently stop updating. Consider also checking sqlite_master for the expected trigger names (or a schema version marker) before deciding to skip schema application.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to 28
CREATE VIRTUAL TABLE IF NOT EXISTS search_elements USING fts5(
label,
description,
content='elements',
content_rowid='rowid'
content_rowid='rowid',
tokenize='unicode61 remove_diacritics 2',
prefix='2 3 4 5 6 7 8 9 10'
);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using IF NOT EXISTS for the FTS virtual tables/triggers means that running this script against a database that already has an older version of these objects will not apply changes (e.g., tokenizer/prefix options or trigger definitions). If you expect this script to serve as an upgrade path (not just first-time creation), consider explicitly dropping/recreating the FTS tables and triggers (or provide a versioned migration path) so schema changes take effect.

Copilot uses AI. Check for mistakes.
…n SQL migrations for the builder

- use the sql_builder instead of hardcoding db connectivity on the search
@jorgedanisc jorgedanisc merged commit fedc64b into jorgedanisc Mar 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🎉 This PR is included in version 3.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🎉 This PR is included in version 3.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🎉 This PR is included in version 3.2.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🎉 This PR is included in version 3.7.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

🎉 This PR is included in version 3.6.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants