diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 014931c..d4612cc 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -37,4 +37,10 @@ jobs:
         run: mypy sqlite_vec_client/
 
       - name: Test with pytest
-        run: pytest --cov=sqlite_vec_client --cov-report=term
+        run: pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
+
+      - name: Upload coverage report
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-${{ matrix.python-version }}
+          path: coverage.xml
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b321546..a592dec 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.3.0] - 2025-02-15
+
+### Added
+- High-level `backup()` and `restore()` helpers wrapping JSONL/CSV workflows
+- MkDocs documentation scaffold with API reference, operations playbook, and migration guide
+- Backup/restore coverage in the integration test suite
+
+### Fixed
+- Enforced embedding dimension validation across add/update/search operations
+- `import_from_json()` and `import_from_csv()` now respect `skip_duplicates` and emit clear errors when embeddings are missing
+
+### Documentation
+- New migration guide outlining v2.3.0 changes
+- Expanded README with backup helper examples and coverage instructions
+- Requirements updated with MkDocs to build the documentation locally
+
+### CI
+- Pytest coverage step now generates XML output and uploads `coverage.xml` as a GitHub Actions artifact
+
 ## [2.2.0] - 2025-02-01
 
 ### Added
diff --git a/README.md b/README.md
index 9a402b0..6af9fc6 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@ A lightweight Python client around [sqlite-vec](https://github.com/asg017/sqlite-vec)
 - **Filtering helpers**: Fetch by `rowid`, `text`, or `metadata`.
 - **Pagination & sorting**: List records with `limit`, `offset`, and order.
 - **Bulk operations**: Efficient `update_many()`, `get_all()` generator, and transaction support.
+- **Backup tooling**: High-level `backup()` and `restore()` helpers for disaster recovery workflows.
 
 ## Requirements
 - Python 3.9+
@@ -97,6 +98,20 @@ client.import_from_json("backup.jsonl")
 
 See [examples/export_import_example.py](examples/export_import_example.py) for more examples.
 
+### Quick backup & restore helpers
+
+```python
+# Create a JSONL backup
+client.backup("backup.jsonl")
+
+# Restore later (optionally skip duplicates)
+client.restore("backup.jsonl", skip_duplicates=True)
+
+# Work with CSV
+client.backup("backup.csv", format="csv", include_embeddings=True)
+client.restore("backup.csv", format="csv", skip_duplicates=True)
+```
+
 ## Metadata Filtering
 
 Efficiently filter records by metadata fields using SQLite's JSON functions:
@@ -227,10 +242,11 @@
 pytest -m unit          # Unit tests only
 pytest -m integration   # Integration tests only
 ```
 
-**Run with coverage report:**
+**Coverage (terminal + XML for CI):**
 ```bash
-pytest --cov=sqlite_vec_client --cov-report=html
+pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
 ```
+The CI workflow uploads the generated `coverage.xml` as an artifact for downstream dashboards.
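+
+To sanity-check the report locally, the XML is Cobertura-style, so the overall line rate
+can be read with the standard library (a quick sketch; the attribute name assumes
+coverage.py's default XML schema):
+
+```python
+import xml.etree.ElementTree as ET
+
+# The root <coverage> element carries the overall line rate as an attribute (0.0-1.0).
+root = ET.parse("coverage.xml").getroot()
+print(f"line coverage: {float(root.get('line-rate')):.1%}")
+```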
 
 **Run specific test file:**
 ```bash
@@ -282,6 +298,7 @@ Edit [benchmarks/config.yaml](benchmarks/config.yaml) to customize:
 - [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines
 - [CHANGELOG.md](CHANGELOG.md) - Version history
 - [TESTING.md](TESTING.md) - Testing documentation
+- [Docs site (MkDocs)](docs/index.md) - Serve locally with `mkdocs serve`
 - [Examples](examples/) - Usage examples
   - [basic_usage.py](examples/basic_usage.py) - Basic CRUD operations
   - [metadata_filtering.py](examples/metadata_filtering.py) - Metadata filtering and queries
diff --git a/TODO b/TODO
index 0b5ee18..5cacfc7 100644
--- a/TODO
+++ b/TODO
@@ -35,8 +35,8 @@
 ### Documentation
 - [x] Create CONTRIBUTING.md
 - [x] Start CHANGELOG.md
-- [ ] API reference documentation (Sphinx or MkDocs)
-- [ ] Migration guide (for version updates)
+- [x] API reference documentation (Sphinx or MkDocs)
+- [x] Migration guide (for version updates)
 
 ## 🟢 Medium Priority (Development & Tooling)
@@ -75,7 +75,7 @@
 - [x] Export/import functions (JSON, CSV)
 - [ ] Async/await support (aiosqlite)
 - [ ] Table migration utilities
-- [ ] Backup/restore functions
+- [x] Backup/restore functions
 
 ### API Improvements
 - [x] Optimized methods for bulk operations
@@ -102,7 +102,7 @@
 ## 📊 Metrics & Monitoring
 
-- [ ] Code coverage tracking
+- [x] Code coverage tracking
 - [ ] Performance metrics
 - [ ] Download statistics (PyPI)
 - [ ] Issue response time tracking
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..f18b480
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,71 @@
+# API Reference
+
+The `sqlite_vec_client` package exposes a single high-level class and a few helpers.
+This page captures the behaviour most consumers rely on.
+
+## sqlite_vec_client.SQLiteVecClient
+
+```python
+from sqlite_vec_client import SQLiteVecClient
+```
+
+### Constructor
+
+`SQLiteVecClient(table: str, db_path: str | None = None, pool: ConnectionPool | None = None)`
+
+- Validates the table name and establishes a connection (or borrows one from the supplied pool).
+- Loads the `sqlite-vec` extension and configures pragmas for performance.
+
+### create_table
+
+`create_table(dim: int, distance: Literal["L1", "L2", "cosine"] = "cosine") -> None`
+
+Creates the base table, vector index, and triggers that keep embeddings in sync.
+
+### add
+
+`add(texts: list[str], embeddings: list[list[float]], metadata: list[dict] | None = None) -> list[int]`
+
+- Validates that all embeddings match the configured dimension.
+- Serialises metadata and embeddings and returns the new rowids (see the sketch below).
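+
+A minimal sketch of the contract (the table name and three-dimensional vectors are
+illustrative):
+
+```python
+from sqlite_vec_client import SQLiteVecClient
+
+client = SQLiteVecClient(table="notes", db_path="notes.db")
+client.create_table(dim=3)
+
+# Each embedding must contain exactly `dim` floats, or DimensionMismatchError is raised.
+rowids = client.add(
+    texts=["hello", "world"],
+    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
+    metadata=[{"lang": "en"}, {"lang": "en"}],
+)
+print(rowids)  # e.g. [1, 2]
+```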
+
+### similarity_search / similarity_search_with_filter
+
+- Both methods require embeddings that match the table dimension.
+- The filtering variant accepts the same metadata constraints as `filter_by_metadata`.
+
+### backup / restore
+
+High-level helpers that wrap JSONL/CSV export/import:
+
+```python
+client.backup("backup.jsonl")
+client.restore("backup.jsonl")
+
+client.backup("backup.csv", format="csv", include_embeddings=True)
+client.restore("backup.csv", format="csv", skip_duplicates=True)
+```
+
+### Transactions
+
+`with client.transaction(): ...` wraps operations in a BEGIN/COMMIT pair and rolls back on error.
+
+### Connection Management
+
+- `client.close()` returns the connection to the pool (if configured) or closes it outright.
+- Connections emit debug logs to help trace lifecycle events.
+
+## Exceptions
+
+- `VecClientError` — base class for client-specific errors.
+- `ValidationError` — invalid user input.
+- `TableNotFoundError` — operations attempted before `create_table`.
+- `DimensionMismatchError` — embeddings do not match the table dimension.
+
+## Utilities
+
+- `serialize_f32` / `deserialize_f32` convert embeddings to/from blobs.
+- Metadata helpers build safe JSON filter clauses.
+
+Refer to `sqlite_vec_client/utils.py` for implementation details.
diff --git a/docs/guides/migration.md b/docs/guides/migration.md
new file mode 100644
index 0000000..9a4593a
--- /dev/null
+++ b/docs/guides/migration.md
@@ -0,0 +1,36 @@
+# Migration Guide
+
+## Upgrading to v2.3.0
+
+### Embedding Dimension Validation
+
+- All write and search operations now validate embedding length against the table
+  dimension. Existing databases created with `create_table` are supported automatically,
+  but manual schemas must follow the `float[dim]` declaration used by `sqlite-vec`.
+- Action: ensure any custom tooling or fixtures produce embeddings with the expected
+  dimension before calling client methods, as shown below.
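+
+A sketch of the new failure mode (assumes an existing `client`; the dimensions are
+illustrative):
+
+```python
+from sqlite_vec_client import DimensionMismatchError
+
+client.create_table(dim=384)
+
+# Embeddings whose length differs from the table dimension are rejected up front.
+try:
+    client.add(texts=["too short"], embeddings=[[0.1, 0.2]])
+except DimensionMismatchError:
+    ...  # regenerate the embedding with the correct dimension
+```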
+
+### Import Behaviour
+
+- `import_from_json` and `import_from_csv` honour `skip_duplicates`, skipping records
+  whose rowids already exist.
+- Importers now require embeddings to be present; CSV sources exported without the
+  `embedding` column raise a descriptive error.
+- Action: export backups with `include_embeddings=True` if you intend to re-import them.
+
+### Backup & Restore Helpers
+
+- New `backup()` and `restore()` helpers wrap JSONL/CSV workflows and log the format
+  being used. Prefer these helpers for consistent backup scripts.
+
+### Continuous Coverage
+
+- The CI pipeline now uploads `coverage.xml` as an artifact. Configure downstream
+  tooling (Codecov, Sonar, etc.) to consume the artifact if you need external reporting.
+
+## General Advice
+
+- Always run `pytest --cov=sqlite_vec_client --cov-report=xml` before publishing.
+- Install the development requirements from `requirements-dev.txt` so you can build the
+  documentation site locally with `mkdocs serve`.
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..fd3deba
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,27 @@
+# sqlite-vec-client Documentation
+
+Welcome to the project documentation. This site complements the information in `README.md`
+and focuses on how to operate the client in real-world scenarios.
+
+## Highlights
+
+- Lightweight CRUD and similarity search API powered by `sqlite-vec`
+- Typed results for safer integrations
+- Bulk operations, metadata filters, and transaction helpers
+- New backup/restore helpers to streamline disaster recovery
+
+## Quick Links
+
+- [API Reference](api.md) — method-by-method contract details
+- [Migration Guide](guides/migration.md) — upgrade notes for the latest releases
+- [Operational Playbook](operations.md) — checklists for testing, backups, and restores
+
+## Building the Docs
+
+```bash
+pip install -r requirements-dev.txt
+mkdocs serve
+```
+
+The site is served at `http://127.0.0.1:8000` by default.
diff --git a/docs/operations.md b/docs/operations.md
new file mode 100644
index 0000000..6c63b48
--- /dev/null
+++ b/docs/operations.md
@@ -0,0 +1,37 @@
+# Operational Playbook
+
+## Testing
+
+- Run `pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml`.
+- Upload the generated `coverage.xml` as part of your CI artifacts (handled automatically
+  in the GitHub Actions workflow).
+- For environments without the native `sqlite-vec` extension, rely on the mocked tests
+  planned in the roadmap or temporarily deselect the integration marker.
+
+## Backups
+
+```python
+client.backup("backup.jsonl")
+client.backup("backup.csv", format="csv", include_embeddings=True)
+```
+
+- JSONL is recommended for long-term storage (embeddings stay in human-readable lists).
+- CSV is convenient for spreadsheets but still requires embeddings for restore.
+
+## Restore & Disaster Recovery
+
+```python
+client.restore("backup.jsonl")
+client.restore("backup.csv", format="csv", skip_duplicates=True)
+```
+
+- Use `skip_duplicates=True` when replaying backups into a database that may contain
+  partial data (e.g., after a failed migration).
+
+## Observability
+
+- Set `SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG` in the environment to trace connection
+  lifecycle and queries during incident response, as in the sketch below.
+- Logs include connection open/close events and the number of rows processed during
+  imports and exports.
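+
+A sketch of enabling debug logs from Python (assumes the variable is read when the
+client is constructed; set it earlier if your process configures logging at import time):
+
+```python
+import os
+
+os.environ["SQLITE_VEC_CLIENT_LOG_LEVEL"] = "DEBUG"  # set before constructing the client
+
+from sqlite_vec_client import SQLiteVecClient
+
+client = SQLiteVecClient(table="notes", db_path="notes.db")
+# Connection open/close events and per-batch row counts now appear in the logs.
+```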
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..b6edfee
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,14 @@
+site_name: sqlite-vec-client
+site_url: https://atasoglu.github.io/sqlite-vec-client/
+repo_url: https://github.com/atasoglu/sqlite-vec-client
+theme:
+  name: mkdocs
+nav:
+  - Overview: index.md
+  - API Reference:
+      - SQLiteVecClient: api.md
+  - Guides:
+      - Migration Guide: guides/migration.md
+      - Operations: operations.md
+markdown_extensions:
+  - admonition
diff --git a/pyproject.toml b/pyproject.toml
index 5ee2b05..d5f4852 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "sqlite-vec-client"
-version = "2.2.0"
+version = "2.3.0"
 description = "A lightweight Python client around sqlite-vec for CRUD and similarity search."
 readme = "README.md"
 requires-python = ">=3.9"
diff --git a/requirements-dev.txt b/requirements-dev.txt
index c055e32..aa1d560 100644
--- a/requirements-dev.txt
+++ b/requirements-dev.txt
@@ -7,3 +7,4 @@ tabulate>=0.9.0
 pyyaml>=6.0
 types-PyYAML>=6.0
 types-tabulate>=0.9
+mkdocs>=1.5.0
diff --git a/sqlite_vec_client/base.py b/sqlite_vec_client/base.py
index c685d60..2bc5f51 100644
--- a/sqlite_vec_client/base.py
+++ b/sqlite_vec_client/base.py
@@ -8,6 +8,7 @@
 from __future__ import annotations
 
 import json
+import re
 import sqlite3
 from collections.abc import Generator
 from contextlib import contextmanager
@@ -24,6 +25,7 @@ from .utils import build_metadata_where_clause, deserialize_f32, serialize_f32
 from .validation import (
     validate_dimension,
+    validate_embedding_dimension,
     validate_embeddings_match,
     validate_limit,
     validate_metadata_filters,
@@ -114,6 +116,7 @@ def __init__(
         self.table = table
         self._in_transaction = False
         self._pool = pool
+        self._dimension: int | None = None
 
         logger.debug(f"Initializing SQLiteVecClient for table: {table}")
         if pool:
@@ -213,6 +216,51 @@ def create_table(
         )
         self.connection.commit()
         logger.debug(f"Table '{self.table}' and triggers created successfully")
+        self._dimension = dim
+
+    def _ensure_dimension(self) -> int:
+        """Return embedding dimension for the vec table, reading schema if needed."""
+        if self._dimension is not None:
+            return self._dimension
+
+        logger.debug("Resolving embedding dimension from vector table schema")
+        cursor = self.connection.cursor()
+        try:
+            cursor.execute(
+                "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
+                [f"{self.table}_vec"],
+            )
+        except sqlite3.OperationalError as e:
+            logger.error(f"Failed to inspect schema for table '{self.table}_vec': {e}")
+            raise TableNotFoundError(
+                f"Table '{self.table}_vec' does not exist. Call create_table() first."
+            ) from e
+        row = cursor.fetchone()
+        if row is None:
+            logger.error(f"Vector table '{self.table}_vec' not found in sqlite_master")
+            raise TableNotFoundError(
+                f"Table '{self.table}_vec' does not exist. Call create_table() first."
+            )
+        sql_definition = row["sql"] if hasattr(row, "keys") else row[0]
+        if sql_definition is None:
+            raise ValueError(
+                "Embedding dimension could not be determined for "
+                f"table '{self.table}_vec'"
+            )
+
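+        # sqlite-vec declares vec0 embedding columns as ``float[dim]``,
+        # so the dimension can be recovered from the stored DDL.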
+ ) + sql_definition = row["sql"] if hasattr(row, "keys") else row[0] + if sql_definition is None: + raise ValueError( + "Embedding dimension could not be determined for " + f"table '{self.table}_vec'" + ) + + match = re.search(r"float\[(\d+)\]", sql_definition) + if not match: + raise ValueError( + "Embedding dimension could not be determined for " + f"table '{self.table}_vec'" + ) + + self._dimension = int(match.group(1)) + logger.debug( + "Resolved embedding dimension for table " + f"'{self.table}_vec': {self._dimension}" + ) + return self._dimension def similarity_search( self, @@ -233,6 +281,8 @@ def similarity_search( TableNotFoundError: If table doesn't exist """ validate_top_k(top_k) + expected_dim = self._ensure_dimension() + validate_embedding_dimension(embedding, expected_dim) logger.debug(f"Performing similarity search with top_k={top_k}") try: cursor = self.connection.cursor() @@ -291,6 +341,9 @@ def add( TableNotFoundError: If table doesn't exist """ validate_embeddings_match(texts, embeddings, metadata) + expected_dim = self._ensure_dimension() + for embedding in embeddings: + validate_embedding_dimension(embedding, expected_dim) logger.debug(f"Adding {len(texts)} records to table '{self.table}'") try: if metadata is None: @@ -410,6 +463,8 @@ def update( sets.append("metadata = ?") params.append(json.dumps(metadata)) if embedding is not None: + expected_dim = self._ensure_dimension() + validate_embedding_dimension(embedding, expected_dim) sets.append("text_embedding = ?") params.append(serialize_f32(embedding)) @@ -457,6 +512,8 @@ def update_many( sets.append("metadata = ?") params.append(json.dumps(metadata)) if embedding is not None: + expected_dim = self._ensure_dimension() + validate_embedding_dimension(embedding, expected_dim) sets.append("text_embedding = ?") params.append(serialize_f32(embedding)) @@ -597,6 +654,8 @@ def similarity_search_with_filter( """ validate_top_k(top_k) validate_metadata_filters(filters) + expected_dim = self._ensure_dimension() + validate_embedding_dimension(embedding, expected_dim) logger.debug(f"Similarity search with filters: {filters}, top_k={top_k}") where_clause, params = build_metadata_where_clause(filters) @@ -710,6 +769,78 @@ def import_from_csv( """ return io_module.import_from_csv(self, filepath, skip_duplicates, batch_size) + def backup( + self, + filepath: str, + *, + format: Literal["jsonl", "csv"] = "jsonl", + include_embeddings: bool = True, + filters: dict[str, Any] | None = None, + batch_size: int = 1000, + ) -> int: + """Create a backup of the table in JSONL or CSV format. 
+
+        Args:
+            filepath: Destination path for the backup file
+            format: Output format (`jsonl` or `csv`)
+            include_embeddings: Whether to include embeddings in the output
+            filters: Optional metadata filters to export a subset
+            batch_size: Number of records to process at once
+
+        Returns:
+            Number of records written to the backup
+        """
+        logger.info(
+            f"Backing up table '{self.table}' to {filepath} using format={format}"
+        )
+        if format == "jsonl":
+            return self.export_to_json(
+                filepath,
+                include_embeddings=include_embeddings,
+                filters=filters,
+                batch_size=batch_size,
+            )
+        if format == "csv":
+            return self.export_to_csv(
+                filepath,
+                include_embeddings=include_embeddings,
+                filters=filters,
+                batch_size=batch_size,
+            )
+        raise ValueError("format must be 'jsonl' or 'csv'")
+
+    def restore(
+        self,
+        filepath: str,
+        *,
+        format: Literal["jsonl", "csv"] = "jsonl",
+        skip_duplicates: bool = False,
+        batch_size: int = 1000,
+    ) -> int:
+        """Restore records from a backup file.
+
+        Args:
+            filepath: Path to the backup file
+            format: Input format (`jsonl` or `csv`)
+            skip_duplicates: Skip records whose rowids already exist
+            batch_size: Number of records to process at once
+
+        Returns:
+            Number of records imported
+        """
+        logger.info(
+            f"Restoring table '{self.table}' from {filepath} using format={format}"
+        )
+        if format == "jsonl":
+            return self.import_from_json(
+                filepath, skip_duplicates=skip_duplicates, batch_size=batch_size
+            )
+        if format == "csv":
+            return self.import_from_csv(
+                filepath, skip_duplicates=skip_duplicates, batch_size=batch_size
+            )
+        raise ValueError("format must be 'jsonl' or 'csv'")
+
     @contextmanager
     def transaction(self) -> Generator[None, None, None]:
         """Context manager for atomic transactions.
diff --git a/sqlite_vec_client/io.py b/sqlite_vec_client/io.py
index df81373..48b0d5f 100644
--- a/sqlite_vec_client/io.py
+++ b/sqlite_vec_client/io.py
@@ -110,9 +110,21 @@
                 continue
 
             record = json.loads(line)
+            embedding = record.get("embedding")
+            if embedding is None:
+                raise ValueError(
+                    "JSON record is missing 'embedding'. Export with "
+                    "include_embeddings=True to support import."
+                )
+
+            record_rowid = record.get("rowid")
+            if skip_duplicates and isinstance(record_rowid, int):
+                if client.get(record_rowid) is not None:
+                    continue
+
             texts.append(record["text"])
             metadata_list.append(record.get("metadata", {}))
-            embeddings.append(record["embedding"])
+            embeddings.append(embedding)
 
             if len(texts) >= batch_size:
                 client.add(texts=texts, embeddings=embeddings, metadata=metadata_list)
@@ -225,7 +237,28 @@
     with path.open("r", encoding="utf-8", newline="") as f:
         reader = csv.DictReader(f)
+        if reader.fieldnames is None:
+            raise ValueError("CSV file must include headers")
+        if "embedding" not in reader.fieldnames:
+            raise ValueError(
+                "CSV file is missing 'embedding' column. Export with "
+                "include_embeddings=True to support import."
+            )
+        if "text" not in reader.fieldnames or "metadata" not in reader.fieldnames:
+            raise ValueError("CSV file must include 'text' and 'metadata' columns")
 
         for row in reader:
+            record_rowid = row.get("rowid")
+            parsed_rowid = int(record_rowid) if record_rowid else None
+            if skip_duplicates and parsed_rowid is not None:
+                if client.get(parsed_rowid) is not None:
+                    continue
+
+            if row.get("embedding") is None or row["embedding"].strip() == "":
+                raise ValueError(
+                    "CSV record is missing embedding data. Export with "
+                    "include_embeddings=True."
+                )
+
             texts.append(row["text"])
             metadata_list.append(json.loads(row["metadata"]))
             embeddings.append(json.loads(row["embedding"]))
diff --git a/tests/test_client.py b/tests/test_client.py
index 0537bb1..4ebd8bc 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -3,6 +3,7 @@
 import pytest
 
 from sqlite_vec_client import (
+    DimensionMismatchError,
     SQLiteVecClient,
     TableNameError,
     TableNotFoundError,
@@ -84,6 +85,14 @@ def test_add_mismatched_lengths(self, client_with_table):
         with pytest.raises(ValidationError):
             client_with_table.add(texts=["a", "b"], embeddings=[[1.0, 2.0, 3.0]])
 
+    def test_add_invalid_embedding_dimension(
+        self, client_with_table, sample_texts, sample_embeddings
+    ):
+        """Test that embeddings with the wrong dimension raise an error."""
+        invalid_embeddings = [[0.1, 0.2]] * len(sample_texts)
+        with pytest.raises(DimensionMismatchError):
+            client_with_table.add(texts=sample_texts, embeddings=invalid_embeddings)
+
 
 @pytest.mark.integration
 class TestSimilaritySearch:
@@ -110,6 +119,14 @@ def test_similarity_search_invalid_top_k(self, client_with_table):
         with pytest.raises(ValidationError):
             client_with_table.similarity_search(embedding=[0.1, 0.2, 0.3], top_k=0)
 
+    def test_similarity_search_invalid_dimension(
+        self, client_with_table, sample_texts, sample_embeddings
+    ):
+        """Test that a query embedding with the wrong dimension raises an error."""
+        client_with_table.add(texts=sample_texts, embeddings=sample_embeddings)
+        with pytest.raises(DimensionMismatchError):
+            client_with_table.similarity_search(embedding=[0.1, 0.2], top_k=1)
+
 
 @pytest.mark.integration
 class TestGetRecords:
diff --git a/tests/test_io.py b/tests/test_io.py
index 32bfd88..063c6f7 100644
--- a/tests/test_io.py
+++ b/tests/test_io.py
@@ -67,6 +67,63 @@ def test_export_without_embeddings(self, tmp_path, sample_texts, sample_embeddings):
 
         client.close()
 
+    def test_import_skip_duplicates(self, tmp_path, sample_texts, sample_embeddings):
+        """Test that duplicate records are skipped when requested."""
+        db_path = str(tmp_path / "test.db")
+        client = SQLiteVecClient(table="test", db_path=db_path)
+        client.create_table(dim=3)
+        client.add(texts=sample_texts, embeddings=sample_embeddings)
+
+        export_path = str(tmp_path / "export.jsonl")
+        client.export_to_json(export_path)
+
+        imported = client.import_from_json(export_path, skip_duplicates=True)
+        assert imported == 0
+        assert client.count() == 3
+
+        client.close()
+
+    def test_import_missing_embedding_raises(
+        self, tmp_path, sample_texts, sample_embeddings
+    ):
+        """Test that importing without embedding data raises an error."""
+        export_path = tmp_path / "invalid.jsonl"
+        with export_path.open("w", encoding="utf-8") as f:
+            json.dump({"rowid": 1, "text": "hello", "metadata": {"a": 1}}, f)
+            f.write("\n")
+
+        db_path = str(tmp_path / "test.db")
+        client = SQLiteVecClient(table="test", db_path=db_path)
+        client.create_table(dim=3)
+
+        with pytest.raises(ValueError, match="missing 'embedding'"):
+            client.import_from_json(str(export_path))
+
+        client.close()
+
+    def test_backup_and_restore_helpers(
+        self, tmp_path, sample_texts, sample_embeddings
+    ):
+        """Test the high-level backup and restore helpers."""
+        db_path = str(tmp_path / "test.db")
+        client = SQLiteVecClient(table="test", db_path=db_path)
+        client.create_table(dim=3)
+        client.add(texts=sample_texts, embeddings=sample_embeddings)
+
+        backup_path = str(tmp_path / "backup.jsonl")
+        count = client.backup(backup_path)
+        assert count == 3
+
+        for rowid in range(1, 4):
+            client.delete(rowid)
+        assert client.count() == 0
+
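+        # Restoring the JSONL backup should repopulate all three records.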
+        restored = client.restore(backup_path)
+        assert restored == 3
+        assert client.count() == 3
+
+        client.close()
+
     def test_export_with_filters(self, tmp_path, sample_texts, sample_embeddings):
         """Test export with metadata filters."""
         db_path = str(tmp_path / "test.db")
@@ -156,6 +213,43 @@ def test_export_without_embeddings(self, tmp_path, sample_texts, sample_embeddings):
 
         client.close()
 
+    def test_import_without_embedding_column_raises(
+        self, tmp_path, sample_texts, sample_embeddings
+    ):
+        """Test that importing a CSV without embeddings raises an error."""
+        db_path = str(tmp_path / "test.db")
+        client = SQLiteVecClient(table="test", db_path=db_path)
+        client.create_table(dim=3)
+        client.add(texts=sample_texts, embeddings=sample_embeddings)
+
+        export_path = str(tmp_path / "export.csv")
+        client.export_to_csv(export_path, include_embeddings=False)
+
+        with pytest.raises(ValueError, match="missing 'embedding'"):
+            client.import_from_csv(export_path)
+
+        client.close()
+
+    def test_backup_and_restore_csv(self, tmp_path, sample_texts, sample_embeddings):
+        """Test the backup and restore helpers with CSV format."""
+        db_path = str(tmp_path / "test.db")
+        client = SQLiteVecClient(table="test", db_path=db_path)
+        client.create_table(dim=3)
+        client.add(texts=sample_texts, embeddings=sample_embeddings)
+
+        backup_path = str(tmp_path / "backup.csv")
+        count = client.backup(backup_path, format="csv", include_embeddings=True)
+        assert count == 3
+
+        for rowid in range(1, 4):
+            client.delete(rowid)
+
+        restored = client.restore(backup_path, format="csv")
+        assert restored == 3
+        assert client.count() == 3
+
+        client.close()
+
     def test_export_with_filters(self, tmp_path, sample_texts, sample_embeddings):
         """Test CSV export with metadata filters."""
         db_path = str(tmp_path / "test.db")