8 changes: 7 additions & 1 deletion .github/workflows/test.yml
@@ -37,4 +37,10 @@ jobs:
run: mypy sqlite_vec_client/

- name: Test with pytest
run: pytest --cov=sqlite_vec_client --cov-report=term
run: pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml

- name: Upload coverage report
uses: actions/upload-artifact@v4
with:
name: coverage-${{ matrix.python-version }}
path: coverage.xml
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.3.0] - 2025-02-15

### Added
- High-level `backup()` and `restore()` helpers wrapping JSONL/CSV workflows
- MkDocs documentation scaffold with API reference, operations playbook, and migration guide
- Backup/restore coverage in the integration test suite

### Fixed
- Enforced embedding dimension validation across add/update/search operations
- `import_from_json()` and `import_from_csv()` now respect `skip_duplicates` and emit clear errors when embeddings are missing

### Documentation
- New migration guide outlining v2.3.0 changes
- Expanded README with backup helper examples and coverage instructions
- Added MkDocs to `requirements-dev.txt` so the documentation can be built locally

### CI
- Pytest coverage step now generates XML output and uploads `coverage.xml` as a GitHub Actions artifact

## [2.2.0] - 2025-02-01

### Added
21 changes: 19 additions & 2 deletions README.md
@@ -16,6 +16,7 @@ A lightweight Python client around [sqlite-vec](https://github.com/asg017/sqlite
- **Filtering helpers**: Fetch by `rowid`, `text`, or `metadata`.
- **Pagination & sorting**: List records with `limit`, `offset`, and order.
- **Bulk operations**: Efficient `update_many()`, `get_all()` generator, and transaction support.
- **Backup tooling**: High-level `backup()` and `restore()` helpers for disaster recovery workflows.

## Requirements
- Python 3.9+
@@ -97,6 +98,20 @@ client.import_from_json("backup.jsonl")

See [examples/export_import_example.py](examples/export_import_example.py) for more examples.

### Quick backup & restore helpers

```python
# Create a JSONL backup
client.backup("backup.jsonl")

# Restore later (optionally skip duplicates)
client.restore("backup.jsonl", skip_duplicates=True)

# Work with CSV
client.backup("backup.csv", format="csv", include_embeddings=True)
client.restore("backup.csv", format="csv", skip_duplicates=True)
```

## Metadata Filtering

Efficiently filter records by metadata fields using SQLite's JSON functions:
@@ -227,10 +242,11 @@ pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
```

-**Run with coverage report:**
+**Coverage (terminal + XML for CI):**
```bash
-pytest --cov=sqlite_vec_client --cov-report=html
+pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
```
The CI workflow uploads the generated `coverage.xml` as an artifact for downstream dashboards.

**Run specific test file:**
```bash
@@ -282,6 +298,7 @@ Edit [benchmarks/config.yaml](benchmarks/config.yaml) to customize:
- [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines
- [CHANGELOG.md](CHANGELOG.md) - Version history
- [TESTING.md](TESTING.md) - Testing documentation
- [Docs site (MkDocs)](docs/index.md) - Serve locally with `mkdocs serve`
- [Examples](examples/) - Usage examples
- [basic_usage.py](examples/basic_usage.py) - Basic CRUD operations
- [metadata_filtering.py](examples/metadata_filtering.py) - Metadata filtering and queries
8 changes: 4 additions & 4 deletions TODO
@@ -35,8 +35,8 @@
### Documentation
- [x] Create CONTRIBUTING.md
- [x] Start CHANGELOG.md
- [ ] API reference documentation (Sphinx or MkDocs)
- [ ] Migration guide (for version updates)
- [x] API reference documentation (Sphinx or MkDocs)
- [x] Migration guide (for version updates)

## 🟢 Medium Priority (Development & Tooling)

@@ -75,7 +75,7 @@
- [x] Export/import functions (JSON, CSV)
- [ ] Async/await support (aiosqlite)
- [ ] Table migration utilities
- [ ] Backup/restore functions
- [x] Backup/restore functions

### API Improvements
- [x] Optimized methods for bulk operations
@@ -102,7 +102,7 @@

## 📊 Metrics & Monitoring

- [ ] Code coverage tracking
- [x] Code coverage tracking
- [ ] Performance metrics
- [ ] Download statistics (PyPI)
- [ ] Issue response time tracking
71 changes: 71 additions & 0 deletions docs/api.md
@@ -0,0 +1,71 @@
++ docs/api.md
# API Reference

The `sqlite_vec_client` package exposes a single high-level class and a few helpers.
This page captures the behaviour most consumers rely on.

## sqlite_vec_client.SQLiteVecClient

```python
from sqlite_vec_client import SQLiteVecClient
```

### Constructor

`SQLiteVecClient(table: str, db_path: str | None = None, pool: ConnectionPool | None = None)`

- Validates table name and establishes a connection (or borrows from the supplied pool).
- Loads the `sqlite-vec` extension and configures pragmas for performance.

### create_table

`create_table(dim: int, distance: Literal["L1", "L2", "cosine"] = "cosine") -> None`

Creates the base table, vector index, and triggers that keep embeddings in sync.

### add

`add(texts: list[str], embeddings: list[list[float]], metadata: list[dict] | None = None) -> list[int]`

- Validates that all embeddings match the configured dimension.
- Serialises metadata and embeddings and returns the new rowids.
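
The dimension check described above can be sketched as follows. This is a minimal illustration rather than the client's actual code: the helper name `validate_embeddings_dimension` is hypothetical, and the local `DimensionMismatchError` is a stand-in for the exception of the same name listed later on this page.

```python
from typing import Sequence


class DimensionMismatchError(ValueError):
    """Stand-in for the client's exception; its real base class is an assumption."""


def validate_embeddings_dimension(
    embeddings: Sequence[Sequence[float]], dim: int
) -> None:
    """Raise if any embedding's length differs from the table dimension."""
    for index, embedding in enumerate(embeddings):
        if len(embedding) != dim:
            raise DimensionMismatchError(
                f"embedding {index} has {len(embedding)} values, expected {dim}"
            )


# Both vectors have three components, so this call returns without raising.
validate_embeddings_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dim=3)
```

Checking every embedding before any row is written means a bad batch fails as a whole instead of leaving a partial insert behind.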

### similarity_search / similarity_search_with_filter

- Both methods require embeddings that match the table dimension.
- Filtering variant accepts the same metadata constraints as `filter_by_metadata`.

### backup / restore

High-level helpers that wrap JSONL/CSV export/import:

```python
client.backup("backup.jsonl")
client.restore("backup.jsonl")

client.backup("backup.csv", format="csv", include_embeddings=True)
client.restore("backup.csv", format="csv", skip_duplicates=True)
```

### Transactions

`with client.transaction(): ...` wraps operations in a BEGIN/COMMIT pair and rolls back on error.
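
The BEGIN/COMMIT/rollback behaviour can be sketched with the standard-library `sqlite3` module. This is an assumption about the general pattern, not the client's implementation: the free function `transaction` and the `items` table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def transaction(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Run the with-block inside one transaction; roll back if it raises."""
    conn.execute("BEGIN")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


# isolation_level=None stops sqlite3 from opening transactions implicitly,
# so the explicit BEGIN above is the only one in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (text TEXT)")

try:
    with transaction(conn):
        conn.execute("INSERT INTO items VALUES ('a')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# The insert was rolled back, so the table is still empty.
rows_after_failure = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```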

### Connection Management

- `client.close()` returns the connection to the pool (if configured) or closes it outright.
- Connections emit debug logs to help trace lifecycle events.

## Exceptions

- `VecClientError` — base class for client-specific errors.
- `ValidationError` — invalid user input.
- `TableNotFoundError` — operations attempted before `create_table`.
- `DimensionMismatchError` — embeddings do not match the table dimension.
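
A sketch of how callers might order their `except` clauses around these exceptions. The classes below are local stand-ins so the snippet runs on its own; that every specific error subclasses `VecClientError` is an assumption drawn from the "base class" wording above, and `classify` is a hypothetical helper.

```python
class VecClientError(Exception):
    """Stand-in base class for client-specific errors."""


class ValidationError(VecClientError):
    """Stand-in: invalid user input."""


class TableNotFoundError(VecClientError):
    """Stand-in: operation attempted before create_table."""


class DimensionMismatchError(VecClientError):
    """Stand-in: embedding length differs from the table dimension."""


def classify(operation) -> str:
    """Call operation() and report which kind of client error it raised."""
    try:
        operation()
    except DimensionMismatchError:
        return "dimension"
    except TableNotFoundError:
        return "missing-table"
    except VecClientError:
        # Catch-all for any other client-specific error; must come last.
        return "client"
    return "ok"


def failing_search() -> None:
    raise DimensionMismatchError("expected 3 values, got 2")


outcome = classify(failing_search)
```

Specific handlers come first because the base-class clause would otherwise catch everything.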

## Utilities

- `serialize_f32` / `deserialize_f32` convert embeddings to/from blobs.
- Metadata helpers build safe JSON filter clauses.

Refer to `sqlite_vec_client/utils.py` for implementation details.
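
For orientation, float32 blob conversion is commonly done with `struct`. The sketch below is an assumption about what such helpers look like, not a copy of `utils.py`; in particular, the explicit little-endian byte order (`<`) is a choice made here for reproducibility.

```python
import struct
from typing import List, Sequence


def serialize_f32(vector: Sequence[float]) -> bytes:
    """Pack floats into a compact blob of 4-byte little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def deserialize_f32(blob: bytes) -> List[float]:
    """Unpack a float32 blob back into a list of Python floats."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


blob = serialize_f32([1.0, 0.5, -2.0])
restored = deserialize_f32(blob)
```

Values that are not exactly representable in float32 (for example `0.1`) come back slightly different after a round trip, which is expected for 32-bit storage.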
36 changes: 36 additions & 0 deletions docs/guides/migration.md
@@ -0,0 +1,36 @@
++ docs/guides/migration.md
# Migration Guide

## Upgrading to v2.3.0

### Embedding Dimension Validation

- All write and search operations now validate embedding length against the table
dimension. Existing databases created with `create_table` are supported automatically,
but manual schemas must follow the `float[dim]` declaration used by `sqlite-vec`.
- Action: ensure any custom tooling or fixtures produce embeddings with the expected
dimension before calling client methods.

### Import Behaviour

- `import_from_json` and `import_from_csv` honour `skip_duplicates`, skipping records
whose rowids already exist.
- Importers now require embeddings to be present; CSV sources exported without the
`embedding` column raise a descriptive error.
- Action: export backups with `include_embeddings=True` if you intend to re-import them.
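
The import rules above can be sketched as a pre-filter over JSONL lines. Everything here is illustrative: `select_importable` is a hypothetical helper, the record keys other than `embedding` and `rowid` are assumed, and what the real importer does with duplicates when `skip_duplicates=False` is not specified in this guide (the sketch simply passes them through).

```python
import json
from typing import Iterable


def select_importable(
    lines: Iterable[str],
    existing_rowids: set,
    skip_duplicates: bool = False,
) -> list:
    """Parse JSONL lines and return the records that should be inserted."""
    selected = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the backup file
        record = json.loads(line)
        if record.get("embedding") is None:
            raise ValueError(f"line {line_number}: record has no embedding")
        if skip_duplicates and record.get("rowid") in existing_rowids:
            continue  # rowid already present, so skip instead of failing
        selected.append(record)
    return selected


backup_lines = [
    '{"rowid": 1, "text": "a", "embedding": [0.1, 0.2]}',
    '{"rowid": 2, "text": "b", "embedding": [0.3, 0.4]}',
]
to_insert = select_importable(backup_lines, existing_rowids={1}, skip_duplicates=True)
```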

### Backup & Restore Helpers

- New `backup()` and `restore()` helpers wrap JSONL/CSV workflows and log the format
being used. Prefer these helpers for consistent backup scripts.
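
A rough sketch of what a format-dispatching backup helper could look like, assuming records are plain dictionaries. The function `write_backup`, the logger name, and the column layout are hypothetical; the real `backup()` wraps the client's own export functions.

```python
import csv
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("backup_sketch")  # hypothetical logger name


def write_backup(records: list, path: str, format: str = "jsonl") -> int:
    """Write records to path as JSONL or CSV and return how many were written."""
    if format not in ("jsonl", "csv"):
        raise ValueError(f"unsupported backup format: {format!r}")
    logger.info("Writing %s backup to %s", format, path)
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        if format == "jsonl":
            for record in records:
                handle.write(json.dumps(record) + "\n")
        else:
            fields = ["rowid", "text", "metadata", "embedding"]
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            for record in records:
                row = dict(record)
                # Nested values are JSON-encoded so they fit in one CSV cell.
                row["metadata"] = json.dumps(record["metadata"])
                row["embedding"] = json.dumps(record["embedding"])
                writer.writerow(row)
    return len(records)


sample = [{"rowid": 1, "text": "a", "metadata": {"k": "v"}, "embedding": [0.5, 1.0]}]
with tempfile.TemporaryDirectory() as tmp:
    written = write_backup(sample, str(Path(tmp) / "backup.jsonl"))
```

Rejecting unknown formats up front avoids creating an empty file for a typo such as `format="json"`.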

### Continuous Coverage

- The CI pipeline now uploads `coverage.xml` as an artifact. Configure downstream
tooling (Codecov, Sonar, etc.) to consume the artifact if you need external reporting.
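
If you only need a quick number from the artifact, the overall line coverage can be read from the root element of `coverage.xml`, which uses the Cobertura layout with a `line-rate` attribute. The helper name and the inline sample document below are illustrative; a real report also contains package and class elements.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage as a percentage from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return round(float(root.attrib["line-rate"]) * 100, 2)


# Trimmed-down stand-in for the file produced by --cov-report=xml.
SAMPLE_REPORT = '<coverage line-rate="0.875" branch-rate="0"></coverage>'

percent = line_coverage_percent(SAMPLE_REPORT)
```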

## General Advice

- Always run `pytest --cov=sqlite_vec_client --cov-report=xml` before publishing.
- Reinstall from `requirements-dev.txt` after upgrading so that MkDocs is available locally,
then build the documentation site with `mkdocs serve`.
27 changes: 27 additions & 0 deletions docs/index.md
@@ -0,0 +1,27 @@
++ docs/index.md
# sqlite-vec-client Documentation

Welcome to the project documentation. This site complements the information in `README.md`
and focuses on how to operate the client in real-world scenarios.

## Highlights

- Lightweight CRUD and similarity search API powered by `sqlite-vec`
- Typed results for safer integrations
- Bulk operations, metadata filters, and transaction helpers
- New backup/restore helpers to streamline disaster recovery

## Quick Links

- [API Reference](api.md) — method-by-method contract details
- [Migration Guide](guides/migration.md) — upgrade notes for the latest releases
- [Operational Playbook](operations.md) — checklists for testing, backups, and restore

## Building the Docs

```bash
pip install -r requirements-dev.txt
mkdocs serve
```

The site is served at `http://127.0.0.1:8000` by default.
37 changes: 37 additions & 0 deletions docs/operations.md
@@ -0,0 +1,37 @@
++ docs/operations.md
# Operational Playbook

## Testing

- Run `pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml`.
- Upload the generated `coverage.xml` as part of your CI artifacts (handled automatically
in the GitHub Actions workflow).
- For environments without the native `sqlite-vec` extension, run only the unit tests
(`pytest -m unit`) or temporarily deselect the integration marker; mocked tests are planned in the roadmap.

## Backups

```python
client.backup("backup.jsonl")
client.backup("backup.csv", format="csv", include_embeddings=True)
```

- JSONL is recommended for long-term storage (embeddings stay in human-readable lists).
- CSV is convenient for spreadsheets but still requires embeddings for restore.

## Restore & Disaster Recovery

```python
client.restore("backup.jsonl")
client.restore("backup.csv", format="csv", skip_duplicates=True)
```

- Use `skip_duplicates=True` when replaying backups into a database that may contain
partial data (e.g., after a failed migration).

## Observability

- Set `SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG` in the environment to trace connection
lifecycle and queries during incident response.
- Logs include connection open/close events and count of rows processed during imports
and exports.
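
One way a library can honour an environment variable such as `SQLITE_VEC_CLIENT_LOG_LEVEL` is sketched below. This is an assumption about the mechanism, not the client's code: the helper `resolve_log_level`, the `WARNING` fallback, and the logger name are all hypothetical.

```python
import logging
import os
from typing import Mapping

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(environ: Mapping[str, str] = os.environ) -> int:
    """Map SQLITE_VEC_CLIENT_LOG_LEVEL to a logging level, defaulting to WARNING."""
    name = environ.get("SQLITE_VEC_CLIENT_LOG_LEVEL", "WARNING").strip().upper()
    return _LEVELS.get(name, logging.WARNING)  # unknown names fall back safely


logger = logging.getLogger("sqlite_vec_client_sketch")  # hypothetical logger name
logger.setLevel(resolve_log_level({"SQLITE_VEC_CLIENT_LOG_LEVEL": "debug"}))
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.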
14 changes: 14 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,14 @@
site_name: sqlite-vec-client
site_url: https://atasoglu.github.io/sqlite-vec-client/
repo_url: https://github.com/atasoglu/sqlite-vec-client
theme:
  name: mkdocs
nav:
- Overview: index.md
- API Reference:
- SQLiteVecClient: api.md
- Guides:
- Migration Guide: guides/migration.md
- Operations: operations.md
markdown_extensions:
- admonition
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "sqlite-vec-client"
-version = "2.2.0"
+version = "2.3.0"
description = "A lightweight Python client around sqlite-vec for CRUD and similarity search."
readme = "README.md"
requires-python = ">=3.9"
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -7,3 +7,4 @@ tabulate>=0.9.0
pyyaml>=6.0
types-PyYAML>=6.0
types-tabulate>=0.9
mkdocs>=1.5.0