From 1164e236681b91c0c57c40dd2b8b44cc34b0afbe Mon Sep 17 00:00:00 2001 From: Ethan Arrowood Date: Wed, 18 Mar 2026 16:54:17 -0600 Subject: [PATCH 1/5] docs: migrate Database section to v4 consolidated reference Co-Authored-By: Claude Sonnet 4.6 --- .../database-link-placeholders.md | 117 +++++ .../version-v4/database/compaction.md | 71 +++ .../version-v4/database/data-loader.md | 153 ++++++ .../version-v4/database/jobs.md | 272 +++++++++++ .../version-v4/database/overview.md | 123 +++++ .../version-v4/database/schema.md | 450 ++++++++++++++++++ .../version-v4/database/storage-algorithm.md | 71 +++ .../version-v4/database/system-tables.md | 154 ++++++ .../version-v4/database/transaction.md | 231 +++++++++ .../version-v4-sidebars.json | 48 ++ 10 files changed, 1690 insertions(+) create mode 100644 migration-context/link-placeholders/database-link-placeholders.md create mode 100644 reference_versioned_docs/version-v4/database/compaction.md create mode 100644 reference_versioned_docs/version-v4/database/data-loader.md create mode 100644 reference_versioned_docs/version-v4/database/jobs.md create mode 100644 reference_versioned_docs/version-v4/database/overview.md create mode 100644 reference_versioned_docs/version-v4/database/schema.md create mode 100644 reference_versioned_docs/version-v4/database/storage-algorithm.md create mode 100644 reference_versioned_docs/version-v4/database/system-tables.md create mode 100644 reference_versioned_docs/version-v4/database/transaction.md diff --git a/migration-context/link-placeholders/database-link-placeholders.md b/migration-context/link-placeholders/database-link-placeholders.md new file mode 100644 index 00000000..e3680421 --- /dev/null +++ b/migration-context/link-placeholders/database-link-placeholders.md @@ -0,0 +1,117 @@ +# Link Placeholders for Database Section + +## reference_versioned_docs/version-v4/database/overview.md + +- Line ~37: `[Resource API](TODO:reference_versioned_docs/version-v4/resources/resource-api.md)` 
+ - Context: Mentioning custom resources as extension of the database system + - Target should be: Resource API reference page + +- Line ~55: `[REST](TODO:reference_versioned_docs/version-v4/rest/overview.md)` + - Context: Related documentation footer + - Target should be: REST overview + +- Line ~56: `[Resources](TODO:reference_versioned_docs/version-v4/resources/overview.md)` + - Context: Related documentation footer + - Target should be: Resources overview + +- Line ~57: `[Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md)` + - Context: Related documentation footer + - Target should be: Operations API overview + +- Line ~58: `[Configuration](TODO:reference_versioned_docs/version-v4/configuration/overview.md)` + - Context: Related documentation footer + - Target should be: Configuration overview + +## reference_versioned_docs/version-v4/database/schema.md + +- Line ~164: `[REST Querying](TODO:reference_versioned_docs/version-v4/rest/querying.md)` + - Context: How to query tables via HTTP using schema-defined relationships + - Target should be: REST querying reference + +- Line ~165: `[Resources](TODO:reference_versioned_docs/version-v4/resources/resource-api.md)` + - Context: Extending table behavior with custom resource logic + - Target should be: Resource API reference + +- Line ~167: `[Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md)` + - Context: graphqlSchema component and storage configuration + - Target should be: Configuration options page + +- Line ~141 (Dynamic Schema section): `[Operations API](TODO:reference_versioned_docs/version-v4/operations-api/operations.md)` + - Context: NoSQL create_attribute/drop_attribute operations + - Target should be: Operations list page + +## reference_versioned_docs/version-v4/database/data-loader.md + +- Line ~13: `[Extension](TODO:reference_versioned_docs/version-v4/components/extension-api.md)` + - Context: dataLoader is an Extension component + - 
Target should be: Extension API reference + +- Line ~73: `[Components](TODO:reference_versioned_docs/version-v4/components/overview.md)` + - Context: Related documentation footer + - Target should be: Components overview + +## reference_versioned_docs/version-v4/database/storage-algorithm.md + +- Line ~45: `[Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md)` + - Context: Storage configuration options (compression settings) + - Target should be: Configuration options page (storage section) + +## reference_versioned_docs/version-v4/database/jobs.md + +- Line ~128: `[Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md)` + - Context: Related documentation footer + - Target should be: Operations API overview + +## reference_versioned_docs/version-v4/database/system-tables.md + +- Line ~82: `[Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md)` + - Context: Full analytics metrics reference in related docs footer + - Target should be: Analytics overview + +- Line ~95: `[Replication](TODO:reference_versioned_docs/version-v4/replication/clustering.md)` + - Context: hdb_nodes used by clustering operations + - Target should be: Clustering reference + +- Line ~104: `[Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md)` (second reference) + - Context: Related documentation footer + - Target should be: Analytics overview + +- Line ~105: `[Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md)` + - Context: Related documentation footer + - Target should be: Replication overview + +- Line ~106: `[Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md)` + - Context: Querying system tables + - Target should be: Operations API overview + +## reference_versioned_docs/version-v4/database/compaction.md + +- Line ~38: `[CLI Commands](TODO:reference_versioned_docs/version-v4/cli/commands.md)` + - Context: copy-db CLI command + - 
Target should be: CLI commands reference + +- Line ~56: `[Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md)` + - Context: Storage configuration options + - Target should be: Configuration options page (storage section) + +## reference_versioned_docs/version-v4/database/transaction.md + +- Line ~73: `[Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md)` + - Context: Clustering must be set up for transaction logs + - Target should be: Replication overview + +- Line ~148: `[Logging](TODO:reference_versioned_docs/version-v4/logging/overview.md)` + - Context: Distinction between app logging and transaction/audit logging + - Target should be: Logging overview + +- Line ~149: `[Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md)` (second reference) + - Context: Related documentation footer + - Target should be: Replication overview + +- Line ~150: `[Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md)` + - Context: logging.auditLog global configuration + - Target should be: Configuration options page + +- Line ~151: `[Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md)` + - Context: Related documentation footer + - Target should be: Operations API overview diff --git a/reference_versioned_docs/version-v4/database/compaction.md b/reference_versioned_docs/version-v4/database/compaction.md new file mode 100644 index 00000000..c8021101 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/compaction.md @@ -0,0 +1,71 @@ +--- +title: Compaction +--- + + + + +# Compaction + +Added in: v4.3.0 + +Database files grow over time as records are inserted, updated, and deleted. Deleted records and updated values leave behind free space (fragmentation) in the database file, which can increase file size and potentially affect performance. Compaction eliminates this free space, creating a smaller, contiguous database file. 
> **Note:** Compaction does not compress your data. It removes internal fragmentation to make the file smaller. To enable compression on a database, use compaction to copy the database with updated storage configuration applied.

Compaction is also the mechanism to apply storage configuration changes (such as enabling compression) to existing databases, since some storage settings cannot be changed in place.

## Copy Compaction

Creates a compacted copy of a database file. The original database is left unchanged.

> **Recommendation:** Stop Harper before performing copy compaction to prevent any record loss during the copy operation.

Run using the [CLI](TODO:reference_versioned_docs/version-v4/cli/commands.md):

```bash
harperdb copy-db <source-database> <target-file-path>
```

The `<source-database>` argument is the database name (not a file path). The `<target-file-path>` is the full file path where the compacted copy will be written.

To replace the original database with the compacted copy, move or rename the output file to the original database path after Harper is stopped.

**Example — compact the default `data` database:**

```bash
harperdb copy-db data /home/user/hdb/database/copy.mdb
```

## Compact on Start

Automatically compacts all non-system databases when Harper starts. Harper will not start until compaction is complete. Under the hood, it loops through all user databases, creates a backup of each, compacts it, replaces the original with the compacted copy, and removes the backup.

Configure in `harperdb-config.yaml`:

```yaml
storage:
  compactOnStart: true
  compactOnStartKeepBackup: false
```

Or set the equivalent environment variables when launching Harper:

```bash
STORAGE_COMPACTONSTART=true STORAGE_COMPACTONSTARTKEEPBACKUP=true harperdb
```

### Options

| Option | Type | Default | Description |
|---|---|---|---|
| `compactOnStart` | Boolean | `false` | Compact all databases at startup. Automatically reset to `false` after running. 
| +| `compactOnStartKeepBackup` | Boolean | `false` | Retain the backup copy created during compact on start | + +> **Note:** `compactOnStart` is automatically set back to `false` after it runs, so compaction only happens on the next start if you explicitly re-enable it. + +## Related Documentation + +- [Storage Algorithm](./storage-algorithm.md) — How Harper stores data using LMDB +- [CLI Commands](TODO:reference_versioned_docs/version-v4/cli/commands.md) — `copy-db` CLI command reference +- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'storage section') — Full storage configuration options including compression settings diff --git a/reference_versioned_docs/version-v4/database/data-loader.md b/reference_versioned_docs/version-v4/database/data-loader.md new file mode 100644 index 00000000..c57495a9 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/data-loader.md @@ -0,0 +1,153 @@ +--- +title: Data Loader +--- + + + + +# Data Loader + +Added in: v4.6.0 + +The Data Loader is a built-in component that loads data from JSON or YAML files into Harper tables as part of component deployment. It is designed for seeding tables with initial records — configuration data, reference data, default users, or other records that should exist when a component is first deployed or updated. + +## Configuration + +In your component's `config.yaml`, use the `dataLoader` key to specify the data files to load: + +```yaml +dataLoader: + files: 'data/*.json' +``` + +`dataLoader` is an [Extension](TODO:reference_versioned_docs/version-v4/components/extension-api.md 'Extension component API') and supports the standard `files` configuration option, including glob patterns. + +## Data File Format + +Each data file loads records into a single table. The file specifies the target database, table, and an array of records. 
+ +### JSON Example + +```json +{ + "database": "myapp", + "table": "users", + "records": [ + { + "id": 1, + "username": "admin", + "email": "admin@example.com", + "role": "administrator" + }, + { + "id": 2, + "username": "user1", + "email": "user1@example.com", + "role": "standard" + } + ] +} +``` + +### YAML Example + +```yaml +database: myapp +table: settings +records: + - id: 1 + setting_name: app_name + setting_value: My Application + - id: 2 + setting_name: version + setting_value: '1.0.0' +``` + +One table per file. To load data into multiple tables, create a separate file for each table. + +## File Patterns + +The `files` option accepts a single path, a list of paths, or a glob pattern: + +```yaml +# Single file +dataLoader: + files: 'data/seed-data.json' + +# Multiple specific files +dataLoader: + files: + - 'data/users.json' + - 'data/settings.yaml' + - 'data/initial-products.json' + +# Glob pattern +dataLoader: + files: 'data/**/*.{json,yaml,yml}' +``` + +## Loading Behavior + +When Harper starts a component with `dataLoader` configured: + +1. All specified data files are read +2. Each file is validated to reference a single table +3. Records are inserted or updated using content hash comparison (SHA-256 hashes stored in the `hdb_dataloader_hash` system table) + +### Change Detection + +| Scenario | Behavior | +|---|---| +| New record | Inserted; content hash stored | +| Unchanged record | Skipped (no writes) | +| Changed data file | Updated via `patch`, preserving any extra fields | +| Record created by user (not data loader) | Never overwritten | +| Record modified by user after load | Preserved, not overwritten | +| Extra fields added by user to a data-loaded record | Preserved during updates | + +This design makes data files safe to redeploy without losing manual modifications. 
+ +## Best Practices + +**Define schemas first.** While the Data Loader can infer schemas from the records it loads, it is strongly recommended to define table schemas explicitly using the [graphqlSchema component](./schema.md) before loading data. This ensures proper types, constraints, and relationships. + +**One table per file.** Each data file must target a single table. Organize files accordingly. + +**Idempotent data.** Design files to be safe to load multiple times without creating conflicts. + +**Version control.** Include data files in version control for consistency across deployments and environments. + +**No sensitive data.** Do not include passwords, API keys, or secrets directly in data files. Use environment variables or secure configuration management instead. + +## Example Component Structure + +``` +my-component/ +├── config.yaml +├── data/ +│ ├── users.json +│ ├── roles.json +│ └── settings.json +├── schemas.graphql +└── roles.yaml +``` + +```yaml +# config.yaml +graphqlSchema: + files: 'schemas.graphql' + +roles: + files: 'roles.yaml' + +dataLoader: + files: 'data/*.json' + +rest: true +``` + +## Related Documentation + +- [Schema](./schema.md) — Defining table structure before loading data +- [Jobs](./jobs.md) — Bulk data operations via the Operations API (CSV/JSON import from file, URL, or S3) +- [Components](TODO:reference_versioned_docs/version-v4/components/overview.md) — Extension and plugin system that the data loader is built on diff --git a/reference_versioned_docs/version-v4/database/jobs.md b/reference_versioned_docs/version-v4/database/jobs.md new file mode 100644 index 00000000..90612406 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/jobs.md @@ -0,0 +1,272 @@ +--- +title: Jobs +--- + + + + + +# Jobs + +Harper uses an asynchronous job system for long-running data operations. 
When a bulk operation is initiated — such as loading a large CSV file or exporting millions of records — Harper starts a background job and immediately returns a job ID. Use the job ID to check progress and status. + +Job status values: + +- `IN_PROGRESS` — the job is currently running +- `COMPLETE` — the job finished successfully + +## Bulk Operations + +The following operations create jobs. All bulk operations are sent to the Operations API. + +### CSV Data Load + +Ingests CSV data provided directly in the request body. + +- `operation` _(required)_ — `csv_data_load` +- `database` _(optional)_ — target database; defaults to `data` +- `table` _(required)_ — target table +- `action` _(optional)_ — `insert`, `update`, or `upsert`; defaults to `insert` +- `data` _(required)_ — CSV content as a string + +```json +{ + "operation": "csv_data_load", + "database": "dev", + "action": "insert", + "table": "breed", + "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n" +} +``` + +Response: + +```json +{ + "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69", + "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69" +} +``` + +--- + +### CSV File Load + +Ingests CSV data from a file on the server's local filesystem. + +> The CSV file must reside on the same machine running Harper. + +- `operation` _(required)_ — `csv_file_load` +- `database` _(optional)_ — target database; defaults to `data` +- `table` _(required)_ — target table +- `action` _(optional)_ — `insert`, `update`, or `upsert`; defaults to `insert` +- `file_path` _(required)_ — absolute path to the CSV file on the host + +```json +{ + "operation": "csv_file_load", + "action": "insert", + "database": "dev", + "table": "breed", + "file_path": "/home/user/imports/breeds.csv" +} +``` + +--- + +### CSV URL Load + +Ingests CSV data from a URL. 
+ +- `operation` _(required)_ — `csv_url_load` +- `database` _(optional)_ — target database; defaults to `data` +- `table` _(required)_ — target table +- `action` _(optional)_ — `insert`, `update`, or `upsert`; defaults to `insert` +- `csv_url` _(required)_ — URL pointing to the CSV file + +```json +{ + "operation": "csv_url_load", + "action": "insert", + "database": "dev", + "table": "breed", + "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv" +} +``` + +--- + +### Import from S3 + +Imports CSV or JSON files from an AWS S3 bucket. + +- `operation` _(required)_ — `import_from_s3` +- `database` _(optional)_ — target database; defaults to `data` +- `table` _(required)_ — target table +- `action` _(optional)_ — `insert`, `update`, or `upsert`; defaults to `insert` +- `s3` _(required)_ — S3 connection details: + - `aws_access_key_id` + - `aws_secret_access_key` + - `bucket` + - `key` — filename including extension (`.csv` or `.json`) + - `region` + +```json +{ + "operation": "import_from_s3", + "action": "insert", + "database": "dev", + "table": "dog", + "s3": { + "aws_access_key_id": "YOUR_KEY", + "aws_secret_access_key": "YOUR_SECRET_KEY", + "bucket": "BUCKET_NAME", + "key": "dogs.json", + "region": "us-east-1" + } +} +``` + +--- + +### Export Local + +Exports table data to a local file in JSON or CSV format. 
- `operation` _(required)_ — `export_local`
- `format` _(required)_ — `json` or `csv`
- `path` _(required)_ — local directory path where the export file will be written
- `search_operation` _(required)_ — query to select records: `search_by_hash`, `search_by_value`, `search_by_conditions`, or `sql`
- `filename` _(optional)_ — filename without extension; auto-generated from epoch timestamp if omitted

Changed in: v4.3.0 — `search_by_conditions` added as a supported search operation for exports

```json
{
  "operation": "export_local",
  "format": "json",
  "path": "/data/exports/",
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.breed"
  }
}
```

---

### Export to S3

Exports table data to an AWS S3 bucket in JSON or CSV format.

Changed in: v4.3.0 — `search_by_conditions` added as a supported search operation

- `operation` _(required)_ — `export_to_s3`
- `format` _(required)_ — `json` or `csv`
- `s3` _(required)_ — S3 connection details (same fields as Import from S3, plus `key` for the output object name)
- `search_operation` _(required)_ — `search_by_hash`, `search_by_value`, `search_by_conditions`, or `sql`

```json
{
  "operation": "export_to_s3",
  "format": "json",
  "s3": {
    "aws_access_key_id": "YOUR_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "BUCKET_NAME",
    "key": "exports/dogs.json",
    "region": "us-east-1"
  },
  "search_operation": {
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }
}
```

---

### Delete Records Before

Deletes records older than a given timestamp from a table. Operates only on the local node — clustered replicas retain their data.

_Restricted to `super_user` roles._

- `operation` _(required)_ — `delete_records_before`
- `schema` _(required)_ — database name
- `table` _(required)_ — table name
- `date` _(required)_ — records with `__createdtime__` before this timestamp are deleted. 
Format: `YYYY-MM-DDThh:mm:ss.sZ` + +```json +{ + "operation": "delete_records_before", + "date": "2024-01-01T00:00:00.000Z", + "schema": "dev", + "table": "breed" +} +``` + +## Managing Jobs + +### Get Job + +Returns status, metrics, and messages for a specific job by ID. + +- `operation` _(required)_ — `get_job` +- `id` _(required)_ — job ID + +```json +{ + "operation": "get_job", + "id": "4a982782-929a-4507-8794-26dae1132def" +} +``` + +Response: + +```json +[ + { + "__createdtime__": 1611615798782, + "__updatedtime__": 1611615801207, + "created_datetime": 1611615798774, + "end_datetime": 1611615801206, + "id": "4a982782-929a-4507-8794-26dae1132def", + "job_body": null, + "message": "successfully loaded 350 of 350 records", + "start_datetime": 1611615798805, + "status": "COMPLETE", + "type": "csv_url_load", + "user": "HDB_ADMIN", + "start_datetime_converted": "2021-01-25T23:03:18.805Z", + "end_datetime_converted": "2021-01-25T23:03:21.206Z" + } +] +``` + +--- + +### Search Jobs by Start Date + +Returns all jobs started within a time window. 
+ +_Restricted to `super_user` roles._ + +- `operation` _(required)_ — `search_jobs_by_start_date` +- `from_date` _(required)_ — start of the search window (ISO 8601 format) +- `to_date` _(required)_ — end of the search window (ISO 8601 format) + +```json +{ + "operation": "search_jobs_by_start_date", + "from_date": "2024-01-01T00:00:00.000+0000", + "to_date": "2024-01-02T00:00:00.000+0000" +} +``` + +## Related Documentation + +- [Data Loader](./data-loader.md) — Component-based data loading as part of deployment +- [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Sending operations to Harper +- [Transaction Logging](./transaction.md) — Recording a history of changes made to tables diff --git a/reference_versioned_docs/version-v4/database/overview.md b/reference_versioned_docs/version-v4/database/overview.md new file mode 100644 index 00000000..93bb6c38 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/overview.md @@ -0,0 +1,123 @@ +--- +title: Overview +--- + + + + + +# Database + +Harper's database system is the foundation of its data storage and retrieval capabilities. It is built on top of [LMDB](https://www.symas.com/lmdb) (Lightning Memory-Mapped Database) and is designed to provide high performance, ACID-compliant storage with automatic indexing and flexible schema support. + +## How Harper Stores Data + +Harper organizes data in a three-tier hierarchy: + +- **Databases** — containers that group related tables together in a single transactional file +- **Tables** — collections of records with a common data pattern +- **Records** — individual data objects with a primary key and any number of attributes + +All tables within a database share the same transaction context, meaning reads and writes across tables in the same database can be performed atomically. + +### The Schema System and Auto-REST + +The most common way to use Harper's database is through the **schema system**. 
By defining a [GraphQL schema](./schema.md), you can: + +- Declare tables and their attribute types +- Control which attributes are indexed +- Define relationships between tables +- Automatically expose data via REST, MQTT, and other interfaces + +You do not need to build custom application code to use the database. A schema definition alone is enough to create fully functional, queryable REST endpoints for your data. + +For more advanced use cases, you can extend table behavior using the [Resource API](TODO:reference_versioned_docs/version-v4/resources/resource-api.md 'Custom resource logic layered on top of tables'). + +### Architecture Overview + +``` + ┌──────────┐ ┌──────────┐ + │ Clients │ │ Clients │ + └────┬─────┘ └────┬─────┘ + │ │ + ▼ ▼ + ┌────────────────────────────────────────┐ + │ │ + │ Socket routing/management │ + ├───────────────────────┬────────────────┤ + │ │ │ + │ Server Interfaces ─►│ Authentication │ + │ RESTful HTTP, MQTT │ Authorization │ + │ ◄─┤ │ + │ ▲ └────────────────┤ + │ │ │ │ + ├───┼──────────┼─────────────────────────┤ + │ │ │ ▲ │ + │ ▼ Resources ▲ │ ┌───────────┐ │ + │ │ └─┤ │ │ + ├─────────────────┴────┐ │ App │ │ + │ ├─►│ resources │ │ + │ Database tables │ └───────────┘ │ + │ │ ▲ │ + ├──────────────────────┘ │ │ + │ ▲ ▼ │ │ + │ ┌────────────────┐ │ │ + │ │ External │ │ │ + │ │ data sources ├────┘ │ + │ │ │ │ + │ └────────────────┘ │ + │ │ + └────────────────────────────────────────┘ +``` + +## Databases + +Added in: v4.2.0 + +Harper databases hold a collection of tables in a single transactionally-consistent file. This means reads and writes can be performed atomically across all tables in the same database, and multi-table transactions are replicated as a single atomic unit. + +The default database is named `data`. Most applications will use this default. 
Additional databases can be created for namespace separation — this is particularly useful for components designed for reuse across multiple applications, where a unique database name avoids naming collisions. + +> **Note:** Transactions do not preserve atomicity across different databases, only across tables within the same database. + +## Tables + +Tables group records with a common data pattern. A table must have: + +- **Table name** — used to identify the table +- **Primary key** — the unique identifier for each record (also referred to as `hash_attribute` in the Operations API) + +Primary keys must be unique. If a primary key is not provided on insert, Harper auto-generates one: +- A **UUID string** for primary keys typed as `String` or `ID` +- An **auto-incrementing integer** for primary keys typed as `Int`, `Long`, or `Any` + +Numeric primary keys are more efficient than UUIDs for large tables. + +## Dynamic vs. Defined Schemas + +Harper tables can operate in two modes: + +**Defined schemas** (recommended): Tables with schemas explicitly declared using [GraphQL schema syntax](./schema.md). This provides predictable structure, precise control over indexing, and data integrity. Schemas are declared in a component's `schema.graphql` file. + +**Dynamic schemas**: Tables created through the Operations API or Studio without a schema definition. Attributes are reflexively added as data is ingested. All top-level attributes are automatically indexed. Dynamic schema tables automatically maintain `__createdtime__` and `__updatedtime__` audit attributes on every record. + +It is best practice to define schemas for production tables. Dynamic schemas are convenient for experimentation and prototyping. 
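As a rough mental model of dynamic-schema behavior (an illustrative sketch in plain JavaScript, not Harper's implementation), a table without a defined schema registers every unseen top-level attribute for indexing as records arrive and stamps the audit attributes automatically:

```javascript
// Toy model of a dynamic-schema table. All names here are illustrative;
// Harper's real storage and indexing are handled by LMDB internally.
function makeDynamicTable() {
	const indexedAttributes = new Set();
	const records = new Map();
	return {
		insert(record) {
			// Reflexively add any new top-level attributes to the index set.
			for (const attr of Object.keys(record)) indexedAttributes.add(attr);
			const now = Date.now();
			// Dynamic-schema tables maintain audit timestamps on every record.
			records.set(record.id, {
				...record,
				__createdtime__: now,
				__updatedtime__: now,
			});
		},
		indexedAttributes,
		records,
	};
}

const table = makeDynamicTable();
table.insert({ id: 1, name: 'Harper' });
table.insert({ id: 2, name: 'Penny', breed: 'Labrador' }); // "breed" is now indexed too
```

With a defined schema, by contrast, the indexed attribute set is fixed up front by `@indexed` directives rather than growing with the data.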
+ +## Key Concepts + +For deeper coverage of each database feature, see the dedicated pages in this section: + +- **[Schema](./schema.md)** — Defining table structure, types, indexes, relationships, and computed properties using GraphQL schema syntax +- **[Data Loader](./data-loader.md)** — Loading seed or initial data into tables as part of component deployment +- **[Storage Algorithm](./storage-algorithm.md)** — How Harper stores data using LMDB with universal indexing and ACID compliance +- **[Jobs](./jobs.md)** — Asynchronous bulk data operations (CSV import/export, S3 import/export) +- **[System Tables](./system-tables.md)** — Harper internal tables for analytics, data loader state, and other system features +- **[Compaction](./compaction.md)** — Reducing database file size by eliminating fragmentation and free space +- **[Transaction Logging](./transaction.md)** — Recording and querying a history of data changes via audit log and transaction log + +## Related Documentation + +- [REST](TODO:reference_versioned_docs/version-v4/rest/overview.md) — HTTP interface built on top of the database resource system +- [Resources](TODO:reference_versioned_docs/version-v4/resources/overview.md) — Custom application logic extending database tables +- [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Direct database management operations (create/drop databases and tables, insert/update/delete records) +- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/overview.md) — Storage configuration options (compression, blob paths, compaction) diff --git a/reference_versioned_docs/version-v4/database/schema.md b/reference_versioned_docs/version-v4/database/schema.md new file mode 100644 index 00000000..30b6f872 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/schema.md @@ -0,0 +1,450 @@ +--- +title: Schema +--- + + + + + + + + + + + +# Schema + +Harper uses GraphQL Schema Definition Language (SDL) to 
declaratively define table structure. Schema definitions are loaded from `.graphql` files in a component directory and control table creation, attribute types, indexing, and relationships. + +## Overview + +Added in: v4.2.0 + +Schemas are defined using standard [GraphQL type definitions](https://graphql.org/learn/schema/) with Harper-specific directives. A schema definition: + +- Ensures required tables exist when a component is deployed +- Enforces attribute types and required constraints +- Controls which attributes are indexed +- Defines relationships between tables +- Configures computed properties, expiration, and audit behavior + +Schemas are flexible by default — records may include additional properties beyond those declared in the schema. Use the `@sealed` directive to prevent this. + +A minimal example: + +```graphql +type Dog @table { + id: ID @primaryKey + name: String + breed: String + age: Int +} + +type Breed @table { + id: ID @primaryKey + name: String @indexed +} +``` + +### Loading Schemas + +In a component's `config.yaml`, specify the schema file with the `graphqlSchema` plugin: + +```yaml +graphqlSchema: + files: 'schema.graphql' +``` + +## Type Directives + +Type directives apply to the entire table type definition. + +### `@table` + +Marks a GraphQL type as a Harper database table. The type name becomes the table name by default. + +```graphql +type MyTable @table { + id: ID @primaryKey +} +``` + +Optional arguments: + +| Argument | Type | Default | Description | +|---|---|---|---| +| `table` | `String` | type name | Override the table name | +| `database` | `String` | `"data"` | Database to place the table in | +| `expiration` | `Int` | — | Auto-expire records after this many seconds (useful for caching tables) | +| `audit` | `Boolean` | config default | Enable audit log for this table | + +**Database naming:** The default `data` database is a good choice for tables that won't be reused elsewhere. 
Components designed for reuse should specify a unique database name (e.g., `"my-component-data"`) to avoid naming collisions with other components. + +### `@export` + +Exposes the table as an externally accessible resource endpoint, available via REST, MQTT, and other interfaces. + +```graphql +type MyTable @table @export(name: "my-table") { + id: ID @primaryKey +} +``` + +The optional `name` parameter specifies the URL path segment (e.g., `/my-table/`). Without `name`, the type name is used. + +## Field Directives + +Field directives apply to individual attributes in a type definition. + +### `@primaryKey` + +Designates the attribute as the table's primary key. Primary keys must be unique; inserts with a duplicate primary key are rejected. + +```graphql +type Product @table { + id: Long @primaryKey + name: String +} +``` + +If no primary key is provided on insert, Harper auto-generates one: +- **UUID string** — when type is `String` or `ID` +- **Auto-incrementing integer** — when type is `Int`, `Long`, or `Any` + +Changed in: v4.4.0 + +Auto-incrementing integer primary keys were added. Previously only UUID generation was supported for `ID` and `String` types. + +Using `Long` or `Any` is recommended for auto-generated numeric keys. `Int` is limited to 32-bit and may be insufficient for large tables. + +### `@indexed` + +Creates a secondary index on the attribute for fast querying. Required for filtering by this attribute in REST queries, SQL, or NoSQL operations. + +```graphql +type Product @table { + id: ID @primaryKey + category: String @indexed + price: Float @indexed +} +``` + +If the field value is an array, each element in the array is individually indexed, enabling queries by any individual value. + +Null values are indexed by default on new tables (added in v4.3.0), enabling queries like `GET /Product/?category=null`. + +> **Note:** Existing indexes created before v4.3.0 do not include null values. 
To add null indexing to an existing attribute, drop and re-add the attribute index. + +### `@createdTime` + +Automatically assigns a creation timestamp (Unix epoch milliseconds) to the attribute when a record is created. + +```graphql +type Event @table { + id: ID @primaryKey + createdAt: Long @createdTime +} +``` + +### `@updatedTime` + +Automatically assigns a timestamp (Unix epoch milliseconds) each time the record is updated. + +```graphql +type Event @table { + id: ID @primaryKey + updatedAt: Long @updatedTime +} +``` + +### `@sealed` + +Prevents records from including any properties beyond those explicitly declared in the type. By default, Harper allows records to have additional properties. + +```graphql +type StrictRecord @table @sealed { + id: ID @primaryKey + name: String +} +``` + +## Relationships + +Added in: v4.3.0 + +The `@relationship` directive defines how one table relates to another through a foreign key. Relationships enable join queries and allow related records to be selected as nested properties in query results. + +### `@relationship(from: attribute)` — many-to-one or many-to-many + +The foreign key is in this table, referencing the primary key of the target table. + +```graphql +type Product @table @export { + id: ID @primaryKey + brandId: ID @indexed # foreign key + brand: Brand @relationship(from: brandId) # many-to-one +} + +type Brand @table @export { + id: ID @primaryKey + name: String @indexed +} +``` + +Query products by brand name: + +```http +GET /Product?brand.name=Microsoft +``` + +If the foreign key is an array, this establishes a many-to-many relationship: + +```graphql +type Product @table @export { + id: ID @primaryKey + featureIds: [ID] @indexed + features: [Feature] @relationship(from: featureIds) +} +``` + +### `@relationship(to: attribute)` — one-to-many or many-to-many + +The foreign key is in the target table, referencing the primary key of this table. The result type must be an array. 
+ +```graphql +type Brand @table @export { + id: ID @primaryKey + name: String @indexed + products: [Product] @relationship(to: brandId) # one-to-many +} +``` + +> **Note:** Do not combine `from` and `to` in the same `@relationship` directive. + +Schemas can also define self-referential relationships, enabling parent-child hierarchies within a single table. + +## Computed Properties + +Added in: v4.4.0 + +The `@computed` directive marks a field as derived from other fields at query time. Computed properties are not stored in the database but are evaluated when the field is accessed. + +```graphql +type Product @table { + id: ID @primaryKey + price: Float + taxRate: Float + totalPrice: Float @computed(from: "price + (price * taxRate)") +} +``` + +The `from` argument is a JavaScript expression that can reference other record fields. + +Computed properties can also be defined in JavaScript for complex logic: + +```graphql +type Product @table { + id: ID @primaryKey + totalPrice: Float @computed +} +``` + +```javascript +tables.Product.setComputedAttribute('totalPrice', (record) => { + return record.price + record.price * record.taxRate; +}); +``` + +Computed properties are not included in query results by default — use `select` to include them explicitly. + +### Computed Indexes + +Computed properties can be indexed with `@indexed`, enabling custom indexing strategies such as composite indexes, full-text search, or vector indexing: + +```graphql +type Product @table { + id: ID @primaryKey + tags: String + tagsSeparated: String[] @computed(from: "tags.split(/\\s*,\\s*/)") @indexed +} +``` + +When using a JavaScript function for an indexed computed property, use the `version` argument to ensure re-indexing when the function changes: + +```graphql +type Product @table { + id: ID @primaryKey + totalPrice: Float @computed(version: 1) @indexed +} +``` + +Increment `version` whenever the computation function changes. Failing to do so can result in an inconsistent index. 
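As a sanity check, the splitting expression used by `tagsSeparated` is ordinary JavaScript and can be evaluated outside Harper. The record below is made up for illustration:

```javascript
// Standalone sketch of the `tagsSeparated` computed expression above.
// The record is hypothetical; no Harper runtime is involved.
const record = { tags: 'outdoor, waterproof ,lightweight' };
const tagsSeparated = record.tags.split(/\s*,\s*/);
console.log(tagsSeparated); // [ 'outdoor', 'waterproof', 'lightweight' ]
```

Each resulting element is then indexed individually, following the array behavior described under `@indexed`.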
+ +## Vector Indexing + +Added in: v4.6.0 + +Use `@indexed(type: "HNSW")` to create a vector index using the Hierarchical Navigable Small World algorithm, designed for fast approximate nearest-neighbor search on high-dimensional vectors. + +```graphql +type Document @table { + id: Long @primaryKey + textEmbeddings: [Float] @indexed(type: "HNSW") +} +``` + +Query by nearest neighbors using the `sort` parameter: + +```javascript +let results = Document.search({ + sort: { attribute: 'textEmbeddings', target: searchVector }, + limit: 5, +}); +``` + +HNSW can be combined with filter conditions: + +```javascript +let results = Document.search({ + conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }], + sort: { attribute: 'textEmbeddings', target: searchVector }, + limit: 5, +}); +``` + +### HNSW Parameters + +| Parameter | Default | Description | +|---|---|---| +| `distance` | `"cosine"` | Distance function: `"euclidean"` or `"cosine"` (negative cosine similarity) | +| `efConstruction` | `100` | Max nodes explored during index construction. Higher = better recall, lower = better performance | +| `M` | `16` | Preferred connections per graph layer. 
Higher = more space, better recall for high-dimensional data | +| `optimizeRouting` | `0.5` | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) | +| `mL` | computed from `M` | Normalization factor for level generation | +| `efSearchConstruction` | `50` | Max nodes explored during search | + +Example with custom parameters: + +```graphql +type Document @table { + id: Long @primaryKey + textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100) +} +``` + +## Field Types + +Harper supports the following field types: + +| Type | Description | +|---|---| +| `String` | Unicode text, UTF-8 encoded | +| `Int` | 32-bit signed integer (−2,147,483,648 to 2,147,483,647) | +| `Long` | 54-bit signed integer (−9,007,199,254,740,992 to 9,007,199,254,740,992) | +| `Float` | 64-bit double precision floating point | +| `BigInt` | Integer up to ~300 digits. Note: distinct JavaScript type; handle appropriately in custom code | +| `Boolean` | `true` or `false` | +| `ID` | String; indicates a non-human-readable identifier | +| `Any` | Any primitive, object, or array | +| `Date` | JavaScript `Date` object | +| `Bytes` | Binary data as `Buffer` or `Uint8Array` | +| `Blob` | Binary large object; designed for streaming content >20KB | + +Added in for `BigInt`: v4.3.0 + +Added in for `Blob`: v4.5.0 + +Arrays of a type are expressed with `[Type]` syntax (e.g., `[Float]` for a vector). + +### Blob Type + +Added in: v4.5.0 + +`Blob` fields are designed for large binary content. Unlike `Bytes`, blobs are stored separately from the record, support streaming, and do not need to be held entirely in memory. Use `Blob` for content typically larger than 20KB (images, video, audio, large HTML, etc.). + +See [Blob usage details](#blob-usage) below. + +#### Blob Usage + +Declare a blob field: + +```graphql +type MyTable @table { + id: Any! 
@primaryKey + data: Blob +} +``` + +Create and store a blob: + +```javascript +let blob = createBlob(largeBuffer); +await MyTable.put({ id: 'my-record', data: blob }); +``` + +Retrieve blob data: + +```javascript +let record = await MyTable.get('my-record'); +let buffer = await record.data.bytes(); +// or stream it: +let stream = record.data.stream(); +``` + +Blobs support asynchronous streaming, meaning a record can reference a blob before it is fully written to storage. Use `saveBeforeCommit: true` to wait for full write before committing: + +```javascript +let blob = createBlob(stream, { saveBeforeCommit: true }); +await MyTable.put({ id: 'my-record', data: blob }); +``` + +Any string or buffer assigned to a `Blob` field in a `put`, `patch`, or `publish` is automatically coerced to a `Blob`. + +When returning a blob via REST, register an error handler to handle interrupted streams: + +```javascript +export class MyEndpoint extends MyTable { + async get(target) { + const record = super.get(target); + let blob = record.data; + blob.on('error', () => { + MyTable.invalidate(target); + }); + return { status: 200, headers: {}, body: blob }; + } +} +``` + +## Dynamic Schema Behavior + +When a table is created through the Operations API or Studio without a schema definition, it follows dynamic schema behavior: + +- Attributes are reflexively created as data is ingested +- All top-level attributes are automatically indexed +- Records automatically get `__createdtime__` and `__updatedtime__` audit attributes + +Dynamic schema tables are additive — new attributes are added as new data arrives. Existing records will have `null` for any newly added attributes. + +Use `create_attribute` and `drop_attribute` operations to manually manage attributes on dynamic schema tables. See the [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/operations.md 'NoSQL and database operations') for details. 
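Returning to the HNSW `distance` argument described under Vector Indexing: the two supported distance functions can be written in plain JavaScript for intuition. These are illustrative implementations only, not Harper's internals:

```javascript
// Euclidean distance between two equal-length numeric vectors.
function euclidean(a, b) {
	return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Negative cosine similarity, matching the description of the default
// "cosine" setting, so that closer vectors yield smaller values.
function cosineDistance(a, b) {
	const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
	const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
	return -(dot / (norm(a) * norm(b)));
}

console.log(euclidean([0, 3], [4, 0])); // 5
console.log(cosineDistance([1, 0], [1, 0])); // -1
```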
+ +## OpenAPI Specification + +Tables exported with `@export` are described in a default endpoint: + +```http +GET /openapi +``` + +This provides an OpenAPI 3.x description of all exported resource endpoints. The endpoint is a starting guide and may not cover every edge case. + +## Renaming Tables + +> Harper does not support renaming tables. Changing a type name in a schema definition creates a new, empty table — the original table and its data are unaffected. + +## Related Documentation + +- [Data Loader](./data-loader.md) — Seed tables with initial data alongside schema deployment +- [REST Querying](TODO:reference_versioned_docs/version-v4/rest/querying.md) — Querying tables via HTTP using schema-defined attributes and relationships +- [Resources](TODO:reference_versioned_docs/version-v4/resources/resource-api.md) — Extending table behavior with custom application logic +- [Storage Algorithm](./storage-algorithm.md) — How Harper indexes and stores schema-defined data +- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'graphqlSchema component and storage options') — Component configuration for schemas diff --git a/reference_versioned_docs/version-v4/database/storage-algorithm.md b/reference_versioned_docs/version-v4/database/storage-algorithm.md new file mode 100644 index 00000000..bf83da5d --- /dev/null +++ b/reference_versioned_docs/version-v4/database/storage-algorithm.md @@ -0,0 +1,71 @@ +--- +title: Storage Algorithm +--- + + + + +# Storage Algorithm + +Harper's storage algorithm is the foundation of all database functionality. It is built on top of [LMDB](https://www.symas.com/lmdb) (Lightning Memory-Mapped Database), a high-performance key-value store, and extends it with automatic indexing, query-language-agnostic data access, and ACID compliance. + +## Query Language Agnostic + +Harper's storage layer is decoupled from any specific query language. 
Data inserted via NoSQL operations can be read via SQL, REST, or the Resource API — all accessing the same underlying storage. This architecture allows Harper to add new query interfaces without changing how data is stored. + +## ACID Compliance + +Harper provides full ACID compliance on each node using Multi-Version Concurrency Control (MVCC) through LMDB: + +- **Atomicity**: All writes in a transaction either fully commit or fully roll back +- **Consistency**: Each transaction moves data from one valid state to another +- **Isolation**: Readers and writers operate independently — readers do not block writers and writers do not block readers +- **Durability**: Committed transactions are persisted to disk + +Each Harper table has a single writer process, eliminating deadlocks and ensuring writes are executed in the order received. Multiple reader processes can operate concurrently for high-throughput reads. + +## Universally Indexed + +Changed in: v4.3.0 — Storage performance improvements including better free-space management + +All top-level attributes are automatically indexed immediately upon ingestion. For [dynamic schema tables](./overview.md#dynamic-vs-defined-schemas), Harper reflexively creates the attribute and its index as new data arrives. For [schema-defined tables](./schema.md), indexes are created for all attributes marked with `@indexed`. + +Indexes are type-agnostic, ordering values as follows: +1. Booleans +2. Numbers (ordered numerically) +3. Strings (ordered lexically) + +### LMDB Storage Layout + +Within the LMDB implementation, table records are grouped into a single LMDB environment file. Each attribute index is stored as a sub-database (`dbi`) within that environment. This means each attribute has its own dedicated index structure inside the same database file. + +## Compression + +Changed in: v4.3.0 — Compression is now enabled by default for all records over 4KB + +Harper compresses record data automatically for records over 4KB. 
Compression settings can be configured in the [storage configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'storage configuration options'). Note that compression settings cannot be changed on existing databases without creating a new compacted copy — see [Compaction](./compaction.md). + +## Performance Characteristics + +Harper inherits the following performance properties from LMDB: + +- **Memory-mapped I/O**: Data is accessed via memory mapping, enabling fast reads without data duplication between disk and memory +- **Buffer cache integration**: Fully exploits the OS buffer cache for reduced I/O +- **CPU cache optimization**: Built to maximize data locality within CPU caches +- **Deadlock-free writes**: Full serialization of writers guarantees write ordering without deadlocks +- **Zero-copy reads**: Readers access data directly from the memory map without copying + +## Indexing Example + +The diagram below illustrates how a single table's data and attribute indexes are laid out within Harper's LMDB-based storage: + + + + +![Storage Algorithm Diagram](TODO:IMAGE) + +## Related Documentation + +- [Schema](./schema.md) — Defining indexed attributes and vector indexes +- [Compaction](./compaction.md) — Reclaiming free space and applying new storage configuration to existing databases +- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'storage section') — Storage configuration options (compression, memory maps, blob paths) diff --git a/reference_versioned_docs/version-v4/database/system-tables.md b/reference_versioned_docs/version-v4/database/system-tables.md new file mode 100644 index 00000000..3ac18748 --- /dev/null +++ b/reference_versioned_docs/version-v4/database/system-tables.md @@ -0,0 +1,154 @@ +--- +title: System Tables +--- + + + + + +# System Tables + +Harper maintains a set of internal system tables in the `system` database. 
These tables store analytics, job tracking, replication configuration, and other internal state. Most are read-only from the application perspective; some can be queried for observability or management purposes. + +System tables are prefixed with `hdb_` and reside in the `system` database. + +## Analytics Tables + +Added in: v4.5.0 (resource and storage analytics expansion) + +### `hdb_raw_analytics` + +Stores per-second, per-thread performance metrics. Records are written once per second (when there is activity) and include metrics for all operations, URL endpoints, and messaging topics, plus system resource information such as memory and CPU utilization. + +Records have a primary key equal to the timestamp in milliseconds since Unix epoch. + +Query with `search_by_conditions` (requires `superuser` permission): + +```json +{ + "operation": "search_by_conditions", + "schema": "system", + "table": "hdb_raw_analytics", + "conditions": [{ + "search_attribute": "id", + "search_type": "between", + "search_value": [1688594000000, 1688594010000] + }] +} +``` + +A typical record: + +```json +{ + "time": 1688594390708, + "period": 1000.8336279988289, + "metrics": [ + { + "metric": "bytes-sent", + "path": "search_by_conditions", + "type": "operation", + "median": 202, + "mean": 202, + "p95": 202, + "p90": 202, + "count": 1 + }, + { + "metric": "memory", + "threadId": 2, + "rss": 1492664320, + "heapTotal": 124596224, + "heapUsed": 119563120, + "external": 3469790, + "arrayBuffers": 798721 + }, + { + "metric": "utilization", + "idle": 138227.52767700003, + "active": 70.5066209952347, + "utilization": 0.0005098165086230495 + } + ], + "threadId": 2, + "totalBytesProcessed": 12182820, + "id": 1688594390708.6853 +} +``` + +### `hdb_analytics` + +Stores per-minute aggregate analytics. Once per minute, Harper aggregates all per-second raw entries from all threads into summary records in this table. Query it for longer-term performance trends. 
+ +```json +{ + "operation": "search_by_conditions", + "schema": "system", + "table": "hdb_analytics", + "conditions": [{ + "search_attribute": "id", + "search_type": "between", + "search_value": [1688194100000, 1688594990000] + }] +} +``` + +A typical aggregate record: + +```json +{ + "period": 60000, + "metric": "bytes-sent", + "method": "connack", + "type": "mqtt", + "median": 4, + "mean": 4, + "p95": 4, + "p90": 4, + "count": 1, + "id": 1688589569646, + "time": 1688589569646 +} +``` + +For a full reference of available metrics and their fields, see [Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md 'Complete analytics metrics reference'). + +## Data Loader Table + +### `hdb_dataloader_hash` + +Added in: v4.6.0 + +Used internally by the [Data Loader](./data-loader.md) to track which records have been loaded and detect changes. Stores SHA-256 content hashes of data file records so that unchanged records are not re-written on subsequent deployments. + +This table is managed automatically by the Data Loader. No direct interaction is required. + +## Replication Tables + +### `hdb_nodes` + +Stores the configuration and state of known nodes in a cluster, including connection details, replication settings, and revoked certificate serial numbers. + +Can be queried to inspect the current replication topology: + +```json +{ + "operation": "search_by_hash", + "schema": "system", + "table": "hdb_nodes", + "hash_values": ["node-id"] +} +``` + +Used by the `add_node`, `update_node`, and related clustering operations. See [Replication](TODO:reference_versioned_docs/version-v4/replication/clustering.md 'Clustering and node management') for details. + +### `hdb_certificate` + +Stores TLS certificates used in replication. Can be queried to inspect the certificates currently known to the cluster. 
+ +## Related Documentation + +- [Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md) — Full reference for analytics metrics tracked in `hdb_analytics` and `hdb_raw_analytics` +- [Data Loader](./data-loader.md) — Component that writes to `hdb_dataloader_hash` +- [Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md) — Clustering and replication system that uses `hdb_nodes` and `hdb_certificate` +- [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Querying system tables using `search_by_conditions` diff --git a/reference_versioned_docs/version-v4/database/transaction.md b/reference_versioned_docs/version-v4/database/transaction.md new file mode 100644 index 00000000..a0537ced --- /dev/null +++ b/reference_versioned_docs/version-v4/database/transaction.md @@ -0,0 +1,231 @@ +--- +title: Transaction Logging +--- + + + + + + + + +# Transaction Logging + +Harper provides two complementary mechanisms for recording a history of data changes on a table: the **audit log** and the **transaction log**. Both are available at the table level and serve different use cases. + +| Feature | Audit Log | Transaction Log | +|---|---|---| +| Storage | Standard Harper table (per-table) | Clustering streams (per-table) | +| Requires clustering | No | Yes | +| Available since | v4.1.0 | v4.1.0 | +| Stores original record values | Yes | No | +| Query by username | Yes | No | +| Query by primary key | Yes | No | +| Used for real-time messaging | Yes (required) | No | + +## Audit Log + +Available since: v4.1.0 + +The audit log uses a standard Harper table to track every transaction against a user table. For each user table, Harper automatically creates and maintains a corresponding audit log table. The audit log captures the operation type, the user who made the change, the timestamp, and both the new and original record values. + +The audit log is **enabled by default**. 
To disable it, set `logging.auditLog` to `false` in `harperdb-config.yaml` and restart Harper. + +> The audit log is required for real-time messaging (WebSocket and MQTT subscriptions). Do not disable it if real-time features are in use. + +### Audit Log Operations + +#### `read_audit_log` + +Queries the audit log for a specific table. Supports filtering by timestamp, username, or primary key value. + +**By timestamp:** + +```json +{ + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "timestamp", + "search_values": [1660585740558] +} +``` + +Timestamp behavior: + +| `search_values` | Result | +|---|---| +| `[]` | All records for the table | +| `[timestamp]` | All records after the provided timestamp | +| `[from, to]` | Records between the two timestamps | + +**By username:** + +```json +{ + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "username", + "search_values": ["admin"] +} +``` + +**By primary key:** + +```json +{ + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "hash_value", + "search_values": [318] +} +``` + +**Response example:** + +```json +{ + "operation": "update", + "user_name": "HDB_ADMIN", + "timestamp": 1607035559122.277, + "hash_values": [1, 2], + "records": [ + { + "id": 1, + "breed": "Muttzilla", + "age": 6, + "__updatedtime__": 1607035559122 + } + ], + "original_records": [ + { + "__createdtime__": 1607035556801, + "__updatedtime__": 1607035556801, + "age": 5, + "breed": "Mutt", + "id": 1, + "name": "Harper" + } + ] +} +``` + +The `original_records` field contains the record state before the operation was applied. + +#### `delete_audit_logs_before` + +Deletes audit log entries older than the specified timestamp. 
+ +Changed in: v4.3.0 — Audit log cleanup improved to reduce resource consumption during scheduled cleanups + +Changed in: v4.5.0 — Storage reclamation: Harper automatically evicts older audit log entries when free storage drops below a configurable threshold + +```json +{ + "operation": "delete_audit_logs_before", + "schema": "dev", + "table": "dog", + "timestamp": 1598290282817 +} +``` + +--- + +## Transaction Log + +Available since: v4.1.0 + +The transaction log is built on top of clustering streams. When clustering is enabled, every transaction against a table is pushed to its stream, which collectively forms the transaction log. The transaction log is primarily useful when clustering is set up, as it relies on the stream infrastructure. + +> To use the transaction log, clustering must be configured. See [Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md 'Setting up clustering') for setup instructions. + +Changed in: v4.5.0 — Transactions can now be reused after calling `transaction.commit()` + +### Transaction Log Operations + +#### `read_transaction_log` + +Returns a prescribed set of transaction records based on a time range and optional limit. + +```json +{ + "operation": "read_transaction_log", + "schema": "dev", + "table": "dog", + "from": 1598290235769, + "to": 1660249020865, + "limit": 2 +} +``` + +**Response example:** + +```json +[ + { + "operation": "insert", + "user": "admin", + "timestamp": 1660165619736, + "records": [ + { + "id": 1, + "dog_name": "Penny", + "owner_name": "Kyle", + "__updatedtime__": 1660165619688, + "__createdtime__": 1660165619688 + } + ] + }, + { + "operation": "update", + "user": "admin", + "timestamp": 1660165620040, + "records": [ + { + "id": 1, + "dog_name": "Penny B", + "__updatedtime__": 1660165620036 + } + ] + } +] +``` + +#### `delete_transaction_logs_before` + +Deletes transaction log entries older than the specified timestamp. 
+ +> **Warning:** Clustering uses transaction log streams for node catchup. Deleting transaction log entries may prevent a node that went offline from catching up on missed transactions. + +```json +{ + "operation": "delete_transaction_logs_before", + "schema": "dev", + "table": "dog", + "timestamp": 1598290282817 +} +``` + +--- + +## Enabling Audit Log Per Table + +You can enable or disable the audit log for individual tables using the `@table` directive's `audit` argument in your schema: + +```graphql +type Dog @table(audit: true) { + id: ID @primaryKey + name: String +} +``` + +This overrides the `logging.auditLog` global configuration for that specific table. + +## Related Documentation + +- [Logging](TODO:reference_versioned_docs/version-v4/logging/overview.md) — Application and system logging (separate from transaction/audit logging) +- [Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md) — Clustering setup required for transaction logs +- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'logging.auditLog option') — Global audit log configuration +- [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Sending operations to Harper diff --git a/reference_versioned_sidebars/version-v4-sidebars.json b/reference_versioned_sidebars/version-v4-sidebars.json index 299587d6..533c7ad0 100644 --- a/reference_versioned_sidebars/version-v4-sidebars.json +++ b/reference_versioned_sidebars/version-v4-sidebars.json @@ -294,6 +294,54 @@ } ] }, + { + "type": "category", + "label": "Database", + "collapsible": false, + "className": "learn-category-header", + "items": [ + { + "type": "doc", + "id": "database/overview", + "label": "Overview" + }, + { + "type": "doc", + "id": "database/schema", + "label": "Schema" + }, + { + "type": "doc", + "id": "database/data-loader", + "label": "Data Loader" + }, + { + "type": "doc", + "id": "database/storage-algorithm", + "label": "Storage 
Algorithm" + }, + { + "type": "doc", + "id": "database/jobs", + "label": "Jobs" + }, + { + "type": "doc", + "id": "database/system-tables", + "label": "System Tables" + }, + { + "type": "doc", + "id": "database/compaction", + "label": "Compaction" + }, + { + "type": "doc", + "id": "database/transaction", + "label": "Transaction Logging" + } + ] + }, { "type": "category", "label": "Legacy", From 052ea8ac4707c5d4e5da87cbd04e14c202035a3a Mon Sep 17 00:00:00 2001 From: Ethan Arrowood Date: Wed, 18 Mar 2026 16:57:29 -0600 Subject: [PATCH 2/5] fixup! docs: migrate Database section to v4 consolidated reference --- .../version-v4/database/compaction.md | 8 +- .../version-v4/database/data-loader.md | 48 ++--- .../version-v4/database/jobs.md | 144 +++++++------- .../version-v4/database/overview.md | 1 + .../version-v4/database/schema.md | 171 ++++++++-------- .../version-v4/database/storage-algorithm.md | 1 + .../version-v4/database/system-tables.md | 130 ++++++------ .../version-v4/database/transaction.md | 186 +++++++++--------- 8 files changed, 348 insertions(+), 341 deletions(-) diff --git a/reference_versioned_docs/version-v4/database/compaction.md b/reference_versioned_docs/version-v4/database/compaction.md index c8021101..4557fa4b 100644 --- a/reference_versioned_docs/version-v4/database/compaction.md +++ b/reference_versioned_docs/version-v4/database/compaction.md @@ -57,10 +57,10 @@ STORAGE_COMPACTONSTART=true STORAGE_COMPACTONSTARTKEEPBACKUP=true harperdb ### Options -| Option | Type | Default | Description | -|---|---|---|---| -| `compactOnStart` | Boolean | `false` | Compact all databases at startup. Automatically reset to `false` after running. 
| -| `compactOnStartKeepBackup` | Boolean | `false` | Retain the backup copy created during compact on start | +| Option | Type | Default | Description | +| -------------------------- | ------- | ------- | ------------------------------------------------------------------------------- | +| `compactOnStart` | Boolean | `false` | Compact all databases at startup. Automatically reset to `false` after running. | +| `compactOnStartKeepBackup` | Boolean | `false` | Retain the backup copy created during compact on start | > **Note:** `compactOnStart` is automatically set back to `false` after it runs, so compaction only happens on the next start if you explicitly re-enable it. diff --git a/reference_versioned_docs/version-v4/database/data-loader.md b/reference_versioned_docs/version-v4/database/data-loader.md index c57495a9..1b3d1d39 100644 --- a/reference_versioned_docs/version-v4/database/data-loader.md +++ b/reference_versioned_docs/version-v4/database/data-loader.md @@ -30,22 +30,22 @@ Each data file loads records into a single table. 
The file specifies the target ```json { - "database": "myapp", - "table": "users", - "records": [ - { - "id": 1, - "username": "admin", - "email": "admin@example.com", - "role": "administrator" - }, - { - "id": 2, - "username": "user1", - "email": "user1@example.com", - "role": "standard" - } - ] + "database": "myapp", + "table": "users", + "records": [ + { + "id": 1, + "username": "admin", + "email": "admin@example.com", + "role": "administrator" + }, + { + "id": 2, + "username": "user1", + "email": "user1@example.com", + "role": "standard" + } + ] } ``` @@ -96,14 +96,14 @@ When Harper starts a component with `dataLoader` configured: ### Change Detection -| Scenario | Behavior | -|---|---| -| New record | Inserted; content hash stored | -| Unchanged record | Skipped (no writes) | -| Changed data file | Updated via `patch`, preserving any extra fields | -| Record created by user (not data loader) | Never overwritten | -| Record modified by user after load | Preserved, not overwritten | -| Extra fields added by user to a data-loaded record | Preserved during updates | +| Scenario | Behavior | +| -------------------------------------------------- | ------------------------------------------------ | +| New record | Inserted; content hash stored | +| Unchanged record | Skipped (no writes) | +| Changed data file | Updated via `patch`, preserving any extra fields | +| Record created by user (not data loader) | Never overwritten | +| Record modified by user after load | Preserved, not overwritten | +| Extra fields added by user to a data-loaded record | Preserved during updates | This design makes data files safe to redeploy without losing manual modifications. 
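The patch-style update described above can be sketched as a shallow merge, which is why user-added fields survive a redeploy. The records are hypothetical:

```javascript
// Illustrative sketch, not Harper internals: keys from the changed data
// file overwrite matching keys, while fields a user added to the stored
// record are preserved.
const stored = { id: 1, username: 'admin', theme: 'dark' }; // `theme` added by a user
const fromDataFile = { id: 1, username: 'administrator' }; // updated data file
const updated = { ...stored, ...fromDataFile };
console.log(updated); // { id: 1, username: 'administrator', theme: 'dark' }
```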
diff --git a/reference_versioned_docs/version-v4/database/jobs.md b/reference_versioned_docs/version-v4/database/jobs.md index 90612406..d3417221 100644 --- a/reference_versioned_docs/version-v4/database/jobs.md +++ b/reference_versioned_docs/version-v4/database/jobs.md @@ -31,11 +31,11 @@ Ingests CSV data provided directly in the request body. ```json { - "operation": "csv_data_load", - "database": "dev", - "action": "insert", - "table": "breed", - "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n" + "operation": "csv_data_load", + "database": "dev", + "action": "insert", + "table": "breed", + "data": "id,name,country\n1,Labrador,Canada\n2,Poodle,France\n" } ``` @@ -43,8 +43,8 @@ Response: ```json { - "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69", - "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69" + "message": "Starting job with id 2fe25039-566e-4670-8bb3-2db3d4e07e69", + "job_id": "2fe25039-566e-4670-8bb3-2db3d4e07e69" } ``` @@ -64,11 +64,11 @@ Ingests CSV data from a file on the server's local filesystem. ```json { - "operation": "csv_file_load", - "action": "insert", - "database": "dev", - "table": "breed", - "file_path": "/home/user/imports/breeds.csv" + "operation": "csv_file_load", + "action": "insert", + "database": "dev", + "table": "breed", + "file_path": "/home/user/imports/breeds.csv" } ``` @@ -86,11 +86,11 @@ Ingests CSV data from a URL. ```json { - "operation": "csv_url_load", - "action": "insert", - "database": "dev", - "table": "breed", - "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv" + "operation": "csv_url_load", + "action": "insert", + "database": "dev", + "table": "breed", + "csv_url": "https://s3.amazonaws.com/mydata/breeds.csv" } ``` @@ -113,17 +113,17 @@ Imports CSV or JSON files from an AWS S3 bucket. 
```json { - "operation": "import_from_s3", - "action": "insert", - "database": "dev", - "table": "dog", - "s3": { - "aws_access_key_id": "YOUR_KEY", - "aws_secret_access_key": "YOUR_SECRET_KEY", - "bucket": "BUCKET_NAME", - "key": "dogs.json", - "region": "us-east-1" - } + "operation": "import_from_s3", + "action": "insert", + "database": "dev", + "table": "dog", + "s3": { + "aws_access_key_id": "YOUR_KEY", + "aws_secret_access_key": "YOUR_SECRET_KEY", + "bucket": "BUCKET_NAME", + "key": "dogs.json", + "region": "us-east-1" + } } ``` @@ -144,13 +144,13 @@ Changed in: v4.3.0 — `search_by_conditions` added as a supported search operat ```json { - "operation": "export_local", - "format": "json", - "path": "/data/exports/", - "search_operation": { - "operation": "sql", - "sql": "SELECT * FROM dev.breed" - } + "operation": "export_local", + "format": "json", + "path": "/data/exports/", + "search_operation": { + "operation": "sql", + "sql": "SELECT * FROM dev.breed" + } } ``` @@ -169,19 +169,19 @@ Changed in: v4.3.0 — `search_by_conditions` added as a supported search operat ```json { - "operation": "export_to_s3", - "format": "json", - "s3": { - "aws_access_key_id": "YOUR_KEY", - "aws_secret_access_key": "YOUR_SECRET_KEY", - "bucket": "BUCKET_NAME", - "key": "exports/dogs.json", - "region": "us-east-1" - }, - "search_operation": { - "operation": "sql", - "sql": "SELECT * FROM dev.dog" - } + "operation": "export_to_s3", + "format": "json", + "s3": { + "aws_access_key_id": "YOUR_KEY", + "aws_secret_access_key": "YOUR_SECRET_KEY", + "bucket": "BUCKET_NAME", + "key": "exports/dogs.json", + "region": "us-east-1" + }, + "search_operation": { + "operation": "sql", + "sql": "SELECT * FROM dev.dog" + } } ``` @@ -200,10 +200,10 @@ _Restricted to `super_user` roles._ ```json { - "operation": "delete_records_before", - "date": "2024-01-01T00:00:00.000Z", - "schema": "dev", - "table": "breed" + "operation": "delete_records_before", + "date": "2024-01-01T00:00:00.000Z", + "schema": 
"dev", + "table": "breed" } ``` @@ -218,8 +218,8 @@ Returns status, metrics, and messages for a specific job by ID. ```json { - "operation": "get_job", - "id": "4a982782-929a-4507-8794-26dae1132def" + "operation": "get_job", + "id": "4a982782-929a-4507-8794-26dae1132def" } ``` @@ -227,21 +227,21 @@ Response: ```json [ - { - "__createdtime__": 1611615798782, - "__updatedtime__": 1611615801207, - "created_datetime": 1611615798774, - "end_datetime": 1611615801206, - "id": "4a982782-929a-4507-8794-26dae1132def", - "job_body": null, - "message": "successfully loaded 350 of 350 records", - "start_datetime": 1611615798805, - "status": "COMPLETE", - "type": "csv_url_load", - "user": "HDB_ADMIN", - "start_datetime_converted": "2021-01-25T23:03:18.805Z", - "end_datetime_converted": "2021-01-25T23:03:21.206Z" - } + { + "__createdtime__": 1611615798782, + "__updatedtime__": 1611615801207, + "created_datetime": 1611615798774, + "end_datetime": 1611615801206, + "id": "4a982782-929a-4507-8794-26dae1132def", + "job_body": null, + "message": "successfully loaded 350 of 350 records", + "start_datetime": 1611615798805, + "status": "COMPLETE", + "type": "csv_url_load", + "user": "HDB_ADMIN", + "start_datetime_converted": "2021-01-25T23:03:18.805Z", + "end_datetime_converted": "2021-01-25T23:03:21.206Z" + } ] ``` @@ -259,9 +259,9 @@ _Restricted to `super_user` roles._ ```json { - "operation": "search_jobs_by_start_date", - "from_date": "2024-01-01T00:00:00.000+0000", - "to_date": "2024-01-02T00:00:00.000+0000" + "operation": "search_jobs_by_start_date", + "from_date": "2024-01-01T00:00:00.000+0000", + "to_date": "2024-01-02T00:00:00.000+0000" } ``` diff --git a/reference_versioned_docs/version-v4/database/overview.md b/reference_versioned_docs/version-v4/database/overview.md index 93bb6c38..93da9218 100644 --- a/reference_versioned_docs/version-v4/database/overview.md +++ b/reference_versioned_docs/version-v4/database/overview.md @@ -88,6 +88,7 @@ Tables group records with a common 
data pattern. A table must have: - **Primary key** — the unique identifier for each record (also referred to as `hash_attribute` in the Operations API) Primary keys must be unique. If a primary key is not provided on insert, Harper auto-generates one: + - A **UUID string** for primary keys typed as `String` or `ID` - An **auto-incrementing integer** for primary keys typed as `Int`, `Long`, or `Any` diff --git a/reference_versioned_docs/version-v4/database/schema.md b/reference_versioned_docs/version-v4/database/schema.md index 30b6f872..8bb07bb0 100644 --- a/reference_versioned_docs/version-v4/database/schema.md +++ b/reference_versioned_docs/version-v4/database/schema.md @@ -34,15 +34,15 @@ A minimal example: ```graphql type Dog @table { - id: ID @primaryKey - name: String - breed: String - age: Int + id: ID @primaryKey + name: String + breed: String + age: Int } type Breed @table { - id: ID @primaryKey - name: String @indexed + id: ID @primaryKey + name: String @indexed } ``` @@ -65,18 +65,18 @@ Marks a GraphQL type as a Harper database table. 
The type name becomes the table ```graphql type MyTable @table { - id: ID @primaryKey + id: ID @primaryKey } ``` Optional arguments: -| Argument | Type | Default | Description | -|---|---|---|---| -| `table` | `String` | type name | Override the table name | -| `database` | `String` | `"data"` | Database to place the table in | -| `expiration` | `Int` | — | Auto-expire records after this many seconds (useful for caching tables) | -| `audit` | `Boolean` | config default | Enable audit log for this table | +| Argument | Type | Default | Description | +| ------------ | --------- | -------------- | ----------------------------------------------------------------------- | +| `table` | `String` | type name | Override the table name | +| `database` | `String` | `"data"` | Database to place the table in | +| `expiration` | `Int` | — | Auto-expire records after this many seconds (useful for caching tables) | +| `audit` | `Boolean` | config default | Enable audit log for this table | **Database naming:** The default `data` database is a good choice for tables that won't be reused elsewhere. Components designed for reuse should specify a unique database name (e.g., `"my-component-data"`) to avoid naming collisions with other components. @@ -86,7 +86,7 @@ Exposes the table as an externally accessible resource endpoint, available via R ```graphql type MyTable @table @export(name: "my-table") { - id: ID @primaryKey + id: ID @primaryKey } ``` @@ -102,12 +102,13 @@ Designates the attribute as the table's primary key. Primary keys must be unique ```graphql type Product @table { - id: Long @primaryKey - name: String + id: Long @primaryKey + name: String } ``` If no primary key is provided on insert, Harper auto-generates one: + - **UUID string** — when type is `String` or `ID` - **Auto-incrementing integer** — when type is `Int`, `Long`, or `Any` @@ -123,9 +124,9 @@ Creates a secondary index on the attribute for fast querying. 
Required for filte ```graphql type Product @table { - id: ID @primaryKey - category: String @indexed - price: Float @indexed + id: ID @primaryKey + category: String @indexed + price: Float @indexed } ``` @@ -141,8 +142,8 @@ Automatically assigns a creation timestamp (Unix epoch milliseconds) to the attr ```graphql type Event @table { - id: ID @primaryKey - createdAt: Long @createdTime + id: ID @primaryKey + createdAt: Long @createdTime } ``` @@ -152,8 +153,8 @@ Automatically assigns a timestamp (Unix epoch milliseconds) each time the record ```graphql type Event @table { - id: ID @primaryKey - updatedAt: Long @updatedTime + id: ID @primaryKey + updatedAt: Long @updatedTime } ``` @@ -163,8 +164,8 @@ Prevents records from including any properties beyond those explicitly declared ```graphql type StrictRecord @table @sealed { - id: ID @primaryKey - name: String + id: ID @primaryKey + name: String } ``` @@ -180,14 +181,14 @@ The foreign key is in this table, referencing the primary key of the target tabl ```graphql type Product @table @export { - id: ID @primaryKey - brandId: ID @indexed # foreign key - brand: Brand @relationship(from: brandId) # many-to-one + id: ID @primaryKey + brandId: ID @indexed # foreign key + brand: Brand @relationship(from: brandId) # many-to-one } type Brand @table @export { - id: ID @primaryKey - name: String @indexed + id: ID @primaryKey + name: String @indexed } ``` @@ -201,9 +202,9 @@ If the foreign key is an array, this establishes a many-to-many relationship: ```graphql type Product @table @export { - id: ID @primaryKey - featureIds: [ID] @indexed - features: [Feature] @relationship(from: featureIds) + id: ID @primaryKey + featureIds: [ID] @indexed + features: [Feature] @relationship(from: featureIds) } ``` @@ -213,9 +214,9 @@ The foreign key is in the target table, referencing the primary key of this tabl ```graphql type Brand @table @export { - id: ID @primaryKey - name: String @indexed - products: [Product] @relationship(to: brandId) 
# one-to-many + id: ID @primaryKey + name: String @indexed + products: [Product] @relationship(to: brandId) # one-to-many } ``` @@ -231,10 +232,10 @@ The `@computed` directive marks a field as derived from other fields at query ti ```graphql type Product @table { - id: ID @primaryKey - price: Float - taxRate: Float - totalPrice: Float @computed(from: "price + (price * taxRate)") + id: ID @primaryKey + price: Float + taxRate: Float + totalPrice: Float @computed(from: "price + (price * taxRate)") } ``` @@ -244,14 +245,14 @@ Computed properties can also be defined in JavaScript for complex logic: ```graphql type Product @table { - id: ID @primaryKey - totalPrice: Float @computed + id: ID @primaryKey + totalPrice: Float @computed } ``` ```javascript tables.Product.setComputedAttribute('totalPrice', (record) => { - return record.price + record.price * record.taxRate; + return record.price + record.price * record.taxRate; }); ``` @@ -273,8 +274,8 @@ When using a JavaScript function for an indexed computed property, use the `vers ```graphql type Product @table { - id: ID @primaryKey - totalPrice: Float @computed(version: 1) @indexed + id: ID @primaryKey + totalPrice: Float @computed(version: 1) @indexed } ``` @@ -288,8 +289,8 @@ Use `@indexed(type: "HNSW")` to create a vector index using the Hierarchical Nav ```graphql type Document @table { - id: Long @primaryKey - textEmbeddings: [Float] @indexed(type: "HNSW") + id: Long @primaryKey + textEmbeddings: [Float] @indexed(type: "HNSW") } ``` @@ -297,8 +298,8 @@ Query by nearest neighbors using the `sort` parameter: ```javascript let results = Document.search({ - sort: { attribute: 'textEmbeddings', target: searchVector }, - limit: 5, + sort: { attribute: 'textEmbeddings', target: searchVector }, + limit: 5, }); ``` @@ -306,29 +307,29 @@ HNSW can be combined with filter conditions: ```javascript let results = Document.search({ - conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }], - sort: { attribute: 
'textEmbeddings', target: searchVector }, - limit: 5, + conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }], + sort: { attribute: 'textEmbeddings', target: searchVector }, + limit: 5, }); ``` ### HNSW Parameters -| Parameter | Default | Description | -|---|---|---| -| `distance` | `"cosine"` | Distance function: `"euclidean"` or `"cosine"` (negative cosine similarity) | -| `efConstruction` | `100` | Max nodes explored during index construction. Higher = better recall, lower = better performance | -| `M` | `16` | Preferred connections per graph layer. Higher = more space, better recall for high-dimensional data | -| `optimizeRouting` | `0.5` | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) | -| `mL` | computed from `M` | Normalization factor for level generation | -| `efSearchConstruction` | `50` | Max nodes explored during search | +| Parameter | Default | Description | +| ---------------------- | ----------------- | --------------------------------------------------------------------------------------------------- | +| `distance` | `"cosine"` | Distance function: `"euclidean"` or `"cosine"` (negative cosine similarity) | +| `efConstruction` | `100` | Max nodes explored during index construction. Higher = better recall, lower = better performance | +| `M` | `16` | Preferred connections per graph layer. 
Higher = more space, better recall for high-dimensional data | +| `optimizeRouting` | `0.5` | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) | +| `mL` | computed from `M` | Normalization factor for level generation | +| `efSearchConstruction` | `50` | Max nodes explored during search | Example with custom parameters: ```graphql type Document @table { - id: Long @primaryKey - textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100) + id: Long @primaryKey + textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100) } ``` @@ -336,19 +337,19 @@ type Document @table { Harper supports the following field types: -| Type | Description | -|---|---| -| `String` | Unicode text, UTF-8 encoded | -| `Int` | 32-bit signed integer (−2,147,483,648 to 2,147,483,647) | -| `Long` | 54-bit signed integer (−9,007,199,254,740,992 to 9,007,199,254,740,992) | -| `Float` | 64-bit double precision floating point | -| `BigInt` | Integer up to ~300 digits. Note: distinct JavaScript type; handle appropriately in custom code | -| `Boolean` | `true` or `false` | -| `ID` | String; indicates a non-human-readable identifier | -| `Any` | Any primitive, object, or array | -| `Date` | JavaScript `Date` object | -| `Bytes` | Binary data as `Buffer` or `Uint8Array` | -| `Blob` | Binary large object; designed for streaming content >20KB | +| Type | Description | +| --------- | ---------------------------------------------------------------------------------------------- | +| `String` | Unicode text, UTF-8 encoded | +| `Int` | 32-bit signed integer (−2,147,483,648 to 2,147,483,647) | +| `Long` | 54-bit signed integer (−9,007,199,254,740,992 to 9,007,199,254,740,992) | +| `Float` | 64-bit double precision floating point | +| `BigInt` | Integer up to ~300 digits. 
Note: distinct JavaScript type; handle appropriately in custom code | +| `Boolean` | `true` or `false` | +| `ID` | String; indicates a non-human-readable identifier | +| `Any` | Any primitive, object, or array | +| `Date` | JavaScript `Date` object | +| `Bytes` | Binary data as `Buffer` or `Uint8Array` | +| `Blob` | Binary large object; designed for streaming content >20KB | Added in for `BigInt`: v4.3.0 @@ -370,8 +371,8 @@ Declare a blob field: ```graphql type MyTable @table { - id: Any! @primaryKey - data: Blob + id: Any! @primaryKey + data: Blob } ``` @@ -404,14 +405,14 @@ When returning a blob via REST, register an error handler to handle interrupted ```javascript export class MyEndpoint extends MyTable { - async get(target) { - const record = super.get(target); - let blob = record.data; - blob.on('error', () => { - MyTable.invalidate(target); - }); - return { status: 200, headers: {}, body: blob }; - } + async get(target) { + const record = super.get(target); + let blob = record.data; + blob.on('error', () => { + MyTable.invalidate(target); + }); + return { status: 200, headers: {}, body: blob }; + } } ``` diff --git a/reference_versioned_docs/version-v4/database/storage-algorithm.md b/reference_versioned_docs/version-v4/database/storage-algorithm.md index bf83da5d..8833247d 100644 --- a/reference_versioned_docs/version-v4/database/storage-algorithm.md +++ b/reference_versioned_docs/version-v4/database/storage-algorithm.md @@ -31,6 +31,7 @@ Changed in: v4.3.0 — Storage performance improvements including better free-sp All top-level attributes are automatically indexed immediately upon ingestion. For [dynamic schema tables](./overview.md#dynamic-vs-defined-schemas), Harper reflexively creates the attribute and its index as new data arrives. For [schema-defined tables](./schema.md), indexes are created for all attributes marked with `@indexed`. Indexes are type-agnostic, ordering values as follows: + 1. Booleans 2. Numbers (ordered numerically) 3. 
Strings (ordered lexically) diff --git a/reference_versioned_docs/version-v4/database/system-tables.md b/reference_versioned_docs/version-v4/database/system-tables.md index 3ac18748..6c47a461 100644 --- a/reference_versioned_docs/version-v4/database/system-tables.md +++ b/reference_versioned_docs/version-v4/database/system-tables.md @@ -26,14 +26,16 @@ Query with `search_by_conditions` (requires `superuser` permission): ```json { - "operation": "search_by_conditions", - "schema": "system", - "table": "hdb_raw_analytics", - "conditions": [{ - "search_attribute": "id", - "search_type": "between", - "search_value": [1688594000000, 1688594010000] - }] + "operation": "search_by_conditions", + "schema": "system", + "table": "hdb_raw_analytics", + "conditions": [ + { + "search_attribute": "id", + "search_type": "between", + "search_value": [1688594000000, 1688594010000] + } + ] } ``` @@ -41,38 +43,38 @@ A typical record: ```json { - "time": 1688594390708, - "period": 1000.8336279988289, - "metrics": [ - { - "metric": "bytes-sent", - "path": "search_by_conditions", - "type": "operation", - "median": 202, - "mean": 202, - "p95": 202, - "p90": 202, - "count": 1 - }, - { - "metric": "memory", - "threadId": 2, - "rss": 1492664320, - "heapTotal": 124596224, - "heapUsed": 119563120, - "external": 3469790, - "arrayBuffers": 798721 - }, - { - "metric": "utilization", - "idle": 138227.52767700003, - "active": 70.5066209952347, - "utilization": 0.0005098165086230495 - } - ], - "threadId": 2, - "totalBytesProcessed": 12182820, - "id": 1688594390708.6853 + "time": 1688594390708, + "period": 1000.8336279988289, + "metrics": [ + { + "metric": "bytes-sent", + "path": "search_by_conditions", + "type": "operation", + "median": 202, + "mean": 202, + "p95": 202, + "p90": 202, + "count": 1 + }, + { + "metric": "memory", + "threadId": 2, + "rss": 1492664320, + "heapTotal": 124596224, + "heapUsed": 119563120, + "external": 3469790, + "arrayBuffers": 798721 + }, + { + "metric": "utilization", + 
"idle": 138227.52767700003, + "active": 70.5066209952347, + "utilization": 0.0005098165086230495 + } + ], + "threadId": 2, + "totalBytesProcessed": 12182820, + "id": 1688594390708.6853 } ``` @@ -82,14 +84,16 @@ Stores per-minute aggregate analytics. Once per minute, Harper aggregates all pe ```json { - "operation": "search_by_conditions", - "schema": "system", - "table": "hdb_analytics", - "conditions": [{ - "search_attribute": "id", - "search_type": "between", - "search_value": [1688194100000, 1688594990000] - }] + "operation": "search_by_conditions", + "schema": "system", + "table": "hdb_analytics", + "conditions": [ + { + "search_attribute": "id", + "search_type": "between", + "search_value": [1688194100000, 1688594990000] + } + ] } ``` @@ -97,17 +101,17 @@ A typical aggregate record: ```json { - "period": 60000, - "metric": "bytes-sent", - "method": "connack", - "type": "mqtt", - "median": 4, - "mean": 4, - "p95": 4, - "p90": 4, - "count": 1, - "id": 1688589569646, - "time": 1688589569646 + "period": 60000, + "metric": "bytes-sent", + "method": "connack", + "type": "mqtt", + "median": 4, + "mean": 4, + "p95": 4, + "p90": 4, + "count": 1, + "id": 1688589569646, + "time": 1688589569646 } ``` @@ -133,10 +137,10 @@ Can be queried to inspect the current replication topology: ```json { - "operation": "search_by_hash", - "schema": "system", - "table": "hdb_nodes", - "hash_values": ["node-id"] + "operation": "search_by_hash", + "schema": "system", + "table": "hdb_nodes", + "hash_values": ["node-id"] } ``` diff --git a/reference_versioned_docs/version-v4/database/transaction.md b/reference_versioned_docs/version-v4/database/transaction.md index a0537ced..78cadd9a 100644 --- a/reference_versioned_docs/version-v4/database/transaction.md +++ b/reference_versioned_docs/version-v4/database/transaction.md @@ -13,15 +13,15 @@ title: Transaction Logging Harper provides two complementary mechanisms for recording a history of data changes on a table: the **audit log** and the 
**transaction log**. Both are available at the table level and serve different use cases. -| Feature | Audit Log | Transaction Log | -|---|---|---| -| Storage | Standard Harper table (per-table) | Clustering streams (per-table) | -| Requires clustering | No | Yes | -| Available since | v4.1.0 | v4.1.0 | -| Stores original record values | Yes | No | -| Query by username | Yes | No | -| Query by primary key | Yes | No | -| Used for real-time messaging | Yes (required) | No | +| Feature | Audit Log | Transaction Log | +| ----------------------------- | --------------------------------- | ------------------------------ | +| Storage | Standard Harper table (per-table) | Clustering streams (per-table) | +| Requires clustering | No | Yes | +| Available since | v4.1.0 | v4.1.0 | +| Stores original record values | Yes | No | +| Query by username | Yes | No | +| Query by primary key | Yes | No | +| Used for real-time messaging | Yes (required) | No | ## Audit Log @@ -43,31 +43,31 @@ Queries the audit log for a specific table. 
Supports filtering by timestamp, use ```json { - "operation": "read_audit_log", - "schema": "dev", - "table": "dog", - "search_type": "timestamp", - "search_values": [1660585740558] + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "timestamp", + "search_values": [1660585740558] } ``` Timestamp behavior: -| `search_values` | Result | -|---|---| -| `[]` | All records for the table | -| `[timestamp]` | All records after the provided timestamp | -| `[from, to]` | Records between the two timestamps | +| `search_values` | Result | +| --------------- | ---------------------------------------- | +| `[]` | All records for the table | +| `[timestamp]` | All records after the provided timestamp | +| `[from, to]` | Records between the two timestamps | **By username:** ```json { - "operation": "read_audit_log", - "schema": "dev", - "table": "dog", - "search_type": "username", - "search_values": ["admin"] + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "username", + "search_values": ["admin"] } ``` @@ -75,11 +75,11 @@ Timestamp behavior: ```json { - "operation": "read_audit_log", - "schema": "dev", - "table": "dog", - "search_type": "hash_value", - "search_values": [318] + "operation": "read_audit_log", + "schema": "dev", + "table": "dog", + "search_type": "hash_value", + "search_values": [318] } ``` @@ -87,28 +87,28 @@ Timestamp behavior: ```json { - "operation": "update", - "user_name": "HDB_ADMIN", - "timestamp": 1607035559122.277, - "hash_values": [1, 2], - "records": [ - { - "id": 1, - "breed": "Muttzilla", - "age": 6, - "__updatedtime__": 1607035559122 - } - ], - "original_records": [ - { - "__createdtime__": 1607035556801, - "__updatedtime__": 1607035556801, - "age": 5, - "breed": "Mutt", - "id": 1, - "name": "Harper" - } - ] + "operation": "update", + "user_name": "HDB_ADMIN", + "timestamp": 1607035559122.277, + "hash_values": [1, 2], + "records": [ + { + "id": 1, + "breed": "Muttzilla", + 
"age": 6, + "__updatedtime__": 1607035559122 + } + ], + "original_records": [ + { + "__createdtime__": 1607035556801, + "__updatedtime__": 1607035556801, + "age": 5, + "breed": "Mutt", + "id": 1, + "name": "Harper" + } + ] } ``` @@ -124,10 +124,10 @@ Changed in: v4.5.0 — Storage reclamation: Harper automatically evicts older au ```json { - "operation": "delete_audit_logs_before", - "schema": "dev", - "table": "dog", - "timestamp": 1598290282817 + "operation": "delete_audit_logs_before", + "schema": "dev", + "table": "dog", + "timestamp": 1598290282817 } ``` @@ -151,12 +151,12 @@ Returns a prescribed set of transaction records based on a time range and option ```json { - "operation": "read_transaction_log", - "schema": "dev", - "table": "dog", - "from": 1598290235769, - "to": 1660249020865, - "limit": 2 + "operation": "read_transaction_log", + "schema": "dev", + "table": "dog", + "from": 1598290235769, + "to": 1660249020865, + "limit": 2 } ``` @@ -164,32 +164,32 @@ Returns a prescribed set of transaction records based on a time range and option ```json [ - { - "operation": "insert", - "user": "admin", - "timestamp": 1660165619736, - "records": [ - { - "id": 1, - "dog_name": "Penny", - "owner_name": "Kyle", - "__updatedtime__": 1660165619688, - "__createdtime__": 1660165619688 - } - ] - }, - { - "operation": "update", - "user": "admin", - "timestamp": 1660165620040, - "records": [ - { - "id": 1, - "dog_name": "Penny B", - "__updatedtime__": 1660165620036 - } - ] - } + { + "operation": "insert", + "user": "admin", + "timestamp": 1660165619736, + "records": [ + { + "id": 1, + "dog_name": "Penny", + "owner_name": "Kyle", + "__updatedtime__": 1660165619688, + "__createdtime__": 1660165619688 + } + ] + }, + { + "operation": "update", + "user": "admin", + "timestamp": 1660165620040, + "records": [ + { + "id": 1, + "dog_name": "Penny B", + "__updatedtime__": 1660165620036 + } + ] + } ] ``` @@ -201,10 +201,10 @@ Deletes transaction log entries older than the specified 
timestamp. ```json { - "operation": "delete_transaction_logs_before", - "schema": "dev", - "table": "dog", - "timestamp": 1598290282817 + "operation": "delete_transaction_logs_before", + "schema": "dev", + "table": "dog", + "timestamp": 1598290282817 } ``` @@ -216,8 +216,8 @@ You can enable or disable the audit log for individual tables using the `@table` ```graphql type Dog @table(audit: true) { - id: ID @primaryKey - name: String + id: ID @primaryKey + name: String } ``` From 18384d469884ceb89fb0374de0c8afb06fb7c6cc Mon Sep 17 00:00:00 2001 From: Ethan Arrowood Date: Thu, 19 Mar 2026 13:28:00 -0600 Subject: [PATCH 3/5] fixup! fixup! docs: migrate Database section to v4 consolidated reference --- .../version-v4/database/schema.md | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/reference_versioned_docs/version-v4/database/schema.md b/reference_versioned_docs/version-v4/database/schema.md index 8bb07bb0..6ac3e9b8 100644 --- a/reference_versioned_docs/version-v4/database/schema.md +++ b/reference_versioned_docs/version-v4/database/schema.md @@ -78,6 +78,38 @@ Optional arguments: | `expiration` | `Int` | — | Auto-expire records after this many seconds (useful for caching tables) | | `audit` | `Boolean` | config default | Enable audit log for this table | +**Examples:** + +```graphql +# Override table name +type Product @table(table: "products") { + id: ID @primaryKey +} + +# Place in a specific database +type Order @table(database: "commerce") { + id: ID @primaryKey +} + +# Auto-expire records after 1 hour (e.g., a session cache) +type Session @table(expiration: 3600) { + id: ID @primaryKey + userId: String +} + +# Enable audit log for this table explicitly +type AuditedRecord @table(audit: true) { + id: ID @primaryKey + value: String +} + +# Combine multiple arguments +type Event @table(database: "analytics", expiration: 86400) { + id: Long @primaryKey + name: String @indexed +} +``` + **Database naming:** The default `data` database is a good 
choice for tables that won't be reused elsewhere. Components designed for reuse should specify a unique database name (e.g., `"my-component-data"`) to avoid naming collisions with other components. ### `@export` From 827aa31e9d0490565bfd22bade8bddb5f94b7356 Mon Sep 17 00:00:00 2001 From: Ethan Arrowood Date: Thu, 19 Mar 2026 13:31:27 -0600 Subject: [PATCH 4/5] fixup! fixup! docs: migrate Database section to v4 consolidated reference --- .../version-v4/database/schema.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/reference_versioned_docs/version-v4/database/schema.md b/reference_versioned_docs/version-v4/database/schema.md index 6ac3e9b8..046a35cc 100644 --- a/reference_versioned_docs/version-v4/database/schema.md +++ b/reference_versioned_docs/version-v4/database/schema.md @@ -124,6 +124,17 @@ type MyTable @table @export(name: "my-table") { The optional `name` parameter specifies the URL path segment (e.g., `/my-table/`). Without `name`, the type name is used. +### `@sealed` + +Prevents records from including any properties beyond those explicitly declared in the type. By default, Harper allows records to have additional properties. + +```graphql +type StrictRecord @table @sealed { + id: ID @primaryKey + name: String +} +``` + ## Field Directives Field directives apply to individual attributes in a type definition. @@ -190,17 +201,6 @@ type Event @table { } ``` -### `@sealed` - -Prevents records from including any properties beyond those explicitly declared in the type. By default, Harper allows records to have additional properties. 
- -```graphql -type StrictRecord @table @sealed { - id: ID @primaryKey - name: String -} -``` - ## Relationships Added in: v4.3.0 From b21f50f06d16eeab2786e1c26ff2785a7c0bf331 Mon Sep 17 00:00:00 2001 From: Ethan Arrowood Date: Thu, 19 Mar 2026 15:12:24 -0600 Subject: [PATCH 5/5] manual review edits --- .../version-v4/cli/commands.md | 4 +- .../version-v4/cli/operations-api-commands.md | 2 +- .../version-v4/database/compaction.md | 4 +- .../version-v4/database/data-loader.md | 85 ++++++++++++++++--- .../version-v4/database/jobs.md | 2 +- .../version-v4/database/overview.md | 2 +- .../version-v4/database/schema.md | 70 ++++++++------- .../version-v4/database/storage-algorithm.md | 51 +++++++++-- .../version-v4/database/system-tables.md | 4 +- .../version-v4/database/transaction.md | 8 +- .../version-v4/logging/configuration.md | 4 +- .../version-v4/logging/operations.md | 4 +- .../version-v4/logging/overview.md | 4 +- .../version-v4/mqtt/overview.md | 4 +- .../version-v4/rest/overview.md | 4 +- .../version-v4/rest/querying.md | 4 +- 16 files changed, 181 insertions(+), 75 deletions(-) diff --git a/reference_versioned_docs/version-v4/cli/commands.md b/reference_versioned_docs/version-v4/cli/commands.md index 2c2e9112..7143f0a2 100644 --- a/reference_versioned_docs/version-v4/cli/commands.md +++ b/reference_versioned_docs/version-v4/cli/commands.md @@ -230,7 +230,7 @@ This copies the default `data` database to a new location with compaction applie - Creating compacted backups - Reclaiming free space -See also: [Database Compaction](TODO:reference_versioned_docs/version-v4/database/compaction.md 'Database compaction reference') for more information. +See also: [Database Compaction](../database/compaction.md) for more information. #### How Backups Work @@ -266,4 +266,4 @@ The CLI supports executing commands on remote Harper instances. 
For details, see - [Operations API Commands](./operations-api-commands.md) - Operations available through CLI - [CLI Authentication](./authentication.md) - Authentication mechanisms - [Configuration](TODO:reference_versioned_docs/version-v4/configuration/overview.md 'Configuration') - Configuration parameters for installation -- [Database Compaction](TODO:reference_versioned_docs/version-v4/database/compaction.md 'Compaction') - More on database compaction +- [Database Compaction](../database/compaction.md) - More on database compaction diff --git a/reference_versioned_docs/version-v4/cli/operations-api-commands.md b/reference_versioned_docs/version-v4/cli/operations-api-commands.md index 9fa7c540..98bf0446 100644 --- a/reference_versioned_docs/version-v4/cli/operations-api-commands.md +++ b/reference_versioned_docs/version-v4/cli/operations-api-commands.md @@ -153,7 +153,7 @@ last_updated_record: 1724483231970.9949 ``` :::tip -For detailed information on database and table structures, see the [Database Reference](TODO:reference_versioned_docs/version-v4/database/overview.md 'Database reference documentation'). +For detailed information on database and table structures, see the [Database Reference](../database/overview.md). ::: ### Data Operations diff --git a/reference_versioned_docs/version-v4/database/compaction.md b/reference_versioned_docs/version-v4/database/compaction.md index 4557fa4b..152a9ab4 100644 --- a/reference_versioned_docs/version-v4/database/compaction.md +++ b/reference_versioned_docs/version-v4/database/compaction.md @@ -21,7 +21,7 @@ Creates a compacted copy of a database file. The original database is left uncha > **Recommendation:** Stop Harper before performing copy compaction to prevent any record loss during the copy operation. 
-Run using the [CLI](TODO:reference_versioned_docs/version-v4/cli/commands.md): +Run using the [CLI](../cli/commands.md): ```bash harperdb copy-db @@ -67,5 +67,5 @@ STORAGE_COMPACTONSTART=true STORAGE_COMPACTONSTARTKEEPBACKUP=true harperdb ## Related Documentation - [Storage Algorithm](./storage-algorithm.md) — How Harper stores data using LMDB -- [CLI Commands](TODO:reference_versioned_docs/version-v4/cli/commands.md) — `copy-db` CLI command reference +- [CLI Commands](../cli/commands.md) — `copy-db` CLI command reference - [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'storage section') — Full storage configuration options including compression settings diff --git a/reference_versioned_docs/version-v4/database/data-loader.md b/reference_versioned_docs/version-v4/database/data-loader.md index 1b3d1d39..69745753 100644 --- a/reference_versioned_docs/version-v4/database/data-loader.md +++ b/reference_versioned_docs/version-v4/database/data-loader.md @@ -88,11 +88,19 @@ dataLoader: ## Loading Behavior -When Harper starts a component with `dataLoader` configured: +The Data Loader runs on every full system start and every component deployment — this includes fresh installs, restarts of the Harper process, and redeployments of the component. It does **not** re-run on individual thread restarts within a running Harper process. -1. All specified data files are read +Because the Data Loader runs on every startup and deployment, change detection is central to how it works safely. On each run: + +1. All specified data files are read (JSON or YAML) 2. Each file is validated to reference a single table -3. Records are inserted or updated using content hash comparison (SHA-256 hashes stored in the `hdb_dataloader_hash` system table) +3. 
Records are inserted or updated based on content hash comparison: + - New records are inserted if they don't exist + - Existing records are updated only if the data file content has changed + - Records created outside the Data Loader (via Operations API, REST, etc.) are never overwritten + - Records modified by users after being loaded are preserved and not overwritten + - Extra fields added by users to data-loaded records are preserved during updates +4. SHA-256 content hashes are stored in the [`hdb_dataloader_hash`](./system-tables.md#hdb_dataloader_hash) system table to track which records have been loaded and detect changes ### Change Detection @@ -105,7 +113,7 @@ When Harper starts a component with `dataLoader` configured: | Record modified by user after load | Preserved, not overwritten | | Extra fields added by user to a data-loaded record | Preserved during updates | -This design makes data files safe to redeploy without losing manual modifications. +This design makes data files safe to redeploy repeatedly — across deployments, node scaling, and system restarts — without losing manual modifications or causing unnecessary writes. ## Best Practices @@ -113,27 +121,33 @@ This design makes data files safe to redeploy without losing manual modification **One table per file.** Each data file must target a single table. Organize files accordingly. -**Idempotent data.** Design files to be safe to load multiple times without creating conflicts. +**Idempotent data.** Design files to be safe to load multiple times without creating duplicate or conflicting records. **Version control.** Include data files in version control for consistency across deployments and environments. +**Environment-specific data.** Consider using different data files for different environments (development, staging, production) to avoid loading inappropriate records. 
+ +**Validate before deploying.** Ensure data files are valid JSON or YAML and match your table schemas before deployment to catch type mismatches early. + **No sensitive data.** Do not include passwords, API keys, or secrets directly in data files. Use environment variables or secure configuration management instead. ## Example Component Structure +A common production use case is shipping reference data — lookup tables like countries and regions — as part of a component. The records are version-controlled alongside the code, consistent across every environment, and the data loader keeps them in sync on every deployment without touching any user-modified fields. + ``` my-component/ ├── config.yaml -├── data/ -│ ├── users.json -│ ├── roles.json -│ └── settings.json ├── schemas.graphql -└── roles.yaml +├── roles.yaml +└── data/ + ├── countries.json # ISO country codes — reference data, ships with component + └── regions.json # region/subdivision codes ``` +**`config.yaml`**: + ```yaml -# config.yaml graphqlSchema: files: 'schemas.graphql' @@ -146,6 +160,55 @@ dataLoader: rest: true ``` +**`schemas.graphql`**: + +```graphql +type Country @table(database: "myapp") @export { + id: ID @primaryKey # ISO 3166-1 alpha-2, e.g. "US" + name: String @indexed + region: String @indexed +} + +type Region @table(database: "myapp") @export { + id: ID @primaryKey # ISO 3166-2, e.g. "US-CA" + name: String @indexed + countryId: ID @indexed + country: Country @relationship(from: countryId) +} +``` + +**`data/countries.json`**: + +```json +{ + "database": "myapp", + "table": "Country", + "records": [ + { "id": "US", "name": "United States", "region": "Americas" }, + { "id": "GB", "name": "United Kingdom", "region": "Europe" }, + { "id": "DE", "name": "Germany", "region": "Europe" } + // ... 
all ~250 ISO countries + ] +} +``` + +**`data/regions.json`**: + +```json +{ + "database": "myapp", + "table": "Region", + "records": [ + { "id": "US-CA", "name": "California", "countryId": "US" }, + { "id": "US-NY", "name": "New York", "countryId": "US" }, + { "id": "GB-ENG", "name": "England", "countryId": "GB" } + // ... + ] +} +``` + +Because the data loader uses content hashing, adding new countries or correcting a name in the file will update only the changed records on the next deployment — existing records that haven't changed are skipped entirely. + ## Related Documentation - [Schema](./schema.md) — Defining table structure before loading data diff --git a/reference_versioned_docs/version-v4/database/jobs.md b/reference_versioned_docs/version-v4/database/jobs.md index d3417221..5931746c 100644 --- a/reference_versioned_docs/version-v4/database/jobs.md +++ b/reference_versioned_docs/version-v4/database/jobs.md @@ -17,7 +17,7 @@ Job status values: ## Bulk Operations -The following operations create jobs. All bulk operations are sent to the Operations API. +The following operations create jobs. All bulk operations are sent to the [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md). 
### CSV Data Load diff --git a/reference_versioned_docs/version-v4/database/overview.md b/reference_versioned_docs/version-v4/database/overview.md index 93da9218..d86f9744 100644 --- a/reference_versioned_docs/version-v4/database/overview.md +++ b/reference_versioned_docs/version-v4/database/overview.md @@ -118,7 +118,7 @@ For deeper coverage of each database feature, see the dedicated pages in this se ## Related Documentation -- [REST](TODO:reference_versioned_docs/version-v4/rest/overview.md) — HTTP interface built on top of the database resource system +- [REST](../rest/overview.md) — HTTP interface built on top of the database resource system - [Resources](TODO:reference_versioned_docs/version-v4/resources/overview.md) — Custom application logic extending database tables - [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Direct database management operations (create/drop databases and tables, insert/update/delete records) - [Configuration](TODO:reference_versioned_docs/version-v4/configuration/overview.md) — Storage configuration options (compression, blob paths, compaction) diff --git a/reference_versioned_docs/version-v4/database/schema.md b/reference_versioned_docs/version-v4/database/schema.md index 046a35cc..b8d74876 100644 --- a/reference_versioned_docs/version-v4/database/schema.md +++ b/reference_versioned_docs/version-v4/database/schema.md @@ -55,6 +55,8 @@ graphqlSchema: files: 'schema.graphql' ``` +Keep in mind that both plugins and applications can specify schemas. + ## Type Directives Type directives apply to the entire table type definition. 
@@ -83,30 +85,30 @@ Optional arguments: ```graphql # Override table name type Product @table(table: "products") { - id: ID @primaryKey + id: ID @primaryKey } # Place in a specific database type Order @table(database: "commerce") { - id: ID @primaryKey + id: ID @primaryKey } # Auto-expire records after 1 hour (e.g., a session cache) type Session @table(expiration: 3600) { - id: ID @primaryKey - userId: String + id: ID @primaryKey + userId: String } # Enable audit log for this table explicitly type AuditedRecord @table(audit: true) { - id: ID @primaryKey - value: String + id: ID @primaryKey + value: String } # Combine multiple arguments type Event @table(database: "analytics", expiration: 86400) { - id: Long @primaryKey - name: String @indexed + id: Long @primaryKey + name: String @indexed } ``` @@ -212,31 +214,32 @@ The `@relationship` directive defines how one table relates to another through a The foreign key is in this table, referencing the primary key of the target table. ```graphql -type Product @table @export { +type RealityShow @table @export { id: ID @primaryKey - brandId: ID @indexed # foreign key - brand: Brand @relationship(from: brandId) # many-to-one + networkId: ID @indexed # foreign key + network: Network @relationship(from: networkId) # many-to-one + title: String @indexed } -type Brand @table @export { +type Network @table @export { id: ID @primaryKey - name: String @indexed + name: String @indexed # e.g. 
"Bravo", "Peacock", "Netflix" } ``` -Query products by brand name: +Query shows by network name: ```http -GET /Product?brand.name=Microsoft +GET /RealityShow?network.name=Bravo ``` -If the foreign key is an array, this establishes a many-to-many relationship: +If the foreign key is an array, this establishes a many-to-many relationship (e.g., a show with multiple streaming homes): ```graphql -type Product @table @export { +type RealityShow @table @export { id: ID @primaryKey - featureIds: [ID] @indexed - features: [Feature] @relationship(from: featureIds) + networkIds: [ID] @indexed + networks: [Network] @relationship(from: networkIds) } ``` @@ -245,10 +248,11 @@ type Product @table @export { The foreign key is in the target table, referencing the primary key of this table. The result type must be an array. ```graphql -type Brand @table @export { +type Network @table @export { id: ID @primaryKey - name: String @indexed - products: [Product] @relationship(to: brandId) # one-to-many + name: String @indexed # e.g. "Bravo", "Peacock", "Netflix" + shows: [RealityShow] @relationship(to: networkId) # one-to-many + # shows like "Real Housewives of Atlanta", "The Traitors", "Vanderpump Rules" } ``` @@ -383,9 +387,9 @@ Harper supports the following field types: | `Bytes` | Binary data as `Buffer` or `Uint8Array` | | `Blob` | Binary large object; designed for streaming content >20KB | -Added in for `BigInt`: v4.3.0 +Added `BigInt` in v4.3.0 -Added in for `Blob`: v4.5.0 +Added `Blob` in v4.5.0 Arrays of a type are expressed with `[Type]` syntax (e.g., `[Float]` for a vector). @@ -393,7 +397,7 @@ Arrays of a type are expressed with `[Type]` syntax (e.g., `[Float]` for a vecto Added in: v4.5.0 -`Blob` fields are designed for large binary content. Unlike `Bytes`, blobs are stored separately from the record, support streaming, and do not need to be held entirely in memory. Use `Blob` for content typically larger than 20KB (images, video, audio, large HTML, etc.). 
+`Blob` fields are designed for large binary content. Harper's `Blob` type implements the [Web API `Blob` interface](https://developer.mozilla.org/en-US/docs/Web/API/Blob), so all standard `Blob` methods (`.text()`, `.bytes()`, `.arrayBuffer()`, `.stream()`, `.slice()`) are available. Unlike `Bytes`, blobs are stored separately from the record, support streaming, and do not need to be held entirely in memory. Use `Blob` for content typically larger than 20KB (images, video, audio, large HTML, etc.).

 See [Blob usage details](#blob-usage) below.

@@ -415,13 +419,13 @@ let blob = createBlob(largeBuffer);
 await MyTable.put({ id: 'my-record', data: blob });
 ```

-Retrieve blob data:
+Retrieve blob data using standard Web API `Blob` methods:

 ```javascript
 let record = await MyTable.get('my-record');
-let buffer = await record.data.bytes();
-// or stream it:
-let stream = record.data.stream();
+let bytes = await record.data.bytes(); // Uint8Array
+let text = await record.data.text(); // string
+let stream = record.data.stream(); // ReadableStream
 ```

 Blobs support asynchronous streaming, meaning a record can reference a blob before it is fully written to storage. Use `saveBeforeCommit: true` to wait for full write before committing:
@@ -462,22 +466,22 @@ Use `create_attribute` and `drop_attribute` operations to manually manage attrib

 ## OpenAPI Specification

-Tables exported with `@export` are described in a default endpoint:
+Tables exported with `@export` are described via the Operations API server (default port 9925), which is separate from the main HTTP server where REST, MQTT, and WebSocket services run:

 ```http
-GET /openapi
+GET http://localhost:9925/openapi
 ```

 This provides an OpenAPI 3.x description of all exported resource endpoints. The endpoint is a starting guide and may not cover every edge case.

 ## Renaming Tables

-> Harper does not support renaming tables.
Changing a type name in a schema definition creates a new, empty table — the original table and its data are unaffected. +Harper does **not** support renaming tables. Changing a type name in a schema definition creates a new, empty table — the original table and its data are unaffected. ## Related Documentation - [Data Loader](./data-loader.md) — Seed tables with initial data alongside schema deployment -- [REST Querying](TODO:reference_versioned_docs/version-v4/rest/querying.md) — Querying tables via HTTP using schema-defined attributes and relationships +- [REST Querying](../rest/querying.md) — Querying tables via HTTP using schema-defined attributes and relationships - [Resources](TODO:reference_versioned_docs/version-v4/resources/resource-api.md) — Extending table behavior with custom application logic - [Storage Algorithm](./storage-algorithm.md) — How Harper indexes and stores schema-defined data - [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'graphqlSchema component and storage options') — Component configuration for schemas diff --git a/reference_versioned_docs/version-v4/database/storage-algorithm.md b/reference_versioned_docs/version-v4/database/storage-algorithm.md index 8833247d..9acd59ca 100644 --- a/reference_versioned_docs/version-v4/database/storage-algorithm.md +++ b/reference_versioned_docs/version-v4/database/storage-algorithm.md @@ -58,12 +58,51 @@ Harper inherits the following performance properties from LMDB: ## Indexing Example -The diagram below illustrates how a single table's data and attribute indexes are laid out within Harper's LMDB-based storage: - - - - -![Storage Algorithm Diagram](TODO:IMAGE) +Given a table with records like this: + +``` +┌────┬────────┬────────┐ +│ id │ field1 │ field2 │ +├────┼────────┼────────┤ +│ 1 │ A │ X │ +│ 2 │ 25 │ X │ +│ 3 │ -1 │ Y │ +│ 4 │ A │ │ +│ 5 │ true │ 2 │ +└────┴────────┴────────┘ +``` + +Harper maintains three separate LMDB sub-databases for that table: + +``` 
+Table (LMDB environment file)
+│
+├── primary index: id
+│   ┌─────┬──────────────────────────────────────┐
+│   │ Key │ Value (full record)                  │
+│   ├─────┼──────────────────────────────────────┤
+│   │ 1   │ { id:1, field1:"A", field2:"X" }     │
+│   │ 2   │ { id:2, field1:25, field2:"X" }      │
+│   │ 3   │ { id:3, field1:-1, field2:"Y" }      │
+│   │ 4   │ { id:4, field1:"A" }                 │
+│   │ 5   │ { id:5, field1:true, field2:2 }      │
+│   └─────┴──────────────────────────────────────┘
+│
+├── secondary index: field1        secondary index: field2
+│   ┌────────┬───────┐            ┌────────┬───────┐
+│   │ Key    │ Value │            │ Key    │ Value │
+│   ├────────┼───────┤            ├────────┼───────┤
+│   │ true   │ 5     │            │ 2      │ 5     │
+│   │ -1     │ 3     │            │ X      │ 1     │
+│   │ 25     │ 2     │            │ X      │ 2     │
+│   │ A      │ 1     │            │ Y      │ 3     │
+│   │ A      │ 4     │            └────────┴───────┘
+│   └────────┴───────┘
+```
+
+Secondary indexes store the attribute value as the key and the record's primary key (`id`) as the value. To resolve a query result, Harper looks up the matching ids in the secondary index, then fetches the full records from the primary index.
+
+Indexes are ordered — booleans first, then numbers (numerically), then strings (lexically) — enabling efficient range queries across all types.

## Related Documentation

diff --git a/reference_versioned_docs/version-v4/database/system-tables.md b/reference_versioned_docs/version-v4/database/system-tables.md
index 6c47a461..6b457e9f 100644
--- a/reference_versioned_docs/version-v4/database/system-tables.md
+++ b/reference_versioned_docs/version-v4/database/system-tables.md
@@ -115,7 +115,7 @@ A typical aggregate record:
 }
 ```

-For a full reference of available metrics and their fields, see [Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md 'Complete analytics metrics reference').
+For a full reference of available metrics and their fields, see [Analytics](../analytics/overview.md 'Complete analytics metrics reference').

 ## Data Loader Table

@@ -152,7 +152,7 @@ Stores TLS certificates used in replication.
Can be queried to inspect the certi ## Related Documentation -- [Analytics](TODO:reference_versioned_docs/version-v4/analytics/overview.md) — Full reference for analytics metrics tracked in `hdb_analytics` and `hdb_raw_analytics` +- [Analytics](../analytics/overview.md) — Full reference for analytics metrics tracked in `hdb_analytics` and `hdb_raw_analytics` - [Data Loader](./data-loader.md) — Component that writes to `hdb_dataloader_hash` - [Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md) — Clustering and replication system that uses `hdb_nodes` and `hdb_certificate` - [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Querying system tables using `search_by_conditions` diff --git a/reference_versioned_docs/version-v4/database/transaction.md b/reference_versioned_docs/version-v4/database/transaction.md index 78cadd9a..ff10d684 100644 --- a/reference_versioned_docs/version-v4/database/transaction.md +++ b/reference_versioned_docs/version-v4/database/transaction.md @@ -29,7 +29,7 @@ Available since: v4.1.0 The audit log uses a standard Harper table to track every transaction against a user table. For each user table, Harper automatically creates and maintains a corresponding audit log table. The audit log captures the operation type, the user who made the change, the timestamp, and both the new and original record values. -The audit log is **enabled by default**. To disable it, set `logging.auditLog` to `false` in `harperdb-config.yaml` and restart Harper. +The audit log is **enabled by default**. To disable it, set [`logging.auditLog`](../logging/configuration.md) to `false` in `harperdb-config.yaml` and restart Harper. > The audit log is required for real-time messaging (WebSocket and MQTT subscriptions). Do not disable it if real-time features are in use. @@ -221,11 +221,11 @@ type Dog @table(audit: true) { } ``` -This overrides the `logging.auditLog` global configuration for that specific table. 
+This overrides the [`logging.auditLog`](../logging/configuration.md) global configuration for that specific table. ## Related Documentation -- [Logging](TODO:reference_versioned_docs/version-v4/logging/overview.md) — Application and system logging (separate from transaction/audit logging) +- [Logging](../logging/overview.md) — Application and system logging (separate from transaction/audit logging) - [Replication](TODO:reference_versioned_docs/version-v4/replication/overview.md) — Clustering setup required for transaction logs -- [Configuration](TODO:reference_versioned_docs/version-v4/configuration/options.md 'logging.auditLog option') — Global audit log configuration +- [Logging Configuration](../logging/configuration.md) — Global audit log configuration (`logging.auditLog`) - [Operations API](TODO:reference_versioned_docs/version-v4/operations-api/overview.md) — Sending operations to Harper diff --git a/reference_versioned_docs/version-v4/logging/configuration.md b/reference_versioned_docs/version-v4/logging/configuration.md index 76b4396d..659296fd 100644 --- a/reference_versioned_docs/version-v4/logging/configuration.md +++ b/reference_versioned_docs/version-v4/logging/configuration.md @@ -102,7 +102,7 @@ Default: `false` Enables audit (table transaction) logging. When enabled, Harper records every insert, update, and delete to a corresponding audit table. Audit log data is accessed via the `read_audit_log` operation. -See [Database / Transaction Logging](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit and transaction logging') for details on using audit logs. +See [Database / Transaction Logging](../database/transaction.md) for details on using audit logs. 
```yaml logging: @@ -366,5 +366,5 @@ http: - [Logging Overview](./overview) - [Logging API](./api) - [Logging Operations](./operations) -- [Database / Transaction Logging](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit and transaction logging') +- [Database / Transaction Logging](../database/transaction.md) - [Configuration Overview](TODO:reference_versioned_docs/version-v4/configuration/overview.md 'Full harperdb-config.yaml reference') diff --git a/reference_versioned_docs/version-v4/logging/operations.md b/reference_versioned_docs/version-v4/logging/operations.md index ab288b88..b4dbab5f 100644 --- a/reference_versioned_docs/version-v4/logging/operations.md +++ b/reference_versioned_docs/version-v4/logging/operations.md @@ -8,7 +8,7 @@ title: Logging Operations Operations for reading the standard Harper log (`hdb.log`). All operations are restricted to `super_user` roles only. -> Audit log and transaction log operations (`read_audit_log`, `read_transaction_log`, `delete_audit_logs_before`, `delete_transaction_logs_before`) are documented in [Database / Transaction Logging](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit and transaction logging operations'). +> Audit log and transaction log operations (`read_audit_log`, `read_transaction_log`, `delete_audit_logs_before`, `delete_transaction_logs_before`) are documented in [Database / Transaction Logging](../database/transaction.md). 
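For quick orientation before following that link, a `read_audit_log` request body typically takes this shape (illustrative values; verify the exact parameter set, including the accepted `search_type` options, against the transaction logging page):

```json
{
	"operation": "read_audit_log",
	"database": "dev",
	"table": "dog",
	"search_type": "timestamp",
	"search_values": [1660585740558]
}
```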
--- @@ -87,5 +87,5 @@ _Restricted to super_user roles only._ - [Logging Overview](./overview) - [Logging Configuration](./configuration) -- [Database / Transaction Logging](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit log and transaction log operations') +- [Database / Transaction Logging](../database/transaction.md) - [Operations API Overview](TODO:reference_versioned_docs/version-v4/operations-api/overview.md 'Operations API overview') diff --git a/reference_versioned_docs/version-v4/logging/overview.md b/reference_versioned_docs/version-v4/logging/overview.md index 1bada4a3..862aa01c 100644 --- a/reference_versioned_docs/version-v4/logging/overview.md +++ b/reference_versioned_docs/version-v4/logging/overview.md @@ -10,7 +10,7 @@ title: Logging Harper's core logging system is used for diagnostics, monitoring, and observability. It has an extensive configuration system, and even supports feature-specific (per-component) configurations in latest versions. Furthermore, the `logger` global API is available for creating custom logs from any JavaScript application or plugin code. -> If you are looking for information on Harper's Audit and Transaction logging system, refer to the [Database](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit and transaction logging') section. +> If you are looking for information on Harper's Audit and Transaction logging system, refer to the [Database](../database/transaction.md) section. 
## Log File @@ -89,4 +89,4 @@ The `logger` global provides `trace`, `debug`, `info`, `warn`, `error`, `fatal`, - [Logging Configuration](./configuration) - [Logging API](./api) - [Logging Operations](./operations) -- [Database / Transaction Logging](TODO:reference_versioned_docs/version-v4/database/transaction.md 'Audit and transaction logging') +- [Database / Transaction Logging](../database/transaction.md) diff --git a/reference_versioned_docs/version-v4/mqtt/overview.md b/reference_versioned_docs/version-v4/mqtt/overview.md index bb9d7c55..829730b7 100644 --- a/reference_versioned_docs/version-v4/mqtt/overview.md +++ b/reference_versioned_docs/version-v4/mqtt/overview.md @@ -23,7 +23,7 @@ A topic of `my-resource/some-id` corresponds to the record with id `some-id` in - **Publishing** with the `retain` flag set replaces the record in the database (equivalent to a PUT operation). - **Publishing without** the `retain` flag delivers the message to current subscribers without writing to the database. 
-Defining a table that creates a topic can be as simple as adding a table with no attributes to your [schema.graphql](TODO:reference_versioned_docs/version-v4/database/schema.md 'Schema definition for defining tables and topics') in a Harper application: +Defining a table that creates a topic can be as simple as adding a table with no attributes to your [schema.graphql](../database/schema.md) in a Harper application: ```graphql type MyTopic @table @export @@ -138,5 +138,5 @@ Available events: - [MQTT Configuration](./configuration) - [HTTP Overview](../http/overview.md) - [Security Overview](../security/overview.md) -- [Database Schema](TODO:reference_versioned_docs/version-v4/database/schema.md 'Defining tables and topics with schema.graphql') +- [Database Schema](../database/schema.md) - [REST Overview](TODO:reference_versioned_docs/version-v4/rest/overview.md 'REST interface — same path conventions as MQTT topics') diff --git a/reference_versioned_docs/version-v4/rest/overview.md b/reference_versioned_docs/version-v4/rest/overview.md index 54574101..e38b2e4c 100644 --- a/reference_versioned_docs/version-v4/rest/overview.md +++ b/reference_versioned_docs/version-v4/rest/overview.md @@ -17,7 +17,7 @@ Harper provides a powerful, efficient, and standard-compliant HTTP REST interfac Harper's REST interface exposes database tables and custom resources as RESTful endpoints. Tables are **not** exported by default; they must be explicitly exported in a schema definition. The name of the exported resource defines the base of the endpoint path, served on the application HTTP server port (default `9926`). -For more on defining schemas and exporting resources, see [TODO:reference_versioned_docs/version-v4/database/schema.md 'Schema definition']. +For more on defining schemas and exporting resources, see [Database / Schema](../database/schema.md). 
## Configuration @@ -156,4 +156,4 @@ GET /openapi - [WebSockets](./websockets.md) — Real-time connections via WebSocket - [Server-Sent Events](./server-sent-events.md) — One-way streaming via SSE - [HTTP Server](../http/overview.md) — Underlying HTTP server configuration -- [Database / Schema](TODO:reference_versioned_docs/version-v4/database/schema.md 'Schema definition') — How to define and export resources +- [Database / Schema](../database/schema.md) — How to define and export resources diff --git a/reference_versioned_docs/version-v4/rest/querying.md b/reference_versioned_docs/version-v4/rest/querying.md index 83070f56..49662cbc 100644 --- a/reference_versioned_docs/version-v4/rest/querying.md +++ b/reference_versioned_docs/version-v4/rest/querying.md @@ -251,11 +251,11 @@ This only works for properties declared in the schema. As of v4.5.0, dots in URL Added in: v4.5.0 -Resources can be configured with `directURLMapping: true` for more direct URL path handling. When enabled, the URL path is mapped more directly to the resource without the default query parameter parsing semantics. See [Database / Schema](TODO:reference_versioned_docs/version-v4/database/schema.md 'Schema and resource configuration') for configuration details. +Resources can be configured with `directURLMapping: true` for more direct URL path handling. When enabled, the URL path is mapped more directly to the resource without the default query parameter parsing semantics. See [Database / Schema](../database/schema.md) for configuration details. ## See Also - [REST Overview](./overview.md) — HTTP methods, URL structure, and caching - [Headers](./headers.md) — Request and response headers - [Content Types](./content-types.md) — Encoding formats -- [Database / Schema](TODO:reference_versioned_docs/version-v4/database/schema.md 'Schema definition') — Defining schemas, relationships, and indexes +- [Database / Schema](../database/schema.md) — Defining schemas, relationships, and indexes