Harper | Site Cache Component

A rules-driven caching component for Harper that sits between edge traffic and origin services. It supports:

HTML/page caching and API caching in separate Harper tables.
A DB-backed TTL rule engine (regex + optional header/query conditions).
Cache bypass and debug observability headers.
Manual and timestamp-based invalidation.
Environment-based configuration via cacheConfiguration.<env>.json files, selected at boot by the ENVIRONMENT env var.

Architecture in Harper

This component is implemented as:

A global HTTP interceptor in src/index.ts using server.http(...).
Cache handler modules in src/cacheHandlers/:
- defaultCache.ts — page/HTML cache using Harper's sourcedFrom external data source pattern
- apiCache.ts — API response cache using the same pattern
Custom resource classes for rule management and invalidation:
- TTLRules in src/resources/ttlRules.ts
- Invalidate in src/resources/cacheInvalidation.ts
Utility modules for:
- rule classification and cache entry fetching (src/util/cache.ts)
- key generation (src/util/cacheKeys.ts)
- header handling (src/util/headers.ts)
- origin fetch pooling with connection reuse and timeouts (src/util/originClient.ts)

Runtime bootstrap behavior:

Load request interceptor.
Subscribe to TTL rule updates from TTLRules and rebuild in-memory index.
Subscribe to invalidation timestamps and keep an in-memory invalidation map.

Request Flow

1) Request routing

Incoming traffic is processed by server.http(...):

Reserved paths always bypass cache routing:
- /status
- /prometheus_exporter/metrics
- /cache/ttlConfig
- /cache/invalidate
- Additional paths can be added via the RESERVED_PATHS env var (comma-separated).
All other requests are authenticated using the authorization header (Basic auth).
Requests are classified as API traffic when either condition matches:
- a configurable header (CACHE_CONFIG.apiHeader)
- URL contains configured API prefix (CACHE_CONFIG.apiPathPrefix)
API requests route to handleAPI(...). Cacheable page requests route to fetchCachedResponse(...), and non-cacheable page requests route to originPassthrough(...).
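
The routing decision above can be sketched as a pure function. This is a minimal illustration, not the code in src/index.ts: the names `reservedPaths` and `classifyRoute` are hypothetical, header names are assumed to arrive lowercased, the path is assumed to exclude the query string, and reserved paths are assumed to match by prefix so that /cache/ttlConfig/:id is covered. Authentication is left out.

```typescript
// Hypothetical shapes; only apiHeader and apiPathPrefix come from the configuration reference.
interface RoutingConfig {
  apiHeader?: { key: string; value: string };
  apiPathPrefix?: string;
}

type Route = "reserved" | "api" | "page";

const BUILT_IN_RESERVED = [
  "/status",
  "/prometheus_exporter/metrics",
  "/cache/ttlConfig",
  "/cache/invalidate",
];

// Merge the built-in reserved paths with the comma-separated RESERVED_PATHS value.
function reservedPaths(envValue: string | undefined): string[] {
  const extra = (envValue ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return [...BUILT_IN_RESERVED, ...extra];
}

function classifyRoute(
  path: string,
  headers: Record<string, string>, // assumed to use lowercased header names
  config: RoutingConfig,
  reserved: string[],
): Route {
  // Assumption: a reserved path also covers its sub-paths.
  if (reserved.some((p) => path === p || path.startsWith(p + "/"))) return "reserved";
  const h = config.apiHeader;
  const headerMatch = h !== undefined && headers[h.key.toLowerCase()] === h.value;
  const prefixMatch =
    config.apiPathPrefix !== undefined && path.includes(config.apiPathPrefix);
  // Either condition is enough to treat the request as API traffic.
  return headerMatch || prefixMatch ? "api" : "page";
}
```

With the stage configuration shown later in this README, `/api/products` and any request carrying `x-fwd-origin: API` would both be classified as `api`.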

2) Cache key generation

Keys are deterministic and configuration-driven:

Path is normalized:
- lowercased
- canonical trailing slash behavior
Included headers/cookies/query params are controlled by:
- defaultCacheKey for page cache
- apiCacheKey for API cache
Key parts are sorted for order-independent equality.
When the key length exceeds KEY_OVERFLOW (1000 chars), an MD5 hash suffix is added.
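
A minimal sketch of deterministic key building under these rules. Everything here is illustrative rather than the code in src/util/cacheKeys.ts: the key layout, the `normalizePath` and `buildCacheKey` names, the choice to strip the trailing slash, and the choice to truncate an overflowing key before appending the hash are all assumptions. The hash function is passed in so the sketch stays dependency-free; in practice it would be an MD5 helper.

```typescript
interface KeyConfig {
  includeHeaders: string[];
  includeQueryParams: string[] | "ALL";
  includeCookies: string[];
}

const KEY_OVERFLOW = 1000;

// Assumption: "canonical trailing slash" means the slash is removed except for the root path.
function normalizePath(path: string): string {
  const lower = path.toLowerCase();
  return lower.length > 1 && lower.endsWith("/") ? lower.slice(0, -1) : lower;
}

function buildCacheKey(
  path: string,
  query: Record<string, string>,
  headers: Record<string, string>,
  cookies: Record<string, string>,
  config: KeyConfig,
  hash: (input: string) => string, // stand-in for an MD5 helper
): string {
  const parts: string[] = [];
  const queryNames =
    config.includeQueryParams === "ALL" ? Object.keys(query) : config.includeQueryParams;
  for (const name of queryNames) {
    if (name in query) parts.push(`q:${name}=${query[name]}`);
  }
  for (const name of config.includeHeaders) {
    if (name in headers) parts.push(`h:${name}=${headers[name]}`);
  }
  for (const name of config.includeCookies) {
    if (name in cookies) parts.push(`c:${name}=${cookies[name]}`);
  }
  // Sorting makes the key independent of the order in which parts arrived.
  parts.sort();
  const key = [normalizePath(path), ...parts].join("|");
  // Assumption: an overflowing key is truncated, then suffixed with a hash of the full key.
  return key.length > KEY_OVERFLOW ? key.slice(0, KEY_OVERFLOW) + "#" + hash(key) : key;
}
```

Query parameters that are not listed in the key configuration (for example tracking parameters) never reach the key, so they cannot fragment the cache.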

3) Rule classification

Each request is classified against in-memory TTL rules:

Path is normalized.
Candidates are narrowed by longest literal prefix bucket.
Candidates are checked in descending specificity.
Optional conditions are evaluated (header/query + operator).
First matching rule wins.
Result is memoized for hot keys (up to 5000 entries) when rule has no conditions.
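
The classification steps above can be sketched as follows. This is an assumption-laden illustration, not the code in src/util/cache.ts: the `IndexedRule` shape, `buildIndex`, and `classify` are hypothetical names, the specificity score is supplied by the caller, and the sketch collects candidates from every matching prefix bucket (longest prefix first) before falling back to general rules. Memoization is omitted.

```typescript
interface RuleRequest {
  path: string;
  query: Record<string, string>;
  headers: Record<string, string>;
}

interface IndexedRule {
  id: string;
  pattern: RegExp; // compiled once, without the global flag
  literalPrefix: string; // "" when no literal prefix could be extracted
  specificity: number;
  conditions?: (req: RuleRequest) => boolean;
}

interface RuleIndex {
  buckets: Map<string, IndexedRule[]>;
  prefixes: string[]; // sorted longest-first
  general: IndexedRule[];
}

function buildIndex(rules: IndexedRule[]): RuleIndex {
  const bySpecificity = (a: IndexedRule, b: IndexedRule) => b.specificity - a.specificity;
  const buckets = new Map<string, IndexedRule[]>();
  const general: IndexedRule[] = [];
  for (const rule of rules) {
    if (rule.literalPrefix === "") {
      general.push(rule);
      continue;
    }
    const bucket = buckets.get(rule.literalPrefix) ?? [];
    bucket.push(rule);
    buckets.set(rule.literalPrefix, bucket);
  }
  buckets.forEach((list) => list.sort(bySpecificity));
  general.sort(bySpecificity);
  const prefixes = Array.from(buckets.keys()).sort((a, b) => b.length - a.length);
  return { buckets, prefixes, general };
}

function classify(req: RuleRequest, index: RuleIndex): IndexedRule | undefined {
  const path = req.path.toLowerCase();
  const candidates: IndexedRule[] = [];
  for (const prefix of index.prefixes) {
    if (path.startsWith(prefix)) candidates.push(...(index.buckets.get(prefix) ?? []));
  }
  candidates.push(...index.general);
  // First rule whose pattern and optional conditions both match wins.
  return candidates.find(
    (rule) => rule.pattern.test(path) && (rule.conditions ? rule.conditions(req) : true),
  );
}
```

Because conditional rules carry a higher specificity, they are tried before unconditional rules in the same bucket and only win when their conditions hold.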

4) Cache read/write behavior

API flow (`src/cacheHandlers/apiCache.ts`)

GET requests can use cache unless x-harper-cache-bypass: true.
Cache hit returns stored payload + stored headers + x-harper-cache: hit.
Cache miss fetches origin, conditionally stores successful responses (status === 200 and matching TTL rule), returns x-harper-cache: miss.
Non-cacheable API responses return x-harper-cache: no-cache.
Non-GET API methods are proxied without caching.
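
As a rough, synchronous sketch of the API flow above (the real handler is asynchronous and works with Harper tables, so this is not the code in apiCache.ts): `handleApiSketch`, its parameter shapes, and the plain `Map` used as the cache are all hypothetical. The sketch also assumes that the bypass header only skips the cache read and that proxied non-GET responses carry no x-harper-cache value; neither is stated above.

```typescript
type CacheStatus = "hit" | "miss" | "no-cache";

interface ApiRequest {
  method: string;
  key: string; // precomputed cache key
  bypass: boolean; // x-harper-cache-bypass: true
  hasTtlRule: boolean; // a TTL rule that allows caching matched this request
}

interface OriginResponse {
  status: number;
  body: string;
}

interface ApiResult {
  body: string;
  cacheHeader?: CacheStatus; // value of x-harper-cache, if any
}

function handleApiSketch(
  req: ApiRequest,
  cache: Map<string, string>,
  fetchOrigin: () => OriginResponse,
): ApiResult {
  // Non-GET methods are proxied without touching the cache.
  if (req.method !== "GET") return { body: fetchOrigin().body };

  if (!req.bypass) {
    const cached = cache.get(req.key);
    if (cached !== undefined) return { body: cached, cacheHeader: "hit" };
  }

  const origin = fetchOrigin();
  // Only successful responses with a matching TTL rule are stored.
  if (origin.status !== 200 || !req.hasTtlRule) {
    return { body: origin.body, cacheHeader: "no-cache" };
  }
  cache.set(req.key, origin.body);
  return { body: origin.body, cacheHeader: "miss" };
}
```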

Default/page flow (`src/cacheHandlers/defaultCache.ts`)

Computes page cache key and rule match.
If cacheable and present, returns cached content (x-harper-cache: hit).
Otherwise fetches origin and stores response when rule exists.
Unsuccessful origin responses are not cached.

Both flows:

Strip/normalize non-cache-safe headers.
Persist optional groupCode, cacheTags, url, and refreshedAt.
Use invalidation timestamps to skip stale entries.

TTL Rules Engine

TTL rules are defined per row and loaded into memory from the TTL rules table.

Rule shape

Each rule row supports:

id
description
pathPatterns: array of regex strings
ttl: duration or special policy
groupCode (optional)
additionalMatchCriteria (optional)

TTL values

Supported rule policies in runtime:

Duration policy: numeric duration converted to seconds.
origin_expires: use upstream Expires semantics.
never: no expiration timestamp is set.
no_cache: explicitly disallows caching.

Validation on the admin resource currently accepts:

Durations: 1m, 6h, 1d, 1y (pattern: positive integer + m|h|d|y)
Specials: origin_expires, never

Note: no_cache is a runtime-only policy and cannot be submitted via the admin API.
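
A minimal sketch of the admin-side validation and duration conversion described above. The `parseAdminTtl` name and the `TtlPolicy` type are hypothetical, the regex is one possible reading of "positive integer + m|h|d|y" (it rejects leading zeros), and a year is assumed to be 365 days.

```typescript
type TtlPolicy =
  | { kind: "duration"; seconds: number }
  | { kind: "origin_expires" }
  | { kind: "never" };

// Positive integer followed by one unit letter, e.g. 1m, 6h, 1d, 1y.
const DURATION_PATTERN = /^([1-9]\d*)([mhdy])$/;

// Assumption: 1y is treated as 365 days.
const UNIT_SECONDS: Record<string, number> = { m: 60, h: 3600, d: 86400, y: 31536000 };

function parseAdminTtl(ttl: string): TtlPolicy {
  if (ttl === "origin_expires") return { kind: "origin_expires" };
  if (ttl === "never") return { kind: "never" };
  const match = DURATION_PATTERN.exec(ttl);
  // no_cache and anything else that is not a duration is rejected here.
  if (match === null) throw new Error(`Invalid ttl: ${ttl}`);
  return { kind: "duration", seconds: Number(match[1]) * UNIT_SECONDS[match[2]] };
}
```

For example, `parseAdminTtl("6h")` yields a duration of 21600 seconds, while `parseAdminTtl("no_cache")` throws.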

Note:

If no TTL rule matches, the request is treated as non-cacheable by default.

Additional match criteria

additionalMatchCriteria entries support:

additionalMatchType: header or query
additionalMatchOperator: equals, not_equals, contains, not_contains, exists, not_exists
additionalMatchKey
additionalMatchValue: string or string array for value-based operators

Current evaluation behavior:

Multiple criteria in one rule are ANDed.
equals and not_equals compare against a single normalized value.
contains and not_contains are evaluated against the provided value list.
exists and not_exists only require key presence/absence.
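
The evaluation rules above might look like the sketch below. Several details are assumptions rather than documented behavior: `contains` is read as "the request value contains at least one listed value as a substring", `equals` uses the first entry when an array is supplied, a missing key satisfies `not_equals` and `not_contains`, header names are lowercased, and no further value normalization is applied. The function names are hypothetical.

```typescript
type Operator = "equals" | "not_equals" | "contains" | "not_contains" | "exists" | "not_exists";

interface MatchCriterion {
  additionalMatchType: "header" | "query";
  additionalMatchOperator: Operator;
  additionalMatchKey: string;
  additionalMatchValue?: string | string[];
}

interface CriteriaInput {
  headers: Record<string, string>; // assumed lowercased names
  query: Record<string, string>;
}

function criterionMatches(c: MatchCriterion, input: CriteriaInput): boolean {
  const isHeader = c.additionalMatchType === "header";
  const source = isHeader ? input.headers : input.query;
  const key = isHeader ? c.additionalMatchKey.toLowerCase() : c.additionalMatchKey;
  const actual: string | undefined = source[key];
  const raw = c.additionalMatchValue;
  const expected: string[] = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];

  switch (c.additionalMatchOperator) {
    case "exists":
      return actual !== undefined;
    case "not_exists":
      return actual === undefined;
    case "equals":
      return actual !== undefined && actual === expected[0];
    case "not_equals":
      return actual === undefined || actual !== expected[0];
    case "contains":
      return actual !== undefined && expected.some((v) => actual.includes(v));
    case "not_contains":
      return actual === undefined || !expected.some((v) => actual.includes(v));
  }
}

// Multiple criteria in one rule are ANDed.
function allCriteriaMatch(criteria: MatchCriterion[], input: CriteriaInput): boolean {
  return criteria.every((c) => criterionMatches(c, input));
}
```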

Matching algorithm details

Indexing and ranking:

Regexes are compiled once.
A literal prefix is extracted from each regex when possible.
Prefix buckets are sorted longest-first.
Specificity scoring:
- base from literal density and path segments
- +10 weight when conditions are present
Candidate order is "bucket candidates first, then general rules", each already sorted by specificity.

Memoization:

In-memory map max size: 5000.
FIFO-style eviction at overflow.
Memo is cleared when rules reload.
Rules with additional conditions are not memoized.
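
One simple way to get FIFO-style eviction is to rely on the insertion-order iteration of a JavaScript `Map`. The `FifoMemo` class below is a hypothetical sketch of that idea and is not taken from the component source; rewriting an existing key does not refresh its position, which is what distinguishes FIFO from LRU.

```typescript
class FifoMemo<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize: number = 5000) {
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  // Called when rules reload, so stale classifications are never served.
  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A caller would skip `set` entirely for rules that carry additional conditions, since their outcome depends on more than the memo key.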

Harper Schema

Source: src/db/schema.graphql

`DefaultCache.CacheContent`

Field	Type	Notes
`cacheKey`	`String`	Primary key
`data`	`Blob!`	Cached page payload
`headers`	`String!`	Serialized response headers
`debugHeaders`	`String`	Serialized debug metadata
`groupCode`	`String`	Optional grouped invalidation key
`cacheTags`	`String`	Optional cache tags extracted from origin header
`url`	`String`	Source URL
`refreshedAt`	`Long`	Last write timestamp

`APICache.CacheContent`

Field	Type	Notes
`cacheKey`	`String`	Primary key
`data`	`Blob!`	Cached API payload
`headers`	`String!`	Serialized response headers
`debugHeaders`	`String`	Serialized debug metadata
`groupCode`	`String`	Optional grouped invalidation key
`cacheTags`	`String`	Optional cache tags
`url`	`String`	Source URL
`refreshedAt`	`Long`	Last write timestamp

`CacheManagement.TTLRules`

Field	Type	Notes
`id`	`ID`	Primary key
`description`	`String`	Human-readable label
`pathPatterns`	`[String]!`	Regex list
`ttl`	`String!`	Duration or special policy
`groupCode`	`String`	Optional grouped invalidation key
`additionalMatchCriteria`	`[Any]`	Optional conditional matching criteria

`CacheManagement.CacheInvalidation`

Field	Type	Notes
`id`	`Int`	Primary key (record `1` is used by invalidation flow)
`timestamps`	`Any`	Map of invalidation timestamps by key (`api`, `page`, or `groupCode`)

Cache Configuration Reference

File naming and environment selection

Configuration is split into per-environment files named cacheConfiguration.<env>.json. The active file is selected at boot using the ENVIRONMENT environment variable:

ENVIRONMENT=prod    # loads cacheConfiguration.prod.json
ENVIRONMENT=stage   # loads cacheConfiguration.stage.json
ENVIRONMENT=integration  # loads cacheConfiguration.integration.json
# unset or empty   # defaults to cacheConfiguration.local.json

Create one file per environment you deploy to. A minimal set looks like:

cacheConfiguration.local.json       ← local dev (default when ENVIRONMENT is unset)
cacheConfiguration.stage.json
cacheConfiguration.prod.json
cacheConfiguration.integration.json ← used when running integration tests

If the resolved file does not exist, the app will throw at startup with a clear error indicating which file it tried to load.
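
The selection rule can be sketched as below. The function names are hypothetical, trimming whitespace from the variable is an assumption, and the existence check is passed in as a callback so the sketch does not depend on the file system; the error text is illustrative only.

```typescript
// Map the ENVIRONMENT value to a configuration file name, defaulting to "local".
function configFileName(environment: string | undefined): string {
  const trimmed = (environment ?? "").trim();
  const env = trimmed === "" ? "local" : trimmed;
  return `cacheConfiguration.${env}.json`;
}

// Fail fast at startup when the resolved file is missing.
function resolveConfigFile(
  environment: string | undefined,
  exists: (file: string) => boolean, // e.g. a wrapper around a file system check
): string {
  const file = configFileName(environment);
  if (!exists(file)) {
    throw new Error(`Cache configuration file not found: ${file}`);
  }
  return file;
}
```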

Setting ENVIRONMENT in your deployment

Set the ENVIRONMENT variable wherever your Harper process is launched. Examples:

Shell / systemd:

ENVIRONMENT=prod harperdb run .

Docker:

ENV ENVIRONMENT=prod

docker-compose:

environment:
  - ENVIRONMENT=prod

Harper config.yaml (via loadEnv.files: .env):

# .env
ENVIRONMENT=prod

Example file (`cacheConfiguration.stage.json`)

{
	"cacheTagsHeader": "X-Origin-Cache-Tags",
	"apiPathPrefix": "/api/",
	"apiHeader": { "key": "X-Fwd-Origin", "value": "API" },
	"apiPathReplacement": { "search": "/api/", "replace": "" },
	"apiOrigin": "https://api.staging.example.com",
	"apiOriginAuthHeader": "",
	"apiCacheKey": {
		"includeHeaders": ["accept", "origin", "version"],
		"includeQueryParams": "ALL",
		"includeCookies": []
	},
	"defaultOrigin": "https://www.harper.fast",
	"defaultOriginAuthHeader": "",
	"defaultPathReplacement": false,
	"defaultCacheKey": {
		"includeHeaders": ["device-type", "accept-language"],
		"includeQueryParams": ["sort", "page", "filter"],
		"includeCookies": ["brand"]
	}
}

Field-by-field behavior

Key	Purpose	Required
`cacheTagsHeader`	Response header name read from origin to persist cache tags in records.	No
`apiPathPrefix`	URL substring used to classify requests as API traffic.	Conditional: required if `apiOrigin` is set and `apiHeader` is not set
`apiHeader`	Header-based API classifier object (`key` + `value`).	Conditional: required if `apiOrigin` is set and `apiPathPrefix` is not set
`apiPathReplacement`	Rewrites incoming API path before forwarding to API origin.	No
`apiOrigin`	API origin base URL string (e.g. `https://api.example.com`).	No
`apiOriginAuthHeader`	Optional header name sent to API origin for auth token forwarding.	Optional; if set, token env var is required
`apiCacheKey`	API cache key config object (`includeHeaders`, `includeQueryParams`, `includeCookies`).	Conditional: required if `apiOrigin` is set
`defaultOrigin`	Default/page origin base URL string (e.g. `https://www.example.com`).	Yes
`defaultOriginAuthHeader`	Optional header name sent to default origin for auth token forwarding.	Optional; if set, token env var is required
`defaultPathReplacement`	Rewrites default/page path before forwarding to default origin.	No
`defaultCacheKey`	Page cache key config object (`includeHeaders`, `includeQueryParams`, `includeCookies`).	Yes

If apiOriginAuthHeader is set, provide one of the following environment variables:

HDB_API_ORIGIN_AUTH_TOKEN
API_ORIGIN_AUTH_TOKEN

If defaultOriginAuthHeader is set, provide one of the following environment variables:

HDB_DEFAULT_ORIGIN_AUTH_TOKEN
DEFAULT_ORIGIN_AUTH_TOKEN
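
The required and conditional fields above can be checked with a small validator. This is a sketch under stated assumptions, not the component's own startup validation: `validateConfig`, `apiOriginToken`, and the error strings are hypothetical, and an empty string for an auth header is treated as "not set", which matches the example file.

```typescript
interface CacheConfigShape {
  apiOrigin?: string;
  apiPathPrefix?: string;
  apiHeader?: { key: string; value: string };
  apiCacheKey?: unknown;
  apiOriginAuthHeader?: string;
  defaultOrigin?: string;
  defaultCacheKey?: unknown;
  defaultOriginAuthHeader?: string;
}

type Env = Record<string, string | undefined>;

// The HDB_-prefixed variable takes priority over the unprefixed one.
function apiOriginToken(env: Env): string | undefined {
  return env["HDB_API_ORIGIN_AUTH_TOKEN"] || env["API_ORIGIN_AUTH_TOKEN"];
}

function defaultOriginToken(env: Env): string | undefined {
  return env["HDB_DEFAULT_ORIGIN_AUTH_TOKEN"] || env["DEFAULT_ORIGIN_AUTH_TOKEN"];
}

function validateConfig(config: CacheConfigShape, env: Env): string[] {
  const errors: string[] = [];
  if (!config.defaultOrigin) errors.push("defaultOrigin is required");
  if (!config.defaultCacheKey) errors.push("defaultCacheKey is required");
  if (config.apiOrigin) {
    if (!config.apiPathPrefix && !config.apiHeader) {
      errors.push("apiOrigin requires apiPathPrefix or apiHeader");
    }
    if (!config.apiCacheKey) errors.push("apiOrigin requires apiCacheKey");
  }
  if (config.apiOriginAuthHeader && !apiOriginToken(env)) {
    errors.push("apiOriginAuthHeader requires an API origin token env var");
  }
  if (config.defaultOriginAuthHeader && !defaultOriginToken(env)) {
    errors.push("defaultOriginAuthHeader requires a default origin token env var");
  }
  return errors;
}
```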

Environment Variables

All variables are optional unless noted.

Variable	Default	Description
`ENVIRONMENT`	`local`	Selects the `cacheConfiguration.<env>.json` file to load at startup.
`REQUEST_TIMEOUT_MS`	`30000`	Outer per-request timeout (ms). Requests exceeding this return `504 Gateway Timeout`.
`MAX_CONNECTIONS`	`80`	Max simultaneous connections per origin in the undici pool.
`CLIENT_TTL_MS`	`300000`	Keep-alive timeout for pooled connections (ms).
`CONNECT_TIMEOUT_MS`	`10000`	TCP connect timeout per origin request (ms).
`HEADERS_TIMEOUT_MS`	`30000`	Time to wait for response headers from origin (ms).
`BODY_TIMEOUT_MS`	`60000`	Time to wait for the full response body from origin (ms).
`RESERVED_PATHS`	(none)	Comma-separated additional URL paths that bypass cache logic entirely (e.g. `/health,/ping`).
`HDB_LOAD_TEST_MODE`	`false`	When `true`, replaces all origin fetches with mock responses for load testing without a real origin.
`API_ORIGIN_AUTH_TOKEN`	(none)	Auth token forwarded to the API origin when `apiOriginAuthHeader` is configured.
`HDB_API_ORIGIN_AUTH_TOKEN`	(none)	Same as above; takes priority over `API_ORIGIN_AUTH_TOKEN`.
`DEFAULT_ORIGIN_AUTH_TOKEN`	(none)	Auth token forwarded to the default origin when `defaultOriginAuthHeader` is configured.
`HDB_DEFAULT_ORIGIN_AUTH_TOKEN`	(none)	Same as above; takes priority over `DEFAULT_ORIGIN_AUTH_TOKEN`.

Admin Resources

The module exports:

cache.ttlConfig for TTL rule writes
cache.invalidate for cache invalidation operations

Path mapping for exported Resources:

POST /cache/ttlConfig
PUT /cache/ttlConfig/:id
POST /cache/invalidate

Sample `cache.ttlConfig` request body

{
	"description": "Category API responses",
	"pathPatterns": ["^.*/catalog/v\\d+/category/.+$"],
	"ttl": "6h",
	"groupCode": "catalog",
	"additionalMatchCriteria": [
		{
			"additionalMatchType": "query",
			"additionalMatchOperator": "equals",
			"additionalMatchKey": "catNav",
			"additionalMatchValue": "L4"
		}
	]
}

Expected behavior:

POST: create rule
PUT: upsert by :id
Validation errors return 400 with message text.
Success returns 204.

Invalidation Model

cache.invalidate expects JSON body:

{
	"type": "api | page | cacheTag | url",
	"groupCode": "optional-group",
	"cacheTag": "required-when-type-cacheTag",
	"url": "required-when-type-url",
	"runAsync": false
}
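
For clients that build this body programmatically, the shape and its conditional requirements can be expressed as a type plus a small check. The `InvalidateBody` type and `validateInvalidateBody` function are hypothetical, client-side illustrations; the messages are not the ones returned by the resource.

```typescript
interface InvalidateBody {
  type: "api" | "page" | "cacheTag" | "url";
  groupCode?: string;
  cacheTag?: string; // required when type is "cacheTag"
  url?: string; // required when type is "url"
  runAsync?: boolean;
}

const INVALIDATION_TYPES: string[] = ["api", "page", "cacheTag", "url"];

// Returns an error message, or undefined when the body looks valid.
function validateInvalidateBody(body: InvalidateBody): string | undefined {
  if (!INVALIDATION_TYPES.includes(body.type)) {
    return "type must be one of api, page, cacheTag, url";
  }
  if (body.type === "cacheTag" && !body.cacheTag) {
    return "cacheTag is required when type is cacheTag";
  }
  if (body.type === "url" && !body.url) {
    return "url is required when type is url";
  }
  return undefined;
}
```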

`type: api` or `type: page` (soft invalidation)

Does not delete cache rows.
Writes an invalidation timestamp into CacheInvalidation.timestamps.
The app subscribes to this table and keeps the latest timestamps in memory.
Any cache record with refreshedAt < timestamp is treated as expired on read.
If groupCode is provided, the timestamp is stored under that groupCode instead of the global api/page key, so only records with a matching group code in the default or API cache are invalidated.
This model reduces write volume for mass invalidation because it avoids record-by-record deletes.
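
The read-time check can be sketched as a comparison against the in-memory timestamp map. This is an illustration only: `isInvalidated` is a hypothetical name, and the sketch assumes a flat map in which group timestamps are keyed directly by groupCode alongside the global `api` and `page` keys.

```typescript
interface CachedRecord {
  refreshedAt: number; // last write timestamp
  groupCode?: string;
}

type InvalidationTimestamps = { [key: string]: number | undefined };

function isInvalidated(
  record: CachedRecord,
  cacheType: "api" | "page",
  timestamps: InvalidationTimestamps,
): boolean {
  const globalTs = timestamps[cacheType];
  const groupTs = record.groupCode !== undefined ? timestamps[record.groupCode] : undefined;
  // A record written before either applicable timestamp is treated as expired.
  const staleGlobally = globalTs !== undefined && record.refreshedAt < globalTs;
  const staleForGroup = groupTs !== undefined && record.refreshedAt < groupTs;
  return staleGlobally || staleForGroup;
}
```

No rows are touched by this check, which is why a mass invalidation costs a single write regardless of how many records it affects.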

`type: cacheTag` (hard invalidation)

Immediately deletes records in both cache tables matching cacheTags contains <tag>.
Note: this requires a scan of the relevant cache table(s) to find matching records, which can be expensive at scale.
Set runAsync: true to return a 202 Accepted immediately and run the deletes in the background.

`type: url` (hard invalidation)

Immediately deletes records in both cache tables matching url == <url>.
Set runAsync: true to return a 202 Accepted immediately and run the deletes in the background.

Individual record deletion by cache key

You can directly delete a single cache row by cacheKey using Harper Operations API delete.
Operations API reference: Delete (NoSQL)

Headers and Observability

Control headers

x-harper-cache-bypass: true
- Bypass cache and fetch origin directly.
x-harper-cache-debug: true
- Include detailed cache debug headers in response.

Response headers

x-harper: true
x-harper-cache: hit | miss | no-cache
Debug headers when enabled:
- x-harper-cache-path
- x-harper-cache-rule
- x-harper-cache-rule-id
- x-harper-cache-policy
- x-harper-cache-ttl
- x-harper-cache-bucket
- x-harper-cache-pattern
- x-harper-cache-key
- x-harper-cache-ttl-remaining-sec

Authentication Model

For non-reserved paths, the interceptor reads:

authorization: Basic <base64(username:password)>

Then calls server.authenticateUser(username, password).

Example:

HDB_ADMIN:password -> Basic SERCX0FETUlOOnBhc3N3b3Jk

authorization: Basic SERCX0FETUlOOnBhc3N3b3Jk
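
For reference, building and parsing the header can be sketched with the global `btoa` and `atob` functions (available in browsers and recent Node.js versions). The helper names are hypothetical and this is not the interceptor's own parsing code; `btoa` only handles Latin-1 input, so credentials with other characters would need a different encoder. The password is split off at the first colon, so it may itself contain colons.

```typescript
function basicAuthHeader(username: string, password: string): string {
  return "Basic " + btoa(`${username}:${password}`);
}

function parseBasicAuth(
  header: string,
): { username: string; password: string } | undefined {
  const match = /^Basic\s+(\S+)$/i.exec(header.trim());
  if (match === null) return undefined;
  let decoded: string;
  try {
    decoded = atob(match[1]);
  } catch {
    return undefined; // not valid base64
  }
  const colon = decoded.indexOf(":");
  if (colon < 0) return undefined;
  return { username: decoded.slice(0, colon), password: decoded.slice(colon + 1) };
}
```

The parsed username and password are what would then be handed to server.authenticateUser(username, password).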

Required roles

The component enforces role-based access using two role sets:

ALLOWED_ROLES_CACHE = ['cache_user', 'cache_admin', 'super_user'];
ALLOWED_ROLES_ADMIN = ['cache_admin', 'super_user'];

ALLOWED_ROLES_CACHE — required for all cache proxy requests. Users must hold one of these roles to have their requests served.
ALLOWED_ROLES_ADMIN — required for admin operations such as TTL rule management (/cache/ttlConfig) and cache invalidation (/cache/invalidate).

Requests authenticated with a user that does not hold the required role will be rejected. To grant access to a user:

Create the user in Harper with the appropriate role, or assign an existing user one of the roles above.
Harper's built-in super_user role has full access. For cache-only access use cache_user; for admin access use cache_admin.

See Configuring Roles in the Harper documentation for how to create and assign custom roles.
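
The role check itself reduces to a membership test against one of the two sets. The sketch below is illustrative: `isAuthorized` and `ADMIN_PATHS` are hypothetical names, and it assumes a single role name per user and that admin paths include their sub-paths.

```typescript
const ALLOWED_ROLES_CACHE = ["cache_user", "cache_admin", "super_user"];
const ALLOWED_ROLES_ADMIN = ["cache_admin", "super_user"];

// Assumption: these two paths (and their sub-paths) are the admin operations.
const ADMIN_PATHS = ["/cache/ttlConfig", "/cache/invalidate"];

function isAuthorized(role: string, path: string): boolean {
  const isAdminPath = ADMIN_PATHS.some((p) => path === p || path.startsWith(p + "/"));
  const allowed = isAdminPath ? ALLOWED_ROLES_ADMIN : ALLOWED_ROLES_CACHE;
  return allowed.includes(role);
}
```

Under this sketch a `cache_user` can fetch `/shop` through the cache but cannot call `/cache/invalidate`.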

Run and Deploy

Local dev

npm install
npm run build
npm run dev

Tests

npm run test:unit

Harper app wiring

config.yaml currently sets:

rest: true
graphqlSchema.files: src/db/schema.graphql
jsResource.files: dist/resources/index.js
loadEnv.files: .env

Ensure build output and resource entrypoints are aligned with your deployment target.

Operational Notes

Rules are hot-reloaded via table subscriptions; restart is not required for rule edits.
Invalidation timestamps are in-memory plus table-backed; using record id 1 keeps semantics simple.
Cache keys intentionally include only configured dimensions to avoid cardinality blowups.
Keep regex patterns as specific as possible to minimize candidate scans and false matches.

Test Suites

Unit tests

Validates isolated utility and rules logic.

npm run test:unit

Integration tests

Validates end-to-end cache behavior against a running Harper instance with mocked origins.

npm run test:integration

Minimum local environment (in the shell running tests):

export TEST_DOMAIN=http://localhost:9926
export HDB_ADMIN_USERNAME=HDB_ADMIN
export HDB_ADMIN_PASSWORD=password

Harper must be running with ENVIRONMENT=integration (loads cacheConfiguration.integration.json):

# Shell running Harper
export ENVIRONMENT=integration
harperdb run .

The mock origin host and port are configurable via env vars in the test shell if your network differs from the defaults (e.g. non-Docker local setups):

# Shell running tests — override if mock origins are not on the default addresses
export MOCK_ORIGIN_HOST=127.0.0.1
export MOCK_DEFAULT_ORIGIN_PORT=4101
export MOCK_API_ORIGIN_PORT=4102

If you need a fully local setup with different origin URLs, create a cacheConfiguration.local.json pointing to your local mock addresses and run Harper with ENVIRONMENT=local (or unset).

See tests/integration/README.md for the full local two-terminal setup.

Performance tests (k6)

Ramps to target RPS and round-robins requests across provided hosts.

k6 run tests/performance/ramp-round-robin.test.js \
  -e HOSTS=https://localhost:9926,https://localhost:9936 \
  -e REQUEST_PATHS=/,/api/it/load/products?foo=bar \
  -e TARGET_RPS=250 \
  -e RAMP_UP_DURATION=2m \
  -e TARGET_DURATION=8m

To simulate origin behavior without calling external origins, run Harper with:

export HDB_LOAD_TEST_MODE=true

See tests/performance/README.md for all k6 options.
