5 changes: 4 additions & 1 deletion appinfo/routes.php
@@ -295,7 +295,10 @@
['name' => 'transition#transition', 'url' => '/api/objects/{id}/transition', 'verb' => 'POST', 'requirements' => ['id' => '[^/]+']],
['name' => 'transition#availableActions', 'url' => '/api/objects/{id}/available-actions', 'verb' => 'GET', 'requirements' => ['id' => '[^/]+']],

-// Aggregations sugar endpoint.
+// Aggregations - ad-hoc time-bucket primitive (must be ordered
+// BEFORE the {name} wildcard so /timeseries literal matches first).
+['name' => 'aggregation#timeseries', 'url' => '/api/objects/aggregations/{register}/{schema}/timeseries', 'verb' => 'GET'],
+// Aggregations sugar endpoint - named annotation surface.
['name' => 'aggregation#aggregate', 'url' => '/api/objects/aggregations/{register}/{schema}/{name}', 'verb' => 'GET'],

// Contacts matching API — used by ContactsMenuProvider + mail-sidebar.
171 changes: 171 additions & 0 deletions docs/technical/aggregation-api.md
@@ -0,0 +1,171 @@
# Aggregation API

OpenRegister exposes **two** aggregation surfaces. Pick the right one for your use case:

| Surface | Owner | When to use |
|---|---|---|
| **Named declarative**: `x-openregister-aggregations` schema annotation | App author / schema author | KPI tiles, business-rule counts, anything the app owns and ships with its register. Cached for 60s. |
| **Runtime ad-hoc**: REST `/api/objects/aggregations/{register}/{schema}/timeseries` + GraphQL `groupBy` | Client (per-request) | Dashboard charts, ad-hoc bucketing, "let the user pick a date range". Also cached for 60s, keyed per query and caller (see [Cache](#cache)). |

This page documents the **ad-hoc primitive** (added by the `add-time-bucket-aggregation` change). For the named surface, see the `x-openregister-aggregations` documentation.

## When to use each

The named declarative surface is the right home for behaviours the **schema author** controls — KPIs, counts, business-rule rollups. Those are part of the app's contract and live in `lib/Settings/{app}_register.json`.

The ad-hoc primitive is the right home for behaviours the **client** controls — the user picks a date range, the dashboard widget picks the bucketing interval, the chart picks the metric. None of that belongs in the schema register; it's request-scoped.

A rule of thumb: if you'd hard-code the metric and field in the dashboard's source code, use the named surface. If the user gets to pick them at runtime, use the ad-hoc surface.

## REST surface

### Endpoint

```
GET /api/objects/aggregations/{register}/{schema}/timeseries
```

### Query parameters

| Param | Required | Notes |
|---|---|---|
| `field` | yes | The field to group / bucket on. MUST be a declared property of `{schema}` OR one of `_created`, `_updated`, `_deleted_at`. |
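| `interval` | no | One of `MINUTE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `QUARTER`, `YEAR`. When set, the field is time-bucketed with the active database engine's native primitive (Postgres `date_trunc()`, MySQL `DATE_FORMAT()`, SQLite `strftime()`) or the PHP fallback; see [Backend matrix](#backend-matrix). When absent, the field is grouped categorically. |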
| `from` | required when `interval` set | ISO-8601 lower bound, inclusive. |
| `to` | required when `interval` set | ISO-8601 upper bound, exclusive. |
| `metric` | no | One of `count`, `sum`, `avg`, `min`, `max`. Default `count`. |
| `metricField` | required when `metric != count` | Field to aggregate over. MUST be a declared schema property. |
| `filter[...]` | no | Reuses the existing object-collection filter vocabulary (`filter[status]=active`, `filter[duration][gte]=10`). |

### Sub-day intervals require date-time fields

Bucketing by `MINUTE` or `HOUR` requires the field's JSON-Schema `format` to be `date-time` (not `date`). A `date`-only field can only be bucketed by `DAY`, `WEEK`, `MONTH`, `QUARTER`, or `YEAR`. The endpoint returns `400 Bad Request` if the constraint is violated.
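
The constraint can be expressed as a small guard. The following is a minimal sketch, not the shipped `TimeseriesRequestValidator`: the function name `assertIntervalAllowed` and its parameters are assumptions made for illustration. It throws `InvalidArgumentException`, which the controller maps to `400 Bad Request`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: rejects sub-day intervals on date-only fields.
 * Illustrative only; the real validator may be structured differently.
 *
 * @param string $interval Requested interval, e.g. 'HOUR'.
 * @param string $format   JSON-Schema `format` of the bucketing field.
 */
function assertIntervalAllowed(string $interval, string $format): void
{
    $all    = ['MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH', 'QUARTER', 'YEAR'];
    $subDay = ['MINUTE', 'HOUR'];

    if (in_array($interval, $all, true) === false) {
        throw new InvalidArgumentException("Unknown interval '$interval'.");
    }

    // A `date` field carries no time component, so it cannot be split below DAY.
    if ($format === 'date' && in_array($interval, $subDay, true) === true) {
        throw new InvalidArgumentException("Interval $interval requires a date-time field.");
    }
}
```

Calling `assertIntervalAllowed('DAY', 'date')` returns silently, while `assertIntervalAllowed('HOUR', 'date')` throws.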

### Response shape

```json
{
  "groups": [
    { "key": "2026-05-21T00:00:00Z", "value": 42 },
    { "key": "2026-05-22T00:00:00Z", "value": 17 }
  ],
  "backend": "postgres",
  "cached": false
}
```

- `key`: bucket label. For `interval`-bucketed queries this is an ISO-8601-UTC string at the start of the bucket. For categorical groupBy it's the value of the groupBy field.
- `value`: the aggregated metric (always a number; an integer for `count`, a float for other metrics).
- `backend`: which engine served the query — `"postgres"` (native `date_trunc`), `"mysql"` (native `DATE_FORMAT`), `"sqlite"` (native `strftime`), or `"php-fallback"` (unrecognised engine).
- `cached`: `true` on a read-through cache hit, `false` on miss or the first request after invalidation. See [Cache](#cache) below.

### Empty buckets

Buckets with zero rows are **omitted** from the response — `GROUP BY` does not emit empty groups. The client fills empties at render time. See [issue #1607](https://github.com/ConductionNL/openregister/issues/1607) for cumulative / windowed series.
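
Because empty buckets are omitted, a consumer that needs a dense series has to fill the gaps itself. The sketch below shows one way to do that for `DAY` buckets. The helper `fillDailyBuckets` is hypothetical (it is not part of OpenRegister), and it assumes the response keys use the `Y-m-d\TH:i:s\Z` format described on this page.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative helper: zero-fill missing DAY buckets in a sparse response.
 *
 * @param array<int, array{key: string, value: int|float}> $groups Sparse `groups` from the response.
 * @param string                                           $from   ISO-8601 lower bound, inclusive.
 * @param string                                           $to     ISO-8601 upper bound, exclusive.
 *
 * @return array<int, array{key: string, value: int|float}> One entry per day in [from, to).
 */
function fillDailyBuckets(array $groups, string $from, string $to): array
{
    $utc = new DateTimeZone('UTC');

    // Index the sparse response by bucket key for direct lookup.
    $byKey = array_column($groups, 'value', 'key');

    // Align the cursor to midnight UTC, mirroring DAY truncation.
    $cursor = (new DateTimeImmutable($from))->setTimezone($utc)->setTime(0, 0, 0);
    $end    = (new DateTimeImmutable($to))->setTimezone($utc);

    $dense = [];
    while ($cursor < $end) {
        $key     = $cursor->format('Y-m-d\TH:i:s\Z');
        $dense[] = ['key' => $key, 'value' => ($byKey[$key] ?? 0)];
        $cursor  = $cursor->modify('+1 day');
    }

    return $dense;
}

$sparse = [
    ['key' => '2026-05-21T00:00:00Z', 'value' => 42],
    ['key' => '2026-05-23T00:00:00Z', 'value' => 17],
];
$dense = fillDailyBuckets($sparse, '2026-05-20T00:00:00Z', '2026-05-24T00:00:00Z');
```

Here `$dense` holds four entries (20 to 23 May), with `0` for the two days the response left out. Other intervals would need a different step and alignment.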

### Status codes

| Code | When |
|---|---|
| `200` | Happy path. |
| `400` | Validation failure (unknown field, bad interval, missing bounds, etc.). |
| `403` | Caller lacks `list` permission on the schema. |
| `404` | Register or schema not found. |

### Example

```bash
curl -s 'http://localhost:8080/index.php/apps/openregister/api/objects/aggregations/openconnector/calllogs/timeseries?field=created&interval=DAY&from=2026-05-01T00:00:00Z&to=2026-05-22T00:00:00Z' \
  -u admin:admin \
  -H 'OCS-APIRequest: true' \
  | jq .
```
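
The same query string can be assembled programmatically, which avoids hand-encoding the bracketed `filter[...]` parameters. This is a sketch only: the host, register and schema are copied from the curl example above, and the filter values are illustrative.

```php
<?php
declare(strict_types=1);

// Base URL taken from the curl example; adjust for your instance.
$base = 'http://localhost:8080/index.php/apps/openregister'
    .'/api/objects/aggregations/openconnector/calllogs/timeseries';

$params = [
    'field'    => 'created',
    'interval' => 'DAY',
    'from'     => '2026-05-01T00:00:00Z',
    'to'       => '2026-05-22T00:00:00Z',
    // Nested arrays serialise to filter[status]=... and filter[duration][gte]=...
    'filter'   => ['status' => 'active', 'duration' => ['gte' => 10]],
];

// RFC 3986 encoding writes spaces as %20 rather than "+".
$url = $base.'?'.http_build_query($params, '', '&', PHP_QUERY_RFC3986);

// Round trip: PHP parses the bracket syntax back into a nested array,
// which is the shape a PHP request handler sees for `filter`.
parse_str((string) parse_url($url, PHP_URL_QUERY), $parsed);
```

After the round trip, `$parsed['filter']['duration']['gte']` is the string `'10'`: query parameters always arrive as strings, so any numeric coercion is up to the server.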

## GraphQL surface

Every auto-generated list query accepts an optional `groupBy: GroupByInput` argument. When supplied, the connection result includes a non-null `groups: [GroupBucket!]` field.

### Types (auto-generated)

```graphql
input GroupByInput {
  field: String!
  interval: TimeInterval
  from: String          # required when interval is set
  to: String            # required when interval is set
  metric: AggregationMetric = COUNT
  metricField: String   # required when metric != COUNT
}

enum TimeInterval { MINUTE HOUR DAY WEEK MONTH QUARTER YEAR }
enum AggregationMetric { COUNT SUM AVG MIN MAX }
type GroupBucket { key: String! value: Float! }
```

### Example query

```graphql
query CallsPerDay {
  calllogs(
    filter: { status: "error" }
    groupBy: {
      field: "created"
      interval: DAY
      from: "2026-05-01T00:00:00Z"
      to: "2026-05-22T00:00:00Z"
    }
  ) {
    totalCount
    groups {
      key
      value
    }
  }
}
```

`totalCount` is the size of the filtered set; the sum of `groups[*].value` equals `totalCount` when `metric: COUNT`.

When the client does not request `groupBy`, the `groups` field is `null` (not an empty array — `null` means "no aggregation requested").

### Validation errors

Validation problems surface as GraphQL field-errors on the `groups` field. The rest of the connection (edges, pageInfo, totalCount, facets) still resolves normally.

## Backend matrix

The runner picks the matching native bucketing primitive for the active database engine and falls back to PHP only on engines OpenRegister doesn't natively target.

| Database | Bucketing path | `backend` field |
|---|---|---|
| PostgreSQL | `date_trunc($gap, "$field")::text` | `postgres` |
| MySQL / MariaDB | `DATE_FORMAT("$field", '<format>')` (ISO-Monday week-start; `CONCAT(YEAR, ..., '-01T00:00:00Z')` for quarter) | `mysql` |
| SQLite | `strftime('<format>', "$field")` (ISO-Monday via `weekday 0` + `-6 days`; `CASE` on `strftime('%m')` for quarter) | `sqlite` |
| Other / unknown | RBAC-filtered hydrate + PHP `date_trunc` polyfill (`gmdate`) | `php-fallback` |

All four paths emit identical wire shape: ISO-8601-UTC bucket keys (`Y-m-d\TH:i:s\Z`), the same `groups[i].value` coercion (int for `count`, float otherwise), and the same RBAC + multi-tenancy gate. The `backend` field lets a caller observe which engine served the request without changing how the response is consumed.
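
To make the shared key format concrete, here is a sketch of a `date_trunc`-style helper in PHP. It is not the shipped polyfill (which the table describes as `gmdate`-based); it uses `DateTimeImmutable` instead, and the name `bucketStart` is an assumption. It follows the ISO-Monday week start and calendar quarters described above.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative helper: ISO-8601 UTC start of the bucket containing $timestamp.
 *
 * @param int    $timestamp Unix timestamp in seconds.
 * @param string $interval  One of MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR.
 */
function bucketStart(int $timestamp, string $interval): string
{
    $t     = (new DateTimeImmutable('@'.$timestamp))->setTimezone(new DateTimeZone('UTC'));
    $year  = (int) $t->format('Y');
    $month = (int) $t->format('n');

    $start = match ($interval) {
        'MINUTE'  => $t->setTime((int) $t->format('G'), (int) $t->format('i'), 0),
        'HOUR'    => $t->setTime((int) $t->format('G'), 0, 0),
        'DAY'     => $t->setTime(0, 0, 0),
        // ISO weeks start on Monday; format('N') is 1 (Mon) through 7 (Sun).
        'WEEK'    => $t->modify('-'.((int) $t->format('N') - 1).' days')->setTime(0, 0, 0),
        'MONTH'   => $t->setDate($year, $month, 1)->setTime(0, 0, 0),
        // Quarters start in months 1, 4, 7 and 10.
        'QUARTER' => $t->setDate($year, (intdiv($month - 1, 3) * 3 + 1), 1)->setTime(0, 0, 0),
        'YEAR'    => $t->setDate($year, 1, 1)->setTime(0, 0, 0),
        default   => throw new InvalidArgumentException("Unknown interval '$interval'."),
    };

    return $start->format('Y-m-d\TH:i:s\Z');
}

// 2026-05-21 13:45:30 UTC falls on a Thursday.
$ts        = gmmktime(13, 45, 30, 5, 21, 2026);
$weekStart = bucketStart($ts, 'WEEK');    // 2026-05-18T00:00:00Z
$quarter   = bucketStart($ts, 'QUARTER'); // 2026-04-01T00:00:00Z
```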

## Cache

Ad-hoc results are served via a 60-second distributed cache:

- **Storage**: same `openregister_aggregations` distributed cache the named-aggregation path uses (`AggregationCache`).
- **Read-through**: on entry to `runAdhoc()`, the runner derives the key from `(registerSlug, schemaSlug, sha1(json_encode($query->toArray())), filter, rbacScopeHash)`, prefixed with `adhoc:`. A hit returns the stored envelope with `cached: true`; a miss falls through to the native-or-fallback dispatch and the resulting envelope is written back.
- **Key stability**: `AggregationQuery::toArray()` sorts the filter map with `ksort` (recursively, into operator sub-arrays), so `filter[a, b]` and `filter[b, a]` produce identical cache keys.
- **RBAC scoping**: the key includes `sha1(uid)` (or `sha1('anonymous')`) so two callers with different list-permission verdicts on the same `(metric, field, filter)` tuple never read each other's results.
- **Invalidation**: `AggregationCacheInvalidationListener` evicts on every `ObjectCreatedEvent`, `ObjectUpdatedEvent`, `ObjectDeletedEvent`, and `ObjectTransitionedEvent`. The eviction targets the affected `(register, schema)` but is coarse in practice: `ICache::clear()` flushes the whole `openregister_aggregations` namespace. If an eviction is missed, stale results live no longer than the 60-second TTL.
- **Stampede tolerance**: no distributed lock — the 60-second TTL bounds duplicate-miss compute. Revisit if a high-traffic dashboard surfaces stampede symptoms in production.
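
The key-stability property can be illustrated in isolation. The sketch below is not the `AggregationCache` implementation: `sortKeysRecursive` and `adhocCacheName` are hypothetical names, and for simplicity it sorts the whole query array rather than only the filter map. It follows the `'adhoc:'` + `sha1(json_encode(...))` scheme described above.

```php
<?php
declare(strict_types=1);

/**
 * Recursively sort an array by key so structurally equal maps serialise identically.
 */
function sortKeysRecursive(array $value): array
{
    foreach ($value as $k => $child) {
        if (is_array($child) === true) {
            $value[$k] = sortKeysRecursive($child);
        }
    }

    ksort($value);

    return $value;
}

/**
 * Hypothetical name-slot derivation for an ad-hoc query.
 */
function adhocCacheName(array $query): string
{
    $encoded = json_encode(sortKeysRecursive($query));

    // json_encode() returns false on failure; fall back to an empty payload.
    return 'adhoc:'.sha1($encoded === false ? '' : $encoded);
}

$a = adhocCacheName([
    'field'  => 'created',
    'filter' => ['status' => 'error', 'duration' => ['gte' => 10, 'lt' => 60]],
]);
$b = adhocCacheName([
    'filter' => ['duration' => ['lt' => 60, 'gte' => 10], 'status' => 'error'],
    'field'  => 'created',
]);
// $a and $b are equal: key order no longer affects the encoded JSON.
```

Without the sort, `json_encode()` preserves insertion order, so the two calls would hash differently and the same logical query would occupy two cache slots.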

## Performance notes

- **Database index**: for any field commonly used as a bucketing target (`created`, `updated`, custom date columns), declare a btree index on the magic-table column. The native bucketing expression operates against an indexed column on every supported engine, so the cost stays in the database where it belongs.
- **Row-level RBAC**: the multi-tenant predicate (`_organisation = ?`) and the schema's `PermissionHandler::canRead()` verdict both apply BEFORE bucketing. Aggregations cannot leak rows the caller could not read row-by-row.
- **PHP fallback ceiling**: the PHP-fallback path on unrecognised engines caps the hydrated row set at 10 000 and sets `truncated: true` when exceeded. Native paths (Postgres / MySQL / SQLite) operate over the full set in SQL.

## Non-goals (deferred)

| Topic | Issue |
|---|---|
| Multi-field groupBy (`groupBy: [status, priority]`) | [#1606](https://github.com/ConductionNL/openregister/issues/1606) |
| Running / cumulative series | [#1607](https://github.com/ConductionNL/openregister/issues/1607) |
| Multi-metric in one request (`count` + `sum`) | [#1608](https://github.com/ConductionNL/openregister/issues/1608) |
98 changes: 93 additions & 5 deletions lib/Controller/AggregationController.php
@@ -3,11 +3,27 @@
/**
* OpenRegister AggregationController
*
- * Sugar HTTP entry point for the x-openregister-aggregations annotation.
+ * HTTP entry point for the two aggregation surfaces OR exposes:
*
* - {@see aggregate()} — named-annotation surface backed by the
* `x-openregister-aggregations` block on a schema. Schema-author
* declared, immutable per release. Original surface.
* - {@see timeseries()} — ad-hoc surface where the client supplies
* the field, optional bucketing interval, and bounds at request
* time. Added by `add-time-bucket-aggregation`. Backs the
* nextcloud-vue `CnChartWidget.dataSource` bucket shorthand.
*
* Both paths share `AggregationRunner` for RBAC + multi-tenant
* gating + native / PHP-fallback dispatch. The ad-hoc path reads
* through `AggregationCache` under `adhoc:`-prefixed keys via
* `getAdhoc()` / `setAdhoc()` (the key-shape extension tracked in
* issue #1610).
*
* @category Controller
* @package OCA\OpenRegister\Controller
*
* SPDX-License-Identifier: EUPL-1.2
* SPDX-FileCopyrightText: 2026 Conduction B.V. <dev@conduction.nl>
*
* @author Conduction Development Team <dev@conduction.nl>
* @copyright 2026 Conduction B.V.
* @license EUPL-1.2 https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12
@@ -21,8 +37,10 @@

namespace OCA\OpenRegister\Controller;

use InvalidArgumentException;
use OCA\OpenRegister\Exception\NotAuthorizedException;
use OCA\OpenRegister\Service\Aggregation\AggregationRunner;
use OCA\OpenRegister\Service\Aggregation\TimeseriesRequestValidator;
use OCP\AppFramework\Controller;
use OCP\AppFramework\Http;
use OCP\AppFramework\Http\JSONResponse;
@@ -34,14 +52,16 @@ class AggregationController extends Controller
     /**
      * Constructor.
      *
-     * @param string            $appName The application name.
-     * @param IRequest          $request The current request.
-     * @param AggregationRunner $runner  The aggregation runner.
+     * @param string                     $appName   The application name.
+     * @param IRequest                   $request   The current request.
+     * @param AggregationRunner          $runner    The aggregation runner.
+     * @param TimeseriesRequestValidator $validator Ad-hoc request validator.
      */
     public function __construct(
         string $appName,
         IRequest $request,
-        private readonly AggregationRunner $runner
+        private readonly AggregationRunner $runner,
+        private readonly TimeseriesRequestValidator $validator
     ) {
         parent::__construct(appName: $appName, request: $request);
     }//end __construct()
@@ -82,4 +102,72 @@ public function aggregate(string $register, string $schema, string $name): JSONR
);
return $response;
}//end aggregate()

    /**
     * Ad-hoc time-bucket aggregation entry point.
     *
     * Accepts query params:
     * - field (required)
     * - interval (optional: MINUTE|HOUR|DAY|WEEK|MONTH|QUARTER|YEAR)
     * - from, to (required when interval set; ISO-8601)
     * - metric (optional, default `count`)
     * - metricField (required when metric != count)
     * - filter[...] (optional, reuses the existing filter vocabulary)
     *
     * Returns `{ groups: [{ key, value }], backend, cached }` matching the
     * GraphQL `groups` field shape so `CnChartWidget` can normalise once.
     *
     * @param string $register Register reference.
     * @param string $schema   Schema reference.
     *
     * @return JSONResponse JSON response with bucketed groups.
     *
     * @NoAdminRequired
     * @NoCSRFRequired
     */
    public function timeseries(string $register, string $schema): JSONResponse
    {
        // Resolve schema first so the validator can consult the
        // declared property list. A missing schema is a 404; a bad
        // query-param shape is a 400.
        try {
            $schemaEntity = $this->runner->findSchema(schemaRef: $schema);
        } catch (RuntimeException $e) {
            return new JSONResponse(['error' => $e->getMessage()], Http::STATUS_NOT_FOUND);
        }

        // Pull the request shape from the active IRequest. The filter
        // map comes through as a nested array because PHP parses
        // `filter[x][op]=y` into `$_GET['filter']['x']['op']='y'`.
        $input = [
            'field'       => $this->request->getParam('field', ''),
            'interval'    => $this->request->getParam('interval'),
            'from'        => $this->request->getParam('from'),
            'to'          => $this->request->getParam('to'),
            'metric'      => $this->request->getParam('metric', 'count'),
            'metricField' => $this->request->getParam('metricField'),
            'filter'      => (array) ($this->request->getParam('filter', [])),
        ];

        try {
            $query = $this->validator->validate(input: $input, schema: $schemaEntity);
        } catch (InvalidArgumentException $e) {
            return new JSONResponse(['error' => $e->getMessage()], Http::STATUS_BAD_REQUEST);
        }

        try {
            $result = $this->runner->runAdhocByRef(
                registerRef: $register,
                schemaRef: $schema,
                query: $query
            );
        } catch (NotAuthorizedException $e) {
            return new JSONResponse(['error' => $e->getMessage()], Http::STATUS_FORBIDDEN);
        } catch (RuntimeException $e) {
            return new JSONResponse(['error' => $e->getMessage()], Http::STATUS_NOT_FOUND);
        }

        return new JSONResponse($result);

    }//end timeseries()
}//end class
68 changes: 68 additions & 0 deletions lib/Service/Aggregation/AggregationCache.php
@@ -152,6 +152,74 @@ public function set(string $registerSlug, string $schemaSlug, string $name, arra
}
}//end set()

    /**
     * Look up a cached ad-hoc aggregation result.
     *
     * Mirrors {@see get()} but derives the name slot from the query value
     * object. The literal `adhoc:` prefix keeps ad-hoc entries visually
     * distinct from named-aggregation entries in cache dumps.
     *
     * @param string           $registerSlug Register slug component of the cache key.
     * @param string           $schemaSlug   Schema slug component of the cache key.
     * @param AggregationQuery $query        Query value object hashed into the cache key.
     *
     * @return array<string, mixed>|null Cached envelope or null on miss.
     */
    public function getAdhoc(string $registerSlug, string $schemaSlug, AggregationQuery $query): ?array
    {
        return $this->get(
            registerSlug: $registerSlug,
            schemaSlug: $schemaSlug,
            name: $this->adhocName(query: $query),
            filter: $query->filter
        );

    }//end getAdhoc()

    /**
     * Store an ad-hoc aggregation result.
     *
     * Mirrors {@see set()} for the ad-hoc path. The stored envelope is
     * rewritten on read (`cached: true`) by callers; see
     * {@see \OCA\OpenRegister\Service\Aggregation\AggregationRunner::runAdhoc()}.
     *
     * @param string               $registerSlug Register slug component of the cache key.
     * @param string               $schemaSlug   Schema slug component of the cache key.
     * @param AggregationQuery     $query        Query value object hashed into the cache key.
     * @param array<string, mixed> $result       Result envelope to store.
     *
     * @return void
     */
    public function setAdhoc(string $registerSlug, string $schemaSlug, AggregationQuery $query, array $result): void
    {
        $this->set(
            registerSlug: $registerSlug,
            schemaSlug: $schemaSlug,
            name: $this->adhocName(query: $query),
            filter: $query->filter,
            result: $result
        );

    }//end setAdhoc()

    /**
     * Derive the cache name slot for an ad-hoc query.
     *
     * Computes `'adhoc:'.sha1(json_encode($query->toArray()))`. The
     * `AggregationQuery::toArray()` output is ksort-stable so two
     * structurally-equivalent queries produce identical hashes.
     *
     * @param AggregationQuery $query The ad-hoc query value object.
     *
     * @return string The cache name slot, prefixed with `adhoc:`.
     */
    private function adhocName(AggregationQuery $query): string
    {
        $encoded = json_encode($query->toArray());
        return 'adhoc:'.sha1($encoded === false ? '' : $encoded);

    }//end adhocName()

/**
* Evict every cached aggregation for a (register, schema). Called by
* the object-write listeners.