2 changes: 2 additions & 0 deletions .env.example
Expand Up @@ -133,3 +133,5 @@ HISTORY_ACCESS_PUBLISHED_DATA_GROUPS=""
HISTORY_ACCESS_POLICIES_GROUPS=""
HISTORY_ACCESS_DATABLOCK_GROUPS=""
HISTORY_ACCESS_ATTACHMENT_GROUPS=""

DATAFILES_METADATA_SCHEMA="datafilesMetadataSchema.example.json"
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -10,6 +10,7 @@ jobConfig.yaml
metricsConfig.json
publishedDataConfig.json
openSearchConfig.json
datafilesMetadataSchema.json

# Configs
.env
6 changes: 6 additions & 0 deletions datafilesMetadataSchema.example.json
@@ -0,0 +1,6 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {},
  "additionalProperties": false
}
200 changes: 200 additions & 0 deletions docs/developer-guide/datafiles_metadata.md
@@ -0,0 +1,200 @@
# Datafile Metadata

Datafile objects can carry optional file-specific metadata in the `metadata`
field. This field is available on each entry of an origdatablock
`dataFileList`.

```json
{
  "path": "raw/run-0001.nxs",
  "size": 1048576,
  "time": "2026-06-02T08:00:00Z",
  "chk": "2cf24dba5fb0a30e26e83b2ac5b9e29e",
  "metadata": {
    "duration": 12.5,
    "measurement_type": "scan"
  }
}
```

The backend stores `metadata` as a JSON object:

```ts
metadata?: Record<string, unknown>;
```

>**Important**:
>This field is intended for facility-specific, file-level metadata. For aggregate metadata that should be searchable and shown more prominently to users, prefer the dataset `scientificMetadata` field. Datafile metadata is not searchable at the dataset level.

## Configuration

Datafile metadata validation is configured with the
`DATAFILES_METADATA_SCHEMA` environment variable.

```sh
DATAFILES_METADATA_SCHEMA="datafilesMetadataSchema.example.json"
```

The environment variable points to a JSON Schema file. During application
configuration, `src/config/configuration.ts` reads and parses that file, then
exposes the parsed schema object through the Nest configuration key
`datafilesMetadataSchema`.

If `DATAFILES_METADATA_SCHEMA` is not set, `configuration.ts` uses
`datafilesMetadataSchema.json` as the default schema path. If that file is
missing, it falls back to `datafilesMetadataSchema.example.json`. The example
schema shipped with the backend is closed and rejects non-empty metadata until a
facility configures allowed fields. If both the default schema file and the
example schema file are missing, the configuration loader keeps its built-in
schema object (`{ type: "object", additionalProperties: false }`), which also
rejects all non-empty metadata.
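
The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in `configuration.ts`: `resolveDatafilesMetadataSchema` and the `readJson` callback are hypothetical names, and falling back to the example file when an explicitly configured path is missing is an assumption.

```ts
type JsonSchema = Record<string, unknown>;

// Built-in default mentioned above: closed, so any non-empty metadata fails.
const BUILT_IN_SCHEMA: JsonSchema = {
  type: "object",
  additionalProperties: false,
};

// Hypothetical helper. `readJson` returns the parsed file content,
// or undefined when the file does not exist.
function resolveDatafilesMetadataSchema(
  env: Record<string, string | undefined>,
  readJson: (path: string) => JsonSchema | undefined,
): JsonSchema {
  // An unset (or empty) variable selects the default schema path.
  const configuredPath =
    env["DATAFILES_METADATA_SCHEMA"] || "datafilesMetadataSchema.json";
  return (
    readJson(configuredPath) ??
    readJson("datafilesMetadataSchema.example.json") ??
    BUILT_IN_SCHEMA
  );
}
```

Passing the file reader as a callback keeps the sketch free of file-system access, so each fallback step can be exercised in isolation.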

The example schema is deliberately closed:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {},
  "additionalProperties": false
}
```

With this schema, omitted or empty datafile metadata is valid, but any
non-empty metadata object is rejected until the facility configures the allowed
metadata fields.

## Schema Draft

The validation pipe uses the default Ajv import, which validates draft-07
schemas. Schema files should declare the draft-07 meta-schema:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#"
}
```

## Example Facility Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "duration": {
      "type": "number",
      "minimum": 0
    },
    "measurement_type": {
      "type": "string",
      "enum": ["scan", "calibration", "dark"]
    },
    "detector": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "distance_mm": { "type": "number" }
      },
      "required": ["name"],
      "additionalProperties": false
    }
  },
  "required": ["measurement_type"],
  "additionalProperties": false
}
```

To allow arbitrary top-level metadata keys, set `additionalProperties: true`
explicitly or omit the keyword:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object"
}
```

For nested objects, define `additionalProperties` in the nested schema too if
unknown nested keys should be rejected.
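
To illustrate why nested objects need their own `additionalProperties: false`, here is a small hand-rolled walker that lists the keys a closed schema would not allow. It is not Ajv and inspects only `properties` and a boolean `additionalProperties`. The `findUnknownKeys` function and the `ObjectSchema` type are made up for this sketch.

```ts
// Simplified view of an object schema: only the keywords this sketch inspects.
interface ObjectSchema {
  properties?: Record<string, ObjectSchema>;
  additionalProperties?: boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the paths of keys that a closed (sub)schema does not declare.
function findUnknownKeys(
  schema: ObjectSchema,
  value: Record<string, unknown>,
  prefix = "",
): string[] {
  const declared = schema.properties ?? {};
  const unknown: string[] = [];
  for (const [key, child] of Object.entries(value)) {
    const path = prefix ? `${prefix}/${key}` : key;
    const childSchema = declared[key];
    if (childSchema === undefined) {
      // Undeclared keys are only a problem when this level is closed.
      if (schema.additionalProperties === false) unknown.push(path);
    } else if (isRecord(child)) {
      unknown.push(...findUnknownKeys(childSchema, child, path));
    }
  }
  return unknown;
}

// Top level closed, nested "detector" object left open.
const nestedOpen: ObjectSchema = {
  properties: { detector: { properties: { name: {} } } },
  additionalProperties: false,
};
// Top level and nested "detector" object both closed.
const nestedClosed: ObjectSchema = {
  properties: {
    detector: { properties: { name: {} }, additionalProperties: false },
  },
  additionalProperties: false,
};
const sample: Record<string, unknown> = {
  detector: { name: "det-1", gain: 2 },
  comment: "x",
};

console.log(findUnknownKeys(nestedOpen, sample)); // [ 'comment' ]
console.log(findUnknownKeys(nestedClosed, sample)); // [ 'detector/gain', 'comment' ]
```

With the nested schema left open, the undeclared `gain` key passes unnoticed. It is reported only once the `detector` schema is closed as well.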

## Validation Mechanism

Validation is implemented by `DatafilesMetadataValidationPipe` in
`src/origdatablocks/pipes/datafiles-metadata-validation.pipe.ts`.

For each request body handled by this pipe:

1. The pipe validates against the configured schema. If the configuration is
   missing, the default schema is:
   ```json
   {
     "type": "object",
     "additionalProperties": false
   }
   ```
2. If the request body has no `dataFileList`, the pipe returns without metadata
   validation. This allows partial update bodies that do not touch files.
3. Each entry in `dataFileList` is checked.
4. Each datafile `metadata` value is validated. If a datafile omits `metadata`,
   the pipe validates an empty object (`{}`).
5. Invalid metadata causes HTTP 400 with the validation error details.
6. A schema that cannot be compiled by Ajv causes HTTP 500 because it is a
   server configuration problem.

Invalid metadata produces an error like:

```text
Datafile metadata is not following the configured schema: metadata/duration must be number
```
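
The request-handling steps above can be sketched without Nest or Ajv by injecting the validator as a function. This is a simplified model, not the pipe's source: `HttpError`, `MetadataValidator`, `DataFileInput`, and `validateDatafilesMetadata` are hypothetical names, and the schema-compilation failure (HTTP 500) is left out.

```ts
// Stand-in for an HTTP exception; the real pipe presumably uses Nest's exceptions.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Hypothetical validator signature: returns error messages, empty when valid.
type MetadataValidator = (metadata: Record<string, unknown>) => string[];

interface DataFileInput {
  path: string;
  metadata?: Record<string, unknown>;
}

function validateDatafilesMetadata<
  T extends { dataFileList?: DataFileInput[] },
>(body: T, validate: MetadataValidator): T {
  // Bodies without dataFileList (for example partial updates) pass through.
  if (!body.dataFileList) return body;
  for (const file of body.dataFileList) {
    // A datafile that omits metadata is validated as an empty object.
    const errors = validate(file.metadata ?? {});
    if (errors.length > 0) {
      throw new HttpError(
        400,
        `Datafile metadata is not following the configured schema: ${errors.join(", ")}`,
      );
    }
  }
  return body;
}
```

In the actual pipe the validator would come from compiling the configured schema with Ajv. Keeping it as a parameter here makes the control flow testable on its own.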

## Validated Routes

The pipe validates origdatablock request bodies on routes where
`@UsePipes(DatafilesMetadataValidationPipe)` is applied.

Current covered routes:

- `POST /origdatablocks` in the v3 controller
- `PATCH /origdatablocks/:id` in the v3 controller
- `POST /origdatablocks` in the v4 controller
- `POST /origdatablocks/isValid` in the v4 controller
- `PATCH /origdatablocks/:id` in the v4 controller

Any future route that accepts origdatablock `dataFileList` input must also use
this pipe if datafile metadata should be validated there.

## Request Examples

Accepted when the configured schema allows `duration` and
`measurement_type`:

```json
{
  "datasetId": "20.500.12345/example-dataset",
  "size": 1048576,
  "dataFileList": [
    {
      "path": "raw/run-0001.nxs",
      "size": 1048576,
      "time": "2026-06-02T08:00:00Z",
      "metadata": {
        "duration": 12.5,
        "measurement_type": "scan"
      }
    }
  ]
}
```

A `dataFileList` entry rejected by the closed example schema, because
`operator_comment` is not an allowed key:

```json
{
  "path": "raw/run-0001.nxs",
  "size": 1048576,
  "time": "2026-06-02T08:00:00Z",
  "metadata": {
    "operator_comment": "extra key"
  }
}
```
18 changes: 17 additions & 1 deletion src/common/dto/datafile.dto.ts
@@ -1,5 +1,11 @@
import { ApiProperty } from "@nestjs/swagger";
import { IsDateString, IsNumber, IsOptional, IsString } from "class-validator";
import {
IsDateString,
IsNumber,
IsOptional,
IsString,
IsObject,
} from "class-validator";

export class DataFileDto {
@ApiProperty({
Expand Down Expand Up @@ -72,4 +78,14 @@ export class DataFileDto {
@IsString()
@IsOptional()
readonly type: string;

@ApiProperty({
type: Object,
required: false,
description:
"File-specific metadata. The Dataset field scientificMetadata should be preferred for aggregate metadata, as it is searchable and displayed more prominently to users.",
})
@IsObject()
@IsOptional()
readonly metadata: Record<string, unknown>;
}
1 change: 1 addition & 0 deletions src/common/interfaces/common.interface.ts
Expand Up @@ -67,6 +67,7 @@ export interface IDatafileFilter {
gid?: string;
perm?: string;
type?: string;
metadata?: Record<string, unknown>;
}

export type IFiltersV4<T, Y = null, Z = string> = Pick<
12 changes: 12 additions & 0 deletions src/common/schemas/datafile.schema.ts
Expand Up @@ -94,6 +94,18 @@ export class DataFile {
required: false,
})
type?: string;

@ApiProperty({
type: Object,
required: false,
description:
"File-specific metadata. The Dataset field scientificMetadata should be preferred for aggregate metadata, as it is searchable and displayed more prominently to users.",
})
@Prop({
type: Object,
required: false,
})
metadata?: Record<string, unknown>;
}

export const DataFileSchema = SchemaFactory.createForClass(DataFile);
5 changes: 5 additions & 0 deletions src/config/configuration.ts
Expand Up @@ -75,6 +75,7 @@ const configuration = () => {
datasetTypes: {},
proposalTypes: {},
opensearchConfig: {},
datafilesMetadataSchema: { type: "object", additionalProperties: false },
};
const jsonConfigFileList: { [key: string]: string } = {
frontendConfig:
Expand All @@ -89,6 +90,8 @@ const configuration = () => {
process.env.PUBLISHED_DATA_CONFIG_FILE || "publishedDataConfig.json",
opensearchConfig:
process.env.OPENSEARCH_CONFIG_FILE || "opensearchConfig.json",
datafilesMetadataSchema:
process.env.DATAFILES_METADATA_SCHEMA || "datafilesMetadataSchema.json",
};
Object.keys(jsonConfigFileList).forEach((key) => {
const filePath = jsonConfigFileList[key];
Expand All @@ -106,6 +109,7 @@ const configuration = () => {
const configsWithExampleFallback = [
"publishedDataConfig",
"opensearchConfig",
"datafilesMetadataSchema",
];
if (configsWithExampleFallback.includes(key)) {
console.warn(
Expand Down Expand Up @@ -458,6 +462,7 @@ const configuration = () => {
publishedDataConfig: jsonConfigMap.publishedDataConfig,
ajvCustomDefinitions: ajvCustomDefinitions,
opensearchConfig: jsonConfigMap.opensearchConfig,
datafilesMetadataSchema: jsonConfigMap.datafilesMetadataSchema,
};
return merge(config, localconfiguration);
};
3 changes: 3 additions & 0 deletions src/datablocks/datablocks.service.spec.ts
Expand Up @@ -29,6 +29,9 @@ const mockDatablock: Datablock = {
uid: "testUid",
gid: "testGid",
perm: "testPerm",
metadata: {
key: "value",
},
},
],
};
4 changes: 4 additions & 0 deletions src/origdatablocks/origdatablocks.controller.ts
Expand Up @@ -13,6 +13,7 @@ import {
Req,
ForbiddenException,
NotFoundException,
UsePipes,
} from "@nestjs/common";
import { Request } from "express";
import { OrigDatablocksService } from "./origdatablocks.service";
Expand Down Expand Up @@ -48,6 +49,7 @@ import { CreateRawDatasetObsoleteDto } from "src/datasets/dto/create-raw-dataset
import { CreateDerivedDatasetObsoleteDto } from "src/datasets/dto/create-derived-dataset-obsolete.dto";
import { logger } from "@user-office-software/duo-logger";
import { FullFacetFilters, FullFacetResponse } from "src/common/types";
import { DatafilesMetadataValidationPipe } from "src/origdatablocks/pipes/datafiles-metadata-validation.pipe";

@ApiBearerAuth()
@ApiTags("origdatablocks")
Expand Down Expand Up @@ -163,6 +165,7 @@ export class OrigDatablocksController {

// POST /origdatablocks
@UseGuards(PoliciesGuard)
@UsePipes(DatafilesMetadataValidationPipe)
@CheckPolicies("origdatablocks", (ability: AppAbility) =>
ability.can(Action.OrigdatablockCreate, OrigDatablock),
)
Expand Down Expand Up @@ -619,6 +622,7 @@ export class OrigDatablocksController {
@CheckPolicies("origdatablocks", (ability: AppAbility) =>
ability.can(Action.OrigdatablockUpdate, OrigDatablock),
)
@UsePipes(DatafilesMetadataValidationPipe)
@Patch("/:id")
@ApiOperation({
summary: "It updates the origdatablock.",
3 changes: 2 additions & 1 deletion src/origdatablocks/origdatablocks.module.ts
Expand Up @@ -10,6 +10,7 @@ import { OrigDatablocksPublicV4Controller } from "./origdatablocks-public.v4.con
import { OrigDatablocksV4Controller } from "./origdatablocks.v4.controller";
import { CaslModule } from "src/casl/casl.module";
import { DatasetsModule } from "src/datasets/datasets.module";
import { DatafilesMetadataValidationPipe } from "./pipes/datafiles-metadata-validation.pipe";

@Module({
imports: [
Expand All @@ -28,6 +29,6 @@ import { DatasetsModule } from "src/datasets/datasets.module";
OrigDatablocksV4Controller,
],
exports: [OrigDatablocksService],
providers: [OrigDatablocksService],
providers: [OrigDatablocksService, DatafilesMetadataValidationPipe],
})
export class OrigDatablocksModule {}