Merge pull request #189 from ResearchObject/issue-183-nonslash-root
Add concept Detached RO-Crate
stain committed Mar 23, 2023
2 parents 1531617 + 8225e00 commit cf5d1cd
Showing 4 changed files with 191 additions and 36 deletions.
102 changes: 97 additions & 5 deletions docs/1.2-DRAFT/appendix/relative-uris.md
@@ -32,12 +32,105 @@ grand_parent: RO-Crate 1.2-DRAFT
1. TOC
{:toc}

The _RO-Crate Metadata File_ use _relative URI references_ to identify files and directories
In an _Attached RO-Crate_, the _RO-Crate Metadata File_ uses _relative URI references_
to identify files and directories
contained within the _RO-Crate Root_ and its children. As described in section
[Describing entities in JSON-LD](#describing-entities-in-json-ld) above,
[Describing entities in JSON-LD](jsonld.md#describing-entities-in-json-ld),
relative URI references are also frequently used for
identifying _Contextual entities_.

## Converting from Attached to Detached RO-Crate

An [Attached RO-Crate](../structure.md#attached-ro-crate) can be published on the Web by placing its _RO-Crate Root_ directory on a static file-based Web server (e.g. Nginx, Apache HTTPd, GitHub Pages). The use of relative URI references in the _RO-Crate Metadata File_ ensures identifiers of [data entities](../data-entities.md) work as they should.

Sometimes it is desired to make a [Detached RO-Crate](../structure.md#detached-ro-crate), e.g. for depositing or integrating the RO-Crate Metadata File into a knowledge graph or repository that is unable to preserve data files using their existing pathnames. In this case one needs to:

1. Decide on new Web locations for individual data files and update each `@id` to the new absolute URI
2. Observe the preservation considerations for [Web-based Data Entities](../data-entities.md#web-based-data-entities)
3. Ensure all nested directories that are not browsable on the Web are represented as `Dataset` entities, with their content listed using `hasPart` or `distribution` (see section [Directories on the web](../data-entities.md#directories-on-the-web-dataset-distributions)). Change their relative `@id` to an absolute URI, e.g. using [ARCP](#establishing-a-base-uri-inside-a-zip-file).
4. Rewrite the JSON-LD with absolute URIs for data entities

If the RO-Crate is already published on the Web, with directory browsing enabled for nested directories, then these steps can be achieved using JSON-LD tooling.

For example, as the RO-Crate Metadata file <https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json> along with the RO-Crate Root is published on the Web (using GitHub Pages), we can generate a random UUID (e.g. `d6be5c9b-132a-4a93-9837-3e02e06c08e6`) and use [JSON-LD flattening] to move
from this context:

```json
{ "@context": [
{"@base": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
"https://w3id.org/ro/crate/1.1/context"
]
}

```

to this context:

```json
{ "@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
]
}
```

None of the existing resources will have an `@id` starting with this fresh base URI, so all URIs will be made absolute. The resulting `{@base: ..}` is harmless, but can be removed from the output JSON-LD.

Example output (abbreviated):

```json
{
"@context": [
{"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
"https://w3id.org/ro/crate/1.1/context"
],
"@graph": [
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
"about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
"creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
},
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
"@type": "Dataset",
"hasPart": [
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
{ "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"}
],
"name": "Workflow RO-Crate profile"
}
]
}
```

Notice how identifiers like `ro-crate-metadata.json`, `./`, `index.html` and `example/` have been translated to absolute URIs.
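
As a rough illustration of the same translation without a JSON-LD library, the sketch below resolves every `@id` in the `@graph` against the URL of the published metadata file using Python's standard `urllib.parse.urljoin`. The function names `absolutize_ids` and `detach` are hypothetical, and the sketch assumes an `http`/`https` base URL (Python's `urljoin` does not resolve relative references against unregistered schemes such as `arcp`). It is not a substitute for real JSON-LD processing and leaves the `@context` untouched.

```python
from urllib.parse import urljoin


def absolutize_ids(node, base):
    """Return a copy of a JSON structure with every "@id" string resolved against base."""
    if isinstance(node, dict):
        return {
            key: urljoin(base, value)
            if key == "@id" and isinstance(value, str)
            else absolutize_ids(value, base)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [absolutize_ids(item, base) for item in node]
    return node


def detach(crate, metadata_url):
    """Build a new crate dict whose @graph uses absolute identifiers only.

    metadata_url is assumed to be the http(s) URL where the
    RO-Crate Metadata File of the Attached RO-Crate is published.
    """
    detached = dict(crate)
    detached["@graph"] = absolutize_ids(crate["@graph"], metadata_url)
    return detached


if __name__ == "__main__":
    example = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": "ro-crate-metadata.json", "about": {"@id": "./"}},
            {"@id": "./", "hasPart": [{"@id": "index.html"}]},
        ],
    }
    base = "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"
    for entity in detach(example, base)["@graph"]:
        print(entity["@id"])
```

Identifiers that are already absolute (for instance ORCID URLs) pass through unchanged, because `urljoin` returns a reference as-is when it carries its own scheme.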

The above JSON-LD processing will also expand any `#`-based local identifiers of contextual entities:

```json
{
"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
"@type": "Recommendation",
"category": "MUST",
"name": "Include Main Workflow",
"itemReviewed": {
"@id": "https://bioschemas.org/ComputationalWorkflow"
}
}
```

In this approach, the Detached RO-Crate can be resolved to the corresponding Attached RO-Crate by following the `@id` of the _Root Data Entity_ or the _RO-Crate Metadata File Descriptor_.

If the new Detached RO-Crate is not meant as a snapshot of the corresponding Attached RO-Crate, then such contextual entities should be assigned new `@id` values, e.g. by generating random UUIDs like `urn:uuid:e47e41d9-f924-4c07-bc90-97e7ed34fe35`. Such transformations are typically not catered for by traditional JSON-LD tooling and require additional implementation.
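
One possible shape for that additional implementation is sketched below. It assumes the `#`-based identifiers appear either as bare fragments (`#name`) or already expanded against a known metadata file URL, and it replaces each with a fresh `urn:uuid:` identifier while keeping references consistent. The function name `mint_uuid_ids` and its parameters are hypothetical.

```python
import uuid


def mint_uuid_ids(graph, metadata_url):
    """Replace '#'-based local identifiers with fresh urn:uuid: identifiers.

    Returns (new_graph, mapping), where mapping records old -> new identifiers.
    Both bare fragments ("#x") and fragments expanded against metadata_url
    ("<metadata_url>#x") are treated as local identifiers.
    """
    expanded_prefix = metadata_url + "#"
    mapping = {}

    def remap(identifier):
        is_local = identifier.startswith("#") or identifier.startswith(expanded_prefix)
        if not is_local:
            return identifier
        if identifier not in mapping:
            # The same old identifier always maps to the same new one,
            # so references between entities stay intact.
            mapping[identifier] = "urn:uuid:" + str(uuid.uuid4())
        return mapping[identifier]

    def rewrite(node):
        if isinstance(node, dict):
            return {
                key: remap(value)
                if key == "@id" and isinstance(value, str)
                else rewrite(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        return node

    return rewrite(graph), mapping
```

Returning the mapping alongside the rewritten graph lets a caller record which new identifier replaced which old one, which may be useful if provenance back to the Attached RO-Crate needs to be kept.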


## Converting from Detached to Attached RO-Crate

_TODO_


## Handling relative URI references when using JSON-LD/RDF tools

When using JSON-LD tooling and RDF libraries to consume or generate RO-Crates,
extra care should be taken to ensure these URI references are handled correctly.

@@ -46,15 +139,14 @@ consistent handling:

## Flattening JSON-LD from nested JSON

If performing [JSON-LD flattening] to generate a valid _RO-Crate Metadata File_ for a _Regular RO-Crate_, add `@base: null` to the input JSON-LD `@context` array to avoid expanding relative URI references. The flattening `@context` SHOULD NOT need `@base: null`.
If performing [JSON-LD flattening] to generate a valid _RO-Crate Metadata File_ for an _Attached RO-Crate_, add `@base: null` to the input JSON-LD `@context` array to avoid expanding relative URI references. The flattening `@context` SHOULD NOT need `@base: null`.

For example, this JSON-LD is in [compacted form][compacted], which may be beneficial for processing, but is not yet a valid _RO-Crate Metadata File_ as it has not been flattened into a `@graph` array.

```json
{
"@context": [
{"@base": null},
"https://w3id.org/ro/crate/1.2-DRAFT/context"
"https://w3id.org/ro/crate/1.2-DRAFT/context"
],
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
15 changes: 12 additions & 3 deletions docs/1.2-DRAFT/root-data-entity.md
@@ -46,7 +46,7 @@ The _RO-Crate JSON-LD_ MUST contain a self-describing
**RO-Crate Metadata File Descriptor** with
the `@id` value `ro-crate-metadata.json` (or `ro-crate-metadata.jsonld` in legacy
crates) and `@type` [CreativeWork]. This descriptor MUST have an [about]
property referencing the _Root Data Entity_, which SHOULD have an `@id` of `./`.
property referencing the _Root Data Entity_'s `@id`.

```json

@@ -175,11 +175,12 @@ be minimally valid.
The _Root Data Entity_ MUST have the following properties:

* `@type`: MUST be [Dataset]
* `@id`: MUST end with `/` and SHOULD be the string `./`
* `@id`: SHOULD be the string `./` or an absolute URI (see below)
* `name`: SHOULD identify the dataset to humans well enough to disambiguate it from other RO-Crates
* `description`: SHOULD further elaborate on the name to provide a summary of the context in which the dataset is important.
* `datePublished`: MUST be a string in [ISO 8601 date format][DateTime] and SHOULD be specified to at least the precision of a day, MAY be a timestamp down to the millisecond.
* `license`: SHOULD link to a _Contextual Entity_ or _Data Entity_ in the _RO-Crate Metadata File_ with a name and description (see section on [licensing](contextual-entities.md#licensing-access-control-and-copyright)). MAY, if necessary, be a textual description of how the RO-Crate may be used.
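
The property list above could be checked mechanically along the following lines. This is a minimal sketch, not a validator: the function name `check_root_entity` is hypothetical, it reads the list as requiring each property to be present, and it only tests `datePublished` for a leading `YYYY-MM-DD` pattern rather than parsing ISO 8601 in full.

```python
import re

# Matches values that start with a calendar date, i.e. at least day precision.
DAY_PRECISION = re.compile(r"^\d{4}-\d{2}-\d{2}")

REQUIRED = ("name", "description", "datePublished", "license")


def check_root_entity(root):
    """Return (errors, warnings) for a Root Data Entity given as a dict."""
    errors, warnings = [], []

    types = root.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Dataset" not in types:
        errors.append("@type MUST be Dataset")

    for prop in REQUIRED:
        if prop not in root:
            errors.append(f"missing property: {prop}")

    published = root.get("datePublished")
    if published is not None:
        if not isinstance(published, str):
            errors.append("datePublished MUST be a string")
        elif not DAY_PRECISION.match(published):
            warnings.append("datePublished SHOULD have at least day precision")

    return errors, warnings
```

Separating errors (MUST) from warnings (SHOULD) mirrors the requirement levels in the list, so a consumer can decide how strictly to treat each finding.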

{: .note }
> These requirements are stricter than those published
> for [Google Dataset Search](https://developers.google.com/search/docs/data-types/dataset) which
@@ -188,7 +189,15 @@ The _Root Data Entity_ MUST have the following properties:
{: .warning }
> The properties above are not sufficient to generate a [DataCite][DataCite Schema] citation. Advice on integrating with [DataCite] will be provided in a future version of this specification, or as an implementation guide.
Additional properties of _schema.org_ types [Dataset] and [CreativeWork] MAY be added to further describe the RO-Crate as a whole, e.g. [author], [abstract], [publisher]. See sections [contextual entities](contextual-entities.md) and [provenance](provenance.md) for further details.
Additional properties of _schema.org_ types [Dataset] and [CreativeWork] MAY be added to further describe the RO-Crate as a whole, e.g. [author], [publisher]. See sections [contextual entities](contextual-entities.md) and [provenance](provenance.md) for further details.


### Root Data Entity identifier

The root data entity's `@id` SHOULD be either `./` (indicating the directory of `ro-crate-metadata.json` is the [RO-Crate Root](structure.md)), or an absolute URI (indicating a [detached RO-Crate](structure.md#detached-ro-crate)).

If the `@id` of the Root Data Entity is an absolute URI, the Crate SHOULD NOT contain [data entities](data-entities.md) using relative URI references, but MAY contain [Web-based Data Entities](data-entities.html#web-based-data-entities) using absolute URIs.
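
A consumer might check these two recommendations roughly as sketched below. The helper names are hypothetical, and the sketch makes two simplifying assumptions: an identifier counts as absolute if `urllib.parse.urlparse` finds a scheme in it, and data entities are recognised by a `File` or `Dataset` type.

```python
from urllib.parse import urlparse


def is_absolute_uri(identifier):
    """Treat any identifier with a URI scheme (https:, arcp:, urn:, ...) as absolute."""
    return bool(urlparse(identifier).scheme)


def check_root_identifier(graph, root_id):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if root_id != "./" and not is_absolute_uri(root_id):
        problems.append(f"root @id {root_id!r} is neither './' nor an absolute URI")

    if is_absolute_uri(root_id):
        for entity in graph:
            entity_id = entity.get("@id", "")
            types = entity.get("@type", [])
            if isinstance(types, str):
                types = [types]
            is_data_entity = bool({"File", "Dataset"} & set(types))
            if entity_id != root_id and is_data_entity and not is_absolute_uri(entity_id):
                problems.append(f"relative data entity {entity_id!r} in a detached crate")
    return problems
```

Contextual entities with `#`-based identifiers are deliberately not reported, since only data entities are covered by the recommendation above.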


## Minimal example of RO-Crate

78 changes: 64 additions & 14 deletions docs/1.2-DRAFT/structure.md
@@ -5,8 +5,8 @@ parent: RO-Crate 1.2-DRAFT
---
<!--
Copyright 2019-2020 University of Technology Sydney
Copyright 2019-2020 The University of Manchester UK
Copyright 2019-2020 RO-Crate contributors <https://github.com/ResearchObject/ro-crate/graphs/contributors>
Copyright 2019-2022 The University of Manchester UK
Copyright 2019-2022 RO-Crate contributors <https://github.com/ResearchObject/ro-crate/graphs/contributors>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -32,7 +32,32 @@ parent: RO-Crate 1.2-DRAFT
1. TOC
{:toc}

The structure an _RO-Crate_ MUST follow is:
## Types of RO-Crate

There are two classes of RO-Crate detailed below:

**Attached RO-Crate**
: A crate that has a well-defined _RO-Crate Root directory_ and can carry an **explicit payload** of local [data entities](data-entities.md) as regular files (combined with [Web-based Data Entities](data-entities.html#web-based-data-entities) where needed). This type of RO-Crate can be suitable for long-term preservation, transfer and publishing, as the _RO-Crate Metadata File_ is stored alongside the crate's payload. See further definition of [attached RO-Crate](#attached-ro-crate) below.

**Detached RO-Crate**
: A crate without a defined payload directory. In this kind of crate, all data references are absolute. This approach may be suitable for use with dynamic web service APIs and repositories that can't preserve file paths. As the data of these crates can only be [Web-based Data Entities](data-entities.html#web-based-data-entities), the **payload is implicit** and must be preserved/transferred/archived independent of the _RO-Crate Metadata File_. See further definition of [detached RO-Crate](#detached-ro-crate) below.

In both types of crates the metadata is completed with [contextual entities](contextual-entities.md) that further describe the relationships and context of the data to form a _Research Object_.


## Attached RO-Crate

An **Attached RO-Crate** is used to contain and describe a _payload_ of files and directories, along with their contextual information.

An _Attached RO-Crate_ can be stored and published in multiple ways depending on its use:
* On a typical hierarchical _file system_ (e.g. `/files/shared/crates/my-crate-01/`)
* Exposed as a _Web resource_ within a folder structure (e.g. <https://www.researchobject.org/2021-packaging-research-artefacts-with-ro-crate/>)
* [_Packaged_](appendix/implementation-notes.md#combining-with-other-packaging-schemes) within a ZIP file, BagIt archive or OCFL structure
* _Archived_ as a set of named files in other ways (e.g. Zenodo deposit)

A valuable feature of the _Attached RO-Crate_ approach is that the metadata is preserved when a crate is transferred between these types of storage/publication systems.

The file path structure an _Attached RO-Crate_ MUST follow is:

```
<RO-Crate root directory>/
@@ -42,22 +67,36 @@ The structure an _RO-Crate_ MUST follow is:
| | [other RO-Crate Website files]
| [payload files and directories] # 0 or more
```

The name of the _RO-Crate root_ directory is not defined, but a root directory is identifiable by the presence of the _RO-Crate Metadata File_, `ro-crate-metadata.json`. For instance, if an _RO-Crate_ is archived in a ZIP-file, the ZIP root directory is an _RO-Crate root_ directory if it contains `ro-crate-metadata.json`.
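
For the ZIP case, a check along these lines could be used. It is a minimal sketch using Python's standard `zipfile` module; the function name `find_metadata_file` is hypothetical, and the fallback to `ro-crate-metadata.jsonld` reflects the legacy filename from version 1.0 and earlier.

```python
import zipfile

# Preferred name first, then the legacy name used by RO-Crate 1.0 and earlier.
CANDIDATES = ("ro-crate-metadata.json", "ro-crate-metadata.jsonld")


def find_metadata_file(zip_source):
    """Return the metadata filename found in the ZIP root, or None.

    zip_source may be a path or a binary file-like object.
    Only entries directly in the ZIP root are considered.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = set(archive.namelist())
    for candidate in CANDIDATES:
        if candidate in names:
            return candidate
    return None
```

A return value of `None` means the ZIP root is not an _RO-Crate root_ directory under this check, although a nested directory inside the archive could still be one.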

The payload directories (and their child directories) contain files and directories that SHOULD be described within the _RO-Crate Metadata File_ as [Data Entities](data-entities.md). Additional [Web-based Data Entities](data-entities.html#web-based-data-entities) MAY also be described, but are not considered part of the payload.

[Data Entities](data-entities.md) in the RO-Crate MUST either be _payload files/directories_ present within the RO-Crate root directory or its subdirectories, or be [Web-based Data Entities](data-entities.html#web-based-data-entities).

<!--
RO-Crates can be _nested_ by including payload directories that themselves contain an _RO-Crate Metadata File_.
-->
## Detached RO-Crate

A _Detached RO-Crate_ is an RO-Crate without a defined root directory, where the _RO-Crate Metadata File_ and/or _RO-Crate Website_ content is accessed independently (e.g. as part of a programmatic API).

These crates cannot carry their own data _payload_, but may reference data deposited separately, or purely reference [contextual entities](contextual-entities.md).

Any [data entities](data-entities.md) in a _Detached RO-Crate_ MUST be [Web-based Data Entities](data-entities.html#web-based-data-entities).

{: .warning }
> Using relative URI references like `example/data.txt` in a _Detached RO-Crate_ is NOT RECOMMENDED as this is considered ambiguous and fragile.
A _Detached RO-Crate_ can be identified by the [root data entity](root-data-entity.md) having an `@id` different from `./` in the JSON.

{: .note }
> [Finding the Root Data Entity](root-data-entity.md#finding-the-root-data-entity) can be harder for consumers of detached crates, particularly if the platform serving the _RO-Crate Metadata File_ is unable to ensure the URI path ends with `…/ro-crate-metadata.json`.
Note that a detached RO-Crate may still use `#`-based local identifiers for [contextual entities](contextual-entities.md).
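
The identification rule above can be sketched as follows. The root lookup here is a simplification: it assumes the metadata descriptor can be recognised by a `conformsTo` value starting with `https://w3id.org/ro/crate/`, which avoids relying on the descriptor's own `@id` ending in `ro-crate-metadata.json`. The function names are hypothetical.

```python
RO_CRATE_PREFIX = "https://w3id.org/ro/crate/"


def find_root_id(graph):
    """Return the @id referenced by the metadata descriptor's 'about', or None."""
    for entity in graph:
        conforms = entity.get("conformsTo", [])
        if isinstance(conforms, dict):
            conforms = [conforms]
        declares_ro_crate = any(
            isinstance(item, dict) and item.get("@id", "").startswith(RO_CRATE_PREFIX)
            for item in conforms
        )
        about = entity.get("about")
        if declares_ro_crate and isinstance(about, dict) and "@id" in about:
            return about["@id"]
    return None


def is_detached(graph):
    """True if the Root Data Entity has an @id different from './'."""
    root_id = find_root_id(graph)
    if root_id is None:
        raise ValueError("no RO-Crate Metadata File Descriptor found in @graph")
    return root_id != "./"
```

Raising an error when no descriptor is found keeps "not an RO-Crate" distinct from "attached", which a plain boolean could not express.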


## RO-Crate Metadata File (`ro-crate-metadata.json`)

* In new RO-Crates the _RO-Crate Metadata File_ MUST be named `ro-crate-metadata.json` and appear in the _RO-Crate Root_
* In an _Attached RO-Crate_ the _RO-Crate Metadata File_ MUST be named `ro-crate-metadata.json` and appear in the _RO-Crate Root_
- If an RO-Crate conforming to version 1.0 or earlier contains a file named `ro-crate-metadata.jsonld` but not `ro-crate-metadata.json`, then processing software should treat this as the _RO-Crate Metadata File_. If the crate is updated, the file SHOULD be renamed to `ro-crate-metadata.json` and the _RO-Crate Metadata File Descriptor_ SHOULD be updated to reference it, with an up to date [conformsTo] property naming an appropriate version of this specification.
* The _RO-Crate Metadata File_ MUST contain _RO-Crate JSON-LD_; a valid [JSON-LD 1.0] document in [flattened] and [compacted] form
* The _RO-Crate JSON-LD_ SHOULD use the _RO-Crate JSON-LD Context_ <https://w3id.org/ro/crate/1.2-DRAFT/context> by reference.
* If an RO-Crate conforming to version 1.0 or earlier contains a file named `ro-crate-metadata.jsonld` instead of `ro-crate-metadata.json` then processing software should treat this as the _RO-Crate Metadata File_. If the crate is updated then the file SHOULD be renamed to `ro-crate-metadata.json` and the _RO-Crate Metadata File Descriptor_ SHOULD be updated to reference it, with an up to date [conformsTo] property naming an appropriate version of this specification.


[JSON-LD](https://json-ld.org/) is a structured form of [JSON] that can represent a _Linked Data_ graph.
@@ -86,7 +125,7 @@ The appendix [RO-Crate JSON-LD](appendix/jsonld.md) details the general structur

In addition to the machine-oriented _RO-Crate Metadata File_, the RO-Crate MAY include a human-readable HTML rendering of the same information, known as the _RO-Crate Website_. If present, the _RO-Crate Website_ MUST be a file named `ro-crate-preview.html` in the root directory, which MAY serve as the entry point to other web-resources, which MUST be in `ro-crate-preview_files/` in the root directory.

If present in the root directory, `ro-crate-preview.html` MUST:
If present in the root directory of an _Attached RO-Crate_ as `ro-crate-preview.html` (or otherwise served in a _Detached RO-Crate_), the _RO-Crate Website_ MUST:

* Be a valid [HTML 5] document
* Be useful to users of the RO-Crate - this will vary by community and intended use, but in general the aim is to assist users in reusing data by explaining what it is, how it was created, how it can be used and how to cite it. One simple approach to this is to expose *all* the metadata in the _RO-Crate Metadata File_.
@@ -125,7 +164,7 @@ Metadata about parts of the _RO-Crate Website_ MAY be included in an RO-Crate as
{
"@id": "https://www.npmjs.com/package/ro-crate-html-js",
"@type": "SoftwareApplication",
"url": "ttps://www.npmjs.com/package/ro-crate-html-js",
"url": "https://www.npmjs.com/package/ro-crate-html-js",
"name": "ro-crate-html-js",
"version": "1.4.19"
}
@@ -147,12 +186,23 @@ Metadata about parts of the _RO-Crate Website_ MAY be included in an RO-Crate as
}
```

{: .warning }
> In a _Detached RO-Crate_ it is **undefined** how to find the _RO-Crate Website_ from the _RO-Crate Metadata File_ or vice versa. It is RECOMMENDED to describe both as contextual entities.


## Payload files and directories

These are the actual files and directories that make up the dataset being described.
These are the actual files and directories that make up the **payload** of the dataset being described in an _Attached RO-Crate_.

The base RO-Crate specification makes no assumptions about the presence of any specific files or folders beyond the reserved RO-Crate files described above.

Payload files may appear directly in the _RO-Crate Root_ alongside the _RO-Crate Metadata File_, and/or appear in sub-directories of the _RO-Crate Root_. Each file and directory MAY be represented as [Data Entities](data-entities.md) in the _RO-Crate Metadata File_.

An RO-Crate may also contain [Web-based Data Entities](data-entities.html#web-based-data-entities) that are not present as part of the payload and are referenced using absolute URIs. These may require additional preservation measures.

The base RO-Crate specification makes no assumptions about the presence of any specific files or folders beyond the reserved RO-Crate files described above. Payload files may appear directly in the _RO-Crate Root_ alongside the _RO-Crate Metadata File_, and/or appear in sub-directories of the _RO-Crate Root_. Each file and directory MAY be represented as [Data Entities](data-entities.md) in the _RO-Crate Metadata File_.
{: .tip }
> An RO-Crate [packaged with BagIt](appendix/implementation-notes.md#adding-ro-crate-to-bagit) may [reference external files](appendix/implementation-notes.md#referencing-external-files) which are not present in the _RO-Crate Root_ hierarchy until the BagIt has been _completed_. This method can be used for files that are large, require authentication or are otherwise inconvenient to transfer with the RO-Crate, but which should nevertheless still be considered part of the _payload_.

## Self-describing and self-contained
