Add concept Detached RO-Crate #189
From RO-Crate meeting 2022-01-27:
I suggest we use the term "Attached RO-Crate". Suggested definition (from this pull request's structure.md):

Attached RO-Crate
: If a crate makes any relative references then it is considered an Attached RO-Crate and the Root Dataset ID MUST be "./".

Detached RO-Crate
: See further definition of detached RO-Crate

I think this is necessary because of #183 allowing
Terminology attached/detached RO-Crate agreed in RO-Crate meeting 2022-02-10.
I started drafting a section, "Converting from attached to detached". I just wanted to check if we are OK with what comes out of the JSON-LD flattening:

```json
{
  "@context": [
    {"@base": "arcp://uuid,d6be5c9b-132a-4a93-9837-3e02e06c08e6/"},
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
      "creator": {"@id": "https://orcid.org/0000-0001-9842-9718"}
    },
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/",
      "@type": "Dataset",
      "hasPart": [
        {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/index.html"},
        {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/example/"}
      ],
      "name": "Workflow RO-Crate profile"
    },
    {
      "@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json#include-ComputationalWorkflow",
      "@type": "Recommendation",
      "category": "MUST",
      "name": "Include Main Workflow",
      "itemReviewed": {
        "@id": "https://bioschemas.org/ComputationalWorkflow"
      }
    }
  ]
}
```
For anything more "proper" I think you would need manual processing, e.g. manual deposit and rewrite of each data entity file, manual UUID for each contextual entity.
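To make the attached-to-detached step concrete, here is a minimal Python sketch of the `@id` rewriting only. It is not JSON-LD flattening and does not use a JSON-LD library; the function name `absolutize_ids` and the tiny `attached_graph` are made up for illustration, and the base URI is the one from the example in this thread. As far as I know, the standard library's `urljoin` only resolves relative references for schemes it recognises, so an `arcp://` base would likely need a JSON-LD processor or the `arcp` package instead.

```python
from urllib.parse import urljoin


def absolutize_ids(node, base):
    """Return a copy of `node` with every "@id" string resolved against `base`."""
    if isinstance(node, dict):
        return {
            key: (
                urljoin(base, value)
                if key == "@id" and isinstance(value, str)
                else absolutize_ids(value, base)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [absolutize_ids(item, base) for item in node]
    return node


# Hypothetical attached graph: identifiers are relative to the crate root.
attached_graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "index.html"}]},
]

base = "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"
detached_graph = absolutize_ids(attached_graph, base)
```

Identifiers that are already absolute (ORCIDs, profile URIs) pass through unchanged, which matches what the flattened output above shows for `creator` and `conformsTo`.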
I think we should recommend removing the |
Are the UUIDs intended to be unique? I ask because people will copy and paste them, or hardcode them into their crates. Regarding attached crates, can we deal with the relativity of paths using base: "./" or similar (or is that not allowed)? I know I have base: null in crates to stop JSON-LD libraries from messing with my paths, but I would have to refresh my memory.
If you keep the arcp-based UUID, you should add a small recipe about using Python's arcp, or show how to generate those UUIDs in the URL namespace in a couple of programming languages.

```python
import uuid

the_url = 'https://example.org'
the_uuid = uuid.uuid5(uuid.NAMESPACE_URL, the_url)
# str(the_uuid) is the canonical UUID string; the_uuid.hex is the same digits without hyphens
```

```python
import arcp

the_arcp = arcp.arcp_location("http://example.com/data.zip", "/file.txt")
# the_arcp has the ARCP string representation
```

```python
import uuid

the_random_uuid = uuid.uuid4()
# str(the_random_uuid) is the canonical UUID string; the_random_uuid.hex omits the hyphens
```

```python
import arcp

the_random_arcp = arcp.arcp_random()
# the_random_arcp has the ARCP string representation
```
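On the uniqueness question, a small standard-library sketch may help show the difference between a name-based and a random identifier when building an `@base` of the `arcp://uuid,<uuid>/` shape used in the flattening example. The helper names `arcp_base_for` and `arcp_base_random` are hypothetical, and the string is assembled by hand rather than with the `arcp` package.

```python
import uuid


def arcp_base_for(url: str) -> str:
    """Name-based (UUID v5) base: the same URL always yields the same base."""
    return f"arcp://uuid,{uuid.uuid5(uuid.NAMESPACE_URL, url)}/"


def arcp_base_random() -> str:
    """Random (UUID v4) base: a fresh identifier on every call."""
    return f"arcp://uuid,{uuid.uuid4()}/"


stable_base = arcp_base_for("https://example.org")
fresh_base = arcp_base_random()
```

A name-based base stays the same if someone regenerates the crate from the same URL, whereas a random base is only unique if each crate generates its own instead of copying one from an example.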
On reflection I don't think we need this attached/detached distinction. I think we should look at providing clear info about how to use relative and absolute paths for various resources. Based on experience where we have implemented an API that uses the API URL as the @id but it is then not clear how to reconstitute a crate, I think that approach was a mistake. It might be better to go back to an approach where @ids are
1. For packaged crates-on-disk, use `@base: null` with relative paths for data entities.
2. For crates over an API, use the dcat:downloadURL property on DataEntities for the place where you can get a file and, as per (1) above, make its `@id` the filename it should have relative to the root; and Identifier for IDs like DOIs.
Further to my last comment, @stain & @simleo: I think I have found a neat solution to the problem we were having with letting the `@id` of a File be a URI, namely how would you save it to disk and reconstruct the relative path structure of a package? Solution: in RO-Crate Metadata Documents served from a service, leave the `@id`s as relative paths, but use DCAT accessURL (to point to RO-Crate Metadata served over an API) and downloadURL for the actual datastream. We can then recommend a process for reconstituting an RO-Crate: use each `@id` to create directories and write the file contents. I have written this up in the work I was doing on a new intro, although this detail probably does not all belong in the intro. Here's a copy and paste from that Google doc.
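The reconstitution process proposed in this comment could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `reconstitute`, the choice of `downloadURL` as the JSON key (which depends on how the context maps the DCAT term), and the injected `fetch` callable that stands in for an HTTP client. Only the Python standard library is used.

```python
import json
from pathlib import Path, PurePosixPath
from typing import Callable, List


def reconstitute(metadata: dict, target: Path, fetch: Callable[[str], bytes]) -> List[Path]:
    """Rebuild a crate directory: each entity with a download URL is written
    to the path given by its relative @id under `target`."""
    written = []
    for entity in metadata.get("@graph", []):
        url = entity.get("downloadURL")  # assumed key name for dcat:downloadURL
        if isinstance(url, dict):
            url = url.get("@id")
        if not url:
            continue  # contextual entity, or nothing to download
        entity_id = entity["@id"]
        rel = PurePosixPath(entity_id)
        # Refuse identifiers that would escape the crate root or are not relative paths.
        if "://" in entity_id or rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"not a safe relative path: {entity_id!r}")
        dest = target.joinpath(*rel.parts)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(fetch(url))
        written.append(dest)
    # Store the metadata file alongside the reconstructed payload.
    (target / "ro-crate-metadata.json").write_text(json.dumps(metadata, indent=2))
    return written
```

In real use `fetch` might wrap `urllib.request.urlopen(url).read()`; keeping it as a parameter makes the path-reconstruction logic easy to exercise without network access.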
Looks nice, @ptsefton @stain @simleo !! I have several questions, some of them offtopic.
The current spec already allows

```python
# Assumes ro-crate-py; `crate` is a ROCrate instance
from rocrate.rocrate import ROCrate

crate = ROCrate()
url = "http://example.com/foo.txt"
# Download file; it will be placed under <CRATE_DIR>/examples when the crate is written out
crate.add_file(url, "examples/foo.txt", fetch_remote=True)
# Don't download file; its @id will still be a URI in the output crate
crate.add_file(url, fetch_remote=False)
```

In the latter case, a

@jmfernandez I don't think there's any requirement for

UPDATE: https://schema.org/downloadUrl is only used in
@@ -44,7 +44,7 @@ The _RO-Crate JSON-LD_ MUST contain a self-describing
**RO-Crate Metadata File Descriptor** with
the `@id` value `ro-crate-metadata.json` (or `ro-crate-metadata.jsonld` in legacy
crates) and `@type` [CreativeWork]. This descriptor MUST have an [about]
We need to clarify that the descriptor's `@id` can also be an absolute URI, for instance by adding a sentence here like:
In a [detached RO-Crate](structure.md#detached-ro-crate), the descriptor's `@id` can be
an absolute URI; in this case, its _last path segment_ MUST be `ro-crate-metadata.json`
(or `ro-crate-metadata.jsonld` in legacy crates)
An alternative is rephrasing the previous sentence as:
The _RO-Crate JSON-LD_ MUST contain a self-describing **RO-Crate Metadata File Descriptor**
whose `@id` MUST have `ro-crate-metadata.json` (or `ro-crate-metadata.jsonld` in legacy crates)
as its last path segment, and `@type` [CreativeWork].
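A check for the proposed "last path segment" wording could be sketched as follows. This reflects the suggestion in this review thread, not agreed specification text, and the function name `descriptor_id_ok` is hypothetical.

```python
from urllib.parse import urlsplit

# File names permitted by the proposed wording (the second is for legacy crates).
ALLOWED_NAMES = {"ro-crate-metadata.json", "ro-crate-metadata.jsonld"}


def descriptor_id_ok(entity_id: str) -> bool:
    """True if the last path segment of `entity_id` is an allowed metadata file name.

    Works for both relative identifiers and absolute URIs; query strings and
    fragments are ignored because only the path component is inspected.
    """
    path = urlsplit(entity_id).path
    return path.rsplit("/", 1)[-1] in ALLOWED_NAMES
```

Because fragments are ignored, an identifier such as `.../ro-crate-metadata.json#include-ComputationalWorkflow` also passes this check, so a stricter validator might additionally require that the descriptor's `@id` has no fragment.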
@simleo |
Call 2023-03-23 agreed to merge all outstanding PRs. The outstanding question is how to reconstruct the relative path; @simleo may also have thoughts on this now from the Workflow Run profile perspective, which also needed this.
.. to support #183 my logical conclusion is that we need the concept of a Detached RO-Crate.
Suggest definition (from this pull request's structure.md)
Regular RO-Crate
: A crate that has a well-defined RO-Crate Root directory and can carry an explicit payload of local data entities as regular files (combined with Web-based Data Entities where needed). This type of RO-Crate can be suitable for long-term preservation, transfer and publishing, as the RO-Crate Metadata File is stored alongside the crate's payload.
Detached RO-Crate
: A crate without a defined payload directory. In this kind of crate, all data references are absolute. This approach may be suitable for use with dynamic web service APIs and repositories that can't preserve file paths. As the data of these crates can only be Web-based Data Entities, the payload is implicit and must be preserved/transferred/archived independent of the RO-Crate Metadata File.
See further definition of detached RO-Crate
I think this is necessary because of #183 allowing `@id` to be any ID, as proposed here in the new subsection Root Data Entity identifier - then

And from that my logical conclusion is that the whole concept of "RO-Crate Root" and any relative URIs becomes ambiguous and difficult if we no longer have `"@id": "./"` of the Root Dataset and the URI that serves `ro-crate-metadata.json` is no longer grounded in something similar to a folder.

I would hope for some discussion on this in the RO-Crate meeting today, 2022-01-27.
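As a rough illustration of the attached/detached distinction discussed in this thread ("if a crate makes any relative references then it is considered an Attached RO-Crate"), the sketch below classifies a metadata document by looking for relative `@id` values. The helper names are hypothetical, only top-level entities in `@graph` are inspected, and treating `#fragment` identifiers as relative is an assumption that the discussion has not settled.

```python
from urllib.parse import urlsplit


def is_relative_id(entity_id: str) -> bool:
    """True if `entity_id` has no URI scheme (blank nodes are not counted)."""
    if entity_id.startswith("_:"):
        return False
    return urlsplit(entity_id).scheme == ""


def classify_crate(metadata: dict) -> str:
    """Return "attached" if any top-level entity has a relative @id, else "detached"."""
    ids = [e["@id"] for e in metadata.get("@graph", []) if "@id" in e]
    return "attached" if any(is_relative_id(i) for i in ids) else "detached"


attached_example = {"@graph": [{"@id": "ro-crate-metadata.json"}, {"@id": "./"}]}
detached_example = {
    "@graph": [
        {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/ro-crate-metadata.json"},
        {"@id": "https://about.workflowhub.eu/Workflow-RO-Crate/1.0/"},
    ]
}
```

Under the proposed definition, a crate classified as attached would additionally need its Root Dataset `@id` to be `./`, which this sketch does not verify.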