Serialize Media entities as LDP-RS describing the File, not itself #662

Closed
dannylamb opened this issue Jun 13, 2017 · 17 comments

@dannylamb
Contributor

Right now a Media entity, when serialized, has itself as the subject and contains a triple of the form <uri_of_media> iana:describes <uri_of_file>, but really it needs to be <uri_of_file> iana:describedby <uri_of_media> to be in line with how Fedora generates an LDP-RS for every LDP-NR that gets created. This amounts to adding a special case for Media entities in the jsonld module.

Here's what it looks like now (non-relevant triples removed for brevity):

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describes":[
                {
                    "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2"
                }
            ]
        },
        ...
    ]
}

And here's what it should look like:

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
                {
                    "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld"
                }
            ]
        },
        ...
    ]
}
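
The subject swap between the two documents above can be sketched in Python. This is an illustration only, not the jsonld module's code (which is PHP): the helper name `invert_describes` is hypothetical, and it assumes an expanded JSON-LD document shaped like the samples in this comment, with a single `iana:describes` target per media node.

```python
import copy

IANA_DESCRIBES = "http://www.iana.org/assignments/relation/describes"
IANA_DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def invert_describes(doc):
    """Return a copy of doc in which every node carrying iana:describes
    takes the described file as its subject and points back to the media
    with iana:describedby.

    Hypothetical helper; assumes one describes target per node.
    """
    out = copy.deepcopy(doc)
    for node in out.get("@graph", []):
        targets = node.pop(IANA_DESCRIBES, None)
        if not targets:
            continue  # nothing to invert on this node
        media_uri = node["@id"]
        node["@id"] = targets[0]["@id"]  # the file becomes the subject
        node[IANA_DESCRIBEDBY] = [{"@id": media_uri}]
    return out


before = {
    "@graph": [
        {
            "@id": "http://localhost:8000/media/1?_format=jsonld",
            IANA_DESCRIBES: [
                {"@id": "http://localhost:8000/sites/default/files/2017-06/sample.jp2"}
            ],
        }
    ]
}
after = invert_describes(before)
```

Every other property on the node stays where it is, so it ends up attached to the file URI as well, which is the point of the change.
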
@DiegoPino
Contributor

DiegoPino commented Jun 21, 2017

@dannylamb is that something you want to have fixed as such? Since Media entities are not file entities, I'm not sure how to handle that. I would have guessed that media entities were a way of nicely managing images, etc., but that the real Non-RDF Source payload would come from one of the file entities connected to them.
Would that not leave out all the properties that belong to the media entity but not to (one of) its files?

The jsonld module handles this, or at least would like to, as generically and LDP-less as possible. Says the jsonld module 😺

@dannylamb
Contributor Author

@DiegoPino That's precisely the conundrum. The Drupal and LDP models are a bit at odds. So long as we're ok with the fact that the JSONLD we generate for Media has the wrong subject w/r/t LDP, then it's reasonable to do this conversion elsewhere.

@dannylamb
Contributor Author

And FWIW I'm totally ok with that.

dannylamb pushed a commit to dannylamb/CLAW that referenced this issue Feb 8, 2018
dannylamb added this to the 1.0.0 milestone May 9, 2019
whikloj self-assigned this May 10, 2019
@rosiel
Member

rosiel commented May 10, 2019

In my understanding, the Media entity in Drupal is "a wrapper for the file", and any fields/values on a Media entity (for example, ebucore:height is 2394px, or mimetype is image/tiff) are semantically the properties of the file. It's just that file entities, in Drupal, can't have fields attached, so the fields go on the Media. Any other fields or properties you attach to a Media should, I think, describe the file proper (otherwise, put them on the node).

The Media contains the same information as, and is analogous to, the /fcr:metadata document describing the binary. However, it's structurally different: in Drupal it's "the middleman" tying a node to a file. In Fedora, the file itself points to the node through its properties (which are accessed through the document at /fcr:metadata).

Taking the Media's JSONLD serialization, it would say: (using REALLY LAZY shorthand)

<DRUPAL/media/1> pcdm:fileOf <DRUPAL/node/1>,
      schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .

This does not make a lot of sense because the media is not, semantically, "the same as" the file nor "a file of" the node.

It's only in Fedora, once the subject is swapped out for the Fedora Binary:

<FEDORA/fcrepo/rest/stuff/filename>  pcdm:fileOf <DRUPAL/node/1> ,
     schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .

that it makes semantic sense. I don't want to put too much weight on the Media's jsonld here because it's misleading as LD, but it works as the in-transit-to-fedora construct.

Here's a diagram of the JSONLD of a node, media, and file, along with the fedora objects and their types (both according to HTTP headers, and to the documents they delivered).
[Diagram: Islandora-and-fedora-jsonld-2019-05-09]

Point is, I agree that the serialization would make semantic sense if you make the main subject (id) the URI of the file in Drupal rather than the Media in Drupal (though it already contains a schema:sameAs to that effect, so maybe ...). As far as I can tell (using a CLAW instance that is some days out of date), the original problematic triple, <uri_of_media> iana:describes <uri_of_file>, is not present, so I'm not sure what needs to be done for this issue.

@whikloj
Member

whikloj commented May 10, 2019

@dannylamb so I did this and it has no effect. I am guessing that's because you are only grabbing the media element's graph, and moving this triple from media -> file to file -> media puts it outside that graph. So it's the same as removing it.
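
A minimal sketch of that guess, in Python for illustration only (the jsonld module itself is PHP). The function `select_subject_node` is hypothetical and stands in for whatever step keeps just the media element's part of the graph; it is not the module's actual code.

```python
DESCRIBES = "http://www.iana.org/assignments/relation/describes"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"

MEDIA = "http://localhost:8000/media/1"
FILE = "http://localhost:8000/sites/default/files/2017-06/sample.jp2"


def select_subject_node(graph, subject_uri):
    """Keep only the nodes whose @id is the requested subject (assumed behaviour)."""
    return [node for node in graph if node.get("@id") == subject_uri]


# Original direction: the triple hangs off the media node, so it survives.
media_first = [{"@id": MEDIA, DESCRIBES: [{"@id": FILE}]}]

# Inverted direction: the triple now hangs off the file node instead.
file_first = [
    {"@id": MEDIA},
    {"@id": FILE, DESCRIBEDBY: [{"@id": MEDIA}]},
]

kept_before = select_subject_node(media_first, MEDIA)
kept_after = select_subject_node(file_first, MEDIA)
```

Under that assumption, `kept_after` holds only the bare media node, so the inverted triple is dropped exactly as if it had been removed.
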

@whikloj
Member

whikloj commented May 10, 2019

Whoa! missed the @rosiel comment. reading now.

@whikloj
Member

whikloj commented May 10, 2019

Okay, I agree with @rosiel above. This is not working due to our serializing method, but even if it did it wouldn't necessarily make sense.

A simple way to add this (not that it makes sense) would be to replace iana:describes with iana:describedby and make both the subject and object the media element.

So <drupal/media/2> iana:describes <drupal/file/3> becomes <drupal/media/2> iana:describedby <drupal/media/2>; again, this doesn't make sense.

But in Fedora it would become
<fedora/NonRdfSource/1234-5678> iana:describedby <drupal/media/2>.

I'm not sure it's worth the hassle though.

@rosiel
Member

rosiel commented May 11, 2019

This issue is from 2017, and I don't see any iana:describes in the graph returned from a media in 2019 - I think it was removed a while ago.

Using curl, I see it in the header for /media/x?_format=jsonld. Link: <http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; rel="describes"; type="image/jpeg". This statement is ... accurate, no?
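
For anyone checking that header programmatically rather than by eye, here is a rough Python sketch. The `parse_link_header` helper is hypothetical and deliberately simplified (it does not handle commas inside quoted strings or URIs); the header value is the one quoted above, with DOMAIN left as the placeholder it is.

```python
import re


def parse_link_header(value):
    """Parse a simple Link header value into a list of dicts.

    Simplified sketch: links are split on commas that precede '<', and
    parameters are read as key="value" or key=value pairs.
    """
    links = []
    for part in re.split(r",\s*(?=<)", value):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue  # skip anything that is not a <uri>; params entry
        link = {"url": match.group(1)}
        params = re.findall(r';\s*([\w-]+)\s*=\s*"?([^";]*)"?', match.group(2))
        for key, val in params:
            link[key] = val
        links.append(link)
    return links


header = (
    '<http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; '
    'rel="describes"; type="image/jpeg"'
)
links = parse_link_header(header)
described = [link["url"] for link in links if link.get("rel") == "describes"]
```
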

To rewrite the original issue to reflect current behaviour:

Right now a Media entity, when serialized, has itself as the subject and contains triples of the form <uri_of_media> ebucore:height '3024', but really it needs to be <uri_of_file> ebucore:height '3024' to be semantically accurate. Also, the existence of a 'media document' describing the file is in line with how Fedora generates an LDP-RS for every LDP-NR that gets created, since even in its HTTP headers it claims it iana:describes <uri_of_file>.

@dannylamb
Contributor Author

@rosiel That link header is indeed accurate, as is your summary about the subject URI. The missing piece we should add on top is an iana:describedby with the media's URL in the RDF. That would tie it all up nicely.

To stick with your example, something like this in a jsonld GET response for a media

<uri_of_file> ebucore:height '3024'
<uri_of_file> iana:describedby <uri_of_media>

with a rel="describes" link header pointing to <uri_of_file>.

@dannylamb
Contributor Author

Ok, here's what we have now

{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/purl.org\/dc\/terms\/title":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#label":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/schema.org\/author":[
            {
               "@id":"http:\/\/localhost:8000\/user\/1"
            }
         ],
         "http:\/\/schema.org\/dateCreated":[
            {
               "@value":"2019-05-15T19:21:42+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/schema.org\/dateModified":[
            {
               "@value":"2019-05-15T19:22:12+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#height":[
            {
               "@value":"1018",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#hasMimeType":[
            {
               "@value":"image\/jpeg",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#width":[
            {
               "@value":"904",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/schema.org\/sameAs":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/user\/1",
         "@type":[
            "http:\/\/schema.org\/Person"
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/node\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#Object"
         ]
      }
   ]
}

Feels like we've batted around two ways of doing this:

  1. Just change schema:sameAs to iana:describes, and then process the rest to be more fedora/ldp-ish in Milliner. This is done with a simple config change using Context, and would result in the following from Drupal (edited for brevity):
{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describes":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
         ...
      },
      ...
   ]
}

which isn't 100% over-the-top semantically correct, but is actually the more intuitive solution to folks coming from outside the ldp sphere. We'd then further process it in Crayfish/Alpaca to have it make sense in fedora and the triplestore.

  2. We replace the @id with that of the file, and use iana:describedby to reference the media. This would look like (again, edited for brevity):
{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
            {
               "@value":"http:\/\/localhost:8000\/media\/1"
            }
         ]
         ...
      },
      ...
   ]
}

This is the most semantically correct, but may come off as strange to the uninitiated. It would require less processing to get into the right shape for Fedora and the Triplestore, though.
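
As a rough illustration of what option 2 does to the current output, here is a Python sketch. It is not the actual implementation (that lives in the PRs and in PHP): the helper name `resubject_to_file` is hypothetical, and it assumes the file URL is taken from the first schema:sameAs value. The "@value" form for iana:describedby simply mirrors the sample above.

```python
import copy

SCHEMA_SAMEAS = "http://schema.org/sameAs"
IANA_DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"
PCDM_FILEOF = "http://pcdm.org/models#fileOf"


def resubject_to_file(doc, media_uri):
    """Return a copy of doc where the media node's @id is replaced by the
    file URL from schema:sameAs, with iana:describedby pointing at the media.

    Hypothetical helper; assumes a single sameAs value on the media node.
    """
    out = copy.deepcopy(doc)
    for node in out.get("@graph", []):
        if node.get("@id") != media_uri:
            continue
        same_as = node.get(SCHEMA_SAMEAS)
        if not same_as:
            continue  # no file URL to use as the new subject
        del node[SCHEMA_SAMEAS]
        node["@id"] = same_as[0]["@value"]
        node[IANA_DESCRIBEDBY] = [{"@value": media_uri}]
    return out


# Abbreviated version of the "what we have now" document above.
current = {
    "@graph": [
        {
            "@id": "http://localhost:8000/media/1",
            "@type": ["http://pcdm.org/models#File"],
            PCDM_FILEOF: [{"@id": "http://localhost:8000/node/1"}],
            SCHEMA_SAMEAS: [
                {"@value": "http://localhost:8000/_flysystem/fedora/2019-05/Flemming-Magic.jpg"}
            ],
        },
        {
            "@id": "http://localhost:8000/node/1",
            "@type": ["http://pcdm.org/models#Object"],
        },
    ]
}
converted = resubject_to_file(current, "http://localhost:8000/media/1")
```
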

@rosiel
Member

rosiel commented May 16, 2019

No. 2 makes sense.
No. 1 would be a regression back into the semantic flaw from 2017 that caused this issue to be created.

@dannylamb
Contributor Author

@rosiel @whikloj PRs are up^^

Testing instructions are in #136

@mjordan
Contributor

mjordan commented May 22, 2019

@rosiel your diagram in #662 (comment) is epic. Mind if I use it in my Open Repositories and iCamp slide decks, with full and genuflecting attribution?

@rosiel
Member

rosiel commented May 22, 2019

@mjordan Yes, but no genuflecting please, and it was a product of collaborating with @elizoller.

[edit: also, unless things change by then, please include the fileOf arrow that gets crossed out and redirected to Drupal. ;) ]

@mjordan
Contributor

mjordan commented May 22, 2019

OK, will nix the genuflecting, cocredit @elizoller, and note updates.

😃

@elizoller
Member

These might be right?
[Diagram: Islandora 8 - Drupal Node and Fedora Resource - Service File]
[Diagram: Islandora 8 - Drupal Node and Fedora Resource - Original File]

@mjordan
Contributor

mjordan commented May 22, 2019

@elizoller++
