
Do we need the "Externally Referenced Content" in CLAW/Fedora 4 #564

Closed
whikloj opened this issue Mar 22, 2017 · 13 comments

Comments

@whikloj
Member

whikloj commented Mar 22, 2017

Currently, the Fedora 4 reference implementation and the Fedora API [1] [2] cover only one way to handle binary data that is not stored inside Fedora.

To add to the confusion, External Content in Fedora 4 is akin to Redirect Referenced Content in Fedora 3. This means that if your binary is not stored inside Fedora, a request for it returns a 3XX response that redirects you to the alternate location.

Below is from the Fedora 3 Digital Object Model wiki page.

Externally Referenced Content - the content is stored outside the repository and the digital object XML maintains a URL that can be dereferenced by the repository to retrieve the content from a remote location. While the datastream content is stored outside of the Fedora repository, at runtime, when an access request for this type of datastream is made, the Fedora repository will use this URL to get the content from its remote location, and the Fedora repository will mediate access to the content. This means that behind the scenes, Fedora will grab the content and stream it out to the client requesting the content as if it were served up directly by Fedora. This is a good way to create digital objects that point to distributed content, but still have the repository in charge of serving it up.
Redirect Referenced Content - the content is stored outside the repository and the digital object XML maintains a URL that is used to redirect the client when an access request is made. The content is not streamed through the repository. This is beneficial when you want a digital object to have a Datastream that is stored and served by some external service, and you want the repository to get out of the way when it comes time to serve the content up. A good example is when you want a Datastream to be content that is stored and served by a streaming media server. In such a case, you would want to pass control to the media server to actually stream the content to a client (e.g., video streaming), rather than have Fedora in the middle re-streaming the content out.

Remember Fedora 4's External Content is like the "Redirect Referenced Content" type above. This means that Fedora is not involved in retrieving or presenting that content.
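To make the difference between the two Fedora 3 types concrete, here is a minimal sketch written as a pure function so it runs without a server. The `Datastream` class, the `serve` function, the injected `fetch` callable, and the use of 302 as the redirect status are illustrative assumptions, not Fedora code; only the control-group letters "E" and "R" come from the Fedora 3 model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical response shape: (status, headers, body).
Response = Tuple[int, Dict[str, str], bytes]


@dataclass
class Datastream:
    control_group: str  # Fedora 3 style: "E" (externally referenced) or "R" (redirect)
    url: str            # remote location of the content
    mime_type: str


def serve(ds: Datastream, fetch: Callable[[str], bytes]) -> Response:
    """Answer an access request for a datastream stored outside the repository."""
    if ds.control_group == "R":
        # Redirect Referenced: the repository gets out of the way and the
        # client follows the Location header to the external service.
        return 302, {"Location": ds.url}, b""
    if ds.control_group == "E":
        # Externally Referenced: the repository fetches the bytes itself and
        # streams them back as if they were stored locally.
        body = fetch(ds.url)
        headers = {"Content-Type": ds.mime_type, "Content-Length": str(len(body))}
        return 200, headers, body
    raise ValueError(f"unsupported control group: {ds.control_group}")
```

With "R" the client sees a redirect and never touches the repository again; with "E" the repository stays in the request path, which is what makes repository-side authorization and consistent headers possible.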

The Islandora community (specifically me) has a need to not store some or all binaries in Fedora, to provide some flexibility.

However, this raises some questions, such as how we handle authorization on requests to the end resource.

Does the Islandora community have a need for the "Externally Referenced Content" type?

Reasons expressed for this are:

  1. Handling Authorization
  2. Keeping HTTP responses consistent
  3. ... add your use case below.
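The first two reasons can be illustrated with a minimal sketch of a mediating (proxying) handler. Everything here is hypothetical: `mediated_get`, the in-memory ACL dictionary, and the resource-to-location map stand in for a real authorization layer and storage index. The LDP-style `Link` type header is included only to show that a proxied response can carry the same headers as a locally stored binary.

```python
from typing import Callable, Dict, Set, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def mediated_get(user: str,
                 resource: str,
                 acl: Dict[str, Set[str]],
                 locations: Dict[str, str],
                 fetch: Callable[[str], bytes]) -> Response:
    """Hypothetical proxy handler: authorize first, then stream external bytes."""
    # 1. Authorization: the repository decides, so the external store never
    #    needs to know about repository users or policies.
    if user not in acl.get(resource, set()):
        return 403, {}, b""
    # 2. Consistent responses: the client gets a 200 with repository-style
    #    headers, whether the bytes live locally or on a remote store.
    body = fetch(locations[resource])
    headers = {
        "Content-Length": str(len(body)),
        "Link": '<http://www.w3.org/ns/ldp#NonRDFSource>; rel="type"',
    }
    return 200, headers, body
```

With a redirect, neither step is possible: the client leaves the repository before any access check on the bytes, and the final response headers come from whatever serves the external URL.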
@awoods

awoods commented Mar 23, 2017

Continuation of reasons expressed:
...
4. Lifecycle management of binary descriptions relative to binary resource
5. Fixity (with caution)
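One reading of "fixity (with caution)" is that the repository can only verify the bytes it receives when it mediates access. A minimal sketch, assuming SHA-256 and content that arrives as an iterable of byte chunks; the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_of_stream(chunks: Iterable[bytes]) -> str:
    """Hash content chunk by chunk, as it would be streamed from a remote store."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def check_fixity(chunks: Iterable[bytes], recorded: str) -> bool:
    # Caution: a match only proves that the bytes returned by this request
    # agree with the recorded digest; the external store can still change
    # the content later without the repository noticing.
    return sha256_of_stream(chunks) == recorded
```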

@bradspry

Yes, it seems like a path to CDN support.

Here's a relevant conversation I had with Wim Leers, lead developer of Drupal's CDN module, concerning Fedora Commons and CDN:
https://www.drupal.org/node/2758739

In addition to the video streaming use case, which is crucial because pseudo-streaming isn't robust enough, it is highly desirable to serve practically every datastream via a CDN. The result would be better performance for public patrons and substantial savings on the Internet bandwidth bill.
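A CDN setup usually comes down to rewriting public file URLs so they point at the CDN host, which then pulls from the origin on a cache miss. A minimal sketch using only the standard library; `cdn_url` and both hostnames are hypothetical, and this is not how the Drupal CDN module is implemented. It only makes sense for public datastreams, since restricted content should not sit in a shared cache.

```python
from urllib.parse import urlsplit, urlunsplit


def cdn_url(original: str, cdn_host: str) -> str:
    """Rewrite a public datastream URL so it is fetched through a CDN host.

    Only the host is replaced; the path and query are kept so the CDN can
    pull the same resource from the origin on a cache miss.
    """
    parts = urlsplit(original)
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))
```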

@pautri

pautri commented Mar 30, 2017

Our main reason for using "Externally Referenced Content" exclusively for all primary datastreams (OBJs) in a repository with about 100TB of data is to have more control over the directory structure. This matters to us because we define rules for an HSM storage system based on it, and because we provide local read access to the same resources via Samba shares. We do, however, still want to provide access (with access control) directly through Islandora as well, so the "Redirect Referenced Content" type does not suit our use case.

@DiegoPino
Contributor

At METRO, our use case is similar. We want to be able to have our own long-term storage strategy (we are moving to that now), which involves:

  • Multiple Terabytes and growing
  • Our own selective checksumming strategy
  • Our own selective backup strategies
  • Our own workflow (move non-frequent access binaries to Glacier/similar cheap storage)
  • Heterogeneous storage solutions (local/remote/cloud) and even file naming

We need all this because we have been developing a rule system that lets us build a backend storage strategy based on size limits, content models, datastream names, and even stub datastream creation. It makes our digital preservation "inclined" initiatives easier and more sustainable (for Islandora/Fedora 3.8) and lets us escape some limiting aspects of Akubra. We would like to reuse these assets when moving to a repository based on the Fedora 4 API specs.

All of this spans heterogeneous storage providers and technologies.
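A rule system of the kind described above can be sketched as an ordered list of predicates over a binary's properties, where the first matching rule picks the storage backend. The `Binary` fields, the size threshold, and the backend names below are hypothetical placeholders, not METRO's actual rules; `islandora:sp_videoCModel` is used only as a familiar Islandora 7 content model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Binary:
    size_bytes: int
    content_model: str
    dsid: str


# Each rule pairs a predicate with a storage target; the first match wins.
Rule = Tuple[Callable[[Binary], bool], str]

RULES: List[Rule] = [
    # Very large masters go to cheap, slow storage (hypothetical 10 GiB cutoff).
    (lambda b: b.dsid == "OBJ" and b.size_bytes > 10 * 1024**3, "cold-archive"),
    # Video derivatives go to a streaming-friendly backend.
    (lambda b: b.content_model == "islandora:sp_videoCModel", "media-server"),
    # Remaining masters go to preservation disk.
    (lambda b: b.dsid == "OBJ", "preservation-disk"),
]


def choose_backend(binary: Binary, rules: List[Rule] = RULES, default: str = "local") -> str:
    """Return the storage target of the first rule that matches."""
    for predicate, target in rules:
        if predicate(binary):
            return target
    return default
```

Keeping the rules as data rather than code paths makes it easier to reuse the same policy when the repository underneath changes.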

In summary: we would like a common, consistent (whatever that means) REST/API experience when requesting a non-RDF source asset, wherever it is stored. Sadly, the redirect approach is not consistent (headers, etc.) and not flexible enough (no control over some server-managed properties and no WebACL) for us.

@jonathangreen
Contributor

jonathangreen commented Mar 30, 2017

The LYRASIS use case is very similar to what has been posted here already, but we would like to be able to use tiered storage so that:

  • Infrequently used preservation masters end up in Glacier or S3 Infrequent Access
  • Infrequently used datastream content ends up in some middle tier (S3 or similar)
  • Frequently accessed datastreams end up in a fast tier of storage (or a CDN, etc.)

I see Externally Referenced Content as a way to achieve this while still being able to keep data in Fedora.
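The three tiers above can be sketched as a function of how long a binary has gone unaccessed. The 90- and 30-day thresholds and the tier names are hypothetical; the comments map them to the examples given in the list.

```python
from datetime import datetime, timedelta


def choose_tier(is_preservation_master: bool, last_access: datetime, now: datetime) -> str:
    """Pick a storage tier from access recency (hypothetical thresholds)."""
    idle = now - last_access
    if is_preservation_master and idle > timedelta(days=90):
        return "archive"   # e.g. Glacier or S3 Infrequent Access
    if idle > timedelta(days=30):
        return "standard"  # e.g. S3 or similar object storage
    return "fast"          # fast disk or a CDN
```

With Externally Referenced Content, moving a binary between tiers only changes the URL Fedora dereferences; clients keep requesting the same repository resource.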

@dannylamb
Contributor

As of Islandora/Crayfish@0663c54, we are using fcrepo 5's external content capabilities to provide redirects to binary resources stored in the Drupal file system or elsewhere (AWS, DigitalOcean, Rackspace, etc.) using Flysystem.
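A minimal sketch of the kind of request a client might send to have fcrepo 5 record a binary as external content. It assumes the Link-header convention from the Fedora API Specification's external binary content section (`rel="http://fedora.info/definitions/fcrepo#ExternalContent"` with a `handling` parameter of `copy`, `redirect`, or `proxy`); verify against the fcrepo 5 documentation before relying on it. The helper names and URLs are hypothetical, this is not Crayfish's actual code, and the request is built but not sent.

```python
from urllib.request import Request

EXTERNAL_CONTENT_REL = "http://fedora.info/definitions/fcrepo#ExternalContent"


def external_content_link(location: str, media_type: str, handling: str = "redirect") -> str:
    """Build a Link header asking Fedora to treat `location` as external content."""
    if handling not in ("redirect", "proxy", "copy"):
        raise ValueError(f"unknown handling: {handling}")
    return (f'<{location}>; rel="{EXTERNAL_CONTENT_REL}"; '
            f'handling="{handling}"; type="{media_type}"')


def put_external_binary(fedora_url: str, location: str, media_type: str) -> Request:
    # Empty body: Fedora stores only the pointer, not the bytes themselves.
    return Request(fedora_url, data=b"", method="PUT",
                   headers={"Link": external_content_link(location, media_type)})
```

Under the assumed convention, the `handling` parameter is where the redirect-versus-mediated choice discussed in this issue would be expressed.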

@mjordan
Contributor

mjordan commented May 9, 2019

@dannylamb does documentation on how to use this exist?

@whikloj
Member Author

whikloj commented May 9, 2019

@mjordan In Islandora or in Fedora? Because I think this is just used as part of the Flysystem code now.

@mjordan
Contributor

mjordan commented May 9, 2019

In Islandora. I haven't spun up a new VM in a while so sorry for not looking first, but for a given object, if I wanted to point to externally hosted content, how would I do that in the node/media edit GUI?

@whikloj
Member Author

whikloj commented May 9, 2019

Oh! @dannylamb is just using externally referenced content to reference the files in Drupal from Fedora. It does not make Drupal look elsewhere for its content.

So you would need to be able to access your content using Flysystem and then it would work.

@mjordan
Contributor

mjordan commented May 9, 2019

OK, sorry, I misunderstood. So this is not an end-user feature, is that correct? In other words, as someone creating an object, I can't point to an external URL for any of the media.

@whikloj
Member Author

whikloj commented May 9, 2019

I don't think so. You could set up a Flysystem adapter pointing somewhere and store content there, but I'm not sure whether Drupal allows you to reference external content.

@mjordan
Contributor

mjordan commented May 9, 2019

OK, thanks for the explanation.


9 participants