Updates existing user docs and adds one more #616

Merged
merged 4 commits into from
May 8, 2017

Conversation

bryjbrown
Member

GitHub Issue: #510

What does this Pull Request do?

How should this be tested?

Read it and make sure I'm not spreading misinformation

Interested parties

@Islandora-CLAW/committers

@bryjbrown
Member Author

Also feel free to suggest new resources to add to the list, or recommend cutting certain things that actually aren't good resources for linked data newbies.

Member

@whikloj left a comment

Nothing wrong here, just some suggestions.


* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers.
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `resource` to act as a collection of other `resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource).
Member

Last sentence is missing a capital letter, resources -> Resources...unless you were trying to keep a common feel for that word.

Member Author

Great point. I was trying to keep it the same as how it was before, but I feel like capitalizing it really drives home the point that this is a Specially Named Thing. Fixing.

In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source).
In Islandora 7.x-1.x, every object has a specific content model which defined what datastreams it could have and which were absolutely required. Some of these Islandora 7.x-1.x datastreams contained metadata about the object (RELS-EXT, RELS-INT, DC, MODS, PREMIS, etc) while others contained binary files (JPG, PDF, MP3, PNG, TIFF, etc). In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (MODS, DC, PREMIS, etc) and store it in binary files as we did in Islandora 7.x-1.x.

Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`.
Member

@whikloj May 2, 2017

I like this, it's totally correct... I'm wondering if a small diagram might help. Once you start with the whole

...RDF Sources containing only RDF data, with pcdm:Files having links to the URL of...

I feel it might start to get hazy to some, but I'm not sure. Consider this an idea and not a requirement.

Member Author

Great point. We'll probably need a few more diagrams in general. I'll make a separate PR for that when I get some time to draw it out.

- [Wikipedia article on Serialization](https://en.wikipedia.org/wiki/Serialization)
- [W3C’s RDF/XML Syntax Specification](https://www.w3.org/TR/REC-rdf-syntax/)
- [W3C’s RDF 1.1 Turtle](https://www.w3.org/TR/turtle/)
- [W3C’s JSON-LD 1.0](https://www.w3.org/TR/json-ld/)
Member

http://json-ld.org
and
http://json-ld.org/playground/
are great JSON-LD resources

Member Author

Awesome, totally forgot about those! Adding those now.

@@ -27,13 +27,16 @@ Fedora 3 objects are FOXML (Fedora Object eXtensible Markup Language) documents,
* `System Properties`: A set of system-defined descriptive properties that is necessary to manage and track the object in the repository.
* `Datastream(s)`: The element in a Fedora digital object that represents a content item.

In Fedora 4 , what we would have called `objects` are now referred to as `resources` and are not composed of XML; instead, they are stored in ModeShape as nodes with RDF properties. They can contain the following elements:
In Fedora 4 , what we would have called `objects` are now referred to as [`Resources`](https://www.w3.org/TR/ld-glossary/#resource) (and *everything* in Fedora 4 is a `Resource`). Instead of being composed of XML as they were in Fedora 3, they are stored in [ModeShape](http://modeshape.jboss.org/) as nodes with RDF properties. A `Resource` in Islandora CLAW may [contain](https://www.w3.org/TR/ldp/#dfn-containment) RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams.
Contributor

A Resource in Islandora CLAW may contain RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams.

NonRdfSources are Resources and do not contain anything. Containment is limited to containers.
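
To make the distinction above concrete, here is a minimal sketch of the resource kinds as plain Python classes. It is an illustration only, not Fedora or CLAW code: the class names, the `members` list, and the `can_contain` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Anything addressable in the repository (hypothetical model)."""
    uri: str


@dataclass
class NonRdfSource(Resource):
    """A binary such as a JPG or PDF. It holds bytes and contains nothing."""
    content: bytes = b""


@dataclass
class RdfSource(Resource):
    """A Resource whose state is RDF, modeled here as (predicate, object) pairs."""
    properties: list = field(default_factory=list)


@dataclass
class Container(RdfSource):
    """An RDF Source that may also contain other Resources."""
    members: list = field(default_factory=list)

    def add(self, child: Resource) -> None:
        self.members.append(child)


def can_contain(resource: Resource) -> bool:
    """Containment is limited to containers."""
    return isinstance(resource, Container)
```

In this model a `NonRdfSource` is still a `Resource`, but only a `Container` ever has members, which is the correction being asked for in the paragraph under review.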


* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers.
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `Resource` to act as a collection of other `Resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `Resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `Resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource).
Contributor

Maybe move the clarification between RdfSources and NonRdfSources up to the paragraph above? Then you could cut out some of the preceding paragraph.

* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `Resource` to act as a collection of other `Resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `Resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `Resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource).

CLAW makes use of the [Portland Common Data Model (PCDM)](https://github.com/duraspace/pcdm/wiki) as a layer of abstraction over LDPCs to make containment simpler to understand for users; a `pcdm:Collection` may contain other `pcdm:Collections` or `pcdm:Objects` (similar to an Islandora 7.x-1.x collection content model), and a `pcdm:Object` may contain other `pcdm:Objects` (similar to the way an Islandora 7.x-1.x compound object has child objects) or `pcdm:Files` (similar to the way Islandora 7.x-1.x objects have datastreams).
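
The containment rules in that paragraph can be sketched as a small lookup table. This is only an illustration of the wording above, not CLAW's implementation; the `ALLOWED_CHILDREN` table and the `may_contain` function are hypothetical names.

```python
# Parent type -> child types it may contain, as described in the paragraph above.
ALLOWED_CHILDREN = {
    "pcdm:Collection": {"pcdm:Collection", "pcdm:Object"},
    "pcdm:Object": {"pcdm:Object", "pcdm:File"},
    "pcdm:File": set(),  # files are leaves in this sketch
}


def may_contain(parent_type: str, child_type: str) -> bool:
    """Return True if the paragraph's rules allow parent_type to hold child_type."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())


# A collection may hold objects, but files hang off objects, not collections.
print(may_contain("pcdm:Collection", "pcdm:Object"))
print(may_contain("pcdm:Collection", "pcdm:File"))
```
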
Contributor

👍 for bringing up that basically all our turn-key datatypes can be compound objects.


### Datastreams
In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source).
In Islandora 7.x-1.x, every object has a specific content model which defined what datastreams it could have and which were absolutely required. Some of these Islandora 7.x-1.x datastreams contained metadata about the object (RELS-EXT, RELS-INT, DC, MODS, PREMIS, etc) while others contained binary files (JPG, PDF, MP3, PNG, TIFF, etc). In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (MODS, DC, PREMIS, etc) and store it in binary files as we did in Islandora 7.x-1.x.
Contributor

You should add RELS-EXT and RELS-INT to the list of no longer needed metadata datastreams. And perhaps add that putting RDF on the NonRdfSources serves the same purpose as RELS-INT in the paragraph below.

## Islandora 7.x-1.x (with Fedora 3)
Islandora 7.x-1.x is "middleware" for Drupal 7.x and Fedora 3, meaning that it fits as a layer in between these two systems and acts as a bridge allowing them to talk to each other. This is sometimes expressed as a hamburger:
Contributor

I'm more inclined to describe 7.x as Drupal modules that talk to a Fedora server instead of Drupal's database (although that is used in places as well).

![image](../assets/claw-chimera.png)

Or, for a diagram that doesn't involve food or animals:
Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces. Rather than a hamburger, Islandora CLAW is a [chimera](https://en.wikipedia.org/wiki/Chimera_(mythology)):
Contributor

CLAW is where it really becomes middleware. I'd maybe add that here.


This new structure has several advantages:
Like Islandora 7.x-1.x, Islandora CLAW uses Drupal modules to extend Drupal's native functionality to handle new types of content (Fedora Resources), but unlike Islandora 7.x-1.x, Islandora CLAW contains a completely new layer of "plumbing" between Drupal, Fedora, Blazegraph (CLAW's default triplestore), Solr and any other [microservices](https://en.wikipedia.org/wiki/Microservices) to allow all of these systems to pass messages to each other and stay in sync. This new structure has several advantages:
Contributor

Replace 'other microservices' with just 'microservices'. Otherwise this would imply Solr and Blazegraph are microservices.

@bryjbrown
Member Author

Thanks for all the great feedback, @dannylamb! New commit should address all of these issues. I rewrote some of these parts, so there may be fresh 'bugs' to address...

@dannylamb
Contributor

@bryjbrown Thanks so much for this. I'm going to let this sit for a bit to give folks from the CLAW call a chance to comment, and then I'll merge it.


Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`.
Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`, serving the same purpose RELS-INT datastreams served in Islandora 7.x-1.x.
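
As a rough sketch of the arrangement described in that paragraph, the snippet below models RDF as plain `(subject, predicate, object)` tuples. Everything here is an assumption for illustration: the `example.org` URLs, the `ex:binaryUrl` and `ex:width` predicates, and both helper functions are hypothetical, and only `pcdm:hasFile`, `pcdm:Object`, and `pcdm:File` come from PCDM itself.

```python
def describe_file(file_uri, binary_url, extra=()):
    """Build the RDF for a pcdm:File as (subject, predicate, object) tuples."""
    triples = [
        (file_uri, "rdf:type", "pcdm:File"),
        # Hypothetical predicate standing in for the link to the binary's URL.
        (file_uri, "ex:binaryUrl", binary_url),
    ]
    # Any other metadata about the file lives on the same RDF Source.
    triples += [(file_uri, predicate, value) for predicate, value in extra]
    return triples


def describe_object(object_uri, file_uris):
    """Build the RDF for a pcdm:Object that holds any number of pcdm:Files."""
    triples = [(object_uri, "rdf:type", "pcdm:Object")]
    triples += [(object_uri, "pcdm:hasFile", uri) for uri in file_uris]
    return triples


# Hypothetical URLs: one object with a single TIFF file and its width.
tiff = "http://example.org/objects/1/files/tiff"
graph = describe_object("http://example.org/objects/1", [tiff])
graph += describe_file(
    tiff,
    "http://example.org/binaries/page1.tiff",
    extra=[("ex:width", "4000")],
)
```

The point of the sketch is that the technical metadata sits on the `pcdm:File` rather than inside the binary, which is what lets it play the role RELS-INT played in Islandora 7.x-1.x.
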
Member

Should we mention that people can still have their MODS/PBCORE/DC XML datastreams as a NonRdfSource (pcdm:File) if they really want to store an XML representation of RDF?

Member

Yeah, that might be a good idea. But, with the caveat that I don't see us supporting editing or indexing that anytime soon.

Member Author

@bryjbrown May 4, 2017

@ruebot @whikloj Even though you can do this, why would you want to? I feel like putting this in the docs would encourage people to do it, and then they might get wrong ideas like the MODS auto-updates when the RDF changes and other misunderstandings.

I'm 100% willing to add this bit to the docs, I just want to make sure I understand why first.

Contributor

@dannylamb May 4, 2017

@bryjbrown IMO you don't want to be doing anything like that. But you can if you want to move into f4 and migrate iteratively into something that will work with CLAW.

If we mention it, we need to mention that by doing it you miss out on pretty much everything.

Member Author

@dannylamb Fair enough, updating this now.

@bryjbrown
Member Author

New section about pcdm:Files:

'Note that you can use a pcdm:File to represent a file of metadata, such as MODS, DC, or PBCore, in case you would like to preserve a copy of an object's legacy metadata when migrating into Fedora 4. These metadata files will be treated like any other binary file in Islandora CLAW, and will not be indexed or editable through the GUI.'

@dannylamb dannylamb merged commit d56712d into Islandora:master May 8, 2017
jonathangreen pushed a commit that referenced this pull request May 23, 2017