Home

Michael J. Giarlo edited this page Aug 9, 2016 · 22 revisions

Portland Common Data Model

Introduction

The Portland Common Data Model (PCDM) is a flexible, extensible domain model that is intended to underlie a wide array of repository and DAMS applications. The primary objective of this model is to establish a framework that developers of tools (e.g., Hydra-based engines, such as Sufia, Curate, Worthwhile, Avalon; Islandora; custom Fedora sites) can use for working with models in a general way, allowing adopters to easily use custom models with any tool. Given this interoperability goal, the initial work has been focused on structural metadata and access control, since these are the key actionable metadata.

To encourage adoption, this model must support the most complex use cases, which include rich hierarchies of inter-related collections and works, but also elegantly support the simplest use cases, such as a single user-contributed file with a few fields of metadata. It must provide a compact interface that tool developers can easily implement, but also be extensible enough for adopters to customize to their local needs.

As the community migrates to Fedora 4, much of our metadata is migrating to RDF. This model encourages linked data best practices, such as using URIs to identify all resources, using widely-used vocabularies where possible, and subclassing existing classes and properties when creating new terms.

Source Ontology

Scope

Work on this model extends across multiple communities, but there is no expectation that everyone in these communities will want to use this model. Initial discussions were focused on interoperability within the Hydra community (including some who use non-Fedora backends) and then expanded to include people who use Islandora and other tools. This diversity, and the diversity of use cases discussed, means that we don't expect every adopter to implement this model in the same way or with the same tools. We expect implementers to extend this model to fit their local needs, and hope that the model will help provide a framework for implementers to share RDF vocabularies and implementations.

Namespaces

Prefix URI
acl http://www.w3.org/ns/auth/acl#
dc http://purl.org/dc/elements/1.1/
dcterms http://purl.org/dc/terms/
pcdm http://pcdm.org/models#
foaf http://xmlns.com/foaf/0.1/
gen http://www.w3.org/2006/gen/ont#
iana http://www.iana.org/assignments/relation/ (see note)
ldp http://www.w3.org/ns/ldp#
ore http://www.openarchives.org/ore/terms/
rdfs http://www.w3.org/2000/01/rdf-schema#

Note on IANA link relations namespace

While the HTML page describing the IANA link relations is http://www.iana.org/assignments/link-relations/, the actual namespace URI is http://www.iana.org/assignments/relation/. Neither the namespace URI nor the term URIs are dereferenceable, and the documentation (RFC 5988) is oblique on this point: the full URI appears only in the "self" example. The confusion is understandable, but the "/relation/" namespace URI is correct.
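As a rough illustration of how the prefix table is used, the sketch below expands prefixed names into full URIs. The table is copied from the Namespaces section; the function name expand is a hypothetical helper and not part of any PCDM tooling.

```python
# Prefix table copied from the Namespaces section of this page.
NAMESPACES = {
    "acl": "http://www.w3.org/ns/auth/acl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "pcdm": "http://pcdm.org/models#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gen": "http://www.w3.org/2006/gen/ont#",
    # Note the "/relation/" path, not "/link-relations/".
    "iana": "http://www.iana.org/assignments/relation/",
    "ldp": "http://www.w3.org/ns/ldp#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'pcdm:hasMember' to a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, _, local = curie.partition(":")
    if prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown prefix or empty local name: {curie!r}")
    return NAMESPACES[prefix] + local


print(expand("iana:next"))
print(expand("pcdm:hasMember"))
```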

Domain Model

[Figure: Domain model] [Figure: ORE ordering extension]

Core Classes

pcdm:Object

Subclass of: ore:Aggregation

An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata and access metadata, and may contain Files and other Objects as member "parts" or "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. Member Objects can be ordered using the ORE Proxy class (see Ordering extension below).

Property Range Usage Obligation
Has Member (pcdm:hasMember < ore:aggregates) pcdm:Object Links to a related Object. Typically used to link to component parts, such as a book linking to a page. Note on transitivity: hasMember is not defined as transitive, but applications may treat it as transitive as local needs dictate. min 0, max unbounded
Has File (pcdm:hasFile < ore:aggregates) pcdm:File Links to a File contained by this Object. min 0, max unbounded Any resource may be contained by at most 1 other resource. Other entities linking to a file should generally link to the parent Object instead.
Has Related Object (pcdm:hasRelatedObject < ore:aggregates) pcdm:Object Links to an Object that is related to this Object, but not a component part of it. Typically used for documentation, thumbnails, etc. min 0, max unbounded
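The three Object properties can be pictured as plain triples. The following is a minimal sketch, assuming a book with two pages and a cover thumbnail; every ex: identifier, the rdf:type shorthand, and the objects_of helper are hypothetical and used only for illustration.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# Every ex: identifier below is hypothetical and exists only for illustration.
TRIPLES = {
    ("ex:book", "rdf:type", "pcdm:Object"),
    ("ex:page1", "rdf:type", "pcdm:Object"),
    ("ex:page2", "rdf:type", "pcdm:Object"),
    ("ex:cover", "rdf:type", "pcdm:Object"),
    # Component parts are linked with pcdm:hasMember.
    ("ex:book", "pcdm:hasMember", "ex:page1"),
    ("ex:book", "pcdm:hasMember", "ex:page2"),
    # Each page Object contains its own File.
    ("ex:page1", "pcdm:hasFile", "ex:page1.tiff"),
    ("ex:page2", "pcdm:hasFile", "ex:page2.tiff"),
    # The cover is related to the book but is not a component part of it.
    ("ex:book", "pcdm:hasRelatedObject", "ex:cover"),
}


def objects_of(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


print(objects_of("ex:book", "pcdm:hasMember"))
print(objects_of("ex:book", "pcdm:hasRelatedObject"))
```

Here the book itself has no pcdm:hasFile links; its content is reached through its member Objects.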

pcdm:Collection

Subclass of: ore:Aggregation

A Collection is a group of resources. Collections have descriptive metadata and access metadata, and may link to Objects and/or Collections. By default, member Objects and Collections are an unordered set, but can be ordered using the ORE Proxy class (see Ordering extension below).

Property Range Usage Obligation
Has Related Object (pcdm:hasRelatedObject < ore:aggregates) pcdm:Object Links to an Object that is related to the collection, but not a member of it. Typically used for documentation, thumbnails, etc. min 0, max unbounded
Has Member (pcdm:hasMember < ore:aggregates) pcdm:Collection ∪ pcdm:Object Links to an Object that is a member of this Collection, or a child Collection. Note on transitivity: hasMember is not defined as transitive, but applications may treat it as transitive as local needs dictate. min 0, max unbounded
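Because hasMember is not defined as transitive, an application has to opt in to transitive behaviour. The sketch below shows one way that choice might look, assuming the membership links are held in a simple dictionary; the data, the ex: identifiers, and the members function are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pcdm:hasMember links: parent -> direct members.
HAS_MEMBER: Dict[str, List[str]] = {
    "ex:collection": ["ex:subcollection", "ex:book"],
    "ex:subcollection": ["ex:photo"],
    "ex:book": ["ex:page1", "ex:page2"],
}


def members(resource: str, transitive: bool = False) -> Set[str]:
    """Return direct members, or all descendants when transitive=True."""
    direct = set(HAS_MEMBER.get(resource, []))
    if not transitive:
        return direct
    found: Set[str] = set()
    stack = list(direct)
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited; also guards against cycles
        found.add(current)
        stack.extend(HAS_MEMBER.get(current, []))
    return found


print(sorted(members("ex:collection")))
print(sorted(members("ex:collection", transitive=True)))
```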

pcdm:File

A File is a sequence of binary data and is described by some accompanying metadata. The metadata typically includes at least basic technical metadata (size, content type, modification date, etc.), but can also include properties related to preservation, digitization process, provenance, etc. Files MUST be contained by exactly one Object.

Property Range Usage Obligation
Size (dcterms:extent) dcterms:SizeOrDuration File size in bytes, typically system-supplied. 0 or 1
Content Type (dc:format) xsd:string MIME type 0 or 1
Checksum premis:hasMessageDigest?, nfo:hashValue?, fedora:digest? xsd:string or xsd:anyURI May have more than one checksum using different algorithms (differentiated with either URN syntax or separate properties for each algorithm).
Creation Date (dcterms:created) rdfs:Literal 0 or 1
Modification Date (dcterms:modified) rdfs:Literal The last modification date 0 or 1
Label (rdfs:label) xsd:string A human readable label or string that can be used as a simple surrogate for the resource. min 0, max unbounded
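The rule that a File must be contained by exactly one Object lends itself to a simple check. This is only a sketch under assumed data: the list of pcdm:hasFile links, the ex: identifiers, and the containment_violations function are hypothetical.

```python
from collections import Counter

# Hypothetical pcdm:hasFile links as (object, file) pairs.
HAS_FILE = [
    ("ex:page1", "ex:page1.tiff"),
    ("ex:page2", "ex:page2.tiff"),
    ("ex:book", "ex:page2.tiff"),  # second container: violates the rule
]

# Hypothetical set of all known pcdm:File resources.
FILES = {"ex:page1.tiff", "ex:page2.tiff", "ex:orphan.pdf"}


def containment_violations(files, has_file):
    """Map each File that is not contained by exactly one Object to its container count."""
    counts = Counter(file for _, file in has_file)
    return {file: counts[file] for file in sorted(files) if counts[file] != 1}


print(containment_violations(FILES, HAS_FILE))
```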

External Content

When binary data is served by another application, it may be appropriate to create a File object with no content to model the external content, hold related technical metadata, etc. Fedora 4's external content feature is one way to implement this link to the external content.

Membership vs. Aggregation

Membership and aggregation express different relationships between Collections and Objects:

  • pcdm:hasMember indicates that a resource is a constituent part of the parent resource, such as a page within a book, or a song within an album. This is the typical relationship between these entities.
  • pcdm:hasRelatedObject indicates a different kind of relationship, typically one that documents the parent entity, such as the cover image of a book or album.

Ordering Extension

This optional class (and additional properties on Collection and Object) serves as an extension to the core classes to support ordering the members of an Object or Collection. Members do not have to have an ordering proxy node (i.e., some members may be ordered while others are unordered), and members may have more than one ordering proxy node, allowing them to appear in multiple positions in the list.

ore:Proxy

A Proxy indicates a Resource in the context of a Collection (see: http://www.openarchives.org/ore/1.0/datamodel#Proxy)

Property Range Usage Obligation
Proxy For (ore:proxyFor) rdf:Resource Links to the resource being ordered. min 1, max 1
Proxy In (ore:proxyIn) ore:Aggregation Links to the aggregation the resource is being ordered in. min 1, max 1
Next (iana:next) ore:Proxy Links to the resource after the current resource (omit for the last resource). min 0, max 1
Prev (iana:prev) ore:Proxy Links to the resource before the current resource (omit for the first resource). min 0, max 1

pcdm:Collection (extension)

To improve usability and performance of sorting, a Collection with ordered members may link to the first and last resources.

Property Range Usage Obligation
First (iana:first) ore:Proxy Links to the Proxy for the first Object in the collection. min 0, max 1
Last (iana:last) ore:Proxy Links to the Proxy for the last Object in the collection. min 0, max 1

pcdm:Object (extension)

To improve usability and performance of sorting, an Object with ordered member Objects may link to the first and last Objects.

Property Range Usage Obligation
First (iana:first) ore:Proxy Links to the Proxy for the first member Object in the Object. min 0, max 1
Last (iana:last) ore:Proxy Links to the Proxy for the last member Object in the Object. min 0, max 1

Notes

  • By default Collections and Objects are unordered, with the presence of the iana:first and iana:last properties indicating a Collection/Object is ordered.
  • Proxies are not pcdm:contained by anything
  • Files cannot be ordered within an Object
  • Related objects cannot be ordered within an Object or Collection
  • It is possible to have a File associated with a Proxy, for example to model a collection-specific thumbnail; however, support for this is not required by the application profile.
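To make the ordering extension concrete, the sketch below walks a chain of proxies from iana:first along iana:next and returns the members in order. It is a minimal in-memory illustration, not an implementation from any PCDM library; the Proxy dataclass, the ex: identifiers, and the ordered_members function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Proxy:
    proxy_for: str                      # ore:proxyFor
    proxy_in: str                       # ore:proxyIn
    next_proxy: Optional[str] = None    # iana:next (omitted for the last proxy)
    prev_proxy: Optional[str] = None    # iana:prev (omitted for the first proxy)


# Hypothetical proxies; ex:page1 appears twice, which the model allows.
PROXIES: Dict[str, Proxy] = {
    "ex:proxy1": Proxy("ex:page1", "ex:book", next_proxy="ex:proxy2"),
    "ex:proxy2": Proxy("ex:page2", "ex:book", next_proxy="ex:proxy3", prev_proxy="ex:proxy1"),
    "ex:proxy3": Proxy("ex:page1", "ex:book", prev_proxy="ex:proxy2"),
}

# iana:first links from each ordered aggregation to its first proxy.
FIRST: Dict[str, str] = {"ex:book": "ex:proxy1"}


def ordered_members(aggregation: str) -> List[str]:
    """Follow iana:next from iana:first; an aggregation without iana:first is unordered."""
    order: List[str] = []
    seen = set()
    current = FIRST.get(aggregation)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in proxy chain at {current}")
        seen.add(current)
        proxy = PROXIES[current]
        order.append(proxy.proxy_for)
        current = proxy.next_proxy
    return order


print(ordered_members("ex:book"))
```

An aggregation with no iana:first link yields an empty list here, matching the note that Collections and Objects are unordered by default.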

WebACL

WebACLs are used to specify what actions users can perform on resources. Each ACL is created as its own resource which links to the users, resources, and actions allowed. Users and resources can both be identified individually or using classes. The WebACL ontology includes several actions (read, write, append, control). Hydra access control has historically also had a discover permission, and adopters may create new actions for permissions they wish to assign separately (e.g., download).

Each Collection, Object and File instance can be assigned its own WebACL. For example, an Object and its thumbnail image might be assigned a public ACL, but the high-resolution master image might be limited to a specific group of users.

acl:Authorization

Property Range Usage Obligation
Agent (acl:agent) foaf:Agent Individual user this ACL applies to. min 0, max unbounded
Agent Class (acl:agentClass) rdfs:Class Class of users this ACL applies to. min 0, max unbounded
Mode (acl:mode) rdfs:Class Actions permitted by this ACL (e.g., acl:Read, acl:Write, hydra:Discover, etc.). min 1, max unbounded
Resource (acl:accessTo) gen:InformationResource Individual resource this ACL applies to. min 0, max unbounded
Resource Class (acl:accessToClass) rdfs:Class Class of resources this ACL applies to. min 0, max unbounded
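The acl:Authorization properties can be read as a matching rule: a request is permitted when some authorization covers the agent, the resource, and the requested mode. The sketch below illustrates that reading with the public thumbnail and restricted master example; the Authorization dataclass, is_allowed function, ex: identifiers, and the ex:Staff class are hypothetical, and real WebACL implementations may resolve classes differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Authorization:
    modes: FrozenSet[str]                            # acl:mode (min 1)
    agents: FrozenSet[str] = frozenset()             # acl:agent
    agent_classes: FrozenSet[str] = frozenset()      # acl:agentClass
    access_to: FrozenSet[str] = frozenset()          # acl:accessTo
    access_to_classes: FrozenSet[str] = frozenset()  # acl:accessToClass


def is_allowed(auths: Iterable[Authorization], agent: str, agent_classes,
               resource: str, resource_classes, mode: str) -> bool:
    """Return True if any authorization matches the agent, resource, and mode."""
    for auth in auths:
        agent_ok = agent in auth.agents or bool(auth.agent_classes & set(agent_classes))
        resource_ok = (resource in auth.access_to
                       or bool(auth.access_to_classes & set(resource_classes)))
        if agent_ok and resource_ok and mode in auth.modes:
            return True
    return False


# Hypothetical ACLs: public read on the Object and thumbnail, staff-only read on the master.
ACLS = [
    Authorization(modes=frozenset({"acl:Read"}),
                  agent_classes=frozenset({"foaf:Agent"}),
                  access_to=frozenset({"ex:object", "ex:thumbnail"})),
    Authorization(modes=frozenset({"acl:Read"}),
                  agent_classes=frozenset({"ex:Staff"}),
                  access_to=frozenset({"ex:master"})),
]

print(is_allowed(ACLS, "ex:anon", {"foaf:Agent"}, "ex:thumbnail", set(), "acl:Read"))
print(is_allowed(ACLS, "ex:anon", {"foaf:Agent"}, "ex:master", set(), "acl:Read"))
```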

Appendix I. Usage Guidelines

Adopters will follow different conventions for using these classes, but the following guidelines suggest how to structure complex objects.

Structure

  • Create a single Object instance and attach descriptive metadata for the work as a whole.
  • If there is only a single content file (plus derivatives), attach it directly to the Object.
  • If there are multiple content files, attach each content file and its derivatives to a separate component Object instance. This keeps derivatives from different content files clearly separated, so each content file can have its own thumbnail image, OCR text, etc.
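The structure guidelines above can be sketched as a small function that decides where Files are attached. This is one possible reading, not a prescribed algorithm; the structure_work function, the component naming scheme, and all ex: identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def structure_work(work: str, content_files: Dict[str, List[str]]) -> List[Triple]:
    """Build structural triples for a work.

    content_files maps each content file to the list of its derivatives.
    """
    triples: List[Triple] = []
    if len(content_files) == 1:
        # Single content file: attach it and its derivatives directly to the work.
        ((content, derivatives),) = content_files.items()
        for file in [content, *derivatives]:
            triples.append((work, "pcdm:hasFile", file))
        return triples
    # Multiple content files: one component Object per content file.
    for index, (content, derivatives) in enumerate(content_files.items(), start=1):
        component = f"{work}/component{index}"  # hypothetical naming scheme
        triples.append((work, "pcdm:hasMember", component))
        for file in [content, *derivatives]:
            triples.append((component, "pcdm:hasFile", file))
    return triples


for triple in structure_work("ex:book", {"ex:p1.tiff": ["ex:p1.txt"], "ex:p2.tiff": []}):
    print(triple)
```

Keeping each content file and its derivatives under its own component Object is what lets each one carry a separate thumbnail or OCR text.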

Descriptive Metadata

  • For broadest interoperability, we suggest using commonly-used vocabularies like Dublin Core and FOAF for descriptive metadata.
  • In cases where these vocabularies don’t meet your needs, use them as much as possible and use other vocabularies (or create your own) to complement them.
  • Where possible, use URIs from established vocabularies when referring to names, subjects, places, and other entities that would typically have authority records in traditional library systems.

Technical Metadata

The Technical Metadata Application Profile defines properties for expressing technical metadata about files, etc. Please use that as a definitive reference and add any comments or make changes directly on that page.

General notes about Technical Metadata:

  • Attach technical metadata directly to File instances. This includes format information, runtime/codec/etc. details, digitization information, provenance, etc.
  • Descriptive metadata beyond a simple filename or label should be attached to the parent Object record instead.

Rights Metadata

The Rights Metadata Recommendation provides guidelines for recording rights metadata.

Appendix II. Related Resources

The finer points of this model and how to implement it in Fedora 4/LDP are still under active discussion. Below are the current working documents:

  • The Hydra Metadata Working Group has been meeting to discuss technical, descriptive, and rights metadata, in addition to unresolved structural metadata issues.
  • LDP-PCDM-F4 In Action provides a detailed walkthrough of creating PCDM objects and files in Fedora 4 using the LDP protocol. The walkthrough is also available as a GitHub repository containing the files and shell scripts.
  • Hydra::PCDM - Hydra implementation of the PCDM model.

Prior Work

This model came out of discussions at HydraConnect 2, which were fleshed out in Google Docs and GitHub issues, and at the Hydra Developers - Making Progress Fall 2014 workshop and Code4Lib 2015. Here are some of the working documents: