Descriptive properties for ActivityStream data #36

mixterj · 2018-06-28T20:56:40Z

A list of AS properties that could be used to help aggregators/harvesters determine whether they are interested in crawling the AS.

Please note that all of the stringy properties are meant for harvesters/aggregators to ingest and build indexes around and are NOT intended for the IIIF Registry to index and make searchable.

Important (based on discussed issues)

attributedTo – organization that the AS is associated with
name – human readable label for AS (maybe Collection name or Organization name – not sure or very opinionated). Basically I want a string to index for searching (after an aggregator has parsed the AS).
summary – human readable text description for the AS. Again, just a bag of words to index for potential searching (after an aggregator has parsed the AS).
tag – list of ‘keywords’ or ‘subjects’ for the AS – connected to Controlled Vocabularies ideally since these are Objects that ‘require’ URIs.

Potential (not discussed but might be useful)

startTime – date the AS was first published (maybe use published?)
updated – date when the AS was last updated
generator – Thing that generated the AS – such as CONTENTdm – maybe more of interest if connected to a specific Activity in the AS??
audience – people interested in the AS
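As a rough illustration of the two lists above, here is a minimal Python sketch of an AS document carrying these properties, plus a helper that gathers the "stringy" values a harvester might index. Only the property names come from the ActivityStreams vocabulary. Every URL and value is a hypothetical placeholder, the OrderedCollection type is an assumption, and `index_text` is an invented helper name rather than part of any library.

```python
import json

# Hypothetical example values throughout; only the property names are AS terms.
stream = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://example.org/activity/all-changes",
    "type": "OrderedCollection",  # assumed type for the stream
    # "Important" properties
    "attributedTo": {
        "id": "https://example.org/org",
        "type": "Organization",
        "name": "Example Library",
    },
    "name": "Example Library photographs",
    "summary": "Updates to IIIF Manifests for a photograph collection.",
    "tag": [
        {
            "id": "https://example.org/vocab/photographs",
            "type": "Object",
            "name": "Photographs",
        }
    ],
    # "Potential" properties
    "startTime": "2018-01-15T00:00:00Z",
    "updated": "2018-06-28T20:56:40Z",
    "generator": {"type": "Application", "name": "CONTENTdm"},
    "audience": {"type": "Group", "name": "Photo historians"},
}


def index_text(as_doc):
    """Collect the name, summary and tag names into one searchable string."""
    parts = [as_doc.get("name", ""), as_doc.get("summary", "")]
    parts += [t.get("name", "") for t in as_doc.get("tag", [])]
    return " ".join(p for p in parts if p)


print(json.dumps(stream, indent=2))
print(index_text(stream))
```

The helper only reads the fields an aggregator would index after parsing the AS, which matches the note above that the registry itself is not expected to make them searchable.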

aisaac · 2018-06-28T22:12:46Z

While I tend to find the list of fields relevant, I am very unconvinced that we should use AS itself to represent what is essentially dataset-level description. Maybe some of the elements in the "potential" part are at the level of the Activity Stream itself, but all the elements in the "important" part belong to what is behind the Activity Stream, i.e. a dataset. And unless I am pointed to a serious process on the part of the AS designers to define and meet the requirements for describing datasets, I would recommend using W3C and Linked Data standards or established vocabularies devoted to this, like DCAT or VoID.

We could have the AS bear only the fields that are specific to the business of generating and publishing an AS, and then link to an instance of, say, dcat:Dataset that would carry the dataset-level metadata. This is much better for interoperability and effectiveness. Soon a lot of institutions, especially in the EU, will feel strongly encouraged to publish descriptions of their datasets using one of these vocabularies anyway. Otherwise we'll again be in a "shadowing" situation, like when the Presentation API encourages copying some of the original descriptive metadata into its own metadata element. For Presentation I see a good case; for Activity Streams I think it is really thinner.

aisaac · 2018-06-28T22:14:28Z

AS even redefined their own property for 'name'. How interoperable is that?

azaroth42 · 2018-07-11T15:46:25Z

The intent of using AS terms here is to maximize the likelihood that generic producers and consumers can be used to produce and consume the resulting documents. By creating our own profile that imports other ontologies into the AS document, we're making certain that only IIIF organizations will be interoperable. This is exacerbated by the ongoing DXWG work to produce a new version of DCAT -- we would be working with a moving target, or instantly out of date.

Secondly, I think we need to be careful to distinguish between the various resources in play:

The activities that create, update and delete IIIF resources
The IIIF resources themselves
The information from which the IIIF resources are generated
The real world objects that the information is about, that the IIIF resources somehow represent.

What is useful for discovery is the third bullet above -- the machine readable information. Which is at least two steps removed from the ActivityStream.

And in terms of scope, coming up with a profile of DCAT in JSON-LD that is congruous with AS and IIIF design patterns seems like a significant expansion of the charter into an area of limited value and not insignificant complexity.

Proposals:

I could definitely see a seeAlso from the AS Collection to other dataset-level metadata such as DCAT or VoID. We could recommend that.
Use the list provided by Jeff as the starting point and see how far we get. If there's implementer feedback that something more detailed is important to be specified by IIIF, then we can take that on when we need to rather than front loading it at the expense of other tasks.
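A minimal Python sketch of the first proposal, under several assumptions: seeAlso is not an AS term, so the document would need a context that defines it. The shape of the link entry (id, type, format) is modeled loosely on how IIIF links to external descriptions and is not settled here. The function name and all URLs are hypothetical.

```python
def link_dataset_description(collection, url, media_type):
    """Return a copy of the collection with a seeAlso link appended."""
    linked = dict(collection)
    # Copy the existing list so the input document is not mutated.
    see_also = list(collection.get("seeAlso", []))
    # Assumed entry shape: where the description lives and how it is serialized.
    see_also.append({"id": url, "type": "Dataset", "format": media_type})
    linked["seeAlso"] = see_also
    return linked


# Hypothetical stream and dataset description URLs.
collection = {
    "id": "https://example.org/activity/all-changes",
    "type": "OrderedCollection",
}
linked = link_dataset_description(
    collection, "https://example.org/dataset/photos.ttl", "text/turtle"
)
print(linked["seeAlso"])
```

With this pattern the AS document stays small, and a harvester that understands DCAT or VoID can follow the link while a generic AS consumer can ignore it.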

aisaac · 2018-07-11T16:00:16Z

On using AS or not:
- Who is using these AS fields and who is using DCAT? AS fields are idiosyncratic in the sense that they've been designed while ignoring everything else done by the community.
- If the level of what we describe is that of datasets, then we should use the standard for describing datasets.
- DCAT is undergoing revision, but the DXWG has been very clear that the update is backward compatible. I'm not expecting any DCAT property to disappear, especially those corresponding to the elements identified by Jeff.

My proposal would be to start with Jeff's elements, but using their DCAT implementation, not the AS ones.

aisaac · 2018-07-26T10:14:17Z

@mixterj I have tried looking at your metadata fields (other than the contentious attributedTo) from the perspective of whether they would apply at the level of the AS itself or of the datasets that the AS "publishes".

I think 'name', 'summary', 'tag' and 'audience' would rather be at the dataset level. Even when we want to describe the situation whereby an organization publishes an AS for someone else's dataset, I don't see this organization adding information for such fields on top of what would be "inherited" from the dataset behind the AS. I mean, if we would like to have a 'summary' strictly at the AS level, to me it would be something like "This Activity Stream publishes the resources from dataset X as well as updates about them". And I don't see much value in that.

The fields 'startTime', 'updated' and 'generator' seem to be much more at the level of the AS itself, in contrast.
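The split described above can be sketched in Python as follows. This is only an illustration of the grouping in this comment: the function name is hypothetical, attributedTo is deliberately left out because it is still contentious, and any key outside the two groups is simply ignored.

```python
# Grouping taken from the comment above; attributedTo is intentionally absent.
DATASET_LEVEL = {"name", "summary", "tag", "audience"}
STREAM_LEVEL = {"startTime", "updated", "generator"}


def split_description(doc):
    """Split a flat description into (stream-level, dataset-level) parts."""
    stream_part = {k: v for k, v in doc.items() if k in STREAM_LEVEL}
    dataset_part = {k: v for k, v in doc.items() if k in DATASET_LEVEL}
    return stream_part, dataset_part


# Hypothetical flat description mixing both levels.
flat = {
    "name": "Example Library photographs",
    "summary": "Digitized photographs from the Example Library.",
    "updated": "2018-07-26T10:14:17Z",
    "generator": {"type": "Application", "name": "CONTENTdm"},
}
stream_part, dataset_part = split_description(flat)
print(stream_part)
print(dataset_part)
```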

mixterj · 2018-07-26T14:16:59Z

@aisaac In principle I agree, but I would suggest that the AS publisher could actually create an AS that serves as a kind of Dataset in and of itself. For example, publishing a subset of items in a CONTENTdm collection (a single Dataset) that are Black & White photos or Manuscripts. In this situation, the AS publisher would in essence be creating their own unique view of the data and could apply unique subjects, names, etc. that may or may not be represented in the data publisher's dataset metadata, or at the very least would be more specific or curated for a given audience.

I certainly will not argue that name, summary, tag, and audience need to belong at the AS level, but I would argue that an AS publisher can and would want to, based on some of the use cases we have talked about, create a curated stream of data and apply unique or more specific metadata properties to it. In this case maybe the AS publisher just needs to create their own Dataset description?

I am also slightly concerned about how consistently IIIF data publishers will publish and maintain dataset descriptions, and about the quality of those descriptions. If Aggregators may be more motivated to describe, in some distinguishing detail, the data they aggregate (collections of things), I do not see why they would not be encouraged to do so in a prescriptive way. As it stands now, the Aggregator needs to harvest every Manifest, hope there is metadata associated with it, parse the metadata, hope there is a IIIF Collection (of which I have not seen any in the datasets I have looked at), and then also maybe look for a Dataset description (VoID or otherwise). It just seems too inconsistent and haphazard to be really functional.

Finally, I think we are really just debating semantics - i.e. where and how would an Aggregator or AS Publisher describe the data encompassed in the AS.

aisaac · 2018-07-26T14:24:25Z

@mixterj I wouldn't argue about who would create the dataset descriptions. It could well be the publisher of the AS, indeed. In fact for Europeana that may well be the case. And about the publisher wanting to create a view on the data, yes, this seems like a valid use case. But I think it would be a bit strange if this view existed only as an AS. There must be something a bit more conceptual behind it, and I would be in favour of what you suggest, i.e. "the AS publisher just needs to create their own Dataset description". Talking about semantics, I could say it's more a "stream of curated data" than a "curated stream of data" ;-)

mixterj · 2018-07-26T14:35:02Z

@aisaac yes, I agree with these points. If this is an agreed-upon approach though, I would push hard for a consistent way to hook all of these components together so the aggregator is not left guessing/hoping - I want to be an aggregator not an archeologist ;)

It sounds like the main components we have here are Manifests, Collection Manifests, ActivityStreams, and Datasets. Does that sound about right?

I guess my main concern is that we have a lightweight, consistent, and easily implementable solution. I will also admit that I am a firm believer that 'the perfect is the enemy of the good' ;)

aisaac · 2018-07-26T14:41:51Z

@mixterj I think we agree. And also with the perfect being the enemy of the good. But I've seen what blatant violations of the one-to-one principle can lead to, so I tend to be obsessed about them ;-)

azaroth42 · 2018-09-18T21:30:40Z

If we're not providing guidance on use of the other AS terms, then we can see if and how people do add them. If there's a need and some emerging best practice, we can clarify in the future.

Propose that we can close the issue, as we can refer to a dataset description with context.

aisaac · 2018-10-02T08:25:00Z

As discussed in the call on 19-09-2018, the group does not see an objection to following the currently proposed approach, far from it :-). But considering that solutions are not so clearly laid out in individual tickets (at least because they have an impact on several tickets, here for example #34, #35 and #38), it's preferable to wait and see what the solution looks like in the spec, and then assess how happy we are with the proposed pattern addressing the original case in this ticket.

azaroth42 · 2019-03-20T17:37:31Z

Call of 2019-03-20: Agree to close, fixed. We get this with our own context, and can use the same pattern of extension and reuse as the Presentation API and the Annotation model. Implementors can use whatever features they like without affecting the processing mode.

azaroth42 added enhancement crawling discuss labels Jun 28, 2018

azaroth42 mentioned this issue Jul 11, 2018

AS publishing granularity level #34

Closed

jpstroop changed the title ~~Descriptive properties for AS data~~ Descriptive properties for ActivityStream data Jul 11, 2018

azaroth42 mentioned this issue Jul 11, 2018

Do we need a Dataset resource? #38

Closed

aisaac mentioned this issue Feb 20, 2019

connecting different activity streams #42

Closed

azaroth42 self-assigned this Feb 20, 2019

azaroth42 closed this as completed Mar 20, 2019
