Descriptive properties for ActivityStream data #36
Comments
While I find the list of fields relevant, I am very unconvinced that we should use AS itself to represent what is essentially a dataset-level description. Maybe some of the elements in the "potential" part are at the level of the Activity Stream itself, but all the elements in the "important" part belong to what is behind the Activity Stream, i.e. a dataset.
And unless I am pointed to a serious process on the part of the AS designers to define and meet the requirements for describing datasets, I would recommend using W3C and Linked Data standards or established vocabularies that are devoted to this, like DCAT or VoID. We could have the AS bear only the fields that are specific to the business of generating and publishing an AS, and then link to an instance of, say, dcat:Dataset, which would carry the dataset-level metadata.
This is much better for interoperability and effectiveness. Soon a lot of institutions, especially in the EU, will feel strongly encouraged to publish descriptions of their datasets using one of these vocabularies anyway.
And otherwise we'll again be in a "shadowing" situation, like when the Presentation API encourages copying some of the original descriptive metadata into its own metadata element. For Presentation I see a good case; for Activity Streams I think the case is much thinner.
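To make the linking pattern suggested above more concrete, here is a minimal sketch of an Activity Stream that carries only stream-level fields and points to a separate dcat:Dataset description. The URIs, titles, and the choice of `seeAlso` as the linking property are assumptions made for illustration only; nothing here is taken from an agreed specification.

```python
import json

# Hypothetical identifiers, used only for illustration.
DATASET_URI = "https://example.org/datasets/photos"
STREAM_URI = "https://example.org/activity-streams/photos"

# Dataset-level description, expressed with DCAT and Dublin Core terms.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": DATASET_URI,
    "@type": "dcat:Dataset",
    "dcterms:title": "Example photograph collection",
    "dcterms:description": "Digitised photographs from an example institution.",
    "dcat:keyword": ["photographs"],
}

# The Activity Stream keeps only stream-level fields and links to the dataset.
# "seeAlso" as the linking property is an assumption of this sketch.
stream = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": STREAM_URI,
    "type": "OrderedCollection",
    "seeAlso": [{"id": DATASET_URI, "type": "Dataset"}],
}

print(json.dumps(stream, indent=2))
```

With this split, the descriptive metadata lives in one place (the dataset description) and the stream only references it, which avoids the "shadowing" concern raised above.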
|
AS even redefined their own property for 'name'. How interoperable is that? |
The intent of using AS terms here is to maximize the likelihood that generic producers and consumers can be used to produce and consume the resulting documents. By creating our own profile that imports other ontologies into the AS document, we're making certain that only IIIF organizations will be interoperable. This is exacerbated by the ongoing DXWG work to produce a new version of DCAT -- we would be working with a moving target, or instantly out of date. Secondly, I think we need to be careful to distinguish between the various resources in play:
What is useful for discovery is the third bullet above -- the machine readable information. Which is at least two steps removed from the ActivityStream. And in terms of scope, coming up with a profile of DCAT in JSON-LD that is congruous with AS and IIIF design patterns seems like significant expansion of the charter into an area of limited value and not insignificant complexity. Proposals:
|
On using AS or not:
- who is using these AS fields and who is using DCAT? AS fields are idiosyncratic in the sense that they've been designed while ignoring everything else done by the community.
- if the level of what we describe is that of datasets then we should use the standard for describing datasets
- DCAT is undergoing revision, but the DXWG has been very clear that the update is backward compatible. I'm not expecting any DCAT property to disappear, especially those corresponding to the elements identified by Jeff.
My proposal would be to start with Jeff's elements, but using their DCAT implementation not the AS ones.
|
@mixterj I have tried looking at your metadata fields (other than the contentious attributedTo) from the perspective of whether they would apply to the level of the AS itself or to the datasets that the AS "publishes". I think 'name', 'summary', 'tags' and 'audience' would rather be at the dataset level. Even when we want to describe the situation whereby an organization publishes an AS for someone else's dataset, I don't see this organization adding information for such fields on top of what would be "inherited" from the dataset behind the AS. I mean, if we wanted a 'summary' strictly at the AS level, to me it would be something like "This Activity Stream publishes the resources from dataset X as well as updates about them". And I don't see much value in that. The fields 'startTime', 'updated' and 'generator', in contrast, seem to be much more at the level of the AS itself. |
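As a rough illustration of the split described in the comment above, the sketch below sorts a flat description into dataset-level and stream-level parts. The field grouping follows the comment; the function name, the sample values, and the treatment of any other field (such as attributedTo) as "undecided" are assumptions of this sketch.

```python
# Field grouping as proposed in the comment above.
DATASET_LEVEL = {"name", "summary", "tags", "audience"}
STREAM_LEVEL = {"startTime", "updated", "generator"}


def split_by_level(description):
    """Split a flat description into (dataset, stream, undecided) dicts.

    Hypothetical helper: fields not listed in either group are returned
    as "undecided" rather than being assigned to a level.
    """
    dataset_part = {k: v for k, v in description.items() if k in DATASET_LEVEL}
    stream_part = {k: v for k, v in description.items() if k in STREAM_LEVEL}
    undecided = {
        k: v
        for k, v in description.items()
        if k not in DATASET_LEVEL and k not in STREAM_LEVEL
    }
    return dataset_part, stream_part, undecided


# Sample values are invented for illustration.
example = {
    "name": "Example photograph collection",
    "summary": "Digitised photographs from an example institution.",
    "updated": "2018-09-19T00:00:00Z",
    "generator": "example-harvest-tool",
    "attributedTo": "https://example.org/org",
}

dataset_part, stream_part, undecided = split_by_level(example)
```

Here `dataset_part` would hold 'name' and 'summary', `stream_part` would hold 'updated' and 'generator', and 'attributedTo' stays undecided, matching its contentious status in the discussion.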
@aisaac In principle I agree, but I would suggest that the AS publisher could actually create an AS that serves as a kind of Dataset in and of itself. For example, publishing a subset of items in a CONTENTdm collection (a single Dataset) that are Black & White photos or Manuscripts. In this situation, the AS publisher would in essence be creating their own unique view of the data and could apply unique subjects, names, etc. that may or may not be represented in the data publisher's dataset metadata, or at the very least would be more specific or curated for a given audience. I will admit that I am also slightly concerned about how consistently IIIF data publishers will publish and maintain dataset descriptions, and about the quality of those descriptions. If Aggregators may be more motivated to describe, in some distinguishing detail, the data they aggregate (collections of things), I do not see why they would not be encouraged to do so in a prescriptive way. As it stands now, the Aggregator needs to harvest every Manifest, hope there is metadata associated with it, parse the metadata, hope there is a IIIF Collection (of which I have not seen any in the datasets I have looked at), and then also maybe look for a Dataset description (VoID or otherwise). It just seems too inconsistent and haphazard to be really functional. Finally, I think we are really just debating semantics, i.e. where and how an Aggregator or AS Publisher would describe the data encompassed in the AS. |
@mixterj I wouldn't argue about who would create the dataset descriptions. It could well be the publisher of the AS, indeed. In fact for Europeana it may well be the case.
And about the publisher wanting to create a view on the data: yes, this seems like a valid use case. But I think it would be a bit strange if this view existed only as an AS. There must be something a bit more conceptual behind it, and I would be in favour of what you suggest, i.e. "the AS publisher just needs to create their own Dataset description". Talking about semantics, I could say it's more a "stream of curated data" than a "curated stream of data" ;-)
|
@aisaac yes, I agree with these points. If this is an agreed-upon approach, though, I would push hard for a consistent way to hook all of these components together so the aggregator is not left guessing/hoping - I want to be an aggregator, not an archeologist ;) It sounds like the main components we have here are Manifests, Collection Manifests, ActivityStreams, and Datasets. Does that sound about right? I guess my main concern is that we have a lightweight, consistent, and easily implementable solution. I will also admit that I am a firm believer that 'the perfect is the enemy of the good' ;) |
@mixterj I think we agree.
And also with the perfect being the enemy of the good. But I've seen what blatant violations of the one-to-one principle can lead to, so I tend to be obsessed about them ;-)
|
If we're not providing guidance on use of the other AS terms, then we can see if and how people do add them. If there's a need and some emerging best practice, we can clarify in the future. Propose that we can close the issue, as we can refer to a dataset description with |
As discussed in the call on 19-09-2018, the group has no objection to following the currently proposed approach, far from it :-). But since solutions are not so clearly laid out in individual tickets (not least because they have an impact on several tickets, here for example #34, #35 and #38), it's preferable to wait and see what the solution looks like in the spec, and then assess how happy we are with the proposed pattern for addressing the original case in this ticket. |
Call of 2019-03-20: agree to close, fixed. We get this with our own context, and can use the same pattern of extension and reuse as the Presentation API and the Annotation model. Implementors can use whatever features they like without affecting the processing mode. |
A list of AS properties that could be used to help aggregators/harvesters determine whether they are interested in crawling the AS.
Please note that all of the stringy properties are meant for harvesters/aggregators to ingest and build indexes around and are NOT intended for the IIIF Registry to index and make searchable.
Important (based on discussed issues)
Potential (not discussed but might be useful)