View metadata #9
Comments
What if a view has annotations on more than one medium? For example when all texts recognized by Tesseract/EAST are run through NER with the results in one view? It seems weird to have dozens or hundreds of views. Instead of using "medium" as a metadata property we could use a similar thing as when referring to an identifier in a particular view ("v1:tok12") and have something like "m3:seg8", which would require identifiers of views and media to be unique (unless we do "view:v1:tok2" and "medium:m2:seg6"). Another option is to allow sets of media, which is maybe somewhere on the lab whiteboard. |
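The prefixed-identifier idea above can be sketched in a few lines of Python. This is only an illustration: `Ref`, `parse_ref` and `container_kind` are hypothetical names, not part of any MMIF library, and the sketch assumes the two-part form ("v1:tok12", "m3:seg8") rather than the longer "view:v1:tok2" form.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    container: Optional[str]  # view or medium identifier; None for a local reference
    local_id: str             # annotation or segment identifier


def parse_ref(ref: str) -> Ref:
    """Split 'v1:tok12' into ('v1', 'tok12'); a bare 'tok12' has no container."""
    container, sep, local_id = ref.rpartition(":")
    return Ref(container if sep else None, local_id)


def container_kind(container: str, view_ids: set, medium_ids: set) -> str:
    """Say whether a container identifier names a view or a medium.

    This only works when the two identifier sets do not overlap, which is
    the uniqueness requirement mentioned in the comment above.
    """
    if container in view_ids and container in medium_ids:
        raise ValueError(f"ambiguous identifier: {container}")
    if container in view_ids:
        return "view"
    if container in medium_ids:
        return "medium"
    raise KeyError(container)
```

With unique identifiers, `container_kind(parse_ref("m3:seg8").container, {"v1"}, {"m3"})` resolves to a medium without needing a "medium:" prefix.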
👍 totally agree. I think in the end, we want to keep an app directory that keeps track of existing tools and automatically generates app IDs as URIs based on their metadata (name, version, ...). I started a GitHub repo for that purpose a while ago, but it didn't pick up as we haven't had a systematic way to represent app metadata. |
So I think we talked about having media as a special type of view, and in that sense we can keep IDs for views (and media) all unique. I like the idea of having a single view be populated by only a single app. However, inside the app, it can do many different things and generate many different things, including media (as in your example of OCR + NER). So in that case having a special type of view for media would require the app to generate at least two views. What do you think? |
I had not seen the clams-apps repo, but I like where you were going there so let's talk about this at some point. Having a URI by the way will still allow third-party tool creators to use MMIF. |
So we agree on each view being created by a single tool. And we can still allow a single tool to create more than one view, and indeed also media (primary data). I am a bit uncomfortable with having the identifiers be unique across views and media because they are separate things, but it is better than having monstrosities like "view:v3:ne2". Also, I think I was smoking something when I wrote "m3:seg8", and I need to get more conceptual clarity on this all. |
Well, I now think that it may be okay for a view to have the same identifier as a medium. The reason is that only views have annotations and only media have offsets of some kind, so you always know which one you are referring to. We have a mechanism to prefix the view identifier to a reference to an annotation identifier ("v1:ne3") for those cases where annotations in a view can depend on one or more external views. The same situation can exist with media if one view refers to more than one of them, and we need a mechanism to deal with that if we allow it (and I am leaning towards allowing that and maybe having sets of media). I was suggesting to have "medium" as a metadata property; maybe we need it on the instance as well:
{ "@type": "Segment", "medium": "m1", "start": 0, "end": 5 }
{ "@type": "Segment", "medium": "m2", "start": 0, "end": 12 }
It would only be used if the default from the metadata is not right. The "medium" metadata property will need to be a list unless we have sets of media. |
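One way to read the "default from the metadata" rule above is as a simple lookup with a fallback. The sketch below is an assumption about how a consumer might resolve it; `medium_of` is a hypothetical helper, and it assumes the view-level "medium" is either a single identifier or a list of identifiers.

```python
def medium_of(annotation: dict, view_metadata: dict) -> str:
    """Return the medium an annotation refers to.

    An explicit "medium" on the annotation wins; otherwise fall back to the
    view-level default, which is only usable when it names exactly one medium.
    """
    if "medium" in annotation:
        return annotation["medium"]
    default = view_metadata.get("medium")
    if isinstance(default, str):
        return default
    if isinstance(default, list) and len(default) == 1:
        return default[0]
    raise ValueError("no unambiguous medium for this annotation")


# The two Segment instances from the comment, in a view over two media.
metadata = {"medium": ["m1", "m2"]}
seg_a = {"@type": "Segment", "medium": "m1", "start": 0, "end": 5}
seg_b = {"@type": "Segment", "medium": "m2", "start": 0, "end": 12}
```

Under this reading, a view whose metadata lists several media forces every annotation to name its medium explicitly, which is the cost of allowing lists there.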
We have decided to create a repository of applications on the CLAMS website so we can use a tool or application metadata property to refer to a URI that has all the metadata. We need to think about how this works out for a LAPPS service. This probably warrants a new issue or further discussion in the current specifications. |
For the On the {
"views": [
{
"id": "v1",
"metadata": {
"producer": {
"name": "some_name",
"version": "some_version",
"description": "some_description",
"wrappee_version": "other_version",
...
},
# no "medium" field here
"creation_time": "some_time"
},
"annotations": [
{
"@type": "NamedEntity",
"properties": {
"id": "a1",
"NE_type": "http://...../person",
"anchor": {
"type": "char_offset",
"medium": "m1",
"anchors": [12, 18] # structure of this object is defined in the anchor vocabulary for each anchor type.
}
}
},
{ # more annotations }
]
},
{ # more views }
]
} |
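To make the per-annotation "anchor" object above concrete, here is a small Python sketch of how a consumer might follow it back to the primary data. It rests on assumptions not settled in this thread: `anchored_text` is a hypothetical helper, media are modelled as a plain dict from medium identifier to text, and "char_offset" anchors are taken to be a start/end pair with an exclusive end.

```python
def anchored_text(annotation: dict, media: dict) -> str:
    """Return the text span an annotation points at via its anchor."""
    anchor = annotation["properties"]["anchor"]
    if anchor["type"] != "char_offset":
        # Other anchor types would be defined in the anchor vocabulary.
        raise NotImplementedError(anchor["type"])
    start, end = anchor["anchors"]  # assumed: [start, end) character offsets
    return media[anchor["medium"]][start:end]


# Hypothetical primary data for medium "m1".
media = {"m1": "Last Monday Barack spoke."}

annotation = {
    "@type": "NamedEntity",
    "properties": {
        "id": "a1",
        "NE_type": "http://...../person",
        "anchor": {"type": "char_offset", "medium": "m1", "anchors": [12, 18]},
    },
}

print(anchored_text(annotation, media))
```

Because the medium lives on the anchor, a single view can hold annotations over several media without any view-level "medium" field.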
11ffc06 proposes addition of |
#60 implemented an omnipotent |
There are other reasons why it would be nice if they do not collide and I even put something along those lines in the new specifications. So I think we are good there. |
This issue is somewhat stale in that many things have changed since our last discussion here, especially w.r.t. how
I'll leave this issue for now, but open separate issues for two problems above so that further discussion won't be a mingle mangle. |
I think this issue can be closed now, unless there's a remaining question that's not transferred to the above two new issues. |
Yup, closing it. |
For LIF, the "contains" field had a dictionary of types and for each type some metadata like "producer" and other properties defined in the metadata in the vocabulary ("tagSet" etcetera). All metadata were inside of "contains".
We need metadata on the view (outside "contains") like "timestamp" (or "creation-time") and "dependsOn". And once we do that we need a theory on what things are in the "contains" dictionary and what things are not. I would like it if things like "producer" and "tool-version" and "tool-wrapper-version" are defined on the view and not the annotation types in "contains" and reserve the latter for properties defined in the vocabulary metadata. That will only work if a view is created by only one component, which is something we never did for LAPPS, but which is a restriction that I like because (1) views are now read-only after they are created and (2) we lose the redundancy of having "producer" and "tool-version" be repeated for each annotation type.
So I am thinking something like this:
At the moment, "producer" and "medium" are in the metadata of "Annotation", but that does not fit nicely with the MMIF above. I am also thinking that we should allow a URI as the value of "producer".
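The split between view-level metadata and "contains" can be illustrated with a small check. This is a sketch under assumptions, not the example originally attached to this issue: `misplaced_keys` is a hypothetical helper, the key names are the ones mentioned above, and the view layout (a "metadata" object holding "contains") is assumed.

```python
# Keys that, under the proposal above, belong on the view rather than on
# the individual annotation types inside "contains".
VIEW_LEVEL_KEYS = {
    "producer", "tool-version", "tool-wrapper-version", "timestamp", "dependsOn",
}


def misplaced_keys(view: dict) -> dict:
    """Map each annotation type in "contains" to the view-level keys found in it."""
    problems = {}
    contains = view.get("metadata", {}).get("contains", {})
    for annotation_type, type_metadata in contains.items():
        found = sorted(type_metadata.keys() & VIEW_LEVEL_KEYS)
        if found:
            problems[annotation_type] = found
    return problems


# A LIF-style view where "producer" is repeated per annotation type.
lif_style = {
    "id": "v1",
    "metadata": {
        "contains": {
            "Token": {"producer": "some_tool", "tagSet": "some_tagset"},
            "NamedEntity": {"producer": "some_tool"},
        }
    },
}

# The same view with producer information moved up to the view.
proposed = {
    "id": "v1",
    "metadata": {
        "producer": "some_tool",
        "timestamp": "some_time",
        "contains": {"Token": {"tagSet": "some_tagset"}, "NamedEntity": {}},
    },
}
```

Here `misplaced_keys(lif_style)` flags both annotation types, while `misplaced_keys(proposed)` returns an empty dict, which shows the redundancy that the one-producer-per-view restriction removes.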