Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

view.metadata.contains have items (at_types) that are not actually in annotations list #205

Closed
keighrim opened this issue Apr 5, 2023 · 1 comment · Fixed by #212
Closed
Milestone

Comments

@keighrim
Copy link
Member

keighrim commented Apr 5, 2023

Because

Currently, mmif-python SDK does some magic to make sure the information in the contains metadata is true.
Namely, when a develop adds a new annotation object using view.add_annotation() method, it automatically adds the @type of the annotation to the contains dict.

def add_annotation(self, annotation: 'Annotation', overwrite=False) -> 'Annotation':
"""
Adds an annotation to the current view.
Fails if there is already an annotation with the same ID
in the view, unless ``overwrite`` is set to True.
:param annotation: the :class:`mmif.serialize.annotation.Annotation`
object to add
:param overwrite: if set to True, will overwrite an
existing annotation with the same ID
:raises KeyError: if ``overwrite`` is set to False and
an annotation with the same ID exists
in the view
:return: the same Annotation object passed in as ``annotation``
"""
self.annotations.append(annotation, overwrite)
self.new_contain(annotation.at_type)
return annotation

However, it does not provide anything for the opposite direction. Namely,

  1. when a developer add a key (at_type URI) to the contains dict (this usually happens when there's some additional information accompanies as a value map the at_type key),
  2. but then the actually annotation result does not have any of that at_type,
  3. the output MMIF string (serialized as JSON) will say that this view has that type, which is not true

And I don't think that's an ideal behavior. But adding automatic removal of not-used at_types from contains dict seems to too much magic-y (but again, we're already doing some magic. why not do more?)

Done when

We discuss whether the current way is a desired way of doing the business. If not, add an automatic "filtering" in the serialization code. Or we can provide a method to invoke the filtering manually only on demand.

Additional context

Just to be clear, I don't think having a false information should make the MMIF invalid, and thus the output MMIF from the above example still should be a valid MMIF.

@keighrim
Copy link
Member Author

keighrim commented Apr 12, 2023

One more thing to consider is calling jsonschema validation code right after the "filtering" happens to check the output is valid MMIF. An invalid MMIF will eventually turn into an error when it's passed down to a downstream app, but I think having an additional validation net at the end of the "run" of an analysis app is generally good practice basically because it can catch problems even when there are no downstream apps (in test code, or running in "standalone" mode, for example)

(maybe this comment should be an independent issue)

@keighrim keighrim added this to the v1.0.0 milestone Apr 24, 2023
This was referenced May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
1 participant