Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attributes to API data classes #150

Merged
merged 6 commits into from Oct 5, 2022
Merged

Add attributes to API data classes #150

merged 6 commits into from Oct 5, 2022

Conversation

soldni
Copy link
Member

@soldni soldni commented Sep 30, 2022

This PR adds metadata to API data classes.

API data classes and mmda types differ in a few significant aspects:

  • id and type (and text for SpanGroup) are stored in metadata for mmda types; in the APIs, they are part of the top-level attributes.
  • metadata can store arbitrary content in the mmda types; in the data API, all attributes that are not explicitly declared are dropped.
    • we expect applications that require specific fields to declare custom Metadata and SpanGroup/BoxGroup classes that inherit from their parent class. For an example, see tests/test_internal_ai2/test_api.py
    • metadata entries are mapped to an attributes field to match how data is stored in the Annotation Store

@soldni soldni changed the title Add metadata to API data classes Add attributes to API data classes Oct 3, 2022
@soldni
Copy link
Member Author

soldni commented Oct 3, 2022

@cmwilhelm per @yoganandc suggestion, I'm now mapping what is the metadata field in MMDA to the attributes field in the Annotation Store.

This should better fit the design of the annotation store. All previous considerations regarding attributes being typed are still valid.

Let me know what you think!

@soldni soldni requested a review from yoganandc October 3, 2022 21:50
Copy link
Collaborator

@yoganandc yoganandc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was hoping for something much simpler...

can we just have a

class Annotation(BaseModel):
  id: Optional[str]
  type: Optional[str]
  attributes: Dict[str, Any] = {}

class BoxGroup(Annotation):
class SpanGroup(Annotation):

and then in to_mmda()

pydantic_annotation.attributes = dict(mmda_annotation.metadata)

@soldni
Copy link
Member Author

soldni commented Oct 3, 2022

i was hoping for something much simpler...

can we just have a

class Annotation(BaseModel):
  id: Optional[str]
  type: Optional[str]
  attributes: Dict[str, Any] = {}

class BoxGroup(Annotation):
class SpanGroup(Annotation):

and then in to_mmda()

pydantic_annotation.attributes = dict(mmda_annotation.metadata)

The tricky part in this is that, in mmda, we made the decision to store id and type in the metadata. in fact, mmda_annotation.id, mmda_annotation.type, and mmda_annotation.text are just aliases to mmda_annotation.metadata.id, mmda_annotation.metadata.type, mmda_annotation.metadata.text (where aliases are actually implemented through descriptors).

The reason for such decision was that nor id nor type are actually required in mmda, so we moved off the top level but maintained aliases for backward compatibility. So, when converting to a pydantic object that conforms to SPP APIs, we have to remove them from metadata.

@yoganandc
Copy link
Collaborator

So, when converting to a pydantic object that conforms to SPP APIs, we have to remove them from metadata.

👍 a little more logic but still fits the overall structure im proposing right?

@cmwilhelm
Copy link
Contributor

cmwilhelm commented Oct 3, 2022

@yoganandc the complexity here I think is @soldni responding to my direct request to have some sort of document schema for however a given model is using metadata. If model authors will be dumping principal inference data into them, we need a way to specify what the shape of that data is. Otherwise no one is going to be able to make sense of it later.

explicitly removing `id`, `text`, and `type` is no longer reuqired bc
they are automatically ignored.
Copy link
Collaborator

@yoganandc yoganandc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me then. @cmwilhelm maybe you have more thoughts since you're working on the client PR?

Copy link
Contributor

@cmwilhelm cmwilhelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM but I had one small suggestion.

We'll also need to bump version in setup.py again because we've claimed 0.0.43 :)

if not hasattr(inherit_cls, "__annotations__"):
continue
if "attributes" in inherit_cls.__annotations__:
return inherit_cls.__annotations__["attributes"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do this via:

@classmethod
def get_metadata_cls(cls) -> Type[Attributes]:
    return cls.__fields__["attributes"]["type"]

rather than using the low level python api and traversing the hierarchy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@soldni soldni merged commit 8ffcc29 into main Oct 5, 2022
@soldni soldni deleted the lucas/metadata-api branch October 5, 2022 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants