Serializing Mentions

Keith Alcock edited this page Mar 28, 2023 · 4 revisions

The org.clulab.odin package of this project defines a class hierarchy of Mentions.

  • Mention
    • TextBoundMention
    • EventMention
    • RelationMention
    • CrossSentenceMention

Other CLU Lab projects like reach extend this hierarchy.

  • TextBoundMention
    • BioTextBoundMention
      • CorefTextBoundMention
  • EventMention
    • BioEventMention
      • CorefEventMention
  • RelationMention
    • BioRelationMention
      • CorefRelationMention

These mentions, and others not yet invented, need to be serialized or exported in various formats: most notably JSON, but also formats we don't know about yet. This should require as little change to the Mentions as possible, preferably none, and one serialization format should have little or no effect on the others. This page describes how that has been arranged.

Class Diagram

[Figure: class diagram]

  • Mentions themselves don't know anything about JSON serialization or any other format except for standard Java object serialization.
  • All fields that need to be serialized are public so that they can be accessed by wrapper classes.
  • For each format to be supported, each Mention class should have a "wrapper" class specific to that Mention which supports the serialization format. In this example, MentionOps classes support JSON serialization by converting mentions via jsonAST() to JValues and then via json() to Strings.
  • Since this kind of JSON serialization is part of processors, it is considered the "native" format. Other formats can be supported with, for example, JsonldOps for JSON-LD.
  • Generally, something needs to be able to convert a generic Mention to the specific kind of MentionOps for that mention without it being the Mention itself. In this case it is the companion object MentionOps and specifically its apply() method.
  • Once a MentionOps has been obtained for a Mention, whenever that Mention itself contains another mention, asMentionOps() is called on the contained mention to obtain its MentionOps. This method can be overridden in subclasses to obtain the correct kind of MentionOps for subclasses of Mentions.
  • MentionOps has methods to take care of the recursion for arguments (argsAST) and paths (pathsAST) in its subclasses.
  • The implementation of MentionOps uses a documentEquivalenceHash. Various fields of a Document are mutable, and changing them would invalidate the hash, but it is assumed that they don't change during serialization.
  • If something is serializing many Mentions and knows that they all come from the same Document, it can cache the documentEquivalenceHashes somewhere and arrange for the specialized MentionOps class to consult the cache rather than recalculating the hash.
  • The conversion of a Mention to MentionOps is no longer implicit. Package objects are also no longer used for this kind of serialization.
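
The arrangement described above can be sketched in a few lines of Scala. This is a minimal illustration, not the code from processors: the Mention classes are cut down to two fields, the JSON is built from plain strings rather than JValues, and quote is a hypothetical helper that does no escaping. Only the names MentionOps, jsonAST, json, asMentionOps, triggerAST and apply come from the text.

```scala
// Simplified stand-ins for the odin classes (assumption: the real ones have many more fields).
sealed trait Mention { def label: String; def text: String }
final case class TextBoundMention(label: String, text: String) extends Mention
final case class EventMention(label: String, text: String, trigger: TextBoundMention) extends Mention

// Base wrapper. All knowledge of the output format lives here, none in Mention.
abstract class MentionOps(val mention: Mention) {
  // Stand-in for jsonAST(): field names paired with already rendered JSON values.
  def jsonAST: Seq[(String, String)] = Seq(
    "label" -> MentionOps.quote(mention.label),
    "text" -> MentionOps.quote(mention.text)
  )

  // Stand-in for json(): renders the fields as a JSON object.
  def json: String = jsonAST
    .map { case (key, value) => MentionOps.quote(key) + ":" + value }
    .mkString("{", ",", "}")

  // Hook for nested mentions; subclasses may override it to pick other wrappers.
  def asMentionOps(nested: Mention): MentionOps = MentionOps(nested)
}

class TextBoundMentionOps(tb: TextBoundMention) extends MentionOps(tb)

class EventMentionOps(em: EventMention) extends MentionOps(em) {
  // Overridable; goes through asMentionOps so the trigger gets its own wrapper.
  def triggerAST: String = asMentionOps(em.trigger).json

  override def jsonAST: Seq[(String, String)] =
    super.jsonAST :+ ("trigger" -> triggerAST)
}

object MentionOps {
  // Hypothetical helper with no escaping: good enough for a sketch, not for real data.
  def quote(s: String): String = "\"" + s + "\""

  // Explicit (non-implicit) conversion that picks the wrapper from the runtime type.
  def apply(mention: Mention): MentionOps = mention match {
    case tb: TextBoundMention => new TextBoundMentionOps(tb)
    case em: EventMention => new EventMentionOps(em)
  }
}
```

With these definitions, a client holding only a Mention would call MentionOps(mention).json and never needs to name EventMentionOps itself. The Mention classes contain no serialization code, so a second format could be added as a parallel wrapper hierarchy without touching them.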

Sequence Diagram

[Figure: sequence diagram]

  1. Something, called Client here, arranges to acquire a reference to a Mention, in this example an EventMention. This could be as straightforward as generating an instance with the new operator.
  2. The client acquires the reference. In the general case, the client will not know exactly what kind of Mention it has. It could be any subclass of Mention.
  3. To convert the Mention to JSON, the client needs to get the appropriate kind of MentionOps to wrap the Mention and use its knowledge of the Mention type and the serialization format to extract the necessary fields. The Mention itself unfortunately does not know what kind of MentionOps it needs. As explained above, MentionOps are used for serialization of which the Mention is blissfully unaware. The client should know what kinds of Mentions it is dealing with and where to find the correct wrapper. Here the client knows it is using org.clulab.odin.Mentions and that org.clulab.odin.serialization.json.MentionOps knows how to map them. A different client, like reach, which has different kinds of Mentions, would need to ask something else.
  4. The MentionOps companion object uses runtime type information and decides to pick an EventMentionOps to provide JSON serialization for an EventMention and creates one.
  5. Object MentionOps receives the returned EventMentionOps.
  6. It returns the value to the client.
  7. A request for JSON conversion is sent to the MentionOps. Again, the client will generally only know that some subclass of MentionOps has been returned.
  8. As part of producing the JSON output, the MentionOps calls jsonAST on itself.
  9. The MentionOps uses knowledge it has of the mention type it wraps to request specific information from the Mention, here its text.
  10. The Mention provides the information. Mentions have many instance variables, so this small loop is usually repeated multiple times.
  11. Some instance variables of Mentions refer to more mentions. In this example, the EventMention has a trigger, which is a TextBoundMention. In general, there could be any subclass of TextBoundMention here. In order to convert the trigger to JSON, the EventMentionOps calls an internal method called triggerAST. Subclasses of EventMentionOps may want to override this method.
  12. Since the exact kind of trigger may not be known, the EventMentionOps calls its own asMentionOps on the trigger. Subclasses can override this conversion if necessary.
  13. In this case, asMentionOps defers to the companion object's apply method for the conversion. This is the same procedure that the client used for the original EventMention. A subclass of EventMentionOps that wraps an EventMention with a different kind of trigger would want to perform a different conversion.
  14. The appropriate kind of MentionOps is returned.
  15. The MentionOps for the trigger has been acquired.
  16. The EventMentionOps uses it to get the jValue for the trigger.
  17. It gets returned.
  18. The jValue for the trigger is known to EventMentionOps.
  19. It gets combined with other JSON information to produce the jValue for the entire EventMention.
  20. The jValue is converted into a String containing the JSON serialization of the EventMention. This is performed in a library dependency.
  21. The EventMentionOps receives the json.
  22. It gets returned to the client.
  23. At this point, the MentionOps is no longer needed. As soon as no references to it are held (by the client, for example), it can be finalized.
  24. After the MentionOps, which had wrapped the Mention, has been finalized, the Mention itself can be finalized as well.
  25. The client can now work with the JSON serialization in string form.
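
Steps 11 to 13 note that a subclass of EventMentionOps can override asMentionOps when the trigger is a different kind of mention. The sketch below shows one way a downstream project might do that. Everything in it is a simplified assumption: the classes carry only a text field, JSON is assembled from strings without escaping, and BioMentionOps, BioEventMentionOps, BioTextBoundMentionOps, the Json helper and the grounding field are hypothetical names chosen for illustration, not taken from reach.

```scala
// Hypothetical, simplified stand-ins; only the fields needed for the walk-through.
trait Mention { def text: String }
class TextBoundMention(val text: String) extends Mention
class EventMention(val text: String, val trigger: TextBoundMention) extends Mention
// Downstream extension with one extra (hypothetical) field.
class BioTextBoundMention(text: String, val grounding: String) extends TextBoundMention(text)

// Hypothetical string-based JSON helper; performs no escaping.
object Json {
  def quote(s: String): String = "\"" + s + "\""
  def obj(fields: Seq[(String, String)]): String =
    fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")
}

class MentionOps(mention: Mention) {
  // Steps 9-10: ask the wrapped Mention for its fields.
  def jsonAST: Seq[(String, String)] = Seq("text" -> Json.quote(mention.text))
  // Steps 7-8 and 20: build the AST, then render it.
  def json: String = Json.obj(jsonAST)
  // Steps 12-13: by default, defer to the companion object's apply.
  def asMentionOps(nested: Mention): MentionOps = MentionOps(nested)
}

class EventMentionOps(em: EventMention) extends MentionOps(em) {
  // Steps 11-18: obtain a wrapper for the trigger and render it.
  def triggerAST: String = asMentionOps(em.trigger).json
  // Step 19: combine the trigger with the other fields.
  override def jsonAST: Seq[(String, String)] = super.jsonAST :+ ("trigger" -> triggerAST)
}

object MentionOps {
  // Steps 3-6: choose the wrapper from the runtime type.
  def apply(mention: Mention): MentionOps = mention match {
    case em: EventMention => new EventMentionOps(em)
    case other => new MentionOps(other)
  }
}

// Downstream layer: knows about the extra field and redirects nested conversions.
class BioTextBoundMentionOps(btb: BioTextBoundMention) extends MentionOps(btb) {
  override def jsonAST: Seq[(String, String)] =
    super.jsonAST :+ ("grounding" -> Json.quote(btb.grounding))
}

class BioEventMentionOps(em: EventMention) extends EventMentionOps(em) {
  override def asMentionOps(nested: Mention): MentionOps = BioMentionOps(nested)
}

object BioMentionOps {
  def apply(mention: Mention): MentionOps = mention match {
    case btb: BioTextBoundMention => new BioTextBoundMentionOps(btb)
    case em: EventMention => new BioEventMentionOps(em)
    case other => MentionOps(other)
  }
}
```

In this sketch the same EventMention serializes differently depending on which companion object the client asks. MentionOps(mention).json treats the trigger as a generic mention and omits grounding, while BioMentionOps(mention).json includes it. This is why the client, not the Mention, has to know where to find the correct wrapper.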