
Unifying POCO and ITypedElement

Ewout Kramer edited this page Apr 30, 2024 · 2 revisions

For almost a decade now, we have had two models for working with FHIR data in our SDK:

  • Using POCOs - the most popular option: plain C# classes with a wealth of utility functions provided by us.
  • Using the Element Model - flexible, and the only option if you want to use FhirPath or the validator.

The Element Model gave us the flexibility to work with incorrect data, or with custom resources. We even designed a metadata system (StructureDefinitionSummary) to go with it. For years we considered it the "master" model, which is why we used it for all advanced functionality like the validator and FhirPath. Simplifier and Firely Server also use the Element Model extensively.

This flexibility does not come for free. Having two models means we had to implement certain functionality twice (comparison logic and identifier resolution, for example), and the dynamic nature of the model makes it both slower and more memory-intensive, at least in our current implementation. The combination of a compositional, layered model with stacks of conversions is powerful, but ultimately complex and expensive.

As we turned our attention to performance and simplicity in recent years, we started to put more energy and time into working with the POCOs to attain higher throughput. This means the Element Model is starting to lag behind and become a burden for our software. Furthermore, the current architecture stops us from building a more performant design that would integrate with the newer parts of the SDK (e.g. the CQL SDK).

For SDK 6.0 we want - and need - to improve this situation, and this page details our plans.

Problem 1: The Parsing pipeline

As an example of the layers we introduced, take a look at the components involved for validating a file of FHIR Json using our validator:

```mermaid
flowchart LR
Json[(Json)] -- Parsed --> SourceNode
SourceNode -- Typed --> TypedElement
TypedElement -- Hierarchical ---> ScopedNode --> Validator([Validator])
```

As the validator navigates the tree presented by the ScopedNode, each of the layers re-iterates over the children and redoes the work, keeping references to all data involved, all the way back to the Json. Firely Server re-implemented the SourceNode logic to cache this (as did we in some places).

For most uses it would actually have been better to load the entire tree into an ElementNode (which we provide out of the box):

```mermaid
flowchart LR
Json[(Json)] -- Parsed --> SourceNode
SourceNode -- Typed --> TypedElement
TypedElement -- load into --> ElementNode
```

And once the tree is loaded, use that single in-memory instance, which releases all memory used to retain the original Json (and the intermediate steps):

```mermaid
flowchart LR
ElementNode2["ElementNode"] --> ScopedNode --> Validator([Validator])
```

In fact, this is the behaviour enforced by the new validator, which only accepts an ElementNode (or POCO) as input. As an important side effect, all parse errors are raised while loading into the ElementNode, which for most uses is preferable to the lazy checking done by the original setup. There is now a clear parsing stage followed by a validation stage.
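As a sketch, the eager-loading pipeline could be expressed as below. The type and method names (FhirJsonNode.Parse, ToTypedElement, ElementNode.FromElement, ScopedNode) are taken from the current public API as we understand it, so verify them against your SDK version:

```csharp
using Hl7.Fhir.ElementModel;
using Hl7.Fhir.Serialization;
using Hl7.Fhir.Specification;

// 1. Parse the raw Json into an untyped ISourceNode.
ISourceNode source = FhirJsonNode.Parse(jsonText);

// 2. Add type information using the POCO-based metadata provider.
ITypedElement typed = source.ToTypedElement(new PocoStructureDefinitionSummaryProvider());

// 3. Load the whole tree into an in-memory ElementNode. All parse errors
//    surface here; the Json text and intermediate layers can now be released.
ElementNode loaded = ElementNode.FromElement(typed);

// 4. Wrap it in a ScopedNode so children can navigate back to their parents.
var scoped = new ScopedNode(loaded);
```

After step 3, nothing references the original Json anymore, which is exactly the memory advantage described above.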

When using POCOs, the situation is slightly better, certainly after we introduced the new POCO-based parsers:

```mermaid
flowchart LR
Json[(Json)] -- parse --> POCO
```

and then:

```mermaid
flowchart LR
POCO --> TypedElement --> ScopedNode --> Validator([Validator])
```

The ScopedNode in these pictures is a wrapper around ITypedElement that adds a reference from children back to their parents, making it possible to resolve canonicals in the container and to find the encompassing resources - a feature that FhirPath requires.
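For the POCO path, a minimal sketch (assuming the newer FhirJsonPocoDeserializer and the ToTypedElement/ScopedNode adapters; again, verify the names against your SDK version) could look like:

```csharp
using Hl7.Fhir.ElementModel;
using Hl7.Fhir.Model;
using Hl7.Fhir.Serialization;

// Parse straight into a POCO - errors surface here, not lazily later on.
var deserializer = new FhirJsonPocoDeserializer();
Resource poco = deserializer.DeserializeResource(jsonText);

// Adapt the POCO tree for the validator/FhirPath: expose it as an
// ITypedElement, then wrap it in a ScopedNode for parent navigation.
ITypedElement typed = poco.ToTypedElement();
var scoped = new ScopedNode(typed);
```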

Although the new POCO parsers improved the situation for those who know to use them, most parts of the SDK still use the "old" parser stack (e.g. the DirectorySource still uses the older, slower stack).

Action 1 - Continue switching to use the new parsers in the SDK where possible

All components of the SDK that do not require the flexibility of the Element Model will be updated to use the faster POCOs and POCO-based parsers. We already did this for the FhirClient with the introduction of IFhirSerializationEngine, a framework (using POCOs) that turns serialization into a service, so that it can easily be swapped out for the faster parsers. Note that in the FhirClient, we hid this feature behind a set of extension methods that select the desired flavour of parser.
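For illustration, selecting an engine on the FhirClient might look as follows. This is a sketch: the FhirSerializationEngineFactory methods and the SerializationEngine setting are quoted from memory and may differ in your SDK version:

```csharp
using Hl7.Fhir.Model;
using Hl7.Fhir.Rest;
using Hl7.Fhir.Serialization;

var client = new FhirClient("https://server.fire.ly");

// Use the strict, faster POCO-based engine...
client.Settings.SerializationEngine =
    FhirSerializationEngineFactory.Strict(ModelInfo.ModelInspector);

// ...or the lenient "ostrich" flavour that ignores recoverable errors.
client.Settings.SerializationEngine =
    FhirSerializationEngineFactory.Ostrich(ModelInfo.ModelInspector);
```

Because the engine is a service behind an interface, any component that takes an IFhirSerializationEngine can be moved to the new parsers without changing its own code.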

We will continue this journey by doing the same for all components that are now hard-coded to use the older parsers.

Action 2 - Sync error codes between parsers

Currently, the POCO parsers and the Element Model parsers each produce their own set of errors, with the newer POCO parsers using pre-defined, coded errors. It would be beneficial to ensure that the older parsers emit the same errors: the same codes and the same messages. Note that, since the Element Model is more flexible, it may accept some input that the POCO parsers cannot (e.g. we cannot store a repeating element in a POCO property that is not a list), but when an error is produced, it should at least be the same one. This also means that the flavours of parsers (Ostrich, backwards-compatible, etc.) will be slightly different across the parser families.

Action 3 - Improve performance of the Element model parsers

We introduced quite a few features in the Element Model parsers to retain round-tripping, even in the face of errors, and to support non-FHIR models (like CDA). We should check whether the advantages of this functionality still outweigh the downsides (performance, complexity). Since many of these features are not public, or are undocumented, we might remove these bits if, after 10 years of being present, it turns out we are not using them.

Problem 2 - Complex POCO classes

One of the reasons people might need to use the Element Model is its flexibility to support custom resources, or resources with partially incorrect or custom data. This is not a very common use case, but one that we would like to support. In essence, this could be easy, as long as the custom resources use the standard datatypes (and thus the standard datatype POCOs): resources are essentially nothing more than dictionaries of elements that are themselves standard datatypes. Currently, this is not easy: the POCOs have a large public surface, requiring additional attributes to be added and methods to be implemented before they can fully participate in the SDK's ecosystem. We have not clearly documented which of these elements are required for which functionality. For example, if you do not require validation by .NET attribute validation, would you still need to add some attributes to your custom resource? The only way to safely make this work is to define a StructureDefinition for your resource and then run the generator tool we are using.

Action 4 - Clean up the public surface of the generated POCOs

Over the years, we have accumulated interfaces, methods and attributes that might not be necessary anymore. For backwards-compatibility reasons, we have not touched those so far, but in the general interest of periodic cleanup, we should try to find the essential core set of methods that we need and remove the others. For example, the equality interface (IDeepComparable) could be replaced by extension methods using the current IReadOnlyDictionary<> implementation, or even by separate implementations of an IEqualityComparer. If we augmented our current IReadOnlyDictionary<> approach and implemented IDictionary<>, the copy methods could go too. This would move logic out of the POCOs into separate classes and simplify code generation up to the point where these classes may be coded by hand (although that would remain tedious).
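To illustrate the direction (this is a standalone sketch, not the SDK's actual code), deep equality can live entirely outside the POCO once a POCO exposes its elements as an IReadOnlyDictionary<string, object>:

```csharp
using System.Collections.Generic;

static class DeepComparison
{
    // Structural equality over element dictionaries: leaf values are compared
    // with Equals, nested element dictionaries are compared recursively.
    public static bool DeepEquals(
        IReadOnlyDictionary<string, object> left,
        IReadOnlyDictionary<string, object> right)
    {
        if (left.Count != right.Count) return false;

        foreach (var pair in left)
        {
            if (!right.TryGetValue(pair.Key, out var other)) return false;

            bool equal = pair.Value is IReadOnlyDictionary<string, object> l
                         && other is IReadOnlyDictionary<string, object> r
                ? DeepEquals(l, r)
                : Equals(pair.Value, other);

            if (!equal) return false;
        }

        return true;
    }
}
```

An extension method like this works for any class in the ecosystem that implements the dictionary view, generated POCO or not, which is exactly why moving logic out of the POCOs simplifies code generation.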

Problem 3 - Requirements to participate in the ecosystem

We have made an inventory of requirements that documents what each major part of the SDK requires from a datamodel. E.g. for the FhirPath engine, it is (currently) not necessary to provide a location for each node, but the validator does need a location to report errors. On the other hand, the validator does not need to be able to move up from a child to a parent, while FhirPath does require this feature (and since the validator runs the FhirPath engine as a subsystem, it indirectly requires it too). None of these, however, is necessary to be able to serialize the data, etcetera. We should encode these requirements in interfaces or base classes to be implemented (insofar as that has not yet happened). Good examples are set by ICqlConvertible and ICqlComparable - these are used by the FhirPath engine to implement the comparison operators. As long as datatypes implement them, they will be fine. However, most of the requirements are hidden and undocumented, so it is not clear what needs to be done.
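Purely as an illustration of what "encoding the requirements" could look like - every interface name below is hypothetical; only ICqlConvertible and ICqlComparable exist in the SDK today:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: each SDK component declares only what it needs.

// The validator needs a location to report errors against.
interface IHasLocation
{
    string Location { get; }
}

// FhirPath (and therefore, indirectly, the validator) needs to walk upwards.
interface IHasParent<TNode>
{
    TNode Parent { get; }   // null at the root of the tree
}

// Serialization only needs to enumerate children; none of the above.
interface IHasChildren<TNode>
{
    IEnumerable<TNode> Children();
}
```

A datamodel would then opt in to each piece of functionality by implementing the matching interface, instead of having to satisfy one monolithic contract.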

Action 5 - Encode and document the requirements

After the cleanup, we should make sure we have a small set of core interfaces that fully cover the needs of serialization, validation, FhirPath, CQL and the lesser components. We cannot look into the future, but since these would encompass everything we needed in the past 10 years of implementing FHIR, it is likely to be a good starter set ;-)

Problem 4 - Having two modelling paradigms

Having two separate datamodels causes code duplication, a larger public surface and more classes and concepts to deal with, both as a maintainer and as a user. Wouldn't it be nice if we could unify both models? The ScopedNode was the initial effort to make this a reality: both the validator and the FhirPath engine use this unified interface. The problem is that the interface is a sum of all requirements, on all types of nodes in the tree, from resources down to primitive datatypes. We can no longer check whether a node is a primitive, or implements a certain interface, since all of this is hidden behind the interface. For each change to a base class or interface, or each extra requirement from some part of the SDK, this interface needs to grow. Moreover, the only feasible implementation on top of POCOs is currently yet another adapter layer (as shown above), which creates new child nodes at every tree walk, because the underlying POCOs may have changed. What if we could just use a more "normal" model that looks like a tree of appropriately typed POCOs, each implementing exactly what is necessary? That is certainly possible for the POCOs themselves, but then - how would this work for the Element Model?

What if we could design a new DynamicResource that derives from Resource, and thus from Base? This resource could be based on a Dictionary, slightly slower than its "real" POCO siblings. It could use the standard POCOs for the datatypes of its elements; these are pretty flexible in accepting incorrect data (we are already using PrimitiveType.ObjectValue for this) and could therefore even be used to retain invalid data. Since it would reuse the logic for primitives, most of the required interfaces would automatically be available, and DynamicResource could probably cover the rest. Would it be possible to write a "parser" that takes an ITypedElement and produces a tree with a DynamicResource at the root, filled with normal POCO nodes, based on ITypedElement.InstanceType? How far can we stretch this model to accommodate incorrect data?
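A minimal standalone sketch of the idea - the Base and Resource classes below are simplified stand-ins for the SDK's real hierarchy, and every member shown is hypothetical:

```csharp
using System.Collections;
using System.Collections.Generic;

// Stand-ins for the SDK's Base/Resource hierarchy, for illustration only.
abstract class Base { public abstract string TypeName { get; } }
abstract class Resource : Base { }

// A dictionary-backed resource: slightly slower than generated POCOs,
// but able to carry custom elements or partially incorrect data.
sealed class DynamicResource : Resource, IReadOnlyDictionary<string, object>
{
    private readonly Dictionary<string, object> _elements = new();

    public DynamicResource(string typeName) => TypeName = typeName;

    // The instance type comes from the data (e.g. ITypedElement.InstanceType),
    // not from the .NET class, so custom resource names are possible.
    public override string TypeName { get; }

    public object this[string key]
    {
        get => _elements[key];
        set => _elements[key] = value;   // values are standard datatype POCOs
    }

    // IReadOnlyDictionary<string, object> implementation.
    public IEnumerable<string> Keys => _elements.Keys;
    public IEnumerable<object> Values => _elements.Values;
    public int Count => _elements.Count;
    public bool ContainsKey(string key) => _elements.ContainsKey(key);
    public bool TryGetValue(string key, out object value) => _elements.TryGetValue(key, out value);
    public IEnumerator<KeyValuePair<string, object>> GetEnumerator() => _elements.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because it derives from Resource, such an instance could flow through any function written against the POCO model, while remaining as shapeless as an ElementNode.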

The immediate advantage would be that all data is again created equal: all functions can be written against Resource (or even an IResource, if that would provide extra flexibility). We would avoid most code duplication, while still retaining the flexibility of an ITypedElement. We could even implement IDynamicMetaObjectProvider on the POCOs to make them participate in .NET's DLR, useful for initializing instances in tests, etcetera. And we could present our end users with a single library of utility functions that works with all data.

Action 6 - Pilot a version of a DynamicResource

This is, in fact, a fancy version of ElementNode, either based on Resource or implementing IResource. It should be able to load custom resources based on IStructureDefinitionSummary, and it should be able to accommodate incorrect data almost as completely as ElementNode does now - but using POCOs for the internal nodes. In a spike we can check how far we can push this: what happens when we encounter lists where we expect single elements, when the datatypes are unknown, etc. Even if it cannot go as far as ElementNode, it would still cover a lot of ground and provide far more value.

Scratchpad

  • Sourcenode+ITypedElement into ElementNode, no adapter.
  • Produce same errors as new parsers
  • Simplify, no more roundtrip xml support, cda, etc.
  • Scopednode for itypedelement, scopednode for idictionary (or poco)
  • Resolve en parent resource en location in scopednode.
  • Typelabel in poco/idictionary/itypedelement
  • Value - return cqlquantity, cqlconcept at more complex types (maybe better using ICqlConvertible etc).
  • Iprimitivevalue implement on itypedelement edge nodes. add tryparse (to cql types) and add value/rawvalue to it. basically adding raw value support to itypedelement.
  • poco parse and isourcenode loading into elementnode align on not validating Value
  • idictionary with poco primitives and cql primitives should both work (?).
  • elementnode implements irod++
  • poco primitives and elementnode implement icqlcomparable, icqlconvertible etc. Exact form depending on unification with cql sdk.
  • iresolvablescope (bundle resource.contained)??
  • introduce instancetype "untyped" to make it non nullable?
  • POCO (IReadOnlyDict + requirements on leaf types) -> PocoScopedNode (=PocoElementTree)
  • ITypedElement -> ElementNode, ISourceNode + Loader (=TypedElementOnSourceNode, cleaned up) -> ElementNode (=ElementTree?)
  • Clean up ISourceNode parsing (benchmark leaving out roundtrip stuff)
  • Align Cql datatypes with Cql sdk, remove Result.
  • Let ElementNode use common FHIR poco's at the leaves? Even FHIR Quantity? FHIR Coding etc. Solves possible duplicate implementation of Icql** interfaces.
  • ICqlConvertible might be the only one necessary, since after conversion, the result has ICqlComparable, ICqlEquatable etc. Only if direct implementation of these interfaces is faster than conversion+comparison does it make sense to do so.
  • Compare + equatable extension methods for these Icql* interfaces, that take into account that they might be integers, strings (not everything is Any subclass)
  • Maybe have a base PocoScopedNode (abstract GetChildren) that has specific implementations as a nested class in each POCO. This way we can have a ToReadOnlyDictionary() on POCOs that immediately implement the necessary interfaces (and remove the clutter from the POCO interface itself).
  • I think we have an ICoded thing (for CQL) that we can use to determine whether a node is bindeable.
  • 3rd party ITypedElement implementations will not support all these new interfaces, but the validator and fhirpath won't work directly on ITypedElement, but on ElementNode instead (just like scopednode now, but with a performant implementation not based on wrapping ITypedElement)
  • Need to introduce IDataType (now there's only DataType, which is a POCO and cannot be implemented)
  • Implement ElementTree by containing a POCO root + a dict<child,parent> that's built up while navigating?