rklancer edited this page Sep 30, 2012 · 24 revisions

Problems arising from lack of a unified model schema

(TL;DR: When you look closely, there are inconsistencies in the dark recesses of the MD2D data model that are just waiting to bite us in the behind. The unified model schema outlined in the next section could help.)

Here are some related issues with the MD2D model object implemented in modeler.js and md2d.js:

  • There are set and get methods for top-level properties of the model such as temperature_control, but for atom properties there is no getAtomProperties method to match setAtomProperties.

  • The serialization of atom properties uses upper-case keys such as X and Y, but setAtomProperties uses different, lower-case keys such as x and y for the same properties. The same keys should be used in both cases, for ease of use and so that the underlying code can be shared.

  • Atoms are just one kind of "physics object" that we need to serialize or edit the properties of -- some others are obstacles, elements, and radial bonds -- but there are no setter/getter methods such as setObstacleProperties for these other kinds of object.

  • The serialization and deserialization paths for the properties of atoms, obstacles, elements, and radial bonds are convoluted and pointlessly different for each type of object -- compare obstacle deserialization and atom deserialization.

  • The "tick history" saves and restores per-atom properties but it does not save and restore other important properties. For example, obstacle properties, including the obstacle position, are not saved. This can be seen by visiting http://lab.dev.concord.org/examples/interactives/interactives.html#interactives/gas-laws-page-4.json, allowing the model to run so that the obstacle moves to the right, stopping the model, seeking to the beginning by typing model.seek(0) in the console, and then clicking the play button. The atoms' positions change back to their starting points but the obstacle's position does not. (Note that using the reset button is not sufficient to demonstrate the problem because it reloads all model properties from the serialized JSON.)

  • There is currently meta-information about atom properties, for example, a list of which properties are "saveable" (i.e., which properties are transient and which properties need to be serialized in order to accurately save the state of the model.) However, there is no specific mechanism for storing similar meta-information about the properties of other objects, such as radial bonds.

  • Adding a new atom property to the model requires providing meta-information about the property in several different places -- see fac22a5

  • Although there is a unified way to notify an observer that a toplevel property such as temperature_control has changed, there is no way to notify an observer that a property such as the potential energy has changed. We rely instead on observing tick events, but this is not reliable because the potential energy (or something else, such as the x-position of obstacle #2) can change as a result of user action while the model is stopped.

  • The effective default values of certain properties are defined in different ways, in different places, in the modeler and engine: e.g., here, here, and here.

  • The MML parser needs access to the default values of certain properties in order to construct a correct model JSON file, but it cannot access this information from modeler.js, so it duplicates this knowledge.

Proposed design

Summary

At the top of modeler.js, define and publish a schema object that contains the names of top-level properties of the model (such as height), a set of metadata about each property, and the names of each type of object contained in the model (such as obstacles and atoms). Recursively use the same, or a very similar, format for describing each property of each object type (such as the charge property of atoms).

Refer to the metadata defined in this schema throughout modeler.js when, for example, allocating storage arrays, storing tick-history items, serializing, deserializing, issuing "change" events to listeners, or pushing data back and forth from the underlying computational engine.

Sketch of information needed in schema

Toplevel (properties: width, chargeShading, viscosity, ...)

  • Should this property be serialized?
  • What is the default value of this property?
  • What is the general data type of this property (e.g., boolean, integer, or floating point)?
  • Is this property read-only, or read-write? (This would help us use the model's standard getter and observer-notification methods for calculated properties such as the potential energy.)
  • Should this property be passed to the engine when the engine is constructed?
  • Should the updated value of this property be passed to the engine when the property changes?
  • Should changing this property trigger a model-state recalculation? (Consider: changing the gravitationalField potential instantaneously changes the total energy.)
  • Should this property be persisted in the tick history?
  • Is this property expected to change with every tick?
  • Is this property mostly view related? (This is a hint for developers and for use by a controller that configures a view; the model doesn't construct a view or maintain a reference to it.)

Of course, we should be able to define a custom setter for each property in order to do the right thing when it changes.
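The questions above could be answered by one metadata record per property. The following is only a sketch: the key names serializable, storeInTickHistory, and recalculateState are the ones this page uses elsewhere, while every other key name, the describeProperty and defaultValues helpers, and all default values shown are hypothetical placeholders rather than the real MD2D values.

```javascript
// Fallback answers to the questions in the list above (key names are assumptions).
const METADATA_DEFAULTS = {
  serializable: true,
  readOnly: false,
  passToEngineOnCreate: false,
  passToEngineOnChange: false,
  recalculateState: false,
  storeInTickHistory: false,
  changesEveryTick: false,
  viewRelated: false
};

// Hypothetical helper: fill in any metadata a property does not state explicitly.
function describeProperty(overrides) {
  return Object.assign({}, METADATA_DEFAULTS, overrides);
}

// Placeholder defaults; the real values live in the modeler and engine.
const topLevelSchema = {
  width: describeProperty({ type: "float", defaultValue: 10, passToEngineOnCreate: true }),
  chargeShading: describeProperty({ type: "boolean", defaultValue: false, viewRelated: true }),
  viscosity: describeProperty({
    type: "float", defaultValue: 1,
    passToEngineOnCreate: true, passToEngineOnChange: true
  }),
  // A calculated property: readable and observable, but never set or serialized.
  potentialEnergy: describeProperty({
    type: "float", defaultValue: 0,
    readOnly: true, serializable: false, changesEveryTick: true
  })
};

// Collect the default value of every writable property, e.g. for the MML parser.
function defaultValues(schema) {
  const out = {};
  Object.keys(schema).forEach(function (name) {
    if (!schema[name].readOnly) out[name] = schema[name].defaultValue;
  });
  return out;
}
```

Keeping the fallbacks in one place means that a property only has to state how it differs from the common case, which addresses the complaint above about supplying meta-information in several places.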

Atoms (properties: x, charge, marked...)

Same as the above, plus:

  • When serializing this property, should the entire array of values be removed if all values are the default value?
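A minimal sketch of how that extra flag might be consulted during serialization. The flag name omitIfAllDefault, the serializeAtoms function, and the schema contents are assumptions made for illustration only.

```javascript
// Hypothetical per-atom schema; only the property names come from this page.
const atomSchema = {
  x:      { defaultValue: 0, serializable: true,  omitIfAllDefault: false },
  charge: { defaultValue: 0, serializable: true,  omitIfAllDefault: true },
  marked: { defaultValue: 0, serializable: true,  omitIfAllDefault: true },
  ax:     { defaultValue: 0, serializable: false, omitIfAllDefault: false }
};

// Serialize per-property arrays, skipping transient properties and,
// where the schema allows it, arrays that contain only the default value.
function serializeAtoms(schema, arrays) {
  const out = {};
  Object.keys(schema).forEach(function (name) {
    const meta = schema[name];
    if (!meta.serializable) return;
    const values = Array.from(arrays[name]);
    const allDefault = values.every(function (v) { return v === meta.defaultValue; });
    if (meta.omitIfAllDefault && allDefault) return;
    out[name] = values;
  });
  return out;
}

const serialized = serializeAtoms(atomSchema, {
  x: new Float32Array([1, 2]),
  charge: new Float32Array([0, 0]),   // all default, so omitted
  marked: new Uint8Array([0, 1]),     // one atom is marked, so kept
  ax: new Float32Array([0.5, 0.5])    // transient, never serialized
});
// serialized contains only the keys "x" and "marked"
```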

Examples of interpretation of some of this information for per-atom properties:

  • Per-atom properties such as ax and px (acceleration and momentum) make sense as read-only properties because these are completely determined by atom positions, and the velocity and mass of the atom, respectively.

  • Regarding whether certain properties should be passed to the engine: the engine should continue to use an array of properties indexed by atom, because access to x[j] is faster than access to atoms[j][X] when you consider that the atoms[j] dereference can't meaningfully be cached in inner loops. However, the modeler should operate on a transposed array of atoms indexed by property (what we have so far been calling the results array). Therefore there is a small performance penalty for passing information back and forth between the modeler and engine.

  • Should this property be persisted in the tick history? Consider this example: we might want to mark atoms in such a way that a mark added now persists even when the history is scrubbed backwards. This corresponds to not storing the marked property in the tick history. But this should be a policy choice we can flip simply by changing the storeInTickHistory property from true to false or vice versa. (We might even consider allowing specific models to override that policy choice by somehow overriding the default schema.)

Obstacles, Radial Bonds, Elements

All of the above information is needed. In addition, the following might be useful:

  • Is this property an atom index? (This should be considered the data type, and it might be used by the UI.)

Textboxes

Although textboxes is represented in the model JSON file as an array of individual objects, just like atoms and radialBonds, the array is really meant to be passed wholesale to a view, which interprets it. Therefore it might make sense to define textboxes simply as a toplevel passthrough property which the model serializes and deserializes as-is without attempting to infer anything about its contents.

How the schema might be used

Deserialization could be done by looping over the list of declared object types and then the serialized arrays of object properties, while repeatedly calling setObjectProperties(index, objectType, { properties });. Note that this can be done in a GC-friendly way by reusing the { properties } object in the loop, rather than constructing a new throwaway object for each call. Serialization can be done in a similar way, consulting the serializable value from the schema to determine which properties to serialize.
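The deserialization loop described above might look roughly like this. setObjectProperties(index, objectType, properties) is the call named in the text; the makeModel stand-in, the deserialize function, and the schema and JSON shapes are assumptions for the sake of a runnable sketch.

```javascript
// Stand-in model that stores values in plain arrays; a real model would
// forward them to the engine instead.
function makeModel(schema) {
  const store = {};
  Object.keys(schema).forEach(function (type) {
    store[type] = {};
    Object.keys(schema[type]).forEach(function (prop) { store[type][prop] = []; });
  });
  return {
    store: store,
    setObjectProperties: function (index, objectType, props) {
      Object.keys(props).forEach(function (prop) {
        store[objectType][prop][index] = props[prop];
      });
    }
  };
}

function deserialize(model, schema, json) {
  Object.keys(schema).forEach(function (type) {
    const serialized = json[type] || {};
    const propNames = Object.keys(schema[type]);
    // The object count is the length of the longest serialized array.
    let count = 0;
    propNames.forEach(function (prop) {
      if (serialized[prop]) count = Math.max(count, serialized[prop].length);
    });
    const props = {}; // reused for every call to avoid throwaway objects
    for (let i = 0; i < count; i++) {
      propNames.forEach(function (prop) {
        const values = serialized[prop];
        // Arrays omitted from the JSON fall back to the schema default.
        props[prop] = values ? values[i] : schema[type][prop].defaultValue;
      });
      model.setObjectProperties(i, type, props);
    }
  });
}

const demoSchema = {
  atoms: { x: { defaultValue: 0 }, y: { defaultValue: 0 }, charge: { defaultValue: 0 } },
  obstacles: { x: { defaultValue: 0 }, width: { defaultValue: 1 } }
};
const demoModel = makeModel(demoSchema);
deserialize(demoModel, demoSchema, {
  atoms: { x: [1, 2], y: [3, 4] },
  obstacles: { x: [5] }
});
```

Because the same loop handles every declared object type, atoms and obstacles no longer need separate deserialization paths.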

Remove all serialization/deserialization code and default value handling from the engine. These functions can be handled by the model object that wraps the engine.

Have the engine publish a simplified schema containing just a list of the object types it knows about (atoms, radial bonds, etc.) and the properties it assigns to each one. Include the data type ('float32', etc.) of each property. modeler.js can consult this list, and accessor functions built into the engine, in order to access the engine's representation of the properties the modeler needs to access:

// in engine:
objects: [
  { name: "atoms",
    properties: [
      { name: "x", type: "float32" },
      { name: "y", ... },
      //...
    ]
  },
  //...
]

// in modeler:
atoms = engine.get('atoms');
// x is an array of the x-values of atoms, indexed by atom
x = atoms[engine.index.x];
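A runnable variant of the outline above, under stated assumptions: makeEngine and the TYPED_ARRAYS lookup table are hypothetical, the "uint8" type name is invented for the example, and the index is nested per object type (engine.index.atoms.x rather than engine.index.x) so that property names shared by different object types do not collide.

```javascript
// Map schema type names to typed-array constructors (assumed type names).
const TYPED_ARRAYS = { float32: Float32Array, uint8: Uint8Array };

// Hypothetical engine factory: allocates one typed array per declared
// property and publishes an index from property name to array position.
function makeEngine(objects, counts) {
  const index = {};
  const storage = {};
  objects.forEach(function (obj) {
    index[obj.name] = {};
    storage[obj.name] = obj.properties.map(function (prop, i) {
      index[obj.name][prop.name] = i;
      return new TYPED_ARRAYS[prop.type](counts[obj.name]);
    });
  });
  return {
    objects: objects,
    index: index,
    get: function (name) { return storage[name]; }
  };
}

// in engine:
const engine = makeEngine([
  { name: "atoms",
    properties: [
      { name: "x", type: "float32" },
      { name: "y", type: "float32" },
      { name: "marked", type: "uint8" }
    ]
  }
], { atoms: 4 });

// in modeler:
const atoms = engine.get("atoms");
// x is an array of the x-values of atoms, indexed by atom
const x = atoms[engine.index.atoms.x];
```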

When a property is changed by a setter (outside of a model "tick"), check the recalculateState value from the schema in order to determine whether to recalculate the model thermodynamic properties. Use the standard setter function to update these values so that observers of these properties (for example, an energy graph or thermometer) are notified appropriately.
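One way the setter path could work is sketched below. recalculateState is the schema key named above; makeObservableModel, the readOnly key, and the toy recalculate callback (which stands in for the engine's thermodynamic recalculation) are assumptions.

```javascript
function makeObservableModel(schema, recalculate) {
  const values = {};
  const observers = {};
  Object.keys(schema).forEach(function (name) {
    values[name] = schema[name].defaultValue;
    observers[name] = [];
  });

  // Single write path, so every change notifies observers the same way.
  function write(name, value) {
    if (values[name] === value) return;
    values[name] = value;
    observers[name].forEach(function (fn) { fn(value); });
  }

  return {
    get: function (name) { return values[name]; },
    observe: function (name, fn) { observers[name].push(fn); },
    set: function (name, value) {
      if (schema[name].readOnly) throw new Error(name + " is read-only");
      write(name, value);
      if (schema[name].recalculateState) {
        // Calculated properties go through the same write path.
        const computed = recalculate(values);
        Object.keys(computed).forEach(function (key) { write(key, computed[key]); });
      }
    }
  };
}

const setterSchema = {
  gravitationalField: { defaultValue: 0, readOnly: false, recalculateState: true },
  chargeShading: { defaultValue: false, readOnly: false, recalculateState: false },
  potentialEnergy: { defaultValue: 0, readOnly: true }
};

// Toy recalculation, not real physics.
const observableModel = makeObservableModel(setterSchema, function (values) {
  return { potentialEnergy: values.gravitationalField * 2 };
});

const energySeen = [];
observableModel.observe("potentialEnergy", function (v) { energySeen.push(v); });
observableModel.set("gravitationalField", 3); // triggers a recalculation
observableModel.set("chargeShading", true);   // view-only change, no recalculation
```

With this arrangement an energy graph observing potentialEnergy is notified even when the model is stopped, which tick events alone cannot guarantee.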

When initializing the model, consult the list of properties that are marked as needing to be saved to the tick history in order to construct a list of indices of the property arrays that need to be saved at each tick. (The typed-array set method can be used to save the contents of these typed arrays very efficiently to a typed array-based tick-history buffer.) Note that the tick history should be saved in the same "transposed" format returned by the engine, because this allows a subset of properties to be saved using efficient, memcpy-based methods such as the set method.
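A sketch of such a buffer, assuming all saved properties fit in a Float32Array. The makeTickHistory function and its save/restore methods are hypothetical; storeInTickHistory is the schema key discussed earlier, and set and subarray are the standard typed-array methods.

```javascript
function makeTickHistory(schema, atomCount, maxTicks) {
  // Decide once, at initialization, which property arrays are saved.
  const saved = Object.keys(schema).filter(function (name) {
    return schema[name].storeInTickHistory;
  });
  const frameSize = saved.length * atomCount;
  const buffer = new Float32Array(frameSize * maxTicks);

  return {
    save: function (tick, arrays) {
      saved.forEach(function (name, k) {
        // Bulk copy of one whole property array into the frame for this tick.
        buffer.set(arrays[name], tick * frameSize + k * atomCount);
      });
    },
    restore: function (tick, arrays) {
      saved.forEach(function (name, k) {
        const start = tick * frameSize + k * atomCount;
        arrays[name].set(buffer.subarray(start, start + atomCount));
      });
    }
  };
}

const historySchema = {
  x: { storeInTickHistory: true },
  y: { storeInTickHistory: true },
  marked: { storeInTickHistory: false }
};
const engineArrays = {
  x: new Float32Array([1, 2, 3]),
  y: new Float32Array([4, 5, 6]),
  marked: new Uint8Array([0, 0, 0])
};

const tickHistory = makeTickHistory(historySchema, 3, 100);
tickHistory.save(0, engineArrays);
engineArrays.x[0] = 9;       // the atom moves
engineArrays.marked[1] = 1;  // the user marks an atom
tickHistory.save(1, engineArrays);
tickHistory.restore(0, engineArrays);
// x is back to its tick-0 values; the mark survives the seek
```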

When saving the tick history, properties marked as not changing at every tick could be saved in an alternative, memory-efficient format that simply indicates the ticks at which the property changed value, together with the changed value.
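As an illustration of that alternative format, the hypothetical makeChangeLog below stores one entry per change rather than one per tick. It assumes record is called with increasing tick numbers.

```javascript
function makeChangeLog(initialValue) {
  const ticks = [0];
  const values = [initialValue];
  return {
    // Store an entry only when the value differs from the last stored one.
    record: function (tick, value) {
      if (value !== values[values.length - 1]) {
        ticks.push(tick);
        values.push(value);
      }
    },
    // The value at a tick is the most recent change at or before that tick.
    valueAt: function (tick) {
      let i = ticks.length - 1;
      while (i > 0 && ticks[i] > tick) i--;
      return values[i];
    },
    size: function () { return ticks.length; }
  };
}

const markedLog = makeChangeLog(false);
for (let t = 1; t <= 100; t++) {
  markedLog.record(t, t >= 40); // the property flips once, at tick 40
}
// 100 ticks are covered by 2 stored entries
```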

Finally, make the schema available globally so that the MML parser can access it. (It would be helpful to be able to require the modeler from the MML parser, which usually runs as a command-line script. This should be made possible by the require.js refactoring.)

Speed and memory considerations

Array transfer

There are a few speed considerations. Of course, the underlying idea remains to have the engine maintain and update a persistent set of typed arrays of property values, indexed by atom (or element, or radial bond...), and to have modeler.js transpose those properties into typed arrays of atoms (or radial bonds, etc.) indexed by property.

Using the schema, the model could avoid transposing certain property arrays, such as those representing purely view-related properties.

The size of the tick history could be shrunk by marking only certain properties as needing to be saved to the tick history. (As noted above, this requires saving the tick history in the same "transposed" format used internally by the engine.)

If the engine is instantiated in a web worker, then the "output" arrays returned by the worker can be the tick history storage, because the default method for returning typed arrays from a worker is to clone them, making an additional copy operation redundant: http://updates.html5rocks.com/2011/12/Transferable-Objects-Lightning-Fast

Events and event batching

The idea mentioned above that essentially all system properties should be observable has some performance implications that need to be considered. It would be inefficient for an energy graph component, for example, to register observers on four properties (time, potential energy, total energy, and kinetic energy -- leaving aside that there are only two independent variables among the last three) and to update itself each time these observers are fired.

Instead, it makes sense to batch and categorize update events. That way a graph can be smart enough to only listen for "tick" events, although by themselves these are not quite enough: it also needs to know about any system change, such as changing the position and velocity of an atom while the model is stopped, which might simultaneously affect several of the values it is graphing. In order to accommodate that, the system should emit "tick" events and "all state properties changed" events, and consumers such as the energy graph should be smart enough to sign up for the right events.

However, simpler components, such as a bar graph that could be configured by a user script to track any property, might not be smart enough to know which event to listen for in the general case. Therefore, it would register as a listener on whatever property it is configured to graph.

As long as a smart set of "batched" events are offered, it should not be necessary for view components to sign up a prohibitive number of observers just in order to be sure to avoid missing updates. It would also not be prohibitive to allow a few property observers that depend on properties that might change during a tick to monitor that property between ticks.
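The two kinds of consumer described in this section can be sketched as follows. The dispatcher, the event names "allStatePropertiesChanged" and "change:<property>", and the updateState function are all assumptions; the values are arbitrary.

```javascript
function makeDispatcher() {
  const listeners = {};
  return {
    on: function (event, fn) {
      (listeners[event] = listeners[event] || []).push(fn);
    },
    emit: function (event, payload) {
      (listeners[event] || []).forEach(function (fn) { fn(payload); });
    }
  };
}

const events = makeDispatcher();
const state = { time: 0, potentialEnergy: 0, kineticEnergy: 0 };

// Apply a set of changes, then emit per-property events followed by
// exactly one batched event for the whole update.
function updateState(newState, batchEvent) {
  const changed = Object.keys(newState).filter(function (k) {
    return state[k] !== newState[k];
  });
  Object.assign(state, newState);
  changed.forEach(function (k) { events.emit("change:" + k, state[k]); });
  events.emit(batchEvent, state);
}

// "Smart" consumer: redraws once per batched event.
let graphRedraws = 0;
events.on("tick", function () { graphRedraws++; });
events.on("allStatePropertiesChanged", function () { graphRedraws++; });

// Simple consumer: observes the one property it was configured to show.
const barValues = [];
events.on("change:potentialEnergy", function (v) { barValues.push(v); });

updateState({ time: 1, potentialEnergy: -2, kineticEnergy: 5 }, "tick");
// The user drags an atom while the model is stopped:
updateState({ potentialEnergy: -3 }, "allStatePropertiesChanged");
```

The energy graph redraws twice here, once per batch, even though four individual property values changed.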

Inheritance considerations

It might be nice to make "simple" and "complex" versions of the model object. This might be useful for allowing nearly the bare engine to be run from a Node.js command-line program, without needing to instantiate an environment that supports all the features in modeler.js that are designed for an interactive browser application (such as the tick history and live dragging).

The most robust way to do this would be if the complex version inherited from the simple version, in which case the complex version might want to merge additional schema information on top of the schema information provided by the base class.
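Merging schema information could be as simple as a per-property shallow merge. This is a sketch only; mergeSchemas, the draggable property, and the metadata shown are hypothetical.

```javascript
// Return a new schema in which `extra` adds properties to `base` or
// overrides individual metadata keys; neither input is modified.
function mergeSchemas(base, extra) {
  const merged = {};
  Object.keys(base).forEach(function (name) {
    merged[name] = Object.assign({}, base[name]);
  });
  Object.keys(extra).forEach(function (name) {
    merged[name] = Object.assign({}, merged[name], extra[name]);
  });
  return merged;
}

// Schema published by the "simple" model.
const simpleSchema = {
  x: { type: "float32", serializable: true }
};

// Additions made by the "complex", browser-oriented model.
const complexSchema = mergeSchemas(simpleSchema, {
  x: { storeInTickHistory: true },
  draggable: { type: "boolean", serializable: true }
});
```

Copying rather than mutating keeps the simple model's schema usable on its own, for example from a command-line program.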