:term:`Data Models <data model>` in Indivo describe the format in which Indivo represents medical information. They are NOT the same as :term:`Schemas <schema>`, which describe formats that Indivo recognizes as valid input data. Rather, data models describe the final processed state of medical data in Indivo: how data are stored, how they are queryable via the :doc:`Query API </query-api>`, and how they are returned via the :ref:`Reporting API <processed-reports>`.
We also introduce one additional term: :term:`Medical Facts <fact>`. A Fact is one datapoint corresponding to a data model: for example, a latex allergy is a Fact that is an instance of the :doc:`Allergy data model <allergy>`. Internally, Indivo represents facts as Python objects, so you'll see us referencing medical facts as fact objects as well.
At its most basic level, a data model definition is just a list of fields and their types. For example, our :doc:`Problem data model <problem>` is defined as (some fields omitted):
- date_onset: Date
- date_resolution: Date
- name: String
- comments: String
- diagnosed_by: String
This is pretty simple, and we'd like to enable others to add new data models to Indivo just as easily. So we currently allow two formats for defining data models: Django model classes and SDML.
Since our data models are directly mapped to database tables using Django's ORM, they are most effectively represented as Django Models. Django has a flexible, powerful method for expressing fields as Python class attributes, so data models defined in this way can harness the full capabilities of the Django ORM. Of course, representing data models in this way requires some knowledge of Python. For a full reference, see the Django documentation on models and model fields.
One important Indivo-specific note: when defining Django Model Classes, make sure to subclass :py:class:`indivo.models.Fact`, which will ensure that your class can be treated as a data model. For example, your class definition might look like:

.. code-block:: python

   from indivo.models import Fact
   from django.db import models

   class YourModel(Fact):
       your_field1 = models.CharField(max_length=200, null=True)

       ... # Additional fields here

For modeling medical data, Indivo provides some custom Field Subclasses. These fields represent their data as multiple separate database fields, with names formed from the original field's name and some appended suffixes (see the classes below for some examples). You should use these fields as if they were any other Django Model Field:

.. code-block:: python

   from indivo.models import Fact
   from django.db import models
   from indivo.fields import YourFavoriteFieldSubclass

   class YourModel(Fact):
       normal_field = models.CharField(max_length=200, null=True)
       special_field = YourFavoriteFieldSubclass()

Now YourModel has both a standard CharField, and also other fields defined by the Field Subclass. We define the following Field Subclasses:

.. autoclass:: indivo.fields.CodedValueField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.ValueAndUnitField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.AddressField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.NameField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.TelephoneField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.PharmacyField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.ProviderField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.VitalSignField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.BloodPressureField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.ValueRangeField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.fields.QuantitativeResultField(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

For those who would rather not write Python but are comfortable thinking in terms of 'fields' and 'types', we've defined a JSON-based modeling language for defining very simple data models easily. :doc:`SDML </sdm>` is less flexible than Django's modeling language, but is much quicker to get started with and is less verbose for describing simple models. See our documentation of the language :ref:`here <sdml>`.
For help getting started, see our :ref:`core data models <core-data-models>`, below, each of which provides definitions both in SDML and as Django Model classes.
Since the :doc:`Query API </query-api>` allows app developers to directly apply filters and ranges to the data models they are selecting, they need to know which fields they are allowed to query against. The answer is simple:
ANY FIELD ON A DATA MODEL THAT IS NOT A RELATION TO ANOTHER MODEL MAY BE USED IN THE QUERY API!
For example, we introduced the 'Problem' model above, which has the fields:
- date_onset: Date
- date_resolution: Date
- name: String
- comments: String
- diagnosed_by: String
If you were making an API call such as :http:get:`/records/{RECORD_ID}/reports/minimal/problems/`, you could filter by any of:
- date_onset
- date_resolution
- name
- comments
- diagnosed_by
If the Problem model were a bit more complicated, and had another field:
- prescribed_med: Medication
You wouldn't be able to filter by prescribed_med, since that field is a relation to another model.
The only exceptions to this rule are :ref:`custom Django Model Fields <custom-model-fields>`. Such fields are translated into fields with other names, as described above. Any of the translated fields may be used in the Query API, but the original field name may not. For example, when looking at a model with a CodedValue element such as:
- problem_type: CodedValue
You would be able to filter by problem_type_identifier, problem_type_title, or problem_type_system, but not by problem_type itself.
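The filtering rule above can be sketched in a few lines of Python. This is an illustration only, not Indivo code: the ``queryable_fields`` helper, the type-name strings, and the suffix table are all assumptions made for the example (the ``CodedValue`` suffixes are the ones named above, and other custom fields would need their own entries).

```python
# Hypothetical helper, not part of Indivo. Field types are given as plain
# strings, mirroring the simplified field listings used in this document.

# Assumption: the simple (non-relation) types that appear in the examples above.
SIMPLE_TYPES = {"String", "Date"}

# Suffixes for CodedValue are taken from the example above; other custom
# field types would need their own entries here.
CUSTOM_FIELD_SUFFIXES = {
    "CodedValue": ("identifier", "title", "system"),
}


def queryable_fields(model_fields):
    """Return the field names that could be used as Query API filters."""
    names = []
    for name, type_name in model_fields.items():
        if type_name in CUSTOM_FIELD_SUFFIXES:
            # Custom fields are queried through their expanded names only.
            for suffix in CUSTOM_FIELD_SUFFIXES[type_name]:
                names.append("%s_%s" % (name, suffix))
        elif type_name in SIMPLE_TYPES:
            names.append(name)
        # Anything else is treated as a relation to another model and skipped.
    return names


problem_fields = {
    "date_onset": "Date",
    "name": "String",
    "problem_type": "CodedValue",
    "prescribed_med": "Medication",
}
print(queryable_fields(problem_fields))
```

Here ``prescribed_med`` is dropped because it is a relation, and ``problem_type`` only appears through its three expanded names.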
Here is a listing of the data models currently supported by Indivo. Each instance might define other, contributed models: see :ref:`below <add-data-model>` for information on how to add data models to Indivo.

.. toctree::
   :maxdepth: 1

   allergy
   equipment
   immunization
   lab
   medication
   problem
   procedure
   vitals
   scn

For complicated data models, a simple SDML definition just won't suffice. For a few specific features, such as :ref:`custom object serialization <custom-serializers>` or :ref:`creation-time field validation <data-model-validators>`, you can define (in Python) an extra options file for a data model.
This file should be named ``extra.py``, and can be dropped into the filesystem next to any data model, as described :ref:`below <data-model-filesystem>`. The file should contain subclasses of :py:class:`indivo.data_models.options.DataModelOptions`, each of which describes the options for one data model defined in the ``model.py`` file in the same directory. Options are:

.. autoclass:: indivo.data_models.options.DataModelOptions(Type)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

For example, here's our options file for the Problem data model:

.. code-block:: python

   from indivo.serializers import DataModelSerializers
   from indivo.data_models.options import DataModelOptions
   from indivo.validators import ExactValueValidator

   SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'

   class ProblemSerializers(DataModelSerializers):
       def to_rdf(queryset, result_count, record=None, carenet=None):
           # ... our SMART RDF serializer implementation here ...
           return 'some RDF'  # placeholder return value

   class ProblemOptions(DataModelOptions):
       model_class_name = 'Problem'
       serializers = ProblemSerializers
       field_validators = {
           'name_system': [ExactValueValidator(SNOMED_URI)],
       }

Make sure to restart Indivo for your changes to take effect after you add your ``extra.py`` file, but there's no need to reset Indivo.
By default, when returning data via the :doc:`generic reporting API </generic-reports>`, Indivo will attempt to serialize data as :ref:`SDMJ <sdmj>` or :ref:`SDMX <sdmx>`, depending on the requested response format. If you need your data to come back in other formats, or if the default serializers aren't smart enough to represent your data model correctly, you can implement custom serializers for the data model.
Serializers for a data model are implemented as simple methods that take a Django queryset object, and return a serialized string. For a given data-model, you should define a subclass of :py:class:`indivo.serializers.DataModelSerializers`, and add your desired serializers as methods on the class. Currently, available serializers are:

.. py:function:: to_xml(queryset, result_count, record=None, carenet=None)

   returns an XML string representing the model objects in *queryset*.

   :param QuerySet queryset: the objects to serialize
   :param integer result_count: the total number of items in *queryset*
   :param Record record: the patient record that the objects belong to, if available.
   :param Carenet carenet: the Carenet via which the objects have been retrieved, if available.
   :rtype: string

.. py:function:: to_json(queryset, result_count, record=None, carenet=None)

   returns a JSON string representing the model objects in *queryset*.

   :param QuerySet queryset: the objects to serialize
   :param integer result_count: the total number of items in *queryset*
   :param Record record: the patient record that the objects belong to, if available.
   :param Carenet carenet: the Carenet via which the objects have been retrieved, if available.
   :rtype: string

.. py:function:: to_rdf(queryset, result_count, record=None, carenet=None)

   returns an RDF/XML string representing the model objects in *queryset*.

   :param QuerySet queryset: the objects to serialize
   :param integer result_count: the total number of items in *queryset*
   :param Record record: the patient record that the objects belong to, if available.
   :param Carenet carenet: the Carenet via which the objects have been retrieved, if available.
   :rtype: string

For example, here's a (non-functional) implementation of the serializers for the Problems data-model:

.. code-block:: python

   from indivo.serializers import DataModelSerializers

   class ProblemSerializers(DataModelSerializers):
       def to_xml(queryset, result_count, record=None, carenet=None):
           return '''<Problems>...bunch of problems here...</Problems>'''

       def to_json(queryset, result_count, record=None, carenet=None):
           return '''[{"Problem": "data here"}, {"Problem": "More data here..."}]'''

       def to_rdf(queryset, result_count, record=None, carenet=None):
           return '''<rdf:RDF><rdf:Description rdf:type='indivo:Problem'>...RDF data here...</rdf:Description></rdf:RDF>'''

A couple things to note:

- The ``to_*()`` methods DO NOT take ``self`` as their first argument. Under the hood, we actually rip the methods out of the serializers class and attach them directly to the data-model class.
- The ``model_class_name`` attribute is required, and indicates which data-model the serializers should be attached to.

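To see why the ``to_*()`` methods work without ``self``, here is a minimal sketch of one way such functions could be lifted off a serializers class and attached to a model class. It is not Indivo's implementation: the ``attach_to`` method, the ``*Sketch`` class names, and the choice of ``staticmethod`` are assumptions made for illustration.

```python
class DataModelSerializersSketch:
    """Stand-in for a serializers container; not Indivo's actual class."""

    @classmethod
    def attach_to(cls, model_class):
        # Copy every to_*() function found in this class's own namespace onto
        # the model class, wrapped as a staticmethod so that no implicit
        # first argument (self or cls) is ever passed to it.
        for name, func in vars(cls).items():
            if name.startswith("to_") and callable(func):
                setattr(model_class, name, staticmethod(func))


class ProblemSerializersSketch(DataModelSerializersSketch):
    # Note: no self argument, matching the convention described above.
    def to_json(queryset, result_count, record=None, carenet=None):
        return '{"count": %d}' % result_count


class Problem:
    """Stand-in for a data-model class."""


ProblemSerializersSketch.attach_to(Problem)
print(Problem.to_json([], 0))
```

Because the function is attached as a static method, it can be called on the model class or on an instance with exactly the arguments listed in its signature.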
When serializing models, the following libraries can come in handy:

- ``lxml.etree``: Our favorite XML manipulation library. See http://lxml.de/tutorial.html for the details. Lxml is required for a running Indivo instance, so it will always be available for import (``from lxml import etree``).
- ``simplejson``: Our favorite JSON manipulation library. See http://simplejson.readthedocs.org/en/latest/index.html. Django bundles a version of simplejson, which can be imported with ``from django.utils import simplejson``.
- ``rdflib``: Our favorite RDF manipulation library. See http://readthedocs.org/docs/rdflib/en/latest/. RDFLib may not be installed on all systems, so if you use it, make sure to install it first.

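As a rough illustration of what a serializer body might look like, the sketch below builds JSON and XML strings with the Python standard library only (``json`` and ``xml.etree.ElementTree``, whose API is similar to ``lxml.etree``). Everything specific here is an assumption: ``FakeProblem`` stands in for fact objects, a plain list stands in for a Django queryset, and the element and key names are invented rather than Indivo's SDMJ or SDMX formats.

```python
import json
import xml.etree.ElementTree as ET
from collections import namedtuple

# Stand-in for fact objects; a real serializer would receive a Django QuerySet.
FakeProblem = namedtuple("FakeProblem", ["name", "comments"])


def to_json(queryset, result_count, record=None, carenet=None):
    # One dict per object; the key names are illustrative only.
    return json.dumps([{"name": p.name, "comments": p.comments} for p in queryset])


def to_xml(queryset, result_count, record=None, carenet=None):
    # The element and attribute names are illustrative only.
    root = ET.Element("Problems", total=str(result_count))
    for p in queryset:
        node = ET.SubElement(root, "Problem")
        ET.SubElement(node, "name").text = p.name
        ET.SubElement(node, "comments").text = p.comments
    return ET.tostring(root, encoding="unicode")


problems = [FakeProblem("Asthma", "mild")]
print(to_json(problems, len(problems)))
print(to_xml(problems, len(problems)))
```

Building the output with a library rather than string concatenation means escaping of special characters in field values is handled for you.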
Adding custom serializers to a data-model is simple: set your :py:class:`~indivo.serializers.DataModelSerializers` subclass as the ``serializers`` attribute of a :py:class:`~indivo.data_models.options.DataModelOptions` subclass in an ``extra.py`` file (see :ref:`above <data-model-options>` for info on adding advanced data-model options).
By default, data models defined in SDML are very permissive: all fields are nullable, and there are no constraints on valid data points other than their type (string, date, etc.). In some cases, a data element could satisfy these constraints, but still be invalid. For example, an Indivo Problem must have its name coded using SNOMED, so a problem without a SNOMED code is invalid.
In such cases, you can attach validators to the data model. Django Validators are essentially just Python callables that raise a :py:class:`django.core.exceptions.ValidationError` if they are called on an invalid data point. We've defined a couple of useful validators, though you could use any function you'd like.
For example, here's a validator that will accept only the value ``2``:

.. code-block:: python

   from django.core.exceptions import ValidationError

   def validate_2(value):
       if value != 2:
           raise ValidationError("Invalid value: %s. Expected 2"%str(value))

Django provides a number of built-in validators, for which a full reference exists here: https://docs.djangoproject.com/en/1.2/ref/validators/#built-in-validators.
In addition, Indivo defines a few useful validators in :py:mod:`indivo.validators`:

.. autoclass:: indivo.validators.ValueInSetValidator(valid_values, nullable=False)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

.. autoclass:: indivo.validators.ExactValueValidator(valid_value, nullable=False)
   :no-members:
   :no-undoc-members:
   :no-private-members:
   :no-show-inheritance:

Adding custom validators to a data-model is simple: add the validator to the ``field_validators`` attribute of a :py:class:`~indivo.data_models.options.DataModelOptions` subclass in an ``extra.py`` file (see :ref:`above <data-model-options>` for info on adding advanced data-model options).
For example, let's add the requirement that Problem names must be coded as SNOMED. We can write the validator using the built-in :py:class:`~indivo.validators.ExactValueValidator`:

.. code-block:: python

   from indivo.validators import ExactValueValidator

   SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'
   snomed_validator = ExactValueValidator(SNOMED_URI)

We can then attach it to the ``name_system`` field of a Problem, which will guarantee that we only accept problems which identify themselves as having a SNOMED code for their names:

.. code-block:: python

   from indivo.data_models.options import DataModelOptions

   class ProblemOptions(DataModelOptions):
       model_class_name = 'Problem'
       field_validators = {
           'name_system': [snomed_validator]
       }

Note that we put ``snomed_validator`` in a list, since we might theoretically add additional validators to the ``name_system`` field.
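To make the shape of ``field_validators`` concrete, here is a small self-contained sketch of how a mapping of field names to validator lists could be applied to one data point. It does not use Django or Indivo: the ``ValidationError`` class is a stand-in for Django's, and ``exact_value`` and ``run_field_validators`` are hypothetical helpers, so Indivo's actual validation flow may differ.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


def exact_value(expected):
    """Build a validator that accepts only `expected` (hypothetical helper)."""
    def validate(value):
        if value != expected:
            raise ValidationError("Invalid value: %r. Expected %r" % (value, expected))
    return validate


def run_field_validators(field_validators, data):
    """Call every validator on its field's value and collect error messages."""
    errors = {}
    for field, validators in field_validators.items():
        for validator in validators:
            try:
                validator(data.get(field))
            except ValidationError as exc:
                errors.setdefault(field, []).append(str(exc))
    return errors


SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'
field_validators = {'name_system': [exact_value(SNOMED_URI)]}

# A correctly coded problem produces no errors; a miscoded one is reported.
print(run_field_validators(field_validators, {'name_system': SNOMED_URI}))
print(run_field_validators(field_validators, {'name_system': 'other'}))
```

Keeping a list of validators per field means further checks can be appended later without changing how the mapping is consumed.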
As of version 1.1 of Indivo X, we've added a feature that makes it much easier to add (in a drag-and-drop fashion) new supported data models to an instance of Indivo. Adding a new data model to Indivo involves:
- Creating the data model definition
- Dropping the data model into the filesystem
- Migrating the database tables to support the new model
As you saw :ref:`above <data-model-definition-types>`, data models can be defined in two formats: SDML or Django model classes. Simply produce a definition in one of the two forms, and save it to a file named model.sdml or model.py.
Indivo data models currently have the following layout on the filesystem:

.. code-block:: none

   indivo_server/
       indivo/
           ...
           data_models/
               core/
                   allergy/
                       model.[sdml | py]
                       example.[sdmj | sdmx | py]
                       extra.py
                   ...
               contrib/

The ``indivo/data_models/core/`` directory contains all of our built-in data models, and you shouldn't modify it.
Since you are 'contributing' a data model to Indivo, add your data model to the ``indivo/data_models/contrib/`` directory.
Simply:

- Create a new subdirectory under ``indivo/data_models/contrib/``.
- Drop your model definition into that directory. This file **must** be named ``model.py`` or ``model.sdml`` to be identified as a data model.
- Add (optional) example files into that directory. Files should be named ``example.sdmj``, ``example.sdmx``, or ``example.py``, and should be example instances of the data model as :ref:`SDMJ <sdmj>`, :ref:`SDMX <sdmx>`, or :term:`Fact objects <fact>` respectively. If present, they will help others use and document your data model.
- Add an (optional) extras file to the directory. The file must be named ``extra.py``, and may contain extra options for your data-model, such as :ref:`custom serializers <custom-serializers>`.

Your final directory structure should now look something like:

.. code-block:: none

   indivo_server/
       indivo/
           ...
           data_models/
               core/
                   allergy/
                       model.[sdml | py]
                       example.[sdmj | sdmx | py]
                       extra.py
                   ...
               contrib/
                   your_data_model/
                       model.[sdml | py]
                       example.[sdmj | sdmx | py]
                       extra.py

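Since the file names matter, a quick check of a new data-model directory can catch naming mistakes before running any migrations. The sketch below is a hypothetical helper, not an Indivo tool: ``check_data_model_dir`` and its two name sets simply encode the naming rules listed above, and the demo runs against a temporary directory rather than a real Indivo checkout.

```python
import tempfile
from pathlib import Path

# File names taken from the naming rules described above.
MODEL_FILES = {"model.py", "model.sdml"}
OPTIONAL_FILES = {"example.sdmj", "example.sdmx", "example.py", "extra.py"}


def check_data_model_dir(path):
    """Return (definition files found, unrecognized file names) for one directory."""
    names = {p.name for p in Path(path).iterdir() if p.is_file()}
    definitions = sorted(names & MODEL_FILES)
    unrecognized = sorted(names - MODEL_FILES - OPTIONAL_FILES)
    return definitions, unrecognized


# Demo against a throwaway directory that mimics contrib/your_data_model/.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "contrib" / "your_data_model"
    model_dir.mkdir(parents=True)
    for name in ("model.sdml", "example.sdmj", "notes.txt"):
        (model_dir / name).touch()
    print(check_data_model_dir(model_dir))
```

An empty first list would mean the directory has no recognizable model definition; entries in the second list are files that the naming rules above do not cover.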
Indivo relies on the South migration tool to get the database synced with the latest data models. Once you've dropped your data model into the filesystem, South should be able to detect the necessary changes.
To detect the new model and generate migrations for it, run (from the ``indivo_server`` directory):

.. code-block:: bash

   python manage.py schemamigration indivo --auto

You should see output like:

.. code-block:: none

   + Added model indivo.YOURMODELNAME
   Created 0018_auto__add_model_YOURMODELNAME.py. You can now apply this migration with: ./manage.py migrate indivo

To do a quick sanity check that you aren't about to blow away your database, run:

.. code-block:: bash

   python manage.py migrate indivo --db-dry-run -v2

This should output the SQL that will be run. Make sure this looks reasonable, ESPECIALLY if you are running Indivo on Oracle, where the South tool is still in alpha. If the SQL looks reasonable, go ahead and run the migration, with:

.. code-block:: bash

   python manage.py migrate indivo

And you're all set!
Make sure to restart Indivo for your changes to take effect.

.. seealso::

   | Now you've added a new data model to Indivo: Congratulations! It can be stored in the database and queried via the API. But until you map a :doc:`Schema </schemas/index>` to it, you won't be able to actually add data to your new model. To learn more, see:

   * :doc:`/data-pipeline`
   * :ref:`add-schema`
   * :ref:`add-transform`