_HDS5210 - Programming for Health Data Scientists_

# Week 9 - Data Structures - JSON

JSON is the abbreviation for JavaScript Object Notation.  JSON is a very common way for web-based applications to communicate information to each other, and it maps directly onto the objects that JavaScript code in a web browser uses to manage dynamic data (even though web page content itself is written in HTML, a markup language closely related to XML).

In this part of the lecture, we'll be working on reading / processing / writing JSON.


## The json module

https://docs.python.org/3/library/json.html

In [1]:
import json

In [2]:
help(json)

Help on package json:

NAME
    json

DESCRIPTION
    JSON (JavaScript Object Notation) <http://json.org> is a subset of
    JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
    interchange format.
    
    :mod:`json` exposes an API familiar to users of the standard library
    :mod:`marshal` and :mod:`pickle` modules.  It is derived from a
    version of the externally maintained simplejson library.
    
    Encoding basic Python object hierarchies::
    
        >>> import json
        >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
        '["foo", {"bar": ["baz", null, 1.0, 2]}]'
        >>> print(json.dumps("\"foo\bar"))
        "\"foo\bar"
        >>> print(json.dumps('\u1234'))
        "\u1234"
        >>> print(json.dumps('\\'))
        "\\"
        >>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
        {"a": 0, "b": 0, "c": 0}
        >>> from io import StringIO
        >>> io = StringIO()
        >>> json.dump(['streaming API'], io

In [3]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [7]:
print(json.dumps(dosages))

[{"drug": "Aspirin", "mass_unit": "mg", "time_unit": "hr", "amount": 100}, {"drug": "Digoxin", "mass_unit": "mg", "time_unit": "hr", "amount": 50}]


In [6]:
print(json.dumps(dosages, indent=4))

[
    {
        "drug": "Aspirin",
        "mass_unit": "mg",
        "time_unit": "hr",
        "amount": 100
    },
    {
        "drug": "Digoxin",
        "mass_unit": "mg",
        "time_unit": "hr",
        "amount": 50
    }
]
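
Notice that the key order in the output above differs from the order in which the keys were written in the `dict(...)` calls. Older Python versions did not guarantee dictionary order; since Python 3.7, insertion order is preserved. When reproducible output matters (for example, when comparing two files), the `sort_keys=True` option shown in the `help(json)` text can be used. A minimal sketch, reusing the `dosages` list from above:

```python
import json

dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50, mass_unit="mg", time_unit="hr"),
]

# sort_keys=True writes each object's keys in alphabetical order,
# so the text is the same no matter how the dicts were built.
sorted_text = json.dumps(dosages, sort_keys=True)
print(sorted_text)

# json.loads parses the string back into Python lists and dicts.
round_trip = json.loads(sorted_text)
print(round_trip == dosages)
```

The round trip compares equal because dictionary equality in Python ignores key order.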


In [9]:
with open('dosages.json','w') as dosefile:
    json.dump(dosages, dosefile, indent=4)

In [10]:
%%bash
cat dosages.json

[
    {
        "drug": "Aspirin",
        "mass_unit": "mg",
        "time_unit": "hr",
        "amount": 100
    },
    {
        "drug": "Digoxin",
        "mass_unit": "mg",
        "time_unit": "hr",
        "amount": 50
    }
]

## Reading JSON into Python

In [11]:
with open('dosages.json') as dosefile:
    d = json.load(dosefile)
d

[{'amount': 100, 'drug': 'Aspirin', 'mass_unit': 'mg', 'time_unit': 'hr'},
 {'amount': 50, 'drug': 'Digoxin', 'mass_unit': 'mg', 'time_unit': 'hr'}]
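
`json.load` reads from a file object, while `json.loads` (with an "s" for string) parses text that is already in memory. The sketch below uses a made-up JSON string, not one of the course sample files, to show how JSON values map to Python types.

```python
import json

# Hypothetical JSON text, invented for illustration.
text = '{"drug": "Aspirin", "amount": 100, "active": true, "notes": null, "routes": ["oral"]}'

record = json.loads(text)

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
for key, value in record.items():
    print(key, type(value).__name__, value)
```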

## JSON in Healthcare

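A newer part of the HL7 family of standards is FHIR (Fast Healthcare Interoperability Resources, pronounced "fire"), which can represent healthcare records as JSON.  I've downloaded and stored a sample FHIR document in `/samples/patient-example-a.json`.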

https://www.hl7.org/fhir/patient-example-a.json.html

In [12]:
with open('/samples/patient-example-a.json') as patfile:
    pat = json.load(patfile)

In [13]:
type(pat)

dict

In [14]:
pat.keys()

dict_keys(['resourceType', 'link', 'active', 'managingOrganization', 'id', 'text', 'photo', 'name', 'identifier', 'contact', 'gender'])

In [15]:
pat['gender']

'male'

In [16]:
pat['name']

[{'family': ['Donald'], 'given': ['Duck'], 'use': 'official'}]

In [19]:
pat['name'][0]['family'][0] + ' ' + pat['name'][0]['given'][0]

'Donald Duck'
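
Chained indexing like the expression above raises a `KeyError` or `IndexError` if any piece is missing. A hedged sketch of a more defensive version follows. The helper name `display_name` is hypothetical, and the sample record is copied from the `pat['name']` output shown above. It assumes that `family` and `given` may each be either a list of strings (as in this sample) or a single string.

```python
def display_name(patient):
    """Return 'family given' for the first name entry, or '' if absent."""
    names = patient.get("name") or []
    if not names:
        return ""
    first = names[0]

    def as_text(value):
        # Accept a list of strings or a single string (an assumption
        # made for this sketch, not something the sample file requires).
        if isinstance(value, list):
            return " ".join(value)
        return value or ""

    parts = [as_text(first.get("family")), as_text(first.get("given"))]
    return " ".join(p for p in parts if p)


# Record fragment copied from the output of pat['name'] above.
sample = {"name": [{"family": ["Donald"], "given": ["Duck"], "use": "official"}]}
print(display_name(sample))
print(repr(display_name({})))
```

A record with no `name` key yields an empty string instead of an exception, which is usually easier to handle when looping over many patient files.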

## Another HL7 FHIR Example

https://www.hl7.org/fhir/patient-example-f001-pieter.json.html

The sample is stored locally in `/samples/patient-example-f001-pieter.json`.


In [20]:
with open('/samples/patient-example-f001-pieter.json') as patfile:
    pat = json.load(patfile)

print(json.dumps(pat, indent=4))

{
    "resourceType": "Patient",
    "active": true,
    "deceasedBoolean": false,
    "multipleBirthBoolean": true,
    "id": "f001",
    "name": [
        {
            "family": [
                "van de Heuvel"
            ],
            "given": [
                "Pieter"
            ],
            "suffix": [
                "MSc"
            ],
            "use": "usual"
        }
    ],
    "identifier": [
        {
            "value": "738472983",
            "_value": {
                "fhir_comments": [
                    "    BSN identification system    "
                ]
            },
            "system": "urn:oid:2.16.840.1.113883.2.4.6.3",
            "use": "usual"
        },
        {
            "fhir_comments": [
                "    BSN identification system    "
            ],
            "system": "urn:oid:2.16.840.1.113883.2.4.6.3",
            "use": "usual"
        }
    ],
    "gender": "male",
    "telecom": [
        {
            "value": "0648352638"

In [21]:
print(pat['name'])

[{'family': ['van de Heuvel'], 'given': ['Pieter'], 'suffix': ['MSc'], 'use': 'usual'}]
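
The `identifier` list in the printed record above shows why defensive access matters: the first entry has a `value` key, but the second does not. A small sketch, using a fragment copied from that output (the variable names are illustrative only):

```python
# Fragment of the record printed above, reduced to the fields used here.
pat_fragment = {
    "identifier": [
        {"value": "738472983", "system": "urn:oid:2.16.840.1.113883.2.4.6.3", "use": "usual"},
        {"system": "urn:oid:2.16.840.1.113883.2.4.6.3", "use": "usual"},
    ]
}

# dict.get returns None instead of raising KeyError when a key is missing.
values = [ident.get("value") for ident in pat_fragment.get("identifier", [])]
print(values)

# Keep only the identifiers that actually carry a value.
present = [v for v in values if v is not None]
print(present)
```

Writing `ident["value"]` instead of `ident.get("value")` would stop with a `KeyError` on the second entry.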
