_HDS5210 - Programming for Health Data Scientists_

# Week 9 - Data Structures - JSON

JSON stands for JavaScript Object Notation.  It is a very common format that web-based applications use to communicate information to each other, and it maps directly onto the objects that JavaScript code running in web browsers uses to manage dynamic data (even though webpage content itself is written in HTML, a markup language related to XML).

In this part of the lecture, we'll be working on reading / processing / writing JSON.


## The json module

https://docs.python.org/3/library/json.html

In [1]:
import json

In [2]:
help(json)

Help on package json:

NAME
    json

DESCRIPTION
    JSON (JavaScript Object Notation) <http://json.org> is a subset of
    JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
    interchange format.
    
    :mod:`json` exposes an API familiar to users of the standard library
    :mod:`marshal` and :mod:`pickle` modules.  It is derived from a
    version of the externally maintained simplejson library.
    
    Encoding basic Python object hierarchies::
    
        >>> import json
        >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
        '["foo", {"bar": ["baz", null, 1.0, 2]}]'
        >>> print(json.dumps("\"foo\bar"))
        "\"foo\bar"
        >>> print(json.dumps('\u1234'))
        "\u1234"
        >>> print(json.dumps('\\'))
        "\\"
        >>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
        {"a": 0, "b": 0, "c": 0}
        >>> from io import StringIO
        >>> io = StringIO()
        >>> json.dump(['streaming API'], io

In [3]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [4]:
dosages

[{'amount': 100, 'drug': 'Aspirin', 'mass_unit': 'mg', 'time_unit': 'hr'},
 {'amount': 50, 'drug': 'Digoxin', 'mass_unit': 'mg', 'time_unit': 'hr'}]

In [5]:
print(json.dumps(dosages))

[{"mass_unit": "mg", "amount": 100, "time_unit": "hr", "drug": "Aspirin"}, {"mass_unit": "mg", "amount": 50, "time_unit": "hr", "drug": "Digoxin"}]
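The key order in the output above differs from the order in which the dictionaries were written, which can make two dumps of the same data hard to compare. As a minimal sketch, the `sort_keys=True` argument (also shown in the help text above) makes the order deterministic. The records are repeated here only so the block runs on its own.

```python
import json

# Same records as the `dosages` list above, repeated so this block is self-contained.
dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50, mass_unit="mg", time_unit="hr"),
]

# sort_keys=True writes each object's keys in alphabetical order,
# regardless of the order in which they were inserted.
text = json.dumps(dosages, sort_keys=True)
print(text)
# [{"amount": 100, "drug": "Aspirin", "mass_unit": "mg", "time_unit": "hr"}, {"amount": 50, "drug": "Digoxin", "mass_unit": "mg", "time_unit": "hr"}]
```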


In [12]:
type(dosages)

list

In [14]:
dosages

[{'amount': 100, 'drug': 'Aspirin', 'mass_unit': 'mg', 'time_unit': 'hr'},
 {'amount': 50, 'drug': 'Digoxin', 'mass_unit': 'mg', 'time_unit': 'hr'}]

In [16]:
json.dumps(dosages, indent=4)

'[\n    {\n        "mass_unit": "mg",\n        "amount": 100,\n        "time_unit": "hr",\n        "drug": "Aspirin"\n    },\n    {\n        "mass_unit": "mg",\n        "amount": 50,\n        "time_unit": "hr",\n        "drug": "Digoxin"\n    }\n]'

In [17]:
json.dumps(dosages, indent=4)[0:10]

'[\n    {\n  '
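The notebook displays the return value of `json.dumps` as a Python string literal, which is why the line breaks show up as `\n`. A small sketch of the difference, using a hypothetical one-drug record: `print` renders the newlines, and `splitlines` confirms that each key lands on its own indented line.

```python
import json

# Hypothetical record, used only to keep the output short.
record = {"drug": "Aspirin", "amount": 100}

pretty = json.dumps(record, indent=4)

# print() renders the newline characters instead of showing them as \n.
print(pretty)

# The string really does contain separate lines, one per key plus the braces.
lines = pretty.splitlines()
print(len(lines))
```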

In [24]:
with open('dosages.json','w') as dosefile:
    json.dump(dosages, dosefile, indent=4)

In [25]:
%%bash
cat dosages.json

[
    {
        "mass_unit": "mg",
        "amount": 100,
        "time_unit": "hr",
        "drug": "Aspirin"
    },
    {
        "mass_unit": "mg",
        "amount": 50,
        "time_unit": "hr",
        "drug": "Digoxin"
    }
]
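Note the two similar names: `json.dumps` returns a string, while `json.dump` writes to an open file. The sketch below checks that both produce the same text. It writes to a temporary directory (an assumption made here so the example does not overwrite `dosages.json`) and reads the file back with plain Python instead of `cat`.

```python
import json
import os
import tempfile

dosages = [
    dict(drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict(drug="Digoxin", amount=50, mass_unit="mg", time_unit="hr"),
]

# Write into a throwaway directory that is removed when the block exits.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dosages.json")

    # json.dump (no "s") streams the JSON text into the file object.
    with open(path, "w", encoding="utf-8") as dosefile:
        json.dump(dosages, dosefile, indent=4)

    # Read the raw text back, the same thing `cat` would show.
    with open(path, encoding="utf-8") as dosefile:
        on_disk = dosefile.read()

# json.dumps (with "s") builds the same text in memory.
same_text = on_disk == json.dumps(dosages, indent=4)
print(same_text)
```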

## Reading JSON into Python

In [26]:
with open('dosages.json') as dosefile:
    d = json.load(dosefile)
d

[{'amount': 100, 'drug': 'Aspirin', 'mass_unit': 'mg', 'time_unit': 'hr'},
 {'amount': 50, 'drug': 'Digoxin', 'mass_unit': 'mg', 'time_unit': 'hr'}]
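The dosages come back unchanged because they only use types that JSON also has. That is not guaranteed in general. The sketch below uses a hypothetical record to show two conversions that happen on a round trip: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Hypothetical record chosen to exercise types that JSON does not have.
original = {"drug": "Aspirin", "doses_mg": (100, 200), 1: "first", "notes": None}

# Encode to a JSON string, then decode it again.
restored = json.loads(json.dumps(original))

print(restored["doses_mg"])   # the tuple is now a list
print(list(restored.keys()))  # the integer key 1 is now the string "1"
print(restored["notes"])      # None survives, via JSON null
```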

## JSON in Healthcare

A newer part of the HL7 family of standards is FHIR (Fast Healthcare Interoperability Resources, pronounced "fire"), which can represent clinical records as JSON.  I've downloaded and stored a sample FHIR document in `/samples/patient-example-a.json`

https://www.hl7.org/fhir/patient-example-a.json.html

In [27]:
with open('/samples/patient-example-a.json') as patfile:
    pat = json.load(patfile)

In [28]:
type(pat)

dict

In [29]:
pat.keys()

dict_keys(['active', 'name', 'identifier', 'resourceType', 'link', 'gender', 'managingOrganization', 'text', 'contact', 'id', 'photo'])

In [30]:
pat['gender']

'male'

In [31]:
pat['name']

[{'family': ['Donald'], 'given': ['Duck'], 'use': 'official'}]

In [33]:
pat['name'][0]

{'family': ['Donald'], 'given': ['Duck'], 'use': 'official'}

In [34]:
pat['name'][0]['family']

['Donald']

In [35]:
pat['name'][0]['family'][0]

'Donald'

In [32]:
pat['name'][0]['family'][0] + ' ' + pat['name'][0]['given'][0]

'Donald Duck'
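Chained indexing like the expression above raises `KeyError` or `IndexError` as soon as one piece is missing. A more defensive sketch follows. The helper names `as_list` and `display_name` are hypothetical, and the sample entry is copied from the output above. The helper also accepts `family` as a plain string, on the assumption that some FHIR releases store it that way rather than as a list.

```python
def as_list(value):
    """Normalize None, a single string, or a list into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def display_name(name_entry):
    """Join family and given parts in the same order as the cell above."""
    parts = as_list(name_entry.get("family")) + as_list(name_entry.get("given"))
    return " ".join(parts)


# Entry copied from the pat['name'][0] output above.
official = {"family": ["Donald"], "given": ["Duck"], "use": "official"}

print(display_name(official))                                    # Donald Duck
print(display_name({"family": "Donald", "given": ["Duck"]}))     # Donald Duck
print(repr(display_name({"use": "official"})))                   # ''
```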

In [36]:
pat['name'].append({'use':'alias', 'family':['Mickey'], 'given':['Mouse']})

In [37]:
pat['name']

[{'family': ['Donald'], 'given': ['Duck'], 'use': 'official'},
 {'family': ['Mickey'], 'given': ['Mouse'], 'use': 'alias'}]

In [38]:
pat['name'][1]['family'][0] + ' ' + pat['name'][1]['given'][0]

'Mickey Mouse'
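Indexing with `[1]` only works while the alias happens to be the second entry. As a sketch, the entry can instead be selected by its `use` value. The function name `find_name` is hypothetical, and the list below repeats the two entries shown above so the block runs on its own.

```python
# The two name entries shown above, repeated so the block is self-contained.
names = [
    {"use": "official", "family": ["Donald"], "given": ["Duck"]},
    {"use": "alias", "family": ["Mickey"], "given": ["Mouse"]},
]


def find_name(entries, use):
    """Return the first entry whose 'use' matches, or None if there is none."""
    return next((entry for entry in entries if entry.get("use") == use), None)


alias = find_name(names, "alias")
print(alias["family"][0] + " " + alias["given"][0])  # Mickey Mouse

# A 'use' value that is not present yields None instead of an IndexError.
print(find_name(names, "maiden"))  # None
```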

In [41]:
pat['problem'] = ['annoying', 'yellow']

In [42]:
pat.keys()

dict_keys(['active', 'resourceType', 'link', 'managingOrganization', 'id', 'photo', 'gender', 'identifier', 'problem', 'name', 'contact', 'text'])
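The `photo` key in this sample holds an embedded image as a long encoded string, which can swamp the output when the whole record is printed. One possible approach, sketched here with a hypothetical helper `summarize_photos` and a shortened stand-in record, is to print a trimmed copy and leave the original untouched.

```python
import copy
import json

# Hypothetical stand-in for `pat`; the photo data is shortened for the example.
pat_sample = {
    "resourceType": "Patient",
    "id": "pat1",
    "photo": [{"data": "R0lGODlhEwARAPcAAAAAAAAA"}],
}


def summarize_photos(patient, keep=8):
    """Return a deep copy with each photo's data cut to `keep` characters."""
    trimmed = copy.deepcopy(patient)
    for photo in trimmed.get("photo", []):
        data = photo.get("data")
        if isinstance(data, str) and len(data) > keep:
            photo["data"] = data[:keep] + "..."
    return trimmed


short = summarize_photos(pat_sample)
print(json.dumps(short, indent=2))
```

Because the helper works on a deep copy, `pat_sample` still holds the full string afterwards.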

In [43]:
print(json.dumps(pat, indent=2))

{
  "active": true,
  "resourceType": "Patient",
  "link": [
    {
      "other": {
        "reference": "Patient/pat2"
      },
      "type": "seealso"
    }
  ],
  "managingOrganization": {
    "display": "ACME Healthcare, Inc",
    "reference": "Organization/1"
  },
  "id": "pat1",
  "photo": [
    {
      "data": "R0lGODlhEwARAPcAAAAAAAAA/+9aAO+1AP/WAP/eAP/eCP/eEP/eGP/nAP/nCP/nEP/nIf/nKf/nUv/nWv/vAP/vCP/vEP/vGP/vIf/vKf/vMf/vOf/vWv/vY//va//vjP/3c//3lP/3nP//tf//vf//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

## Another HL7 FHIR Example

https://www.hl7.org/fhir/patient-example-f001-pieter.json.html

`/samples/patient-example-f001-pieter.json`


In [44]:
with open('/samples/patient-example-f001-pieter.json') as patfile:
    pat = json.load(patfile)

print(json.dumps(pat, indent=4))

{
    "active": true,
    "telecom": [
        {
            "system": "phone",
            "value": "0648352638",
            "use": "mobile"
        },
        {
            "system": "email",
            "value": "p.heuvel@gmail.com",
            "use": "home"
        }
    ],
    "resourceType": "Patient",
    "communication": [
        {
            "preferred": true,
            "language": {
                "text": "Nederlands",
                "coding": [
                    {
                        "code": "nl",
                        "system": "urn:ietf:bcp:47",
                        "_code": {
                            "fhir_comments": [
                                "    IETF language tag    "
                            ]
                        },
                        "display": "Dutch"
                    }
                ]
            }
        }
    ],
    "managingOrganization": {
        "display": "Burgers University Medical Centre",
        "reference":

In [45]:
print(pat['name'])

[{'use': 'usual', 'given': ['Pieter'], 'suffix': ['MSc'], 'family': ['van de Heuvel']}]
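The `telecom` list in the printed record follows the same list-of-dictionaries pattern as `name`. As a sketch, the values for one contact system can be pulled out with a list comprehension. The function name `contact_values` is hypothetical, and the list below copies the two entries from the output above.

```python
# The telecom entries from the printed record above.
telecom = [
    {"system": "phone", "value": "0648352638", "use": "mobile"},
    {"system": "email", "value": "p.heuvel@gmail.com", "use": "home"},
]


def contact_values(entries, system):
    """Collect the 'value' of every entry that matches the given system."""
    return [
        entry["value"]
        for entry in entries
        if entry.get("system") == system and "value" in entry
    ]


print(contact_values(telecom, "email"))  # ['p.heuvel@gmail.com']
print(contact_values(telecom, "fax"))    # []
```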


## Loading JSON from a string

In [49]:
s = '{ "one": 1, "two": 2}'

In [50]:
s

'{ "one": 1, "two": 2}'

In [51]:
s_obj = json.loads(s)

In [52]:
s_obj

{'one': 1, 'two': 2}

In [53]:
type(s_obj)

dict
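Strings that are not valid JSON make `json.loads` raise `json.JSONDecodeError`. A common mistake is to use Python-style single quotes, which JSON does not allow. A minimal sketch of handling that case follows; the helper name `parse_or_none` is hypothetical.

```python
import json


def parse_or_none(text):
    """Parse a JSON string, returning None and a message if it is invalid."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg describe where and why parsing failed.
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return None


good = parse_or_none('{ "one": 1, "two": 2}')
print(good)  # {'one': 1, 'two': 2}

# Single-quoted keys are valid Python but not valid JSON.
bad = parse_or_none("{ 'one': 1 }")
print(bad)  # None
```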