## Structured Data

### Structured data

CSV files can only model data where each record has several fields and each field is a simple datatype,
such as a string or a number.
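
To make the limitation concrete, here is a small sketch using the standard-library `csv` module. The room table is made-up illustration data, not taken from any real file. Every record is a flat row, and every field comes back as a plain string.

```python
import csv
import io

# Hypothetical CSV content, held in memory so the example is self-contained.
csv_text = "room,capacity\nkitchen,1\ngarden,3\n"

# Each row becomes a flat mapping of column name -> string value.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["room"])      # kitchen
print(rows[1]["capacity"])  # 3 (a string, not an integer)
```

There is no natural place in such a row for a list of exits or a nested dictionary without inventing an ad-hoc encoding.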

We often want to store data that is more complicated than this, with nested structures of lists and dictionaries.
Structured data formats like JSON, YAML, and XML are designed for this.

### JSON

A very common structured data format is JSON.

This allows us to represent data made up of combinations of lists and dictionaries as a text file which
looks a bit like a JavaScript (or Python) data literal.

In [4]:
import json

Any nested group of dictionaries and lists can be saved:

In [5]:
mydata =  {'key': ['value1', 'value2'], 
           'key2': {'key4':'value3'}}

In [6]:
json.dumps(mydata)

'{"key": ["value1", "value2"], "key2": {"key4": "value3"}}'
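
The single-line output is compact but hard to read for larger structures. As a small aside, `json.dumps` accepts `indent` and `sort_keys` arguments; the sketch below reuses the same `mydata` dictionary to show their effect.

```python
import json

mydata = {'key': ['value1', 'value2'],
          'key2': {'key4': 'value3'}}

# indent=2 spreads the output over several lines;
# sort_keys=True writes dictionary keys in alphabetical order.
pretty = json.dumps(mydata, indent=2, sort_keys=True)
print(pretty)
```

The indented text parses back to exactly the same data, so the choice only affects readability and file size.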

Loading data is also really easy:

In [7]:
%%writefile myfile.json
{
    "somekey": ["a list", "with values"]
}

Writing myfile.json


In [8]:
with open('myfile.json') as f:
    mydataasstring = f.read()

In [9]:
mydataasstring

'{\n    "somekey": ["a list", "with values"]\n}'

In [10]:
mydata = json.loads(mydataasstring)

In [11]:
mydata['somekey']

['a list', 'with values']
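
The read and parse steps above can also be combined. A minimal sketch, assuming a throwaway file name `roundtrip.json` in a temporary directory (both chosen only for illustration): `json.dump` writes straight to an open file and `json.load` reads straight from one.

```python
import json
import os
import tempfile

mydata = {"somekey": ["a list", "with values"]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "roundtrip.json")  # hypothetical file name

    # json.dump serialises directly to a file object.
    with open(path, "w") as f:
        json.dump(mydata, f)

    # json.load parses directly from a file object.
    with open(path) as f:
        restored = json.load(f)

print(restored == mydata)  # True
```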

This is a very nice solution for loading and saving Python data structures.

It's a very common way of transferring data on the internet, and of saving datasets to disk.

There's good support in most languages, so it's a nice inter-language file interchange format.
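
One caveat on the round trip: JSON has fewer types than Python, so not every structure comes back unchanged. The sketch below, using made-up data, shows two conversions that Python's `json` module performs: tuples are written as JSON arrays, and non-string dictionary keys are written as strings.

```python
import json

original = {1: ("a", "b")}   # integer key, tuple value

restored = json.loads(json.dumps(original))

print(restored)              # {'1': ['a', 'b']}
print(restored == original)  # False
```

Sticking to string keys, lists, dictionaries, numbers, strings, booleans and `None` avoids this kind of surprise.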

### YAML

YAML is a very similar data format to JSON, with some nice additions:

* You don't need to quote strings unless they contain special characters
* You can have comment lines, beginning with a #
* You can write dictionaries without the curly brackets: the parser recognises them from the colons.
* You can write lists like this:

In [12]:
%%writefile myfile.yaml
somekey:
    - a list # Look, this is a list
    - with values

Writing myfile.yaml


In [15]:
%%bash

pip install yaml

Collecting yaml


  Could not find a version that satisfies the requirement yaml (from versions: )
No matching distribution found for yaml


In [16]:
%%bash

pip install pyyaml

Collecting pyyaml
  Downloading PyYAML-3.12.tar.gz (253kB)
Building wheels for collected packages: pyyaml
  Running setup.py bdist_wheel for pyyaml: started
  Running setup.py bdist_wheel for pyyaml: finished with status 'done'
  Stored in directory: /Users/blackswan/Library/Caches/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
Successfully built pyyaml
Installing collected packages: pyyaml
Successfully installed pyyaml-3.12


In [17]:
import yaml

In [18]:
# safe_load builds only plain Python objects and needs no explicit Loader argument
yaml.safe_load(open('myfile.yaml'))

{'somekey': ['a list', 'with values']}

YAML is my favourite format for ad-hoc data files, but the library doesn't ship with default Python (though it is part
of Anaconda and Canopy), so some people still prefer JSON for its universality.
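
One way to live with this trade-off is to use YAML when the library is present and fall back to JSON otherwise. The sketch below is only an illustration: the helper name `dump_text` is hypothetical, and it assumes that a failed `import yaml` means PyYAML is not installed.

```python
import json

try:
    import yaml  # third-party, provided by the PyYAML package
except ImportError:
    yaml = None  # PyYAML is not installed; fall back to JSON


def dump_text(data):
    """Hypothetical helper: return (format_name, text) for the given data."""
    if yaml is not None:
        return "yaml", yaml.safe_dump(data)
    return "json", json.dumps(data)


format_name, text = dump_text({"somekey": ["a list", "with values"]})
print(format_name)
print(text)
```

Whichever branch runs, the caller gets text it can write to disk, plus the name of the format so the file can be given a matching extension.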

Because YAML gives the **option** of serialising a list either as newlines with dashes *or* with square brackets,
you can control this choice with the `default_flow_style` argument:

In [19]:
yaml.safe_dump(mydata)

'somekey: [a list, with values]\n'

In [20]:
yaml.safe_dump(mydata, default_flow_style=False)

'somekey:\n- a list\n- with values\n'

### XML

*Supplementary material*: [XML](http://www.w3schools.com/xml/) is another popular choice when saving nested data structures. 
It's very careful, but verbose. If your field uses XML data, you'll need to learn a [Python XML parser](https://docs.python.org/2/library/xml.etree.elementtree.html)
(there are a few) and how XML works.
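
As a taste of what that looks like, here is a minimal sketch with the standard-library `xml.etree.ElementTree` parser. The XML snippet and its tag names (`house`, `room`, `person`) are invented for illustration and do not follow any established schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; note how much markup a small structure needs.
xml_text = """
<house>
  <room name="living" capacity="2">
    <person>James</person>
  </room>
  <room name="garden" capacity="3">
    <person>Sue</person>
  </room>
</house>
"""

root = ET.fromstring(xml_text)

# Build a dictionary of room name -> list of people from the parsed tree.
people = {
    room.get("name"): [person.text for person in room.findall("person")]
    for room in root.findall("room")
}
print(people)  # {'living': ['James'], 'garden': ['Sue']}
```

Unlike JSON or YAML, the parser hands back a tree of elements rather than dictionaries and lists, and attribute values such as `capacity` arrive as strings, so the conversion into Python data structures is up to you.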

### Exercise: Saving and loading data

Use YAML or JSON to save your maze data structure to disk and load it again.

In [21]:
myhouse = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}

#### JSON attempt

In [28]:
%%writefile ./examples/myhouse.json

json.dumps(myhouse)

Overwriting ./examples/myhouse.json


In [32]:
with open("./examples/myhouse.json") as f:
    # json.loads expects a string, so passing the file object itself fails
    fromJSON = json.loads(f)

fromJSON

TypeError: the JSON object must be str, bytes or bytearray, not 'TextIOWrapper'
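
The error arises because `json.loads` (with an `s`, for string) expects text, whereas `json.load` expects an open file. A small sketch of the difference, using an in-memory `io.StringIO` object as a stand-in for a real file:

```python
import io
import json

text = '{"kitchen": {"capacity": 1}}'

# json.loads parses a string that is already in memory.
from_string = json.loads(text)

# json.load reads from a file-like object and then parses what it read.
from_file = json.load(io.StringIO(text))

print(from_string == from_file)  # True
```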

In [36]:
with open("./examples/myhouse.json", "r") as f:
    fromJSON = f.read()

fromJSON

'\njson.dumps(myhouse)'

The file holds the Python source text rather than JSON, because `%%writefile` writes the cell contents verbatim instead of running them.

In [37]:
with open("./examples/myhouse_correct.json", "w") as f:
    # Serialise the dictionary itself; the with block closes the file automatically
    f.write(json.dumps(myhouse))

In [38]:
with open("./examples/myhouse_correct.json", "r") as f:
    fromJSON = json.loads(f.read())

fromJSON

{'bedroom': {'capacity': 1,
  'exits': {'downstairs': 'living', 'jump': 'garden'},
  'people': []},
 'garden': {'capacity': 3, 'exits': {'inside': 'living'}, 'people': ['Sue']},
 'kitchen': {'capacity': 1, 'exits': {'south': 'living'}, 'people': []},
 'living': {'capacity': 2,
  'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'},
  'people': ['James']}}

#### YAML attempt

In [50]:
%%writefile ./examples/myhouse.yaml

{
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}

Overwriting ./examples/myhouse.yaml


In [53]:
with open("./examples/myhouse.yaml", "r") as f:
    # safe_load avoids constructing arbitrary Python objects from the file
    print(yaml.safe_load(f))


{'bedroom': {'capacity': 1, 'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': []}, 'garden': {'capacity': 3, 'exits': {'inside': 'living'}, 'people': ['Sue']}, 'kitchen': {'capacity': 1, 'exits': {'south': 'living'}, 'people': []}, 'living': {'capacity': 2, 'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James']}}


In [52]:
with open("./examples/myhouse.yaml", "w") as f:
    f.write(yaml.safe_dump(myhouse))