# File Formats 2: Structured Data

## JSON

Another common and useful data format is JSON.

It lets us represent data made up of nested lists and dictionaries as a text file that
looks much like a JavaScript (or Python) data literal.

In [52]:
import json

Any nested group of dictionaries and lists can be saved:

In [53]:
mydata =  {'key': ['value1', 'value2'], 'key2': {'key4':'value3'}}

In [54]:
json.dumps(mydata)

'{"key2": {"key4": "value3"}, "key": ["value1", "value2"]}'
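As a small sketch of how the text can be made easier for people to read, `json.dumps` accepts `indent` and `sort_keys` arguments, and `json.loads` turns the text back into Python objects. The names `pretty` and `restored` are only illustrative.

```python
import json

mydata = {'key': ['value1', 'value2'], 'key2': {'key4': 'value3'}}

# indent adds newlines and indentation; sort_keys makes the key order reproducible
pretty = json.dumps(mydata, indent=4, sort_keys=True)
print(pretty)

# json.loads parses the text back into dictionaries and lists
restored = json.loads(pretty)
```

Sorting the keys is optional, but it keeps the saved text stable between runs, which helps when the file is kept under version control.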

Loading data is also really easy:

In [55]:
%%writefile myfile.json
{
    "somekey": ["a list", "with values"]
}

Overwriting myfile.json


In [60]:
# The with-block closes the file once the data has been read
with open('myfile.json') as json_file:
    mydata = json.load(json_file)

In [61]:
mydata

{u'somekey': [u'a list', u'with values']}

In [62]:
mydata['somekey']

[u'a list', u'with values']
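Saving to a file from Python works the same way in reverse. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name `saved.json` so that no real file is overwritten; `json.dump` writes to an open file and `json.load` reads it back.

```python
import json
import os
import tempfile

mydata = {'key': ['value1', 'value2'], 'key2': {'key4': 'value3'}}

# Write into a temporary directory so the sketch does not clobber real files
path = os.path.join(tempfile.mkdtemp(), 'saved.json')

# The with-block closes the file even if an error occurs while writing
with open(path, 'w') as target:
    json.dump(mydata, target, indent=4)

# Read the file back to check that nothing was lost
with open(path) as source:
    reloaded = json.load(source)
```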

This is a very convenient way to load and save Python data structures.

It's a very common way of transferring data on the internet, and of saving datasets to disk.

There's good support in most languages, so it's a nice inter-language file interchange format.
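One caveat when using JSON for interchange: it has only a handful of types, so some Python structures do not come back exactly as they were saved. The sketch below, using only the standard library `json` module and an illustrative dictionary called `original`, shows two such cases: tuples return as lists, and non-string dictionary keys return as strings.

```python
import json

# A tuple value and an integer key: neither has a direct JSON equivalent
original = {'point': (1, 2), 3: 'three'}

# Serialise to text and parse it straight back
round_tripped = json.loads(json.dumps(original))

print(round_tripped)
```

If the distinction matters for your data, you would need to convert the values back yourself after loading.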

## YAML

YAML is a data format very similar to JSON, with some nice additions:

* You don't need to quote strings if they contain no special characters
* You can have comment lines, beginning with a #
* You can write dictionaries without the curly brackets: the parser recognises them from the colons.
* You can write lists like this:

In [28]:
%%writefile myfile.yaml
somekey:
    - a list # Look, this is a list
    - with values

Overwriting myfile.yaml


In [29]:
import yaml

In [30]:
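# safe_load builds only plain Python objects; recent PyYAML needs a Loader argument for yaml.load
yaml.safe_load(open('myfile.yaml'))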

{'somekey': ['a list', 'with values']}

YAML is my favourite format for ad-hoc data files, but the library doesn't ship with default Python (though it is part
of Anaconda and Canopy), so some people still prefer JSON for its universality.
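Because the YAML library may be missing on a given machine, one possible approach is to fall back to JSON when the import fails. This is only a sketch under that assumption: the helper names `save_text` and `load_text` are hypothetical, and `yaml.safe_dump` and `yaml.safe_load` are the PyYAML calls used when the package is present.

```python
import json

try:
    import yaml  # PyYAML: a third-party package, not in the standard library
except ImportError:
    yaml = None

def save_text(data):
    """Serialise data as YAML if PyYAML is installed, otherwise as JSON."""
    if yaml is not None:
        return yaml.safe_dump(data, default_flow_style=False)
    return json.dumps(data, indent=4)

def load_text(text):
    """Parse text written by save_text on the same machine."""
    if yaml is not None:
        return yaml.safe_load(text)
    return json.loads(text)

mydata = {'key': ['value1', 'value2'], 'key2': {'key4': 'value3'}}
restored = load_text(save_text(mydata))
```

The trade-off is that the file format then depends on what is installed, so this suits scratch files better than data you share with others.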

Because YAML gives the **option** of serialising a list either as newlines with dashes, *or* with square brackets,
you can control which style `yaml.dump` writes:

In [34]:
yaml.dump(mydata)

'key: [value1, value2]\nkey2: {key4: value3}\n'

In [35]:
yaml.dump(mydata, default_flow_style=False)

'key:\n- value1\n- value2\nkey2:\n  key4: value3\n'

*Supplementary material*: [XML](http://www.w3schools.com/xml/) is another popular choice when saving nested data structures.
It is rigorous but verbose. If your field uses XML data, you'll need to learn a [Python XML parser](https://docs.python.org/2/library/xml.etree.elementtree.html)
(there are a few) and how XML works.
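To give a flavour of what that involves, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`data`, `entry`, `item`, `key`) are invented for illustration; real XML formats define their own.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML layout holding the same data as the earlier JSON file
document = """<data>
    <entry key="somekey">
        <item>a list</item>
        <item>with values</item>
    </entry>
</data>"""

root = ET.fromstring(document)

# Rebuild a dictionary of lists by walking the element tree
parsed = {}
for entry in root.findall('entry'):
    parsed[entry.get('key')] = [item.text for item in entry.findall('item')]

print(parsed)
```

Unlike JSON or YAML, the parser hands back a tree of elements rather than dictionaries and lists, so the conversion step is up to you.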

## Exercise:

Use YAML and JSON to save your maze datastructure to disk and load it again.

## Exercise:

GeoJSON is a JSON-based file format for sharing geographic data. One example dataset is the USGS earthquake data:

In [58]:
import requests
quakes=requests.get("http://earthquake.usgs.gov/fdsnws/event/1/query.geojson",
                    params={
        'starttime':"2000-01-01",
        "maxlatitude":"58.723",
        "minlatitude":"50.008",
        "maxlongitude":"1.67",
        "minlongitude":"-9.756",
        "minmagnitude":"1",
        "endtime":"2015-07-13",
        "orderby":"time-asc"}
                   )

In [63]:
quakes.text

u'{"type":"FeatureCollection","metadata":{"generated":1436872441000,"url":"http://earthquake.usgs.gov/fdsnws/event/1/query.geojson?orderby=time-asc&maxlongitude=1.67&minlatitude=50.008&minlongitude=-9.756&maxlatitude=58.723&minmagnitude=1&starttime=2000-01-01&endtime=2015-07-13","title":"USGS Earthquakes","status":200,"api":"1.0.17","count":110},"features":[{"type":"Feature","properties":{"mag":2.6,"place":"England, United Kingdom","time":956553055700,"updated":1415322596133,"tz":null,"url":"http://earthquake.usgs.gov/earthquakes/eventpage/usp0009rst","detail":"http://earthquake.usgs.gov/fdsnws/event/1/query?eventid=usp0009rst&format=geojson","felt":null,"cdi":null,"mmi":null,"alert":null,"status":"reviewed","tsunami":0,"sig":104,"net":"us","code":"p0009rst","ids":",usp0009rst,","sources":",us,","types":",impact-text,origin,phase-data,","nst":null,"dmin":null,"rms":null,"gap":null,"magType":"ml","type":"earthquake","title":"M 2.6 - England, United Kingdom"},"geometry":{"type":"Point","

Your exercise: determine the location of the largest magnitude earthquake in the UK this century.

You'll need to:
* Get the text of the web result
* Parse the data as JSON
* Understand how the data is structured into dictionaries and lists
   * Where is the magnitude?
   * Where is the place description or coordinates?
* Program a search through all the quakes to find the biggest quake.
* Find the place of the biggest quake
* Decide how to display it: latitude and longitude? A place name? A map?
* Display it.
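The parse-and-search steps can be sketched on a tiny hand-made sample before trying the real download. Everything in `sample_text` is invented (the magnitudes, the places "Place A" and "Place B", and the coordinates); only the nesting follows the GeoJSON layout visible in the output above, where each feature has `properties` and a `geometry`.

```python
import json

# Hypothetical stand-in for quakes.text: invented values, GeoJSON-style nesting
sample_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"mag": 2.6, "place": "Place A"},
    "geometry": {"type": "Point", "coordinates": [-1.0, 52.0, 10.0]}},
   {"type": "Feature",
    "properties": {"mag": 4.1, "place": "Place B"},
    "geometry": {"type": "Point", "coordinates": [-2.5, 53.5, 5.0]}}
 ]}
"""

data = json.loads(sample_text)

# Search every feature for the one with the greatest magnitude
largest = max(data['features'], key=lambda quake: quake['properties']['mag'])

# GeoJSON points list longitude first, then latitude
longitude, latitude = largest['geometry']['coordinates'][:2]
print(largest['properties']['place'], latitude, longitude)
```

With the real data you would replace `sample_text` with the downloaded text, and you may need to handle features whose magnitude is missing.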