# Table of Contents
* [Machine and Human Readable Formats](#Machine-and-Human-Readable-Formats)
* [Learning Objectives:](#Learning-Objectives:)
	* [Scale of difficulty](#Scale-of-difficulty)
	* [Common uses](#Common-uses)
	* [Terms](#Terms)
	* [JSON : JavaScript Object Notation](#JSON-:-JavaScript-Object-Notation)
		* [Why JSON?](#Why-JSON?)
		* [Why not JSON?](#Why-not-JSON?)
	* [YAML: YAML Ain't Markup Language](#YAML:-YAML-Ain't-Markup-Language)
		* [Why YAML?](#Why-YAML?)
		* [Why not YAML?](#Why-not-YAML?)
	* [XML: eXtensible Markup Language](#XML:-eXtensible-Markup-Language)
		* [Why (should you use) XML?](#Why-%28should-you-use%29-XML?)
		* [Why (should you) not (use) XML?](#Why-%28should-you%29-not-%28use%29-XML?)


# Machine and Human Readable Formats

# Learning Objectives:

* Learn what JSON, YAML, and XML are
* Learn when and why to use them
* Learn how to manipulate and construct each type
* Learn the limitations and risks associated with each

## Scale of difficulty

1. JSON (easiest)
2. YAML
3. XML

## Common uses

* JSON is widely used for data exchange on the web (REST APIs, for example)
* XML is used for heavyweight document formats (OOXML and ODT, for example)
* YAML is used for configuration files

## Terms

1. Serialization
  * Serialization is the process of translating data structures or object state into a format that can be stored for later reconstruction and use. [Wikipedia](https://en.wikipedia.org/wiki/Serialization)
2. Markup Language
  * A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. [Wikipedia](https://en.wikipedia.org/wiki/Markup_language)
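
As a minimal sketch of serialization, the snippet below uses Python's standard `json` module to turn an in-memory dictionary into text and then reconstruct it. The `settings` dictionary is a made-up example used only for illustration.

```python
import json

# Hypothetical in-memory data structure, used only for illustration.
settings = {"name": "my_new_module", "version": 2, "libraries": ["numpy", "scipy"]}

# Serialize: data structure -> text that can be stored or sent elsewhere.
serialized = json.dumps(settings)

# Deserialize: text -> an equivalent data structure.
restored = json.loads(serialized)

print(type(serialized).__name__)  # str
print(restored == settings)       # True
```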

## JSON : JavaScript Object Notation   

What is [JSON](http://json.org/)? 
JSON is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

### Why JSON?

* It has a 5-page specification
  * Simple to parse, and therefore fast to parse
* It is cross-language: every major programming language, and many minor ones, has a JSON encoder and decoder
* Its structure is simple and easy to understand

### Why not JSON?

* No NaN (or Infinity) value for numbers
* Few data types: only strings, numbers, booleans, null, arrays, and objects, so information can be lost when converting to JSON
  * Sometimes you need to preserve the original data types (object keys, for example, always become strings)
* The kinds of information that can be "JSONified" are limited
* Its simple structure can make complex or interdependent data difficult to represent
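
The limitations listed above can be observed with Python's standard `json` module. This is a small sketch rather than a complete catalogue; the `original` dictionary is a made-up example.

```python
import json
import math

# Tuples come back as lists, and non-string keys come back as strings.
original = {1: ("a", "b")}
round_tripped = json.loads(json.dumps(original))
print(round_tripped)  # {'1': ['a', 'b']}

# NaN is not part of the JSON specification. Python's encoder emits the
# non-standard token NaN by default; allow_nan=False makes it raise instead.
lenient = json.dumps(math.nan)
print(lenient)  # NaN

try:
    json.dumps(math.nan, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
print(strict_rejected)  # True
```

Output written with the default settings may therefore be rejected by stricter JSON parsers in other languages.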

## YAML: YAML Ain't Markup Language

What is [YAML](http://www.yaml.org/spec/1.2/spec.html)?
YAML is a **data serialization** language, **not** a markup language.

### Why YAML?

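* YAML 1.2 is designed as a superset of JSON, so valid JSON is, with rare edge cases, also valid YAML
  * YAML imposes additional constraints on input data that JSON doesn't, like the uniqueness of keys.
* YAML is easy for a human to read
* Indentation is significant (just like in Python)
* It has data types beyond those in JSON (timestamps, for example) and explicit type tags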
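
The points above can be sketched with PyYAML, the third-party package imported as `yaml` later in this notebook (assumed to be installed). The `released` and `threads` fields are hypothetical and exist only to show typed values.

```python
import datetime
import json

import yaml  # PyYAML, a third-party package

# A JSON document produced by the standard library...
json_text = json.dumps({"name": "my_new_module", "libraries": ["numpy", "scipy"]})

# ...is parsed by the YAML loader into the same structure.
from_json = yaml.safe_load(json_text)

# Block-style YAML: structure comes from indentation, and plain scalars
# are resolved to typed values. The field values here are made up.
yaml_text = """
name: my_new_module
released: 2001-12-14
threads: 4
libraries:
  - numpy
  - scipy
"""
config = yaml.safe_load(yaml_text)

print(from_json["libraries"])    # ['numpy', 'scipy']
print(type(config["released"]))  # <class 'datetime.date'>
print(type(config["threads"]))   # <class 'int'>
```

`yaml.safe_load` is used here instead of `yaml.load` because it only constructs plain data types, which is the safer choice for untrusted input.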

### Why not YAML?

* Not nearly as widely adopted as JSON or XML

## XML: eXtensible Markup Language

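What is [XML](http://www.w3.org/TR/2008/REC-xml-20081126/#sec-intro)?
XML is a markup language that encodes documents as nested, tagged elements in a form that both humans and machines can read.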

### Why (should you use) XML?

* Very stable and capable
* Wide adoption
* Structure can be pre-defined and enforced with DTDs (Document Type Definitions)
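
As a rough sketch of how a DTD enforces structure, the snippet below validates two small documents with lxml (assumed to be installed). The DTD itself is a hypothetical one written for this example: it requires a `name`, at least one `libraries` element, and any number of `dependencies` elements.

```python
import io

from lxml import etree  # third-party package

# Hypothetical DTD for a <module> document, written for this example.
dtd_text = """
<!ELEMENT module (name, libraries+, dependencies*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT libraries (#PCDATA)>
<!ELEMENT dependencies (#PCDATA)>
"""
dtd = etree.DTD(io.StringIO(dtd_text.strip()))

complete = etree.fromstring(
    "<module><name>my_new_module</name>"
    "<libraries>numpy</libraries><dependencies>fftw</dependencies></module>"
)
# This document omits the required <name> element.
missing_name = etree.fromstring("<module><libraries>numpy</libraries></module>")

complete_ok = dtd.validate(complete)
missing_name_ok = dtd.validate(missing_name)
print(complete_ok)      # True
print(missing_name_ok)  # False
```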

### Why (should you) not (use) XML?

Examples of XML-based formats:
* OOXML (Microsoft Office)
* ODT (OpenDocument Text)
* RSS
* XHTML
* SVG (Scalable Vector Graphics)

If lxml is not installed in your conda environment, run:
```
% conda install -y lxml
```

```python
import json
import yaml
import lxml.etree

json_first = '''{"libraries":["numpy", "scipy"],
               "dependencies": ["fftw", "mkl"],
               "name":"my_new_module"}'''

# What is the difference between json_first and json_second?
json_second = {"libraries":["numpy", "scipy"],
               "dependencies": ["fftw", "mkl"],
               "name":"my_new_module"}

my_dict = {"libraries":["numpy", "scipy"],
           "dependencies": ["fftw", "mkl"],
           "name":"my_new_module"}

# Serialize the same dictionary to YAML text and to JSON text.
yamlized = yaml.dump(my_dict)
jsonized = json.dumps(my_dict)

# Build the equivalent XML tree one element at a time.
xml = lxml.etree.Element("module")
xml.append(lxml.etree.Element("name"))
xml[-1].text = "my_new_module"

# Each list entry becomes its own repeated element.
xml.append(lxml.etree.Element("libraries"))
xml[-1].text = "numpy"
xml.append(lxml.etree.Element("libraries"))
xml[-1].text = "scipy"

xml.append(lxml.etree.Element("dependencies"))
xml[-1].text = "fftw"
xml.append(lxml.etree.Element("dependencies"))
xml[-1].text = "mkl"

# Serialize the tree; tostring returns bytes by default.
xmlized = lxml.etree.tostring(xml)
```
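
One way to explore the question posed in the code above, as a sketch: check the type of each object and parse the text form back into Python. The snippet repeats shortened versions of the two definitions so that it runs on its own, and it reads XML back with the standard-library `xml.etree.ElementTree` module rather than lxml, a substitution made here for portability.

```python
import json
import xml.etree.ElementTree as ET

json_first = '{"libraries": ["numpy", "scipy"], "name": "my_new_module"}'
json_second = {"libraries": ["numpy", "scipy"], "name": "my_new_module"}

print(type(json_first).__name__)   # str
print(type(json_second).__name__)  # dict

# Parsing the string yields a dictionary equal to the literal.
parsed = json.loads(json_first)
print(parsed == json_second)  # True

# XML text in the shape produced by the element-building code above.
xml_text = (
    "<module><name>my_new_module</name>"
    "<libraries>numpy</libraries><libraries>scipy</libraries></module>"
)
root = ET.fromstring(xml_text)
libraries = [element.text for element in root.findall("libraries")]
print(root.find("name").text)  # my_new_module
print(libraries)               # ['numpy', 'scipy']
```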