# 16) Data in a Box - Persistent Storage

An active program accesses data stored in Random Access Memory (RAM). RAM is fast, but it is expensive and requires a constant supply of power. Disk drives are slower than RAM but have more capacity, cost less, and retain data when the power is off. Thus a huge amount of effort in computer systems has been devoted to making the best trade-offs between storing data on disk and in RAM. As programmers, we need persistence: storing and retrieving data using nonvolatile media such as disks.

### Tabular text files

With simple text files, the only level of organization is the line. Sometimes, you want more structure than that. You might want to save data for your program to use later, or send data to another program. There are many formats, and various ways to distinguish them:

    - A delimiter character like tab ('\t'), comma (',') or vertical bar ('|'). Examples: CSV.
    - '<' and '>' around tags. Examples: XML and HTML.
    - Punctuation. Examples: JavaScript Object Notation (JSON).
    - Indentation. Examples: YAML.

#### CSV

Delimited files are often used as an exchange format for spreadsheets and databases. You could read CSV files manually, but it is better to use the standard csv module, because parsing these files can get complicated. There are some characteristics to keep in mind when working with CSV:

    - Some use a delimiter other than a comma, such as a tab or a vertical bar.
    - Some have escape sequences. If the delimiter can occur within a field, the entire field might be surrounded by quote characters or preceded by some escape character.
    - Files have different line-ending characters.
    - The first line may contain column names.
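The first two points above can be illustrated with a minimal sketch. The sample rows and the choice of '|' as the delimiter are assumptions made for this illustration, and an in-memory buffer stands in for a real file:

```python
import csv
import io

# Illustrative rows; the last field of the last row contains the delimiter.
rows = [
    ['title', 'first', 'last'],
    ['Mr', 'Joe', 'Cooke'],
    ['Dr', 'Ann', 'Lee|Park'],
]

# newline='' leaves line endings to the csv module.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter='|', quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
text = buffer.getvalue()
print(text)  # the field containing '|' is wrapped in quote characters

# Reading with the same delimiter recovers the original fields.
reader = csv.reader(io.StringIO(text, newline=''), delimiter='|')
header, *records = list(reader)
print(header)
print(records)
```

With QUOTE_MINIMAL, the writer only quotes fields that would otherwise be ambiguous, which is why hand-written parsing with split() tends to break on files like this.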

In [1]:
import csv

In [9]:
# Creating a csv

names = [['Mr', 'Joe', 'Cooke'],
        ['Miss', 'Sarah', 'Coe'],
        ['Mr', 'John', 'Smith'],
        ['Mrs', 'Alice', 'Thompson']]

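# newline='' stops the '\r\n' line endings written by the csv module from being
# translated a second time on Windows, which would show up as empty rows on reading
with open('names', 'wt', newline='') as fout:
    csvout = csv.writer(fout)
    csvout.writerows(names)

In [10]:
# Reading back the csv

with open('names', 'rt', newline='') as fin:
    cin = csv.reader(fin)
    names = [row for row in cin]
    
names

[['Mr', 'Joe', 'Cooke'],
 ['Miss', 'Sarah', 'Coe'],
 ['Mr', 'John', 'Smith'],
 ['Mrs', 'Alice', 'Thompson']]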

In [11]:
# Reading back the csv as dictionaries

with open('names', 'rt', newline='') as fin:
    # The file has no header row, so the column names are supplied explicitly
    cin = csv.DictReader(fin, fieldnames=['title', 'first', 'last'])
    names = [row for row in cin]
    
names

[{'title': 'Mr', 'first': 'Joe', 'last': 'Cooke'},
 {'title': 'Miss', 'first': 'Sarah', 'last': 'Coe'},
 {'title': 'Mr', 'first': 'John', 'last': 'Smith'},
 {'title': 'Mrs', 'first': 'Alice', 'last': 'Thompson'}]

We can avoid having to spell out the column names when reading by writing them as a header row in the first place. We do this with DictWriter() and its writeheader() method:

In [13]:
# Outputting the csv with DictWriter() and reading the csv back

names = [{'title': 'Mr', 'first': 'Joe', 'last': 'Cooke'},
        {'title': 'Miss', 'first': 'Sarah', 'last': 'Coe'},
        {'title': 'Mr', 'first': 'John', 'last': 'Smith'},
        {'title': 'Mrs', 'first': 'Alice', 'last': 'Thompson'}]

with open('names.csv', 'wt', newline='') as fout:
    cout = csv.DictWriter(fout, ['title', 'first', 'last'])
    cout.writeheader()
    cout.writerows(names)
    
# DictReader takes the field names from the header row
with open('names.csv', 'rt', newline='') as fin:
    cin = csv.DictReader(fin)
    names = [row for row in cin]
    
names

[{'title': 'Mr', 'first': 'Joe', 'last': 'Cooke'},
 {'title': 'Miss', 'first': 'Sarah', 'last': 'Coe'},
 {'title': 'Mr', 'first': 'John', 'last': 'Smith'},
 {'title': 'Mrs', 'first': 'Alice', 'last': 'Thompson'}]

#### XML

Delimited files convey only two dimensions: rows and columns. If you want to exchange data structures among programs, you need a way to encode hierarchies, sequences, sets and other structures as text. XML is a markup format that uses tags to delimit data. Below is an example .xml file:

Following are a few important characteristics of XML:
    
    - Tags begin with a < character. The tags in the example are menu, breakfast, lunch, dinner and item.
    - Whitespace between tags is generally not significant.
    - Usually a start tag is followed by other content and then a final matching end tag.
    - Tags can nest within other tags to any level.
    - Optional attributes can occur within the start tag, such as price in the above.
    - Tags can contain values. Each item has a value such as "pancakes".

XML is often used for data feeds and messages. The simplest way to parse XML in Python is with the standard xml.etree.ElementTree module. For each element in the parsed tree, tag is the tag string and attrib is a dictionary of its attributes. Other standard Python XML libraries include xml.dom and xml.sax. The Simple API for XML (SAX) parses XML on the fly, so it does not have to load everything into memory at once. Therefore, it can be a good choice if you need to process very large streams of XML.
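As a rough sketch of the ElementTree approach, the code below parses a small menu document and walks its elements. The XML text is hypothetical: it is reconstructed for illustration from the tag and attribute names listed above and the menu values used in the JSON section, and is not the original example file.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu document, reconstructed for illustration only.
menu_xml = """<?xml version="1.0"?>
<menu>
  <breakfast hours="7 - 11">
    <item price="$5.00">pancakes</item>
    <item price="$4.00">sausage and eggs</item>
  </breakfast>
  <lunch hours="12 - 3">
    <item price="$8.00">burger</item>
  </lunch>
  <dinner hours="5 - 9">
    <item price="$10.00">spaghetti</item>
  </dinner>
</menu>
"""

# fromstring() parses the text and returns the root element.
root = ET.fromstring(menu_xml)
print(root.tag)

# Iterating over an element yields its direct children.
for meal in root:
    print(meal.tag, meal.attrib)
    for item in meal:
        print('   ', item.text, item.attrib['price'])

# iter() searches the whole tree for elements with a given tag.
prices = {item.text: item.get('price') for item in root.iter('item')}
print(prices)
```

For a file on disk, ET.parse(filename).getroot() would give the same kind of root element.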

#### JSON

JavaScript Object Notation (JSON) has become a very popular data interchange format, beyond its JavaScript origins. The JSON format is a subset of JavaScript and often legal Python syntax as well. Its close fit to Python makes it a good choice for data interchange among programs. The standard library provides one main module for it, json. The program below encodes data to a JSON string with dumps() and decodes a JSON string back to data with loads():

In [20]:
import json

In [19]:
# Defining the earlier menu

menu = {
    "breakfast": {
        "hours": "7 - 11",
        "items": {
            "pancakes": "$5.00",
            "sausage and eggs": "$4.00"
        }
    },
    "lunch": {
        "hours": "12 - 3",
        "items": {
            "burger": "$8.00"
        }
    },
    "dinner": {
        "hours": "5 - 9",
        "items": {
            "spaghetti": "$10.00"
        }
    }
}

menu

{'breakfast': {'hours': '7 - 11',
  'items': {'pancakes': '$5.00', 'sausage and eggs': '$4.00'}},
 'lunch': {'hours': '12 - 3', 'items': {'burger': '$8.00'}},
 'dinner': {'hours': '5 - 9', 'items': {'spaghetti': '$10.00'}}}

In [21]:
# Encoding to JSON

menu_json = json.dumps(menu)
menu_json

'{"breakfast": {"hours": "7 - 11", "items": {"pancakes": "$5.00", "sausage and eggs": "$4.00"}}, "lunch": {"hours": "12 - 3", "items": {"burger": "$8.00"}}, "dinner": {"hours": "5 - 9", "items": {"spaghetti": "$10.00"}}}'

In [22]:
# Decoding to Python data structure

menu2 = json.loads(menu_json)
menu2

{'breakfast': {'hours': '7 - 11',
  'items': {'pancakes': '$5.00', 'sausage and eggs': '$4.00'}},
 'lunch': {'hours': '12 - 3', 'items': {'burger': '$8.00'}},
 'dinner': {'hours': '5 - 9', 'items': {'spaghetti': '$10.00'}}}
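Since this chapter is about persistence, it may help to see the same round trip go through a file. The sketch below uses json.dump() and json.load(), the file-based counterparts of dumps() and loads(). The name small_menu, the temporary directory and the file name menu.json are assumptions made for this illustration.

```python
import json
import os
import tempfile

# A cut-down menu, named small_menu so it does not overwrite the menu above.
small_menu = {
    "breakfast": {"hours": "7 - 11", "items": {"pancakes": "$5.00"}},
    "lunch": {"hours": "12 - 3", "items": {"burger": "$8.00"}},
}

# Write to a scratch directory so no existing file is touched.
path = os.path.join(tempfile.mkdtemp(), 'menu.json')

# dump() writes the JSON text straight to an open file.
with open(path, 'wt', encoding='utf-8') as fout:
    json.dump(small_menu, fout, indent=2)

# load() reads it back into Python dictionaries and strings.
with open(path, 'rt', encoding='utf-8') as fin:
    restored = json.load(fin)

print(restored == small_menu)
```

The indent argument is optional; it only makes the file easier to read by eye.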

#### YAML

Similar to JSON, YAML has keys and values, but handles more data types such as dates and times. The following is an example of a YAML file about the poet James McIntyre: