## Reading CSV

Not every comma in a CSV file represents the boundary between two cells. CSV files have a set of escape characters that allow you to include commas and other characters as part of the values. The split() method doesn’t handle these escape characters. Because of these potential pitfalls, the csv module provides a more reliable way to read and write CSV files.
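
As a minimal sketch of that pitfall, the snippet below compares `split(',')` with `csv.reader` on a single line. The sample line is made up for illustration and is not taken from example3.csv; it is read from an in-memory `io.StringIO` object rather than a real file.

```python
import csv
import io

# Hypothetical one-line CSV document: the second field contains a comma,
# so it is wrapped in double quotes.
line = '4/5/2035 13:34,"Apples, red",73\n'

# Naive splitting treats every comma as a cell boundary.
naive = line.strip().split(',')

# csv.reader understands the quoting rules and keeps the field intact.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four pieces, with stray quote characters left in
print(parsed)  # three fields
```

The quoted field survives as a single value only in the `csv.reader` result.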

In [6]:
import csv
example_file = open('example3.csv')
example_reader = csv.reader(example_file)
example_data = list(example_reader)
example_data


[['4/5/2035 13:34', 'Apples', '73'],
 ['4/5/2035 3:41', 'Cherries', '85'],
 ['4/6/2035 12:46', 'Pears', '14'],
 ['4/8/2035 8:59', 'Oranges', '52'],
 ['4/10/2035 2:07', 'Apples', '152'],
 ['4/10/2035 18:10', 'Bananas', '23'],
 ['4/10/2035 2:40', 'Strawberries', '98']]

In [7]:
# First row, first column
example_data[0][0]

'4/5/2035 13:34'

In [9]:
import csv
example_file = open('example3.csv')
example_reader = csv.reader(example_file)
for row in example_reader:
    print("Row #" + str(example_reader.line_num) + ' ' + str(row))

Row #1 ['4/5/2035 13:34', 'Apples', '73']
Row #2 ['4/5/2035 3:41', 'Cherries', '85']
Row #3 ['4/6/2035 12:46', 'Pears', '14']
Row #4 ['4/8/2035 8:59', 'Oranges', '52']
Row #5 ['4/10/2035 2:07', 'Apples', '152']
Row #6 ['4/10/2035 18:10', 'Bananas', '23']
Row #7 ['4/10/2035 2:40', 'Strawberries', '98']


## Writing CSV

In [10]:
import csv
output_file = open('output.csv', 'w', newline='')
output_writer = csv.writer(output_file)
output_writer.writerow(['spam', 'eggs', 'bacon', 'ham'])

21

In [11]:
output_writer.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])

32

In [12]:
output_writer.writerow([1, 2, 3.141592, 4])

16

In [13]:
output_file.close()

## Tab separated values

In [16]:
import csv
output_file = open('output.tsv', 'w', newline='')
output_writer = csv.writer(output_file, delimiter='\t', lineterminator='\n\n')
output_writer.writerow(['spam', 'eggs', 'bacon', 'ham'])
output_writer.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])
output_writer.writerow([1, 2, 3.141592, 4])
output_file.close()
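
Reading tab-separated data back uses the same `delimiter='\t'` argument on `csv.reader`. The sketch below assumes an in-memory `io.StringIO` buffer as a stand-in for output.tsv. Because the writer above uses `lineterminator='\n\n'`, every row is followed by a blank line, which the reader returns as an empty list.

```python
import csv
import io

# In-memory stand-in for output.tsv (an assumption, to avoid touching disk).
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n\n')
writer.writerow(['spam', 'eggs', 'bacon', 'ham'])
writer.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])

# Rewind and read the data back with the same delimiter.
buffer.seek(0)
reader = csv.reader(buffer, delimiter='\t')

# Each blank line comes back as an empty list, so filter those out.
rows = [row for row in reader if row]
print(rows)
```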

## Handling Header Rows

For CSV files that contain header rows, it’s often more convenient to work with the DictReader and DictWriter objects rather than the reader and writer objects. While reader and writer read and write to CSV file rows by using lists, DictReader and DictWriter perform the same functions using dictionaries, treating the values in the first row as the keys.

In [None]:
# example3.csv has no header row, so DictReader mistakes the first data row for the keys.
import csv
example_file = open('example3.csv')
example_dict_reader = csv.DictReader(example_file)
example_dict_data = list(example_dict_reader)
example_dict_data

[{'4/5/2035 13:34': '4/5/2035 3:41', 'Apples': 'Cherries', '73': '85'},
 {'4/5/2035 13:34': '4/6/2035 12:46', 'Apples': 'Pears', '73': '14'},
 {'4/5/2035 13:34': '4/8/2035 8:59', 'Apples': 'Oranges', '73': '52'},
 {'4/5/2035 13:34': '4/10/2035 2:07', 'Apples': 'Apples', '73': '152'},
 {'4/5/2035 13:34': '4/10/2035 18:10', 'Apples': 'Bananas', '73': '23'},
 {'4/5/2035 13:34': '4/10/2035 2:40', 'Apples': 'Strawberries', '73': '98'}]

In [21]:
example_file = open('example3_dict.csv')
example_dict_reader = csv.DictReader(example_file)
for row in example_dict_reader:
    print(row['Time'], row['Fruit'], row['Price'])

4/5/2035 13:34 Apples 73
4/5/2035 3:41 Cherries 85
4/6/2035 12:46 Pears 14
4/8/2035 8:59 Oranges 52
4/10/2035 2:07 Apples 152
4/10/2035 18:10 Bananas 23
4/10/2035 2:40 Strawberries 98


In [None]:
# If the CSV file has no header row, you can supply the field names yourself
example_file = open('example3.csv')
example_dict_reader = csv.DictReader(example_file, ['Time', 'Fruit', 'Price'])
for row in example_dict_reader:
    print(row['Time'], row['Fruit'], row['Price'])

4/5/2035 13:34 Apples 73
4/5/2035 3:41 Cherries 85
4/6/2035 12:46 Pears 14
4/8/2035 8:59 Oranges 52
4/10/2035 2:07 Apples 152
4/10/2035 18:10 Bananas 23
4/10/2035 2:40 Strawberries 98


In [28]:
import csv
output_file = open('dict_output.csv', 'w', newline='')
output_dict_writer = csv.DictWriter(output_file, ['Name', 'Pet', 'Phone'])
output_dict_writer.writeheader()
output_dict_writer.writerow({'Name': 'Alice', 'Pet': 'cat', 'Phone': '555-1234'})
output_dict_writer.writerows([{'Name': 'Bob', 'Phone': '555-9999'}, {'Phone': '555-5555', 'Name': 'Carol', 'Pet': 'dog'}])
output_file.close()
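
The cell above writes a row for Bob that has no 'Pet' key and a row for Carol whose keys are in a different order. A small sketch of what ends up in the file, using an in-memory `io.StringIO` buffer in place of dict_output.csv (an assumption made so the result can be inspected directly):

```python
import csv
import io

# In-memory stand-in for dict_output.csv.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, ['Name', 'Pet', 'Phone'])
writer.writeheader()
writer.writerows([
    {'Name': 'Bob', 'Phone': '555-9999'},                  # no 'Pet' key
    {'Phone': '555-5555', 'Name': 'Carol', 'Pet': 'dog'},  # keys in a different order
])

lines = buffer.getvalue().splitlines()
print(lines)
```

The missing key is written as an empty cell, and the column order follows the field names passed to `DictWriter`, not the order of keys in each dictionary.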

## Project 13: Remove the Header from CSV Files

In [29]:
# Removes the header line from csv files
import csv, os

os.makedirs('headerRemoved', exist_ok=True)

# Loop through every file in the current working directory.
for csv_filename in os.listdir('.'):
    if not csv_filename.endswith('.csv'):
        continue  # Skip non-CSV files.

    print('Removing header from ' + csv_filename + '...')

    # Read the CSV file (skipping the first row).
    csv_rows = []
    csv_file_obj = open(csv_filename)
    reader_obj = csv.reader(csv_file_obj)
    for row in reader_obj:
        if reader_obj.line_num == 1:
            continue  # Skip the first row.
        csv_rows.append(row)
    csv_file_obj.close()

    # Write the CSV file.
    csv_file_obj = open(os.path.join('headerRemoved', csv_filename), 'w',
                        newline='')
    csv_writer = csv.writer(csv_file_obj)
    for row in csv_rows:
        csv_writer.writerow(row)
    csv_file_obj.close()

Removing header from output.csv...
Removing header from dict_output.csv...
Removing header from example3.csv...
Removing header from example3_dict.csv...
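
The loop above skips the header by checking `line_num`. An alternative sketch, not the approach used in the cell above, discards the first row with `next()` and lets `with` blocks close the files. The helper name `remove_header` and its parameters are hypothetical.

```python
import csv

def remove_header(src_path, dst_path):
    """Copy src_path to dst_path without its first row (hypothetical helper)."""
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        # Discard the header row; the None default avoids an error on empty files.
        next(reader, None)
        # Write all remaining rows in one call.
        csv.writer(dst).writerows(reader)
```

Calling `remove_header('example3_dict.csv', 'headerRemoved/example3_dict.csv')` would then handle one file, assuming the headerRemoved folder already exists.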


## Versatile Plaintext Formats

CSV files are good for storing rows of data that all have exactly the same columns.

JSON, XML, YAML, TOML
- store a variety of data structures
- are accessible from many programming languages
- are easy for people to read
- organise data into key-value pairs and lists

Disadvantages
- Not the most efficient files to work with in terms of disk space or memory

Note
- JSON is simpler than XML and more widely adopted than YAML
- TOML is chiefly used as a format for configuration files

## JSON

This format is commonly used in API responses.

Similar to Python dictionary/list layout with some differences:
- uses "null" in place of Python's "None"
- Boolean values are the lowercase keywords "true" and "false"
- strings must use double quotes, not single quotes

Python’s json module handles the details of translating between a string formatted as JSON data and corresponding Python values with the json.loads() and json.dumps() functions.
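
A quick sketch of the null and Boolean differences, using a small made-up dictionary: `json.dumps()` turns Python's `None` and `True` into JSON's `null` and `true`, while `json.loads()` rejects text that uses the Python spelling.

```python
import json

# Python value -> JSON text: None becomes null, True becomes true.
python_value = {'car': None, 'programmer': True}
as_json = json.dumps(python_value)
print(as_json)

# JSON text -> Python value: the round trip restores None and True.
round_trip = json.loads(as_json)

# Python's None is not valid JSON, so parsing it fails.
try:
    json.loads('{"car": None}')
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)
```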

In [None]:
{
  "name": "Alice Doe",
  "age": 30,
  "car": null,
  "programmer": true,
  "address": {
    "street": "100 Larkin St.",
    "city": "San Francisco",
    "zip": "94102"
  },
  "phone": [
    {
      "type": "mobile",
      "number": "415-555-7890"
    },
    {
      "type": "work",
      "number": "415-555-1234"
    }
  ]
}

In [None]:
# To translate a string containing JSON data into a Python value, pass it to json.loads()
import json
json_string = '{"name": "Alice Doe", "age": 30, "car": null, "programmer": true, "address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}, "phone": [{"type": "mobile", "number": "415-555-7890"}, {"type": "work", "number": "415-555-1234"}]}'
python_data = json.loads(json_string)
python_data

{'name': 'Alice Doe',
 'age': 30,
 'car': None,
 'programmer': True,
 'address': {'street': '100 Larkin St.',
  'city': 'San Francisco',
  'zip': '94102'},
 'phone': [{'type': 'mobile', 'number': '415-555-7890'},
  {'type': 'work', 'number': '415-555-1234'}]}

In [36]:
import json
# python_data is the value returned by json.loads() in the previous cell.
json_string = json.dumps(python_data)
print(json_string)

{"name": "Alice Doe", "age": 30, "car": null, "programmer": true, "address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}, "phone": [{"type": "mobile", "number": "415-555-7890"}, {"type": "work", "number": "415-555-1234"}]}


In [37]:
json_string = json.dumps(python_data, indent=2)
print(json_string)

{
  "name": "Alice Doe",
  "age": 30,
  "car": null,
  "programmer": true,
  "address": {
    "street": "100 Larkin St.",
    "city": "San Francisco",
    "zip": "94102"
  },
  "phone": [
    {
      "type": "mobile",
      "number": "415-555-7890"
    },
    {
      "type": "work",
      "number": "415-555-1234"
    }
  ]
}


## XML

XML syntax is similar to HTML: it involves nesting opening and closing tags, written inside angle brackets, that contain other content. These tags are called elements.
- SVG image files are made up of text written in XML.
- The RSS and Atom web feed formats are also written in XML.
- Microsoft Word documents are just ZIP files that have the .docx file extension and contain XML files.

We store XML-formatted text in plaintext files with the .xml file extension. 

In [None]:
<person>
    <name>Alice Doe</name>
    <age>30</age>
    <programmer>true</programmer>
    <car xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
    <address>
        <street>100 Larkin St.</street>
        <city>San Francisco</city>
        <zip>94102</zip>
    </address>
    <phone>
        <phoneEntry>
            <type>mobile</type>
            <number>415-555-7890</number>
        </phoneEntry>
        <phoneEntry>
            <type>work</type>
            <number>415-555-1234</number>
        </phoneEntry>
    </phone>
</person>

Valid XML documents must have a single root element that contains all the other elements, such as the <person> element in this example. A document with multiple root elements like the following is not valid:

In [None]:
<person><name>Alice Doe</name></person>
<person><name>Bob Smith</name></person>
<person><name>Carol Watanabe</name></person>
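
As a rough check of the single-root rule, the sketch below feeds a two-root string to `xml.etree.ElementTree`, which raises `ParseError`, and then parses the same text wrapped in one root. The `<people>` wrapper element is a hypothetical name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Two sibling root elements: not a valid XML document.
multi_root = ('<person><name>Alice Doe</name></person>'
              '<person><name>Bob Smith</name></person>')

try:
    ET.fromstring(multi_root)
    error = None
except ET.ParseError as exc:
    error = exc
print(error)

# Wrapping the same text in a single (hypothetical) <people> root makes it parse.
wrapped = ET.fromstring('<people>' + multi_root + '</people>')
names = [person.find('name').text for person in wrapped]
print(names)
```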

In [37]:
import xml.etree.ElementTree as ET
xml_string = """<person><name>Alice Doe</name><age>30</age><programmer>true</programmer><car xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/><address><street>100 Larkin St.</street><city>San Francisco</city><zip>94102</zip></address><phone><phoneEntry><type>mobile</type><number>415-555-7890</number></phoneEntry><phoneEntry><type>work</type><number>415-555-1234</number></phoneEntry></phone></person>"""
root = ET.fromstring(xml_string)
root