# Reading and Writing Files

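open() returns a file object and is most commonly used with two arguments: open(filename, mode). The mode is a string such as 'r' (read, the default), 'w' (write, truncating any existing file) or 'a' (append to the end of the file).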

When you’re done with a file, call f.close() to close it and free up any system resources taken up by the open file. After calling f.close(), any further attempt to read from or write to the file object raises a ValueError.

In [2]:
f = open('data/example.txt', 'r')
f.close()
f.read()

ValueError: I/O operation on closed file
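A closed file object still exists, and its closed attribute tells you whether it can still be used. The sketch below writes to a scratch file created with tempfile (used only so the example runs anywhere, instead of data/example.txt) and catches the ValueError rather than letting it stop the notebook:

```python
import os
import tempfile

# Scratch file in a fresh temporary directory (illustrative only)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

f = open(path, 'w')
f.write('some text here\n')
print(f.closed)  # False: the file is still open
f.close()
print(f.closed)  # True: the file has been closed

try:
    f.write('too late')
except ValueError as err:
    # Any I/O on a closed file raises ValueError
    message = str(err)
    print('ValueError:', message)
```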

Now let's create a file and write some data to it.

First, we store the file path in a variable, since we are going to use it several times.

In [3]:
filepath = 'data/example.txt'

It is good practice to use the with keyword when dealing with file objects: the file is properly closed when the block ends, even if an exception is raised inside it. It is also much shorter than writing the equivalent try-finally block.


In [4]:
with open(filepath, 'w') as f:
    f.write('some text here\n')
    f.write('more text in a new line.')
f.write('laksdfjlasjdf')  # outside the block the file is already closed, so this raises ValueError


ValueError: I/O operation on closed file

The with block above is equivalent to this longer try-finally version:

In [5]:
f = open(filepath, 'w')
try:
    f.write('some text here\n')
    f.write('more text in a new line.')
finally:
    f.close()
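The guarantee also holds when the block fails. In the sketch below, a temporary file stands in for data/example.txt so the example runs anywhere. An exception is raised inside the with block, yet the file ends up closed, and the text written before the error is on disk:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # scratch file for the demo

try:
    with open(path, 'w') as f:
        f.write('some text here\n')
        # Simulate a failure in the middle of the block
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# The with statement closed the file even though the block raised
print(f.closed)  # True

# Closing flushed the data written before the error
with open(path) as f2:
    content = f2.read()
print(repr(content))  # 'some text here\n'
```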

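Jupyter notebooks can run shell commands as well as Python. A line starting with ! is passed to the system shell, and $filepath is replaced by the value of the Python variable filepath. For example, cat prints the file we just wrote: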

In [6]:
!cat $filepath

some text here
more text in a new line.

**read** reads the entire contents of the file and returns them as a single string. Avoid it for large files, because the whole file is loaded into memory at once.

In [7]:
with open(filepath, 'r') as f:
    print(f.read())  # fine for small files; avoid for large ones

some text here
more text in a new line.
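When a file may be large, read also accepts a size argument. In text mode, f.read(size) returns at most size characters, and it returns an empty string at the end of the file. The following is a minimal sketch that reads a file in fixed-size chunks. The chunk size of 8 is arbitrary, chosen only to make the loop visible, and a temporary file replaces data/example.txt:

```python
import os
import tempfile

data = 'some text here\nmore text in a new line.'
path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # scratch file for the demo
with open(path, 'w') as f:
    f.write(data)

chunks = []
with open(path, 'r') as f:
    while True:
        chunk = f.read(8)  # returns at most 8 characters
        if not chunk:      # an empty string signals end of file
            break
        # Real code would process each chunk here; it is collected only to check the result
        chunks.append(chunk)

print(''.join(chunks) == data)  # True
```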


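**readline** reads a single line from the file. The newline character (\n) is kept at the end of the returned string (unless it is the last line and the file does not end with a newline), which is why print shows an extra blank line below.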

In [8]:
with open(filepath, 'r') as f:
    print(f.readline())

some text here
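At the end of the file, readline returns an empty string. A blank line in the middle of the file comes back as '\n' instead, so the loop below stops only at the real end of the file. It is a sketch that uses a temporary file containing a blank line:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # scratch file for the demo
with open(path, 'w') as f:
    f.write('some text here\n\nmore text in a new line.')

lines = []
with open(path, 'r') as f:
    line = f.readline()
    while line != '':  # '' is returned only at end of file
        lines.append(line)
        line = f.readline()

print(lines)
# ['some text here\n', '\n', 'more text in a new line.']
```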



For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

In [9]:
with open(filepath, 'r') as f:
    for line in f:
        print('anotherline: ' + line)

anotherline: some text here

anotherline: more text in a new line.
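The file object yields one line at a time, so it works with other iteration tools without loading the whole file. The sketch below numbers the lines with enumerate and counts them with sum. It again uses a temporary copy of the example text:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # scratch file for the demo
with open(path, 'w') as f:
    f.write('some text here\nmore text in a new line.')

with open(path, 'r') as f:
    for number, line in enumerate(f, start=1):
        # end='' avoids doubling the newline already present in `line`
        print(number, line, end='')
print()

with open(path, 'r') as f:
    line_count = sum(1 for _ in f)  # counts lines without storing them
print(line_count)  # 2
```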


Often you only need the text of each line. In this example, strip() removes the trailing newline (\n) before the data is printed.

In [10]:
with open(filepath, 'r') as f:
    for line in f:
        print('anotherline: ' + line.strip())

anotherline: some text here
anotherline: more text in a new line.
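Note that strip() removes all leading and trailing whitespace, not just the newline. If indentation or trailing spaces matter, rstrip('\n') removes only the line ending. The line below is a made-up example:

```python
line = '    indented value  \n'

print(repr(line.strip()))       # 'indented value'
print(repr(line.rstrip('\n')))  # '    indented value  '
```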


# Saving structured data

## JSON

Python supports JSON (JavaScript Object Notation, http://www.json.org/), a text format for exchanging data. JSON objects closely resemble Python dictionaries, and JSON arrays resemble lists.

The standard module called json can take Python data hierarchies and convert them to string representations; this process is called serializing. Reconstructing the data from the string representation is called deserializing. Between serializing and deserializing, the string representing the object may be stored, for example, in a file.
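With the standard json module, json.dump writes the serialized string directly to a file object, and json.load reads it back. The sketch below shows the round trip with a temporary file and an illustrative record:

```python
import json
import os
import tempfile

record = {'name': 'Spam', 'price': 1.5, 'tags': ['lovely', 'wonderful'], 'in_stock': True}
path = os.path.join(tempfile.mkdtemp(), 'record.json')  # scratch file for the demo

# Serialize: Python objects -> JSON text in a file
with open(path, 'w') as f:
    json.dump(record, f)

# Deserialize: JSON text -> Python objects
with open(path, 'r') as f:
    restored = json.load(f)

print(restored == record)  # True
```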

Even so, there are third-party packages with significant performance advantages. One of them is UltraJSON (ujson), an ultra fast JSON encoder and decoder written in pure C.

To use it, we first have to install the package:

In [11]:
# !conda install -y ujson

Then import it:

In [12]:
import ujson
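ujson is a third-party package, so it may be missing on another machine. A common pattern is to fall back to the standard json module when ujson is not installed. This is a general Python idiom, not something ujson requires. Only basic calls such as dumps and loads are shared by both modules. Options such as double_precision exist only in ujson.

```python
try:
    import ujson as json_impl  # fast C implementation, if installed
except ImportError:
    import json as json_impl   # standard library fallback

text = json_impl.dumps({'key': 'value'})
print(json_impl.loads(text))  # {'key': 'value'}
```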

You can get the JSON string representation of a Python object (here, a list containing a dictionary) with **dumps**:

In [13]:
ujson.dumps([{"key": "value"}, 81, True])

'[{"key":"value"},81,true]'

To convert a JSON string back into Python objects, use **loads**:

In [14]:
ujson.loads("""[{"key": "value"}, 81, true]""")

[{'key': 'value'}, 81, True]
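A round trip does not always return identical Python objects, because JSON has no tuple type and object keys must be strings. The small check below uses the standard json module so that it runs without ujson installed:

```python
import json

data = {1: 'one', 'pair': (2, 3)}
restored = json.loads(json.dumps(data))

print(restored)  # {'1': 'one', 'pair': [2, 3]}
```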

The double_precision argument of ujson.dumps controls how many decimal digits of a floating-point number are serialized:

In [15]:
import math
ujson.dumps(math.pi, double_precision=15)

'3.141592653589793'

## CSV

The so-called CSV (Comma Separated Values) format is commonly used when exporting spreadsheets.
The standard module called csv implements classes to read and write tabular data in CSV format. 
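With its default settings, csv.writer separates fields with commas and wraps a field in double quotes only when the field contains a special character such as the delimiter. The sketch below writes to an in-memory io.StringIO buffer instead of a file, so the result can be inspected as a string. Reading the text back shows that numbers return as strings:

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file
writer = csv.writer(buffer)  # default dialect: ',' delimiter, '"' quote char, '\r\n' line ending
writer.writerow(['Spam', 'Baked Beans', 'Eggs, bacon'])
writer.writerow(['Spam', 42])

print(repr(buffer.getvalue()))
# 'Spam,Baked Beans,"Eggs, bacon"\r\nSpam,42\r\n'

rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)  # [['Spam', 'Baked Beans', 'Eggs, bacon'], ['Spam', '42']]
```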

In [16]:
import csv

filepath = 'data/eggs.csv'
with open(filepath, 'w', newline='') as csvfile:  # newline='' lets the csv module control line endings
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

In [17]:
!cat $filepath

Spam Spam Spam Spam Spam |Baked Beans|
Spam |Lovely Spam| |Wonderful Spam|


csv.reader returns each row as a list of strings:

In [18]:
with open(filepath, newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(row)

['Spam', 'Spam', 'Spam', 'Spam', 'Spam', 'Baked Beans']
['Spam', 'Lovely Spam', 'Wonderful Spam']


To print each row as a single string, join its fields:

In [19]:
with open(filepath, newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))

Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam


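To write rows from dictionaries, use **DictWriter**. Its counterpart **DictReader** reads each row back into a dictionary whose keys are taken from the header line.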

In [20]:
filepath = 'data/names.csv'
with open(filepath, 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
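DictWriter also handles rows that do not match fieldnames. Missing keys are filled with restval, which is an empty string unless you set it. With the default extrasaction='raise', a key that is not in fieldnames raises ValueError. The sketch below uses an in-memory io.StringIO buffer and hypothetical rows:

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file
writer = csv.DictWriter(buffer, fieldnames=['first_name', 'last_name'], restval='N/A')
writer.writeheader()
writer.writerow({'first_name': 'Lovely'})  # missing last_name is written as 'N/A'

rejected = False
try:
    # 'nickname' is not a field name, so DictWriter refuses the whole row
    writer.writerow({'first_name': 'Baked', 'nickname': 'Beans'})
except ValueError as err:
    rejected = True
    print('rejected:', err)

print(repr(buffer.getvalue()))
# 'first_name,last_name\r\nLovely,N/A\r\n'
```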

In [21]:
with open(filepath, newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
        # print(row['first_name'], row['last_name'])

{'first_name': 'Baked', 'last_name': 'Beans'}
{'first_name': 'Lovely', 'last_name': 'Spam'}
{'first_name': 'Wonderful', 'last_name': 'Spam'}


In [22]:
!cat $filepath

first_name,last_name
Baked,Beans
Lovely,Spam
Wonderful,Spam
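Because each row from DictReader is a dictionary, fields can be accessed by column name. The sketch below counts last names with collections.Counter. It reads the same data from an in-memory string, so it runs without the data/ directory:

```python
import collections
import csv
import io

csv_text = 'first_name,last_name\nBaked,Beans\nLovely,Spam\nWonderful,Spam\n'

counts = collections.Counter()
for row in csv.DictReader(io.StringIO(csv_text)):
    counts[row['last_name']] += 1  # access the field by its header name

print(counts.most_common())  # [('Spam', 2), ('Beans', 1)]
```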
